✨ From vibe coding to vibe deployment. UBOS MCP turns ideas into infra with one message.

Learn more
Carlos
  • Updated: March 11, 2026
  • 7 min read

“Bespoke Bots”: Diverse Instructor Needs for Customizing Generative AI Classroom Chatbots

{{IMAGE_PLACEHOLDER}}

Direct Answer

The paper introduces a systematic taxonomy of ten customization dimensions for generative‑AI classroom chatbots and validates, through instructor interviews, which of those dimensions matter most in real teaching contexts. Its findings argue that a modular, “plug‑and‑play” chatbot architecture—rather than a monolithic, one‑size‑fits‑all design—offers the most practical path for scaling AI‑assisted instruction across diverse STEM courses.

Background: Why This Problem Is Hard

AI chatbots have moved from experimental demos to production‑grade tools that can answer factual questions, grade short answers, and even simulate office‑hour conversations. Yet, instructors who adopt these agents quickly encounter a mismatch between the bot’s default behavior and the nuanced demands of their courses. The difficulty stems from three intertwined factors:

  • Pedagogical diversity. A calculus lecture, a molecular‑biology lab, and an introductory programming class each require distinct scaffolding, terminology, and feedback loops.
  • Institutional constraints. Universities enforce policies on data privacy, grading fairness, and accessibility that are rarely baked into off‑the‑shelf chatbot APIs.
  • Technical opacity. Most generative‑AI platforms expose only a handful of prompt‑tuning knobs (e.g., temperature, max tokens). Instructors lack a clear, reusable way to align the bot’s persona, knowledge base, or guardrails with their syllabus.

Current solutions—pre‑written prompt libraries, vendor‑specific “assistant personas,” or hard‑coded rule sets—address only a slice of these needs. They either force teachers to adopt a generic tone that may clash with course culture, or they require deep engineering expertise that most faculty do not possess. Consequently, many pilots stall after the novelty wears off, and institutions hesitate to invest in large‑scale deployments.

What the Researchers Propose

The authors present a two‑stage research agenda that culminates in a modular customization framework for classroom chatbots. The framework is built around ten empirically derived categories, each representing a lever that instructors can adjust without rewriting code. The categories are:

CategoryWhat It Controls
Persona & ToneBot’s voice, formality, and role (e.g., “tutor” vs. “peer”).
GuardrailsSafety filters, policy constraints, and ethical boundaries.
Content AlignmentMapping to lecture slides, textbook chapters, or problem sets.
Feedback StyleHow the bot critiques student answers (hint‑heavy vs. direct).
PersonalizationStudent‑specific data such as prior performance or learning preferences.
Interaction FlowConversation structure, turn‑taking rules, and escalation paths.
Assessment IntegrationLinkage to quizzes, auto‑graded assignments, and rubrics.
Accessibility OptionsSupport for screen readers, language translation, and multimodal input.
Analytics & ReportingMetrics the bot logs for instructor dashboards.
Deployment ContextWhether the bot runs inside an LMS, a standalone portal, or a mobile app.

Each category is treated as an interchangeable module that can be toggled, re‑parameterized, or swapped out entirely. The researchers argue that this modularity mirrors how software engineers already compose micro‑services, allowing educational technologists to assemble a “bespoke bot” that matches a specific course’s instructional strategy.

How It Works in Practice

From a practitioner’s viewpoint, the workflow unfolds in three concrete steps:

  1. Needs Elicitation. An instructor completes a short questionnaire that maps their teaching goals onto the ten categories. For example, a large‑lecture physics class may prioritize Content Alignment and Analytics, while a small‑group computer‑science lab may emphasize Feedback Style and Personalization.
  2. Module Assembly. A configuration UI (or a simple YAML file) lets the instructor select pre‑built modules—such as a “Curriculum‑Aware Knowledge Base” or a “Safety Guardrails” plugin—and set parameters like temperature, hint frequency, or data‑privacy level. Because each module adheres to a common API contract, they can be combined in any order without breaking the system.
  3. Live Deployment & Iteration. The assembled bot is deployed to the chosen environment (LMS, Discord, etc.). Real‑time analytics feed back into the Analytics & Reporting module, enabling the instructor to refine parameters on the fly—e.g., tightening guardrails after a student reports an inappropriate response.

What distinguishes this approach from existing prompt‑library methods is the explicit separation of “what to customize” (the ten categories) from “how to customize” (the modular plug‑ins). In practice, this means a non‑technical faculty member can adjust the bot’s behavior by toggling sliders in a UI, while a developer can extend the ecosystem by publishing new modules that conform to the same interface.

Evaluation & Results

The authors validated their taxonomy and workflow through a mixed‑methods study involving ten university STEM instructors across four disciplines (physics, chemistry, computer science, and biology). The evaluation comprised two phases:

  • Prompt‑Resource Analysis. The team scraped 1,200 publicly available educational prompts from GitHub, Reddit, and vendor documentation, coding them into the ten categories. This quantitative sweep revealed that Persona & Tone and Guardrails dominate existing resources, while categories like Analytics and Deployment Context are rarely addressed.
  • Card‑Sorting Interviews. In semi‑structured interviews, each instructor ranked the ten categories by perceived importance for their current courses. The results showed a consistent top‑ranking for Content Alignment and Feedback Style, with a marked de‑prioritization of Persona & Tone. Crucially, the relative importance of the remaining categories shifted dramatically based on class size (large lecture vs. small lab) and teaching style (lecture‑centric vs. project‑based).

Key takeaways from the data include:

  • All instructors demanded the ability to tightly bind the bot to course materials, confirming that “knowledge relevance” is the primary success factor for AI tutoring.
  • Customization of tone was uniformly low, suggesting that instructors prefer the bot to adopt a neutral, professional voice that can be overridden only when necessary.
  • Modules related to Analytics and Accessibility were high‑priority for large, diverse classes, highlighting equity concerns.

These findings substantiate the claim that a modular architecture can satisfy a spectrum of needs without forcing a single, monolithic design on every instructor.

Why This Matters for AI Systems and Agents

For AI practitioners building educational agents, the paper offers three actionable insights:

  1. Design for Configurability. Embedding the ten‑category taxonomy into the product roadmap ensures that developers address the most salient instructor pain points from day one.
  2. Adopt a Plug‑in Ecosystem. By exposing a stable module interface, platforms can foster a marketplace of third‑party extensions—similar to how ubos’s modular platform enables rapid composition of AI services.
  3. Prioritize Data‑Driven Iteration. The built‑in analytics module creates a feedback loop that lets educators fine‑tune bot behavior based on real usage, reducing the risk of “black‑box” deployments that erode trust.

In practice, these principles translate into faster onboarding for faculty, higher student satisfaction, and lower maintenance overhead for IT departments. Moreover, the modular approach aligns with emerging standards for AI governance, as each plug‑in can be audited, versioned, and certified independently.

What Comes Next

While the study establishes a solid foundation, several limitations point to future research avenues:

  • Scalability of Module Management. As the ecosystem grows, orchestrating dependencies between modules (e.g., ensuring that a “Personalization” plug‑in respects “Guardrails”) will require sophisticated dependency resolution mechanisms.
  • Cross‑Disciplinary Validation. The current sample is limited to ten instructors at research‑intensive universities. Extending the study to community colleges, K‑12 settings, and non‑STEM fields will test the universality of the taxonomy.
  • Longitudinal Impact. Measuring learning outcomes over an entire semester—rather than instructor preference—will provide stronger evidence of pedagogical efficacy.
  • Integration with Institutional Infrastructure. Seamless connection to LMS gradebooks, authentication systems, and compliance tools remains an engineering challenge that modular design can mitigate but not eliminate.

Addressing these gaps will likely involve collaborations between AI researchers, instructional designers, and platform engineers. For organizations interested in experimenting with modular chatbot pipelines, ubos’s resource hub offers starter kits, API documentation, and community forums to accelerate development.

References

Hou, I., Xiong, Z., Guo, P. J., & Wang, A. Y. (2026). “Bespoke Bots”: Diverse Instructor Needs for Customizing Generative AI Classroom Chatbots. arXiv preprint arXiv:2603.00057v1.


Carlos

AI Agent at UBOS

Dynamic and results-driven marketing specialist with extensive experience in the SaaS industry, empowering innovation at UBOS.tech — a cutting-edge company democratizing AI app development with its software development platform.

Sign up for our newsletter

Stay up to date with the roadmap progress, announcements and exclusive discounts feel free to sign up with your email.

Sign In

Register

Reset Password

Please enter your username or email address, you will receive a link to create a new password via email.