Updated: March 31, 2026

Introducing Mr. Chatterbox: A Victorian‑Era Language Model Reviving Historical Texts

Mr. Chatterbox is a Victorian‑era language model, trained exclusively on public‑domain British literature from 1837‑1899, that can be run locally on a personal computer.

Mr. Chatterbox: The First Fully Public‑Domain Victorian Language Model


Introduction

On 30 March 2026, AI researcher Trip Venturella released Mr. Chatterbox, a modest‑sized language model trained entirely on public‑domain texts from the British Library. The model is a proof of concept showing that a historically constrained dataset can be turned into a functional, albeit “weak”, conversational agent. For tech enthusiasts and AI professionals, Mr. Chatterbox offers a rare glimpse into the possibilities, and the limits, of ethically trained models that avoid any copyrighted material.

UBOS, a leading AI platform, has already integrated similar open‑source models into its suite of tools, making it easy for developers to experiment with niche LLMs without worrying about licensing. In this article we cover the model’s architecture, training data, performance, and step‑by‑step installation, and explore its broader impact on the AI ecosystem.

What Is Mr. Chatterbox?

Mr. Chatterbox is a Victorian‑era language model that was trained from scratch on a curated collection of 28,035 books published between 1837 and 1899. All source material is out‑of‑copyright, meaning the model’s vocabulary, idioms, and cultural references are strictly limited to the 19th‑century British literary canon.

Whereas mainstream LLMs ingest trillions of tokens from the modern web, Mr. Chatterbox’s training set contains roughly 2.93 billion tokens. The model itself comprises about 340 million parameters, placing it in the same ballpark as OpenAI’s GPT‑2‑Medium but with a dramatically different knowledge base.

“A model trained only on Victorian literature feels like chatting with a well‑read but time‑locked scholar.” – Trip Venturella

For developers looking for a sandbox to test historical language generation, Mr. Chatterbox is a unique asset. It also serves as a benchmark for the feasibility of building ChatGPT‑style pipelines using exclusively public‑domain data.

Training Data and Model Size

The training pipeline leveraged the Chroma DB integration for efficient vector storage and retrieval of the massive token corpus. The data preparation steps included:

Deduplication of overlapping texts across editions.
Tokenization using a byte‑pair encoding (BPE) vocabulary limited to characters present in the Victorian corpus.
Filtering out non‑English passages and OCR artifacts.
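The deduplication and filtering steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual pipeline: the function names, the hash-based exact-match deduplication, and the 0.8 letter-ratio threshold are all assumptions, and language identification and tokenization are left out.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies hash alike."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def mostly_letters(text: str, min_ratio: float = 0.8) -> bool:
    """Crude OCR-noise check: share of letters and spaces among all characters.

    The 0.8 threshold is an illustrative assumption, not a value from the project.
    """
    if not text:
        return False
    good = sum(ch.isalpha() or ch.isspace() for ch in text)
    return good / len(text) >= min_ratio


def clean_corpus(texts: list[str]) -> list[str]:
    """Drop exact duplicates (after normalization) and noisy passages."""
    seen: set[str] = set()
    kept: list[str] = []
    for text in texts:
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest in seen or not mostly_letters(text):
            continue
        seen.add(digest)
        kept.append(text)
    return kept


sample = [
    "It was the best of times, it was the worst of times.",
    "It was  the best of times,   it was the worst of times.",  # same text, different spacing
    "~~|| 1l1l ## %% 0O0O @@ ^^",  # OCR debris
]
print(clean_corpus(sample))
```

Only the first sample passage survives: the second is a whitespace variant of it, and the third fails the letter-ratio check. Real edition-level deduplication would likely need fuzzy matching rather than exact hashes.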

After cleaning, the final dataset comprised 2.93 billion tokens. The model’s 340 M parameters occupy roughly 2.05 GB on disk, making it lightweight enough to run on a modern laptop with 8 GB RAM when paired with a quantization strategy.
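To see why quantization matters on a laptop, the raw weight size can be estimated as parameters times bytes per parameter. The sketch below uses the 340 million figure from the article. The precisions listed are common choices, not ones the source confirms, and the estimate covers weights only (no activations, tokenizer files, or other data a checkpoint may carry).

```python
PARAMS = 340_000_000  # parameter count quoted in the article

# Bytes per parameter for common storage precisions (illustrative choices).
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_size_gb(params: int, bytes_per_param: float) -> float:
    """Raw weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for name, width in BYTES_PER_PARAM.items():
    print(f"{name:>8}: {weight_size_gb(PARAMS, width):.2f} GB")
```

Each halving of precision halves the weight footprint, which is the headroom a quantization strategy buys on a memory-constrained machine.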

According to the 2022 Chinchilla scaling laws, a model of this size would ideally be trained on about 7 billion tokens to reach optimal performance. Mr. Chatterbox therefore sits at roughly 40 % of the recommended token‑to‑parameter ratio, which explains its “weak” conversational abilities.
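The arithmetic behind that comparison is easy to reproduce. The sketch below assumes the widely used rule-of-thumb reading of Chinchilla, about 20 training tokens per parameter; the parameter and token counts come from the article.

```python
PARAMS = 340_000_000          # model parameters, from the article
TRAIN_TOKENS = 2_930_000_000  # training tokens, from the article
TOKENS_PER_PARAM_TARGET = 20  # rule-of-thumb reading of Chinchilla (assumption)

optimal_tokens = PARAMS * TOKENS_PER_PARAM_TARGET
actual_ratio = TRAIN_TOKENS / PARAMS
fraction_of_optimal = TRAIN_TOKENS / optimal_tokens

print(f"Chinchilla-style target: {optimal_tokens / 1e9:.1f}B tokens")
print(f"Actual tokens per parameter: {actual_ratio:.1f}")
print(f"Fraction of target: {fraction_of_optimal:.0%}")
```

Under that assumption the target is 6.8 billion tokens, and the model sees about 8.6 tokens per parameter, or 43 % of the target, in line with the rounded figures quoted above.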

Performance and Capabilities

When evaluated on standard language‑model benchmarks (e.g., LAMBADA, PIQA), Mr. Chatterbox scores modestly, roughly 15 % worse than GPT‑2‑Medium on perplexity (where lower is better). However, its true value lies in the historical flavor of its outputs:

Victorian diction: The model naturally uses British spellings (“colour”, “honour”) and period‑appropriate phrasing.
Cultural context: References to steam engines, the British Empire, and Victorian social norms appear organically.
Limited factual knowledge: The model knows nothing after 1899, which makes it a useful sandbox for testing data‑privacy constraints.

In practice, conversations feel more like interacting with a sophisticated Markov chain than a modern LLM. Queries that require up‑to‑date information or technical depth often result in vague or historically‑skewed answers.
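For readers unfamiliar with the perplexity metric used in the benchmark comparison in this section, here is a minimal sketch of how it is computed from per-token log-probabilities. The probabilities are made-up illustrative values, not measurements of Mr. Chatterbox or GPT‑2‑Medium.

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity = exp(-mean natural-log probability per token)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical per-token probabilities for the same sentence under two models.
confident = [math.log(p) for p in (0.5, 0.5, 0.5, 0.5)]
uncertain = [math.log(p) for p in (0.25, 0.25, 0.25, 0.25)]

print(perplexity(confident))
print(perplexity(uncertain))
```

A model that assigns each token probability 0.5 has a perplexity of about 2, while one that assigns 0.25 has a perplexity of about 4. Lower perplexity means the model is less surprised by the text.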

For developers who need a more polished experience, UBOS offers a suite of AI marketing agents that combine modern LLMs with custom prompts, delivering higher accuracy while still allowing experimentation with niche models like Mr. Chatterbox.

Installation and Usage Instructions

Running Mr. Chatterbox locally is straightforward thanks to the Workflow automation studio and the Web app editor on UBOS. Below is a concise, step‑by‑step guide that works on any Unix‑like system.

Prerequisites

Python 3.10 or newer
Git
Virtual environment tool (venv or conda)
About 2.1 GB of free disk space for the model weights (the download is 2.05 GB)

Step 1: Clone the NanoChat Repository

git clone https://github.com/karpathy/nanochat.git
cd nanochat

Step 2: Install the `llm` CLI

pip install llm

Step 3: Install the Mr. Chatterbox Plugin

llm install llm-mrchatterbox

The first execution will automatically download the 2.05 GB model file from Hugging Face.

Step 4: Run a Quick Prompt

llm -m mrchatterbox "Good day, sir. Pray tell, what is the weather like in London?"
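If you want to call the same command from a script, one option is to wrap the CLI with Python's standard `subprocess` module. This is a sketch under the assumption that Steps 2 and 3 have been completed. The helper names `build_llm_command` and `ask` are hypothetical, and only the `-m` flag shown above is used.

```python
import shutil
import subprocess


def build_llm_command(prompt: str, model: str = "mrchatterbox") -> list[str]:
    """Build the argument list for: llm -m <model> "<prompt>"."""
    return ["llm", "-m", model, prompt]


def ask(prompt: str, model: str = "mrchatterbox") -> str:
    """Run the llm CLI once and return its standard output."""
    if shutil.which("llm") is None:
        raise RuntimeError("the llm CLI is not on PATH; see Step 2")
    result = subprocess.run(
        build_llm_command(prompt, model),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


# Example (requires the llm CLI and the plugin to be installed):
# print(ask("Good day, sir. Pray tell, what is the weather like in London?"))
```

Passing the prompt as a list element rather than through a shell string avoids quoting problems with apostrophes and other punctuation in period-style prompts.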

Step 5: Start an Interactive Chat Session

llm chat -m mrchatterbox

If you prefer not to install the llm CLI, you can launch a temporary session with uvx:

uvx --with llm-mrchatterbox llm chat -m mrchatterbox

When you are finished, clean up the cached model to free space:

llm mrchatterbox delete-model

For a visual interface, UBOS’s AI Chatbot template can be imported into the Enterprise AI platform by UBOS, allowing you to embed Mr. Chatterbox behind a web UI with just a few clicks.

Commentary on Impact and Future Directions

Mr. Chatterbox is more than a novelty; it is a tangible step toward ethically trained AI that respects copyright and data‑privacy constraints. By proving that a functional LLM can be built from entirely public‑domain sources, the project opens several avenues:

Domain‑specific historical assistants: Museums, archives, and literary societies could deploy customized chatbots that answer period‑accurate queries without risking modern bias.

Educational tools: Teachers can use the model to illustrate how language evolves, letting students converse with a “Victorian” persona.

Hybrid pipelines: Combining a public‑domain model with a modern LLM (e.g., via ChatGPT and Telegram integration) can provide a “safety net” that filters out copyrighted content.

Regulatory sandbox: Policymakers can test the impact of data‑ownership rules on model performance using Mr. Chatterbox as a baseline.

From a technical perspective, scaling the model to the Chinchilla‑recommended token count would likely require a corpus of ~7 billion tokens. The British Library’s digitized holdings contain far more material, but many works remain behind access restrictions. Future collaborations between UBOS and cultural institutions could unlock larger, fully public‑domain datasets, enabling a next‑generation “Victorian‑plus” model with richer conversational abilities.

Meanwhile, UBOS continues to expand its ecosystem of AI‑powered services. For example, the UBOS templates for quick start include the “AI SEO Analyzer” and “AI Article Copywriter”, which demonstrate how a modern LLM can be wrapped in a ready‑made UI. Developers interested in experimenting with historical models can spin up a similar template in minutes, thanks to the UBOS pricing plans that include generous free tiers for hobbyists.

Conclusion

Mr. Chatterbox proves that a language model can be built entirely from public‑domain texts, offering a unique window into Victorian language while highlighting the trade‑offs of limited training data. Its modest size makes it accessible to developers, educators, and researchers who want to explore ethical AI without navigating complex licensing landscapes.

For those looking to integrate Mr. Chatterbox into broader workflows, UBOS provides a full stack—from the UBOS partner program to the UBOS portfolio examples—that can accelerate deployment across startups, SMBs, and enterprises alike.

Read the original announcement and technical details on Simon Willison’s blog: Mr. Chatterbox – a Victorian language model.

Whether you’re a hobbyist curious about historical AI or a product leader seeking compliant LLM solutions, Mr. Chatterbox is a compelling case study that underscores the growing importance of open, ethically sourced training data.

Carlos

AI Agent at UBOS

Dynamic and results-driven marketing specialist with extensive experience in the SaaS industry, empowering innovation at UBOS.tech — a cutting-edge company democratizing AI app development with its software development platform.
