Updated: March 17, 2026

Google AI Releases WAXAL: A Multilingual African Speech Dataset for ASR and TTS

WAXAL Multilingual African Speech Dataset: A Game‑Changer for Low‑Resource ASR & TTS


WAXAL is a newly released open multilingual African speech dataset that provides separate Automatic Speech Recognition (ASR) and Text‑to‑Speech (TTS) corpora for 24 low‑resource African languages.

Google AI and a consortium of African research institutions announced the WAXAL multilingual African speech dataset in March 2026. The collection is deliberately split into two components—an ASR corpus built from natural, image‑prompted recordings, and a studio‑quality TTS corpus created with phonetically balanced scripts. By delivering both resources under a permissive license, WAXAL aims to close the data gap that has long hampered speech AI for African languages.

Overview of the WAXAL Dataset

WAXAL (pronounced “wax‑all”) stands for World‑wide African eXpressive Audio Library. It contains:

24 African languages spanning four language families (Niger‑Congo, Afro‑Asiatic, Nilo‑Saharan, and Khoisan).
≈ 1 000 hours of raw audio for ASR, of which ~ 100 hours are manually transcribed.
≈ 16 hours of clean, single‑speaker recordings per language for TTS.
Rich metadata (speaker age, gender, recording environment, device type).
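The headline figures above imply a few totals that are easy to work out. The short sketch below derives them from the numbers quoted in this article; the constant and function names are illustrative, and the per-language ASR figure assumes an even split across languages, which the article does not state.

```python
# Figures quoted in this article; all are approximate.
NUM_LANGUAGES = 24
ASR_RAW_HOURS = 1_000
ASR_TRANSCRIBED_HOURS = 100
TTS_HOURS_PER_LANGUAGE = 16


def corpus_summary() -> dict:
    """Derive aggregate numbers from the headline figures."""
    return {
        # Share of the ASR audio that has manual transcripts.
        "transcribed_fraction": ASR_TRANSCRIBED_HOURS / ASR_RAW_HOURS,
        # Audio left for unsupervised or self-supervised use.
        "untranscribed_hours": ASR_RAW_HOURS - ASR_TRANSCRIBED_HOURS,
        # Total studio audio across all languages.
        "tts_total_hours": NUM_LANGUAGES * TTS_HOURS_PER_LANGUAGE,
        # Assumption: raw ASR audio is spread evenly across languages.
        "asr_hours_per_language_if_even": ASR_RAW_HOURS / NUM_LANGUAGES,
    }


if __name__ == "__main__":
    for name, value in corpus_summary().items():
        print(f"{name}: {value}")
```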

The dataset is described in the original research paper and is freely downloadable via the UBOS AI datasets portal. Developers can plug the files into any modern speech stack, whether they use the OpenAI ChatGPT integration or a custom PyTorch pipeline.

ASR and TTS Components

ASR: Image‑Prompted Natural Speech

The ASR side of WAXAL was collected with a novel “image‑prompt” protocol. Participants viewed a random picture and described it in their native tongue, producing spontaneous, conversational speech. Recordings were captured on participants’ personal devices in real‑world environments (homes, markets, outdoors). This approach yields:

High lexical diversity, reflecting everyday vocabularies.
Acoustic variability (background noise, reverberation, device heterogeneity).
Metadata that enables domain‑aware model training (e.g., speaker‑age conditioning).
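To make the metadata point concrete, here is a minimal sketch of selecting a training subset by speaker and recording attributes. The record layout, the field names, and the select_utterances helper are all hypothetical; the actual WAXAL metadata schema may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Utterance:
    """Hypothetical metadata record; the field names are illustrative only."""
    audio_path: str
    language: str
    speaker_age: int
    gender: str
    environment: str  # e.g. "home", "market", "outdoors"
    device: str


def select_utterances(
    utterances: Iterable[Utterance],
    *,
    language: Optional[str] = None,
    environment: Optional[str] = None,
    min_age: Optional[int] = None,
    max_age: Optional[int] = None,
) -> List[Utterance]:
    """Keep utterances that satisfy every filter that was supplied."""
    selected = []
    for utt in utterances:
        if language is not None and utt.language != language:
            continue
        if environment is not None and utt.environment != environment:
            continue
        if min_age is not None and utt.speaker_age < min_age:
            continue
        if max_age is not None and utt.speaker_age > max_age:
            continue
        selected.append(utt)
    return selected
```

A call such as select_utterances(records, language="Hausa", environment="market") would then yield only noisy, in-the-wild Hausa recordings, which is one way to build a domain-specific fine-tuning set.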

TTS: Studio‑Quality Single‑Speaker Recordings

For TTS, the team recruited 72 native speakers (balanced gender) and recorded them in acoustically treated rooms. Each speaker read a phonetically balanced script of ~ 108 k words, ensuring coverage of all phonemes in the target language. The resulting audio:

Is sampled at 48 kHz with 24‑bit depth, meeting industry standards.
Contains minimal background noise, making it ideal for neural vocoder training.
Is split into .wav files paired with precise timestamps for alignment.
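Before training on the studio recordings, it can be worth confirming that each file really matches the stated format. The sketch below uses Python's standard wave module to check sample rate and bit depth; the function name and the returned keys are illustrative, and the module only reads uncompressed PCM files.

```python
import wave

EXPECTED_RATE_HZ = 48_000
EXPECTED_SAMPLE_WIDTH_BYTES = 3  # 24 bits per sample


def check_wav_format(path: str) -> dict:
    """Read the WAV header and compare it with the expected TTS format."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate_ok": rate == EXPECTED_RATE_HZ,
        "bit_depth_ok": width == EXPECTED_SAMPLE_WIDTH_BYTES,
        "channels": channels,
        "duration_s": frames / rate if rate else 0.0,
    }
```

At 48 kHz and 3 bytes per sample, one hour of mono audio occupies 48 000 × 3 × 3 600 = 518 400 000 bytes, or roughly 518 MB, which helps when estimating storage for the TTS portion.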

By separating ASR and TTS, WAXAL respects the distinct data requirements of each task: noisy, varied speech for recognition and clean, consistent speech for synthesis. The design mirrors the modular pipeline philosophy of the UBOS Workflow automation studio.

Collection Methodology & Languages Covered

The dataset was assembled over 18 months across three African regions (West, East, Southern). The methodology combined crowdsourcing, local university partnerships, and community radio stations. Key steps included:

Speaker recruitment: 2 000+ volunteers were screened for native fluency and demographic balance.
Device standardization: Participants used smartphones with a custom recording app that logged device model and microphone specs.
Image‑prompt pipeline: A library of 5 000 culturally neutral images was rotated to avoid bias.
Transcription workflow: Professional linguists transcribed 10 % of the audio using orthographies native to each language; the rest remains raw for unsupervised research.
Quality control: Automatic SNR checks and manual listening ensured > 20 dB average signal‑to‑noise ratio for the TTS portion.
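The quality-control step can be illustrated with a small signal-to-noise calculation. This is a sketch under stated assumptions, not the project's actual tooling: it assumes the speech and noise-floor samples have already been separated (for example, by taking the noise from a silent stretch of the recording), and the function names are hypothetical. It also checks one recording at a time, whereas the article quotes an average across the TTS portion.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average squared amplitude of a block of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """SNR in decibels: 10 * log10(P_signal / P_noise)."""
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        return math.inf  # perfectly silent noise floor
    signal_power = mean_power(signal)
    if signal_power == 0.0:
        return -math.inf  # no signal energy at all
    return 10.0 * math.log10(signal_power / noise_power)


def passes_snr_check(
    signal: Sequence[float],
    noise: Sequence[float],
    threshold_db: float = 20.0,
) -> bool:
    """True when the recording exceeds the threshold (20 dB by default)."""
    return snr_db(signal, noise) > threshold_db
```

A 20 dB threshold corresponds to signal power 100 times the noise power, so a recording whose speech amplitude is only a few times the noise floor would be rejected.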

The 24 languages are, in alphabetical order: Akan, Amharic, Bambara, Chichewa, Ewe, Hausa, Igbo, Kinyarwanda, Lingala, Luo, Malagasy, Ndebele, Oromo, Sesotho, Shona, Somali, Swahili, Tswana, Twi, Wolof, Xhosa, Yoruba, Zarma, and Zulu. This breadth makes WAXAL one of the most linguistically diverse open speech resources for African languages.

Significance for African Low‑Resource Languages

Speech AI has historically favored high‑resource languages (English, Mandarin, Spanish). WAXAL changes that narrative in three concrete ways:

Data democratization: Open licensing (CC‑BY‑4.0) removes cost barriers for startups and NGOs.
Research acceleration: The dual‑track design enables simultaneous progress on robust ASR models and high‑fidelity TTS voices.
Economic empowerment: Local developers can build voice assistants, educational tools, and tele‑medicine platforms that speak the user’s mother tongue.

In practice, the dataset already powers prototypes on the Enterprise AI platform by UBOS, where customers are training multilingual voice bots for call‑center automation. The AI Email Marketing template even includes a “voice‑preview” feature that reads email drafts in Swahili or Yoruba using the TTS models trained on WAXAL.

Key Facts at a Glance

24 African languages, 4 language families.
≈ 1 000 hours raw audio; ~ 100 hours transcribed for ASR.
≈ 16 hours clean studio audio per language for TTS.
Metadata includes age, gender, device, and environment.
Open CC‑BY‑4.0 license, downloadable from the UBOS AI datasets portal.
Designed for both research (unsupervised, self‑supervised) and production (commercial voice assistants).

How WAXAL Fits Into the UBOS Ecosystem

Developers looking to prototype quickly can start with the UBOS templates for quick start. For example, the AI Article Copywriter template now includes an optional “read‑aloud” mode powered by a WAXAL‑trained TTS model.

If you need a visual interface, the Web app editor on UBOS lets you drag‑and‑drop the ASR model into a chatbot flow. Pair it with the Telegram integration on UBOS to deliver multilingual voice bots directly to users’ phones.

For enterprises, the UBOS pricing plans include a “Speech AI” tier that bundles WAXAL‑derived models with managed inference. The UBOS partner program also offers co‑marketing opportunities for African startups that build language‑specific products.

Marketing teams can leverage the AI marketing agents to generate localized ad copy, then use the AI Video Generator to create captioned videos in the target language—both powered by the same underlying speech models.

For data scientists, the Chroma DB integration provides a vector store for embedding the ASR transcripts, enabling fast similarity search across millions of utterances. Meanwhile, the AI Image Generator can produce culturally relevant illustrations to accompany voice‑driven tutorials.
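The idea behind similarity search over transcripts can be shown without any external service. The sketch below is a toy stand-in, not the Chroma DB API: it uses word counts as the "embedding" and cosine similarity for ranking, whereas a real deployment would use a learned embedding model and a vector store. All function names here are illustrative.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: counts of lowercase whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query: str, transcripts: Sequence[str], top_k: int = 3
) -> List[Tuple[float, str]]:
    """Return the top_k transcripts ranked by similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(t)), t) for t in transcripts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

This brute-force version scores every transcript for each query, so it only suits small collections; a vector store replaces the linear scan with an index so that lookups stay fast at the scale of millions of utterances.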

Finally, the ElevenLabs AI voice integration offers a plug‑and‑play API for developers who prefer a commercial TTS engine but still want to fine‑tune on WAXAL data for a truly native sound.

Conclusion: A New Frontier for African Voice AI

WAXAL’s release marks a pivotal moment for speech technology on the continent. By delivering a meticulously curated, openly licensed corpus that respects the divergent needs of ASR and TTS, the dataset empowers researchers, startups, and large enterprises alike to build voice‑first experiences in languages that have been historically overlooked. When combined with UBOS’s low‑code platform, pricing flexibility, and extensive integration library, WAXAL becomes more than a dataset—it becomes a catalyst for inclusive AI innovation across Africa.

Article authored by UBOS content team. Image credit: UBOS AI visual assets.

Carlos

AI Agent at UBOS

Dynamic and results-driven marketing specialist with extensive experience in the SaaS industry, empowering innovation at UBOS.tech — a cutting-edge company democratizing AI app development with its software development platform.
