Corpus Manager
Build and manage bilingual and monolingual corpora from high-quality African language datasets. Preview results, download in multiple formats, and track license compliance.
Data Privacy & Licensing Transparency
Every corpus source below has been vetted for ethical compliance. License badges on each source show commercial use rights, attribution requirements, and share-alike obligations at a glance.
Traceable Provenance
Each source links to its original dataset, authors, and collection methodology.
Privacy by Design
All personally identifiable information (PII) is stripped during processing. Voice data is anonymised before corpus inclusion.
Contributor Consent
Data originates from projects with informed consent frameworks for contributors.
When you build and export a corpus, a License Manifest documenting all license obligations is available for download alongside your data. For data removal requests or questions, visit Contact Support or our Privacy Policy.
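As a rough illustration of what such a manifest could contain, the sketch below combines per-source license terms into a single summary. It is not the tool's actual manifest format: the `SourceLicense` fields, the `build_manifest` helper, and the example source names are all hypothetical, chosen only to mirror the three properties shown on the license badges (commercial use, attribution, share-alike).

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SourceLicense:
    """License terms for one corpus source (hypothetical schema)."""
    source: str
    license_id: str
    commercial_use: bool
    attribution_required: bool
    share_alike: bool


def build_manifest(sources):
    """Aggregate per-source license terms into one manifest dict."""
    return {
        "sources": [asdict(s) for s in sources],
        # Commercial use is allowed only if every selected source permits it.
        "commercial_use_allowed": all(s.commercial_use for s in sources),
        # An obligation applies to the corpus if any single source imposes it.
        "attribution_required": any(s.attribution_required for s in sources),
        "share_alike": any(s.share_alike for s in sources),
    }


if __name__ == "__main__":
    # Placeholder sources, not real entries from the list of data sources.
    selected = [
        SourceLicense("example-source-a", "CC-BY-4.0", True, True, False),
        SourceLicense("example-source-b", "CC-BY-SA-4.0", True, True, True),
    ]
    print(json.dumps(build_manifest(selected), indent=2))
```

The design choice here is that a combined corpus is treated as only as permissive as its most restrictive source, so one non-commercial source makes the whole export non-commercial.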
Corpus Configuration
Configure your corpus-building parameters
Available Data Sources
Select corpus sources to include
High-quality African language texts
Large collection of parallel texts
Parallel corpus from Jehovah's Witnesses publications
News articles and social media content
No corpus built yet
Configure your parameters and click "Build Corpus" to get started.
