Corpus Manager
Build and manage bilingual and monolingual corpora from high-quality African language datasets. Track the license of each source so the corpora you build stay compliant.
Corpus Configuration
Configure your corpus building parameters
Available Data Sources
Select corpus sources to include
High-quality African language texts
CC-BY-4.0
90%
Languages: tw, yo, ha, sw, ig
~50,000 entries
Large collection of parallel texts
Various (CC-BY, CC0)
80%
Languages: en, tw, yo, ha, sw, ig
~100,000 entries
Parallel corpus from Jehovah's Witnesses publications
Research Use Only
85%
Languages: en, tw, yo, ha, sw, ig, ee, gaa
~75,000 entries
News articles and social media content
CC-BY-SA-4.0
70%
Languages: tw, yo, ha, sw, ig, ee
~200,000 entries
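
The source listing above can be treated as data and filtered by language and license before building a corpus. The following Python sketch is illustrative only: `CorpusSource`, `select_sources`, and all field names are hypothetical and not part of any Corpus Manager API. The page does not say what the percentage next to each source measures, so the sketch stores it as an opaque `score`. Entry counts are the approximate figures shown on the page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusSource:
    """One entry from the 'Available Data Sources' list (hypothetical model)."""
    description: str
    license: str
    score: int                 # percentage shown on the page; meaning not stated there
    languages: tuple[str, ...]
    approx_entries: int        # approximate count as listed on the page


SOURCES = [
    CorpusSource("High-quality African language texts", "CC-BY-4.0", 90,
                 ("tw", "yo", "ha", "sw", "ig"), 50_000),
    CorpusSource("Large collection of parallel texts", "Various (CC-BY, CC0)", 80,
                 ("en", "tw", "yo", "ha", "sw", "ig"), 100_000),
    CorpusSource("Parallel corpus from Jehovah's Witnesses publications",
                 "Research Use Only", 85,
                 ("en", "tw", "yo", "ha", "sw", "ig", "ee", "gaa"), 75_000),
    CorpusSource("News articles and social media content", "CC-BY-SA-4.0", 70,
                 ("tw", "yo", "ha", "sw", "ig", "ee"), 200_000),
]


def select_sources(sources, languages, excluded_licenses=(), min_score=0):
    """Return sources that cover every requested language, are not under an
    excluded license, and meet the minimum score."""
    wanted = set(languages)
    excluded = set(excluded_licenses)
    return [
        s for s in sources
        if wanted.issubset(s.languages)
        and s.license not in excluded
        and s.score >= min_score
    ]


if __name__ == "__main__":
    # Example: Twi sources usable outside research-only settings.
    chosen = select_sources(SOURCES, ["tw"], excluded_licenses=["Research Use Only"])
    for s in chosen:
        print(f"{s.description} [{s.license}] ~{s.approx_entries:,} entries")
    print("Approximate total:", sum(s.approx_entries for s in chosen))
```

Excluding by license string is a deliberate simplification; a source labelled "Various (CC-BY, CC0)" would still need its individual items checked before redistribution.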