Training Datasets

Explore and validate the Mozilla Common Voice speech data and JW300 parallel corpora that power AfricanGPT.

Data Transparency & Privacy

We believe in full transparency about the data used to train AfricanGPT. Every dataset listed here is sourced ethically, with clear licensing and provenance.

Dataset Notice: Before using any dataset for training, research, or commercial work, check its source, license, consent status, and permitted use.
Data Rights: Only upload voice recordings, text, documents, or language data that you own or have permission to share. Do not upload private, sensitive, confidential, or third-party data without consent.
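
The four checks named in the Dataset Notice (source, license, consent status, permitted use) can be sketched as a simple pre-use metadata check. This is a minimal illustration, not AfricanGPT's actual validation code: the field names, the `missing_fields` helper, and the sample record are all hypothetical.

```python
# Hypothetical field names mirroring the four items in the Dataset Notice.
REQUIRED_FIELDS = ("source", "license", "consent_status", "permitted_use")


def missing_fields(record: dict) -> list:
    """Return the required metadata fields that are absent or blank."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]


# Illustrative record only; the values are placeholders, not real dataset metadata.
example_record = {
    "source": "https://example.org/corpus",
    "license": "CC0-1.0",
    "consent_status": "",
    "permitted_use": "research",
}

gaps = missing_fields(example_record)
if gaps:
    print("Do not use this dataset yet; missing:", ", ".join(gaps))
```

A check like this only confirms that the metadata is present. Whether a license or consent record actually permits a given use still needs human review.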