Training Datasets

Explore and help validate the Mozilla Common Voice speech data and the JW300 parallel corpus that power AfricanGPT

This section provides access to the training data used to improve AfricanGPT's language understanding. You can browse voice recordings and translation pairs, and help validate data quality.

Data Transparency & Privacy

We believe in full transparency about the data used to train AfricanGPT. Every dataset listed here is sourced ethically, with clear licensing and provenance.

Dataset Notice: Before using any dataset for training, research, or commercial work, check its source, license, consent status, and permitted use.

Data Rights: Only upload voice recordings, text, documents, or language data that you own or have permission to share. Do not upload private, sensitive, confidential, or third-party data without consent.