Speech datasets obtained from the Common Voice project.

Relevant datasets to use from the archives:

  • Common Voice

    • train.tsv - the training set

    • dev.tsv - the validation set

    • test.tsv - the test set

  • Coqui STT

    • train.csv - the training set

    • dev.csv - the validation set

    • test.csv - the test set
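As a rough illustration of how these split files can be inspected, the sketch below reads one of them into a list of rows using only the Python standard library. The helper name load_split is hypothetical, and the sketch assumes each file starts with a header row: tab-separated for the Common Voice .tsv files, comma-separated for the Coqui STT .csv files. Check the column names against your own download before relying on them.

```python
import csv
from pathlib import Path


def load_split(path):
    """Read one split file and return its rows as a list of dicts.

    Hypothetical helper: assumes the first line of the file is a header.
    """
    path = Path(path)
    if path.suffix == ".tsv":
        # Common Voice style: tab-separated, quote characters kept verbatim.
        delimiter, quoting = "\t", csv.QUOTE_NONE
    else:
        # Coqui STT style: comma-separated with standard CSV quoting.
        delimiter, quoting = ",", csv.QUOTE_MINIMAL
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=delimiter, quoting=quoting)
        return list(reader)
```

For example, load_split("train.tsv") would return one dict per clip, keyed by whatever column names appear in the header. Common Voice files are expected to include path and sentence columns, and Coqui STT files wav_filename, wav_filesize and transcript, but treat those names as assumptions until you have checked the header of the actual file.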

Conversion into other formats can be achieved with the wai.annotations library.

