October 4, 2025

Datasets and corpora in Kabyle

 We know that people are looking for datasets in the Kabyle language for their projects. This is why we are publishing some of them through our community account on Hugging Face.

Imsidag community on Hugging Face


 

    Some of these datasets are parallel corpora and some are monolingual (Kabyle only). We would like to emphasize that some datasets have not yet been cleaned and are published as is, on purpose, for people interested in building tools such as fixers, cleaners, and standardizers for the Kabyle language.

    These datasets may help you create or improve a Kabyle spellchecker dictionary, for example, or build word frequency analyzers, compute n-grams, compute syllable weight (CV), build word games, and so on.
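    As a rough illustration of the uses listed above, here is a minimal Python sketch that counts word frequencies, builds word n-grams, and maps a word to its consonant/vowel (CV) skeleton, which can serve as a first step toward syllable weight. It uses only the standard library. The function names, the vowel set (a, e, i, o, u), and the sample lines are our own assumptions for illustration; they do not come from the datasets, so adjust them to the orthography used in the files you download.

```python
import re
import unicodedata
from collections import Counter

# Assumption: vowel letters of the Latin-script orthography; adjust as needed.
VOWELS = set("aeiou")


def tokenize(text):
    """Normalize to NFC, lowercase, and split into runs of letters."""
    text = unicodedata.normalize("NFC", text).lower()
    # [^\W\d_]+ matches Unicode letters only (no digits, no underscore).
    return re.findall(r"[^\W\d_]+", text)


def word_frequencies(lines):
    """Count how often each token occurs across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts


def ngrams(tokens, n):
    """Return the list of consecutive n-token tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


def cv_skeleton(word):
    """Replace each vowel with 'V' and every other character with 'C'."""
    word = unicodedata.normalize("NFC", word).lower()
    return "".join("V" if ch in VOWELS else "C" for ch in word)


if __name__ == "__main__":
    # Placeholder lines standing in for rows of a monolingual dataset.
    sample = ["Azul fell-awen", "azul a tamurt"]
    print(word_frequencies(sample).most_common(3))
    print(ngrams(tokenize(sample[1]), 2))
    print(cv_skeleton("tamurt"))
```

    Since some datasets are published uncleaned, a frequency table like this one is also a quick way to spot spelling variants worth normalizing.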

    Find them all at: https://huggingface.co/Imsidag-community/datasets


