Kabyle in Floss: Datasets and corpora in Kabyle

October 4, 2025

Datasets and corpora in Kabyle

We know that people are looking for Kabyle-language datasets for their projects. This is why we are publishing some of them through our community account on Hugging Face.

Some of these datasets are parallel corpora and some are monolingual (Kabyle only). We would like to emphasize that some datasets have not been cleaned yet and are published as is on purpose, for people interested in building tools such as fixers, cleaners and standardizers for the Kabyle language.

These datasets may help you create or improve a Kabyle spellchecker dictionary, build word frequency analyzers, compute n-grams, compute syllable weight (CV), build word games, and so on.
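As a rough illustration of the word frequency and n-gram ideas above, here is a minimal Python sketch using only the standard library. The function names (`tokenize`, `word_frequencies`, `ngram_frequencies`) and the two sample lines are placeholders made up for this example; they do not come from the published datasets. The tokenizer is an assumption too: it lowercases, normalizes to NFC, and keeps hyphenated forms together as one token, which may or may not suit your use case.

```python
import re
import unicodedata
from collections import Counter

# Assumption: a token is a run of letters, optionally joined by hyphens,
# so a hyphenated form stays one token. Adjust to your own needs.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def tokenize(line):
    """Lowercase a line, normalize it to NFC, and split it into tokens."""
    # NFC folds "letter + combining mark" sequences into single code points
    # where a precomposed form exists, so they match as letters.
    line = unicodedata.normalize("NFC", line).lower()
    return TOKEN_RE.findall(line)


def word_frequencies(lines):
    """Count how often each token occurs across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequencies(lines, n=2):
    """Count n-grams line by line, so n-grams never span two lines."""
    counts = Counter()
    for line in lines:
        counts.update(ngrams(tokenize(line), n))
    return counts


# Placeholder lines for illustration only, not taken from the datasets.
sample_lines = ["Azul fell-awen", "Azul a tamurt"]

if __name__ == "__main__":
    print(word_frequencies(sample_lines).most_common(3))
    print(ngram_frequencies(sample_lines, n=2).most_common(3))
```

To run this on a real corpus, you would replace `sample_lines` with the lines of a downloaded text file, for example by iterating over an opened file. Since some datasets are published uncleaned, expect noisy tokens in the counts; that noise is a good starting point for a cleaner or standardizer.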

You can find them all here:

Hugging Face Imsidag Community: https://huggingface.co/Imsidag-community/datasets

Boffire datasets: https://huggingface.co/boffire/datasets
