
04 October 2025

Datasets and corpora in Kabyle

 We know that people are looking for Kabyle-language datasets for their projects, which is why we are publishing some of them through our community account on Hugging Face.

Imsidag community on Hugging Face


 

    Some of these datasets are parallel corpora and others are monolingual (Kabyle only). We would like to emphasize that some datasets have not been cleaned yet and are published as is on purpose, for people interested in building tools such as fixers, cleaners, and standardizers for the Kabyle language.

    These datasets may help you create or improve a Kabyle spellchecker dictionary, build word frequency analyzers, compute n-grams, compute syllable weight (CV), build word games, and so on.
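
    As a rough starting point for the uses listed above, here is a minimal Python sketch that counts word frequencies, builds word n-grams, and derives a simple consonant/vowel (CV) skeleton from lines of text. It is not tied to any particular dataset: the function names, the vowel set (a, e, i, u) and the tokenization rule (letters only, so a hyphenated form like "fell-awen" is split in two) are assumptions made for illustration, and you will probably want to adapt them to your own data and to how you read "syllable weight".

```python
import re
import unicodedata
from collections import Counter

# Assumption: vowels of the Latin-script Kabyle alphabet; adjust if needed.
VOWELS = set("aeiu")


def tokenize(text):
    """Normalize to NFC, lowercase, and return runs of letters as tokens."""
    text = unicodedata.normalize("NFC", text).lower()
    # [^\W\d_]+ matches one or more Unicode letters (no digits, no underscore).
    return re.findall(r"[^\W\d_]+", text)


def word_frequencies(lines):
    """Count how often each token appears across an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts


def word_ngrams(tokens, n):
    """Return the list of consecutive n-token tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


def cv_skeleton(word):
    """Map each letter to V (vowel) or C (anything else)."""
    return "".join("V" if ch in VOWELS else "C" for ch in word)


# Toy input; in practice, iterate over the lines of a downloaded dataset.
sample = ["Azul fell-awen", "Azul, tanemmirt"]
freq = word_frequencies(sample)
bigrams = word_ngrams(tokenize(sample[0]), 2)
```

    NFC normalization is applied first because letters with a dot below may be stored either as one character or as a base letter plus a combining mark, and the letter-only pattern would otherwise split the word at the combining mark.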

    You can find them all here:

Hugging Face Imsidag Community: https://huggingface.co/Imsidag-community/datasets

Boffire datasets: https://huggingface.co/boffire/datasets
