We know that people are looking for Kabyle-language datasets for their projects, which is why we are publishing some of them through our community account on HuggingFace.
Some of these datasets are parallel corpora and some are monolingual (Kabyle only). We would like to emphasize that some datasets have not yet been cleaned and are published as is on purpose, for people interested in building tools such as fixers, cleaners, and standardizers for the Kabyle language.
These datasets may help you create or improve a Kabyle spellchecker dictionary, build word frequency analyzers, compute n-grams, calculate syllable weight (CV), build word games, and so on.
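As a rough illustration of two of these uses, here is a minimal sketch of word frequency and bigram counting using only the Python standard library. The function names (`tokenize`, `word_frequencies`, `ngrams`) and the two sample lines are placeholders of our own, not part of any published dataset; in practice you would feed in the lines of a downloaded corpus file. The tokenizer is deliberately naive, so uncleaned data may need extra normalization first.

```python
import re
from collections import Counter

# Hypothetical helper: lowercase the text and keep runs of Unicode letters,
# allowing a hyphen or apostrophe inside a word (e.g. "fell-awen").
TOKEN_RE = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")

def tokenize(text):
    return TOKEN_RE.findall(text.lower())

def word_frequencies(lines):
    """Count how often each token appears across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Placeholder sample; replace with lines read from a real corpus file.
sample = ["Azul fell-awen", "azul a yimdukal"]

freqs = word_frequencies(sample)
bigrams = Counter(bg for line in sample for bg in ngrams(tokenize(line), 2))

print(freqs.most_common(3))
print(bigrams.most_common(3))
```

Counting n-grams per line, as done here, avoids creating bigrams that span two unrelated sentences.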
You can find them all at: https://huggingface.co/Imsidag-community/datasets