Open Source Datasets
Free and freely reusable datasets. Explore, compare, access the sources.
Task803 Pawsx German French Translation
ML / AI - Dataset Card for Natural Instructions (https://github.com/allenai/natural-instructions). Task: task803_pawsx_german_french_translation. Additional Information. Citation Information: The following paper in
French News Classification
ML / AI - Dataset Card for french-news-classification. This dataset has been created with distilabel. Dataset Summary: This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that genera
ICD 10 CM HardNegatives
ML / AI - The ICD 10 CM HardNegatives dataset, available on HuggingFace for French language processing.
Dinlang Gespin 2023
ML / AI - Dataset origin: https://www.ortolang.fr/market/corpora/dinlang-gespin-2023 Description: You will find here the video excerpts related to the article 'Coordinating eating and languaging: the cho
Phonologie Du Francais Contemporain
ML / AI - Dataset origin: https://www.ortolang.fr/market/corpora/pfc Description: PFC: Database of contemporary spoken French in the French-speaking world.
Lucie Training Dataset
ML / AI - Lucie Training Dataset Card. The Lucie Training Dataset is a curated collection of text data in English, French, German, Spanish and Italian culled from a variety of sources including: web data, video
Rapports Francais Sur Lintegration
ML / AI - Dataset origin: https://www.ortolang.fr/market/corpora/rapports-francais-sur-lintegration Description: This corpus, created ad hoc for the article « Dire l’intégration. Les rapports français et allem
Recherches Francais Parle
ML / AI - Dataset origin: https://www.ortolang.fr/market/corpora/recherches-francais-parle Description: For 27 years, the journal Recherches sur le français parlé was published by the Publications de l'Université de
Corpus Oral De Francais De Suisse Romande
ML / AI - Dataset origin: https://cocoon.huma-num.fr/exist/crdo/meta/cocoon-114a10e8-b61c-42fb-8a10-e8b61c72fbb1 Description: The OFROM corpus (Corpus Oral de Français de Suisse ROMande) is a corpus, a
English French Translation
ML / AI - Dataset origin: https://www.kaggle.com/datasets/adewoleakorede/english-french-translation Dataset: I used this dataset for my project on translating from English to French using the transformer
Tibetan To French Translation Dataset
Government - This dataset consists of three columns, the first of which is a sentence or phrase in Tibetan, the second is the phonetic transliteration of the Tibetan, and the third is the French translation of the
Wikipedia
ML / AI - Plain text of Wikipedia. Dataset Description: This dataset is a plain text version of