CATIE

non-profit

Verified

https://www.catie.fr/

CATIE_AQ

catie-aq

Activity Feed Request to join this org

AI & ML interests

Create NLP models and datasets applied to French, to very long sequences and the combination of the two ;)

Recent Activity

bourdoiscatie updated a Space 4 days ago

CATIE-AQ/FAT5-rapport

bourdoiscatie updated a Space 4 days ago

CATIE-AQ/FAT5-report

bourdoiscatie updated a model 8 days ago

CATIE-AQ/french_paraphrase_flan-t5-large

View all activity

CATIE-AQ 's collections 21

NanoBEIR-fr 🍺

French translation of zeta-alpha-ai's NanoBEIR collection

CATIE-AQ/NanoArguAna-fr

Viewer • Updated Jun 10 • 3.74k • 36
CATIE-AQ/NanoClimateFEVER-fr

Viewer • Updated Jun 10 • 3.61k • 49
CATIE-AQ/NanoDBPedia-fr

Viewer • Updated Jun 10 • 7.25k • 73
CATIE-AQ/NanoFEVER-fr

Viewer • Updated Jun 10 • 5.1k • 36

CATIE French sparse embedding

A few experiments after the release of sentence transformers v5.0. Could be seen as a V0 before the publication of more powerful french sparse models

CATIE-AQ/CSR_Sparse_Encoder_camembert-large_STS

Feature Extraction • 0.3B • Updated Jul 2 • 9 • 2
CATIE-AQ/SPLADE_camembert-base_STS

Feature Extraction • 0.1B • Updated Jul 2 • 46 • 2
CATIE-AQ/SPLADE_moderncamembert-cv2_STS

Feature Extraction • 0.1B • Updated Jul 2 • 11 • 2
CATIE-AQ/SPLADE_camemberta2.0_STS

Feature Extraction • 0.1B • Updated Jul 2 • 8 • 2

CATIE English FAT5-Flan

Adapted weights for Google Flan-T5 to use with FAT5

CATIE-AQ/FAT5-small-flan-en

Feature Extraction • 77M • Updated Oct 10, 2024 • 13
CATIE-AQ/FAT5-base-flan-en

Feature Extraction • 0.2B • Updated Oct 10, 2024 • 330
CATIE-AQ/FAT5-large-flan-en

Feature Extraction • 0.8B • Updated Oct 10, 2024 • 11
CATIE-AQ/FAT5-xl-flan-en

Feature Extraction • Updated 8 days ago • 22

CATIE French prompts datasets

A collection of French prompts datasets created by CATIE.

CATIE-AQ/smoltalk2_LongAlign_64k_context_french_no_think

Viewer • Updated Jul 29 • 95 • 12 • 1
CATIE-AQ/CFP

Viewer • Updated Nov 3 • 56.3k • 30
CATIE-AQ/DFP

Viewer • Updated Nov 3 • 108M • 2.92k • 8
CATIE-AQ/stsb_multi_mt_fr_prompt_sentence_similarity

Viewer • Updated Jul 15 • 155k • 82

CATIE French prompts models

A collection of French prompts models created by CATIE.

CATIE-AQ/mistral7B-FR-InstructNLP-LoRA

Text Generation • Updated Sep 22 • 20 • 3

French caption datasets

Datasets with an image, a prompt question (like "describe this image") and an answer Can be used to train VLMs.

CATIE-AQ/caption-floschne-xm3600-clean

Viewer • Updated Jul 15 • 8.56k • 29
CATIE-AQ/caption-vidore-tabfquad_test_subsampled-clean

Viewer • Updated Jul 15 • 280 • 44
CATIE-AQ/caption-vidore-vdsid_french-clean

Viewer • Updated Jul 15 • 5k • 69
CATIE-AQ/caption-manu-tabfquad_retrieving-clean

Viewer • Updated Jul 15 • 1.83k • 66

French table-to-text datasets

In 2021 before the release of LoRA, we were interested in Prefix-tuning, which we wanted to apply to French. So we had to translate table-to-text data

CATIE-AQ/web_nlg_french

Viewer • Updated Jul 11 • 35.4k • 66
CATIE-AQ/e2e_nlg_french

Viewer • Updated Jul 11 • 33.5k • 31
CATIE-AQ/viggo_french

Viewer • Updated Jul 11 • 5.1k • 59

CATIE French QA pack

CamemBERT models finetuned on the QA task (SQuAD 2.0 format) + the dataset used (~220,000 rows) + a Space demo

Running

QAmembert

❓

Find answers in French texts using QAmemBERT models
CATIE-AQ/QAmemberta

Question Answering • 0.1B • Updated Nov 26, 2024 • 24 • 1
CATIE-AQ/QAmembert2

Question Answering • 0.1B • Updated Nov 26, 2024 • 12
CATIE-AQ/QAmembert

Question Answering • 0.1B • Updated Nov 26, 2024 • 171 • 14

CATIE French NLI pack

CATIE-AQ/frenchNLI

Viewer • Updated Jul 29 • 570k • 70 • 1

CATIE French Paraphrase pack

CATIE-AQ/frenchPARAPHRASE

Viewer • Updated 8 days ago • 255k • 18
CATIE-AQ/french_paraphrase_flan-t5-large

Text Generation • 0.8B • Updated 8 days ago • 25 • 1
CATIE-AQ/french_paraphrase_flan-t5-base

Text Generation • 0.2B • Updated 8 days ago • 17

XMRec French part (reviews and metadata datasets)

Reviews and metadata datasets from https://xmrec.github.io/data/fr/ by Bonab et al. (2021)

CATIE-AQ/XMRec_reviews_fr

Viewer • Updated Jun 23 • 48.7k • 26
CATIE-AQ/XMRec_reviews_fr_Arts_Crafts_and_Sewing

Viewer • Updated Jun 23 • 965 • 22
CATIE-AQ/XMRec_reviews_fr_Automotive

Viewer • Updated Jun 23 • 458 • 21
CATIE-AQ/XMRec_reviews_fr_Books

Viewer • Updated Jun 23 • 25.8k • 39

CATIE French dense embedding

CATIE-AQ/distilcamembert-base-embedding

Sentence Similarity • 68.1M • Updated Nov 3 • 16
CATIE-AQ/camembert-base-embedding

Sentence Similarity • 0.1B • Updated Nov 3 • 21

CATIE French FAT5 UL2

Flash Attention T5 models in French developped by CATIE.

CATIE-AQ/FAT5-small

0.1B • Updated Mar 17 • 14 • 2
Running

Le FAT5 : Flash Attention T5

⚡

French version of the blog post introducing FAT5 model
Running

9

FAT5 (Flash Attention T5) report

⚡

9

English version of the blog post introducing FAT5 model

CATIE French DPO and conversation datasets

By conversation we mean multi-tour exchanges. For classical prompts (i.e. single-turn) see the CATIE French prompts datasets collection.

CATIE-AQ/facebook-community-alignment-dataset_french_dpo

Viewer • Updated Jul 29 • 71.5k • 47 • 1
CATIE-AQ/aya_french_dpo

Viewer • Updated Jul 29 • 418 • 37 • 2
CATIE-AQ/facebook_menlo_french_dpo

Viewer • Updated Oct 14 • 138 • 40 • 1
CATIE-AQ/everyday-conversations-llama3.1-2k-in-french

Viewer • Updated Jul 31 • 2.38k • 52 • 1

CATIE French think and toolcalling datasets

CATIE-AQ/smoltalk2_smolagents_toolcalling_french

Viewer • Updated Jul 29 • 9.08k • 29 • 1
CATIE-AQ/smoltalk2_aya_think_dataset_french_split

Viewer • Updated Jul 29 • 2.79k • 37 • 2

French VQA datasets

Clean VQA datasets with an image, a question and an answer. Can be used to train VLMs.

CATIE-AQ/VQA-floschne-maxm-clean

Viewer • Updated Jul 15 • 619 • 78
CATIE-AQ/VQA-cmarkea-doc-vqa-clean

Viewer • Updated Jul 15 • 60.9k • 732
CATIE-AQ/VQA-cmarkea-table-vqa-clean

Viewer • Updated Jul 15 • 84.1k • 220
CATIE-AQ/VQA-ByteDance-MTVQA-clean

Viewer • Updated Jul 15 • 3.63k • 140 • 1

French visual retriever datasets

Datasets with an image and a question. Can be used to train visual retrievers (ColPali and co.).

CATIE-AQ/retriever-manu-tabfquad_retrieving-clean

Viewer • Updated Jul 15 • 1.83k • 63
CATIE-AQ/retriever-vidore-tabfquad_test_subsampled-clean

Viewer • Updated Jul 15 • 280 • 38
CATIE-AQ/retriever-vidore-vdsid_french-clean

Viewer • Updated Jul 15 • 5k • 85
CATIE-AQ/retriever-princeton-nlp-CharXiv-clean

Viewer • Updated Jul 15 • 1.32k • 32

CATIE French NER pack

CamemBERT models finetuned on the NER task (3 or 4 entities) + the datasets used (420,000 or 385,000 rows respectively) + a Space demo

Running

NERmembert

🔍

Find named entities in French texts using NERmemBERT models
CATIE-AQ/NERmembert-base-3entities

Token Classification • 0.1B • Updated Nov 26, 2024 • 76 • 2
CATIE-AQ/NERmembert2-3entities

Token Classification • 0.1B • Updated Dec 3, 2024 • 46
CATIE-AQ/NERmemberta-3entities

Token Classification • 0.1B • Updated Dec 5, 2024 • 35 • 1

CATIE French STS pack

CATIE-AQ/frenchSTS

Viewer • Updated Jul 15 • 45.7k • 87 • 1

CATIE French Summarization pack

CATIE-AQ/LMF2-1.2B_french_summary

Summarization • 1B • Updated Oct 28 • 29 • 1
CATIE-AQ/LMF2-700M_french_summary

Summarization • 0.7B • Updated Oct 29 • 10
CATIE-AQ/LMF2_350M_french_summary

Summarization • 0.4B • Updated Oct 28 • 9
mradermacher/LMF2-1.2B_french_summary-GGUF

1B • Updated Oct 29 • 237

CATIE French long sequences datasets

CATIE-AQ/french_books_summaries

Viewer • Updated Jul 15 • 949 • 26 • 1
CATIE-AQ/french_books

Viewer • Updated Jul 15 • 2.08k • 34 • 2
CATIE-AQ/french_narrativeqa

Viewer • Updated Jul 15 • 4.21k • 60 • 1