French translation of zeta-alpha-ai's NanoBEIR collection
CATIE
non-profit
Verified
AI & ML interests
Create NLP models and datasets for French, for very long sequences, and for the combination of the two ;)
A few experiments following the release of Sentence Transformers v5.0. They can be seen as a V0 ahead of the publication of more powerful French sparse models.
- CATIE-AQ/CSR_Sparse_Encoder_camembert-large_STS
  Feature Extraction • 0.3B • 9 • 2
- CATIE-AQ/SPLADE_camembert-base_STS
  Feature Extraction • 0.1B • 46 • 2
- CATIE-AQ/SPLADE_moderncamembert-cv2_STS
  Feature Extraction • 0.1B • 11 • 2
- CATIE-AQ/SPLADE_camemberta2.0_STS
  Feature Extraction • 0.1B • 8 • 2
Google Flan-T5 weights adapted for use with FAT5.
A collection of French prompt datasets created by CATIE.
A collection of French prompt models created by CATIE.
Datasets with an image, a prompt-style question (such as "describe this image") and an answer.
Can be used to train VLMs.
In 2021, before the release of LoRA, we were interested in prefix-tuning and wanted to apply it to French, so we had to translate table-to-text data.
CamemBERT models fine-tuned on the QA task (SQuAD 2.0 format), the dataset used (~220,000 rows), and a Space demo.
Reviews and metadata datasets from https://xmrec.github.io/data/fr/ by Bonab et al. (2021)
Flash Attention T5 models in French developed by CATIE.
By conversation we mean multi-turn exchanges. For classical (i.e. single-turn) prompts, see the CATIE French prompts datasets collection.
- CATIE-AQ/facebook-community-alignment-dataset_french_dpo
  Viewer • 71.5k • 47 • 1
- CATIE-AQ/aya_french_dpo
  Viewer • 418 • 37 • 2
- CATIE-AQ/facebook_menlo_french_dpo
  Viewer • 138 • 40 • 1
- CATIE-AQ/everyday-conversations-llama3.1-2k-in-french
  Viewer • 2.38k • 52 • 1
Clean VQA datasets with an image, a question and an answer.
Can be used to train VLMs.
Datasets with an image and a question.
Can be used to train visual retrievers (ColPali and co.).
- CATIE-AQ/retriever-manu-tabfquad_retrieving-clean
  Viewer • 1.83k • 63
- CATIE-AQ/retriever-vidore-tabfquad_test_subsampled-clean
  Viewer • 280 • 38
- CATIE-AQ/retriever-vidore-vdsid_french-clean
  Viewer • 5k • 85
- CATIE-AQ/retriever-princeton-nlp-CharXiv-clean
  Viewer • 1.32k • 32
CamemBERT models fine-tuned on the NER task (3 or 4 entities), the datasets used (420,000 or 385,000 rows respectively), and a Space demo.
- NERmembert
  🔍 Find named entities in French texts using NERmemBERT models
- CATIE-AQ/NERmembert-base-3entities
  Token Classification • 0.1B • 76 • 2
- CATIE-AQ/NERmembert2-3entities
  Token Classification • 0.1B • 46
- CATIE-AQ/NERmemberta-3entities
  Token Classification • 0.1B • 35 • 1