TiChi-OpMiner: LLaMA LoRA for Tibetan–Chinese Code-Mixed Opinion Mining

TiChi-OpMiner is a LoRA-fine-tuned LLaMA model for joint sentiment and stance prediction on Tibetan–Chinese code-mixed texts.
Given a short code-mixed sentence, the model outputs:

  • Sentiment: positive, neutral, or negative
  • Stance: support, neutral, or oppose

The model was fine-tuned using LLaMA Factory with parameter-efficient training (LoRA).


1. Model Details

  • Model name: dylanyang963/TibetanChinese-CodeMixed-OpMiner
  • Base model: meta-llama/Llama-2-7b-chat-hf (decoder-only LLM)
  • Fine-tuning method: LoRA (Low-Rank Adaptation)
  • Frameworks: PyTorch, Transformers, LLaMA Factory
  • Task type: joint opinion mining (sentiment + stance)
  • Input languages: Tibetan (bo, Tibetan script) and Chinese (zh, simplified Chinese), with occasional English tokens / hashtags
  • Output: natural language or JSON-style labels describing sentiment and stance

2. Intended Use

Intended use

  • Research on code-mixed NLP for low-resource languages
  • Experiments on Tibetan–Chinese opinion mining (social media posts, comments, short messages)
  • As a starting point for further fine-tuning on related tasks (e.g., Tibetan-only or Chinese-only sentiment / stance classification)

Not intended / out of scope

  • High-stakes decision making (e.g., legal, medical, financial, political decisions)
  • Use on very long or domain-mismatched documents (e.g., technical reports, legal contracts)
  • Any deployment scenario where incorrect predictions could cause serious harm without additional human review

3. Training Data

The model is trained on a 100K-instance Tibetan–Chinese code-mixed corpus:

  • Each instance is a short sentence that mixes Tibetan and Chinese.
  • Every sentence is annotated with:
    • Sentiment: positive, neutral, negative
    • Stance: support, neutral, oppose
  • The corpus was created with a template-based generation pipeline plus manual checking to ensure:
    • Natural code-mixing patterns
    • Fluent Chinese glosses
    • Consistent joint labels
  • The label space is designed so that:
    • positive ↔ support, negative ↔ oppose, neutral ↔ neutral.
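
The sentiment–stance pairing above can be enforced with a small validation helper when checking annotations. This is an illustrative sketch (the function and the toy instances are not part of the released code):

```python
# Expected sentiment -> stance pairing in the corpus design.
SENTIMENT_TO_STANCE = {
    "positive": "support",
    "neutral": "neutral",
    "negative": "oppose",
}

def labels_consistent(sentiment: str, stance: str) -> bool:
    """Return True if a (sentiment, stance) pair follows the corpus mapping."""
    return SENTIMENT_TO_STANCE.get(sentiment) == stance

# Example: filter annotated instances that violate the design.
instances = [
    {"text": "...", "sentiment": "positive", "stance": "support"},
    {"text": "...", "sentiment": "negative", "stance": "neutral"},
]
violations = [x for x in instances if not labels_consistent(x["sentiment"], x["stance"])]
print(len(violations))  # 1 inconsistent pair in this toy list
```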

Dataset: a companion dataset may be published on Hugging Face separately; if so, a link will be added here (e.g. your-username/tichi-opminer-dataset).


4. Training Procedure

  • Fine-tuning framework: LLaMA Factory
  • Method: LoRA adapters on top of the base LLaMA model
  • Objective: instruction-style text generation; the model receives a prompt containing the code-mixed text and returns both sentiment and stance.
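
A LoRA SFT run in LLaMA Factory is typically driven by a YAML config. The values below are illustrative assumptions (the dataset name, hyperparameters, and paths used for this model are not documented), following the style of the example configs shipped with LLaMA Factory:

```yaml
### model
model_name_or_path: meta-llama/Llama-2-7b-chat-hf

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: tichi_opminer        # assumed entry registered in dataset_info.json
template: llama2
cutoff_len: 512

### train
output_dir: saves/tichi-opminer-lora
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
bf16: true
```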

Example (conceptual) training prompt:

You are an assistant that predicts sentiment and stance for Tibetan–Chinese code-mixed text.

Text: ང་ཚོ今天ཡིན་ནསདགའ་བསུ་བྱེད་ཡོད།
Please answer in JSON format:
{"sentiment": ..., "stance": ...}
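
At inference time the same prompt shape is used, and the JSON answer is parsed back into labels. The sketch below builds the prompt and parses a response string; actual generation requires loading the base model plus the LoRA adapter (e.g. via transformers + peft), so a canned response stands in here:

```python
import json

PROMPT_TEMPLATE = (
    "You are an assistant that predicts sentiment and stance "
    "for Tibetan–Chinese code-mixed text.\n\n"
    "Text: {text}\n"
    "Please answer in JSON format:\n"
    '{{"sentiment": ..., "stance": ...}}'
)

def build_prompt(text: str) -> str:
    """Fill the code-mixed input text into the instruction template."""
    return PROMPT_TEMPLATE.format(text=text)

def parse_response(response: str) -> dict:
    """Extract the first JSON object from the model's reply."""
    start = response.find("{")
    end = response.rfind("}") + 1
    if start == -1 or end == 0:
        raise ValueError("no JSON object found in response")
    labels = json.loads(response[start:end])
    return {"sentiment": labels["sentiment"], "stance": labels["stance"]}

# In a real run, `response` would come from model.generate(...).
response = 'Sure! {"sentiment": "positive", "stance": "support"}'
print(parse_response(response))  # {'sentiment': 'positive', 'stance': 'support'}
```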
