--- license: apache-2.0 library_name: transformers pipeline_tag: text-generation tags: - qwen - scientific-reasoning --- # SciReasoner 8B: Laying the Scientific Reasoning Ground Across Disciplines [![arXiv](https://img.shields.io/badge/arXiv-2509.21320-b31b1b.svg)](https://arxiv.org/abs/2509.21320) [![Hugging Face](https://img.shields.io/badge/HuggingFace-SciReason-FFAE1A)](https://huggingface.co/SciReason) [![License](https://img.shields.io/badge/License-Apache_2.0-2D7DB1.svg)](https://www.apache.org/licenses/LICENSE-2.0) This repository contains the weight of **SciReasoner-8B**, a scientific reasoning foundation model. It was presented in the paper [SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines](https://huggingface.co/papers/2509.21320). Code: https://github.com/open-sciencelab/SciReason --- ## Usage: ## πŸ”§ Environment Setup ```bash git clone https://github.com/open-sciencelab/SciReason.git cd SciReason conda create --name scireason python=3.10 -y conda activate scireason pip install -r requirements/training.txt pip install -e . ``` > **Note**: > The above instructions are for reference only. > You may need to adjust them depending on your operating system and environment. --- ## πŸš€ Running Evaluation The evaluation script will automatically download the required datasets and models from [Hugging Face](https://huggingface.co/SciReason). Please ensure your environment has internet access. ### Evaluate all datasets ```bash opencompass examples_scireasoner/eval_all.py --max-num-worker 1 ``` * **Default model:** [SciReasoner-8B](https://huggingface.co/SciReason/SciReasoner-8B) * You can replace it with your own model if needed. * The `--max-num-worker` option controls concurrency: * By default, each process uses one GPU. * Adjust it according to your available GPUs. --- ### Evaluate few-shot performance (e.g., for closed-source models like `o3`) ```bash opencompass examples_scireasoner/eval_all_fewshot.py --max-num-worker 1 ``` This script evaluates the few-shot capabilities of your model on all datasets. --- ### Evaluate specific datasets or custom models * **To evaluate specific datasets:** Modify the configuration file to set `datasets` as a list of the datasets you want to test. * **To use custom models:** Modify the configuration file to set `models` to your target model. * Reference format: `opencompass.configs.models.scireason.hf_scireasoner_8b` * For more model configuration options, please check the [OpenCompass documentation](https://opencompass.readthedocs.io/en/latest/). Got it! I’ll add a **FAQ section** with the issue and solution clearly explained. Here’s how it fits into your README: ## ❓ FAQ ### 1. `meteor_score` Error If you encounter an error related to `meteor_score`, you may need to download NLTK resources. **Solution:** In an environment with internet access, run: ```python import nltk nltk.download('wordnet') ``` By default, the files are downloaded to `/root/nltk_data`. If you are using a **conda environment** and running on a compute node or container, download them into your conda environment instead: ```python import nltk import os conda_path = os.path.join(os.environ["CONDA_PREFIX"], "nltk_data") nltk.download('wordnet', download_dir=conda_path) ``` You can check all search paths using: ```python import nltk print(nltk.data.path) ``` ### 2. Running on compute nodes without internet access If your compute node cannot access the internet due to security policies, you need to **pre-download/cache the datasets and models** on a node with internet access first. **Recommended steps:** 1. Set the environment variable `HF_HOME` to a **shared/public directory** for Hugging Face cache. 2. On a node with internet access, run a dummy model once to pre-cache everything: ```bash opencompass examples_scireasoner/eval_all_debug.py --max-num-worker 16 ``` 3. Now, you can run the actual evaluation code on the compute node without needing internet access. ### 3. Resuming from checkpoints & step-wise evaluation Because the datasets are large and evaluation can be time-consuming, **OpenCompass supports resuming from checkpoints** and running evaluations in separate stages. * To resume from a checkpoint, use the `-r` flag with the timestamp of the previous run: ```bash opencompass examples_scireasoner/eval_all.py -r ``` * To run specific stages only, use the `--mode` flag with one of the following options: * `all` – Run the full pipeline (default) * `infer` – Run inference only * `eval` – Run evaluation only * `viz` – Run visualization only For more details, please refer to the [OpenCompass Quick Start Guide](https://opencompass.readthedocs.io/en/latest/get_started/quick_start.html). ### 4. Dataset size cache issue If you only want to test a **subset** of a dataset by modifying the code to trim it, be aware that **OpenCompass caches the dataset size**. Before running the evaluation, it is recommended to either: - Delete the entire cache file: ``` rm .cache/dataset_size.json ``` - Or remove the corresponding line for the modified dataset from the cache file. This ensures that OpenCompass recalculates the dataset size correctly. --- ## πŸ—οΈ Codebase and References This repository is built on top of [OpenCompass v0.4.2](https://github.com/open-compass/opencompass/tree/0.4.2) with custom modifications. We plan to merge the changes back into the main OpenCompass branch in the future. For more usage details, please refer to the [OpenCompass documentation](https://opencompass.readthedocs.io/en/latest/). --- ## πŸ“œ License This project is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). You are free to use, modify, and distribute this project under the terms of the Apache 2.0 license. See the [LICENSE](LICENSE) file for full details.