All HF Hub posts

IlyasMoutawwakil 
posted an update 2 days ago
After 2 months of refinement, I'm happy to announce that a lot of Transformers' modeling code is now significantly more torch-compile & export-friendly 🔥

Why it had to be done 👇
PyTorch's Dynamo compiler is increasingly becoming the default interoperability layer for ML systems. Anything that relies on torch.export or torch.compile, from model optimization to cross-framework integrations, benefits directly when models can be captured as a single dynamo-traced graph!

Transformers models are now easier to:
⚙️ Compile end-to-end with torch.compile backends
📦 Export reliably via torch.export and torch.onnx.export
🚀 Deploy to ONNX / ONNX Runtime, Intel Corporation's OpenVINO, NVIDIA AutoDeploy (TRT-LLM), AMD's Quark, Meta's Executorch and more hardware-specific runtimes.

This work aims at unblocking entire TorchDynamo-based toolchains that rely on exporting Transformers across runtimes and accelerators.

We are doubling down on Transformers' commitment to be a first-class citizen of the PyTorch ecosystem: more exportable, more optimizable, and easier to deploy everywhere.

There are definitely some edge cases we haven't addressed yet, so don't hesitate to try compiling / exporting your favorite transformers and to open issues / PRs.

PR in the comments! More updates coming soon!
danielhanchen 
posted an update 1 day ago
You can now fine-tune embedding models in our free Unsloth notebook! 🤗

Fine-tuning embedding models improves retrieval & RAG by aligning vectors to your domain-specific notion of similarity, which also helps search, clustering, and recommendations on your data.
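
The alignment objective typically used here is an in-batch-negatives contrastive loss. This pure-Python sketch (illustrative, not the Unsloth API) shows the idea: each query is scored against every positive in the batch, and the loss rewards ranking its own positive first:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def in_batch_negatives_loss(query_embs, positive_embs, scale=20.0):
    """Cross-entropy over the query/positive similarity matrix; the i-th
    positive is the target class for the i-th query. Fine-tuning pulls
    matched pairs together and pushes mismatched pairs apart."""
    loss = 0.0
    for i, q in enumerate(query_embs):
        logits = [scale * cosine(q, p) for p in positive_embs]
        m = max(logits)  # stabilized log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]  # -log softmax prob of the true pair
    return loss / len(query_embs)

# Matched pairs point the same way -> low loss; shuffled pairs -> higher loss.
queries   = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
good = in_batch_negatives_loss(queries, positives)
bad  = in_batch_negatives_loss(queries, list(reversed(positives)))
assert good < bad
```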

⭐ Blog + Notebooks: https://unsloth.ai/docs/new/embedding-finetuning

Unsloth trains embedding models 1.8-3.3x faster with 20% less VRAM, 2x longer context & no accuracy loss vs. FA2 setups.

We'd like to thank Hugging Face and Unsloth contributor electroglyph for making this possible!
codelion 
posted an update about 19 hours ago
Reverse Engineering a $500M Mystery: From HashHop to Memory-Augmented Language Models

I wrote a deep dive into how Magic AI's 100M token context window might work, starting from their HashHop benchmark and building up to MALM - a Memory-Augmented Language Model.

Key insight: treating each key as a single token enables perfect retrieval at unlimited context lengths.

The article covers:

- How HashHop works and why its perfect accuracy is suspicious
- Building a tokenized solver that achieves 100% accuracy
- Scaling to MALM for real code search tasks
- Why this approach could handle 100M+ tokens
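
The key insight can be made concrete with a toy version of the benchmark (my reading of HashHop, not the article's exact code): the prompt is a shuffled set of hash1 → hash2 pairs and the task is to chain k hops from a start hash. If every hash is a single token, retrieval reduces to exact-match lookup, which is why perfect accuracy at any context length is achievable:

```python
import random

def make_hashhop(num_pairs=1000, hops=3, seed=0):
    """Build one long chain h0 -> h1 -> ... and shuffle its pairs,
    so pair order in the 'context' carries no signal."""
    rng = random.Random(seed)
    hashes = [f"{rng.getrandbits(64):016x}" for _ in range(num_pairs + 1)]
    pairs = list(zip(hashes, hashes[1:]))
    rng.shuffle(pairs)
    return pairs, hashes[0], hashes[:hops + 1]

def solve(pairs, start, hops):
    # "Each key is a single token" == the problem collapses to a dict lookup.
    table = dict(pairs)
    chain = [start]
    for _ in range(hops):
        chain.append(table[chain[-1]])
    return chain

pairs, start, expected = make_hashhop()
assert solve(pairs, start, hops=3) == expected  # 100% accuracy, trivially
```

This is exactly why perfect HashHop accuracy is a suspicious signal: the benchmark is solvable by exact lookup once keys are atomic.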

Read the full article: https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop

Try the model: codelion/malm-165m

Code: https://github.com/codelion/hash-hop
Juanxi 
posted an update 1 day ago
Recent Updates on ScalingOpt | Your Stars are Appreciated

We are pleased to announce several key updates to the ScalingOpt project:

Pyramid Visualization Structure
Following a suggestion from Yufei, we have introduced a pyramid-based visualization framework to systematically outline the layered architecture of Foundation Models—from foundational principles to infrastructure-level details. This addition is designed to assist teams in organizing and presenting related materials more clearly.

Integration of Optimizer Summaries by Yifeng
We extend a warm welcome to Yifeng (author of MARS), who has joined the project. He has contributed a comprehensive summary of over 100 optimizers, now available in ScalingOpt. This resource can be accessed via the “Optimization Summary Sheet” on the homepage or under the Optimizers page, featuring a reader-friendly interface that supports easy viewing, downloading, and citation.

Growing Community of Members
We continue to update and expand the list of active members. Researchers interested in Optimization & Efficient AI are encouraged to join and participate in discussions. Feedback and suggestions are also highly welcomed and will be reviewed and incorporated on an ongoing basis.

Tutorials in Progress
The tutorial development is actively underway. Currently, we have prepared over 300 slides and are refining and expanding the content in collaboration with contributors.

This community is driven purely by passion and a commitment to open knowledge sharing. Your support through starring the repository is greatly appreciated!
Ujjwal-Tyagi 
posted an update 1 day ago
There is a new open-source music generation model called HeartMuLa. It offers strong, competitive performance compared to Suno and supports English, Chinese, Japanese, Korean, and Spanish. It is optimized to run easily on RTX GPUs and other consumer-grade hardware. HeartMuLa/HeartMuLa-oss-3B
https://github.com/HeartMuLa/heartlib
prithivMLmods 
posted an update about 21 hours ago
Introducing QIE-2511-Zoom-Master for highlight-guided area zoom-in, enabling lossless zooming within a drawn square area, and QIE-2511-Object-Remover-v2 for precise object or highlight-guided area cleanup. These experimental adapters are trained on top of QIE-2511. Find the adapters below.

🕹️QIE-2511-Zoom-Master : prithivMLmods/QIE-2511-Zoom-Master
🕹️QIE-2511-Object-Remover-v2: prithivMLmods/QIE-2511-Object-Remover-v2

🤗Demo: prithivMLmods/Qwen-Image-Edit-Object-Manipulator

📂Collection: https://huggingface.co/collections/prithivMLmods/qwen-image-edit-exps

To learn more, visit the app page or the respective model pages.
consome2 
posted an update about 2 hours ago
We’ve released two conversational speech datasets from oto on Hugging Face 🤗
Both are based on real, casual, full-duplex conversations, but with slightly different focuses.

Dataset 1: Processed / curated subset
otoearth/otoSpeech-full-duplex-processed-141h
* Full-duplex, spontaneous multi-speaker conversations
* Participants filtered for high audio quality
* PII removal and audio enhancement applied
* Designed for training and benchmarking S2S or dialogue models

Dataset 2: Larger raw(er) release
otoearth/otoSpeech-full-duplex-280h
* Same collection pipeline, with broader coverage
* More diversity in speakers, accents, and conversation styles
* Useful for analysis, filtering, or custom preprocessing experiments

We intentionally split the release to support different research workflows:
clean and ready-to-use vs. more exploratory and research-oriented use.
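
For the exploratory workflow, one property that makes full-duplex data distinctive is overlapping speech. A hypothetical filtering sketch (not the oto pipeline) computes how much of one speaker's talk time overlaps the other's:

```python
def overlap_ratio(segments_a, segments_b):
    """segments_*: lists of (start, end) speech intervals in seconds.
    Returns overlapped time / total speech time of speaker A."""
    overlap = 0.0
    for a0, a1 in segments_a:
        for b0, b1 in segments_b:
            # Length of the intersection of the two intervals (0 if disjoint).
            overlap += max(0.0, min(a1, b1) - max(a0, b0))
    total_a = sum(a1 - a0 for a0, a1 in segments_a)
    return overlap / total_a if total_a else 0.0

# Speaker B backchannels while A talks from t=0..4 and t=5..6:
a = [(0.0, 4.0), (5.0, 6.0)]
b = [(3.5, 4.5), (5.5, 5.8)]
ratio = overlap_ratio(a, b)  # (0.5s + 0.3s of overlap) / 5s of A's speech ≈ 0.16
```

A filter like this could, for example, select the high-overlap conversations that full-duplex S2S models most need to learn from.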

The datasets are currently private, but we’re happy to approve access requests — feel free to request access if you’re interested.

If you’re working on speech-to-speech (S2S) models or are curious about full-duplex conversational data, we’d love to discuss and exchange ideas together.

Feedback and ideas are very welcome!
kanaria007 
posted an update 1 day ago
✅ New Article: *Jumps as Atomic Moves* (v0.1)

Title:
🧠 Jumps: Atomic Moves in Structured Intelligence (and How to Make Them Safe)
🔗 https://huggingface.co/blog/kanaria007/jumps-atomic-moves-in-si

---

Summary:
In SI-Core, a *Jump* is the smallest *effectful* unit of reasoning+action: a move that consumes observations, proposes/chooses an action, and (optionally) commits results + memory updates.

This article makes Jumps operational: *what a Jump must declare*, how it is gated (OBS/ETH/RML), how it produces auditable traces, and how to keep it safe under uncertainty—without collapsing into “just prompt chaining.”

> If you can’t name the Jump, you can’t audit it.
> If you can’t gate it, you can’t ship it.

---

Why It Matters:
• Stops hidden behavior: every effectful move becomes *declared + inspectable*
• Prevents “jumping in the dark” via *OBS gating + sandbox-only paths*
• Makes policy enforceable: ETH overlay can *allow/modify/block/escalate* per Jump type
• Improves rollback reality: map Jump effects to *RML level*, not vibes
• Enables evaluation that matters: jump traces → *SCover / CAS / RIR / SCI* and failure taxonomy

---

What’s Inside:
• A practical Jump contract: inputs/required obs, scope, candidate generation, chooser policy, outputs, memory writes
• Gate sequence: *OBS → eval_pre → (sandbox) → ETH → commit → RML trace → ledger*
• Jump taxonomy: read-only / advisory / effectful / irreversible, and how to treat each
• Safety patterns: conservative defaults, human-in-loop, break-glass, and “publish_result=false” sandboxes
• Testing: golden traces, property tests, chaos drills, and “why this jump?” explainability hooks
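
The gate sequence above can be sketched as an explicit pipeline. All names here are illustrative (my reading of the article, not the SI-Core API), with the modify/escalate ETH verdicts and RML tracing omitted for brevity:

```python
from dataclasses import dataclass
from enum import Enum

class JumpKind(Enum):
    READ_ONLY = "read-only"
    ADVISORY = "advisory"
    EFFECTFUL = "effectful"
    IRREVERSIBLE = "irreversible"

@dataclass
class Jump:
    name: str
    kind: JumpKind
    required_obs: set
    action: callable              # produces the proposed result
    publish_result: bool = True   # False => sandbox-only path

def run_jump(jump, observations, eth_policy, ledger):
    # OBS gate: refuse to "jump in the dark" on missing observations.
    missing = jump.required_obs - observations.keys()
    if missing:
        return {"status": "blocked", "reason": f"missing obs: {sorted(missing)}"}
    result = jump.action(observations)        # eval_pre / sandbox execution
    # ETH overlay: per-jump verdict before anything commits.
    verdict = eth_policy(jump, result)
    if verdict != "allow":
        return {"status": verdict, "reason": "eth gate"}
    if not jump.publish_result:               # sandbox-only: never commit
        return {"status": "sandboxed", "result": result}
    # Commit + ledger: the auditable trace of the effectful move.
    ledger.append({"jump": jump.name, "kind": jump.kind.value, "result": result})
    return {"status": "committed", "result": result}

# Conservative default: irreversible jumps escalate instead of committing.
policy = lambda j, r: "escalate" if j.kind is JumpKind.IRREVERSIBLE else "allow"
ledger = []
j = Jump("summarize", JumpKind.READ_ONLY, {"doc"}, lambda obs: len(obs["doc"]))
outcome = run_jump(j, {"doc": "hello"}, policy, ledger)
```

Because every jump either commits with a ledger entry or returns a named refusal, "if you can't name the Jump, you can't audit it" becomes a mechanical property rather than a slogan.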

---

📖 Structured Intelligence Engineering Series
This is the *how-to-implement / how-to-operate* layer for Jumps as atomic, auditable moves.