AI & ML interests

Waifu and Husbando Research

lunarflu
posted an update 28 days ago
The new King 👑 has arrived!

Moonshot AI is now the top model on Hugging Face 🔥
moonshotai/Kimi-K2-Thinking
lunarflu
posted an update 28 days ago
💸🤑 You don't need 100 GPUs to train something amazing!

Our Smol Training Playbook teaches you a better path to world-class LLMs, for free!

Check out the #1 trending space on 🤗:
HuggingFaceTB/smol-training-playbook
s3nh
posted an update about 2 months ago
EduHelp with more empathy just landed: a model fine-tuned on psychotherapeutic preferences, using Beck-8B as the base model, trained for 13,000 steps on an educational dataset.
Time to go further and build more 🥰
s3nh/EduHelp_Beck_8B
Thanks to @basilic_ai for computations <3
s3nh
posted an update about 2 months ago
Just tried to create an educational assistant for younger people who may struggle to visualise 'what is this sorcery all about'.
It's the first of my spare-time projects: SFT on Qwen3-8B.

EduHelper is a child-friendly tutoring assistant fine-tuned from the Qwen3-8B base model using parameter-efficient fine-tuning (PEFT) with LoRA on the ajibawa-2023/Education-Young-Children dataset.

s3nh/EduHelp-8B
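The post mentions parameter-efficient fine-tuning with LoRA. As a rough illustration of the idea (not the actual training code), LoRA freezes the base weight matrix W and learns two small matrices A and B of rank r, so the effective weight becomes W + (alpha/r)·B·A. A minimal pure-Python sketch with hypothetical toy dimensions:

```python
# Sketch of the LoRA weight update: instead of training a full
# d_out x d_in matrix W, learn A (r x d_in) and B (d_out x r)
# with r much smaller than d, and merge as W + (alpha / r) * B @ A.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_effective_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A), the merged LoRA weight."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(wrow, drow)]
            for wrow, drow in zip(W, delta)]

# Toy example: d_out = d_in = 2, rank r = 1.
W = [[1.0, 0.0],
     [0.0, 1.0]]
A = [[1.0, 2.0]]        # r x d_in
B = [[0.5], [0.25]]     # d_out x r
W_eff = lora_effective_weight(W, A, B, alpha=1.0, r=1)
print(W_eff)  # [[1.5, 1.0], [0.25, 1.5]]
```

Only A and B (here 4 numbers instead of 4, but in real layers thousands instead of millions) receive gradients, which is what makes this trainable on a consumer GPU.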

Glad to share my work, have a wonderful day!
lunarflu
posted an update 2 months ago
Cool stuff these past weeks on Hugging Face! 🤗 🚀
• 📈 Trackio, a local-first W&B alternative
https://github.com/gradio-app/trackio/issues
• 🌍 EmbeddingGemma, 300M-param, multilingual, on-device embeddings
https://huggingface.co/blog/embeddinggemma
• 💻 Open LLMs in VS Code (Inference Providers)
https://x.com/reach_vb/status/1966185427582497171
• 🤖 Smol2Operator GUI agents
https://huggingface.co/blog/smol2operator
• 🖼️ Gradio visible watermarking
https://huggingface.co/blog/watermarking-with-gradio
KaraKaraWitch
posted an update 4 months ago
What if LLMs used thinking emojis to develop their state?

:blob_think: Normal Thinking
:thinkies: Casual Thinking
:Thonk: Serious Thinking
:think_bold: Critical Thinking
:thinkspin: Research Thinking
:thinkgod: Deep Research Thinking

The last 2 are gifs. But the upload doesn't render them :)

(Credits: SwayStar123 on EAI suggested it to be a range selector, Original base idea was from me)
KaraKaraWitch
posted an update 6 months ago
"What's wrong with using huggingface transformers?"

Here's a quick example. Am I supposed to go in with full knowledge of the inner workings of an LLM?
import pathlib
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("<ModernBERT>")
# Triton is **required**, but nowhere in the documentation is it specified that triton is needed.
# Installing triton on Windows isn't super straightforward. Thankfully someone has already built wheels for it:
#  - https://github.com/woct0rdho/triton-windows/releases

model = AutoModelForSequenceClassification.from_pretrained(
    "<ModernBERT>",  # reference_compile=False
)
# By default it uses CPU. Which is slow. Move to a cuda device.
# This will actually error out if you use "gpu" instead.
model = model.to("cuda")


with torch.no_grad():
    # Not setting `return_tensors="pt"` causes
    #   File "C:\Program Files\Python310\lib\site-packages\transformers\modeling_utils.py", line 5311, in warn_if_padding_and_no_attention_mask
    #     if self.config.pad_token_id in input_ids[:, [-1, 0]]:
    #   TypeError: list indices must be integers or slices, not tuple
    # or...
    #  File "C:\Program Files\Python310\lib\site-packages\transformers\models\modernbert\modeling_modernbert.py", line 836, in forward
    #    batch_size, seq_len = input_ids.shape[:2]
    #  AttributeError: 'list' object has no attribute 'shape'
    block = tokenizer(
        pathlib.Path("test-fic.txt").read_text("utf-8"), return_tensors="pt"
    )
    block = block.to("cuda")
    # **block is needed to fix "AttributeError: 'NoneType' object has no attribute 'unsqueeze'" on attention_mask.unsqueeze(-1)
    logits = model(**block).logits

# Not moving to cpu will cause the .numpy() call below to fail.
logits = logits.to("cpu")
# print(logits)
predicted_class_ids = torch.softmax(logits, -1)[
    0
].numpy()
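For what it's worth, the softmax at the very end of the snippet is the one step that is simple math rather than library plumbing; a minimal pure-Python sketch of it, no torch required:

```python
# Numerically stable softmax: subtract the max logit before
# exponentiating so exp() cannot overflow, then normalize.
import math

def softmax(logits):
    """Convert a list of raw logits into probabilities summing to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)  # highest probability on the first (largest) logit
```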

KaraKaraWitch
posted an update 7 months ago
> New Model
> Looks at Model Card
> "Open-Weights"
ajibawa-2023
posted an update 8 months ago
Hi All, I recently released two audio datasets, generated using my earlier released dataset ajibawa-2023/Children-Stories-Collection.

First audio dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection-Large has 5,600+ stories in .mp3 format.

Second audio dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection has 600 stories in .mp3 format.
s3nh
posted an update 12 months ago
Welcome back,

Small Language Model enthusiasts and GPU-poor OSS enjoyers, let's connect.
Just created an organization whose main goal is to have fun with smaller models, tunable on consumer-range GPUs. Feel free to join and let's have some fun, much love ;3

SmolTuners
lunarflu
posted an update about 1 year ago
ajibawa-2023
posted an update about 1 year ago
New Dataset: Software-Architecture
Link: ajibawa-2023/Software-Architecture

I am releasing a large dataset covering topics related to software architecture. This dataset consists of around 450,000 lines of data in JSONL format.

I have included the following topics:

Architectural Frameworks
Architectural Patterns for Reliability
Architectural Patterns for Scalability
Architectural Patterns
Architectural Quality Attributes
Architectural Testing
Architectural Views
Architectural Decision-Making
Advanced Research
Cloud-Based Architectures
Component-Based Architecture
Data Architecture
Emerging Trends
Event-Driven Architecture
Evolvability and Maintainability
Microservices and Monolithic
Microservices Architecture
Security Architecture
Service-Oriented Architecture
Software Design Principles
and Many More!

This dataset is useful for LLM development, particularly for those building LLMs focused on software development. It should also be valuable to researchers.
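Since the dataset is JSONL (one JSON object per line), a ~450,000-line file can be streamed record by record with the standard library rather than loaded all at once. A minimal sketch; the field names ("prompt", "response") are hypothetical, for illustration only:

```python
# Stream a JSONL file one record at a time using only the stdlib.
import io
import json

def iter_jsonl(fp):
    """Yield one parsed record per non-empty line of a JSONL stream."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)

# Stand-in for open("software-architecture.jsonl", encoding="utf-8"):
sample = io.StringIO(
    '{"prompt": "What is event-driven architecture?", "response": "..."}\n'
    '{"prompt": "Define scalability.", "response": "..."}\n'
)
records = list(iter_jsonl(sample))
print(len(records))  # 2
```

Streaming this way keeps memory usage flat regardless of file size, which matters at 450k lines.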
lunarflu
posted an update over 1 year ago