AI & ML interests

Waifu and Husbando Research

lunarflu
posted an update 28 days ago
The new King 👑 has arrived!

Moonshot AI is now the top model on Hugging Face 🔥
moonshotai/Kimi-K2-Thinking
lunarflu
posted an update 28 days ago
💸🤑 You don't need 100 GPUs to train something amazing!

Our Smol Training Playbook teaches you a better path to world-class LLMs, for free!

Check out the #1 trending space on 🤗:
HuggingFaceTB/smol-training-playbook
s3nh
posted an update about 2 months ago
EduHelp with more empathy just landed: a model fine-tuned on psychotherapeutic preferences, using Beck-8B as the base model, trained for 13,000 steps on an educational dataset.
Time to go further and build more 🥰
s3nh/EduHelp_Beck_8B
Thanks to @basilic_ai for computations <3
s3nh
posted an update about 2 months ago
Just tried to create an educational assistant for younger people who may struggle to visualise 'what is this sorcery all about'.
It's the first of my spare-time projects: SFT on Qwen3-8B.

EduHelper is a child-friendly tutoring assistant fine-tuned from the Qwen3-8B base model using parameter-efficient fine-tuning (PEFT) with LoRA on the ajibawa-2023/Education-Young-Children dataset.

s3nh/EduHelp-8B
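The post mentions parameter-efficient fine-tuning with LoRA. As a rough illustration of the idea (not the actual training code), LoRA freezes the base weight matrix W and learns two small matrices A and B of rank r, so the effective weight becomes W + (alpha/r)·B·A. A minimal pure-Python sketch with hypothetical toy dimensions:

```python
# Sketch of the LoRA weight update: instead of training a full
# d_out x d_in matrix W, learn A (r x d_in) and B (d_out x r)
# with r much smaller than d, and merge as W + (alpha / r) * B @ A.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_effective_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A), the merged LoRA weight."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(wrow, drow)]
            for wrow, drow in zip(W, delta)]

# Toy example: d_out = d_in = 2, rank r = 1.
W = [[1.0, 0.0],
     [0.0, 1.0]]
A = [[1.0, 2.0]]        # r x d_in
B = [[0.5], [0.25]]     # d_out x r
W_eff = lora_effective_weight(W, A, B, alpha=1.0, r=1)
print(W_eff)  # [[1.5, 1.0], [0.25, 1.5]]
```

Only A and B (here 4 numbers instead of 4, but in real layers thousands instead of millions) receive gradients, which is what makes this trainable on a consumer GPU.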

Glad to share my work, have a wonderful day!
lunarflu
posted an update 2 months ago
Cool stuff these past weeks on Hugging Face! 🤗 🚀
• 📈 Trackio, a local-first W&B alternative
https://github.com/gradio-app/trackio/issues
• 🌍 EmbeddingGemma, 300M-param, multilingual, on-device embeddings
https://huggingface.co/blog/embeddinggemma
• 💻 Open LLMs in VS Code (Inference Providers)
https://x.com/reach_vb/status/1966185427582497171
• 🤖 Smol2Operator GUI agents
https://huggingface.co/blog/smol2operator
• 🖼️ Gradio visible watermarking
https://huggingface.co/blog/watermarking-with-gradio
KaraKaraWitch
posted an update 4 months ago
What if LLMs used thinking emojis to develop their state?

:blob_think: Normal Thinking
:thinkies: Casual Thinking
:Thonk: Serious Thinking
:think_bold: Critical Thinking
:thinkspin: Research Thinking
:thinkgod: Deep Research Thinking

The last 2 are gifs. But the upload doesn't render them :)

(Credits: SwayStar123 on EAI suggested it to be a range selector, Original base idea was from me)
KaraKaraWitch
posted an update 6 months ago
"What's wrong with using huggingface transformers?"

Here's a quick example. Am I supposed to go in with full knowledge of the inner workings of an LLM?
import pathlib
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("<ModernBERT>")
# Triton is **required**, but nowhere in the documentation is it specified that triton is needed.
# Installing triton on Windows isn't super straightforward. Thankfully someone has already built wheels for it:
#  - https://github.com/woct0rdho/triton-windows/releases

model = AutoModelForSequenceClassification.from_pretrained(
    "<ModernBERT>",  # reference_compile=False
)
# By default it uses CPU. Which is slow. Move to a cuda device.
# This will actually error out if you use "gpu" instead.
model = model.to("cuda")


with torch.no_grad():
    # Not setting `return_tensors="pt"` causes
    #   File "C:\Program Files\Python310\lib\site-packages\transformers\modeling_utils.py", line 5311, in warn_if_padding_and_no_attention_mask
    #     if self.config.pad_token_id in input_ids[:, [-1, 0]]:
    #   TypeError: list indices must be integers or slices, not tuple
    # or...
    #  File "C:\Program Files\Python310\lib\site-packages\transformers\models\modernbert\modeling_modernbert.py", line 836, in forward
    #    batch_size, seq_len = input_ids.shape[:2]
    #  AttributeError: 'list' object has no attribute 'shape'
    block = tokenizer(
        pathlib.Path("test-fic.txt").read_text("utf-8"), return_tensors="pt"
    )
    block = block.to("cuda")
    # **block is needed to fix "AttributeError: 'NoneType' object has no attribute 'unsqueeze'" on attention_mask.unsqueeze(-1)
    logits = model(**block).logits

# Not moving to cpu will cause the .numpy() call below to fail.
logits = logits.to("cpu")
# print(logits)
predicted_class_ids = torch.softmax(logits, -1)[
    0
].numpy()
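For what it's worth, the softmax at the very end of the snippet is the one step that is simple math rather than library plumbing; a minimal pure-Python sketch of it, no torch required:

```python
# Numerically stable softmax: subtract the max logit before
# exponentiating so exp() cannot overflow, then normalize.
import math

def softmax(logits):
    """Convert a list of raw logits into probabilities summing to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)  # highest probability on the first (largest) logit
```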

KaraKaraWitch
posted an update 7 months ago
> New Model
> Looks at Model Card
> "Open-Weights"
ajibawa-2023
posted an update 8 months ago
Hi All, I recently released two audio datasets, generated using my earlier released dataset ajibawa-2023/Children-Stories-Collection.

First audio dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection-Large has 5,600+ stories in .mp3 format.

Second audio dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection has 600 stories in .mp3 format.
s3nh
posted an update 12 months ago
Welcome back,

Small Language Model enthusiasts and GPU-poor OSS enjoyers, let's connect.
Just created an organization whose main goal is to have fun with smaller models, tunable on consumer-range GPUs. Feel free to join and let's have some fun, much love ;3

SmolTuners
lunarflu
posted an update about 1 year ago
ajibawa-2023
posted an update about 1 year ago
New Dataset: Software-Architecture
Link: ajibawa-2023/Software-Architecture

I am releasing a large dataset covering topics related to software architecture. This dataset consists of around 450,000 lines of data in JSONL format.

I have included the following topics:

Architectural Frameworks
Architectural Patterns for Reliability
Architectural Patterns for Scalability
Architectural Patterns
Architectural Quality Attributes
Architectural Testing
Architectural Views
Architectural Decision-Making
Advanced Research
Cloud-Based Architectures
Component-Based Architecture
Data Architecture
Emerging Trends
Event-Driven Architecture
Evolvability and Maintainability
Microservices and Monolithic
Microservices Architecture
Security Architecture
Service-Oriented Architecture
Software Design Principles
and Many More!

This dataset is useful for LLM development, particularly for those building LLMs focused on software development. It should also be valuable to researchers.
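Since the dataset is JSONL (one JSON object per line), a ~450,000-line file can be streamed record by record with the standard library rather than loaded all at once. A minimal sketch; the field names ("prompt", "response") are hypothetical, for illustration only:

```python
# Stream a JSONL file one record at a time using only the stdlib.
import io
import json

def iter_jsonl(fp):
    """Yield one parsed record per non-empty line of a JSONL stream."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)

# Stand-in for open("software-architecture.jsonl", encoding="utf-8"):
sample = io.StringIO(
    '{"prompt": "What is event-driven architecture?", "response": "..."}\n'
    '{"prompt": "Define scalability.", "response": "..."}\n'
)
records = list(iter_jsonl(sample))
print(len(records))  # 2
```

Streaming this way keeps memory usage flat regardless of file size, which matters at 450k lines.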
lunarflu
posted an update over 1 year ago