Jeremy Hill
K-ai-Innovations
AI & ML interests
None yet
Recent Activity
reacted to samerzaher80's post 14 days ago
Need Help Getting arXiv Endorsement for My AI Research Paper
Hi everyone,
I hope you're doing well. I'm trying to publish my new AI research paper on arXiv under the cs.AI category, but I currently need an endorser who is already authorized for cs.AI submissions.
If anyone here is registered as a cs.AI endorser and is willing to help, I would truly appreciate it.
Here is the official arXiv endorsement request link:
https://arxiv.org/auth/endorse?x=EZEMO7
(Backup: http://arxiv.org/auth/endorse.php, code: EZEMO7)
My research:
It's part of the AetherMind project, a self-reflective NLI reasoning system inspired by human cognitive consistency and also used in Alzheimer's research. If needed, I can share the abstract or full PDF.
Thank you so much to anyone who can support.
- Sameer S. Najm
reacted to nouamanetazi's post about 1 month ago
After training SmolLM3 on 384 H100s for nearly a month, I've come to realize something most people overlook: infrastructure is the make-or-break factor in LLM training.
Everyone talks about model architecture and data quality. And yes, those matter immensely. But here's what nobody tells you: when your training run fails at 2 AM because of mysterious NCCL errors, or when your expensive GPU cluster is running at 60% efficiency, the problem isn't your model. It's most probably a misuse of the hardware.
Questions that seemed simple but had no clear answers: Why is MoE training slower than dense models? Which NCCL flags should we actually set? How often should we checkpoint without killing throughput?
That's why we built The Smol Training Playbook: a complete guide covering everything from model architecture and data curation to the SmolLM3 training marathon, post-training techniques, and crucially, the infrastructure layer that most teams get wrong.
We validated real vs theoretical bandwidth across the entire stack: HBM3 hitting 3 TB/s, NVLink 4.0 reaching 786 GB/s, PCIe Gen4 at 14.2 GB/s. Then we ran collective operations across 128 GPUs (16 nodes, 8xH100s each) and measured how performance degrades at scale: all-reduce drops from 480 GB/s on a single node to 320-350 GB/s across 16 nodes.
If you've ever wondered why your training runs are slower than they should be, or you're planning to scale up and want to avoid expensive mistakes, this guide might save you weeks of debugging.
The Smol Training Playbook: https://lnkd.in/e5MKXUHS
Shared with ❤️ by the HuggingFace team
reacted to nouamanetazi's post about 1 month ago
Organizations
None yet