Building on HF

Tyler Williams PRO

unmodeled-tyler

AI & ML interests

Founder of Quanta Intellect/VANTA Research - Looking to get in touch? Head to my website!

Recent Activity

reacted to alvdansen's post with 🔥 13 minutes ago
Just open-sourced LoRA Gym with Timothy: a production-ready training pipeline for character, motion, aesthetic, and style LoRAs on Wan 2.1/2.2, built on musubi-tuner. It ships 16 training templates across Modal (serverless) and RunPod (bare metal), covering T2V, I2V, Lightning-merged, and vanilla variants.

Our current experimentation focus is Wan 2.2, which is why we built on musubi-tuner (kohya-ss). Wan 2.2's DiT uses a Mixture-of-Experts architecture with two separate experts gated by a hard timestep switch: you train two LoRAs per concept, one for high noise (composition/motion) and one for low noise (texture/identity), and load both at inference (see the sketch below). Musubi handles this dual-expert training natively, and our templates build on top of it to manage the correct timestep boundaries, precision settings, and flow shift values so you don't have to debug those yourself.

We've also documented fixes for undocumented bugs in musubi-tuner, and validated hyperparameter defaults by cross-referencing multiple practitioners' results rather than relying on untested community defaults.

We're also releasing our auto-captioning toolkit for the first time: per-LoRA-type captioning strategies for characters, styles, motion, and objects (sketched below), with Gemini (free) or Replicate backends. Current hyperparameters reflect consolidated community findings; we've started our own refinement and plan to release specific recommendations and methodology as soon as next week.

Repo: github.com/alvdansen/lora-gym
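To make the dual-expert setup concrete, here's a minimal, hypothetical sketch of the hard timestep switch described above. The boundary value, expert names, and helper functions are illustrative assumptions, not Wan 2.2's or musubi-tuner's actual API:

```python
# Illustrative sketch of Wan 2.2's dual-expert gating idea (not the real API).
# A hard timestep switch decides which expert's LoRA is active at each
# denoising step; both LoRAs are loaded, but only one fires per step.

HIGH_NOISE_BOUNDARY = 0.875  # assumed switch point on a normalized [0, 1] timestep axis

def select_expert(t: float) -> str:
    """Route a normalized timestep to the expert trained for that regime."""
    if t >= HIGH_NOISE_BOUNDARY:
        return "high_noise"   # early, noisy steps: composition and motion
    return "low_noise"        # late, cleaner steps: texture and identity

def active_lora(t: float, loras: dict) -> str:
    """Pick which of the two loaded per-concept LoRAs applies at step t."""
    return loras[select_expert(t)]

# Toy usage: one concept, two LoRAs, gated per step.
loras = {
    "high_noise": "my_concept.high_noise.safetensors",
    "low_noise": "my_concept.low_noise.safetensors",
}
for t in (0.95, 0.90, 0.50, 0.05):
    print(f"t={t:.2f} -> {active_lora(t, loras)}")
```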
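And a hedged sketch of what per-LoRA-type captioning might look like. The prompt templates, function name, and trigger-word convention below are invented for illustration and are not the toolkit's actual interface; only the four LoRA types and the Gemini/Replicate backend options come from the post:

```python
# Illustrative per-LoRA-type captioning dispatch (hypothetical, not the
# toolkit's real interface). Each LoRA type gets a captioning strategy
# tuned to what the LoRA must learn.

CAPTION_TEMPLATES = {
    "character": "Describe the person: identity cues, outfit, expression, pose.",
    "style":     "Describe the visual style: palette, medium, lighting, texture.",
    "motion":    "Describe the motion: camera movement, subject action, pacing.",
    "object":    "Describe the object: shape, material, distinguishing details.",
}

def build_caption_prompt(lora_type: str, trigger_word: str) -> str:
    """Compose the instruction sent to a VLM backend (e.g. Gemini or a
    Replicate-hosted model, per the post) for one training clip or image."""
    template = CAPTION_TEMPLATES[lora_type]
    return f"{template} Begin the caption with the token '{trigger_word}'."

print(build_caption_prompt("motion", "wan_dolly_shot"))
```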
reacted to AbstractPhil's post with 👀 about 17 hours ago
The Rosetta Stone geometric vocabulary, and ramping up its capacity. What makes this particular invariant special is that it shows up in every structure I've tested so far. I had Claude write up the article based on what we built together, and I've since tested it on many substructures.

The current version is flawed, and I have a series of answers for making it more accurate. First, a reconstruction from the ground up: each shape is built upward from the substructure to the point of inductive deviance. This will be slower at first, then gain speed as I optimize, just like the last system did.

The "saddle" problem: the system detected saddles because there wasn't enough deviance in the shapes to attenuate toward higher cardinality and better-aligned substructures. The blobs were around 30-40% of the overall patches, which, interpolated into the others, produced a fair approximation. It most definitely did see those shapes in their voxel complexity. This is real. https://claude.ai/public/artifacts/bf1256c7-726d-4943-88ad-d6addb263b8b

You can play with a public Claude artifact dedicated to viewing the current shape spectrum, and from it see exactly why it's flawed: the shapes are flawed and repetitive. I rapid-prototyped, so there are multiple redundant shapes that simply don't classify well, or at all. And the rotation often doesn't help, or doesn't exist for many shapes. This will be rectified in the next variation.

Next: projecting to a shared latent space as a catalyst, allowing growing subjective geoflow-matched step variance rather than simple direct classification. This should, in theory, let full channel-to-channel invariant features be mapped from structure to structure, with the very formula that encapsulates them baked directly into the math rather than recovered through substructure analysis.

There are many challenges between here and there, so stay tuned, friends, as I plot the geometric language of pretrained AI.

Organizations

Blog-explorers · VANTA Research