Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2507.07998

Multimodal Agent

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 29
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49

PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31

Interesting findings

Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19 • 127
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31
OG-RAG: Ontology-Grounded Retrieval-Augmented Generation For Large Language Models

Paper • 2412.15235 • Published Dec 12, 2024

Images are Worth Variable Length of Representations

Paper • 2506.03643 • Published Jun 4 • 4
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31
Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10 • 159

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published Sep 3, 2024 • 34
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published Sep 17, 2024 • 27
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27, 2024 • 126
Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17, 2024 • 22

Running

14

PyVision

💬

14

Ask questions about images to get detailed responses
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31

Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8, 2024 • 172
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31

VideoDeepResearch: Long Video Understanding With Agentic Tool Using

Paper • 2506.10821 • Published Jun 12 • 19
Jan-nano Technical Report

Paper • 2506.22760 • Published Jun 28 • 9
MMSearch-R1: Incentivizing LMMs to Search

Paper • 2506.20670 • Published Jun 25 • 64
WebSailor: Navigating Super-human Reasoning for Web Agent

Paper • 2507.02592 • Published Jul 3 • 123

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 4

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Paper • 2405.20340 • Published May 30, 2024 • 20
Spectrally Pruned Gaussian Fields with Neural Compensation

Paper • 2405.00676 • Published May 1, 2024 • 10
Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Paper • 2404.18212 • Published Apr 28, 2024 • 29
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29, 2024 • 121

Multimodal Agent

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 29
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49

Running

14

PyVision

💬

14

Ask questions about images to get detailed responses
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31

PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31

Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8, 2024 • 172
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31

Interesting findings

Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19 • 127
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31
OG-RAG: Ontology-Grounded Retrieval-Augmented Generation For Large Language Models

Paper • 2412.15235 • Published Dec 12, 2024

VideoDeepResearch: Long Video Understanding With Agentic Tool Using

Paper • 2506.10821 • Published Jun 12 • 19
Jan-nano Technical Report

Paper • 2506.22760 • Published Jun 28 • 9
MMSearch-R1: Incentivizing LMMs to Search

Paper • 2506.20670 • Published Jun 25 • 64
WebSailor: Navigating Super-human Reasoning for Web Agent

Paper • 2507.02592 • Published Jul 3 • 123

Images are Worth Variable Length of Representations

Paper • 2506.03643 • Published Jun 4 • 4
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31
Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10 • 159

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 4

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published Sep 3, 2024 • 34
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published Sep 17, 2024 • 27
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27, 2024 • 126
Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17, 2024 • 22

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Paper • 2405.20340 • Published May 30, 2024 • 20
Spectrally Pruned Gaussian Fields with Neural Compensation

Paper • 2405.00676 • Published May 1, 2024 • 10
Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Paper • 2404.18212 • Published Apr 28, 2024 • 29
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29, 2024 • 121

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs