VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos • Paper 2411.04923 • Published Nov 7, 2024
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models • Paper 2311.13435 • Published Nov 22, 2023