OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs

Preview

arXiv:2606.07577v1 Announce Type: new

Abstract: Audio-visual large language models (LLMs) hold strong promise for long-form video understanding, yet their long-video inference is fundamentally limited by the linear growth of video tokens and key-value (KV) caches. We present OmniMem, a memory-efficient streaming framework designed specifically for audio-visual LLMs. Unlike existing compression methods that treat all tokens uniformly, OmniMem introduces a modality-aware memory allocation strateg
