OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs
arXiv:2606.07577v1

Abstract: Audio-visual large language models (LLMs) hold strong promise for long-form video understanding, yet their long-video inference is fundamentally limited by the linear growth of video tokens and key-value (KV) caches. We present OmniMem, a memory-efficient streaming framework designed specifically for audio-visual LLMs. Unlike existing compression methods that treat all tokens uniformly, OmniMem introduces a modality-aware memory allocation strategy…