Advancing Creative Physical Intelligence in Large Multimodal Models | AIChainDay

🇰🇷 Korean summary by Claude · May 27, 2026

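MM-CreativityBench is a benchmark that evaluates affordance-based creative tool use in visually rich, physically constrained environments. Current large multimodal models (LMMs) fail not because they lack generative ability but because they cannot sustain grounded exploration: they tend to omit relevant objects or hallucinate attributes that are not in the image. To address this, the work combines DPO (Direct Preference Optimization)-based affordance alignment with supervised training on an affordance knowledge base, yielding consistent gains in selecting the correct objects and parts while substantially reducing hallucinations and grounding errors. It sets a new evaluation standard for creative problem solving in open-ended visual environments.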

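•LMMs fail at creative tool use not for lack of generative ability but because they fail to sustain grounded exploration: they omit relevant objects, under-examine parts, and hallucinate attributes absent from the image.
•MM-CreativityBench: an affordance-based creative problem-solving benchmark that supports interactive, fine-grained evaluation through structured viewpoints and candidate entity views.
•DPO-based affordance alignment: trains the model to prefer attribute-affordance reasoning grounded in visual evidence, reducing hallucinations.
•Supervised training on an affordance knowledge base improves multi-turn planning and broad entity exploration, and substantially reduces hallucinations.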
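The DPO-based alignment mentioned above can be illustrated with the standard DPO objective for a single preference pair. This is a minimal sketch, not the paper's training code: the function name `dpo_loss`, the `beta` default, and the example log-probabilities are assumptions chosen for illustration. The "chosen" response is taken to be reasoning grounded in visual evidence and the "rejected" response one that hallucinates an attribute.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the trained policy and
    under a frozen reference model. `beta` is a hypothetical default.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # loss = -log(sigmoid(z)) = log(1 + exp(-z)), computed stably.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Hypothetical log-probabilities: the policy already favors the grounded
# (chosen) response relative to the reference, so the loss is below log(2).
loss = dpo_loss(policy_chosen_logp=-1.0, policy_rejected_logp=-5.0,
                ref_chosen_logp=-2.0, ref_rejected_logp=-2.0)
print(round(loss, 4))
```

When the policy and the reference assign identical log-probabilities, the loss equals log 2; it falls as the policy shifts probability toward the grounded response.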

AI · May 27, 2026 · AI score: 93%

Advancing Creative Physical Intelligence in Large Multimodal Models

Source: arXiv cs.AI

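✨ AI Insights

🧑‍💻 Developers

1.MM-CreativityBench: introduces a benchmark for evaluating creative tool use by LMMs in visually grounded, physically constrained environments
2.Current LMMs are found to be unable to sustain grounded exploration, omitting relevant objects and hallucinating attributes
3.DPO-based affordance-grounding alignment improves selection of the correct entities and parts and substantially reduces hallucinations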
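The two failure modes named above (omitted objects and hallucinated attributes) can be made concrete with a small audit function. This is a sketch under stated assumptions, not MM-CreativityBench's scoring procedure: the scene annotation format, the function name `audit_proposal`, and the example objects are all hypothetical.

```python
def audit_proposal(scene, claims):
    """Flag grounding errors in a model's proposed solution.

    scene:  dict mapping object name -> set of attributes visible in the
            image (hypothetical annotation format).
    claims: list of (object, attribute) pairs asserted by the model.
    """
    # A claim is hallucinated if the attribute is not annotated for that
    # object, or the object is not in the scene at all.
    hallucinated = [(obj, attr) for obj, attr in claims
                    if attr not in scene.get(obj, set())]
    # Objects the model never mentioned were not explored.
    examined = {obj for obj, _ in claims}
    unexamined = sorted(set(scene) - examined)
    return {"hallucinated": hallucinated, "unexamined": unexamined}


# Hypothetical scene and model output.
scene = {
    "spoon": {"rigid", "concave"},
    "towel": {"flexible", "absorbent"},
    "mug": {"rigid", "hollow"},
}
claims = [("spoon", "rigid"), ("spoon", "magnetic")]
print(audit_proposal(scene, claims))
```

Here the claim that the spoon is magnetic is flagged as hallucinated, and the towel and mug are reported as never examined.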

💡

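Why does it matter?

It demonstrates empirically that LMM perception and reasoning are not sufficient for creative, non-linear problem solving, and it offers a concrete remedy in DPO-based affordance alignment, making a practical contribution to robotics and embodied AI research.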

🏷️ Mentioned projects

MM-CreativityBench

Article preview

arXiv:2605.26396v1 Announce Type: new Abstract: Large multimodal models (LMMs) have rapidly advanced in perception and reasoning; however, it remains unclear whether these capabilities generalize to discovering visually grounded solutions in open-ended environments, beyond pattern recognition. In such settings, intelligence requires more than answering well-posed questions: it involves identifying how elements in a scene can be repurposed in non-obvious yet physically feasible ways. This form o


#Large multimodal models #Physical reasoning #Visual grounding #Creative intelligence
