Constraint-Data-Value-Maximization: Utilizing Data Attribution for Effective Data Pruning in Low-Data Environments | AIChainDay

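🇰🇷 Korean summary by Claude · May 13, 2026

The paper shows that existing Shapley-based data values are not well suited to removing low-value data when very little data is available. To address this, it proposes Constraint-Data-Value-Maximization (CDVM), a constrained optimization that maximizes total influence while penalizing excessive contributions, and that achieves robust performance in low-data settings. On the OpenDataVal benchmark it shows robust performance and competitive runtime.

• Existing Shapley-based data values are not well suited to removing low-value data in low-data settings.
• CDVM uses a constrained optimization that maximizes total influence while penalizing excessive contributions.
• On the OpenDataVal benchmark it achieves robust performance and competitive runtime.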
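
The low-value removal task mentioned in these points can be sketched roughly as follows. This is a minimal illustration, not the paper's or OpenDataVal's code: the nearest-centroid model, the function names (`fit_centroids`, `accuracy`, `remove_lowest`, `removal_curve`) and the `values` array are all assumptions made for the example, and any data-valuation method could supply `values`.

```python
import numpy as np

def fit_centroids(X, y):
    """Toy stand-in model (assumption): one mean vector per class."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def accuracy(model, X, y):
    """Classify each row by its nearest class centroid and score it."""
    classes, centroids = model
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y).mean())

def remove_lowest(values, n_remove):
    """Indices that remain after dropping the n_remove lowest-valued points."""
    order = np.argsort(values, kind="stable")
    return np.sort(order[n_remove:])

def removal_curve(X_tr, y_tr, X_te, y_te, values, steps):
    """Test accuracy after pruning 0, 1, ..., steps lowest-valued points."""
    curve = []
    for n in range(steps + 1):
        keep = remove_lowest(values, n)
        model = fit_centroids(X_tr[keep], y_tr[keep])
        curve.append(accuracy(model, X_te, y_te))
    return curve
```

A valuation method is doing well on this task if the curve holds steady or improves as low-valued points are pruned. With only a handful of training points, each removal changes the model noticeably, which is the regime the summary describes.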

AI · May 13, 2026 · AI score: 92%


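Source: arXiv cs.AI

✨ AI Insights

🧑‍💻 Developers

1. Introduces CDVM, an approach to data pruning in low-data settings that builds on data attribution
2. Demonstrates empirically that existing Shapley-based data values are not optimal in low-data settings
3. Uses constrained optimization to maximize total influence while penalizing excessive contributions
4. Reports strong performance and competitive runtime on the OpenDataVal benchmark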
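
To make the idea in point 3 concrete, here is a toy sketch of "maximize total influence, penalize excessive contributions". It is not the paper's formulation, which this summary does not spell out. The attribution matrix `A` (training points by test points), the cap `cap`, the penalty weight `lam`, and the greedy search are all assumptions chosen for illustration.

```python
import numpy as np

def penalized_objective(A, selected, cap, lam):
    """Total influence of the selected rows, minus lam times any
    per-test-point contribution above cap (hypothetical knobs)."""
    contrib = A[selected].sum(axis=0)            # contribution per test point
    excess = np.clip(contrib - cap, 0.0, None)   # amount above the cap
    return float(contrib.sum() - lam * excess.sum())

def greedy_select(A, k, cap=1.0, lam=2.0):
    """Greedily keep k training points that maximize the penalized objective."""
    selected = []
    remaining = list(range(A.shape[0]))
    for _ in range(k):
        best = max(
            remaining,
            key=lambda i: penalized_objective(A, selected + [i], cap, lam),
        )
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)

# Rows 0 and 1 both serve test point 0; row 2 is the only one serving test point 1.
A = np.array([[1.0, 0.0],
              [0.9, 0.0],
              [0.0, 0.8]])
kept_penalized = greedy_select(A, k=2)           # penalty discourages piling onto test point 0
kept_plain = greedy_select(A, k=2, lam=0.0)      # no penalty: ranks by raw influence only
```

In this toy setup the penalty steers the selection toward covering both test points rather than keeping two training points that help the same one, which is one plausible reading of why a per-point ranking can mislead when little data is left.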

💡

Why does it matter?

It improves how the most useful training data is selected when data is scarce, which offers practical help in building efficient ML systems.

🏷️ Projects mentioned

CDVM

Abstract preview

arXiv:2605.11312v1 Announce Type: new Abstract: Attributing model behavior to training data is an evolving research field. A common benchmark is data removal, which involves eliminating data instances with either low or high values, then assessing a model's performance trained on the modified dataset. Many existing studies leverage Shapley-based data values for this task. In this paper, we demonstrate that these data values are not optimally suited for pruning low-value data when only a limited


#Data pruning #Data attribution #ML training #Shapley values

