Constraint acquisition needs better benchmarks | AIChainDay

Summary by Claude · May 27, 2026

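The paper proposes the MPMMine benchmark suite to address the inadequate benchmarks that hold back Constraint Acquisition (CA) research. Existing benchmarks were designed for solver evaluation, so they lack the domain knowledge artifacts that CA algorithms need, and they treat individual problems inconsistently, which lowers reproducibility and comparability. MPMMine follows the principles of consistency, standardization, completeness, extensibility, openness, and version control, and adopts the open formats MiniZinc, CommonMark, and JSON. It provides multiple models per problem, dozens of instances per model, thousands of solutions and non-solutions over integer and continuous domains, and natural-language descriptions, so it also supports text-to-model research.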

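• Shortcomings of existing CA benchmarks: designed for solver evaluation, loosely organized, inconsistent treatment of individual problems, and no domain knowledge artifacts of the kind CA algorithms require.
• MPMMine principles: consistency, standardization, completeness, extensibility, openness, and version control, aimed at reproducibility and comparability.
• Open formats (MiniZinc, CommonMark, JSON), with multiple models per problem and dozens of instances per model.
• Thousands of solutions and non-solutions over integer and continuous domains, plus natural-language descriptions, which also supports text-to-model research.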
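As a rough illustration of why labeled solutions and non-solutions matter for CA evaluation, the sketch below scores a candidate constraint set against such examples. The JSON layout, the field names (`variables`, `solutions`, `non_solutions`), and the toy constraints are all assumptions made for this example; the source does not describe the actual MPMMine schema.

```python
import json

# Hypothetical example file. The field names are assumptions for
# illustration only, not the actual MPMMine schema.
EXAMPLES_JSON = """
{
  "variables": ["x", "y"],
  "solutions":     [{"x": 1, "y": 3}, {"x": 2, "y": 2}],
  "non_solutions": [{"x": 3, "y": 3}, {"x": 0, "y": 1}]
}
"""

# A candidate constraint set, as a CA algorithm might return it.
# Each constraint is a predicate over a single variable assignment.
candidate = [
    lambda a: a["x"] + a["y"] == 4,
    lambda a: a["x"] >= 1,
]


def accepts(constraints, assignment):
    """Return True if the assignment satisfies every constraint."""
    return all(c(assignment) for c in constraints)


def score(constraints, data):
    """Count correctly accepted solutions and correctly rejected non-solutions."""
    accepted = sum(accepts(constraints, s) for s in data["solutions"])
    rejected = sum(not accepts(constraints, n) for n in data["non_solutions"])
    total = len(data["solutions"]) + len(data["non_solutions"])
    return {
        "accepted_solutions": accepted,
        "rejected_non_solutions": rejected,
        "accuracy": (accepted + rejected) / total,
    }


data = json.loads(EXAMPLES_JSON)
result = score(candidate, data)
print(result)
```

With a shared, versioned set of examples like this, two CA methods can be compared on identical inputs, which is the kind of cross-study comparability the summary describes.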

AI · May 27, 2026 · AI score: 90%

Constraint acquisition needs better benchmarks

Source: arXiv cs.AI

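✨ AI Insights

🧑‍💻 Developers

1. MPMMine: a standardized MP benchmark suite released for evaluating CA algorithms.
2. Open formats (MiniZinc, CommonMark, JSON), with multiple models per problem, dozens of instances per model, and solutions alongside non-solutions.
3. Natural-language descriptions support text-to-model methods and address the reproducibility and comparability problems in CA research.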

💡

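Why does it matter?

By directly addressing the lack of suitable benchmarks that has blocked reproducibility and comparability in CA research, MPMMine could accelerate work that combines mathematical programming with natural language processing.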

🏷️ Projects mentioned

MPMMine

Article preview

arXiv:2605.26279v1. Abstract: Constraint Acquisition (CA) and related research on the validation and enhancement of Mathematical Programming (MP) models from domain knowledge artifacts are currently limited by inadequate benchmarks. This deficiency impedes reproducibility and cross-study comparability, slowing the maturation of CA methods. Existing benchmarks were designed for solver evaluation rather than for assessing CA algorithms. They are loosely organized, treat individu

#constraint acquisition #benchmarks #mathematical programming #evaluation framework
