T2D-Bench: Evidence-Gated Evaluation of LLM Outputs for Type 2 Diabetes Using a Multi-Layer Clinical-Lifestyle Knowledge Graph

Summary by Claude (originally in Korean) · June 24, 2026

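T2D-Bench is a reproducible benchmark and evidence-gated evaluation framework that tests whether LLM recommendations for type 2 diabetes satisfy explicit, graph-checkable evidence requirements. LLMs can produce clinically fluent recommendations while violating guideline constraints or failing to explicitly justify lifestyle-related glycemic claims. T2D-Bench is built on a multi-layer clinical-lifestyle knowledge graph that combines a biomedical backbone (UMLS, DrugBank, SIDER), computable rules from the ADA Standards of Care, and lifestyle knowledge linked to glycemic lab effects through mechanistic bridges. Across 100 structured scenarios covering diagnosis, medication safety, and adversarial lifestyle conflicts, GPT-4o-mini failed the evidence-path check in 35% of cases and GPT-4o in 33%. The evidence gate detects unsupported omissions and brings outputs to verifier-level compliance through constrained revision.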

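• Built on a multi-layer clinical-lifestyle knowledge graph combining UMLS, DrugBank, SIDER, ADA rules, and lifestyle knowledge
• Evaluates graph-checkable evidence requirements across 100 structured scenarios (diagnosis, medication safety, lifestyle conflicts)
• GPT-4o-mini failed the evidence-path check in 35% of cases, GPT-4o in 33%
• The evidence gate detects unsupported omissions and achieves compliance through constrained revision
• Computable evidence constraints make clinical omissions explicit, measurable, and correctable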
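To make the idea of a "graph-checkable evidence requirement" concrete, here is a minimal Python sketch of one way such a gate could work: a claim passes only if the graph contains a path from the intervention to the outcome it names. This is an illustration, not the T2D-Bench implementation. The graph, the node names, and the functions `find_evidence_path` and `evidence_gate` are all hypothetical, and the real knowledge graph and checking rules are presumably far richer.

```python
from collections import deque

# Toy graph: every node and edge is a made-up illustration,
# not data taken from T2D-Bench or its underlying sources.
TOY_GRAPH = {
    "aerobic_exercise": ["insulin_sensitivity"],
    "insulin_sensitivity": ["HbA1c"],
    "HbA1c": [],
}


def find_evidence_path(graph, source, target):
    """Breadth-first search for a path from source to target.

    Returns the path as a list of node names, or None if no path exists.
    """
    if source not in graph:
        return None
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


def evidence_gate(graph, claims):
    """Split (source, target) claims into supported and unsupported lists."""
    supported, unsupported = [], []
    for source, target in claims:
        path = find_evidence_path(graph, source, target)
        if path is None:
            unsupported.append((source, target))
        else:
            supported.append(((source, target), path))
    return supported, unsupported


# Hypothetical claims extracted from a model's recommendation.
claims = [("aerobic_exercise", "HbA1c"), ("herbal_tea", "HbA1c")]
supported, unsupported = evidence_gate(TOY_GRAPH, claims)
```

In this toy run, the exercise claim is backed by a two-hop path through `insulin_sensitivity`, while the `herbal_tea` claim has no path and would be flagged for revision. A claim being unsupported here only means the toy graph lacks a path, not that the claim is clinically false.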

AI · June 24, 2026

Source: arXiv cs.AI

Text preview

arXiv:2606.24145v1 Announce Type: new Abstract: Large language models (LLMs) can produce clinically fluent recommendations for type 2 diabetes while failing to satisfy guideline constraints or explicitly justify lifestyle-related glycemic claims. We present T2D-Bench, a reproducible benchmark and evidence-gated evaluation framework for testing whether LLM outputs satisfy explicit, graph-checkable evidence requirements. T2D-Bench is built on a multi-layer clinical-lifestyle knowledge graph that
