Towards Verifiable Agentic Data Science: Solving Irregular TSQA Via Tool-Grounded Reasoning | AIChainDay

🇰🇷 Korean summary by Claude · June 16, 2026

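This study points out a gap: real-world time series are irregular, with asynchronous observations, informative missingness, and sampling frequencies that vary from source to source, yet existing Time Series Question Answering (TSQA) benchmarks mostly assume regular sampling. To fill that gap, it introduces IRTS-ToolBench, which comprises 1,700 questions spanning 10 task types across 13 domains. The benchmark provides standardized inputs and a reproducible evaluation protocol, so researchers working on LLM-based irregular time series analysis can use it independently. It lays a foundation for evaluating verifiable agentic data science through tool-grounded reasoning.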

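•Points out that existing TSQA benchmarks assume regular sampling and therefore cannot handle irregular time series
•Introduces IRTS-ToolBench: 13 domains, 10 task types, 1,700 questions
•Provides standardized inputs and a reproducible evaluation protocol
•Lays a foundation for evaluating verifiable agentic data science through tool-grounded reasoning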

AI · June 16, 2026

Towards Verifiable Agentic Data Science: Solving Irregular TSQA Via Tool-Grounded Reasoning

Source: arXiv cs.AI

Article preview

arXiv:2606.15107v1 Announce Type: new Abstract: Time series data in real-world deployments is overwhelmingly irregular. Observations are asynchronous, missing values are informative rather than random, and sampling frequencies vary across sensors and operational windows. However, existing Time Series Question Answering (TSQA) benchmarks mostly assume regularly sampled inputs, leaving a fundamental gap in understanding how large language models (LLMs) and AI agents perform under irregular condit



Related posts

Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

Relational Structural Causal Models

Cognitive Debt: AI as Intellectual Leverage and the Dynamics of Systemic Fragility

A Definition of Good Explanations and the Challenges Explaining LLM Outputs