The new system targets a common LLM error: misattributed or mismatched evidence, where a cited source does not actually back the claim attached to it. This problem is especially serious in scientific and medical research.
How It Works
DeepSciVerify combines abstract-level reasoning with selective 'escalation' to passage-level evidence. The workflow first checks the claim against the abstract. Only when that check is uncertain does the system retrieve and analyze the full text. The design exploits the complementary behavior of different LLMs under uncertainty: some models tend to be conservative, while others are more assertive.
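The abstract-first, escalate-on-uncertainty flow can be sketched in Python as below. This is a minimal illustration, not DeepSciVerify's implementation. The article does not say how uncertainty is detected, so the sketch assumes, purely for illustration, that a "conservative" and an "assertive" verifier both judge the abstract, and that escalation fires when either abstains or they disagree. All names (`verify_citation`, `exact_match`, `any_word`, the label strings) are hypothetical, and the two toy verifiers stand in for real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical label set: the article does not list DeepSciVerify's labels.
SUPPORTED = "SUPPORTED"
UNSUPPORTED = "UNSUPPORTED"
UNCERTAIN = "UNCERTAIN"

# A verifier takes (claim, evidence text) and returns one of the labels above.
Verifier = Callable[[str, str], str]


@dataclass
class Result:
    label: str
    escalated: bool  # True if full-text passages had to be consulted


def verify_citation(
    claim: str,
    abstract: str,
    fetch_passages: Callable[[], List[str]],
    conservative: Verifier,
    assertive: Verifier,
    passage_verifier: Verifier,
) -> Result:
    # Stage 1: both models judge the claim against the abstract only.
    a = conservative(claim, abstract)
    b = assertive(claim, abstract)
    if a == b and a != UNCERTAIN:
        # Agreement on a definite verdict: stop without touching the full text.
        return Result(a, escalated=False)

    # Stage 2 (assumed trigger: abstention or disagreement): fetch the full
    # text lazily and accept the claim if any passage supports it.
    for passage in fetch_passages():
        if passage_verifier(claim, passage) == SUPPORTED:
            return Result(SUPPORTED, escalated=True)
    return Result(UNSUPPORTED, escalated=True)


def abstract_resolution_rate(results: Sequence[Result]) -> float:
    # Share of claims settled without any full-text retrieval.
    return sum(not r.escalated for r in results) / len(results)


# Toy stand-ins for real LLM calls, for demonstration only.
def exact_match(claim: str, text: str) -> str:
    # "Conservative": commits only when the claim appears verbatim.
    return SUPPORTED if claim.lower() in text.lower() else UNCERTAIN


def any_word(claim: str, text: str) -> str:
    # "Assertive": commits to a verdict on any word overlap.
    words = claim.lower().split()
    return SUPPORTED if any(w in text.lower() for w in words) else UNSUPPORTED
```

With `exact_match` as the conservative model and `any_word` as the assertive one, a claim quoted verbatim in the abstract is settled at stage 1, while a claim whose abstract shares only a single word with it causes the two to disagree and triggers full-text escalation. Passing `fetch_passages` as a callable keeps retrieval lazy, so its cost is paid only on escalated cases; `abstract_resolution_rate` reports the kind of "resolved without full text" share the article quotes.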
Why It Matters
On the SCitance benchmark, DeepSciVerify achieved a Micro-F1 score of 86.7, outperforming abstract-only approaches by 4.5 points. Notably, the system resolved 67% of cases without full-text retrieval, so it gains accuracy while fetching full text for only about a third of claims. For Vietnamese users who rely on AI to synthesize scientific literature, this offers a practical way to filter out common citation 'hallucinations'.
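Micro-F1 pools true positives, false positives and false negatives across the counted labels, then computes a single F1 from the totals. The short Python sketch below shows that computation. The label names and toy data are hypothetical, and the article does not state SCitance's label set or whether any class is excluded from scoring. In a single-label task where every label is counted, micro-F1 equals accuracy, so which labels are counted matters.

```python
from typing import Iterable, Sequence


def micro_f1(gold: Sequence[str], pred: Sequence[str], labels: Iterable[str]) -> float:
    """Micro-averaged F1 over the given labels (single-label classification)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    counted = set(labels)
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p == g:
            # Correct prediction counts as a true positive if its label is scored.
            if g in counted:
                tp += 1
        else:
            # A wrong prediction is a false positive for the predicted label
            # and a false negative for the gold label (if those are scored).
            if p in counted:
                fp += 1
            if g in counted:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For gold labels `["S", "S", "U", "N"]` and predictions `["S", "U", "U", "N"]`, counting all three labels gives 3 true positives, 1 false positive and 1 false negative, so micro-F1 is 0.75; scoring only `S` and `U` gives 2/3.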