Tag

#Arxiv

15 English Kalera News articles tagged Arxiv — source-backed.

AI · tools-ai Jun 9, 2026

MPMMine: A New Benchmark Suite for Constraint Acquisition in Mathematical Programming

MPMMine is introduced to provide a standardized evaluation framework for algorithms that discover and validate mathematical programming (MP) models.

Sources arxiv.org

AI Jun 9, 2026

Error Control Solutions for LLMs in Virtual Laboratory Workflows

A new study proposes a framework to mitigate errors and uncertainty when LLMs are used to automate experimental procedures in virtual environments.

Sources arxiv.org

AI · tools-ai Jun 9, 2026

AI: Speeding Up Guardrails 12x via "Latent Reasoning"

The new COLAGUARD model addresses the safety-speed trade-off in guardrailing large language models. Instead of requiring explicit reasoning which causes high latency, COLAGUARD shifts the multi-step reasoning process into the latent space during inference. Results show that the model significantly improves F1 scores compared to Llama Guard 3, while being 12.9x faster and consuming 22.4x fewer tokens.

Sources arxiv.org

AI Jun 7, 2026

Discovery reveals LLMs 'capitulate' under user pressure 🧠

An arXiv study finds that LLMs readily abandon correct answers under user pressure; COLAGUARD is proposed as an effective safety guardrail.

Sources arxiv.org arxiv.org arxiv.org

AI Jun 7, 2026

New studies untangle reinforcement learning (RL) bottlenecks 🤖

Studies on arXiv propose solutions for sim-to-real transfer, off-policy optimization, and opponent behavior shaping in multi-agent environments.

Sources arxiv.org arxiv.org arxiv.org

AI · tools-ai Jun 5, 2026

AI Agent Evolution: From Brick Building to the Challenge of 'Aging'

A new series of studies on AI agents focuses on physical feasibility (BrickAnything) and on maintaining long-term system performance.

Sources arxiv.org arxiv.org arxiv.org

AI · tools-ai Jun 5, 2026

AI: Securing Autonomous Agents with Out-of-Band Data

Redpanda introduces the Agentic Data Plane (ADP), an architecture that utilizes out-of-band metadata channels to manage security for autonomous AI agents. Instead of relying on agents to handle access policies directly, ADP pushes security contexts and audit trails out of their control. This helps prevent risks from agent hallucinations or manipulation, ensuring compliance with data rights and execution policies even in complex tasks like financial portfolio management.

Sources arxiv.org

AI · tools-ai Jun 5, 2026

AI: LLM Agents Can Break the "Bottleneck" of Biological Phenotype Annotation

New research shows that LLM-based AI agents (Anthropic, OpenAI) can annotate biological phenotype data with accuracy comparable to that of human experts. Annotation has traditionally been a highly specialized and time-consuming process, creating a bottleneck in evolutionary biology research. Agents equipped with a self-contained workspace (research PDFs, annotation guidelines, ontologies) far outperformed traditional NLP tools.

Sources arxiv.org

AI Jun 1, 2026

UniScale: Jointly Optimizing Model Routing and Test-Time Scaling

UniScale is an online framework that unifies model routing and test-time scaling into a single optimization space, achieving a better balance between quality and cost.

Sources arxiv.org

AI Jun 1, 2026

Decoupling Updateability and Benefitability in Self-Evolving LLM Agents

An arXiv paper (2605.30621) indicates that an agent's ability to update its "harness" does not guarantee that it benefits from the update. Mid-tier models typically benefit the most from self-evolution.

Sources arxiv.org

AI Jun 1, 2026

Safe Reinforcement Learning for Autonomous Driving via Expert Advice

The study proposes an uncertainty-aware framework that guides exploration in reinforcement learning for autonomous vehicles, helping to avoid collisions during training.

Sources arxiv.org

AI Jun 1, 2026

AdaCoM: Adaptive Context Management for Long-Horizon AI Agent Tasks

AdaCoM trains an external LLM to manage context for a "frozen" agent, mitigating the degradation of reasoning capabilities in overextended contexts.

Sources arxiv.org

AI Jun 1, 2026

COMPASS: Process Alignment for Safe Search Agents

COMPASS uses MCTS for the safety alignment of search agents, detecting malicious intents disguised as seemingly harmless sub-queries.

Sources arxiv.org

AI May 28, 2026

Optimizing Multi-Turn Conversations with Calibrated Interactive RL

New research proposes the Calibrated Interactive RL framework to mitigate distribution shift and behavioral bias in conversational LLMs.

Sources arxiv.org

AI May 27, 2026

New Studies Reveal the True Cognitive Limits of LLMs

Several new studies on arXiv expose significant flaws in the self-awareness, mathematical reasoning, and logical thinking of large language models.

Sources arxiv.org arxiv.org arxiv.org