-
A Dataset and Evaluation for Complex 4D Markerless Human Motion Capture
Marker-based motion capture (MoCap) systems have long been the gold standard for accurate 4D human modeling, yet their reliance on specialized hardware and markers limits scalability and real-world deployment. Advancing reliable markerless 4D human motion capture requires datasets that reflect the complexity of real-world human interactions. Yet, existing benchmarks often lack realistic multi-pers…
-
ARGOS: Who, Where, and When in Agentic Multi-Camera Person Search
We introduce ARGOS, the first benchmark and framework that reformulates multi-camera person search as an interactive reasoning problem requiring an agent to plan, question, and eliminate candidates under information asymmetry. An ARGOS agent receives a vague witness statement and must decide what to ask, when to invoke spatial or temporal tools, and how to interpret ambiguous responses, all within…
-
GF-Score: Certified Class-Conditional Robustness Evaluation with Fairness Guarantees
Adversarial robustness is essential for deploying neural networks in safety-critical applications, yet standard evaluation methods either require expensive adversarial attacks or report only a single aggregate score that obscures how robustness is distributed across classes. We introduce the GF-Score (GREAT-Fairness Score), a framework that decomposes the certified GREAT Score into per-clas…
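The abstract is cut off before it defines the decomposition, so the following sketch only illustrates the general idea under an explicit assumption: that the aggregate score is a plain mean of per-sample certified scores. In that case the per-class means recover the aggregate as a count-weighted average, and any gap between classes becomes visible. `class_conditional_scores` and the max-minus-min `disparity` measure are hypothetical names and definitions, not the paper's.

```python
import numpy as np

def class_conditional_scores(sample_scores, labels, num_classes):
    """Hypothetical per-class decomposition: mean per-sample score within each class.

    Assumes the aggregate score is the plain mean of `sample_scores` and that
    every class appears at least once in `labels`.
    """
    scores = np.asarray(sample_scores, dtype=float)
    labels = np.asarray(labels)
    per_class = np.array([scores[labels == c].mean() for c in range(num_classes)])
    counts = np.bincount(labels, minlength=num_classes)
    return per_class, counts

def disparity(per_class):
    """Gap between the most and least robust classes (one possible fairness measure)."""
    return float(per_class.max() - per_class.min())

# Toy per-sample certified scores for a 3-class problem (illustrative numbers).
scores = [0.9, 0.7, 0.2, 0.4, 0.6, 0.8]
labels = [0, 0, 1, 1, 2, 2]
per_class, counts = class_conditional_scores(scores, labels, num_classes=3)
# The count-weighted mean of the per-class scores equals the single aggregate score.
aggregate = float(np.dot(per_class, counts) / counts.sum())
gap = disparity(per_class)
```

Here the aggregate alone (0.6) hides that class 1 scores well below the other two.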
-
Reliability-Guided Depth Fusion for Glare-Resilient Navigation Costmaps
Specular glare on reflective floors and glass surfaces frequently corrupts RGB-D depth measurements, producing holes and spikes that accumulate as persistent phantom obstacles in occupancy-grid costmaps. This paper proposes a glare-resilient costmap construction method based on explicit depth-reliability modeling. A lightweight Depth Reliability Map (DRM) estimator predicts per-pixel measurement t…
-
Scaling In-Context Segmentation with Hierarchical Supervision
In-context learning (ICL) enables medical image segmentation models to adapt to new anatomical structures from limited examples, reducing the clinical annotation burden. However, standard ICL methods typically rely on dense, global cross-attention, which scales poorly with image resolution. While recent approaches have introduced localized attention mechanisms, they often lack explicit supervision…
-
Generating Effective CoT Traces for Mitigating Causal Hallucination
Although large language models (LLMs) excel in complex reasoning tasks, they suffer from severe causal hallucination in event causality identification (ECI), particularly in smaller models ($\leq$1.5B parameters). A promising approach to address this issue is to fine-tune them with Chain-of-Thought (CoT) traces. However, no CoT trace dataset is currently available for ECI. In this pap…
-
Short Version of VERIFAI2026 Paper -- Learning Infused Formal Reasoning: Contract Synthesis, Artefact Reuse and Semantic Foundations
Artificial intelligence systems have achieved remarkable capability in natural language processing, perception and decision-making tasks. However, their behaviour often remains opaque and difficult to verify, limiting their applicability in safety-critical systems. Formal methods provide mathematically rigorous mechanisms for specifying and verifying system behaviour, yet the creation and maintena…
-
Stress Detection Using Wearable Physiological and Sociometric Sensors
Stress remains a significant social problem for individuals in modern societies. This paper presents a machine learning approach for the automatic detection of stress in people in social situations by combining two sensor systems that capture physiological and social responses. We compare the performance of different classifiers, including support vector machine, AdaBoost, and k-nearest neighbo…
-
Universal NER v2: Towards a Massively Multilingual Named Entity Recognition Benchmark
While multilingual language models promise to bring the benefits of LLMs to speakers of many languages, gold-standard evaluation benchmarks for interrogating these assumptions remain scarce in most languages. The Universal NER project, now entering its fourth year, is dedicated to building gold-standard multilingual Named Entity Recognition (NER) benchmark datasets. Inspired by existing massively mul…
-
Can AI Tools Transform Low-Demand Math Tasks? An Evaluation of Task Modification Capabilities
While recent research has explored AI tools' ability to classify the quality of mathematical tasks (arXiv:2603.03512), little is known about their capacity to increase the quality of existing tasks. This study investigated whether AI tools could successfully upgrade low-cognitive-demand mathematics tasks. Eleven tools were tested, including six broadly available, general-purpose AI tools (e.g., Ch…
-
Evaluating Differential Privacy Against Membership Inference in Federated Learning: Insights from the NIST Genomics Red Team Challenge
While Federated Learning (FL) mitigates direct data exposure, the resulting trained models remain susceptible to membership inference attacks (MIAs). This paper presents an empirical evaluation of Differential Privacy (DP) as a defense mechanism against MIAs in FL, leveraging the environment of the 2025 NIST Genomics Privacy-Preserving Federated Learning (PPFL) Red Teaming Event. To improve infere…
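A common way to apply DP in FL is to clip each client's model update and add Gaussian noise at aggregation (DP-FedAvg style). The abstract does not say which DP mechanism the paper evaluates, so this is a generic sketch; `clip_update`, `dp_aggregate`, `clip_norm`, and `noise_multiplier` are hypothetical names. Clipping bounds any single client's contribution to the sum by `clip_norm`, which is why the noise standard deviation is `noise_multiplier * clip_norm`; larger noise lowers MIA success at the cost of utility.

```python
import numpy as np

def clip_update(update, clip_norm):
    """Rescale `update` so its L2 norm is at most `clip_norm`."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def dp_aggregate(client_updates, clip_norm, noise_multiplier, rng):
    """Average clipped client updates after adding Gaussian noise to their sum.

    Per-client clipping gives the sum an L2 sensitivity of `clip_norm`, so the
    noise std is scaled by it (Gaussian mechanism).
    """
    clipped = [clip_update(np.asarray(u, dtype=float), clip_norm) for u in client_updates]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(client_updates)

# Five clients with deliberately large raw updates (illustrative only).
rng = np.random.default_rng(0)
updates = [10.0 * rng.normal(size=4) for _ in range(5)]
noisy_avg = dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Translating `noise_multiplier` into an (epsilon, delta) guarantee requires a privacy accountant, which this sketch omits.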
-
Token-Level Policy Optimization: Linking Group-Level Rewards to Token-Level Aggregation via Sequence-Level Likelihood
Group Relative Policy Optimization (GRPO) has significantly advanced the reasoning ability of large language models (LLMs), particularly their mathematical reasoning performance. However, GRPO and related entropy regularization methods still struggle with sparse token-level rewards, an inherent challenge in chain-of-thought (CoT) reasoning. These approaches often rely on undifferent…
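For context, the group-relative advantage in standard GRPO z-scores each sampled completion's scalar reward against the other completions drawn for the same prompt. With outcome-only rewards, every token of a completion then shares that single value, which is plausibly the undifferentiated credit the truncated sentence refers to. This NumPy sketch shows that baseline only, not the paper's token-level method. The function names are illustrative, and using the population standard deviation is an assumption, since implementations differ.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: z-score each reward within its group of G completions."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)  # population std; implementations vary

def token_advantages(advantages, lengths):
    """Outcome-only credit: every token of completion i gets the same value A_i."""
    return [np.full(n, a) for a, n in zip(advantages, lengths)]

# One prompt, G = 4 sampled completions with binary correctness rewards.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
per_token = token_advantages(adv, lengths=[3, 2, 4, 1])
```

If all rewards in a group are equal, every advantage is zero (the `eps` term avoids division by zero), so that group contributes no learning signal at all.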
-
AffectAgent: Collaborative Multi-Agent Reasoning for Retrieval-Augmented Multimodal Emotion Recognition
LLM-based multimodal emotion recognition relies on static parametric memory and often hallucinates when interpreting nuanced affective states. In this paper, given that single-round retrieval-augmented generation is highly susceptible to modal ambiguity and therefore struggles to capture complex affective dependencies across modalities, we introduce AffectAgent, an affect-oriented multi-agent retr…
-
Transformer Based Machine Fault Detection From Audio Input
In recent years, Sound AI has increasingly been used to predict machine failures. By attaching a microphone to the machine of interest, one can obtain real-time data on machine behavior in the field. Traditionally, Convolutional Neural Network (CNN) architectures have been used to analyze spectrogram images generated from the captured sounds and predict whether the machine is functioning as expected. CNN ar…
-
On Higher-Order Geometric Refinements of Classical Covariance Asymptotics: An Approach via Intrinsic and Extrinsic Information Geometry
Classical Fisher-information asymptotics describe the covariance of regular efficient estimators through the local quadratic approximation of the log-likelihood, and thus capture first-order geometry only. In curved models, including mixtures, curved exponential families, latent-variable models, and manifold-constrained parameter spaces, finite-sample behavior can deviate systematically from these…
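The first-order picture referred to here is the standard asymptotic normality of regular efficient estimators, for example the maximum-likelihood estimator under the usual regularity conditions. Here the data are i.i.d. and $I(\theta)$ is the per-observation Fisher information:

$$
I(\theta) = -\,\mathbb{E}_\theta\!\left[\nabla_\theta^2 \log p(X;\theta)\right],
\qquad
\sqrt{n}\,\bigl(\hat\theta_n - \theta_0\bigr) \xrightarrow{d} \mathcal{N}\!\bigl(0,\; I(\theta_0)^{-1}\bigr),
\qquad
\operatorname{Cov}\bigl(\hat\theta_n\bigr) \approx \tfrac{1}{n}\, I(\theta_0)^{-1}.
$$

At this order the covariance depends on the model only through the Fisher information at $\theta_0$, that is, through the local curvature of the expected log-likelihood. Curvature of the model itself enters only through higher-order corrections.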
-
InsightFlow: LLM-Driven Synthesis of Patient Narratives for Mental Health into Causal Models
Clinical case formulation organizes patient symptoms and psychosocial factors into causal models, often using the 5P framework. However, constructing such graphs from therapy transcripts is time-consuming and varies across clinicians. We present InsightFlow, an LLM-based approach that automatically generates 5P-aligned causal graphs from patient-therapist dialogues. Using 46 psychotherapy intake t…
-
Stability and Geometry of Attractors in Neural Cellular Automata
Throughout the literature on Neural Cellular Automata (NCAs), it is often taken for granted that the systems learn attractors. This is usually shown by evolving the system for many timesteps and noting its visual similarity to the goal state. Such an analysis leaves many questions open: what kind of attractors do we have? Is their behavior ordered or chaotic? Can we estimate stability over…
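One standard way to ask whether an attractor is ordered or chaotic is to estimate the largest Lyapunov exponent. To do this, follow a trajectory and a slightly perturbed copy, measure how fast they separate, and renormalize the gap after every step. The following is a generic Benettin-style sketch, not necessarily the paper's analysis. `step` stands in for a frozen NCA update rule, and the two toy maps in the usage lines are illustrative stand-ins with known exponents.

```python
import numpy as np

def largest_lyapunov(step, x0, n_steps=20000, n_transient=1000, d0=1e-8, seed=0):
    """Two-trajectory estimate of the largest Lyapunov exponent of the map `step`.

    `step` is any deterministic update x_{t+1} = step(x_t) on a NumPy array.
    A positive result means nearby states diverge (chaotic); a negative result
    means they converge (stable attractor).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_transient):            # let the state settle onto the attractor
        x = step(x)
    d = rng.normal(size=x.shape)
    y = x + d0 * d / np.linalg.norm(d)      # perturbed copy at distance d0
    log_growth = 0.0
    for _ in range(n_steps):
        x, y = step(x), step(y)
        dist = np.linalg.norm(y - x)
        log_growth += np.log(dist / d0)     # one-step stretching factor
        y = x + d0 * (y - x) / dist         # renormalize the gap back to d0
    return log_growth / n_steps

# Chaotic toy map: logistic map at r = 4, whose exponent is ln 2 (about 0.693).
lam_chaotic = largest_lyapunov(lambda v: 4.0 * v * (1.0 - v), np.array([0.3]))
# Ordered toy map: linear contraction, whose largest exponent is ln 0.5.
A = np.diag([0.5, 0.25])
lam_stable = largest_lyapunov(lambda v: A @ v, np.array([1.0, 1.0]))
```

For a real NCA, `x` would be the flattened cell-state grid and `step` one application of the trained update with any stochastic cell-update mask fixed or disabled.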
-
Monte Carlo Stochastic Depth for Uncertainty Estimation in Deep Learning
The deployment of deep neural networks in safety-critical systems necessitates reliable and efficient uncertainty quantification (UQ). A practical and widespread strategy for UQ is repurposing stochastic regularizers as scalable approximate Bayesian inference methods, such as Monte Carlo Dropout (MCD) and MC-DropBlock (MCDB). However, this paradigm remains under-explored for Stochastic Depth (SD),…
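Monte Carlo Stochastic Depth follows the same recipe as MC Dropout: keep the regularizer's randomness switched on at inference, run several stochastic forward passes, average the predictions, and read uncertainty from the result. The following self-contained NumPy sketch uses a toy residual MLP. The architecture, survival probability, inverted scaling of kept branches, one coin flip per block shared across the batch, and predictive-entropy summary are all illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class ToyResNet:
    """Tiny residual MLP whose residual branches can be randomly skipped."""

    def __init__(self, dim, n_classes, n_blocks, survival=0.8, seed=0):
        rng = np.random.default_rng(seed)
        self.blocks = [rng.normal(scale=0.3, size=(dim, dim)) for _ in range(n_blocks)]
        self.head = rng.normal(size=(dim, n_classes))
        self.survival = survival

    def forward(self, x, rng=None):
        """Deterministic if `rng` is None; otherwise drop each block with prob 1 - survival."""
        h = x
        for W in self.blocks:
            branch = np.maximum(h @ W, 0.0)                # ReLU residual branch
            if rng is not None:
                keep = rng.random() < self.survival        # one coin flip per block per pass
                branch = branch * keep / self.survival     # inverted scaling keeps the expectation
            h = h + branch                                 # identity path is always kept
        return softmax(h @ self.head)

def mc_stochastic_depth(model, x, n_samples=50, seed=0):
    """Average class probabilities over stochastic passes; return (mean probs, predictive entropy)."""
    rng = np.random.default_rng(seed)
    probs = np.stack([model.forward(x, rng) for _ in range(n_samples)])
    mean = probs.mean(axis=0)
    entropy = -(mean * np.log(mean + 1e-12)).sum(axis=-1)
    return mean, entropy

model = ToyResNet(dim=8, n_classes=3, n_blocks=6)
x = np.random.default_rng(1).normal(size=(4, 8))
mean_probs, entropy = mc_stochastic_depth(model, x, n_samples=30)
```

Higher entropy flags inputs on which the sampled sub-networks disagree; because no extra parameters are added, the only cost is the number of passes.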
-
Transferable Expertise for Autonomous Agents via Real-World Case-Based Learning
LLM-based autonomous agents perform well on general reasoning tasks but still struggle to reliably use task structure, key constraints, and prior experience in complex real-world settings. We propose a case-based learning framework that converts experience from past tasks into reusable knowledge assets, allowing agents to transfer prior case experience to new tasks and perform more structured anal…
-
Illuminating the Local Universe: Large-Scale Structure from ZTF Type Ia Supernovae
Within the volume-limited subsample at $z<0.06$ of the Zwicky Transient Facility (ZTF) DR2 sample, we confirm a statistically significant excess of Type Ia supernovae (SNe Ia) at $z \simeq 0.02$-$0.04$, previously reported but not explained by survey selection effects. Forward simulations assuming a uniform volumetric SN Ia rate and realistic ZTF detection efficiencies fail to reproduce the featur…
-
Information-Theoretic Optimization for Task-Adapted Compressed Sensing Magnetic Resonance Imaging
Task-adapted compressed sensing magnetic resonance imaging (CS-MRI) is emerging to address the specific demands of downstream clinical tasks with significantly fewer k-space measurements than required by Nyquist sampling. However, existing task-adapted CS-MRI methods suffer from the uncertainty problem for medical diagnosis and cannot achieve adaptive sampling in end-to-end optimization with recon…
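For readers new to CS-MRI, the following sketch shows the basic acquisition model that task-adapted methods build on. A binary mask selects a subset of phase-encode lines in k-space, and the zero-filled inverse FFT gives the aliased image that a reconstruction or task network would start from. For illustration, the sketch assumes a fixed random Cartesian mask with a fully sampled center. Learning or adapting the mask end to end, as the abstract discusses, is not shown. `cartesian_mask` and `zero_filled_recon` are hypothetical helpers.

```python
import numpy as np

def cartesian_mask(n_lines, acceleration, n_center, seed=0):
    """Random 1-D Cartesian undersampling mask over phase-encode lines.

    Keeps `n_center` low-frequency lines around the center of (shifted) k-space
    plus random other lines, for about n_lines // acceleration lines in total.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros(n_lines, dtype=bool)
    start = n_lines // 2 - n_center // 2
    mask[start:start + n_center] = True               # fully sampled center
    extra = max(n_lines // acceleration - n_center, 0)
    remaining = np.flatnonzero(~mask)
    mask[rng.choice(remaining, size=extra, replace=False)] = True
    return mask

def zero_filled_recon(image, mask):
    """Simulate undersampled acquisition and return the zero-filled magnitude image."""
    kspace = np.fft.fftshift(np.fft.fft2(image))       # centered k-space
    kspace_us = kspace * mask[:, None]                 # zero out unsampled phase-encode rows
    return np.abs(np.fft.ifft2(np.fft.ifftshift(kspace_us)))

rng = np.random.default_rng(1)
image = rng.random((64, 64))
mask = cartesian_mask(64, acceleration=4, n_center=8)
recon = zero_filled_recon(image, mask)
print(mask.mean())  # fraction of lines acquired: 0.25
```

With 4x acceleration, only a quarter of the lines are measured. The missing lines appear as aliasing artifacts in `recon`, which is what the downstream reconstruction must resolve.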
-
The undetectable fraction of core-collapse supernovae in luminous infrared galaxies -- II. GSAOI/GeMS dataset
Core-collapse supernovae (CCSNe) in luminous infrared galaxies (LIRGs) can suffer extreme line-of-sight host-galaxy dust extinction, so a large fraction of these events remain undetected by optical and infrared surveys. Constraining this undetected population is important for determining cosmic CCSN rates. Our aim is to confirm and refine our estimates for the undete…
-
MISID: A Multimodal Multi-turn Dataset for Complex Intent Recognition in Strategic Deception Games
Understanding human intent in complex multi-turn interactions remains a fundamental challenge in human-computer interaction and behavioral analysis. While existing intent recognition datasets focus mainly on single utterances or simple dialogues, real-world scenarios often involve sophisticated strategic interactions where participants must maintain complex deceptive narratives over extended perio…
-
Risk-Calibrated Learning: Minimizing Fatal Errors in Medical AI
Deep learning models often achieve expert-level accuracy in medical image classification but suffer from a critical flaw: semantic incoherence. These high-confidence, semantically incoherent mistakes (e.g., classifying a malignant tumor as benign) differ fundamentally from acceptable errors that stem from visual ambiguity. Unlike safe, fine-grained disagreements, these fatal failures erod…
-
Graviton Production from Inflaton Condensate: Boltzmann vs Bogoliubov
We study graviton production from an oscillating inflaton condensate during reheating by systematically comparing Boltzmann and Bogoliubov descriptions for inflaton potentials of the form $V(\phi)\propto\phi^n$ around the minimum. The Bogoliubov framework provides a unified description of graviton production, capturing both perturbative and non-perturbative effects across short and long wavelengths, whe…