Archon

Browse and search harvested arxiv metadata.

1273993 results (page 115 of 50960)

Linear equivalence of nonlinear recurrent neural networks

2604.23489 cond-mat.dis-nn 2026-04-26 PDF (arxiv)

David G. Clark

Large nonlinear recurrent neural networks with random couplings generate high-dimensional, potentially chaotic activity whose structure is of interest in neuroscience, machine learning, ecology, and other fields. A fundamental object encoding the collective structure of this activity is the $N \times N$ covariance matrix. Prior analytical work on the covariance matrix has been limited to low-dimen…

Do Synthetic Trajectories Reflect Real Reward Hacking? A Systematic Study on Monitoring In-the-Wild Hacking in Code Generation

2604.23488 cs.LG 2026-04-26 PDF (arxiv)

Lichen Li, Hengguang Zhou, Yijun Liang, Tianyi Zhou, Cho-Jui Hsieh

Reward hacking in code generation, where models exploit evaluation loopholes to obtain full reward without correctly solving the tasks, poses a critical challenge for Reinforcement Learning (RL) and the deployment of reasoning models. Existing studies have been conducted primarily on synthetic hacking trajectories. However, whether these synthetic behaviors faithfully represent naturally emerging …

Optimality Conditions and Numerical Algorithms for a Class of Minimax Bilevel Optimization Problems

2604.23487 math.OC 2026-04-26 PDF (arxiv)

Yaling Hu, Jiani Wang, Yu-hong Dai, Xiaojiao Tong

In many applications, including Stackelberg games, machine learning, and power systems \cite{Mackay2018Selftuning,Heinrich1952The,Wang2021Bi-Level}, the decisions in a minimax optimization problem can be constrained by a solution to an optimization problem. In this paper, we introduce optimality conditions of this novel minimax bilevel optimization problem and develop efficient first-order algori…

Your Students Don't Use LLMs Like You Wish They Did

2604.23486 cs.CL 2026-04-26 PDF (arxiv)

Sebastian Kobler, Matthew Clemson, Angela Sun, Jonathan K. Kummerfeld

Educational NLP systems are typically evaluated using engagement metrics and satisfaction surveys, which are at best a proxy for meeting pedagogical goals. We introduce six computational metrics for automated evaluation of pedagogical alignment in student-AI dialogue. We validate our metrics through analysis of 12,650 messages across 500 conversations from four courses. Using our metrics, we ident…

Agentic Adversarial Rewriting Exposes Architectural Vulnerabilities in Black-Box NLP Pipelines

2604.23483 cs.AI 2026-04-26 PDF (arxiv)

Mazal Bethany, Kim-Kwang Raymond Choo, Nishant Vishwamitra, Peyman Najafirad

Multi-component natural language processing (NLP) pipelines are increasingly deployed for high-stakes decisions, yet no existing adversarial method can test their robustness under realistic conditions: binary-only feedback, no gradient access, and strict query budgets. We formalize this strict black-box threat model and propose a two-agent evasion framework operating in a semantic perturbation spa…

Leveraging Spatial Transcriptomics as Alternative to Manual Annotations for Deep Learning-Based Nuclei Analysis

2604.23481 cs.CV 2026-04-26 PDF (arxiv)

Kazuya Nishimura, Ryoma Bise, Haruka Hirose, Yasuhiro Kojima

Deep learning-based nuclei segmentation and classification in pathology images typically rely on large-scale pixel-level manual annotations, which are costly and difficult to obtain across diverse tissues and staining conditions. To address this limitation, we propose a framework that leverages spatial transcriptomics (ST) data as supervision for nuclei segmentation and classification. By incorpor…

Resource-Constrained Shortest Path with Polytopic Reset Sets

2604.23480 eess.SY 2026-04-26 PDF (arxiv)

Khaled Surur, Melkior Ornik

This paper investigates the problem of computing the shortest path between two states under resource constraints in environments with resource-replenishment regions. Namely, the length of the path is limited by a budget that can be restored within polytopic replenishment regions. We show that the optimal path in this problem exhibits a distinct geometric structure: it consists of straight-line seg…

JudgeSense: A Benchmark for Prompt Sensitivity in LLM-as-a-Judge Systems

2604.23478 cs.CL 2026-04-26 PDF (arxiv)

Rohith Reddy Bellibatlu

Large language models are increasingly deployed as automated judges for evaluating other models, yet the stability of their verdicts under semantically equivalent prompt paraphrases remains unmeasured. We introduce JudgeSense, a framework and benchmark for quantifying this property via the Judge Sensitivity Score (JSS), defined as the fraction of paraphrase pairs on which a judge returns an identi…

SEMA-SQL: Beyond Traditional Relational Querying with Large Language Models

2604.23477 cs.DB 2026-04-26 PDF (arxiv)

Yin Lin, Tianjing Zeng, Zhongjun Ding, Rong Zhu, Bolin Ding, H. V. Jagadish, Jingren Zhou

Relational databases excel at structured data analysis, but real-world queries increasingly require capabilities beyond standard SQL, such as semantically matching entities across inconsistent names, extracting information not explicitly stored in schemas, and analyzing unstructured text. While text-to-SQL systems enable natural language querying, they remain limited to relational operations and c…

Supernodes and Halos: Loss-Critical Hubs in LLM Feed-Forward Layers

2604.23475 cs.LG 2026-04-26 PDF (arxiv)

Audrey Cherilyn, Houman Safaai

We study the organization of channel-level importance in transformer feed-forward networks (FFNs). Using a Fisher-style loss proxy (LP) based on activation-gradient second moments, we show that loss sensitivity is concentrated in a small set of channels within each layer. In Llama-3.1-8B, the top 1% of channels per layer accounts for a median of 58.7% of LP mass, with a range of 33.0% to 86.1%. We…

GeoCert: Certified Geometric AI for Reliable Forecasting

2604.23474 cs.LG 2026-04-25 PDF (arxiv)

Regina Zhang, Zongru Li, Honggang Wen, Xiaofeng Liu, Siu-Ming Yiu, Pietro Liò, Kwok-Yan Lam

Forecasting systems in science must be accurate, physically consistent, and certifiably reliable. Most existing models address prediction, constraint enforcement, and verification separately, limiting scalability and interpretability. We introduce GeoCert, a geometric AI framework that unifies forecasting, physical reasoning, and formal verification within a single differentiable computation. GeoC…

Escher-Loop: Mutual Evolution by Closed-Loop Self-Referential Optimization

2604.23472 cs.AI 2026-04-25 PDF (arxiv)

Ziyang Liu, Xinyan Guo, Xuchen Wei, Han Hao, Liu Yang

While recent autonomous agents demonstrate impressive capabilities, they predominantly rely on manually scripted workflows and handcrafted heuristics, inherently limiting their potential for open-ended improvement. To address this, we propose Escher-Loop, a fully closed-loop framework that operationalizes the mutual evolution of two distinct populations: Task Agents that solve concrete problems, a…

Can Humans Detect AI? Mining Textual Signals of AI-Assisted Writing Under Varying Scrutiny Conditions

2604.23471 cs.HC 2026-04-25 PDF (arxiv)

Daniel Tabach

This study asks whether the threat of AI detection changes how people write with AI, and whether other people can tell the difference. In a two-phase controlled experiment, 21 participants wrote opinion pieces on remote work using an AI chatbot. Half were randomly warned that their submission would be scanned by an AI detection tool. The other half received no warning. Both groups had access to th…

Estimation of MIDAS Regressions with Errors-in-the-Variables

2604.23469 stat.ME 2026-04-25 PDF (arxiv)

Sukhbir Kaur, Sukhbir Singh, Kanchan Jain, Pooja Soni

In this paper, a Mixed Data Sampling (MIDAS) model is studied when both low and high frequency variables are contaminated with measurement error. It is shown that the profile likelihood estimator becomes inconsistent in the presence of measurement error. Using the corrected score approach along with profile likelihood approach, a consistent estimator for parameters of MIDAS Measurement Error model…

A Milestone in Formalization: The Sphere Packing Problem in Dimension 8

2604.23468 math.MG 2026-04-25 PDF (arxiv)

Sidharth Hariharan, Christopher Birkbeck, Seewoo Lee, Ho Kiu Gareth Ma, Bhavik Mehta, Auguste Poiroux, Maryna Viazovska

In 2016, Viazovska famously solved the sphere packing problem in dimension $8$, using modular forms to construct a 'magic' function satisfying optimality conditions determined by Cohn and Elkies in 2003. In March 2024, Hariharan and Viazovska launched a project to formalize this solution and related mathematical facts in the Lean Theorem Prover. A significant milestone was achieved in February 202…

Hybrid JIT-CUDA Graph Optimization for Low-Latency Large Language Model Inference

2604.23467 cs.LG 2026-04-25 PDF (arxiv)

Divakar Kumar Yadav, Tian Zhao

Large Language Models (LLMs) have achieved strong performance across natural language and multimodal tasks, yet their practical deployment remains constrained by inference latency and kernel launch overhead, particularly in interactive, short-sequence settings. This paper presents a hybrid runtime framework that combines Just-In-Time (JIT) compilation with CUDA Graph execution to reduce launch ove…

V.O.I.C.E (Voice, Ownership, Identity, Control, Expression): Risk Taxonomy of Synthetic Voice Generation From Empirical Data

2604.24794 cs.CR 2026-04-25 PDF (arxiv)

Tanusree Sharma, Anish Krishnagiri, Lili Dudas, Ahmed Adnan, Visar Berisha

As generative voice models are rapidly advancing in both capabilities and public utilization, the unconsented collection, reuse, and synthesis of voice data are introducing new classes of privacy, security and governance risk that are poorly captured by existing, largely uniform threat models. To fill the gap, we present V.O.I.C.E, a taxonomy of voice generation risk grounded in a multi-source thr…

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

2604.23466 cs.LG 2026-04-25 PDF (arxiv)

Divakar Kumar Yadav, Tian Zhao, Deepak Kumar

NVIDIA's CUDA Tile (CuTile) introduces a Python-based, tile-centric abstraction for GPU kernel development that aims to simplify programming while retaining Tensor Core and Tensor Memory Accelerator (TMA) efficiency on modern GPUs. We present the first independent, cross-architecture evaluation of CuTile against established approaches such as cuBLAS, Triton, WMMA, and raw SIMT on three NVIDIA GPUs…

Machine learning models for estimating counterfactuals in a single-arm inflammatory bowel disease study

2604.23465 cs.LG 2026-04-25 PDF (arxiv)

Dan Liu, Fida K. Dankar, Jennifer C. deBruyn, Amanda Ricciuto, Anne M. Griffiths, Thomas D. Walters, Khaled El Emam

Single-arm trials accelerate study timelines by reducing the number of patients that must be recruited for a concurrent control group. However, these designs require an alternative comparator to estimate treatment effects. One approach is to construct a virtual control arm using a machine learning (ML) model trained on external control data to predict the counterfactual outcomes of the treatment a…

On cross-validation for small area estimators

2604.23464 stat.ME 2026-04-25 PDF (arxiv)

Qianyu Dong, Zehang Richard Li

Subnational monitoring of public health often relies on household surveys where data are sparse at the desired spatial resolution. Small area estimation (SAE) methods address this challenge by borrowing strength across areas and incorporating auxiliary information. However, comparing these estimators remains difficult in the absence of ground truth. We propose a cross-validation framework for eval…

A theory of ROC analysis of rule-out and rule-in diagnostics with applications to mammography data

2604.23463 stat.ME 2026-04-25 PDF (arxiv)

Michelle Mastrianni, Kwok Lung Fan, Yee Lam Elim Thompson, Jessie J. J. Gommers, Ioannis Sechopoulos, Fredrik Strand, Weijie Chen, Gary Levine, Mukul Sherekar, Frank W. Samuelson

Multiple diagnostic tests are frequently used to determine the presence of a disease condition in patients. In this paper, we use bivariate copulas to examine the properties of receiver operating characteristic (ROC) curves formed when two correlated diagnostic tests are used together to rule-out ("believe the negative") and rule-in ("believe the positive") patients for disease. We use this theory…

Scaling limit of Sinkhorn-rescaled Random Matrices via Stability of Static Schrödinger Bridges

2604.23461 math.PR 2026-04-25 PDF (arxiv)

Danny Duan, Hanbaek Lyu, William Powell

We analyze the asymptotic behavior and scaling limits of large random matrices rescaled via the Sinkhorn algorithm to match prescribed row and column margins. For a random matrix with independent sub-exponential entries, we show that its Sinkhorn rescaling concentrates around the rescaling of its mean matrix, both at the level of the Schrödinger potentials and as random measures on the unit square…

Ulterior Motives: Detecting Misaligned Reasoning in Continuous Thought Models

2604.23460 cs.AI 2026-04-25 PDF (arxiv)

Sharan Ramjee

Chain-of-Thought (CoT) reasoning has emerged as a key technique for eliciting complex reasoning in Large Language Models (LLMs). Although interpretable, its dependence on natural language limits the model's expressive bandwidth. Continuous thought models address this bottleneck by reasoning in latent space rather than human-readable tokens. While they enable richer representations and faster infer…

Architecture Matters for Multi-Agent Security

2604.23459 cs.MA 2026-04-25 PDF (arxiv)

Ben Hagag, William L. Anderson, Christian Schroeder de Witt, Sarah Scheffler

Multi-agent systems (MAS), composed of networks of two or more autonomous AI agents, have become increasingly popular in production deployments, yet introduce security risks that do not arise in single-agent settings. Even if individual agents exhibit robust security, architectural decisions governing their coordination can create attack surfaces that have not been systematically characterized. In…

A Benchmark Suite of Reddit-Derived Datasets for Mental Health Detection

2604.23458 cs.CL 2026-04-25 PDF (arxiv)

Khalid Hasan, Jamil Saquer

The growing availability of online support groups has opened up new windows to study mental health through natural language processing (NLP). However, it is hindered by a lack of high-quality, well-validated datasets. Existing studies have a tendency to build task-specific corpora without collecting them into widely available resources, and this makes reproducibility as well as cross-task comparis…
