Archon

Browse and search harvested arxiv metadata.

1273993 results (page 106 of 50960)

Resource-Lean Lexicon Induction for German Dialects

2604.23824 cs.CL 2026-04-26 PDF (arxiv)

Robert Litschko, Barbara Plank, Diego Frassinelli

Automatic induction of high-quality dictionaries is essential for building lexical resources, yet low-resource languages and dialects pose several challenges: limited access to annotators, high degree of spelling variations, and poor performance of large language models (LLMs). We empirically show that statistical models (random forests) trained on string similarity features are surprisingly effec…

A GLIMPSE of the 99%: a census of the faintest galaxies during the epoch of reionization and its implications for galaxy formation models

2604.23823 astro-ph.GA 2026-04-26 PDF (arxiv)

Hakim Atek, Iryna Chemerynska, Lukas J. Furtak, Johan Richard, John Chisholm, Vasily Kokorev, Michelle Jecmen, Damien Korber, Ryan Endsley, Richard Pan, Arghyadeep Basu, Jeremy Blaizot, Rychard Bouwens, Meriam Ezziati, Sylvain Heurtier, Kristen B. W. McQuinn, Marcie Mun, Julian B. Munoz, Pascal Oesch, Joakim Rosdahl, Alberto Saldana-Lopez, Seiji Fujimoto

We present a comprehensive study of the galaxy UV luminosity function (UVLF) at $z=6-9$ leveraging deep JWST observations from the GLIMPSE survey. Thanks to gravitational lensing, we probe the UVLF to an unprecedented depth of $M_{\text{UV}} = -12$ mag, approximately three magnitudes deeper than previous robust constraints. Our UVLF determination incorporates a rigorous end-to-end uncertainty fram…

KISS Sorcar: A Stupidly-Simple General-Purpose and Software Engineering AI Assistant

2604.23822 cs.SE 2026-04-26 PDF (arxiv)

Koushik Sen

Large language models can generate code and call tools with remarkable fluency, yet deploying them as practical software engineering assistants still exposes stubborn gaps: finite context windows, single mistakes that derail entire sessions, agents that get stuck in dead ends, AI slop, and generated changes that are difficult to review or revert. We present KISS Sorcar, a general-purpose assistan…

Accelerating Quantum Materials Characterization: Hybrid Active Learning for Autonomous Spin Wave Spectroscopy

2604.23821 cond-mat.mtrl-sci 2026-04-26 PDF (arxiv)

William Ratcliff

Autonomous neutron spectroscopy must solve three distinct tasks: detection (where is the signal?), inference (which Hamiltonian governs it?), and refinement (what are the parameters?). No single controller solves all three equally well. We present TAS-AI, a hybrid agnostic-to-physics-informed framework for autonomous triple-axis spin-wave spectroscopy that separates these tasks explicitly. In blin…

On the Generalization Properties of Selective State-Space Models for Filtering Tasks for Unknown Systems

2604.23818 eess.SY 2026-04-26 PDF (arxiv)

Alex Tang, M. Emrullah Ildiz, Batin Kurt, Samet Oymak, Necmiye Ozay

Selective State-Space Models (SSMs) such as Mamba have emerged as an alternative architecture to self-attention based transformers in sequence modeling tasks. Recent works have demonstrated the use of transformers in some filtering and output prediction tasks via in-context learning. In this paper, we analyze whether structured SSMs can work equally well for filtering of unknown systems. In partic…

FUTURAL: A Metasearch Platform for Empowering Rural Areas with Smart Solutions

2604.23817 cs.IR 2026-04-26 PDF (arxiv)

Matei Popovici, Ciprian Dobre

The FUTURAL project aims to provide a comprehensive suite of digital Smart Solutions (SS) across five critical domains to address pressing social and environmental issues. Central to this initiative is a robust Metasearch platform, which will not only serve as the primary access point to FUTURAL's solutions but also facilitate the search and retrieval of SS developed by other initiatives. This pap…

Query2Diagram: Answering Developer Queries with UML Diagrams

2604.23816 cs.SE 2026-04-26 PDF (arxiv)

Oleg Baryshnikov, Anton M. Alekseev, Sergey I. Nikolenko

Software documentation frequently becomes outdated or fails to exist entirely, yet developers need focused views of their codebase to understand complex systems. While automated reverse engineering tools can generate UML diagrams from code, they produce overwhelming detail without considering developer intent. We introduce query-driven UML diagram generation, where LLMs create diagrams that direct…

DRACULA: Hunting for the Actions Users Want Deep Research Agents to Execute

2604.23815 cs.CL 2026-04-26 PDF (arxiv)

Nishant Balepur, Malachi Hamada, Varsha Kishore, Sergey Feldman, Amanpreet Singh, Pao Siangliulue, Joseph Chee Chang, Rachel Rudinger, Eunsol Choi, Jordan Lee Boyd-Graber, Doug Downey, Aakanksha Naik

Scientific Deep Research (DR) agents answer user queries by synthesizing research papers into multi-section reports. User feedback can improve their utility, but existing protocols only score the final report, making it hard to study and learn which intermediate actions DR agents should take to improve reports. We collect DRACULA, the first dataset with user feedback on intermediate actions for DR…

Mapping License Plate Recoverability Under Extreme Viewing Angles for Opportunistic Urban Sensing

2604.23814 cs.CV 2026-04-26 PDF (arxiv)

Igor Adamenko, Orpaz Ben Aharon, Yehudit Aperstein, Alexander Apartsin

Urban environments contain many imaging sensors built for specific purposes, including ATM, body-worn, CCTV, and dashboard cameras. Under the opportunistic sensing paradigm, these sensors can be repurposed for secondary inference tasks such as license plate recognition. Yet objects of interest in such imagery are often noisy, low-resolution, and captured from extreme viewpoints. Recent advances in…

ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction

2604.23813 cs.CV 2026-04-26 PDF (arxiv)

Zichun Guo, Yuling Shi, Wenhao Zeng, Chao Hu, Haotian Lin, Terry Yue Zhuo, Jiawei Chen, Xiaodong Gu, Wenping Ma

Multimodal Large Language Models (MLLMs) have achieved remarkable performance in Visually Rich Document Understanding (VRDU) tasks, but their capabilities are mainly evaluated on pristine, well-structured document images. We consider content restoration from shredded fragments, a challenging VRDU setting that requires integrating visual pattern recognition with semantic reasoning under significant…

SeqShield: A Behavioral Analysis Approach to Uncover Rootkits

2604.23812 cs.CR 2026-04-26 PDF (arxiv)

Paras Ghodeshwar, Sandeep K Shukla, Anand Handa, Nitesh Kumar

Rootkits are among the most elusive types of malware, capable of bypassing traditional static analysis methods due to their metamorphic behavior. Signature-based detection techniques struggle against these threats, necessitating a shift toward dynamic analysis approaches. We propose SeqShield, a behavior-based rootkit detection approach designed specifically for the Windows OS, leveraging API call…

Similar Users-Augmented Interest Network

2604.23810 cs.IR 2026-04-26 PDF (arxiv)

Xiaolong Chen, Haoyi Zhao, Xu Huang, Defu Lian

Click-through rate (CTR) prediction is one of the core tasks in recommender systems. User behavior sequences, as one of the most effective features, can accurately reflect user preferences and significantly improve prediction accuracy. Richer behavior sequences often enable more comprehensive user profiling, and recent studies have shown that scaling the length of user behavior sequence can yield …

LegalDrill: Diagnosis-Driven Synthesis for Legal Reasoning in Small Language Models

2604.23809 cs.CL 2026-04-26 PDF (arxiv)

Tianchun Li, Haochen Liu, Vishwa Pardeshi, Xingchen Wang, Tianci Liu, Huijun Zhao, Wei Fan, Jing Gao

Small language models (SLMs) are promising for real-world deployment due to their efficiency and low operational cost. However, their limited capacity struggles with high-stakes legal reasoning tasks that require coherent statute interpretation and logically consistent deduction. Furthermore, training SLMs for such tasks demands high-quality, concise reasoning trajectories, which are prohibitively…

Compile-Time Tensor Shape Checking via Staged Shape-Dependent Types

2604.23807 cs.PL 2026-04-26 PDF (arxiv)

Takashi Suwa, Atsushi Igarashi

When writing programs involving matrices or tensors in general, it is desirable to rule out the inconsistency of tensor shapes (i.e., the generalization of matrix sizes) before actual computation. For this purpose, some languages provide dependent types such as Mat m n, and others offer refinement types to track predicates for shapes. Despite the theoretical maturity, however, such methods are oft…

Symmetric Equilibrium Propagation for Thermodynamic Diffusion Training

2604.23806 cs.LG 2026-04-26 PDF (arxiv)

Aditi De

The reverse process in score-based diffusion models is formally equivalent to overdamped Langevin dynamics in a time-dependent energy landscape. In our prior work we showed that a bilinearly-coupled analog substrate can physically realize this dynamics at a projected three-to-four orders of magnitude energy advantage over digital inference by replacing dense skip connections with low-rank inter-mo…

Reparameterization through Coverings and Topological Weight Priors

2604.23804 cs.LG 2026-04-26 PDF (arxiv)

Maxim Beketov, Pavel Snopov

We generalise the reparameterization trick applied in variational autoencoders (VAEs) letting these have latent spaces of non-trivial topology - i.e. that of base manifolds covered with other ones, on which some technique for RT is available. That is possible since covering maps are measurable - moreover, in case of particular measure preservation property holding for the covering, one can establi…

Bringing a Personal Point of View: Evaluating Dynamic 3D Gaussian Splatting for Egocentric Scene Reconstruction

2604.23803 cs.CV 2026-04-26 PDF (arxiv)

Jan Warchocki, Xi Wang, Jonas Kulhanek, Jan van Gemert

Egocentric video provides a unique view into human perception and interaction, with growing relevance for augmented reality, robotics, and assistive technologies. However, rapid camera motion and complex scene dynamics pose major challenges for 3D reconstruction from this perspective. While 3D Gaussian Splatting (3DGS) has become a state-of-the-art method for efficient, high-quality novel view syn…

EndoGov: A knowledge-governed multi-agent expert system for endometrial cancer risk stratification

2604.23802 cs.MA 2026-04-26 PDF (arxiv)

Weiye Dai, Liyun Shi, Zanxiang He, Yuling Ma, Mengyuan Lin, Dianxiang Sun, Liming Nie

Multimodal artificial intelligence models for endometrial cancer (EC) risk stratification typically optimize aggregate predictive performance but provide limited mechanisms for enforcing mandatory guideline overrides, such as assigning POLE-mutated tumors to the low-risk group despite high-grade morphology. We present EndoGov, a two-tier multi-agent expert system that factorizes the decision proce…

Domain Fine-Tuning vs. Retrieval-Augmented Generation for Medical Multiple-Choice Question Answering: A Controlled Comparison at the 4B-Parameter Scale

2604.23801 cs.CL 2026-04-26 PDF (arxiv)

Avi-ad Avraam Buskila

Practitioners deploying small open-weight large language models (LLMs) for medical question answering face a recurring design choice: invest in a domain-fine-tuned model, or keep a general-purpose model and inject domain knowledge at inference time via retrieval-augmented generation (RAG). We isolate this trade-off by holding model size, prompt template, decoding temperature, retrieval pipeline, a…

Causal Representation Learning from General Environments under Nonparametric Mixing

2604.23800 cs.LG 2026-04-26 PDF (arxiv)

Ignavier Ng, Shaoan Xie, Xinshuai Dong, Peter Spirtes, Kun Zhang

Causal representation learning aims to recover the latent causal variables and their causal relations, typically represented by directed acyclic graphs (DAGs), from low-level observations such as image pixels. A prevailing line of research exploits multiple environments, which assume how data distributions change, including single-node interventions, coupled interventions, or hard interventions, o…

VitaminP: cross-modal learning enables whole-cell segmentation from routine histology

2604.23799 cs.CV 2026-04-26 PDF (arxiv)

Yasin Shokrollahi, Karina B. Pinao Gonzales, Elizve N. Barrientos Toro, Paul Acosta, Patient Mosaic Team, Pingjun Chen, Yinyin Yuan, Xiaoxi Pan

Accurate whole-cell and nuclear segmentation is essential for precision pathology and spatial omics, yet routine hematoxylin and eosin (H&E) staining provides limited cytoplasmic contrast, restricting analyses to nuclei. Multiplex immunofluorescence (mIF) facilitates precise whole-cell delineation but remains constrained by cost and accessibility. We introduce VitaminP, a cross-modal learning fram…

ELSA: Exact Linear-Scan Attention for Fast and Memory-Light Vision Transformers

2604.23798 cs.LG 2026-04-26 PDF (arxiv)

Chih-Chung Hsu, Xin-Di Ma, Wo-Ting Liao, Chia-Ming Lee

Existing attention accelerators often trade exact softmax semantics, depend on fused Tensor Core kernels, or incur sequential depth that limits FP32 throughput on long sequences. We present ELSA, an algorithmic reformulation of online softmax attention that (i) preserves exact softmax semantics in real arithmetic with a provable $\mathcal{O}(u\log n)$ FP32 relative error bound; (ii…

Optimizing Information Freshness for Wireless Local Area Networks with Multiple APs

2604.23796 cs.NI 2026-04-26 PDF (arxiv)

Ananth Ram Rajagopalan, Jiahui Ni, Vishrant Tripathi

Dense indoor WLANs increasingly rely on multiple access points (APs) operating over partially overlapping spectrum to support latency-sensitive applications. In such deployments, simultaneous transmissions across APs create co-channel and adjacent-channel interference, making scheduling decisions interdependent and directly impacting information freshness. Motivated by emerging software-defined WL…

LLM-CEG: Extending the Classification Error Gauge Framework for Privacy Auditing of Large Language Models

2604.23795 cs.CR 2026-04-26 PDF (arxiv)

Kato Mivule

This paper extends the Classification Error Gauge (x-CEG) framework, originally developed for measuring the privacy-utility trade-off in tabular datasets, to privacy auditing of Large Language Models (LLMs). We propose LLM-CEG, a systematic framework that employs membership inference attack (MIA) success rates as an empirical privacy gauge and model perplexity as a utility gauge, iteratively adjus…

SCAT Data Release 1: 1810 optical spectra of 1330 transients

2604.23794 astro-ph.HE 2026-04-26 PDF (arxiv)

Michael A. Tucker, Mark E. Huber, Benjamin J. Shappee, Jason T. Hinkle, Willem B. Hoogendam, Charlotte R. Angus, Chris Ashall, Katie Auchettl, Kenneth C. Chambers, Dhvanil D. Desai, Aaron Do, Joseph Ghammashi, Catherine J. Grier, Joanna Herman, Thomas de Jaeger, Jodie Kiyokawa, Thomas B. Lowe, Eugene A. Magnier, Anna V. Payne, Sara Romagnoli

We present the first data release (DR1) of the Spectroscopic Classification of Astronomical Transients (SCAT) survey, covering the first $\approx 5$ years of observations (March 2018 - January 2023). DR1 includes 1810 spectra of 1330 transients, which we sort into broad spectroscopic classes including supernovae (SNe), transients originating in galactic nuclei, and stellar variability. We collect …
