Archon

Browse and search harvested arxiv metadata.


Environmental Sound Deepfake Detection Using Deep-Learning Framework

2604.19652 cs.SD 2026-04-21 PDF (arxiv)

Lam Pham, Khoi Vu, Dat Tran, Phat Lam, Vu Nguyen, David Fischinger, Alexander Schindler, Martin Boyer, Son Le

In this paper, we propose a deep-learning framework for environmental sound deepfake detection (ESDD), the task of identifying whether the sound scene and sound event in an input audio recording are fake or not. To this end, we conducted extensive experiments to explore how individual spectrograms, a wide range of network architectures and pre-trained models, ensemble of spectrograms or network a…

A Possible Protocluster of Galaxies Serendipitously Discovered in the Field of an Intermediate-Redshift Post-starburst Galaxy

2604.19651 astro-ph.GA 2026-04-21 PDF (arxiv)

Mary C. Knowlton, Justin S. Spilker, Rachel Bezanson, Vincenzo R. D'Onofrio, Anika Kumar, David J. Setton, Katherine A. Suess

We present the serendipitous discovery of an overdensity of submillimeter galaxies (SMGs) in the field of SDSSJ0909-0108, a massive z~0.7 post-starburst galaxy from the SQuIGGLE survey. ALMA observations at 870um and 2mm reveal six galaxies within a 35'' region with flux ratios consistent with emission from dust. Given the rarity of 870um sources and the small field-of-view of ALMA, we speculate t…

CoCo-SAM3: Harnessing Concept Conflict in Open-Vocabulary Semantic Segmentation

2604.19648 cs.CV 2026-04-21 PDF (arxiv)

Yanhui Chen, Baoyao Yang, Siqi Liu, Jingchao Wang

SAM3 advances open-vocabulary semantic segmentation by introducing a prompt-driven mask generation paradigm. However, in multi-class open-vocabulary scenarios, masks generated independently from different category prompts lack a unified and inter-class comparable evidence scale, often resulting in overlapping coverage and unstable competition. Moreover, synonymous expressions of the same concept t…

Multiscale Assessment of Tritium Behavior in Preliminary Fusion Pilot Plant Design Using Surrogate Models in TMAP8

2604.19647 physics.comp-ph 2026-04-21 PDF (arxiv)

Lin Yang, Pierre-Clément A. Simon, Emre Yildirim, José Trueba, Matthew Robinson, Masashi Shimada

The complexity and significance of multiscale phenomena in fusion energy systems make advanced modeling necessary for designing, optimizing, and safely deploying fusion plants. Tritium accountancy is one of those challenges for deuterium-tritium fusion systems. Its availability is constrained by its short half-life (12.33 years) and limited natural abundance, which require fusion plants to breed t…

The signal is the ceiling: Measurement limits of LLM-predicted experience ratings from open-ended survey text

2604.19645 cs.CL 2026-04-21 PDF (arxiv)

Andrew Hong, Jason Potteiger, Luis E. Zapata

An earlier paper (Hong, Potteiger, and Zapata 2026) established that an unoptimized GPT 4.1 prompt predicts fan-reported experience ratings within one point 67% of the time from open-ended survey text. This paper tests the relative impact of prompt design and model selection on that performance. We compared four configurations on approximately 10,000 post-game surveys from five MLB teams: the orig…

A Gesture-Based Visual Learning Model for Acoustophoretic Interactions using a Swarm of AcoustoBots

2604.19643 cs.RO 2026-04-21 PDF (arxiv)

Alex Lin, Lei Gao, Narsimlu Kemsaram, Sriram Subramanian

AcoustoBots are mobile acoustophoretic robots capable of delivering mid-air haptics, directional audio, and acoustic levitation, but existing implementations rely on scripted commands and lack an intuitive interface for real-time human control. This work presents a gesture-based visual learning framework for contactless human-swarm interaction with a multimodal AcoustoBot platform. The system comb…

Micro Language Models Enable Instant Responses

2604.19642 cs.CL 2026-04-21 PDF (arxiv)

Wen Cheng, Tuochao Chen, Karim Helwani, Sriram Srinivasan, Luke Zettlemoyer, Shyamnath Gollakota

Edge devices such as smartwatches and smart glasses cannot continuously run even the smallest 100M-1B parameter language models due to power and compute constraints, yet cloud inference introduces multi-second latencies that break the illusion of a responsive assistant. We introduce micro language models ($μ$LMs): ultra-compact models (8M-30M parameters) that instantly generate the first 4-8 words…

Regulation Zero 2: A Flow-Centric Sequential Regulation Planning Framework to Counter Regulation Cascading in Pre-tactical Air Traffic Flow Management

2604.19641 math.OC 2026-04-21 PDF (arxiv)

Thinh Hoang, Zhengyi Wang, Leila Zerrouki, Daniel Delahaye

Air Traffic Flow Management (ATFM) traffic regulations are being increasingly used as rising demand meets persistent workforce shortages. This operational strain has amplified a critical phenomenon that we call "regulation cascading": the compounding, non-linear interactions that occur when multiple regulations influence one another in unpredictable ways. As the number and complexity of regul…

Safety-Critical Contextual Control via Online Riemannian Optimization with World Models

2604.19639 eess.SY 2026-04-21 PDF (arxiv)

Tongxin Li

Modern world models are becoming too complex to admit explicit dynamical descriptions. We study safety-critical contextual control, where a Planner must optimize a task objective using only feasibility samples from a black-box Simulator, conditioned on a context signal $ξ_t$. We develop a sample-based Penalized Predictive Control (PPC) framework grounded in online Riemannian optimization, in which…

SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models

2604.19638 cs.AI 2026-04-21 PDF (arxiv)

Josue Torres-Fonseca, Naihao Deng, Yinpei Dai, Shane Storks, Yichi Zhang, Rada Mihalcea, Casey Kennington, Joyce Chai

Multimodal Large Language Models are increasingly adopted as autonomous agents in interactive environments, yet their ability to proactively address safety hazards remains insufficient. We introduce SafetyALFRED, built upon the embodied agent benchmark ALFRED, augmented with six categories of real-world kitchen hazards. While existing safety evaluations focus on hazard recognition through disembod…

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation

2604.19636 cs.CV 2026-04-21 PDF (arxiv)

Xiangyang Luo, Xiaozhe Xin, Tao Feng, Xu Guo, Meiguang Jin, Junfeng Ma

Synthesizing human-object interaction (HOI) videos has broad practical value in e-commerce, digital advertising, and virtual marketing. However, current diffusion models, despite their photorealistic rendering capability, still frequently fail on (i) the structural stability of sensitive regions such as hands and faces and (ii) physically plausible contact (e.g., avoiding hand-object interpenetr…

Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model

2604.19635 cs.SD 2026-04-21 PDF (arxiv)

Shuhai Peng, Hui Lu, Jinjiang Liu, Liyang Chen, Guiping Zhong, Jiakui Li, Huimeng Wang, Haiyun Li, Liang Cao, Shiyin Kang, Zhiyong Wu

While generative models have set new benchmarks for Target Speaker Extraction (TSE), their inherent reliance on global context precludes deployment in real-time applications. Direct adaptation to streaming scenarios often leads to catastrophic inference performance degradation due to the severe mismatch between training and streaming inference. To bridge this gap, we present the first autoregressi…

Time Series Augmented Generation for Financial Applications

2604.19633 cs.AI 2026-04-21 PDF (arxiv)

Anton Kolonin, Alexey Glushchenko, Evgeny Bochkov, Abhishek Saxena

Evaluating the reasoning capabilities of Large Language Models (LLMs) for complex, quantitative financial tasks is a critical and unsolved challenge. Standard benchmarks often fail to isolate an agent's core ability to parse queries and orchestrate computations. To address this, we introduce a novel evaluation methodology and benchmark designed to rigorously measure an LLM agent's reasoning for fi…

CreatiParser: Generative Image Parsing of Raster Graphic Designs into Editable Layers

2604.19632 cs.CV 2026-04-21 PDF (arxiv)

Weidong Chen, Dexiang Hong, Zhendong Mao, Yutao Cheng, Xinyan Liu, Lei Zhang, Yongdong Zhang

Graphic design images consist of multiple editable layers, such as text, background, and decorative elements, while most generative models produce rasterized outputs without explicit layer structures, limiting downstream editing. Existing graphic design parsing methods typically rely on multi-stage pipelines combining layout prediction, matting, and inpainting, which suffer from error accumulation…

MOSA: Motion-Guided Semantic Alignment for Dynamic Scene Graph Generation

2604.19631 cs.CV 2026-04-21 PDF (arxiv)

Xuejiao Wang, Bohao Zhang, Changbo Wang, Gaoqi He

Dynamic Scene Graph Generation (DSGG) aims to structurally model objects and their dynamic interactions in video sequences for high-level semantic understanding. However, existing methods struggle with fine-grained relationship modeling, semantic representation utilization, and the ability to model tail relationships. To address these issues, this paper proposes a motion-guided semantic alignment …

Adding Compilation Metadata To Binaries To Make Disassembly Decidable

2604.19628 cs.CR 2026-04-21 PDF (arxiv)

Daniel Engel, Freek Verbeek, Pranav Kumar, Binoy Ravindran

The binary executable format is the standard method for distributing and executing software. Yet, it is also as opaque a representation of software as can be. If the binary format were augmented with metadata that provides security-relevant information, such as which data is intended by the compiler to be executable instructions, or how memory regions are expected to be bounded, that would dramati…

Odour sensing in turbulent plumes with high-speed electronic nose and non-invasive ground truth

2604.19626 eess.SP 2026-04-21 PDF (arxiv)

Nik Dennler, Elle Stark, Saimon Collaku, Lars Larson, André van Schaik, Michael Schmuker, John Crimaldi, Andreas T. Güntner, Aaron True

Chemical sensing in real-world environments requires resolving rapidly fluctuating and spatially heterogeneous concentration fields. However, these dynamics are strongly distorted by widely used, low-cost metal-oxide (MOx) gas sensors, whose thermal and surface-kinetic response acts as a low-pass filter on the underlying concentration signal. Quantifying and compensating for these effects remains …

GRAFT: Geometric Refinement and Fitting Transformer for Human Scene Reconstruction

2604.19624 cs.CV 2026-04-21 PDF (arxiv)

Pradyumna YM, Yuxuan Xue, Yue Chen, Nikita Kister, István Sárándi, Gerard Pons-Moll

Reconstructing physically plausible 3D human-scene interactions (HSI) from a single image currently presents a trade-off: optimization-based methods offer accurate contact but are slow (~20s), while feed-forward approaches are fast yet lack explicit interaction reasoning, producing floating and interpenetration artifacts. Our key insight is that geometry-based human-scene fitting can be amortiz…

SAGE: Training-Free Semantic Evidence Composition for Edge-Cloud Inference under Hard Uplink Budgets

2604.19623 cs.LG 2026-04-21 PDF (arxiv)

Inhyeok Choi, Hyuncheol Park

Edge-cloud hybrid inference offloads difficult inputs to a powerful remote model, but the uplink channel imposes hard per-request constraints on the number of bits that can be transmitted. We show that selecting transmitted content based solely on attention-based importance, the standard approach in collaborative inference, is inherently limited under hard budgets. Two findings support this claim.…

The "Small World of Words" German Free-Association Norms

2604.19620 cs.CL 2026-04-21 PDF (arxiv)

Samuel Aeschbach, Rui Mata, Kaidi Lõo, Simon De Deyne, Dirk U. Wulff

Free-association norms provide essential empirical data for investigating linguistic, semantic, and cultural phenomena in the cognitive sciences. Although large-scale norms exist for languages such as English, Dutch, Spanish, and Mandarin Chinese, no comparable resource has been available for German. To address this gap, we present free-association norms for 5,877 German cue words as part of the G…

Autonomous UAV Pipeline Near-proximity Inspection via Disturbance-Aware Predictive Visual Servoing

2604.19618 cs.RO 2026-04-21 PDF (arxiv)

Wen Li, Hui Wang, Jinya Su, Cunjia Liu, Wen-Hua Chen, Shihua Li

Reliable pipeline inspection is critical to safe energy transportation, but is constrained by long distances, complex terrain, and risks to human inspectors. Unmanned aerial vehicles provide a flexible sensing platform, yet reliable autonomous inspection remains challenging. This paper presents an autonomous quadrotor near-proximity pipeline inspection framework for three-dimensional scenarios bas…

Goal-Oriented Semantic Communication for Logical Decision Making

2604.19614 cs.IT 2026-04-21 PDF (arxiv)

Ahmet Faruk Saz, Faramarz Fekri

This paper develops a principled foundation for goal-oriented semantic communication for logical decision-making. Consider a setting where autonomous agents engage in collaborative perception. In such settings, the volume of sensory data and limited bandwidth often make transmission of raw observations infeasible, requiring intelligent selection of task-relevant information. Because these scenario…

MG-NECOLA: A Field-Level Emulator for $f(R)$ Gravity and Massive Neutrino Cosmologies

2604.19613 astro-ph.CO 2026-04-21 PDF (arxiv)

J. Bayron Orjuela-Quintana, Mauricio Reyes, Elena Giusarma, Marco Baldi, Neerav Kaushal, César A. Valenzuela-Toledo

Accurate modeling of non-linear gravitational dynamics is essential for constraining extensions to the standard cosmological model using large-scale structure observations. While high-resolution $N$-body simulations provide the required fidelity, they are computationally prohibitive for the large ensembles needed to analyze Modified Gravity (MG) scenarios. We present MG-NECOLA, a field-level emula…

Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding

2604.19609 cs.CV 2026-04-21 PDF (arxiv)

Kadir Yilmaz, Adrian Kruse, Tristan Höfer, Daan de Geus, Bastian Leibe

Transformers have become a common foundation across deep learning, yet 3D scene understanding still relies on specialized backbones with strong domain priors. This keeps the field isolated from the broader Transformer ecosystem, limiting the transfer of new advances as well as the benefits of increasingly optimized software and hardware stacks. To bridge this gap, we adapt the vanilla Transformer …

Are X-ray Atmospheres Heated by Turbulent Dissipation? XRISM Constraints

2604.19607 astro-ph.HE 2026-04-21 PDF (arxiv)

B. R. McNamara, A. C. Fabian, H. R. Russell, P. E. J. Nulsen, A. Simionescu, A. Majumder, E. D. Miller, A. Sarkar

We evaluate whether dissipation of turbulence injected into hot cluster atmospheres by jets and bubbles can offset radiative cooling flows. No trends are found between atmospheric velocity dispersion, $σ_v$, and either the ratio of kinetic to thermal energy or jet power over nearly four decades of jet power. Apparently, jets disperse their energy gently at roughly constant energy per gram of gas. …
