Archon

Browse and search harvested arxiv metadata.

1613351 results (page 4 of 64535)

FineCombo-TTS: Collaborative and Precise Controllable Speech Synthesis Using Text Descriptions and Reference Speech

2606.19209 cs.SD 2026-06-17

Shuoyi Zhou, Yixuan Zhou, Peiji Yang, Yifan Hu, Yicheng Zhong, Zhisheng Wang, Zhiyong Wu

Controllable text-to-speech (TTS) has become a key research focus. However, methods based on either reference speech or text descriptions lack flexibility and precise control, and recent joint approaches remain loosely coupled, with speech modeling timbre and text controlling global style. We propose FineCombo-TTS, a unified framework for speech synthesis grounded in reference speech and guided by…

ROSA-TFormer: A Radar-Optical Sensor-Aware Temporal Transformer for Pinus sylvestris Plantation Classification in Northern Shaanxi Using GEE-Derived Sentinel-1/2 Time Series

2606.19204 cs.CV 2026-06-17

Nengbo Zhang, Chang sheng

Accurate identification of Pinus sylvestris var. mongolica plantations is important for monitoring afforestation quality and ecological restoration in northern Shaanxi. This paper proposes ROSA-TFormer, a radar-optical sensor-aware temporal Transformer for P. sylvestris classification using Sentinel-1/2 time-series data generated on Google Earth Engine. The model integrates separate SAR and optica…

Forecasting what Matters: Decision-Focused RL for Controlled EV Charging with Unknown Departure Times

2606.19199 cs.LG 2026-06-17

Giuseppe Gabriele, Fabio Pavirani, Seyed Soroush Karimi Madahi, Chris Develder

The recent growth of EV adoption poses challenges for power systems, including increased peak demand and potential grid instability. Smart control of EV charging -- e.g., based on reinforcement learning (RL) -- can alleviate these issues by learning temporal and contextual patterns from historical data. Yet, in real-world scenarios, key features, such as departure time, often are unavailable. This…

The More the Merrier: Combining Properties for ABox Abduction under Repair Semantics for ELbot

2606.19197 cs.LO 2026-06-17

Anselm Haak, Patrick Koopmann, Yasir Mahmood, Anni-Yasmin Turhan

Abduction is a central approach to explain missing entailments from a knowledge base by providing a hypothesis that would, if added to the knowledge base, make the missing entailment become true. Abduction under repair semantics has recently been investigated in detail, where several desirable properties and optimality criteria were considered, such as signature-restrictions and minimality in siz…

Blind Symmetry Matching in Quantum States with Application to Shot-Count Reduction

2606.19196 quant-ph 2026-06-17

Mitchell A. Thornton

Measuring a quantum computation in a basis adapted to a symmetry it carries reduces the repeated measurements, commonly referred to as "shots", needed to read a statistical answer. Detecting the symmetry a quantum state carries has many uses: certifying a claimed symmetry, identifying a conserved-charge sector, flagging symmetry-breaking as an error signature, and selecting a compression or read…

Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance

2606.19195 cs.CV 2026-06-17

Kangsheng Duan, Ziyang Xu, Wenyu Liu, Xiaohu Ruan, Xiaoxin Chen, Xinggang Wang

While 10B-level industrial foundation models have pushed the boundaries of image inpainting, their prohibitive computational costs severely hinder practical deployment. Constructing a highly optimized task-specific specialist offers a promising solution; however, extreme structural compression inevitably triggers a severe representation bottleneck. To conquer this, we propose Moebius, a highly eff…

Invertible Neural Network Adapter for One-Step Flow Matching in Robot Manipulation

2606.19194 cs.RO 2026-06-17

Yu Zhang, Kangyi Ji, Yongxiang Zou, Rongtao Xu, Feng Zheng, Long Cheng

This paper presents an invertible neural network adapter for general robotic manipulation, designed to generate precise high-dimensional actions conditioned on multimodal observations, including visual, linguistic, and proprioceptive inputs, through a one-step denoising process. Built upon a flow-matching formulation, the proposed adapter effectively constrains the action generation trajectory wit…

Solitary dwarf galaxy groups as tracers of primordial dark matter halos in the local Universe

2606.19193 astro-ph.GA 2026-06-17

Z. S. Yuan, Z. L. Wen, J. L. Han

In $Λ$CDM cosmology, galaxies and clusters form within dark matter halos and merge in the hierarchical assembly paradigm to form massive systems. Using the released optical survey data, we searched for groups composed solely of dwarf galaxies, each with a stellar mass $M_*<10^{9.5}~M_{\odot}$. We identified 14 dwarf galaxy groups with at least 5 dwarf galaxies, all located within a projected radiu…

PhantomSkill: Malicious Code Injection in Agent Skill Ecosystems

2606.19191 cs.CR 2026-06-17

Yu-Ting Lin, Chia-Mu Yu

Agent skills allow LLM-based coding agents to acquire domain-specific capabilities from third-party packages, but they also introduce a new supply-chain attack surface. We present PhantomSkill, an attack framework that hides malicious behavior in a skill's auxiliary resources rather than in its textual description. Its core technique, VulMask, rewrites overt malicious scripts into vulnerability-sh…

FAST-LIVGO: A Degeneracy-Robust LiDAR-Inertial-Visual-GNSS Fusion Odometry

2606.19190 cs.RO 2026-06-17

Zhiyu Chen, Chunran Zheng, Jiayu Wen, XiaoLei Zhang, Jiaming Xu, Feng Pan, Yukang Cui

Robust state estimation and mapping in long-term, large-scale, and highly dynamic environments remains a key challenge in robotics. Existing LiDAR-Inertial-Visual Odometry (LIVO) systems achieve strong local accuracy but suffer from accumulated drift over long distances and may fail in geometrically degraded or textureless scenes. Meanwhile, GNSS-aided fusion frameworks often rely on LiDAR or visu…

Learning to Annotate Delayed and False AEB Events: A Practical System for Extreme Class Imbalance and Asymmetric Label Noise

2606.19186 cs.RO 2026-06-17

Mengxiang Hao, Xin Jiang, Xinghao Huang, Wenliang Su, Zhiteng Wang, Junjie Rao, Xiaotian Yang, Wei Liao, Chengyu Han, Gen Liang, Yulun Song, Zhitao Xu, Xianpeng Lang

Autonomous Emergency Braking (AEB) optimization relies on accurately annotated real-world trigger events, particularly rare but critical delayed and false AEB triggers that expose system deficiencies. However, these minority samples comprise less than 5% of thousands of daily triggers, making manual annotation prohibitively expensive at scale. We present the first automated AEB annotation framewor…

AGDN: Learning to Solve Traveling Salesman Problem with Anisotropic Graph Diffusion Network

2606.19185 cs.LG 2026-06-17

Bolin Shen, Ziwei Huang, Zhiguang Cao, Yushun Dong

The Traveling Salesman Problem (TSP) is a cornerstone of combinatorial optimization and arises in many practical scenarios. Although graph-based learning approaches have been explored for TSP, the question of how to exploit graph structure more effectively remains open. We present the Anisotropic Graph Diffusion Network (AGDN), a new Graph Neural Network framework designed to solve TSP. Our method…

When AUC Misleads: Polarization-Aware Evaluation of Deepfake Detectors under Domain Shift

2606.19184 cs.CV 2026-06-17

Dat Nguyen, Cosmin Radoi, Romain Hermary, Marcella Astrid, Nesryne Mejri, Enjie Ghorbel, Djamila Aouada

Recent advances in generative AI, such as diffusion models and face-swapping tools, have enabled the creation of highly realistic deepfakes, leading to real-world harms including financial fraud and non-consensual explicit content. In response, deepfake detection has become an active research area, with recent methods increasingly focusing on improving generalization to unseen manipulations. This …

Language Models as Interfaces, Not Oracles: A Hybrid LLM-ML System for Pediatric Appendicitis

2606.19183 cs.CL 2026-06-17

Soheyl Bateni, Maryam Abdolali

Large language models (LLMs) can make clinical decision support more accessible by interpreting free-text documentation, but their direct use as diagnostic engines is limited by sensitivity to prompts, information order, and plausible but incorrect outputs. Structured machine-learning models offer more stable risk prediction, yet they require tabular inputs that are difficult to integrate with nar…

Compute Efficiency and Serial Runtime Tradeoffs for Stochastic Momentum Methods

2606.19179 cs.LG 2026-06-17

Depen Morwani, Alexandru Meterez, Pranav Nair, Sham Kakade

Stochastic momentum methods such as heavy ball (HB), Nesterov momentum, and variants of Accelerated SGD (ASGD) [Kidambi et al., 2018] are widely used in modern training, but their stochastic benefits depend on two distinct quantities: serial runtime, the number of iterations needed to reach a target accuracy, and compute efficiency (CE), the inverse total gradient-query or FLOP cost. Larger batche…

Hardware- and Vision-in-the-Loop Validation of Deep Monocular Pose Estimation for Autonomous Maritime UAV Flight

2606.19176 cs.RO 2026-06-17

Maneesha Wickramasuriya, Beomyeol Yu, Jaden Shin, Mason Huslig, Taeyoung Lee, Murray Snyder

Autonomous UAV operations on ships require reliable vision-based relative pose estimation, yet at-sea validation is costly, weather-dependent, and risky. This paper presents a hardware-validated vision-in-the-loop framework that enables fully autonomous indoor flight while emulating photorealistic maritime environments. Rendered maritime views are processed onboard by a deep transformer-based mono…

A Clinician-Centered Pipeline for Annotation and Evaluation in Ultrasound AI Studies

2606.19174 cs.HC 2026-06-17

Fangyijie Wang, Jianjun Yu, Wentao Shi, Haixia Huang, Ran Shi, Guénolé Silvestre, Kathleen M. Curran

Clinician-centered evaluation is critical for validating medical AI systems, especially in ultrasound imaging where quantitative metrics do not always capture clinical usability. Existing medical image platforms primarily focus on dataset labeling. They lack integrated support for blinded model comparison and reproducible evaluation workflows. We present a clinician-centered pipeline for remote an…

User as Engram: Internalizing Per-User Memory as Local Parametric Edits

2606.19172 cs.AI 2026-06-17

Bojie Li

Personal memory in a language model is two problems: content and reasoning skill. The brain keeps the two apart (a sparse, local engram in the hippocampus for each episode, a slow neocortex for the shared skills that interpret it), so a new fact need not overwrite everything else. Most personalization today keeps a user's facts outside the weights, in a natural-language memory file or a retrieval …

Dango: A Strictly L1-Only Large Language Model for Studying Second Language Acquisition

2606.19170 cs.CL 2026-06-17

Shiho Matta, Yin Jou Huang, Fei Cheng, Takashi Kodama, Hirokazu Kiyomaru, Yugo Murawaki

We introduce Dango, a 1.8B-parameter large language model designed for controlled studies of L1-to-L2 (Japanese-to-English) transfer in second language acquisition (SLA). While previous studies have explored SLA in language models, they have predominantly relied on smaller or non-decoder models, limiting their ability to generate open-ended text and reducing their suitability as practical L2 simul…

RespGeomLib: A Reproducible Parametric Engine for Generating Analysis-Ready Human Airway Lumen Geometry

2606.19169 cs.GR 2026-06-17

Nichula Wasalathilaka, Parakrama Ekanayake, Roshan Godaliyadda

CT-derived airway models support pulmonary morphometry and airflow simulation, but are often limited by distal scan resolution and the need for substantial cleanup near bifurcations. Procedural alternatives are reproducible, yet many rely on stitched tubular primitives that introduce non-smooth junctions and poorly defined open boundaries. We present RespGeomLib, a reproducible parametric engine f…

Beyond Safe Data: Pretraining-Stage Alignment with Regular Safety Reflection

2606.19168 cs.AI 2026-06-17

Jinhan Li, Kexian Tang, Yihan Xu, Zhuorui Ye, Kaifeng Lyu

To achieve deeper safety alignment for large language models (LLMs), recent efforts have studied how to push safety interventions earlier into the pretraining stage, primarily by filtering unsafe data or rewriting it into safer forms. We argue that pretraining-stage alignment should go beyond making the data safe: LLMs may compose seemingly benign knowledge and capabilities into unsafe behaviors. …

Essential Subspace Merging for Multi-Task Learning

2606.19164 cs.LG 2026-06-17

Longhua Li, Lei Qi, Xin Geng, Qi Tian

Model merging aims to enable multi-task learning by integrating the capabilities of multiple models fine-tuned from the same pre-trained checkpoint into a single model. Its core challenge is inter-task interference among task-specific parameter updates. In this paper, we analyze the output shifts induced by task updates and observe that their energy is concentrated in a small number of principal d…

Pulse: Training Acceleration for Large Diffusion Models with Automatic Pipeline Parallelism

2606.19163 cs.DC 2026-06-17

Boran Sun, Guoyong Jiang, Lin Zhang, Chen Chen, Yuechen Tao, Zhishu Che, Jieling Yu, Shan Chang, Huaxi Gu, Fangming Liu, Bo Li

Diffusion models are now a dominant approach for high-fidelity image and video generation, yet scaling their training across GPU clusters remains challenging. Unlike transformer-only architectures, diffusion backbones commonly adopt UNet-style encoder-decoder structures with heterogeneous layers and long-range skip connections. Under conventional pipeline parallelism, these non-local dependencies …

The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL

2606.19162 cs.LG 2026-06-17

Nicolas Beltran-Velez, Felix Friedrich, Zhang Xiaofeng, Reyhane Askari-Hemmat, Xiaochuang Han, Adriana Romero-Soriano, Michal Drozdzal

Score- and flow-matching models often rely on preference-based reinforcement learning for two purposes: aligning with subjective preferences and, surprisingly, recovering properties such as visual realism and coherent object structure that matching-based training is intended to learn from the data itself. We argue that this reflects a structural mismatch. Matching losses measure $\ell_2$ regressio…

HT-Bench: Benchmarking and Learning Dexterous Full-Hand Tactile Representations with Egocentric Vision

2606.19161 cs.RO 2026-06-17

Yuzhe Huang, Jiaping Wu, Jiaming Jiang, Hezhe Lin, Aikebaier Aierken, Yunlong Wang, Kun Cheng, Ziyuan Jiao, Yuanxin Zhong

Establishing a universal benchmark for tactile representation learning in robotic manipulation remains challenging due to the diversity of tactile sensor designs, data formats, and robot embodiments. Rather than seeking to establish such, we explore a scalable and promising direction for future development: egocentric vision paired with full-hand tactile data. To this end, we introduce \textbf{HT-…
