Archon

Browse and search harvested arxiv metadata.

931299 results (page 31 of 37252)

RAFT-MSF++: Temporal Geometry-Motion Feature Fusion for Self-Supervised Monocular Scene Flow

2604.19349 cs.CV 2026-04-21 PDF (arxiv)

Xunpei Sun, Zuoxun Hou, Yi Chang, Gang Chen, Wei-Shi Zheng

Monocular scene flow estimation aims to recover dense 3D motion from image sequences, yet most existing methods are limited to two-frame inputs, restricting temporal modeling and robustness to occlusions. We propose RAFT-MSF++, a self-supervised multi-frame framework that recurrently fuses temporal features to jointly estimate depth and scene flow. Central to our approach is the Geometry-Motion Fe…

Geometry-Guided Self-Supervision for Ultra-Fine-Grained Recognition with Limited Data

2604.19345 cs.CV 2026-04-21 PDF (arxiv)

Shijie Wang, Yadan Luo, Zijian Wang, Haojie Li, Zi Huang, Mahsa Baktashmotlagh

This paper investigates the intrinsic geometrical features of highly similar objects and introduces a general self-supervised framework called the Geometric Attribute Exploration Network (GAEor), which is designed to address the ultra-fine-grained visual categorization (Ultra-FGVC) task in data-limited scenarios. Unlike prior work that often captures subtle yet critical distinctions, GAEor generat…

If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems

2604.19844 cs.CV 2026-04-21 PDF (arxiv)

Jiamin Chang, Minhui Xue, Ruoxi Sun, Shuchao Pang, Salil S. Kanhere, Hammond Pearce

Recent advances in embodied Vision-Language Agentic Systems (VLAS), powered by large vision-language models (LVLMs), enable AI systems to perceive and reason over real-world scenes. Within this context, environmental signals such as traffic lights are essential in-band signals that can and should influence agent behavior. However, similar signals could also be crafted to operate as misleading visu…

Quadruped Parkour Learning: Sparsely Gated Mixture of Experts with Visual Input

2604.19344 cs.RO 2026-04-21 PDF (arxiv)

Michael Ziegltrum, Jianhao Jiao, Tianhu Peng, Chengxu Zhou, Dimitrios Kanoulas

Robotic parkour provides a compelling benchmark for advancing locomotion over highly challenging terrain, including large discontinuities such as elevated steps. Recent approaches have demonstrated impressive capabilities, including dynamic climbing and jumping, but typically rely on sequential multilayer perceptron (MLP) architectures with densely activated layers. In contrast, sparsely gated mix…

Scalable Memristive-Friendly Reservoir Computing for Time Series Classification

2604.19343 cs.NE 2026-04-21 PDF (arxiv)

Coşku Can Horuz, Andrea Ceni, Claudio Gallicchio, Sebastian Otte

Memristive devices present a promising foundation for next-generation information processing by combining memory and computation within a single physical substrate. This unique characteristic enables efficient, fast, and adaptive computing, particularly well suited for deep learning applications. Among recent developments, the memristive-friendly echo state network (MF-ESN) has emerged as a promis…

Are Large Language Models Economically Viable for Industry Deployment?

2604.19342 cs.CL 2026-04-21 PDF (arxiv)

Abdullah Mohammad, Sushant Kumar Ray, Pushkar Arora, Rafiq Ali, Ebad Shabbir, Gautam Siddharth Kashyap, Jiechao Gao, Usman Naseem

Generative AI, powered by Large Language Models (LLMs), is increasingly deployed in industry across healthcare decision support, financial analytics, enterprise retrieval, and conversational automation, where reliability, efficiency, and cost control are critical. In such settings, models must satisfy strict constraints on energy, latency, and hardware utilization, not accuracy alone. Yet prevailing …

Evaluation-driven Scaling for Scientific Discovery

2604.19341 cs.LG 2026-04-21 PDF (arxiv)

Haotian Ye, Haowei Lin, Jingyi Tang, Yizhen Luo, Caiyin Yang, Chang Su, Rahul Thapa, Rui Yang, Ruihua Liu, Zeyu Li, Chong Gao, Dachao Ding, Guangrong He, Miaolei Zhang, Lina Sun, Wenyang Wang, Yuchen Zhong, Zhuohao Shen, Di He, Jianzhu Ma, Stefano Ermon, Tongyang Li, Xiaowen Chu, James Zou, Yuzhi Xu

Language models are increasingly used in scientific discovery to generate hypotheses, propose candidate solutions, implement systems, and iteratively refine them. At the core of these trial-and-error loops lies evaluation: the process of obtaining feedback on candidate solutions via verifiers, simulators, or task-specific scoring functions. While prior work has highlighted the importance of evalua…

Improvements to the post-processing of weather forecasts using machine learning and feature selection

2604.19340 physics.ao-ph 2026-04-21 PDF (arxiv)

Kazuma Iwase, Tomoyuki Takenawa

This study aims to develop and improve machine learning-based post-processing models for precipitation, temperature, and wind speed predictions using the Mesoscale Model (MSM) dataset provided by the Japan Meteorological Agency (JMA) for 18 locations across Japan, including plains, mountainous regions, and islands. By incorporating meteorological variables from grid points surrounding the target l…

Divide-and-Conquer Approach to Holistic Cognition in High-Similarity Contexts with Limited Data

2604.19339 cs.CV 2026-04-21 PDF (arxiv)

Shijie Wang, Zijian Wang, Yadan Luo, Haojie Li, Zi Huang, Mahsa Baktashmotlagh

Ultra-fine-grained visual categorization (Ultra-FGVC) aims to classify highly similar subcategories within fine-grained objects using limited training samples. However, holistic yet discriminative cues, such as leaf contours in extremely similar cultivars, remain under-explored in current studies, thereby limiting recognition performance. Though crucial, modeling holistic cues with complex morphol…

Hybrid Beamforming for Subarray-Level Movable Antenna Enhanced MU-MIMO Communications

2604.19338 eess.SP 2026-04-21 PDF (arxiv)

Shanshan Zhang, Songjie Yang, Wenxuan Zhang, Youzhi Xiong, Siya Yao

This study investigates subarray-level movable antenna (MA) architecture for multi-user MIMO (MU-MIMO) systems. Unlike conventional systems with fixed-position antennas (FPAs), the proposed scheme harnesses the additional positional degrees of freedom (DoFs) of movable subarrays to enhance spatial multiplexing capabilities for both multi-user and multi-stream communications. Our objective is to ma…

POLAR-PIC: A Holistic Framework for Matrixized PIC with Co-Designed Compute, Layout, and Communication

2604.19337 cs.DC 2026-04-21 PDF (arxiv)

Yizhuo Rao, Xingjian Cui, Shangzhi Pang, Jiabin Xie, Guangnan Feng, Jinhui Wei, Ziyan Zhang, Languang Gao, Zhenyu Wang, Zhiguang Chen, Yutong Lu

Particle-in-Cell (PIC) simulations are fundamental to plasma physics but often suffer from limited scalability due to particle-grid interaction bottlenecks and particle redistribution costs. Specifically, the particle-grid interaction computations have not taken full advantage of the emerging Matrix Processing Units (MPUs), the particle motion introduces irregular memory accesses, and the bulk-syn…

FedSEA: Achieving Benefit of Parallelization in Federated Online Learning

2604.19336 cs.LG 2026-04-21 PDF (arxiv)

Harekrushna Sahu, Pratik Jawanpuria, Pranay Sharma

Online federated learning (OFL) has emerged as a popular framework for decentralized decision-making over continuous data streams without compromising client privacy. However, the adversary model assumed in standard OFL typically precludes any potential benefits of parallelization. Further, it fails to adequately capture the different sources of statistical variation in OFL problems. In this paper…

When Active Learning Falls Short: An Empirical Study on Chemical Reaction Extraction

2604.19335 cs.LG 2026-04-21 PDF (arxiv)

Simin Yu, Sufia Fathima

The rapid growth of chemical literature has generated vast amounts of unstructured data, where reaction information is particularly valuable for applications such as reaction predictions and drug design. However, the prohibitive cost of expert annotation has led to a scarcity of training data, severely hindering the performance of automatic reaction extraction. In this work, we conduct a systemati…

Silicon Aware Neural Networks

2604.19334 cs.CV 2026-04-21 PDF (arxiv)

Sebastian Fieldhouse, Kea-Tiong Tang

Recent work in the machine learning literature has demonstrated that deep learning can train neural networks made of discrete logic gate functions to perform simple image classification tasks at very high speeds on CPU, GPU and FPGA platforms. By virtue of being formed by discrete logic gates, these Differentiable Logic Gate Networks (DLGNs) lend themselves naturally to implementation in custom si…

Evaluating LLM-Driven Summarisation of Parliamentary Debates with Computational Argumentation

2604.19331 cs.CL 2026-04-21 PDF (arxiv)

Eoghan Cunningham, Derek Greene, James Cross, Antonio Rago

Understanding how policy is debated and justified in parliament is a fundamental aspect of the democratic process. However, the volume and complexity of such debates mean that outside audiences struggle to engage. Meanwhile, Large Language Models (LLMs) have been shown to enable automated summarisation at scale. While summaries of debates can make parliamentary procedures more accessible, evaluati…

Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation

2604.19330 eess.AS 2026-04-21 PDF (arxiv)

Jianbo Ma, Richard Cartwright

Recent advances in Text-To-Speech (TTS) synthesis have seen the popularity of multi-stage approaches that first predict semantic tokens and then generate acoustic tokens. In this paper, we extend the coarse-to-fine generation paradigm to the temporal domain and introduce Chain-of-Details (CoD), a novel framework that explicitly models temporal coarse-to-fine dynamics in speech generation using a c…

PLaMo 2.1-VL Technical Report

2604.19324 cs.CV 2026-04-21 PDF (arxiv)

Tommi Kerola, Yuya Masuda, Takashi Masuko, Toshiki Nakanishi, Daisuke Nishino, Kuniyuki Takahashi, Hanqin Wang, Yoshihiro Yamada

We introduce PLaMo 2.1-VL, a lightweight Vision Language Model (VLM) for autonomous devices, available in 8B and 2B variants and designed for local and edge deployment with Japanese-language operation. Focusing on Visual Question Answering (VQA) and Visual Grounding as its core capabilities, we develop and evaluate the models for two real-world application scenarios: factory task analysis via tool…

Concept Inconsistency in Dermoscopic Concept Bottleneck Models: A Rough-Set Analysis of the Derm7pt Dataset

2604.19323 cs.LG 2026-04-21 PDF (arxiv)

Gonzalo Nápoles, Isel Grau, Yamisleydi Salgueiro

Concept Bottleneck Models (CBMs) route predictions exclusively through a clinically grounded concept layer, binding interpretability to concept-label consistency. When a dataset contains concept-level inconsistencies, identical concept profiles mapped to conflicting diagnosis labels create an unresolvable bottleneck that imposes a hard ceiling on achievable accuracy. In this paper, we apply rough …

RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models

2604.19321 cs.LG 2026-04-21 PDF (arxiv)

Yusuf Çelebi, Yağız Asker, Özay Ezerceli, Mahmoud ElHussieni, Selva Taş, Reyhan Bayraktar, Fatma Betül Terzioğlu

Fine-tuning Large Language Models (LLMs) remains structurally uncertain despite parameter-efficient methods such as Low-Rank Adaptation (LoRA), as the layer-specific roles of internal representations are poorly understood, leading to heuristic decisions about where adaptation should be applied. We model the evolution of hidden states as a high-dimensional geometric trajectory and propose using the…

An Oracle-Free Quantum Algorithm for Nonadiabatic Quantum Molecular Dynamics

2604.19319 quant-ph 2026-04-21 PDF (arxiv)

Joshua Courtney

Quantum computation is an attractive front for many problems that are intractable for computers today. One such problem is nonadiabatic quantum molecular dynamics, where quantized internal states coupling to parameterized modes result in a Hamiltonian resistant to oracle-based models and spectral decomposition. This dissertation applies diabatic Hamiltonian operators directly to the computational …

Multi-view Crowd Tracking Transformer with View-Ground Interactions Under Large Real-World Scenes

2604.19318 cs.CV 2026-04-21 PDF (arxiv)

Qi Zhang, Jixuan Chen, Kaiyi Zhang, Xinquan Yu, Antoni B. Chan, Hui Huang

Multi-view crowd tracking estimates each person's trajectory on the ground of the scene. Recent work mainly relies on CNN-based multi-view crowd tracking architectures, most of which are evaluated and compared on relatively small datasets, such as Wildtrack and MultiviewX. Since these two datasets are collected in small scenes and only contain tens of frames in the evaluatio…

Merger rate of initially clustered primordial black holes for the two-body channel

2604.19316 astro-ph.CO 2026-04-21 PDF (arxiv)

Kentaro Kasai, Masahiro Kawasaki, Kai Murai, Shunsuke Neda

Primordial black holes (PBHs) may form an initially clustered population depending on their production mechanism. Motivated by binary black-hole merger events observed by gravitational-wave interferometers, we revisit the evaluation of the merger rate of PBH binaries and extend the formalism to include the effects of clustering. We show that, in the presence of relatively weak PBH clustering, the …

Improving LLM-Driven Test Generation by Learning from Mocking Information

2604.19315 cs.SE 2026-04-21 PDF (arxiv)

Jamie Lee, Flynn Teh, Hengcheng Zhu, Mengzhen Li, Mattia Fazzini, Valerio Terragni

Large Language Models (LLMs) have recently shown strong potential for automated unit test generation. This has motivated us to investigate whether developer-defined test doubles (commonly referred to as mocks) available in existing test suites can be leveraged to improve LLM-driven test generation. To this end, we propose MOCKMILL, an LLM-based technique and tool that generates test cases by explo…

Framelet-Based Blind Image Restoration with Minimax Concave Regularization

2604.19314 cs.CV 2026-04-21 PDF (arxiv)

Heng Zhang, Reza Parvaz, Rui Yang

Recovering corrupted images is one of the most challenging problems in image processing. Among various restoration tasks, blind image deblurring has been extensively studied due to its practical importance and inherent difficulty. In this problem, both the point spread function (PSF) and the underlying latent sharp image must be estimated simultaneously. This problem cannot be solved directly due …

On the Conditioning Consistency Gap in Conditional Neural Processes

2604.19312 cs.LG 2026-04-21 PDF (arxiv)

Robin Young

Neural processes are meta-learning models that map context sets to predictive distributions. While inspired by stochastic processes, NPs do not generally satisfy the Kolmogorov consistency conditions required to define a valid stochastic process. This inconsistency is widely acknowledged but poorly understood. Practitioners note that NPs work well despite the violation, without quantifying what th…
