-
Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation
Reinforcement learning, particularly Group Relative Policy Optimization (GRPO), has emerged as an effective framework for post-training visual generative models with human preference signals. However, its effectiveness is fundamentally limited by coarse reward credit assignment. In modern visual generation, multiple reward models are often used to capture heterogeneous objectives, such as visual q…
-
Adaptive Slicing-Assisted Hyper Inference for Enhanced Small Object Detection in High-Resolution Imagery
Deep learning-based object detectors have achieved remarkable success across numerous computer vision applications, yet they continue to struggle with small object detection in high-resolution aerial and satellite imagery, where dense object distributions, variable shooting angles, diminutive target sizes, and substantial inter-class variability pose formidable challenges. Existing slicing strateg…
-
Reliable Remote Inference from Unreliable Components: Joint Communication and Computation Limits
Classical information theory typically assumes reliable receiver-side processing. We study remote inference when communication is noisy and the receiver itself is built from unreliable components under a finite redundancy budget. Under a committed/no-bypass receiver closure, task-relevant information can affect the final estimate only by passing through a budgeted collection of vulnerable primitiv…
-
Preconditioners for the Onsager-Stefan-Maxwell equations for multicomponent diffusion
The Onsager-Stefan-Maxwell (OSM) equations are an important model of mass transport in multicomponent flows with multiple chemical species. They describe the coupling of diffusive fluxes between species, accounting for their interactions through frictional and thermodynamic driving forces. In this work we propose an augmented Lagrangian preconditioner and prove its discretization-robustness for a …
-
Exact Quadratic Penalty Function for Symplectic Eigenvalue Problem
The symplectic eigenvalue problem for symmetric positive-definite (spd) matrices plays a crucial role in various scientific fields, including quantum mechanics and control theory. This paper introduces a trace-penalty minimization method, which transforms the symplectic eigenvalue problem into the unconstrained minimization of the trace-penalty function. We prove the equivalence between the penalt…
-
iCoRe: An Iterative Correlation-Aware Retriever for Bug Reproduction Test Generation
Automatically generating bug reproduction tests (BRT) from issue descriptions is crucial for software maintenance. LLM-based approaches have shown great potential for this task. Their effectiveness heavily relies on retrieving high-quality context from the codebase. The retrieval phase of existing approaches relies on either traditional methods like BM25 or LLM-driven strategies. LLM-based retriev…
-
UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction
Full-duplex speech interaction, as the most natural and intuitive mode of human communication, is driving artificial intelligence toward more human-like conversational systems. Traditional cascaded speech processing pipelines suffer from critical limitations, including accumulated latency, information loss, and error propagation across modules. To address these issues, recent efforts focus on the …
-
Thinking Before Matching: A Reinforcement Reasoning Paradigm Towards General Person Re-Identification
Learning identity-discriminative representations with multi-scene generality has become a critical objective in person re-identification (ReID). However, mainstream perception-driven paradigms tend to fit identities from massive annotated data rather than understand identity-causal cues, which yields representations that are fragile under multiple disruptions. In this work, ReID-R is proposed as …
-
Sherpa.ai Privacy-Preserving Multi-Party Entity Alignment without Intersection Disclosure for Noisy Identifiers
Federated Learning (FL) enables collaborative model training among multiple parties without centralizing raw data. There are two main paradigms in FL: Horizontal FL (HFL), where all participants share the same feature space but hold different samples, and Vertical FL (VFL), where parties possess complementary features for the same set of samples. A prerequisite for VFL training is privacy-preservi…
-
Attention-based Multi-modal Deep Learning Model of Spatio-temporal Crop Yield Prediction with Satellite, Soil and Climate Data
Crop yield prediction is one of the most important challenges and is crucial to world food security and policy-making decisions. Conventional forecasting techniques are limited in accuracy because they rely on static data sources that do not reflect the dynamic and intricate relationships among environmental variables over time [5,13]. This…
-
An Object-Centered Data Acquisition Method for 3D Gaussian Splatting using Mobile Phones
Data acquisition through mobile phones remains a challenge for 3D Gaussian Splatting (3DGS). In this work we target the object-centered scenario and enable reliable mobile acquisition by providing on-device capture guidance and recording onboard sensor signals for offline reconstruction. After the calibration step, the device orientations are aligned to a baseline frame to obtain relative poses, a…
-
Conceptual Design and Analysis of a NanoTug Swarm for Active Debris Removal
This paper investigates a swarm-based concept in which a number of nanosatellites, referred to as NanoTugs, are deployed by a mother spacecraft to capture and cooperatively stabilize and de-orbit space debris. The study focuses on the stabilization and de-orbiting phases of the mission, where each NanoTug is equipped with thrusters to perform the de-orbiting maneuver. An analytical method is devel…
-
The Logical Expressiveness of Topological Neural Networks
Graph neural networks (GNNs) are the standard for learning on graphs, yet they have limited expressive power, often expressed in terms of the Weisfeiler-Leman (WL) hierarchy or within the framework of first-order logic. In this context, topological neural networks (TNNs) have recently emerged as a promising alternative for graph representation learning. By incorporating higher-order relational str…
-
ClawNet: Human-Symbiotic Agent Network for Cross-User Autonomous Cooperation
Current AI agent frameworks have made remarkable progress in automating individual tasks, yet all existing systems serve a single user. Human productivity rests on the social and organizational relationships through which people coordinate, negotiate, and delegate. When agents move beyond performing tasks for one person to representing that person in collaboration with others, the infrastructure f…
-
VLTI-GRAVITY observations of blazars
Parsec-scale jets of blazars have so far been spatially resolved only in mm- and submm wavelengths, where very long baseline interferometry can be used to obtain milliarcsecond-scale images of the jets. We have attempted to spatially resolve the near-infrared emission in jet-dominated blazars for the first time. We used the VLTI-GRAVITY instrument to obtain milliarcsecond-scale near-infrared inter…
-
Forage V2: Knowledge Evolution and Transfer in Autonomous Agent Organizations
Autonomous agents operating in open-world tasks -- where the completion boundary is not given in advance -- face denominator blindness: they systematically underestimate the scope of the target space. Forage V1 addressed this through co-evolving evaluation (an independent Evaluator discovers what "complete" means) and method isolation (Evaluator and Planner cannot see each other's code). V2 extend…
-
Audio Spoof Detection with GaborNet
One direction of development in the extraction of features from audio signals is based on processing raw samples in the time domain. Such an approach appears to be effective, especially in the era of neural networks. An example is SincNet. In this solution, the core of the neural network layer is a set of sinc functions that are convolved with the input signal. Due to the finite length of sinc func…
-
When Can We Trust Deep Neural Networks? Towards Reliable Industrial Deployment with an Interpretability Guide
The deployment of AI systems in safety-critical domains, such as industrial defect inspection, autonomous driving, and medical diagnosis, is severely hampered by their lack of reliability. A single undetected erroneous prediction can lead to catastrophic outcomes. Unfortunately, there is often no alternative but to place trust in the outputs of a trained AI system, which operates without an intern…
-
Demonstrating Online Schema Alignment in Decentralized Knowledge Graphs Querying
Decentralized Knowledge Graphs querying enables integrating distributed data without centralization, but is highly sensitive to vocabulary heterogeneity. Query issuers cannot realistically anticipate all vocabulary mismatches, especially when alignment rules are local, scoped, or discovered at runtime. We present an online schema alignment approach for Link Traversal Query Processing (LTQP) that d…
-
Auditing LLMs for Algorithmic Fairness in Casenote-Augmented Tabular Prediction
LLMs are increasingly being considered for prediction tasks in high-stakes social service settings, but their algorithmic fairness properties in this context are poorly understood. In this short technical report, we audit the algorithmic fairness of LLM-based tabular classification on a real housing placement prediction task, augmented with street outreach casenotes from a nonprofit partner. We au…
-
SketchFaceGS: Real-Time Sketch-Driven Face Editing and Generation with Gaussian Splatting
3D Gaussian representations have emerged as a powerful paradigm for digital head modeling, achieving photorealistic quality with real-time rendering. However, intuitive and interactive creation or editing of 3D Gaussian head models remains challenging. Although 2D sketches provide an ideal interaction modality for fast, intuitive conceptual design, they are sparse, depth-ambiguous, and lack high-f…
-
Cascaded Code Editing: Large-Small Model Collaboration for Effective and Efficient Code Editing
Code editing constitutes a fundamental practice in software development, wherein developers modify existing codebases according to natural language requirements. Accurate code editing necessitates a comprehensive understanding of both the existing codebase and the modification requirements. Although large language models (LLMs) have demonstrated promising performance in code editing tasks, they su…
-
Cosmological constraints on TeV-scale dark matter subcomponents decaying between recombination and reionisation
The Dark Ages and the Cosmic Dawn are an untapped well of information about the particle physics properties of dark matter, which may become accessible with future radio telescopes able to probe the 21-cm signal from atomic hydrogen. In this work we study the impact on cosmological observables of a dark matter subcomponent composed of TeV-scale particles that decay into electrons, photons or neutr…
-
Benchmarking Vision Foundation Models for Domain-Generalizable Face Anti-Spoofing
Face Anti-Spoofing (FAS) remains challenging due to the requirement for robust domain generalization across unseen environments. While recent trends leverage Vision-Language Models (VLMs) for semantic supervision, these multimodal approaches often demand prohibitive computational resources and exhibit high inference latency. Furthermore, their efficacy is inherently limited by the quality of the u…
-
How Far Are Video Models from True Multimodal Reasoning?
Despite remarkable progress toward general-purpose video models, a critical question remains unanswered: how far are these models from achieving true multimodal reasoning? Existing benchmarks fail to address this question rigorously, as they remain constrained by straightforward task designs and fragmented evaluation metrics that neglect complex multimodal reasoning. To bridge this gap, we introdu…