-
Environmental Sound Deepfake Detection Using Deep-Learning Framework
In this paper, we propose a deep-learning framework for environmental sound deepfake detection (ESDD), the task of identifying whether the sound scene and sound event in an input audio recording are fake. To this end, we conducted extensive experiments to explore how individual spectrograms, a wide range of network architectures and pre-trained models, ensemble of spectrograms or network a…
-
A Possible Protocluster of Galaxies Serendipitously Discovered in the Field of an Intermediate-Redshift Post-starburst Galaxy
We present the serendipitous discovery of an overdensity of submillimeter galaxies (SMGs) in the field of SDSSJ0909-0108, a massive z~0.7 post-starburst galaxy from the SQuIGGLE survey. ALMA observations at 870 μm and 2 mm reveal six galaxies within a 35'' region with flux ratios consistent with emission from dust. Given the rarity of 870 μm sources and the small field of view of ALMA, we speculate t…
-
CoCo-SAM3: Harnessing Concept Conflict in Open-Vocabulary Semantic Segmentation
SAM3 advances open-vocabulary semantic segmentation by introducing a prompt-driven mask generation paradigm. However, in multi-class open-vocabulary scenarios, masks generated independently from different category prompts lack a unified and inter-class comparable evidence scale, often resulting in overlapping coverage and unstable competition. Moreover, synonymous expressions of the same concept t…
-
Multiscale Assessment of Tritium Behavior in Preliminary Fusion Pilot Plant Design Using Surrogate Models in TMAP8
The complexity and significance of multiscale phenomena in fusion energy systems make advanced modeling necessary for designing, optimizing, and safely deploying fusion plants. Tritium accountancy is one of those challenges for deuterium-tritium fusion systems. Its availability is constrained by its short half-life (12.33 years) and limited natural abundance, which require fusion plants to breed t…
-
The signal is the ceiling: Measurement limits of LLM-predicted experience ratings from open-ended survey text
An earlier paper (Hong, Potteiger, and Zapata 2026) established that an unoptimized GPT 4.1 prompt predicts fan-reported experience ratings within one point 67% of the time from open-ended survey text. This paper tests the relative impact of prompt design and model selection on that performance. We compared four configurations on approximately 10,000 post-game surveys from five MLB teams: the orig…
-
A Gesture-Based Visual Learning Model for Acoustophoretic Interactions using a Swarm of AcoustoBots
AcoustoBots are mobile acoustophoretic robots capable of delivering mid-air haptics, directional audio, and acoustic levitation, but existing implementations rely on scripted commands and lack an intuitive interface for real-time human control. This work presents a gesture-based visual learning framework for contactless human-swarm interaction with a multimodal AcoustoBot platform. The system comb…
-
Micro Language Models Enable Instant Responses
Edge devices such as smartwatches and smart glasses cannot continuously run even the smallest 100M-1B parameter language models due to power and compute constraints, yet cloud inference introduces multi-second latencies that break the illusion of a responsive assistant. We introduce micro language models ($\mu$LMs): ultra-compact models (8M-30M parameters) that instantly generate the first 4-8 words…
-
Regulation Zero 2: A Flow-Centric Sequential Regulation Planning Framework to Counter Regulation Cascading in Pre-tactical Air Traffic Flow Management
Air Traffic Flow Management (ATFM) traffic regulations are being increasingly used as rising demand meets persistent workforce shortages. This operational strain has amplified a critical phenomenon that we call regulation cascading: the compounding, non-linear interactions that occur when multiple regulations influence one another in unpredictable ways. As the number and complexity of regul…
-
Safety-Critical Contextual Control via Online Riemannian Optimization with World Models
Modern world models are becoming too complex to admit explicit dynamical descriptions. We study safety-critical contextual control, where a Planner must optimize a task objective using only feasibility samples from a black-box Simulator, conditioned on a context signal $\xi_t$. We develop a sample-based Penalized Predictive Control (PPC) framework grounded in online Riemannian optimization, in which…
-
SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models
Multimodal Large Language Models are increasingly adopted as autonomous agents in interactive environments, yet their ability to proactively address safety hazards remains insufficient. We introduce SafetyALFRED, built upon the embodied agent benchmark ALFRED, augmented with six categories of real-world kitchen hazards. While existing safety evaluations focus on hazard recognition through disembod…
-
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
Synthesizing human-object interaction (HOI) videos has broad practical value in e-commerce, digital advertising, and virtual marketing. However, current diffusion models, despite their photorealistic rendering capability, still frequently fail on (i) the structural stability of sensitive regions such as hands and faces and (ii) physically plausible contact (e.g., avoiding hand-object interpenetr…
-
Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model
While generative models have set new benchmarks for Target Speaker Extraction (TSE), their inherent reliance on global context precludes deployment in real-time applications. Direct adaptation to streaming scenarios often leads to catastrophic inference performance degradation due to the severe mismatch between training and streaming inference. To bridge this gap, we present the first autoregressi…
-
Time Series Augmented Generation for Financial Applications
Evaluating the reasoning capabilities of Large Language Models (LLMs) for complex, quantitative financial tasks is a critical and unsolved challenge. Standard benchmarks often fail to isolate an agent's core ability to parse queries and orchestrate computations. To address this, we introduce a novel evaluation methodology and benchmark designed to rigorously measure an LLM agent's reasoning for fi…
-
CreatiParser: Generative Image Parsing of Raster Graphic Designs into Editable Layers
Graphic design images consist of multiple editable layers, such as text, background, and decorative elements, while most generative models produce rasterized outputs without explicit layer structures, limiting downstream editing. Existing graphic design parsing methods typically rely on multi-stage pipelines combining layout prediction, matting, and inpainting, which suffer from error accumulation…
-
MOSA: Motion-Guided Semantic Alignment for Dynamic Scene Graph Generation
Dynamic Scene Graph Generation (DSGG) aims to structurally model objects and their dynamic interactions in video sequences for high-level semantic understanding. However, existing methods struggle with fine-grained relationship modeling, semantic representation utilization, and the ability to model tail relationships. To address these issues, this paper proposes a motion-guided semantic alignment …
-
Adding Compilation Metadata To Binaries To Make Disassembly Decidable
The binary executable format is the standard method for distributing and executing software. Yet, it is also as opaque a representation of software as can be. If the binary format were augmented with metadata that provides security-relevant information, such as which data is intended by the compiler to be executable instructions, or how memory regions are expected to be bounded, that would dramati…
-
Odour sensing in turbulent plumes with high-speed electronic nose and non-invasive ground truth
Chemical sensing in real-world environments requires resolving rapidly fluctuating and spatially heterogeneous concentration fields. However, these dynamics are strongly distorted by widely used, low-cost metal-oxide (MOx) gas sensors, whose thermal and surface-kinetic response acts as a low-pass filter on the underlying concentration signal. Quantifying and compensating for these effects remains …
-
GRAFT: Geometric Refinement and Fitting Transformer for Human Scene Reconstruction
Reconstructing physically plausible 3D human-scene interactions (HSI) from a single image currently presents a trade-off: optimization-based methods offer accurate contact but are slow (~20s), while feed-forward approaches are fast yet lack explicit interaction reasoning, producing floating and interpenetration artifacts. Our key insight is that geometry-based human-scene fitting can be amortiz…
-
SAGE: Training-Free Semantic Evidence Composition for Edge-Cloud Inference under Hard Uplink Budgets
Edge-cloud hybrid inference offloads difficult inputs to a powerful remote model, but the uplink channel imposes hard per-request constraints on the number of bits that can be transmitted. We show that selecting transmitted content based solely on attention-based importance, the standard approach in collaborative inference, is inherently limited under hard budgets. Two findings support this claim.…
-
The "Small World of Words" German Free-Association Norms
Free-association norms provide essential empirical data for investigating linguistic, semantic, and cultural phenomena in the cognitive sciences. Although large-scale norms exist for languages such as English, Dutch, Spanish, and Mandarin Chinese, no comparable resource has been available for German. To address this gap, we present free-association norms for 5,877 German cue words as part of the G…
-
Autonomous UAV Pipeline Near-proximity Inspection via Disturbance-Aware Predictive Visual Servoing
Reliable pipeline inspection is critical to safe energy transportation, but is constrained by long distances, complex terrain, and risks to human inspectors. Unmanned aerial vehicles provide a flexible sensing platform, yet reliable autonomous inspection remains challenging. This paper presents an autonomous quadrotor near-proximity pipeline inspection framework for three-dimensional scenarios bas…
-
Goal-Oriented Semantic Communication for Logical Decision Making
This paper develops a principled foundation for goal-oriented semantic communication for logical decision-making. Consider a setting where autonomous agents engage in collaborative perception. In such settings, the volume of sensory data and limited bandwidth often make transmission of raw observations infeasible, requiring intelligent selection of task-relevant information. Because these scenario…
-
MG-NECOLA: A Field-Level Emulator for $f(R)$ Gravity and Massive Neutrino Cosmologies
Accurate modeling of non-linear gravitational dynamics is essential for constraining extensions to the standard cosmological model using large-scale structure observations. While high-resolution $N$-body simulations provide the required fidelity, they are computationally prohibitive for the large ensembles needed to analyze Modified Gravity (MG) scenarios. We present MG-NECOLA, a field-level emula…
-
Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding
Transformers have become a common foundation across deep learning, yet 3D scene understanding still relies on specialized backbones with strong domain priors. This keeps the field isolated from the broader Transformer ecosystem, limiting the transfer of new advances as well as the benefits of increasingly optimized software and hardware stacks. To bridge this gap, we adapt the vanilla Transformer …
-
Are X-ray Atmospheres Heated by Turbulent Dissipation? XRISM Constraints
We evaluate whether dissipation of turbulence injected into hot cluster atmospheres by jets and bubbles can offset radiative cooling flows. No trends are found between atmospheric velocity dispersion, $\sigma_v$, and either the ratio of kinetic to thermal energy or jet power over nearly four decades of jet power. Apparently, jets disperse their energy gently at roughly constant energy per gram of gas. …