374767 results (page 1 of 14991)
-
Lyra 2.0: Explorable Generative 3D Worlds
Recent advances in video generation enable a new paradigm for 3D scene creation: generating camera-controlled videos that simulate scene walkthroughs, then lifting them to 3D via feed-forward reconstruction techniques. This generative reconstruction approach combines the visual fidelity and creative capacity of video models with 3D outputs ready for real-time rendering and simulation. Scaling to l…
-
SceneCritic: A Symbolic Evaluator for 3D Indoor Scene Synthesis
Large Language Models (LLMs) and Vision-Language Models (VLMs) increasingly generate indoor scenes through intermediate structures such as layouts and scene graphs, yet evaluation still relies on LLM or VLM judges that score rendered views, making judgments sensitive to viewpoint, prompt phrasing, and hallucination. When the evaluator is unstable, it becomes difficult to determine whether a model …
-
Obscured at the Core: Evidence for Nuclear Dust in Reddened Type-1 AGN
Reddened Type-1 quasars offer a unique window into the structure and evolution of active galactic nuclei (AGN), yet their physical origin and the source of their reddening remain uncertain. Optical surveys often miss these dust-obscured objects, resulting in an incomplete view of the quasar population. In this work, we construct a sample of 6,600 Type-1 quasars at redshifts $0.5 \leq z \leq 1.2$ b…
-
Generative Refinement Networks for Visual Synthesis
While diffusion models dominate the field of visual generation, they are computationally inefficient, applying a uniform computational effort regardless of different complexity. In contrast, autoregressive (AR) models are inherently complexity-aware, as evidenced by their variable likelihoods, but are often hindered by lossy discrete tokenization and error accumulation. In this work, we introduce …
-
Visual Preference Optimization with Rubric Rewards
The effectiveness of Direct Preference Optimization (DPO) depends on preference data that reflect the quality differences that matter in multimodal tasks. Existing pipelines often rely on off-policy perturbations or coarse outcome-based signals, which are not well suited to fine-grained visual reasoning. We propose rDPO, a preference optimization framework based on instance-specific rubrics. For e…
-
Conflated Inverse Modeling to Generate Diverse and Temperature-Change Inducing Urban Vegetation Patterns
Urban areas are increasingly vulnerable to thermal extremes driven by rapid urbanization and climate change. Traditionally, thermal extremes have been monitored using Earth-observing satellites and numerical modeling frameworks. For example, land surface temperature derived from Landsat or Sentinel imagery is commonly used to characterize surface heating patterns. These approaches operate as forwa…
-
CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations
The explosive growth of system logs makes streaming compression essential, yet existing log anomaly detection (LAD) methods incur severe pre-processing overhead by requiring full decompression and parsing. We introduce CLAD, the first deep learning framework to perform LAD directly on compressed byte streams. CLAD bypasses these bottlenecks by exploiting a key insight: normal logs compress into re…
-
Classical and Quantum Speedups for Non-Convex Optimization via Energy Conserving Descent
The Energy Conserving Descent (ECD) algorithm was recently proposed (De Luca & Silverstein, 2022) as a global non-convex optimization method. Unlike gradient descent, appropriately configured ECD dynamics escape strict local minima and converge to a global minimum, making it appealing for machine learning optimization. We present the first analytical study of ECD, focusing on the one-dimensional…
-
Representation geometry shapes task performance in vision-language modeling for CT enterography
Computed tomography (CT) enterography is a primary imaging modality for assessing inflammatory bowel disease (IBD), yet the representational choices that best support automated analysis of this modality are unknown. We present the first study of vision-language transfer learning on abdominal CT enterography and identify two main findings. First, mean pooling of slice embeddings gives better catego…
-
See, Point, Refine: Multi-Turn Approach to GUI Grounding with Visual Feedback
Computer Use Agents (CUAs) fundamentally rely on graphical user interface (GUI) grounding to translate language instructions into executable screen actions, but editing-level grounding in dense coding interfaces, where sub-pixel accuracy is required to interact with dense IDE elements, remains underexplored. Existing approaches typically rely on single-shot coordinate prediction, which lacks a mec…
-
Toward Autonomous Long-Horizon Engineering for ML Research
Autonomous AI research has advanced rapidly, but long-horizon ML research engineering remains difficult: agents must sustain coherent progress across task comprehension, environment setup, implementation, experimentation, and debugging over hours or days. We introduce AiScientist, a system for autonomous long-horizon engineering for ML research built on a simple principle: strong long-horizon perf…
-
PAL: Personal Adaptive Learner
AI-driven education platforms have made some progress in personalisation, yet most remain constrained to static adaptation--predefined quizzes, uniform pacing, or generic feedback--limiting their ability to respond to learners' evolving understanding. This shortfall highlights the need for systems that are both context-aware and adaptive in real time. We introduce PAL (Personal Adaptive Learner), …
-
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet its training dynamics remain poorly understood. This paper provides a systematic investigation of OPD dynamics and mechanisms. We first identify that two conditions govern whether OPD succeeds or fails: (i) the student and teacher should share compatible thinking patterns; and (ii) even with…
-
Learning Versatile Humanoid Manipulation with Touch Dreaming
Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, dexterous hands, and contact-aware perception under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first develop an RL-based whole-body controller that provides stable lower-body and tors…
-
Bilevel Late Acceptance Hill Climbing for the Electric Capacitated Vehicle Routing Problem
This paper tackles the Electric Capacitated Vehicle Routing Problem (E-CVRP) through a bilevel optimization framework that handles routing and charging decisions separately or jointly depending on the search stage. By analyzing their interaction, we introduce a surrogate objective at the upper level to guide the search and accelerate convergence. A bilevel Late Acceptance Hill Climbing algorithm (…
-
Probing Scalar-Tensor-Induced Gravitational Waves in the nHz Band: $\texttt{NANOGrav}$ and SKA
Scalar-induced gravitational waves (SIGWs) have recently attracted considerable interest, both as a possible explanation for the nanohertz signal reported by the Pulsar Timing Array (PTA) collaboration and for their connection with primordial black hole (PBH) physics. In addition to SIGWs, scalar-tensor-induced gravitational waves (STGWs) have emerged as a promising cosmological source of the stoc…
-
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation
On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, standard OPD requires a live teacher inference server throughout training, resulting in substantial infrastructure overhead. In this work, we investigate whether on-policy distillation can be performed offline. A natural approach is to precompute teacher log-probabilities once over S…
-
Nonparametric efficient inference for network quantile causal effects under partial interference
Interference arises when the treatment assigned to one individual affects the outcomes of other individuals. Commonly, individuals are naturally grouped into clusters, and interference occurs only among individuals within the same cluster, a setting referred to as partial interference. We study network causal effects on outcome quantiles in the presence of partial interference. We develop a genera…
-
One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness
Instruction-tuned large language models produce helpful, structured responses, but how robust is this helpfulness when trivially constrained? We show that simple lexical constraints (banning a single punctuation character or common word) cause instruction-tuned LLMs to collapse their responses, losing 14--48% of comprehensiveness in pairwise evaluation across three open-weight model families and o…
-
XRZero-G0: Pushing the Frontier of Dexterous Robotic Manipulation with Interfaces, Quality and Ratios
The acquisition of high-quality, action-aligned demonstration data remains a fundamental bottleneck in scaling foundation models for dexterous robot manipulation. Although robot-free human demonstrations (e.g., the UMI paradigm) offer a scalable alternative to traditional teleoperation, current systems are constrained by sub-optimal hardware ergonomics, open-loop workflows, and a lack of systemati…
-
Cosmologically viable non-polynomial quasi-topological gravity: explicit models, $Λ$CDM limit and observational constraints
We investigate the cosmological implications of non-polynomial quasi-topological gravity (NPQTG), a novel class of modified gravitational theories in which the background dynamics is encoded in a single function of the Hubble parameter. This framework provides a minimal and theoretically consistent extension of general relativity, incorporating higher-curvature effects while preserving second-orde…
-
How I Wonder What You Are -- JWST's Little Red Dots do not TWINKLE
Little Red Dots (LRDs) are a population of compact, red sources that have emerged as one of the most puzzling findings of JWST. Variability provides a direct probe of their central engines. Here we present the first joint spectroscopic and photometric time-domain study of LRDs undertaken with the JWST TWINKLE slitless spectroscopy program. Surveying the FRESCO GOODS-North legacy field, TWINKLE mon…
-
Agentic Discovery with Active Hypothesis Exploration for Visual Recognition
We introduce HypoExplore, an agentic framework that formulates neural architecture discovery for visual recognition as a hypothesis-driven scientific inquiry. Given a human-specified high-level research direction, HypoExplore ideates, implements, evaluates, and improves neural architectures through evolutionary branching. New hypotheses are created using a large language model by selecting a paren…
-
Personalizing LLM-Based Conversational Programming Assistants
Large Language Models (LLMs) have shown much promise in powering a variety of software engineering (SE) tools. Offering natural language as an intuitive interaction mechanism, LLMs have recently been employed as conversational ``programming assistants'' capable of supporting several SE activities simultaneously. As with any SE tool, it is crucial that these assistants effectively meet developers' …
-
PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models
Large Language Models (LLMs) are increasingly integrated into real-world decision-making, including in the domain of public policy. Yet, their ability to comprehend and reason about policy-related content remains underexplored. To fill this gap, we present \textbf{\textit{PolicyBench}}, the first large-scale cross-system benchmark (US-China) evaluating policy comprehension, comprising 21K cases ac…