1273993 results (page 106 of 50960)
-
Resource-Lean Lexicon Induction for German Dialects
Automatic induction of high-quality dictionaries is essential for building lexical resources, yet low-resource languages and dialects pose several challenges: limited access to annotators, high degree of spelling variations, and poor performance of large language models (LLMs). We empirically show that statistical models (random forests) trained on string similarity features are surprisingly effec…
-
A GLIMPSE of the 99%: a census of the faintest galaxies during the epoch reionization and its implications for galaxy formation models
We present a comprehensive study of the galaxy UV luminosity function (UVLF) at $z=6-9$ leveraging deep JWST observations from the GLIMPSE survey. Thanks to gravitational lensing, we probe the UVLF to an unprecedented depth of $M_{\text{UV}} = -12$ mag, approximately three magnitudes deeper than previous robust constraints. Our UVLF determination incorporates a rigorous end-to-end uncertainty fram…
-
KISS Sorcar: A Stupidly-Simple General-Purpose and Software Engineering AI Assistant
Large language models can generate code and call tools with remarkable fluency, yet deploying them as practical software engineering assistants still expose stubborn gaps: finite context windows, single mistakes that derail entire sessions, agents that get stuck in dead ends, AI slop, and generated changes that are difficult to review or revert. We present KISS Sorcar, a general-purpose assistan…
-
Accelerating Quantum Materials Characterization: Hybrid Active Learning for Autonomous Spin Wave Spectroscopy
Autonomous neutron spectroscopy must solve three distinct tasks: detection (where is the signal?), inference (which Hamiltonian governs it?), and refinement (what are the parameters?). No single controller solves all three equally well. We present TAS-AI, a hybrid agnostic-to-physics-informed framework for autonomous triple-axis spin-wave spectroscopy that separates these tasks explicitly. In blin…
-
On the Generalization Properties of Selective State-Space Models for Filtering Tasks for Unknown Systems
Selective State-Space Models (SSMs) such as Mamba have emerged as an alternative architecture to self-attention based transformers in sequence modeling tasks. Recent works have demonstrated the use of transformers in some filtering and output prediction tasks via in-context learning. In this paper, we analyze whether structured SSMs can work equally well for filtering of unknown systems. In partic…
-
FUTURAL: A Metasearch Platform for Empowering Rural Areas with Smart Solutions
The FUTURAL project aims to provide a comprehensive suite of digital Smart Solutions (SS) across five critical domains to address pressing social and environmental issues. Central to this initiative is a robust Metasearch platform, which will not only serve as the primary access point to FUTURAL's solutions but also facilitate the search and retrieval of SS developed by other initiatives. This pap…
-
Query2Diagram: Answering Developer Queries with UML Diagrams
Software documentation frequently becomes outdated or fails to exist entirely, yet developers need focused views of their codebase to understand complex systems. While automated reverse engineering tools can generate UML diagrams from code, they produce overwhelming detail without considering developer intent. We introduce query-driven UML diagram generation, where LLMs create diagrams that direct…
-
DRACULA: Hunting for the Actions Users Want Deep Research Agents to Execute
Scientific Deep Research (DR) agents answer user queries by synthesizing research papers into multi-section reports. User feedback can improve their utility, but existing protocols only score the final report, making it hard to study and learn which intermediate actions DR agents should take to improve reports. We collect DRACULA, the first dataset with user feedback on intermediate actions for DR…
-
Mapping License Plate Recoverability Under Extreme Viewing Angles for Oppor-tunistic Urban Sensing
Urban environments contain many imaging sensors built for specific purposes, including ATM, body-worn, CCTV, and dashboard cameras. Under the opportunistic sensing paradigm, these sensors can be repurposed for secondary inference tasks such as license plate recognition. Yet objects of interest in such imagery are often noisy, low-resolution, and captured from extreme viewpoints. Recent advances in…
-
ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction
Multimodal Large Language Models (MLLMs) have achieved remarkable performance in Visually Rich Document Understanding (VRDU) tasks, but their capabilities are mainly evaluated on pristine, well-structured document images. We consider content restoration from shredded fragments, a challenging VRDU setting that requires integrating visual pattern recognition with semantic reasoning under significant…
-
SeqShield: A Behavioral Analysis Approach to Uncover Rootkits
Rootkits are among the most elusive types of malware, capable of bypassing traditional static analysis methods due to their metamorphic behavior. Signature-based detection techniques struggle against these threats, necessitating a shift toward dynamic analysis approaches. We propose SeqShield, a behavior-based rootkit detection approach designed specifically for the Windows OS, leveraging API call…
-
Similar Users-Augmented Interest Network
Click-through rate (CTR) prediction is one of the core tasks in recommender systems. User behavior sequences, as one of the most effective features, can accurately reflect user preferences and significantly improve prediction accuracy. Richer behavior sequences often enable more comprehensive user profiling, and recent studies have shown that scaling the length of user behavior sequence can yield …
-
LegalDrill: Diagnosis-Driven Synthesis for Legal Reasoning in Small Language Models
Small language models (SLMs) are promising for real-world deployment due to their efficiency and low operational cost. However, their limited capacity struggles with high-stakes legal reasoning tasks that require coherent statute interpretation and logically consistent deduction. Furthermore, training SLMs for such tasks demands high-quality, concise reasoning trajectories, which are prohibitively…
-
Compile-Time Tensor Shape Checking via Staged Shape-Dependent Types
When writing programs involving matrices or tensors in general, it is desirable to rule out the inconsistency of tensor shapes (i.e., the generalization of matrix sizes) before actual computation. For this purpose, some languages provide dependent types such as Mat m n, and others offer refinement types to track predicates for shapes. Despite the theoretical maturity, however, such methods are oft…
-
Symmetric Equilibrium Propagation for Thermodynamic Diffusion Training
The reverse process in score-based diffusion models is formally equivalent to overdamped Langevin dynamics in a time-dependent energy landscape. In our prior work we showed that a bilinearly-coupled analog substrate can physically realize this dynamics at a projected three-to-four orders of magnitude energy advantage over digital inference by replacing dense skip connections with low-rank inter-mo…
-
Reparameterization through Coverings and Topological Weight Priors
We generalise the reparameterization trick applied in variational autoencoders (VAEs) letting these have latent spaces of non-trivial topology - i.e. that of base manifolds covered with other ones, on which some technique for RT is available. That is possible since covering maps are measurable - moreover, in case of particular measure preservation property holding for the covering, one can establi…
-
Bringing a Personal Point of View: Evaluating Dynamic 3D Gaussian Splatting for Egocentric Scene Reconstruction
Egocentric video provides a unique view into human perception and interaction, with growing relevance for augmented reality, robotics, and assistive technologies. However, rapid camera motion and complex scene dynamics pose major challenges for 3D reconstruction from this perspective. While 3D Gaussian Splatting (3DGS) has become a state-of-the-art method for efficient, high-quality novel view syn…
-
EndoGov: A knowledge-governed multi-agent expert system for endometrial cancer risk stratification
Multimodal artificial intelligence models for endometrial cancer (EC) risk stratification typically optimize aggregate predictive performance but provide limited mechanisms for enforcing mandatory guideline overrides, such as assigning POLE-mutated tumors to the low-risk group despite high-grade morphology. We present EndoGov, a two-tier multi-agent expert system that factorizes the decision proce…
-
Domain Fine-Tuning vs. Retrieval-Augmented Generation for Medical Multiple-Choice Question Answering: A Controlled Comparison at the 4B-Parameter Scale
Practitioners deploying small open-weight large language models (LLMs) for medical question answering face a recurring design choice: invest in a domain-fine-tuned model, or keep a general-purpose model and inject domain knowledge at inference time via retrieval-augmented generation (RAG). We isolate this trade-off by holding model size, prompt template, decoding temperature, retrieval pipeline, a…
-
Causal Representation Learning from General Environments under Nonparametric Mixing
Causal representation learning aims to recover the latent causal variables and their causal relations, typically represented by directed acyclic graphs (DAGs), from low-level observations such as image pixels. A prevailing line of research exploits multiple environments, which assume how data distributions change, including single-node interventions, coupled interventions, or hard interventions, o…
-
VitaminP: cross-modal learning enables whole-cell segmentation from routine histology
Accurate whole-cell and nuclear segmentation is essential for precision pathology and spatial omics, yet routine hematoxylin and eosin (H&E) staining provides limited cytoplasmic contrast, restricting analyses to nuclei. Multiplex immunofluorescence (mIF) facilitates precise whole-cell delineation but remains constrained by cost and accessibility. We introduce VitaminP, a cross-modal learning fram…
-
ELSA: Exact Linear-Scan Attention for Fast and Memory-Light Vision Transformers
Existing attention accelerators often trade exact softmax semantics, depend on fused Tensor Core kernels, or incur sequential depth that limits FP32 throughput on long sequences. We present \textbf{ELSA}, an algorithmic reformulation of online softmax attention that (i)~preserves exact softmax semantics in real arithmetic with a \emph{provable} $\mathcal{O}(u\log n)$ FP32 relative error bound; (ii…
-
Optimizing Information Freshness for Wireless Local Area Networks with Multiple APs
Dense indoor WLANs increasingly rely on multiple access points (APs) operating over partially overlapping spectrum to support latency-sensitive applications. In such deployments, simultaneous transmissions across APs create co-channel and adjacent-channel interference, making scheduling decisions interdependent and directly impacting information freshness. Motivated by emerging software-defined WL…
-
LLM-CEG: Extending the Classification Error Gauge Framework for Privacy Auditing of Large Language Models
This paper extends the Classification Error Gauge (x-CEG) framework, originally developed for measuring the privacy-utility trade-off in tabular datasets, to privacy auditing of Large Language Models (LLMs). We propose LLM-CEG, a systematic framework that employs membership inference attack (MIA) success rates as an empirical privacy gauge and model perplexity as a utility gauge, iteratively adjus…
-
SCAT Data Release 1: 1810 optical spectra of 1330 transients
We present the first data release (DR1) of the Spectroscopic Classification of Astronomical Transients (SCAT) survey, covering the first $\approx 5$ years of observations (March 2018 - January 2023). DR1 includes 1810 spectra of 1330 transients, which we sort into broad spectroscopic classes including supernovae (SNe), transients originating in galactic nuclei, and stellar variability. We collect …