-
Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps
We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model (LLM) agents perform the core SOC analyst task of threat hunting: given a database of raw Windows event logs with no guided questions or hints, identify the exact timestamps of malicious events. The benchmark wraps 106 real attack procedures from the OTRF Security-Datasets corpus - spanning 86 MITRE…
-
BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps
Tokenizing music to fit the general framework of language models is a compelling challenge, especially considering the diverse symbolic structures in which music can be represented (e.g., sequences, grids, and graphs). To date, most approaches tokenize symbolic music as sequences of musical events, such as onsets, pitches, time shifts, or compound note events. This strategy is intuitive and has pr…
-
Hypergraph Mining via Proximity Matrix
Hypergraphs serve as an effective tool widely adopted to characterize higher-order interactions in complex systems. The most intuitive and commonly used mathematical instrument for representing a hypergraph is the incidence matrix, in which each entry is binary, indicating whether the corresponding node belongs to the corresponding hyperedge. Although the incidence matrix has become a foundational…
-
Calibrating Scientific Foundation Models with Inference-Time Stochastic Attention
Transformer-based scientific foundation models are increasingly deployed in high-stakes settings, but current architectures give deterministic outputs and provide limited support for calibrated predictive uncertainty. We propose Stochastic Attention, a lightweight inference-time modification that randomizes attention by replacing softmax weights with normalized multinomial samples controlled by a …
-
Cosmic evolution of the [CII]-to-molecular gas relation
The [CII] 158 $μ$m line is widely used to trace star formation and the gas contents of high-redshift galaxies. However, it remains unclear under which physical conditions it reliably traces the molecular reservoir, and whether a unique conversion factor $α_{\rm [CII]}$ can be applied across cosmic time. We investigate the evolution of the relation between the [CII] luminosity and molecular gas mas…
-
Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments
This technical note revisits the relationship between RaBitQ and TurboQuant under a unified comparison framework. We compare the two methods in terms of methodology, theoretical guarantees, and empirical performance, using a reproducible, transparent, and symmetric setup. Our results show that, despite the claimed advantage of TurboQuant, TurboQuant does not provide a consistent improvement over R…
-
Evaluating LLM-Generated Obfuscated XSS Payloads for Machine Learning-Based Detection
Cross-site scripting (XSS) remains a persistent web security vulnerability, especially because obfuscation can change the surface form of a malicious payload while preserving its behavior. These transformations make it difficult for traditional and machine learning-based detection systems to reliably identify attacks. Existing approaches for generating obfuscated payloads often emphasize syntactic…
-
Revac: A Social Deduction Reasoning Agent
Social deduction games such as Mafia present a unique AI challenge: players must reason under uncertainty, interpret incomplete and intentionally misleading information, evaluate human-like communication, and make strategic elimination decisions. Unlike deterministic board games, success in Mafia depends not on perfect information or brute-force search, but on inference, memory, and adaptability i…
-
GenerativeMPC: VLM-RAG-guided Whole-Body MPC with Virtual Impedance for Bimanual Mobile Manipulation
Bimanual mobile manipulation requires a seamless integration between high-level semantic reasoning and safe, compliant physical interaction - a challenge that end-to-end models approach opaquely and classical controllers lack the context to address. This paper presents GenerativeMPC, a hierarchical cyber-physical framework that explicitly bridges semantic scene understanding with physical control …
-
Singularities in phase separation models: a spectral element approach for the nonlocal Cahn-Hilliard equation
The nonlocal Cahn-Hilliard equation provides a natural extension of the classical model for phase separation by incorporating long-range interactions through a singular convolution kernel. While this formulation admits a rich existence and regularity theory, its numerical approximation remains challenging: discretisation of the nonlocal term leads to dense operators, and the singularity of the ker…
-
SimDiff: Depth Pruning via Similarity and Difference
Depth pruning improves the deployment efficiency of large language models (LLMs) by identifying and removing redundant layers. A widely accepted standard for this identification process is to measure the similarity between layers using cosine distance. However, we find that methods relying solely on this one-dimensional heuristic can exhibit unpredictable performance and even catastrophic collapse…
-
Accelerating Optimization and Machine Learning through Decentralization
Decentralized optimization enables multiple devices to learn a global machine learning model while each individual device only has access to its local dataset. By avoiding the need for training data to leave individual users' devices, it enhances privacy and scalability compared to conventional centralized learning, where all data has to be aggregated to a central server. However, decentralized op…
-
PRADAS: PRior-Assisted DAta Splitting for False Discovery Rate Control
In the FDR-controlling literature, mirror statistics offer a flexible alternative to $p$-value based procedures. When prior information is available, however, it is unclear how to incorporate mirror statistics in a principled way, and the standard equal split used by data-splitting methods can be inefficient. In this paper, we characterize a broader class of mirror statistics for any fixed splitti…
-
From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning
Generative engines (GEs) are reshaping information access by replacing ranked links with citation-grounded answers, yet current Generative Engine Optimization (GEO) methods optimize each instance in isolation, unable to accumulate or transfer effective strategies across tasks and engines. We reframe GEO as a strategy learning problem and propose MAGEO, a multi-agent framework in which coordinated …
-
Constructive Approaches to Perception-Aware Lossy Source Coding: Information-Theoretic Guidelines
Perception-aware lossy source coding has attracted significant recent interest. It augments the classical distortion criterion with an explicit perception constraint, thereby enabling more refined control over fidelity and perceptual quality. Despite rapid progress, the diversity of rate-distortion-perception formulations and their underlying assumptions remains poorly understood by many practitio…
-
When Graph Structure Becomes a Liability: A Critical Re-Evaluation of Graph Neural Networks for Bitcoin Fraud Detection under Temporal Distribution Shift
The consensus that GCN, GraphSAGE, GAT, and EvolveGCN outperform feature-only baselines on the Elliptic Bitcoin Dataset is widely cited but has not been rigorously stress-tested under a leakage-free evaluation protocol. We perform a seed-matched inductive-versus-transductive comparison and find that this consensus does not hold. Under a strictly inductive protocol, Random Forest on raw features ac…
-
Defining Robust Ultrasound Quality Metrics via an Ultrasound Foundation Model
Clinicians lack a principled framework to quantify diagnostic utility in ultrasound reconstructions. Existing standards like PSNR and VGG-LPIPS are inadequate, failing to account for modality-specific physics or the structural nuances of acoustic imaging. We close this gap with a TinyUSFM-based evaluation framework featuring two distinct metrics: TinyUSFM-uLPIPS, a full-reference perceptual distan…
-
Evaluating Histogram Matching for Robust Deep Learning-Based Grapevine Disease Detection
Variability in illumination is a primary factor limiting deep learning robustness for field-based plant disease detection. This study evaluates Histogram Matching (HM), a technique that transforms the pixel intensity distribution of an image to match a reference profile, to mitigate this in grapevine classification, distinguishing among healthy leaves, downy mildew, and spider mite damage. We prop…
-
Assessing VLM-Driven Semantic-Affordance Inference for Non-Humanoid Robot Morphologies
Vision-language models (VLMs) have demonstrated remarkable capabilities in understanding human-object interactions, but their application to robotic systems with non-humanoid morphologies remains largely unexplored. This work investigates whether VLMs can effectively infer affordances for robots with fundamentally different embodiments than humans, addressing a critical gap in the deployment of th…
-
Bangla Key2Text: Text Generation from Keywords for a Low Resource Language
This paper introduces Bangla Key2Text, a large-scale dataset of 2.6 million Bangla keyword–text pairs designed for keyword-driven text generation in a low-resource language. The dataset is constructed using a BERT-based keyword extraction pipeline applied to millions of Bangla news texts, transforming raw articles into structured keyword–text pairs suitable for supervised learning. To…
-
Market Dynamics, Governance and Open Research Metadata in the AI Era
The debate about scholarly knowledge infrastructure has long been framed as a contest between openness and commercial enclosure. This framing distorts both policy and practice. The real tension lies between the persistent cost of producing and refining structured metadata under deep technological friction, and the differentiated demands distinct communities place on data quality, focus and granula…
-
Enhancing Unsupervised Keyword Extraction in Academic Papers through Integrating Highlights with Abstract
Automatic keyword extraction from academic papers is a key area of interest in natural language processing and information retrieval. Although previous research has mainly focused on utilizing abstracts and references for keyword extraction, this paper focuses on the highlights section - a summary describing the key findings and contributions, offering readers a quick overview of the research. Our …
-
Cyclic Equalizability Characterized by Parikh Vectors
Cyclic equalizability is a notion introduced by Shinagawa and Nuida in 2025, in the study of card-based cryptography. Informally, a collection of words is cyclically equalizable if, by inserting the same letters at the same positions in all words, they can be transformed into words that are cyclic shifts of one another. Shinagawa and Nuida showed that two binary words of equal length are cyclicall…
-
ReaLB: Real-Time Load Balancing for Multimodal MoE Inference
Mixture-of-Experts (MoE) architectures are widely used in modern large language models and multimodal models. However, inference efficiency is often limited by highly dynamic and skewed expert workloads across different modalities. During the prefill stage with large batch sizes, vision tokens frequently dominate the input sequences. Under expert parallelism (EP), this leads to severe load imbalan…
-
Beyond Rating: A Comprehensive Evaluation and Benchmark for AI Reviews
The rapid adoption of Large Language Models (LLMs) has spurred interest in automated peer review; however, progress is currently stifled by benchmarks that treat reviewing primarily as a rating prediction task. We argue that the utility of a review lies in its textual justification (its arguments, questions, and critique) rather than a scalar score. To address this, we introduce Beyond Rating, a h…