Daily arXiv Digest — 2026-03-25 (ML + Optimization)

Auto-generated from arXiv metadata + an LLM reading only titles/abstracts. Equations are interpretive; always verify with the PDF.

1) Universal and Parameter-free Gradient Sliding for Composite Optimization

Authors: Yan Wu, Yuyuan Ouyang, Zhe Zhang, Qi Luo
arXiv: 2603.23492 · pdf
Categories: math.OC

Abstract

We propose a parameter-free universal gradient sliding (PFUGS) algorithm for computing an approximation solution to the convex composite optimization problem $\min_{x\in X} {f(x) + g(x)}$. When $f$ and $g$ have $(M_ν,ν)$-Hölder and $L$-Lipschitz continuous (sub)gradients respectively, our proposed PFUGS method computes an approximate solution within at most $\mathcal{O}((M_ν/\varepsilon)^{2/(1+3ν)})$ and $\mathcal{O}((L/\varepsilon)^{1/2})$ evaluations of (sub)gradients of $f$ and $g$ respectively. Moreover, the PFUGS algorithm is parameter-free and does not require any prior knowledge on problem constants $ν$, $M_ν$, and $L$. To the best of knowledge, for problems involving two functions with different sets of problem constants, PFUGS is the first gradient sliding algorithm that is parameter-free.

Math explanation (LLM)

(No LLM key configured — showing abstract only. Set LLM_PROVIDER + an API key secret to enable math explanations.)

2) Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions

Authors: Rustem Islamov, Grigory Malinovsky, Alexander Gaponov, Aurelien Lucchi, Peter Richtárik, Eduard Gorbunov
arXiv: 2603.23472 · pdf
Categories: cs.LG, cs.CR, math.OC

Abstract

Federated Learning (FL) enables heterogeneous clients to collaboratively train a shared model without centralizing their raw data, offering an inherent level of privacy. However, gradients and model updates can still leak sensitive information, while malicious servers may mount adversarial attacks such as Byzantine manipulation. These vulnerabilities highlight the need to address differential privacy (DP) and Byzantine robustness within a unified framework. Existing approaches, however, often rely on unrealistic assumptions such as bounded gradients, require auxiliary server-side datasets, or fail to provide convergence guarantees. We address these limitations by proposing Byz-Clip21-SGD2M, a new algorithm that integrates robust aggregation with double momentum and carefully designed clipping. We prove high-probability convergence guarantees under standard $L$-smoothness and $σ$-sub-Gaussian gradient noise assumptions, thereby relaxing conditions that dominate prior work. Our analysis recovers state-of-the-art convergence rates in the absence of adversaries and improves utility guarantees under Byzantine and DP settings. Empirical evaluations on CNN and MLP models trained on MNIST further validate the effectiveness of our approach.

Math explanation (LLM)

(No LLM key configured — showing abstract only. Set LLM_PROVIDER + an API key secret to enable math explanations.)

3) Convergence analysis of accelerated algorithms via a mixed-order dynamical system for separable nonsmooth convex optimization

Authors: Geng-Hua Li, Hai-Yi Zhao, Xiangkai Sun
arXiv: 2603.23046 · pdf
Categories: math.OC

Abstract

For a linear equality constrained convex optimization problem involving two objective functions with a nonsmooth" +nonsmooth” composite structure, we study two algorithms derived from a mixed-order dynamical system which incorporates time scales and a Tikhonov regularization term. We observe that different types of multipliers lead to distinct algorithms. For the implicit multiplier and semi-implicit multiplier, we develop a new primal-dual joint algorithm and a new splitting algorithm, respectively. Our proposed joint algorithm can reduce to an algorithm for solving the corresponding non-separable linearly constrained convex optimization problem. Then, we establish the nonergodic convergence properties of all our proposed algorithms. Moreover, we derive that the sequences generated by these algorithms strongly converge to the minimal norm solution. Finally, numerical experiments are conducted to validate the practical performance of the proposed algorithms.

Math explanation (LLM)

(No LLM key configured — showing abstract only. Set LLM_PROVIDER + an API key secret to enable math explanations.)

4) End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions

Authors: Zakaria Mhammedi, Alexander Rakhlin, Nneka Okolo
arXiv: 2603.23461 · pdf
Categories: cs.LG

Abstract

We study reinforcement learning (RL) with linear function approximation in Markov Decision Processes (MDPs) satisfying \emph{linear Bellman completeness} – a fundamental setting where the Bellman backup of any linear value function remains linear. While statistically tractable, prior computationally efficient algorithms are either limited to small action spaces or require strong oracle assumptions over the feature space. We provide a computationally efficient algorithm for linear Bellman complete MDPs with \emph{deterministic transitions}, stochastic initial states, and stochastic rewards. For finite action spaces, our algorithm is end-to-end efficient; for large or infinite action spaces, we require only a standard argmax oracle over actions. Our algorithm learns an $\varepsilon$-optimal policy with sample and computational complexity polynomial in the horizon, feature dimension, and $1/\varepsilon$.

Math explanation (LLM)

(No LLM key configured — showing abstract only. Set LLM_PROVIDER + an API key secret to enable math explanations.)

5) Graph Energy Matching: Transport-Aligned Energy-Based Modeling for Graph Generation

Authors: Michal Balcerak, Suprosana Shit, Chinmay Prabhakar, Sebastian Kaltenbach, Michael S. Albergo, Yilun Du, Bjoern Menze
arXiv: 2603.23398 · pdf
Categories: cs.LG, cs.AI, stat.ML

Abstract

Energy-based models for discrete domains, such as graphs, explicitly capture relative likelihoods, naturally enabling composable probabilistic inference tasks like conditional generation or enforcing constraints at test-time. However, discrete energy-based models typically struggle with efficient and high-quality sampling, as off-support regions often contain spurious local minima, trapping samplers and causing training instabilities. This has historically resulted in a fidelity gap relative to discrete diffusion models. We introduce Graph Energy Matching (GEM), a generative framework for graphs that closes this fidelity gap. Motivated by the transport map optimization perspective of the Jordan-Kinderlehrer-Otto (JKO) scheme, GEM learns a permutation-invariant potential energy that simultaneously provides transport-aligned guidance from noise toward data and refines samples within regions of high data likelihood. Further, we introduce a sampling protocol that leverages an energy-based switch to seamlessly bridge: (i) rapid, gradient-guided transport toward high-probability regions to (ii) a mixing regime for exploration of the learned graph distribution. On molecular graph benchmarks, GEM matches or exceeds strong discrete diffusion baselines. Beyond sample quality, explicit modeling of relative likelihood enables targeted exploration at inference time, facilitating compositional generation, property-constrained sampling, and geodesic interpolation between graphs.

Math explanation (LLM)

(No LLM key configured — showing abstract only. Set LLM_PROVIDER + an API key secret to enable math explanations.)