Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit
Published as an arXiv preprint, 2024
We address the online unconstrained submodular maximization problem (Online USM) in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a nonmonotone submodular function taking values in a known bounded interval. This paper proposes Double-Greedy Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC simultaneously satisfies a $O(d \log(dT))$ problem-dependent upper bound on the $1/2$-approximate pseudo-regret and a $O(dT^{2/3}\log(dT)^{1/3})$ problem-free one, outperforming existing approaches. To that end, we introduce a notion of hardness for submodular functions, characterizing how difficult it is to maximize them with this type of strategy.
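For context, below is a minimal sketch of the classic offline Double-Greedy routine for unconstrained submodular maximization (Buchbinder et al.), which DG-ETC adapts to the stochastic bandit setting. The function name `double_greedy`, the value-oracle interface `f(S)`, and the toy cut function are illustrative assumptions, not taken from the paper. The sketch shows the deterministic variant (a $1/3$-approximation); the randomized variant, which chooses between the two marginal gains with a biased coin, attains the $1/2$ approximation that the regret guarantees above are measured against.

```python
def double_greedy(f, ground_set):
    """Deterministic Double-Greedy for nonnegative submodular f (1/3-approx.).

    Maintains a growing set X and a shrinking set Y; each element is either
    added to X or removed from Y, depending on which marginal gain is larger.
    """
    X, Y = set(), set(ground_set)
    for i in ground_set:
        a = f(X | {i}) - f(X)   # gain of adding i to the lower set X
        b = f(Y - {i}) - f(Y)   # gain of removing i from the upper set Y
        if a >= b:
            X = X | {i}
        else:
            Y = Y - {i}
    return X  # X == Y at termination


# Toy usage: a small graph-cut objective (nonnegative, nonmonotone, submodular).
if __name__ == "__main__":
    edges = {(0, 1), (1, 2), (2, 3)}

    def cut(S):
        return sum(1 for (u, v) in edges if (u in S) != (v in S))

    print(double_greedy(cut, range(4)))
```

In the bandit setting considered in the paper, the exact marginal gains $a$ and $b$ are not observable; DG-ETC must instead estimate them from noisy reward samples before committing to the corresponding greedy decisions.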
Recommended citation: Zhou, J., Gaillard, P., Rahier, T., & Arbel, J. (2024). Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit. arXiv preprint arXiv:2410.08578.
Download Paper | Download Slides