In this work we consider the problem of online submodular maximization under
a cardinality constraint with differential privacy (DP). A stream of $T$
submodular functions over a common finite ground set $U$ arrives online, and at
each time-step the decision maker must choose at most $k$ elements of $U$
before observing the function. The decision maker obtains a payoff equal to the
function evaluated on the chosen set, and aims to learn a sequence of sets that
achieves low expected regret. In the full-information setting, we develop an
$(varepsilon,delta)$-DP algorithm with expected $(1-1/e)$-regret bound of
$mathcal{O}left( frac{k^2log |U|sqrt{T log k/delta}}{varepsilon}
right)$. This algorithm contains $k$ ordered experts that learn the best
marginal increments for each item over the whole time horizon while maintaining
privacy of the functions. In the bandit setting, we provide an
$(varepsilon,delta+ O(e^{-T^{1/3}}))$-DP algorithm with expected
$(1-1/e)$-regret bound of $mathcal{O}left( frac{sqrt{log
k/delta}}{varepsilon} (k (|U| log |U|)^{1/3})^2 T^{2/3} right)$. Our
algorithms contains $k$ ordered experts that learn the best marginal item to
select given the items chosen her predecessors, while maintaining privacy of
the functions. One challenge for privacy in this setting is that the payoff and
feedback of expert $i$ depends on the actions taken by her $i-1$ predecessors.
This particular type of information leakage is not covered by post-processing,
and new analysis is required. Our techniques for maintaining privacy with
feedforward may be of independent interest.

By admin