Introduction to Local Differential Privacy

2022, Sep 19    

Almost all data statistics and analysis tasks fundamentally depend on a basic understanding of the distribution of the data. Local differential privacy (LDP) is proposed as a distributed variant of Differential Privacy which achieves privacy guarantees for each user locally and is independent of any assumptions on the third party servers. It has been used for tasks like frequency and mean value estimation, heavy hitters discovery, $k$ -way marginal release, empirical risk minimization, federated learning, and deep learning.

:information_source: Definition: $\epsilon$ -Differential Privacy

A randomized mechanism $\mathcal{M}$ satisfies $\epsilon$ - LDP if and only if for any pairs of input values $v, v’$ in the domain of $\mathcal{M}$ , and for any possible output $y \in \mathcal{Y}$ , it holds

$\frac{\mathbb{P}[\mathcal{M}(v) = y]}{\mathbb{P}[\mathcal{M}(v') = y]} \leq e^{\epsilon}$

where $\mathbb{P}[\cdot]$ denotes probability and $\epsilon$ is the privacy budget. A smaller $\epsilon$ means stronger privacy protection, and vice versa.

:information_source: Theorem: Sequential Composition

Let $\mathcal{M}^i(v)$ be an $\epsilon^i$ -LDP algorithm on an input value $v$, and $\mathcal{M}(v)$ is the sequential composition of $\mathcal{M}^1(v)$, $\dots$ , $\mathcal{M}^m(v)$ . Then $\mathcal{M}(v)$ satisfies $\epsilon^1 + \dots + \epsilon^m$ -LDP.

Randomized Response

Randomized response is a mechanism for local differential privacy which was first proposed in a 1965 paper by S. L. Warner. At the time, the technique was intended to improve bias in survey responses about sensitive issues, and it was not originally proposed as a mechanism for differential privacy (which wouldn’t be invented for another 40 years). After differential privacy was developed, statisticians realized that this existing technique already satisfied the definition and it has become the de facto standard for LDP.

Users who possess a private answer $x$ flips it with probability $p$ of giving the true answer and probability $1-p$ of giving the false answer. The most common sketch of LDP is as follows:

  • Flip a coin
  • If the coin is heads, answer the question truthfully
  • If the coin is tails, flip another coin
  • If the second coin is heads, answer “yes”; if it is tails, answer “no”

The randomization in this algorithm comes from the two coin flips. As in all other differentially private algorithms, this randomization creates uncertainty about the true answer, which is the source of privacy.