TL;DR - Differential Privacy at Scale: Uber and Berkeley Collaboration
These are my notes from watching the video "Differential Privacy at Scale: Uber and Berkeley Collaboration", presented at Enigma 2018.
Attack Surface:
- A SQL database containing trip data.
Attacks:
- Internal: analysts inside the company retrieving the data via SQL queries
- External: outside parties submitting queries
Solution
Chorus + differential privacy (DP)
Why is anonymization not a solution?
- It has limited utility
- Subject to re-identification attacks
Challenges for Practical General-purpose DP
- Usability for non-experts
- Broad support for analytics queries
- Easy integration with existing infra (data environments)
What does broad support mean?
- A system that can deploy multiple DP mechanisms
What is Chorus?
- Automatically enforces DP for SQL queries
- Modular, so it can support various mechanisms (supports 93% of the queries in the workload)
- Works with standard SQL DBs
- Deployed at Uber
How does it work?
- Takes a SQL query and rewrites it into an intrinsically private query (IPQ)
- The IPQ has the DP mechanism embedded directly in the SQL, so a standard database enforces DP simply by executing it
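A minimal sketch of the rewriting idea, not Chorus's actual API: the helper names, the Postgres-style SQL functions (random(), sign(), ln(), abs()), and the caller-supplied sensitivity are assumptions for illustration. The point is that the Laplace noise is sampled inside the rewritten SQL, so an unmodified database produces a differentially private answer.

```python
# Hypothetical sketch of rewriting a SQL aggregate into an IPQ; the helper names
# and the Postgres-style SQL functions are assumptions, and the noise scale is
# passed in here rather than derived by dataflow analysis as in the real system.

def laplace_noise_sql(scale: float) -> str:
    """SQL expression that samples Laplace(0, scale) noise via inverse-CDF."""
    return (f"(-{scale} * sign(random() - 0.5) * "
            f"ln(1 - 2 * abs(random() - 0.5)))")

def rewrite_to_ipq(query: str, sensitivity: float, epsilon: float) -> str:
    """Wrap the original aggregate so the database itself adds the noise."""
    scale = sensitivity / epsilon  # Laplace mechanism noise scale
    return (f"SELECT result + {laplace_noise_sql(scale)} "
            f"FROM ({query}) AS q(result)")

original = "SELECT count(*) FROM trips WHERE city = 'Paris'"
print(rewrite_to_ipq(original, sensitivity=1.0, epsilon=0.1))
```

In the real system the noise scale comes from analyzing the query itself (the mechanisms below), not from a caller-supplied sensitivity.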
Mechanisms
- Elastic Sensitivity
  - The scale of the noise is set by the query's elastic sensitivity and the privacy budget (see the Laplace sketch after this list)
  - Elastic sensitivity is determined by a dataflow analysis of the query
- Sample & Aggregate
  - Determines the sensitivity dynamically, as the query executes, rather than by static analysis
  - Works by partitioning the data, running the query on each partition, and aggregating the results in a differentially private way (a simplified sketch follows the note below)
- WPINQ (Weighted PINQ)
- Restricted Sensitivity
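For the noise step itself, a minimal sketch of the Laplace mechanism with noise scale sensitivity / epsilon. This glosses over details (the deployed mechanism calibrates noise to a smoothed elastic-sensitivity bound), and the numbers in the example are made up.

```python
import math
import random

def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
    """Release true_answer plus Laplace noise with scale = sensitivity / epsilon."""
    scale = sensitivity / epsilon
    u = random.random() - 0.5                      # uniform in [-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_answer + noise

# Example: a counting query (sensitivity 1) with privacy budget epsilon = 0.1.
print(laplace_mechanism(true_answer=12345, sensitivity=1.0, epsilon=0.1))
```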
Note
Together, these four mechanisms support 94% of the queries in the workload
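A simplified sketch of the sample & aggregate idea, assuming the query's answer is known to lie in a range [lo, hi]: partition the rows, run the ordinary query on each partition, clip each partition's answer to the range, and release the noisy mean. The real aggregator is a widened winsorized mean rather than clip-and-average, and the data, range, and partition count here are illustrative assumptions.

```python
import math
import random
from typing import Callable, Sequence

def laplace(scale: float) -> float:
    """Sample Laplace(0, scale) noise."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def sample_and_aggregate(rows: Sequence[float], query: Callable[[Sequence[float]], float],
                         k: int, lo: float, hi: float, epsilon: float) -> float:
    """Partition rows into k blocks, run `query` on each block, clip each answer to
    [lo, hi], and release the noisy mean. One changed row affects one block, so the
    mean moves by at most (hi - lo) / k; Laplace noise at that scale gives epsilon-DP."""
    blocks = [rows[i::k] for i in range(k)]              # k disjoint partitions
    answers = [min(max(query(b), lo), hi) for b in blocks]
    mean = sum(answers) / k
    return mean + laplace((hi - lo) / (k * epsilon))

# Example: private average trip length, assuming answers fall in [0, 50] miles.
trips = [random.uniform(1.0, 20.0) for _ in range(10_000)]
print(sample_and_aggregate(trips, lambda b: sum(b) / len(b),
                           k=100, lo=0.0, hi=50.0, epsilon=0.1))
```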
Terms to Read
- Winsorized mean
- Laplace Mechanism
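For reference, a plain (non-private) winsorized mean: values below the p-th or above the (1 - p)-th order statistic are clamped to those cutoffs before averaging, which bounds the influence of outliers. The cutoff p is an arbitrary illustrative choice; the DP aggregator used with sample & aggregate is typically a noisy, widened variant of this.

```python
def winsorized_mean(values: list[float], p: float = 0.05) -> float:
    """Clamp the lowest and highest p fraction of values to the p-th and
    (1 - p)-th order statistics, then take the ordinary mean."""
    xs = sorted(values)
    k = int(p * len(xs))
    lo, hi = xs[k], xs[-k - 1]
    return sum(min(max(x, lo), hi) for x in xs) / len(xs)

print(winsorized_mean([1.0, 2.0, 3.0, 4.0, 1000.0], p=0.2))  # outlier clamped to 4.0
```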
Evaluation
- 18,774 SQL queries
- Less than 1% error introduced by elastic sensitivity
- Median performance overhead of 1.7% on a database of 800 million records