An Introduction to (Privacy in Data Analysis and) Differential Privacy

Data analysis and model training often require working with datasets that contain sensitive or incriminating information. Giving analysts access to such datasets in raw form could compromise a person’s privacy, finances, or daily life. Data anonymization and privacy preservation have therefore been a long-running line of research. However, earlier techniques such as anonymization, releasing only summary statistics, and query auditing are known to be susceptible to various attacks (viz. linkage, reconstruction, and differencing attacks) and to suffer from problems in practical implementation. In this talk, we shall discuss these earlier attempts at protecting privacy, what went wrong with them, and how differential privacy came to be. We shall then describe differential privacy and the role of randomization and, if time permits, discuss some of its primitives, such as mechanisms and the composition of mechanisms.
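
As a rough illustration of two ideas named in the abstract, the sketch below shows a differencing attack against exact summary statistics, followed by a randomized answer of the kind differential privacy relies on. It is not taken from the talk: the dataset, the function names, and the choice of the Laplace mechanism with clipping are assumptions made purely for illustration.

```python
import random

# Hypothetical toy dataset (name -> salary); all values are invented.
salaries = {"alice": 52000, "bob": 61000, "carol": 58000, "dave": 75000}


def exact_sum(data, exclude=None):
    """Exact summary statistic: total salary, optionally leaving one person out."""
    return sum(v for k, v in data.items() if k != exclude)


def differencing_attack(data, target):
    """Recover one person's value by subtracting two exact aggregate answers."""
    return exact_sum(data) - exact_sum(data, exclude=target)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_sum(data, epsilon, upper_bound, rng, exclude=None):
    """Noisy sum via the Laplace mechanism (an assumed choice of mechanism).

    Values are clipped to [0, upper_bound], so adding or removing one person
    changes the sum by at most upper_bound (the sensitivity). Noise with
    scale = sensitivity / epsilon then gives epsilon-differential privacy.
    """
    clipped = sum(
        min(max(v, 0), upper_bound) for k, v in data.items() if k != exclude
    )
    return clipped + laplace_noise(upper_bound / epsilon, rng)


if __name__ == "__main__":
    # Exact answers leak Dave's salary exactly.
    print("attack on exact sums:", differencing_attack(salaries, "dave"))

    # The same subtraction on noisy answers yields only a noisy estimate.
    rng = random.Random()
    with_dave = private_sum(salaries, epsilon=0.5, upper_bound=100000, rng=rng)
    without_dave = private_sum(
        salaries, epsilon=0.5, upper_bound=100000, rng=rng, exclude="dave"
    )
    print("attack on noisy sums:", with_dave - without_dave)
```

With exact sums, the two queries reveal the target's salary exactly. With noisy sums, the difference is dominated by noise whose scale is set by `epsilon`, which is the trade-off between accuracy and privacy that the composition of mechanisms must account for when many queries are answered.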

No slides are available for this talk.