Privacy-Preserving Techniques and Machine Learning for Critical Systems

Thesis Type: Doctorate

Institution Of The Thesis: Yildiz Technical University, Graduate School Of Natural And Applied Sciences, Turkey

Approval Date: 2022

Thesis Language: English


Supervisor: Tülay Yıldırım


Most machine learning-driven applications used in daily life are fed by personal
data. Depending on an application's area of use, this data may contain personal
information ranging from people's health histories to their purchase histories. The
growth of such applications has increased the need for stronger measures to protect
this data. Indeed, the European Union's General Data Protection Regulation (GDPR),
which protects the personal data of European citizens, entered into force in 2016.
However, given the central role of data in developing artificial intelligence
algorithms, such legislative restrictions will inevitably hinder progress in this
field. In this context, privacy-preserving technologies make it possible to analyze
sensitive data while keeping it protected, and privacy-preserving machine learning
algorithms built on these technologies can minimize the disclosure of sensitive data.
Although there have been promising recent studies in this area, applications using
real-life data remain extremely limited.

This thesis aims to examine the effects of differential privacy approaches in
healthcare and biometrics applications that use critical data, building on
comprehensive research into privacy-preserving technologies. In addition, it presents
a comparison with federated learning, another privacy-preserving technique. In
particular, two different differential privacy techniques are applied to and compared
on signature data, a biometric data type, and the effect of applying privacy on the
similarity score is evaluated. Another aim of the thesis is to compare the behavior of
these privacy-preserving approaches on two different models for each application area
and data type.
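The abstract does not name the two differential privacy techniques applied to signature data. To illustrate what "the effect of privacy on the similarity score" can mean, the following is a minimal, generic sketch of one standard technique, the Laplace mechanism, applied as output perturbation to a similarity score. It is not the thesis's method. The cosine similarity, its mapping to [0, 1], the random feature vectors, and all function names are hypothetical choices made only for this illustration.

```python
import numpy as np


def laplace_mechanism(value: float, sensitivity: float, epsilon: float,
                      rng: np.random.Generator) -> float:
    """Release `value` with epsilon-differential privacy by adding Laplace noise.

    Noise is drawn from Laplace(0, sensitivity / epsilon): a smaller epsilon
    gives a stronger privacy guarantee but adds more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def private_similarity(a: np.ndarray, b: np.ndarray, epsilon: float,
                       rng: np.random.Generator) -> float:
    """Hypothetical example: privatize a similarity score mapped to [0, 1]."""
    # Mapping the score to [0, 1] bounds its sensitivity by 1.
    score = (cosine_similarity(a, b) + 1.0) / 2.0
    noisy = laplace_mechanism(score, sensitivity=1.0, epsilon=epsilon, rng=rng)
    # Clipping is post-processing, so it does not weaken the DP guarantee.
    return float(np.clip(noisy, 0.0, 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=64)                 # hypothetical reference features
    query = ref + 0.1 * rng.normal(size=64)   # hypothetical genuine attempt
    exact = (cosine_similarity(ref, query) + 1.0) / 2.0
    for eps in (0.1, 1.0, 10.0):
        draws = np.array([private_similarity(ref, query, eps, rng)
                          for _ in range(1000)])
        print(f"epsilon={eps}: mean |error| = {np.mean(np.abs(draws - exact)):.3f}")
```

Running the loop over several epsilon values shows the privacy/utility trade-off: as epsilon decreases, the released similarity score drifts further from the exact one on average, which is the kind of effect an evaluation of privacy on similarity scores would measure.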