Papers
LDPKiT: Recovering Utility in LDP Schemes by Training with Noise^2
Kexin Li, Yang Xi, Aastha Mehta, David Lie
Under review, preprint. 2024
Projects
Privacy Preserving Inference via Model Extraction Against Third-Party NLP Cloud Adversaries
(2024, Supervised by Prof. David Lie) Bachelor's thesis pdf
The widespread use of MLaaS cloud applications is accompanied by increasing data privacy risks, as third-party cloud adversaries can exploit and abuse sensitive user data. However, many data privacy schemes sacrifice model utility and performance efficiency in exchange for privacy. This thesis investigates a privacy-preserving inference infrastructure that uses model extraction methods to maintain model performance while preserving privacy. In particular, the infrastructure is built with knowledge distillation and active learning techniques and is tested in the NLP domain. The thesis evaluates different means of preserving privacy and analyzes their effectiveness, showing that employing a public dataset in the model-extraction-based inference infrastructure provides strong privacy while maintaining model performance, outperforming state-of-the-art methods such as adding differential privacy noise.
Lingling Bot for Classical Music Instrument Identification (2021) GitHub repo
An ML model for identifying multiple classical instruments from audio files, with an accuracy of >80%. Built in a team of four.
Are You Spreading COVID Misinformation? A study on misinformation trends on Twitter
Paper pdf
The spread of misinformation has been exacerbated by the onset of the pandemic, as the use of remote and online resources has increased due to restrictions on offline interactions. While some misinformation is harmless, some can lead to public panic and serious health concerns. Therefore, detecting and removing misinformation is crucial. In this project, we examine the correlation between influential Twitter accounts and the spread of misinformed claims related to COVID-19 on Twitter. As expected, we find a positive correlation between real tweets and health organization accounts, while other types of accounts have no significant relationship with either fake or real tweets. Furthermore, we observe that sentiment is not a driving factor in the spread of misinformation. This work placed in the top 15 in the 2021 UDBC.
Locating Narrow Spectroscopic Features for a Solid-State Samarium Atomic Clock
(2020-2021, Supervised by Prof. Amar Vutha)
The samarium experiment searches for a narrow spectroscopic feature in a crystal that is lower-doped with samarium ions, with the goal of making an ultra-precise atomic clock. The desired transition exhibiting this feature involves a forbidden transition that can only be achieved through induced magnetic state mixing, which makes the transition weak. I worked on increasing the signal from this transition in order to pinpoint the narrow feature. I performed simulations in Zemax Optics to determine an optimal lens configuration, analyzed how different lens combinations can increase the PMT collection efficiency, created computer-aided designs for a lens system to integrate into the original vacuum cryostat, and wrote code to perform real-time analysis on experiments. My work resulted in an 8x increase in the signal collection frequency as well as a redesign of experimental components to increase modularity and scalability.
Developing COVID-19 Rapid Lateral Flow Diagnostic Strips
(2021, Supervised by Prof. Xinyu Liu)
The pandemic highlighted the urgent need for rapid, high-sensitivity diagnostic testing for large-scale population screening. I worked on optimizing the design of lateral flow test strips for COVID-19 diagnostics. I manufactured and tested different strip designs with several conjugates to find the best one. I achieved a detection limit of 10 ng/mL for the SARS-CoV-2 N protein using spiked blood samples.