I am a fifth-year PhD candidate in the Turakhia Lab at UC San Diego, where my research bridges high-performance computing, computational biology, and machine learning to tackle critical public health challenges. My work focuses on overcoming computational bottlenecks in large-scale, noisy data analysis to enable actionable, real-time epidemiological insights.
Recently, I developed computational tools like WEPP and metaWEPP, which achieve high-resolution pathogen variant detection in complex environmental and patient metagenomic samples. Building on this, my current research focuses on AI-based vaccine design, aiming to proactively identify dominant variants among both currently circulating variants and novel ones that may emerge in the future.
My technical foundation lies in full-stack hardware and software optimization. Having developed hardware/software co-design techniques for ML accelerators, enhanced GPU utilization for ML workloads, and explored processing-near-memory approaches, I bring a strong optimization skillset across the entire computing stack.
Wastewater contains a treasure trove of public health information, including genetic traces of viruses and bacteria circulating in a community. However, this data is highly complex, much like a puzzle with millions of mixed-up pieces. WEPP is a powerful tool designed to solve this puzzle far more effectively than existing methods.
WEPP analyzes small fragments of genetic material (sequencing reads) from wastewater and accurately places them onto a pathogen’s “family tree,” known as a phylogenetic tree. By doing so, WEPP identifies the specific nodes in the tree, corresponding to unique genome sequences, from which these fragments most likely originated. This higher resolution allows public health officials to not only determine which pathogen variants are present in a community, but also to identify emerging strains before they receive an official designation. Such early detection is critical for monitoring outbreaks and enabling timely, effective public health responses.
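To make the idea of placing a read on a tree concrete, here is a toy sketch. It is not WEPP's actual algorithm: the node names, the mutation table, and the mismatch-counting score are all hypothetical simplifications. Each node is described by the mutations its genome carries relative to a reference, and a read is assigned to whichever nodes disagree with it least over the region the read covers.

```python
from typing import Dict, List

# Hypothetical toy tree: node name -> {genome position: alternate base}.
# These names and mutations are made up for illustration only.
NODE_MUTATIONS: Dict[str, Dict[int, str]] = {
    "root": {},
    "A": {100: "T"},
    "A.1": {100: "T", 250: "G"},
    "B": {400: "C"},
}


def mismatches(read_start: int, read_end: int,
               read_muts: Dict[int, str],
               node_muts: Dict[int, str]) -> int:
    """Count positions within [read_start, read_end) where read and node disagree."""
    # Only node mutations that fall inside the read's span can be compared.
    covered = {p: b for p, b in node_muts.items() if read_start <= p < read_end}
    positions = set(covered) | set(read_muts)
    # A position missing from either side means "same as reference" there.
    return sum(covered.get(p) != read_muts.get(p) for p in positions)


def place_read(read_start: int, read_end: int,
               read_muts: Dict[int, str],
               nodes: Dict[str, Dict[int, str]] = NODE_MUTATIONS) -> List[str]:
    """Return every node tied for the fewest mismatches with the read."""
    scores = {name: mismatches(read_start, read_end, read_muts, muts)
              for name, muts in nodes.items()}
    best = min(scores.values())
    return sorted(name for name, score in scores.items() if score == best)


# A read covering both mutations of A.1 is placed unambiguously.
long_read = place_read(50, 300, {100: "T", 250: "G"})
# A shorter read covering only position 100 cannot tell A from A.1.
short_read = place_read(50, 150, {100: "T"})
```

The short read illustrates why a single fragment is often ambiguous: it matches several nodes equally well, so a real tool has to combine evidence across many reads before deciding which genomes are present.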
While WEPP is designed to detect specific pathogens in wastewater, metaWEPP extends this approach to complex samples that contain genetic material from many different organisms, such as soil, water, air, and bodily fluids containing cell-free DNA. Rather than a single puzzle, these samples resemble thousands of puzzles mixed together.
metaWEPP efficiently untangles this complexity by first identifying which organism each genetic fragment belongs to, and then applying WEPP to identify the specific genome variants for each organism. This enables a detailed view of the microbial composition of a sample, supporting applications ranging from clinical diagnostics to environmental and ecological monitoring.
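The first of these two stages, sorting fragments by organism, can be illustrated with a small sketch. This is an assumption-laden toy rather than metaWEPP's actual classifier: the reference sequences, the organism names, and the shared k-mer count used as a score are all hypothetical. Each read is assigned to the reference it shares the most k-mers with, and reads that match nothing are set aside.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Hypothetical reference genomes (far shorter than real ones).
REFERENCES: Dict[str, str] = {
    "virus_x": "ACGTACGTTAGC",
    "bacterium_y": "GGGCCCAAATTT",
}


def kmers(seq: str, k: int = 4) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(read: str, references: Dict[str, str] = REFERENCES,
             k: int = 4) -> Optional[str]:
    """Assign a read to the reference sharing the most k-mers, or None."""
    read_kmers = kmers(read, k)
    scores = {name: len(read_kmers & kmers(ref, k))
              for name, ref in references.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def bin_reads(reads: List[str]) -> Dict[str, List[str]]:
    """Group reads by organism; unclassified reads are dropped."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        organism = classify(read)
        if organism is not None:
            bins[organism].append(read)
    return dict(bins)


bins = bin_reads(["ACGTACG", "CCCAAAT", "TTTTTTT"])
```

Each resulting bin would then be handed to a per-organism variant-resolution step, which is the role WEPP plays in the description above.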