How data science and machine learning interpret genomic data and contribute to personalized medicine

Speaker:Kathryn Roeder
Schedule:Tues., 16:30-17:30, July 9, 2024
Venue:Tsinghua University West Lecture Hall (西阶); Zoom Meeting ID: 4552601552 Passcode: YMSC
Date:2024-07-09

Abstract:

High-throughput genomics yields vast amounts of data for personalized medicine and other health-related discoveries. For instance, genome-wide association studies (GWAS), which involve tens of thousands to millions of subjects, have linked thousands of genetic changes, or variants, with human diseases. Accumulating these variants across a subject's entire genome can help predict their risk for various diseases, and these findings have already contributed in some instances to improved clinical treatment. However, even with the vast amount of information available, predictive power is typically weak when standard analytical techniques are used. Breakthroughs are anticipated in the near future from machine learning and AI techniques. On another front, CRISPR, a genetic engineering marvel, holds breathtaking potential for treating cancer and other genetic defects. Realizing these benefits will require careful study of immense amounts of data. Data science and machine learning must become an integral part of genomics to fully realize the potential of CRISPR, GWAS, and other genomic studies in the coming decade.


Bio:

Kathryn Roeder is the UPMC Professor of Statistics and Life Sciences in the Departments of Statistics & Data Science and Computational Biology. She earned her Ph.D. in statistics at Pennsylvania State University, after which she was on the faculty at Yale University for six years before coming to Carnegie Mellon University in 1994. In 1997 she received the COPSS Presidents' Award for the outstanding statistician under age 40. In 2019 she was inducted into the National Academy of Sciences. In 2020 she was awarded the COPSS Distinguished Achievement Award and Lectureship. Her research group develops statistical tools applied to genetic and genomic data to understand the workings of the human brain and the interplay with genetic variation. These tools rely on various statistical and machine learning methods, causal inference, latent space embedding, sparse PCA, and high-dimensional nonparametric techniques.



Video:http://archive.ymsc.tsinghua.edu.cn/pacm_lecture?html=How_data_science_and_machine_learning_interpret_genomic_data_and_contribute_to_personalized_medicine.html