TRAIL Research Salon Series: From Prediction to Actionable Inference in the Era of AI

Background

Machine learning and AI have proven to be powerful tools for prediction. However, the complexity and scale of these models often make it difficult to draw valid inferences, which are crucial for informing decision-making and generating scientific insights. Recently, researchers in both computer science and statistics have made significant progress in addressing the challenges of inference with large AI models. This workshop aims to review current developments in this area, discuss unmet needs, and explore the challenges and opportunities ahead.

This workshop is organized by the Translational AI Laboratory (TRAIL) in the Mailman School of Public Health. TRAIL, directed by Ying Wei, is dedicated to strengthening AI research capabilities in public health, from methodological development to the translation of new methods into public health advances. The TRAIL Research Salon Series will explore the latest advancements in AI and machine learning. During the week of August 20, join us to discuss how we can move from prediction to actionable inference.

Workshop Details

The Research Salon comprises two parts. The program begins with an invited webinar by Tijana Zrnic, a postdoctoral fellow at Stanford University, who will present her recent work on Prediction-Powered Inference. On August 22nd, we will host a full-day workshop featuring presentations from our own faculty and invited experts in statistics and computer science from Washington University in St. Louis, the University of Wisconsin, and the Weill-Cornell Institute of AI for Digital Health, followed by roundtable discussions. We invite you to join us, interact with leading experts in the field, and be part of the conversation about the future of AI-driven inference.

Schedule

Pre-workshop Journal Clubs

Zoom link: https://columbiacuimc.zoom.us/j/95775259475?pwd=8ayALQavupY0vjPC9oULP6MofinyRo.1

Meeting ID: 957 7525 9475 Passcode: 834911

Aug 20th, 12 - 1 pm
- Speaker: Tijana Zrnic, Stanford University
- Host: Dan Malinsky, Columbia University
- Title: Inference via Machine Learning
- Abstract: From proteomics to remote sensing, machine learning predictions are beginning to substitute for real data when collection of the latter is difficult, slow, or costly. In this talk, I will present recent work that permits the use of predictions to make valid statistical inferences. I will discuss the use of machine learning predictions as substitutes for high-quality data on one hand and as a tool for guiding real data collection on the other. In both cases, machine learning allows for a significant boost in statistical power compared to classical baselines for inference that do not leverage prediction.
- Bio: Tijana Zrnic is a Ram and Vijay Shriram Data Science Postdoctoral Fellow at Stanford University, where she is hosted by Emmanuel Candès in Statistics. She develops theory and methods for reliable and responsible machine learning and data science. She received her PhD in Electrical Engineering and Computer Sciences from UC Berkeley in 2023, where she was advised by Moritz Hardt and Michael Jordan.

Workshop on Aug 22nd, 2024

Location: Hess Commons, 722 West 168th Street

Time: 9 am - 5 pm

Agenda:

8:30 - 9:00 am: Arrival and Breakfast
9:00 - 9:10 am: Ying Wei, Introduction and program overview
9:10 - 10:10 am, Session 1: Post-prediction Inference, statisticians' insights
- Drawing Inference on Predicted Data: Some Ideas and Reflections (Speaker: Jiwei Zhao, University of Wisconsin)
- Discussant: Yanyuan Ma, Penn State University
10:20 - 11:40 am, Session 2: Inferential Data Fusion
- Adaptive ML-powered Learning with Blockwise Missing and Semi-Supervised Data (Speaker: Molei Liu, Columbia University)
- Leveraging multi-study, multi-outcome data to improve external validity and efficiency of clinical trials for managing schizophrenia (Speaker: Caleb Miles, Columbia University)
- Discussant: Tian Gu, Columbia University
11:40 am - 12:00 pm: Asha Saxena, The AI Factor: Statistical Inference in a Data-Driven World
12:15 - 1:45 pm: Lunch Break
1:45 - 2:45 pm, Session 3: Causal discoveries from machine learning, engineers' perspectives
- Generating Real-World Evidence with Real-World Data and Machine Learning (Speaker: Chengxi Zang, Weill-Cornell Institute of AI for Digital Health)
- Post-subtyping analysis with multimodal data – A case study of Parkinson’s disease subtyping and drug repurposing (Speaker: Chang Su, Weill-Cornell Institute of AI for Digital Health)
3:00 - 4:30 pm: Roundtable Discussions
- Moderator: Xuming He, Washington University in St. Louis
- Bio: Xuming He is Kotzubei Beckmann Distinguished Professor and Inaugural Chair of Statistics and Data Science at Washington University in St. Louis. He also serves as President (2023–2025) of the International Statistical Institute. Xuming has received numerous honors and awards for his achievements in robust statistics and quantile regression and for his outstanding service to the statistics and science communities. He is a fellow of the American Association for the Advancement of Science, the American Statistical Association, the Institute of Mathematical Statistics, and the International Statistical Institute.
- Questions for discussion
  - Q1: Inference is a general term that refers to the process of drawing conclusions or making judgments based on evidence, reasoning, or available information.
    
    Question for statisticians and biostatisticians: Statistical inference provides disciplined methodologies and has supported many scientific applications. It works within the framework of a population and a probability space. Do you think statistical inference is still relevant and desirable outside this classical framework? Why or why not?
    
    Question for non-statisticians: Inference can have different meanings depending on the area of research. What are the common tools for inference in your field? How much do you rely on statistical inference for evidence-based research? With the rise of big data and AI models, is statistical inference still relevant and desirable?
  - Q2: Inference based on "imprecise predicted data" from machine learning or other AI tools has widespread applications. For example, this has become a significant topic in computational social science; see https://doi.org/10.48550/arXiv.2306.04746. What can we learn from other fields in this regard, and how?
  - Q3: Causal inference plays a critical role in randomized clinical trials. For those who work on causal inference, what role does causal inference play in public health outside clinical trial settings? What are the major challenges to causal inference in those settings, especially with big data and big models?
  - Q4: Big data and AI offer enormous potential but also amplify the risks of bias and false discoveries. What are the obvious risks, and what are the more subtle, harder-to-detect problems? Can you give examples? What roles should statisticians and biostatisticians play in fostering more trustworthy AI? How should we communicate with other scientists and with the public?

Registration

Registration details and deadlines will be provided in a future update.