Statistics seminar - Yau Mathematical Sciences Center, Tsinghua University

Statistics seminar

Speaker：Min-ge Xie (Rutgers University)

Organizer：Yuhong Yang (YMSC), Fan Yang (YMSC)

Time：Thursday, 10:00-11:00 am, June 20, 2024

Venue：Shuangqing Complex Building C548

Upcoming talk：

Speaker：Min-ge Xie 谢敏革（Rutgers University）
Time: Thursday, 10:00-11:00 am, June 20, 2024
Venue：C548, Shuangqing Complex Building A

Title: Repro Samples Method and Principled Random Forests

Abstract:
The repro samples method introduces a fundamentally new inferential framework that can be used to effectively address frequently encountered, yet highly non-trivial and complex inference problems involving discrete or non-numerical unknown parameters and/or non-numerical data. In this talk, we present a set of key developments in the repro samples method and use them to develop a novel machine learning ensemble tree model, termed principled random forests. Specifically, repro samples are artificial samples that are reproduced by mimicking the genesis of observed data. Using the repro samples and inversion techniques stemming from fiducial inference, we can establish a confidence set for the underlying (‘true’) tree model that generated, or approximately generated, the observed data. We then obtain a tree ensemble model using the confidence set, from which we derive our inference. Our development is principled and interpretable since, firstly, it is fully theoretically supported and provides frequentist performance guarantees on both inference and predictions; and secondly, the approach only assembles a small set of trees in the confidence set and thereby the model used is interpretable. The development is further extended to handle tree-structured conditional average treatment effects in a causal inference setting. Numerical results demonstrate the superior performance of our proposed approach over existing single and ensemble tree methods.

The repro samples method provides a new toolset for developing interpretable AI and for helping address the black-box issues in complex machine learning models. The development of principled random forests is our first attempt in this direction.

About the speaker：

Min-ge Xie, PhD, is a Distinguished Professor at Rutgers, The State University of New Jersey. Dr. Xie received his PhD in Statistics from the University of Illinois at Urbana-Champaign and his BS in Mathematics from the University of Science and Technology of China. He is the current Editor of The American Statistician and a co-founding Editor-in-Chief of The New England Journal of Statistics in Data Science. He is a fellow of ASA and IMS, and an elected member of ISI. His research interests include theoretical foundations of statistical inference and data science, fusion learning, finite and large sample theories, and parametric and nonparametric methods. He is the Director of the Rutgers Office of Statistical Consulting and has rich interdisciplinary research experience in collaborating with computer scientists, engineers, biomedical researchers, and scientists in other fields.

Speaker：Feng Liang 梁枫 (University of Illinois Urbana-Champaign, UIUC)
Time: Monday, 10:00-11:00 am, June 10, 2024
Venue：C546, Shuangqing Complex Building A
Title : Learning Topic Models: Identifiability and Finite-Sample Analysis

Abstract：
Topic models provide a useful text-mining tool for learning, extracting, and discovering latent structures in large text corpora. Although a plethora of methods have been proposed for topic modeling, a formal theoretical investigation on the statistical identifiability and accuracy of latent topic estimation is lacking in the literature. In this work, we propose a maximum likelihood estimator (MLE) of latent topics based on a specific integrated likelihood, which is naturally connected to the concept of volume minimization in computational geometry. Theoretically, we introduce a new set of geometric conditions for topic model identifiability, which are weaker than conventional separability conditions relying on the existence of anchor words or pure topic documents. We conduct finite-sample error analysis for the proposed estimator and discuss the connection of our results with existing ones. We conclude with empirical studies on both simulated and real datasets. This talk is based on joint work with Yinyin Chen, Shishuang He, and Yun Yang.

Title: Model-Assisted Uniformly Honest Inference for Optimal Treatment Regimes in High Dimension

Speaker: Yunan Wu

Time: Wed., 3:00-4:00 pm, Dec. 20, 2023

Venue: Shuangqing Complex Building C654

Abstract:

We develop new tools to quantify uncertainty in optimal decision making and to gain insight into which variables one should collect information about given the potential cost of measuring a large number of variables. We investigate simultaneous inference to determine if a group of variables is relevant for estimating an optimal decision rule in a high-dimensional semiparametric framework. The unknown link function permits flexible modeling of the interactions between the treatment and the covariates but leads to nonconvex estimation in high dimension and imposes significant challenges for inference. We first establish that a local restricted strong convexity condition holds with high probability and that any feasible local sparse solution of the estimation problem can achieve the near-oracle estimation error bound. We verify that a wild bootstrap procedure based on a debiased version of the local solution can provide asymptotically honest uniform inference on optimal decision making.

Yunan Wu:

I am an Assistant Professor in Mathematical Sciences at the University of Texas at Dallas. I obtained my PhD degree from the School of Statistics, University of Minnesota, in 2020 under the guidance of Prof. Lan Wang. After that, I joined the Department of Biostatistics, School of Public Health, Yale University, as a Postdoctoral Associate, working with Prof. Hongyu Zhao. My main research interests are causal inference in precision medicine and Mendelian randomization, non-parametric and semi-parametric analysis, and high-dimensional analysis. I am also interested in studying corrupted data and machine learning techniques.

Title: Semiparametric adaptive estimation under informative sampling

Speaker：Jae Kwang Kim（Iowa State University）

Time：Fri, 15:30-16:30, Oct. 13, 2023

Venue：Shuangqing Complex Building (双清综合楼) C654

Abstract:

In probability sampling, sampling weights are often used to remove the selection bias in the sample. The Horvitz-Thompson estimator is well-known to be consistent and asymptotically normally distributed; however, it is not necessarily efficient. This study derives the semiparametric efficiency bound for various target parameters by considering the survey weights as random variables and consequently proposes two semiparametric estimators with working models on the survey weights. One estimator assumes a reasonable parametric working model, but the other estimator requires no specific working models by using the debiased/double machine learning method. The proposed estimators are consistent, asymptotically normal, and can be efficient in a class of regular and asymptotically linear estimators. A limited simulation study is conducted to investigate the finite sample performance of the proposed method. The proposed method is applied to the 1999 Canadian Workplace and Employee Survey data.
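As background for the abstract above, the following is a minimal illustrative sketch (not the speaker's method) of how the Horvitz-Thompson estimator uses inverse inclusion probabilities to remove selection bias under an informative design. The simulated population, the inclusion-probability rule, and the function names `horvitz_thompson_mean` and `simulate` are all assumptions made for illustration only.

```python
import random


def horvitz_thompson_mean(y_sample, pi_sample, population_size):
    """Horvitz-Thompson estimate of a population mean.

    Computes (1/N) * sum over sampled units of y_i / pi_i, where pi_i is
    the first-order inclusion probability of unit i.
    """
    total = sum(y / pi for y, pi in zip(y_sample, pi_sample))
    return total / population_size


def simulate(seed=0, population_size=50000):
    """Compare the unweighted sample mean with the HT estimate.

    Hypothetical setup: a finite population with y ~ N(10, 2^2) and
    Poisson sampling whose inclusion probability grows with y, so the
    design is informative and the unweighted mean is biased upward.
    """
    rng = random.Random(seed)
    y = [rng.gauss(10.0, 2.0) for _ in range(population_size)]
    # Inclusion probability roughly proportional to y, clipped to stay in (0, 1).
    pi = [min(0.9, max(0.02, 0.03 * v)) for v in y]
    # Poisson sampling: each unit is selected independently with probability pi_i.
    sampled = [i for i in range(population_size) if rng.random() < pi[i]]
    y_s = [y[i] for i in sampled]
    pi_s = [pi[i] for i in sampled]
    true_mean = sum(y) / population_size
    naive_mean = sum(y_s) / len(y_s)
    ht_mean = horvitz_thompson_mean(y_s, pi_s, population_size)
    return true_mean, naive_mean, ht_mean


if __name__ == "__main__":
    true_mean, naive_mean, ht_mean = simulate()
    print(f"true mean: {true_mean:.3f}")
    print(f"unweighted sample mean: {naive_mean:.3f}")
    print(f"Horvitz-Thompson mean: {ht_mean:.3f}")
```

In this sketch the weighted estimate tracks the population mean while the unweighted mean does not. The efficiency question raised in the abstract, namely whether a lower-variance estimator exists, is not addressed by this example.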
