Statistical Seminar

Speaker: 胡祺睿, Chen Cheng
Organizer: 吴宇楠
Time: 11:00-12:00, Dec. 16, 2024; 16:00-17:00, Dec. 17, 2024
Venue: Shuangqing Complex Building A; Zoom Meeting ID: 271 534 5558 Passcode: YMSC

Upcoming talks:

 

2024-12-16/17

Zoom Meeting ID: 271 534 5558  Passcode: YMSC
https://us06web.zoom.us/j/2715345558?pwd=eXRTTExpOVg4ODFYellsNXZVVlZvQT09


Time: 11:00-12:00, Dec. 16, 2024

Venue: C548, Shuangqing Complex Building

Speaker: 胡祺睿, Center for Statistical Science, Tsinghua University

Title: Simultaneous Inference for Eigensystems and FPC Scores of Functional Data

Abstract:

Functional data analysis has become a pivotal field in statistics, emphasizing data represented by functions rather than scalar values. Although significant progress has been made in estimating fundamental elements such as mean and covariance functions, simultaneous inference for eigensystems and functional principal component (FPC) scores remains challenging.
In this talk, we introduce novel methodologies for simultaneous inference on eigensystems and on the distribution of FPC scores in densely observed functional data, along with their asymptotic properties, which hold in C[0,1] and for a diverging number of estimators. We validate our approaches through simulations and apply them to electroencephalogram (EEG) data, demonstrating their practical utility in testing hypotheses related to FPCs and the distribution of FPC scores. Finally, we discuss extensions to two-dimensional functional data, functional time series, and a unified theory bridging sparse and dense functional data.
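
For readers less familiar with the terminology, the eigensystem and FPC scores mentioned in the abstract are usually defined through the Karhunen-Loève expansion. What follows is the standard textbook formulation, not necessarily the exact setup of the talk; the symbols m, G, \lambda_k, \phi_k and \xi_{ik} are notation assumed here for illustration.

```latex
\begin{align*}
X_i(t) &= m(t) + \sum_{k=1}^{\infty} \xi_{ik}\,\phi_k(t), \qquad t \in [0,1], \\
G(s,t) &= \operatorname{Cov}\{X_i(s),\, X_i(t)\}
        = \sum_{k=1}^{\infty} \lambda_k\,\phi_k(s)\,\phi_k(t), \\
\xi_{ik} &= \int_0^1 \{X_i(t) - m(t)\}\,\phi_k(t)\,dt, \qquad
\mathbb{E}\,\xi_{ik} = 0, \quad \operatorname{Var}(\xi_{ik}) = \lambda_k .
\end{align*}
```

Here the pairs (\lambda_k, \phi_k) form the eigensystem of the covariance function G, and the \xi_{ik} are the FPC scores of the i-th curve.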

 

Time: 16:00-17:00, Dec. 17, 2024

Venue: C548, Shuangqing Complex Building

Speaker: Chen Cheng, Department of Statistics, Stanford University

Title: Towards modern datasets: laying mathematical foundations to streamline machine learning

Abstract:

Datasets are central to the development of statistical learning theory and to the evolution of models. The burgeoning success of modern machine learning in sophisticated tasks relies crucially on the vast growth of massive datasets (cf. Donoho), such as ImageNet, SuperGLUE and LAION-5B. However, this evolution breaks standard statistical learning assumptions and tools.
In this talk, I will present two stories that tackle the challenges modern datasets present, and leverage statistical theory to shed light on how we should streamline modern machine learning.
In the first part, we study multilabeling, a curious aspect of modern human-labeled datasets that is often missing in the statistical machine learning literature. We develop a stylized theoretical model to capture uncertainties in the labeling process, allowing us to understand the contrasts, limitations and possible improvements of using aggregated or non-aggregated data in a statistical learning pipeline.
In the second part, I will present novel theoretical tools that are not readily available from the classical literature, such as random matrix theory under the proportional regime. Theoretical tools for the proportional regime are crucially helpful in understanding “benign overfitting” and “memorization”. However, this is not always the most natural setting in statistics, where columns correspond to covariates and rows to samples. With the aim of moving beyond proportional asymptotics, we revisit ridge regression (ℓ2-penalized least squares) on i.i.d. data X ∈ ℝ^{n×d}, y ∈ ℝ^n. We allow the feature vector to be infinite-dimensional (d = ∞), in which case it belongs to a separable Hilbert space.
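
For reference, in the finite-dimensional case the ridge estimator mentioned above has a familiar closed form. This is the textbook formulation under one common normalization; the 1/n factor and the symbol \lambda for the penalty level are assumptions made here, and the talk may scale things differently.

```latex
\[
\hat{\beta}_\lambda
  = \arg\min_{\beta \in \mathbb{R}^d}
    \left\{ \frac{1}{n}\,\lVert y - X\beta \rVert_2^2
            + \lambda\,\lVert \beta \rVert_2^2 \right\}
  = \left( X^\top X + n\lambda I_d \right)^{-1} X^\top y
  = X^\top \left( X X^\top + n\lambda I_n \right)^{-1} y,
  \qquad \lambda > 0 .
\]
```

The last expression involves only the n × n Gram matrix X X^\top, which is why it is the form that typically remains meaningful when d = ∞.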

 



Past talks:



Time: 16:00-17:00, Nov. 29

Venue: C654, Shuangqing Complex Building

Speaker: 李伟, Associate Professor, School of Statistics, Renmin University of China

Title: Discovery and inference of possibly bi-directional causal relationships with invalid instrumental variables

Abstract: Learning causal relationships between pairs of complex traits from observational studies is of great interest across various scientific domains. However, most existing methods assume the absence of unmeasured confounding and restrict causal relationships between two traits to be uni-directional, assumptions that may be violated in real-world systems. In this paper, we address the challenge of causal discovery and effect inference for two traits while accounting for unmeasured confounding and potential feedback loops. By leveraging possibly invalid instrumental variables, we provide identification conditions for causal parameters in a model that allows for bi-directional relationships, and we also establish identifiability of the causal direction under the introduced conditions. We then propose a data-driven procedure to detect the causal direction and provide inference results about causal effects along the identified direction. We show that our method consistently recovers the true direction and produces valid confidence intervals for the causal effect. We conduct extensive simulation studies to show that our proposal outperforms existing methods. Finally, we apply our method to analyze real data sets from UK Biobank.

 



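Time: 16:00-17:00, Nov. 22

Venue: C654, Shuangqing Complex Building

Speaker: 孙玉莹, Associate Researcher, Academy of Mathematics and Systems Science, Chinese Academy of Sciences

Title: Model averaging for time-varying vector autoregressions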

Abstract: This paper proposes a novel time-varying model averaging (TVMA) approach to enhancing forecast accuracy for multivariate time series subject to structural changes. The TVMA method averages predictions from a set of time-varying vector autoregressive models using optimal time-varying combination weights selected by minimizing a penalized local criterion. This allows the relative importance of different models to evolve adaptively over time in response to structural shifts. We establish the asymptotic optimality of the proposed TVMA approach in achieving the lowest possible quadratic forecast errors. We derive the convergence rate of the selected time-varying weights to the optimal weights that minimize expected quadratic errors. Moreover, we show that when one or more correctly specified models exist, our method consistently assigns full weight to them, and that asymptotic normality of the TVMA estimators can be established under some regularity conditions. Furthermore, the proposed approach encompasses special cases including time-varying VAR models with exogenous predictors, as well as time-varying FAVAR models. Simulations and an empirical application illustrate that the proposed TVMA method outperforms some commonly used model averaging and selection methods in the presence of structural changes.