Statistical Seminar

Speaker: Wang Miao
Organizer: Wu Yunan
Time: 16:00-17:00, March 28, 2025
Venue: Shuangqing Complex Building A

Upcoming talks:


 

Organizer: Wu Yunan

Time: March 28, 16:00-17:00

Venue: Shuangqing Complex Building, Room C654

Speaker: Wang Miao, Associate Professor, Department of Probability and Statistics, Peking University


Title: Causal inference for dyadic data in randomized experiments with interference

Abstract: Estimating treatment effects in a network is of particular interest in the online experiments that social media companies run every day. We investigate a novel setting in which the outcome of interest consists of a series of dyadic outcomes, such as forwarding a message or sharing a link between friends, or international trade relations between countries. Dyadic outcomes are pervasive in many social network sources and are of particular interest in online experimentation (A/B testing). We propose a causal inference framework for dyadic outcomes in randomized experiments in the presence of network interference, and develop consistent estimators of the global average causal effect. We derive the convergence rate and variance bound of the proposed estimators, and provide a conservative variance estimator for quantifying the estimation uncertainty. We illustrate the approach with a variety of numerical experiments and apply it to an online experiment on WeChat Channels.
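
The sketch below is a deliberately simplified illustration of a dyad-level causal contrast. It computes a Horvitz-Thompson estimate that compares dyads whose two endpoints are both treated with dyads whose endpoints are both in control. It rests on two illustrative assumptions. First, each unit is treated independently with probability p (a Bernoulli design). Second, a dyad's outcome depends only on the treatments of its two endpoints. This is not the speaker's framework, which handles general network interference, and the function name `ht_dyadic_gate` is hypothetical.

```python
import numpy as np

def ht_dyadic_gate(dyads, y, z, p):
    """Hypothetical Horvitz-Thompson contrast for dyadic outcomes.

    Assumes a Bernoulli(p) design over units and that a dyad's outcome
    depends only on the treatments of its two endpoints (illustrative only).
    """
    dyads = np.asarray(dyads)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z)
    zi, zj = z[dyads[:, 0]], z[dyads[:, 1]]
    both_treated = (zi == 1) & (zj == 1)   # happens with probability p**2
    both_control = (zi == 0) & (zj == 0)   # happens with probability (1 - p)**2
    terms = y * both_treated / p**2 - y * both_control / (1 - p) ** 2
    return terms.mean()                    # average over all observed dyads

# Toy example: three units and three dyads (e.g. messages forwarded between pairs)
dyads = [(0, 1), (1, 2), (0, 2)]
y = [4.0, 2.0, 1.0]
z = [1, 1, 0]
estimate = ht_dyadic_gate(dyads, y, z, p=0.5)
```

The estimator divides by p^2 and (1 - p)^2. This weights each dyad by the inverse probability of its observed exposure, which makes the contrast unbiased under the two assumptions above.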

 



Past talks:


Organizer: Wu Yunan

Time: March 21, 16:00-17:00

Venue: Shuangqing Complex Building, Room C654

Speaker: Wang Yuhao, Assistant Professor, Institute for Interdisciplinary Information Sciences, Tsinghua University


Title: Residual permutation test for regression coefficient testing

Abstract: We consider the problem of testing whether a single coefficient is equal to zero in linear models when the dimension of covariates p can be up to a constant fraction of the sample size n. In this regime, an important goal is to construct tests with finite-population valid size control without requiring the noise to satisfy strong distributional assumptions. In this paper, we propose a new method, called the residual permutation test (RPT), which is constructed by projecting the regression residuals onto the space orthogonal to the union of the column spaces of the original and permuted design matrices. RPT provably achieves finite-population size validity under fixed design with only exchangeable noise, whenever p < n/2. Moreover, RPT is shown to be asymptotically powerful for heavy-tailed noise with bounded (1+t)-th order moment when the true coefficient is at least of order n^{-t/(1+t)} for t ∈ [0,1]. We further prove that this signal size requirement is essentially rate-optimal in the minimax sense. Numerical studies confirm that RPT performs well in a wide range of simulation settings with normal and heavy-tailed noise distributions.
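
The sketch below covers only the projection step named in the abstract. It takes a vector and removes its component in the span of the design matrix X and a row-permuted copy X[perm], so the result lies in the orthogonal complement of the union of the two column spaces. It does not build the RPT statistic or its permutation calibration. The helper name `project_out_union` and the use of a single random permutation are illustrative assumptions. The example also shows why the condition p < n/2 matters: the stacked design has 2p columns, so its orthogonal complement is nontrivial only when 2p < n.

```python
import numpy as np

def project_out_union(v, X, perm):
    """Component of v orthogonal to span(X) + span(X[perm]) (projection step only)."""
    stacked = np.hstack([X, X[perm]])            # n x 2p: original and row-permuted design
    coef, *_ = np.linalg.lstsq(stacked, v, rcond=None)
    return v - stacked @ coef                    # least-squares residual is orthogonal to span(stacked)

rng = np.random.default_rng(0)
n, p = 40, 10                                    # p < n/2, so 2p < n and the complement is nontrivial
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
perm = rng.permutation(n)                        # one random permutation of the rows
r = project_out_union(y, X, perm)
```

When p = n/2, the stacked matrix is generically square and invertible, so the projection returns (numerically) the zero vector.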



 

Organizer: Wu Yunan

Time: March 14, 16:00-17:00

Venue: Shuangqing Complex Building, Room C654

Speaker: Prof. Fan Li, Department of Statistical Science, Duke University

Title: Interacted two-stage least squares with treatment effect heterogeneity

Abstract: Treatment effect heterogeneity with respect to covariates is common in instrumental variable (IV) analyses. An intuitive approach, which we term the interacted two-stage least squares (2SLS), is to postulate a linear working model of the outcome on the treatment, covariates, and treatment-covariate interactions, and instrument it by the IV, covariates, and IV-covariate interactions. We clarify the causal interpretation of the interacted 2SLS under the local average treatment effect (LATE) framework when the IV is valid conditional on covariates. Our contributions are threefold. First, we show that the interacted 2SLS with centered covariates is consistent for estimating the LATE if either of the following conditions holds: (i) the treatment-covariate interactions are linear in the covariates; (ii) the linear outcome model underlying the interacted 2SLS is correct. Second, we show that the coefficients of the treatment-covariate interactions from the interacted 2SLS are consistent for estimating treatment effect heterogeneity with regard to covariates among compliers if either condition (i) or condition (ii) holds. Moreover, we connect the 2SLS estimator with the reweighting perspective in Abadie (2003) and establish the necessity of condition (i) in the absence of additional assumptions on potential outcomes. Third, leveraging the consistency guarantees of the interacted 2SLS for categorical covariates, we propose a stratification strategy based on the IV propensity score to approximate the LATE and treatment effect heterogeneity with regard to the IV propensity score when neither condition (i) nor condition (ii) holds.
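
The specification described in the abstract can be written as a just-identified IV regression. The outcome is regressed on (1, D, Xc, D·Xc) and instrumented by (1, Z, Xc, Z·Xc), where Xc denotes the centered covariates. The sketch below is a minimal point-estimate version under some assumptions: a binary IV z, a binary treatment d, and a covariate matrix X. It omits standard errors and the propensity-score stratification, and it is not the authors' code. The function name `interacted_2sls` is hypothetical.

```python
import numpy as np

def interacted_2sls(y, d, z, X):
    """Interacted 2SLS with centered covariates (point estimates only, no standard errors).

    Regressors:  (1, d, Xc, d * Xc)
    Instruments: (1, z, Xc, z * Xc)
    Returns coefficients in that order: intercept, d, Xc (k), d * Xc (k).
    """
    Xc = X - X.mean(axis=0)                                   # center covariates
    ones = np.ones((len(y), 1))
    A = np.hstack([ones, d[:, None], Xc, d[:, None] * Xc])    # regressors
    W = np.hstack([ones, z[:, None], Xc, z[:, None] * Xc])    # instruments
    # Just-identified case: 2SLS equals the IV estimator (W'A)^{-1} W'y
    return np.linalg.solve(W.T @ A, W.T @ y)

# Simulated design with a binary IV z and imperfect compliance in d
rng = np.random.default_rng(1)
n, k = 500, 2
X = rng.standard_normal((n, k))
z = rng.binomial(1, 0.5, n).astype(float)
d = ((z + rng.standard_normal(n)) > 0.5).astype(float)      # uptake depends on z
```

Because the covariates are centered, the coefficient on d is the quantity the abstract relates to the LATE under condition (i) or (ii). The last k coefficients correspond to the treatment-covariate interactions.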




Organizer: Wu Yunan

Time: March 14, 14:00-15:00

Venue: Shuangqing Complex Building, Room C548

Speaker: Yu Dalei, Professor, School of Mathematics and Statistics, Xi'an Jiaotong University


Title: Unified optimal model averaging with a general loss function based on cross-validation

Abstract: Studying unified model averaging estimation for situations with complicated data structures, we propose a novel model averaging method based on cross-validation (MACV). MACV unifies a large class of new and existing model averaging estimators and covers a very general class of loss functions. Furthermore, to reduce the computational burden of conventional leave-one-out and leave-subject-out cross-validation, we propose a SEcond-order-Approximated Leave-one/subject-out (SEAL) cross-validation, which substantially improves computational efficiency. As a useful tool, we extend the Bernstein-type inequality for strongly mixing random variables to the case where they are not necessarily identically distributed. For non-independent and non-identically distributed random variables, we establish a unified theory for the asymptotic behavior of the proposed MACV and SEAL methods, where the number of candidate models is allowed to diverge with the sample size. To demonstrate the breadth of the proposed methodology, we exemplify four optimal model averaging estimators in four important situations: longitudinal data with discrete responses, within-cluster correlation structure modeling, conditional prediction in spatial data, and quantile regression with a potential correlation structure. We conduct extensive simulation studies and analyze real-data examples to illustrate the advantages of the proposed methods.
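
To make "model averaging based on cross-validation" concrete, the sketch below uses a classical special case: squared loss, nested OLS candidate models, and leave-one-out residuals. For OLS these residuals have an exact shortcut, e_i / (1 - h_ii). The weights are chosen on the probability simplex to minimize the CV criterion w'Gw, solved by projected gradient descent. This is a swapped-in textbook construction, not MACV or SEAL. SEAL is described as a second-order approximation for general losses, whereas this exact shortcut holds only for least squares. All helper names are hypothetical.

```python
import numpy as np

def loo_residuals(X, y):
    """Exact leave-one-out OLS residuals via the shortcut e_i / (1 - h_ii)."""
    Q, _ = np.linalg.qr(X)                 # reduced QR; Q has orthonormal columns
    h = np.sum(Q**2, axis=1)               # leverages h_ii
    e = y - Q @ (Q.T @ y)                  # in-sample OLS residuals
    return e / (1.0 - h)

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def cv_model_average(X, y, sizes, iters=5000):
    """Simplex weights for nested candidates X[:, :s] minimizing the LOO squared-error criterion."""
    E = np.column_stack([loo_residuals(X[:, :s], y) for s in sizes])
    G = E.T @ E / len(y)                   # CV(w) = w' G w because the weights sum to 1
    step = 1.0 / (2.0 * np.linalg.eigvalsh(G)[-1])   # 1 / Lipschitz constant of the gradient 2 G w
    w = np.full(len(sizes), 1.0 / len(sizes))
    for _ in range(iters):
        w = project_simplex(w - step * 2.0 * G @ w)  # projected gradient step
    return w, G

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal((n, 4))])
y = X @ np.array([1.0, 1.0, 0.5, 0.2, 0.0]) + rng.standard_normal(n)
w, G = cv_model_average(X, y, sizes=[1, 2, 3, 5])
```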




2024-12-16/17

 

Zoom Meeting ID: 271 534 5558  Passcode: YMSC
https://us06web.zoom.us/j/2715345558?pwd=eXRTTExpOVg4ODFYellsNXZVVlZvQT09


Time: December 16, 11:00-12:00

Venue: Shuangqing Complex Building, Room C548

Speaker: Hu Qirui, Center for Statistical Science, Tsinghua University

Title: Simultaneous Inference for Eigensystems and FPC Scores of Functional Data

Abstract:

Functional data analysis has become a pivotal field in statistics, emphasizing data represented by functions rather than scalar values. Although significant progress has been made in estimating fundamental elements such as mean and covariance functions, simultaneous inference for eigensystems and functional principal component (FPC) scores remains challenging.
In this talk, we introduce novel methodologies for the simultaneous inference of eigensystems and of the distribution of FPC scores in densely observed functional data, and establish their asymptotic properties, which hold in particular in C[0,1] and for a diverging number of estimators. We validate our approaches through simulations and apply them to electroencephalogram (EEG) data, demonstrating their practical utility in testing hypotheses related to FPCs and the distribution of FPC scores. Finally, we discuss extensions to two-dimensional functional data, functional time series, and a unified theory bridging sparse and dense functional data.
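
The inference discussed in the talk is built on the empirical eigensystem and FPC scores. The sketch below shows only that point-estimation step for densely observed curves. It assumes all curves share one equally spaced grid and uses a Riemann-sum quadrature. It does not implement the simultaneous confidence bands or tests from the talk, and the name `fpca_dense` is hypothetical.

```python
import numpy as np

def fpca_dense(Y, t):
    """Empirical FPCA for curves observed on a common, equally spaced grid t."""
    dt = t[1] - t[0]                        # quadrature weight (equal spacing assumed)
    mu = Y.mean(axis=0)                     # estimated mean function
    Yc = Y - mu
    C = Yc.T @ Yc / (Y.shape[0] - 1)        # covariance surface C(s, t) on the grid
    evals, evecs = np.linalg.eigh(C * dt)   # discretized covariance operator
    order = np.argsort(evals)[::-1]         # sort eigenvalues in decreasing order
    evals, evecs = evals[order], evecs[:, order]
    phi = evecs / np.sqrt(dt)               # eigenfunctions scaled so sum(phi_k**2) * dt = 1
    scores = Yc @ phi * dt                  # xi_ik ~ integral of (Y_i - mu) * phi_k
    return mu, evals, phi, scores

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
n = 200
xi = rng.standard_normal((n, 2)) * np.array([2.0, 1.0])      # true score variances 4 and 1
basis = np.vstack([np.sqrt(2) * np.sin(2 * np.pi * t), np.sqrt(2) * np.cos(2 * np.pi * t)])
Y = 1.0 + xi @ basis + 0.1 * rng.standard_normal((n, t.size))
mu, evals, phi, scores = fpca_dense(Y, t)
```

With this scaling, the sample variance of the k-th score column equals the k-th estimated eigenvalue. The leading eigenfunctions are orthonormal in the discretized L2 inner product.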

 

Time: December 17, 16:00-17:00

Venue: Shuangqing Complex Building, Room C548

Speaker: Chen Cheng, Department of Statistics, Stanford University

Title: Towards modern datasets: laying mathematical foundations to streamline machine learning

Abstract:

Datasets are central to the development of statistical learning theory and the evolution of models. The burgeoning success of modern machine learning in sophisticated tasks crucially relies on the vast growth of massive datasets (cf. Donoho), such as ImageNet, SuperGLUE and LAION-5B. However, this evolution breaks standard statistical learning assumptions and tools.
In this talk, I will present two stories that tackle challenges posed by modern datasets and use statistical theory to shed light on how we should streamline modern machine learning.
In the first part, we study multilabeling, a curious aspect of modern human-labeled datasets that is often missing from the statistical machine learning literature. We develop a stylized theoretical model to capture uncertainties in the labeling process, allowing us to understand the contrasts, limitations and possible improvements of using aggregated or non-aggregated data in a statistical learning pipeline. In the second part, I will present novel theoretical tools that go beyond those conveniently available in the classical literature, such as random matrix theory in the proportional regime. Theoretical tools for the proportional regime are crucial for understanding “benign overfitting” and “memorization”. However, this regime is not always the most natural setting in statistics, where columns correspond to covariates and rows to samples. With the objective of moving beyond proportional asymptotics, we revisit ridge regression (ℓ2-penalized least squares) on i.i.d. data X ∈ ℝ^{n×d}, y ∈ ℝ^n. We allow the feature vector to be infinite-dimensional (d = ∞), in which case it belongs to a separable Hilbert space.
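
Ridge regression can be computed in two equivalent ways. The primal form solves a d × d system. The dual form solves an n × n system that involves the data only through inner products between samples. The dual form is what keeps ridge computable when the feature vector lives in a Hilbert space, as in kernel ridge regression. The sketch below checks this equivalence numerically with d > n. The n·λ scaling of the penalty is an assumption, and the function names are hypothetical. This is standard background, not the speaker's analysis.

```python
import numpy as np

def ridge_primal(X, y, lam):
    """beta = (X'X + n*lam*I_d)^{-1} X'y: a d x d solve."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

def ridge_dual(X, y, lam):
    """Same estimator via the n x n Gram matrix: beta = X'(XX' + n*lam*I_n)^{-1} y."""
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + n * lam * np.eye(n), y)   # only inner products of rows enter here
    return X.T @ alpha

rng = np.random.default_rng(4)
X = rng.standard_normal((30, 200))    # d = 200 features, n = 30 samples
y = rng.standard_normal(30)
```

The two forms agree because of the push-through identity (X'X + cI)^{-1} X' = X'(XX' + cI)^{-1}, which holds for any c > 0.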

 



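Time: November 29, 16:00-17:00

Venue: Shuangqing Complex Building, Room C654

Speaker: Li Wei, Associate Professor, School of Statistics, Renmin University of China

Title: Discovery and inference of possibly bi-directional causal relationships with invalid instrumental variables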

Abstract: Learning causal relationships between pairs of complex traits from observational studies is of great interest across various scientific domains. However, most existing methods assume the absence of unmeasured confounding and restrict the causal relationship between two traits to be uni-directional, assumptions that may be violated in real-world systems. In this paper, we address the challenge of causal discovery and effect inference for two traits while accounting for unmeasured confounding and potential feedback loops. By leveraging possibly invalid instrumental variables, we provide identification conditions for causal parameters in a model that allows for bi-directional relationships, and we also establish identifiability of the causal direction under the introduced conditions. We then propose a data-driven procedure to detect the causal direction and provide inference results about causal effects along the identified direction. We show that our method consistently recovers the true direction and produces valid confidence intervals for the causal effect. We conduct extensive simulation studies to show that our proposal outperforms existing methods. Finally, we apply our method to analyze real data sets from the UK Biobank.
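
One classical way to handle possibly invalid instruments comes from Mendelian randomization. It forms a per-instrument ratio estimate and takes the median, which stays consistent when more than half of the instruments are valid. The sketch below shows only this swapped-in, uni-directional building block. It is not the speaker's procedure, and it handles neither feedback loops nor direction discovery. The inputs gamma_x (instrument-exposure associations) and gamma_y (instrument-outcome associations) and the function name are hypothetical.

```python
import numpy as np

def median_ratio_estimate(gamma_x, gamma_y):
    """Median of per-instrument ratio estimates gamma_y[j] / gamma_x[j]."""
    ratios = np.asarray(gamma_y, dtype=float) / np.asarray(gamma_x, dtype=float)
    return float(np.median(ratios))        # robust if a majority of instruments are valid

# Toy summary statistics: true effect 2; the last two instruments have direct effects
gamma_x = np.array([1.0, 0.5, 2.0, 1.0, 1.0])
direct = np.array([0.0, 0.0, 0.0, 3.0, -5.0])  # invalid instruments violate the exclusion restriction
gamma_y = 2.0 * gamma_x + direct
est = median_ratio_estimate(gamma_x, gamma_y)
```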

 



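Time: November 22, 16:00-17:00

Venue: Shuangqing Complex Building, Room C654

Speaker: Sun Yuying, Associate Professor, Academy of Mathematics and Systems Science, Chinese Academy of Sciences

Title: Model averaging for time-varying vector autoregressions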

Abstract: This paper proposes a novel time-varying model averaging (TVMA) approach for enhancing forecast accuracy for multivariate time series subject to structural changes. The TVMA method averages predictions from a set of time-varying vector autoregressive models using optimal time-varying combination weights selected by minimizing a penalized local criterion. This allows the relative importance of different models to evolve adaptively over time in response to structural shifts. We establish the asymptotic optimality of the proposed TVMA approach in achieving the lowest possible quadratic forecast errors, and derive the convergence rate of the selected time-varying weights to the optimal weights that minimize the expected quadratic errors. Moreover, we show that when one or more correctly specified models exist, our method consistently assigns full weight to them, and that the TVMA estimators are asymptotically normal under some regularity conditions. Furthermore, the proposed approach encompasses special cases including time-varying VAR models with exogenous predictors as well as time-varying FAVAR models. Simulations and an empirical application show that the proposed TVMA method outperforms some commonly used model averaging and selection methods in the presence of structural changes.
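
The idea of weights chosen by "minimizing a local criterion" can be seen in a stripped-down case with two candidate models. The weight on model 1 at time t minimizes the kernel-weighted squared error of the combined forecast errors. For two models this has the closed form w = -Σ K e2 (e1 - e2) / Σ K (e1 - e2)^2, clipped to [0, 1]. The sketch below rests on several assumptions. It omits the penalty term and the VAR structure, and it uses a Gaussian kernel, a user-chosen bandwidth, and a two-sided window over in-sample errors. It is not the paper's TVMA estimator, and the function name is hypothetical.

```python
import numpy as np

def local_weight_two_models(e1, e2, t, bandwidth):
    """Kernel-weighted least-squares weight on model 1 at time index t, clipped to [0, 1].

    Minimizes sum_s K((s - t) / h) * (w * e1[s] + (1 - w) * e2[s])**2 over w,
    where e1 and e2 are the forecast errors of the two candidate models.
    """
    s = np.arange(len(e1))
    k = np.exp(-0.5 * ((s - t) / bandwidth) ** 2)   # Gaussian kernel weights (assumed kernel)
    diff = e1 - e2
    denom = np.sum(k * diff**2)
    if denom == 0.0:                                 # models indistinguishable near t
        return 0.5
    w = -np.sum(k * e2 * diff) / denom
    return float(np.clip(w, 0.0, 1.0))

# Structural change: model 1 is accurate before time 50, model 2 afterwards
e1 = np.r_[np.zeros(50), np.ones(50)]
e2 = np.r_[np.ones(50), np.zeros(50)]
w_early = local_weight_two_models(e1, e2, t=10, bandwidth=5.0)
w_late = local_weight_two_models(e1, e2, t=90, bandwidth=5.0)
```

The weight shifts from model 1 to model 2 around the break. This is the kind of adaptive reallocation the abstract describes.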