[Spring 2025] Applied Regression Analysis - Mathematical Sciences Center, Tsinghua University

Speaker: Prof. Per Johansson

Time: Tues. & Thur., 13:30-15:05, Feb. 18-June 5, 2025

Venue: Lecture Hall B725, Shuangqing Complex Building A

Note:
Apr. 30-May 5: Labor Day Holiday, no classes.

The lecture on Apr. 29 is cancelled.

The objective is that students, after completing the course, should be able to independently conduct simple statistical studies, interpret the results of statistical investigations, and critically assess their credibility. To achieve this, the course combines:

· Statistical theory: reinforcing previous knowledge of statistical theory (inference), with a focus on regression models for description, prediction, and causal inference.

· Statistical programming in R and Python: for organizing and managing data as well as data analysis.

· Use of subject knowledge in relation to statistical analyses: practical examples related to, for example, economics, political science, and medicine.

A student who has completed the course should:

· Have the ability to judiciously choose an appropriate approach for a statistical analysis of data.

· Be able to use the programming language R to independently read in, organize, and process data.

· Be able to use the programming language R to sensibly apply regression analysis.

· Be able to critically review research reports based on statistical data.

Textbook(s):

Regression with R and Python - an introduction (currently a 350-page compendium)

Authors: Per Johansson and Mattias Nordin

Prerequisites:

The course is intended for students studying statistics at the bachelor's level, and for students at the master's or postgraduate level in other subjects where regression is used for empirical analyses. We assume that students have taken a basic course in statistics and are thus familiar with elementary probability theory and inference theory. We also assume that students know the very basics of how R is used. For those with no prior experience with R, we organize an introduction to R just before the course starts.

Course Description:

The course introduces linear regression and how it can be used to answer questions related to description, prediction, and the analysis of causal relationships. Both simple and multiple linear regression are described, as well as how nonlinear relationships can be estimated. Furthermore, logistic regression for binary outcomes and time series analysis will be covered.

The book comes with a set of freely available data materials. Throughout the book there are red boxes with R code that uses these data. To get as much as possible out of the book, we recommend that students work through the R code with the different datasets and reproduce the analyses presented in the red boxes.

Syllabus:

1 Introduction (2 hours)

1.1 Description

1.2 Prediction

1.3 Causality

2 Covariation in Samples (2 hours)

2.1 Pearson’s Correlation

2.2 Spearman’s Rank Correlation Coefficient

2.3 Simple linear regression

3 Basic Probability Theory and Statistical Inference (review, 4 hours)

3.1 Probabilities for Events

3.2 Stochastic Variables

3.3 Estimators

3.4 Hypothesis testing and confidence intervals

4 Correlation and inference to a population (2 hours)

4.1 Uncorrelated variables can be dependent

4.2 Sampling, population and inference

5 The linear regression model (6 hours)

5.1 Interpretation of the coefficients in the linear regression

5.2 Uncertainty and the linear regression model

5.3 Inference to the population with spherical error term

5.4 Asymptotic inference to the population

6 Multiple Linear Regression (6 hours)

6.1 The general multiple linear regression model

6.2 Regression anatomy

6.3 Coefficient of determination in multiple linear regression

6.4 Regression anatomy and standard error

6.5 F-test

6.6 Uncertainty of the conditional expected value

6.7 Residual Analysis

7 Nonlinear functional form (6 hours)

7.1 Qualitative independent variables and collinearity

7.2 Logarithms

7.3 Interactions

7.4 Example: Relationship between weather and air pollution

8 Regression analysis with dependent error terms (6 hours)

8.1 Clustered data

8.2 Panel data

8.3 Analysis and inference with clustering and panel data

8.4 Presentation of regression results with fixed effects

9 Binary dependent variable (6 hours)

9.1 The logistic regression model

9.2 Estimation and Inference under the logistic regression model

9.3 Example: The risk of having a heart disease

9.4 Maximum likelihood-estimation

9.5 Logistic regression or linear probability model?

10 Prediction (6 hours)

10.1 Overfitting

10.2 Training data, test data and cross-validation

10.3 Predictions with many independent variables

10.4 Tree-based regression models

10.5 Example: Housing prices

11 Time series analysis (6 hours)

11.1 The different parts of macro time series

11.2 Stationarity

11.3 Autoregressive regression

11.4 Predictions for transformed variables

11.5 Granger causality

11.6 Deterministic or stochastic trend?

12 Causal analyses (10 hours)

12.1 Causality and Regression

12.2 The randomized experiment

12.3 Quasi-experiment

12.4 Rubin’s Causal Model

12.5 Difference-in-difference

12.6 Difference-within-the-difference

12.7 Instrumental variable approach

12.8 The regression discontinuity approach

Course language: English

Registration: https://www.wjx.top/vm/hnyKjL0.aspx#
