Website: learn more about the LML seminar at https://sites.google.com/view/artanesiad/lml-seminar
1 June 2026 – David Pechersky, YMSC/BIMSA and Tsinghua University
Static Word Embeddings - A Precursor to ChatGPT
A word embedding is an embedding of English-language words (or words of any other language, for that matter) into R^n for some large n, in such a way that the geometry of this embedding reflects the relationships between words. While this topic is now classical, the underlying ideas are important for understanding the architecture of modern large language models. The goal of this lecture will be to survey the landscape of the word embedding literature prior to the advent of ChatGPT.
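To make "geometry reflects relationships" concrete, here is a minimal sketch using hand-made toy vectors in R^3. The vocabulary, the vector values, and the helper names `cosine` and `nearest` are all hypothetical and chosen only for illustration; real embeddings are learned from text and live in far higher dimension.

```python
import numpy as np

# Toy, hand-made vectors (NOT learned embeddings); values are illustrative only.
embeddings = {
    "king":  np.array([0.9,  0.8, 0.1]),
    "queen": np.array([0.9, -0.8, 0.1]),
    "man":   np.array([0.1,  0.8, 0.2]),
    "woman": np.array([0.1, -0.8, 0.2]),
    "apple": np.array([0.0,  0.1, 0.9]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(target, exclude=()):
    """Return the vocabulary word whose vector is most similar to `target`."""
    candidates = [w for w in embeddings if w not in exclude]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

# The classic analogy test: king - man + woman should land near queen.
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
answer = nearest(target, exclude=("king", "man", "woman"))
```

In this toy setup the second coordinate plays the role of a "gender" direction and the first a "royalty" direction, so the analogy works by construction; in learned embeddings such structure is only approximate.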
25 May 2026 – Justin Yeh, Tsinghua University
Linear attention
This talk covers linear attention, which reduces the Transformer's complexity from quadratic to linear in sequence length. Standard attention scales poorly due to the softmax operation; linear attention replaces it with a kernelized dot product, enabling a more efficient computation order.
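The reordering mentioned in the abstract can be sketched in a few lines of NumPy. This is an illustrative, non-causal, single-head version under stated assumptions, not the speaker's implementation: the feature map elu(x) + 1 is one common choice from the linear attention literature, and the function names `feature_map`, `linear_attention`, and `quadratic_reference` are hypothetical.

```python
import numpy as np

def feature_map(x):
    """Positive feature map elu(x) + 1 (an assumed choice of kernel)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def quadratic_reference(Q, K, V):
    """Kernelized attention computed the slow way: builds the full n x n matrix."""
    Qf, Kf = feature_map(Q), feature_map(K)
    A = Qf @ Kf.T                          # (n, n) similarity matrix
    A = A / A.sum(axis=1, keepdims=True)   # row-normalize, replacing softmax
    return A @ V

def linear_attention(Q, K, V):
    """Same result, but computes K^T V first, so cost is linear in n."""
    Qf, Kf = feature_map(Q), feature_map(K)
    KV = Kf.T @ V                          # (d, d_v), independent of n
    Z = Kf.sum(axis=0)                     # (d,) normalizer statistics
    return (Qf @ KV) / (Qf @ Z)[:, None]

rng = np.random.default_rng(0)
n, d, d_v = 6, 4, 3
Q, K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=(n, d_v))
out_fast = linear_attention(Q, K, V)
out_slow = quadratic_reference(Q, K, V)
```

Because matrix multiplication is associative, (Qf Kf^T) V equals Qf (Kf^T V); the second ordering never materializes the n x n matrix, which is where the saving comes from. With softmax in place, this regrouping is not available.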
18 May 2026 – Artane Siad, YMSC and Tsinghua University
Scaling laws for LLMs
This talk will explore scaling laws in large language models.
11 May 2026 – Justin Yeh, Tsinghua University
The Transformer Upgrade Path: 1. Tracing the origins of Sinusoidal Encoding
This talk goes into the mathematical details of Su Jianlin's blog on positional embedding to establish the foundations necessary for understanding and later building Rotary Positional Encoding (RoPE) in subsequent talks.
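For orientation, the sinusoidal encoding in question can be sketched as follows. This follows the standard formulation from the original Transformer paper (sine on even dimensions, cosine on odd ones, base 10000); the function name `sinusoidal_encoding` and its parameters are hypothetical, and the talk's own notation may differ.

```python
import numpy as np

def sinusoidal_encoding(num_positions, d_model, base=10000.0):
    """Return a (num_positions, d_model) matrix of sinusoidal position encodings."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    positions = np.arange(num_positions)[:, None]          # (num_positions, 1)
    freqs = base ** (-np.arange(0, d_model, 2) / d_model)  # (d_model / 2,)
    angles = positions * freqs                             # (num_positions, d_model / 2)
    pe = np.empty((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe

pe = sinusoidal_encoding(num_positions=8, d_model=16)
```

One property worth noticing: the inner product of the encodings at positions m and n is a sum of cos((m - n) * freq) terms, so it depends only on the offset m - n. That relative-position structure is the starting point for RoPE.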
18:00-19:00 in Jingzhai, Monday, April 27th
Speaker: Justin Yeh, Tsinghua University
Title: The Transformer Upgrade Path: 1. Tracing the origins of Sinusoidal Encoding
Abstract: This talk goes into the mathematical details of Su Jianlin's blog on positional embedding to establish the foundations necessary for understanding and later building Rotary Positional Encoding (RoPE) in subsequent talks.
Monday, 18:00-19:00, April 20th, 2026
Speaker: Justin Yeh, Tsinghua University
Title: From Seq2Seq to Transformer
Abstract: This talk introduces the foundational sequence-to-sequence (seq2seq) architecture and the attention mechanism that revolutionized it. We start with the encoder-decoder framework, covering training basics and simple models, then explain why attention is needed and how it works. From there, we build up to the Transformer, the modern workhorse of seq2seq. We also discuss practical essentials: subword segmentation (e.g., Byte Pair Encoding) and inference methods like beam search. Finally, we touch on how to analyze and interpret what these models have learned. The goal is to go through each component in technical detail, from the ground up, making the ideas accessible to a beginner audience without glossing over how things actually work.
Marc Wegmann, Technical University of Munich
Monday April 13 18:00 - 19:00
Title: Implementing Reinforcement Learning in Production Planning: A Step-by-Step Methodology for Industrial Use Cases
Abstract: As traditional production planning and control (PPC) struggles with increasing volatility and complexity, Reinforcement Learning (RL) offers a path toward more adaptive and robust decision-making. This presentation provides a goal-oriented guide on how to practically implement RL in industrial settings, using a structured methodology to bridge the gap between theory and application. We walk through the critical steps of the implementation process: from identifying high-potential use cases to designing the essential "building blocks" (action space, reward function, and state space) tailored to specific industrial settings. The session concludes by addressing practical hurdles such as the "sim-to-real" gap, changing environmental conditions, and the need for transparency in industrial AI systems.
*Note*: the new website for the LML seminar is at https://sites.google.com/view/artanesiad/lml-seminar
This seminar will build a conceptual foundation in machine learning through careful reading of the classical literature. We will cover foundational work in reinforcement learning, large language models, and world models, picking up current developments as they arise naturally along the way.
The format will be one paper per week, read closely and discussed carefully. We will also engage with implementation — understanding how these systems are actually built.
As a companion activity, we will run regular Sunday hackathons — building real projects to develop hands-on fluency alongside the theoretical work.
We are interested in understanding machine learning from the ground up.