
Reinforcement Learning in LLMs - Why and How
From imitation to optimization: when LLMs need RL, how verifiable rewards unlock reasoning, and a minimal GRPO playbook.


Concentration of measure pushes Gaussian samples onto a thin shell—here's the intuition, the math, and why typicality matters for generative models.

This note provides a high-level summary of progress in large language models (LLMs), covering major milestones from Transformers to ChatGPT. It serves as a fast-paced recap for readers who want to catch up on the field quickly.

Exponential-min and Gumbel-max tricks for reformulating sampling from a discrete distribution as argmin and argmax, making the sampling operation differentiable.
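A minimal sketch of the Gumbel-max identity with NumPy (toy logits assumed, not from the post): adding i.i.d. Gumbel noise to the logits and taking the argmax yields an exact sample from the softmax distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([1.0, 2.0, 0.5])  # assumed toy values

# Gumbel-max trick: argmax(logits + Gumbel noise) samples from softmax(logits)
n = 200_000
gumbel = -np.log(-np.log(rng.uniform(size=(n, len(logits)))))
samples = np.argmax(logits + gumbel, axis=1)

empirical = np.bincount(samples, minlength=len(logits)) / n
softmax = np.exp(logits) / np.exp(logits).sum()
# empirical frequencies agree with the softmax probabilities up to sampling noise
```

The exponential-min trick is the mirror image: the argmin of independent Exponential(rate = w_i) draws selects category i with probability proportional to w_i.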

A quick walk-through of Expectation-Maximization (EM) algorithm and its cousins.

PPO made RLHF work; DPO made it simple. This post derives DPO from PPO, explains why it’s a supervised alternative (not RL), where it shines, and where RL/GRPO still helps.

You know how to differentiate through a function—but how do you differentiate through a sampling step? Two estimators: score‑function (REINFORCE) and pathwise (reparameterization); pathwise backpropagates through the sampling transform with lower variance.
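A toy comparison of the two estimators (assumed Gaussian example, not from the post): both estimate the gradient of E[x²] with respect to the mean of N(mu, sigma²), whose true value is 2·mu, and the pathwise per-sample terms have visibly lower variance.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.5, 1.0  # assumed toy parameters

# Objective: E_{x ~ N(mu, sigma^2)}[x^2]; the true gradient wrt mu is 2*mu.
n = 100_000
eps = rng.standard_normal(n)
x = mu + sigma * eps  # reparameterization: x is a deterministic function of (mu, eps)

# Pathwise (reparameterization) estimator: d(x^2)/dmu = 2*x
pathwise = (2 * x).mean()

# Score-function (REINFORCE) estimator: f(x) * d log p(x)/dmu = x^2 * (x - mu)/sigma^2
score = (x**2 * (x - mu) / sigma**2).mean()
# Both are unbiased for 2*mu; the score-function terms are far noisier per sample
```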

Can we speed up generation without changing the distribution? A small draft model proposes, the big model accepts/rejects—yielding exact samples, faster.
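A single-token sketch of the accept/reject rule behind speculative sampling (toy categorical distributions assumed): accept the draft's proposal x with probability min(1, p(x)/q(x)), otherwise resample from the renormalized residual max(0, p − q). The resulting samples follow the target distribution p exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
p = np.array([0.5, 0.3, 0.2])  # target (big) model distribution, assumed toy values
q = np.array([0.2, 0.5, 0.3])  # draft (small) model distribution, assumed toy values

def spec_sample():
    # Draft proposes a token; accept with probability min(1, p/q)
    x = rng.choice(3, p=q)
    if rng.uniform() < min(1.0, p[x] / q[x]):
        return x
    # On rejection, resample from the residual max(0, p - q), renormalized
    r = np.maximum(p - q, 0)
    return rng.choice(3, p=r / r.sum())

n = 200_000
counts = np.bincount([spec_sample() for _ in range(n)], minlength=3) / n
# counts match p up to sampling noise: the speedup changes nothing about the distribution
```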

Why that particular sigmoid in logistic regression? This short post shows how simple moment constraints lead to exponential families (MaxEnt chooses the model) and how MLE fits them.
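As a one-line preview of where the sigmoid comes from: writing the Bernoulli likelihood in exponential-family form with natural parameter $\eta$ (the log-odds, set to $w^\top x$ in logistic regression) makes $\sigma$ appear automatically.

```latex
p(y \mid \eta) = \exp\!\big(\eta\, y - \log(1 + e^{\eta})\big), \qquad y \in \{0, 1\}
\quad\Longrightarrow\quad
p(y = 1 \mid \eta) = \frac{e^{\eta}}{1 + e^{\eta}} = \sigma(\eta).
```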

This is the first post of a hopefully ongoing series walking through diffusion models. It introduces the foundations, focusing on two foundational papers that many other papers build upon.

This is a quick note discussing a few topics related to building LLM-powered products and applications: how to let LLMs use tools and become autonomous agents, how to incorporate domain adaptation, and the hurdles on the way to production.

In this note, we'll take a look at how Auto-GPT works and discuss LLMs' ability to do explicit reasoning and to act as autonomous agents. We'll touch on a few related works such as WebGPT, Toolformer, and LangChain.

This page is a collection of high-level summary notes on various recent results in language modeling, with only brief explanations.

A list of starter resources for Natural Language Processing (NLP), mostly with deep learning.

A literature survey of recent papers on Neural Variational Inference (NVI) and its application in topic modeling.

A high-level summary of various generative models including Variational Autoencoders (VAE), Generative Adversarial Networks (GAN), and their notable extensions and generalizations, such as f-GAN, Adversarial Variational Bayes (AVB), Wasserstein GAN, Wasserstein Auto-Encoder (WAE), Cramér GAN, etc.