The discounted return G_t sums all future rewards but down-weights distant rewards by powers of a discount factor γ.
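A minimal sketch of this sum, assuming a finite list of rewards and an illustrative γ; iterating backwards folds each reward into the discounted tail:

```python
def discounted_return(rewards, gamma=0.99):
    """G_t = r_t + gamma * r_{t+1} + gamma**2 * r_{t+2} + ..."""
    g = 0.0
    # Work backwards: each step adds the reward plus the discounted tail.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```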
RLHF turns human preferences between two model outputs into training signals using a probabilistic model of choice.
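The probabilistic model of choice commonly used here is Bradley-Terry: the probability that the human prefers one output is the sigmoid of the difference in reward scores. A sketch with illustrative scalar rewards:

```python
import math

def preference_prob(r_chosen, r_rejected):
    """Bradley-Terry: P(chosen preferred) = sigmoid(r_chosen - r_rejected)."""
    return 1.0 / (1.0 + math.exp(-(r_chosen - r_rejected)))

def preference_loss(r_chosen, r_rejected):
    """Negative log-likelihood used as the reward model's training signal."""
    return -math.log(preference_prob(r_chosen, r_rejected))

# Equal reward scores: the model is indifferent between the two outputs.
print(preference_prob(1.0, 1.0))  # 0.5
```

Training the reward model lowers this loss, i.e. pushes the chosen output's score above the rejected one's.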
The exploration-exploitation tradeoff is the tension between trying new actions to learn (exploration) and using the best-known action to earn rewards now (exploitation).
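The simplest way to balance the two is epsilon-greedy: explore with probability epsilon, otherwise exploit. A sketch over an illustrative list of action values:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With prob. epsilon pick a random action (explore); else argmax (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# epsilon=0 is pure exploitation: always the best-known action.
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))  # 1
```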
Value function approximation replaces a huge table of values with a small set of parameters that can generalize across similar states.
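The simplest instance is a linear approximator: a value estimate is a dot product of a weight vector with state features, so two weights can cover what a table would store entry by entry. The feature map below is a hypothetical example:

```python
def features(state):
    """Hypothetical feature map: the raw state plus a bias term."""
    return [state, 1.0]

def v_hat(state, w):
    """Linear value approximation: v(s) ~ w . phi(s)."""
    return sum(wi * xi for wi, xi in zip(w, features(state)))

# Two parameters generalize across all real-valued states.
w = [0.5, 1.0]
print(v_hat(4.0, w))  # 0.5*4 + 1 = 3.0
```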
Proximal Policy Optimization (PPO) stabilizes policy gradient learning by preventing each update from moving the policy too far from the previous one.
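The mechanism is PPO's clipped surrogate objective: the probability ratio between new and old policies is clipped to [1-eps, 1+eps], so moving further than that earns no extra objective value. A per-sample sketch:

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

# A ratio of 1.5 is clipped to 1.2: no incentive to move the
# policy further than 1+eps from the previous one.
print(ppo_clip_objective(1.5, advantage=1.0))  # 1.2, not 1.5
```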
Temporal Difference (TD) Learning updates value estimates by bootstrapping from the next state's current estimate, enabling fast, online learning.
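The one-step version, TD(0), can be sketched in a few lines; the states, rewards, and step size below are illustrative:

```python
def td0_update(v, s, r, s_next, alpha=0.1, gamma=0.99):
    """TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')."""
    target = r + gamma * v[s_next]        # bootstraps from s' 's current estimate
    v[s] = v[s] + alpha * (target - v[s])
    return v

v = {"A": 0.0, "B": 10.0}
td0_update(v, "A", r=1.0, s_next="B", alpha=0.5, gamma=1.0)
print(v["A"])  # 0 + 0.5 * (1 + 10 - 0) = 5.5
```

Because the target uses the current estimate of the next state rather than a full return, the update can happen online, after every single step.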
A Markov Decision Process (MDP) models decision-making in situations where outcomes are partly random and partly under the control of a decision maker.
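One common way to write an MDP down is a transition table mapping (state, action) to a distribution over (next state, reward) outcomes; the tiny example below is illustrative, with the action controlled by the agent and the outcome partly random:

```python
# (state, action) -> list of (probability, next_state, reward) outcomes.
mdp = {
    ("s0", "go"):   [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
    ("s0", "stay"): [(1.0, "s0", 0.0)],
}

def expected_reward(outcomes):
    """Expected one-step reward under the outcome distribution."""
    return sum(p * r for p, _, r in outcomes)

print(expected_reward(mdp[("s0", "go")]))  # 0.8*1.0 + 0.2*0.0 = 0.8
```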
The policy gradient theorem tells us how to push a stochastic policy's parameters to increase expected return by following the gradient of expected rewards.
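For a two-action softmax policy, the score-function (REINFORCE) estimate of that gradient has a closed form: (one-hot(action) - pi) scaled by the return. A sketch with illustrative parameters:

```python
import math

def softmax_probs(theta):
    """Policy over actions, parameterized by preferences theta."""
    z = [math.exp(t) for t in theta]
    s = sum(z)
    return [x / s for x in z]

def reinforce_grad(theta, action, ret):
    """grad log pi(action) * return = (one_hot(action) - pi) * return."""
    pi = softmax_probs(theta)
    return [((1.0 if i == action else 0.0) - p) * ret for i, p in enumerate(pi)]

# A positive return pushes probability mass toward the sampled action.
g = reinforce_grad([0.0, 0.0], action=0, ret=2.0)
print(g)  # [1.0, -1.0]: raise theta[0], lower theta[1]
```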
Bellman equations express how the value of a state or action equals immediate reward plus discounted value of what follows.
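The optimality form of that equation, as a single backup over an illustrative two-action state (reusing the (probability, next_state, reward) layout from above):

```python
def bellman_backup(values, transitions, gamma=0.9):
    """V(s) = max_a sum_{s'} P(s'|s,a) * (r + gamma * V(s'))."""
    return max(
        sum(p * (r + gamma * values[s_next]) for p, s_next, r in outcomes)
        for outcomes in transitions.values()
    )

# Illustrative state: 'go' pays 1 and reaches a valuable successor.
transitions = {
    "go":   [(1.0, "s1", 1.0)],
    "stay": [(1.0, "s0", 0.0)],
}
values = {"s0": 0.0, "s1": 10.0}
print(bellman_backup(values, transitions, gamma=0.5))  # max(1 + 0.5*10, 0) = 6.0
```

Applying this backup repeatedly to every state is exactly value iteration.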