Introduction to Reinforcement Learning
Ajit Kumar
19 modules
English
Lifetime access
Reinforcement Learning is a major subfield of AI. This course covers its basics.
Overview
Modules
Udemy Section 4: Monte Carlo Method
4 attachments • 2 mins
On-Policy Monte Carlo Method
Constant Alpha - On-Policy MC Method
Coding: Off-Policy MC Method
Correction: Modify the MC code according to the Sutton and Barto algorithm
Task: Due date - Sept 16th
Submission - Sept 18th
1 attachment
Did you code the MC algorithm?
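For orientation, here is a minimal sketch of first-visit Monte Carlo estimation of Q-values, in the spirit of the exercises above; the function name and episode format are illustrative assumptions, not the course's actual code.

```python
from collections import defaultdict

def mc_q_estimate(episodes, gamma=0.99):
    """First-visit Monte Carlo estimate of Q(s, a) from complete episodes.

    `episodes` is a list of episodes, each a list of (state, action, reward)
    tuples -- a hypothetical format chosen for this sketch.
    """
    Q = defaultdict(float)
    N = defaultdict(int)
    for episode in episodes:
        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = r + gamma * G  # return following time t
            # First-visit check: update only at the earliest (s, a) occurrence.
            if all((s, a) != (s2, a2) for s2, a2, _ in episode[:t]):
                N[(s, a)] += 1
                # Incremental mean; the constant-alpha variant from the module
                # above replaces 1/N with a fixed step size alpha.
                Q[(s, a)] += (G - Q[(s, a)]) / N[(s, a)]
    return Q
```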
Udemy Section 5: Temporal Difference Methods
9 attachments • 4 mins
Video 52 Notes
Video 53 Notes
Visualize the updates to the Q-table in temporal difference methods.
SARSA
Coding Exercise: Code SARSA for the Maze Environment
Q-Learning
Coding Exercise: Code Q-learning for the Maze example
Coding Task: Comparison between SARSA & Q-learning
Question: What are some of the advantages of Temporal Difference Methods over Monte Carlo and dynamic programming?
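The core difference between the two coding exercises above is a single line in the update rule. A minimal sketch, assuming a tabular Q stored as a 2-D NumPy array (all names are placeholders):

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # SARSA (on-policy): bootstrap from the action actually taken next.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Q-learning (off-policy): bootstrap from the greedy next action,
    # regardless of which action the behavior policy actually takes.
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
```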
Review Exercises
6 attachments
Create a new Environment: Maze2
Maze2: Dynamic Programming
Maze2: Monte Carlo coding exercise
Maze2: SARSA coding exercise
Maze2: Q-learning coding exercise
Comparison analysis
Udemy Section 6: N Step Bootstrapping
4 attachments • 1 mins
What is the algorithm for N-step bootstrapping methods?
How is the Monte Carlo method similar to the N-step bootstrapping method?
How is N-step bootstrapping similar to the SARSA method?
In video 66, there was a discussion of the increase in variance with an increase in N. Explain this in the context of the Maze example.
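A compact way to see how N-step bootstrapping sits between SARSA and Monte Carlo is to write out the n-step target. A sketch under assumed names (a list of rewards and a tabular Q):

```python
def n_step_return(rewards, Q, s_n, a_n, gamma=0.99):
    """n-step target: n real rewards, then bootstrap from Q(s_n, a_n).

    With len(rewards) == 1 this is exactly the SARSA target; letting the
    reward list run to the end of the episode (and dropping the bootstrap
    term) recovers the Monte Carlo return.
    """
    G = 0.0
    for r in reversed(rewards):
        G = r + gamma * G                              # discounted sum of n rewards
    return G + gamma ** len(rewards) * Q[s_n, a_n]     # bootstrap term
```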
Task Due Date: Sept 23
1 attachment • 1 mins
Submit code for SARSA & Q-learning
Coding Exercise: Develop a Tic-Tac-Toe player
2 attachments
Develop an environment that plays Tic-Tac-Toe against you
Use any of the methods (MC/SARSA/N-step) so that it learns to play intelligently.
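As a starting point for the exercise above, a minimal environment skeleton with a Gym-like reset/step interface might look like the following; the class name, reward scheme, and the random placeholder opponent are all assumptions to be filled in.

```python
import numpy as np

class TicTacToeEnv:
    """Skeleton Tic-Tac-Toe environment (illustrative; names are assumptions).

    Board: 3x3 array with 0 = empty, 1 = agent, -1 = opponent.
    """

    def reset(self):
        self.board = np.zeros((3, 3), dtype=int)
        return self.board.copy()

    def step(self, action):
        row, col = divmod(action, 3)                    # actions are cells 0..8
        if self.board[row, col] != 0:
            return self.board.copy(), -1.0, True, {}    # illegal move: lose
        self.board[row, col] = 1
        done, reward = self._game_over()
        if not done:
            # Opponent policy goes here; a random legal move is the simplest
            # placeholder before making it play intelligently against you.
            empty = np.argwhere(self.board == 0)
            r, c = empty[np.random.randint(len(empty))]
            self.board[r, c] = -1
            done, reward = self._game_over()
        return self.board.copy(), reward, done, {}

    def _game_over(self):
        for p in (1, -1):
            b = self.board == p
            if b.all(0).any() or b.all(1).any() or b.diagonal().all() \
               or np.fliplr(b).diagonal().all():
                return True, float(p)                   # +1 agent win, -1 loss
        return bool((self.board != 0).all()), 0.0       # draw when board is full
```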
Section 7: Continuous State Spaces
2 attachments • 2 mins
Coding Exercise: Run the CartPole, Acrobot, MountainCar, and Pendulum examples
Basics of Gym Environment
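The Gym exercises above all follow the same interaction loop. A minimal sketch, assuming the gymnasium fork's current API (reset returns (obs, info) and step returns a 5-tuple; the classic gym package differs slightly):

```python
import gymnasium as gym  # assumption: the gymnasium fork is installed

env = gym.make("CartPole-v1")   # swap in "Acrobot-v1", "MountainCar-v0", "Pendulum-v1"
obs, info = env.reset(seed=0)

for _ in range(200):
    action = env.action_space.sample()          # random policy for now
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:                 # episode ended: restart
        obs, info = env.reset()

env.close()
```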
Udemy Section 7: Continuous State
3 attachments • 3 mins
Question: Give an example of a continuous state MDP.
🔎 Understanding the Mountain Car environment
Question: Can we apply the SARSA and Monte Carlo methods to continuous problems?
Deep Learning - Basics
6 attachments • 4 mins
What is the objective of deep learning?
What is the mathematical meaning of a neural network?
What is the meaning of neural network "training"?
PyTorch Coding Example
Coding: Manual hyperparameter tuning
Coding: Hyperparameter Tuning with Optuna
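Manual tuning and Optuna differ mainly in who proposes the next hyperparameters. A minimal Optuna sketch; the search space and the stand-in objective are illustrative assumptions, to be replaced by a real training/validation loop:

```python
import optuna

def objective(trial):
    # Hypothetical search space; plug in your own model training here.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    hidden = trial.suggest_int("hidden_units", 16, 256)
    # Stand-in for a real validation loss computed after training.
    return (lr - 1e-3) ** 2 + 1.0 / hidden

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)   # best hyperparameters found
```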
Section 9 & 10: Deep RL
6 attachments • 6 mins
TASK: Memorize the Deep SARSA algorithm
CODING: Run the Deep SARSA code for the MountainCar env
TASK: Memorize the Deep Q algorithm
CODING: Run the Deep Q code for the MountainCar env
CODING: Run the Deep Q code for the CartPole env
TASK: Spot the theoretical difference between the Deep SARSA & Deep Q algorithms
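The theoretical difference asked for in the last task shows up concretely in how the two algorithms form their TD targets. A PyTorch sketch, assuming a q_net that maps a batch of states to per-action Q-values (all names are placeholders):

```python
import torch

def td_targets(q_net, rewards, next_states, next_actions, dones, gamma=0.99):
    """Illustrative targets; next_actions is a LongTensor of action indices."""
    with torch.no_grad():
        q_next = q_net(next_states)                  # shape: [batch, n_actions]
        # Deep SARSA (on-policy): value of the action actually taken next.
        sarsa_next = q_next.gather(1, next_actions.unsqueeze(1)).squeeze(1)
        # Deep Q (off-policy): value of the greedy next action.
        dqn_next = q_next.max(dim=1).values
    sarsa_target = rewards + gamma * sarsa_next * (1 - dones)
    dqn_target = rewards + gamma * dqn_next * (1 - dones)
    return sarsa_target, dqn_target
```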
Using an RL Library
5 attachments • 1 mins
Run simple Gym Environments
Train an RLlib agent on CartPole
Animate the trained agent
Observe the training history on TensorBoard
Hyperparameter tuning with Ray Tune
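The RLlib workflow above can be sketched in a few lines. This assumes Ray/RLlib 2.x (older releases used ray.rllib.agents instead), and the exact metric keys in the result dict vary by version:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Build a PPO trainer for CartPole using the RLlib 2.x config API.
config = PPOConfig().environment("CartPole-v1")
algo = config.build()

for i in range(5):
    result = algo.train()
    # Metric keys differ across RLlib versions; inspect `result` to be sure.
    print(i, result.get("episode_reward_mean"))

# Training logs land under ~/ray_results by default, so
# `tensorboard --logdir ~/ray_results` shows the training history.
```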
CAPSTONE DEEP REINFORCEMENT LEARNING
1 attachment • 1 mins
CODE an agent that can learn to navigate a non-constant Maze
Section 11: REINFORCE
13 attachments • 11 mins
What are Policy Gradient Methods?
Question: What is a stochastic policy?
SOFTMAX function
Coding Exercise: Write a Python function to evaluate the softmax activation (see the sketch after this section).
How to compare policy performance in REINFORCE?
Why parallel learning?
Using Entropy to Incentivise Exploration
TASK: Memorize REINFORCE algorithm
Coding: Run the REINFORCE code for the CartPole example
Coding: Run the REINFORCE code for the Double Pendulum example
Coding: Run the REINFORCE code for the MountainCar example
Coding: Create a new simple environment & run the REINFORCE algorithm on it.
Note: Identify the number of cores in your machine, and use them all for parallelization.
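For the softmax exercise flagged above, the standard trick is to subtract the maximum logit before exponentiating so that exp never overflows. A minimal sketch:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())      # guards against overflow in exp
    return e / e.sum()

probs = softmax([2.0, 1.0, 0.1])
print(probs, probs.sum())        # probabilities summing to 1
```

In REINFORCE, this function turns the policy network's action preferences into the stochastic policy that the questions in this section refer to.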
Section 10: A2C: Advantage Actor-Critic
3 attachments • 1 mins
Task: Memorize the A2C algorithm
Code: Run the A2C code for CartPole, MountainCar, and Double Pendulum
Task: Create your own simple environment and implement A2C on that environment
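The actor and critic losses that distinguish A2C from plain REINFORCE fit in a few lines. A PyTorch sketch with placeholder names, assuming the bootstrapped returns r + gamma * V(s') are computed elsewhere:

```python
import torch

def a2c_losses(logits, values, actions, returns):
    """Illustrative A2C losses for one batch.

    logits:  policy network output, shape [batch, n_actions]
    values:  critic output V(s), shape [batch]
    actions: LongTensor of chosen action indices, shape [batch]
    returns: bootstrapped targets r + gamma * V(s'), shape [batch]
    """
    advantage = returns - values                          # A(s,a) = target - V(s)
    log_probs = torch.log_softmax(logits, dim=1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    policy_loss = -(chosen * advantage.detach()).mean()   # actor: follow the advantage
    value_loss = advantage.pow(2).mean()                  # critic: regress to the targets
    return policy_loss, value_loss
```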
Do a Comparison Study between REINFORCE & A2C
1 attachment
Coding Task: On the cartpole example, which method between A2C and REINFORCE is better?
Capstone: Apply an RL algorithm to a simple stock market trading problem
TensorTrade
4 attachments • 2 mins
Understand Streams
Simplest Stream + DataFeed code
Create a simple environment using two years of RELIANCE data
Run the debugger and understand the action and reward mechanism
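A guess at what the "Simplest Stream + DataFeed" attachment covers, based on the feed API in the TensorTrade documentation; treat the module path and method names as assumptions to verify against your installed version:

```python
from tensortrade.feed.core import Stream, DataFeed

# Two named source streams driven by plain Python lists (stand-ins for
# real price/volume columns from the RELIANCE data).
price = Stream.source([10.0, 10.5, 11.0, 10.8], dtype="float").rename("price")
volume = Stream.source([100, 120, 90, 110], dtype="float").rename("volume")

feed = DataFeed([price, volume])
feed.compile()

while feed.has_next():
    print(feed.next())   # e.g. {'price': 10.0, 'volume': 100.0}
```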
FAQs
How can I enrol in a course?
Enrolling in a course is simple! Just browse through our website, select the course you're interested in, and click on the "Enrol Now" button. Follow the prompts to complete the enrolment process, and you'll gain immediate access to the course materials.
Can I access the course materials on any device?
Yes, our platform is designed to be accessible on various devices, including computers, laptops, tablets, and smartphones. You can access the course materials anytime, anywhere, as long as you have an internet connection.
How can I access the course materials?
Once you enrol in a course, you will gain access to a dedicated online learning platform. All course materials, including video lessons, lecture notes, and supplementary resources, can be accessed conveniently through the platform at any time.
Can I interact with the instructor during the course?
Absolutely! We are committed to providing an engaging and interactive learning experience. You will have opportunities to interact with the instructor through our community. Take full advantage of this to enhance your understanding and gain insights directly from the expert.
About the creator
Ajit Kumar
Assistant Professor
Dept of Mathematics
Shiv Nadar University