About:

This project is about training a Proximal Policy Optimization (PPO) based, massively parallel DRL control policy for a hexapod robot lovingly named Yuna (after a game character with different-colored eyes, since that’s what Yuna’s LEDs look like :p). The algorithm is based on the following mind-blowing paper:

Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning

But with a few modifications, and a lot of interesting observations presented in the chronological order of the issues faced and their solutions. Hope you have as much fun reading about the details as I had on this journey :)

The control policy trained on Isaac Gym

Hardware Experiments:

https://www.youtube.com/watch?v=CqyE6Llzk9M

https://www.youtube.com/watch?v=rmp43W2D2XU

https://youtu.be/JBan5TcjP7o

Some relevant notes:

Isaac Gym Setup (Ubuntu 22.04/20.04)

Notes on: Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning

After the setup I could run the Isaac Gym example environments (they look so interesting!) and import my robot Yuna into the legged_gym environment.
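For context, importing a new robot into legged_gym essentially means writing a config class that points at its URDF and registering it as a task. A minimal sketch following the upstream template is below; the Yuna-specific path, joint-name substrings, and gains are placeholder assumptions, not the actual values used here.

```python
# Minimal sketch of adding a robot to legged_gym (upstream template style).
# The Yuna-specific path, names, and gains below are placeholder assumptions.
from legged_gym.envs.base.legged_robot_config import LeggedRobotCfg, LeggedRobotCfgPPO


class YunaCfg(LeggedRobotCfg):
    class env(LeggedRobotCfg.env):
        num_envs = 4096                    # thousands of parallel envs on one GPU

    class init_state(LeggedRobotCfg.init_state):
        pos = [0.0, 0.0, 0.25]             # spawn height above the ground plane

    class control(LeggedRobotCfg.control):
        control_type = 'P'                 # position targets converted to PD torques in sim
        stiffness = {'joint': 20.0}        # P gain, matched by joint-name substring
        damping = {'joint': 0.5}           # D gain
        action_scale = 0.25

    class asset(LeggedRobotCfg.asset):
        file = '{LEGGED_GYM_ROOT_DIR}/resources/robots/yuna/urdf/yuna.urdf'  # placeholder path
        foot_name = 'foot'                 # substring of the foot link names, for contact checks
        self_collisions = 1                # 1 disables self-collision pairs


class YunaCfgPPO(LeggedRobotCfgPPO):
    class runner(LeggedRobotCfgPPO.runner):
        experiment_name = 'yuna'
```

The new task then gets registered in legged_gym's environment registry and trained with the usual training script.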

However, no matter how much I tuned or changed the reward terms, Yuna just was not able to walk. It turned out the foot joint was set to collapse in the URDF file, and some of the most important reward terms were not working as they were supposed to. Visualizing each of the reward terms in TensorBoard came in really handy and saved me a lot of time I might otherwise have spent chasing random directions. Correct visualizations are huge time savers. With some tuning and reward shaping I was finally able to train a decent policy:

https://youtu.be/M1z-uSiiC2c
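On the TensorBoard point: legged_gym already keeps a per-term `episode_sums` dict, so pushing each term to its own scalar curve is only a few lines. A minimal sketch (the tag naming and the per-second normalization are my own choices) looks like:

```python
# Minimal sketch: one TensorBoard curve per reward term, so a broken term
# (e.g. one stuck at zero) is immediately visible. Variable names follow the
# legged_gym convention; tag names and the normalization are my own choices.
import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir='runs/yuna_rewards')

def log_reward_terms(episode_sums: dict, episode_length_s: float, iteration: int):
    """episode_sums[name] is a per-env tensor of that term's accumulated reward."""
    for name, sums in episode_sums.items():
        mean_rew = torch.mean(sums).item() / episode_length_s  # avg reward per second
        writer.add_scalar(f'rewards/{name}', mean_rew, iteration)
```

A term that stays flat at zero (or dwarfs every other term) is an immediate red flag, and that kind of per-term view surfaces broken reward terms much faster than staring at the total reward.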

Now, before deploying the policy on hardware, I felt like confirming things with a sim-to-sim transfer first. So, after multiple rounds of debugging and comparing the observations between PyBullet and Isaac Gym:

https://youtu.be/o_A-YhpkzHc
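For the sim-to-sim check itself, the core of the debugging loop was comparing the two observation vectors term by term. A rough sketch is below; the observation layout is an assumed example, not the exact one used in this project.

```python
# Rough sketch of a sim-to-sim observation check between Isaac Gym and PyBullet.
# The observation layout below is an assumed example, not the exact one used here.
import numpy as np

# name -> slice into the flat observation vector (assumed layout)
OBS_SLICES = {
    'base_lin_vel': slice(0, 3),
    'base_ang_vel': slice(3, 6),
    'proj_gravity': slice(6, 9),
    'dof_pos':      slice(9, 27),    # 18 joints, assuming 3 joints per leg
    'dof_vel':      slice(27, 45),
}

def compare_obs(isaac_obs: np.ndarray, bullet_obs: np.ndarray, atol: float = 1e-2):
    """Print the max per-group difference so unit/ordering mismatches stand out."""
    for name, sl in OBS_SLICES.items():
        diff = float(np.abs(isaac_obs[sl] - bullet_obs[sl]).max())
        flag = '' if diff < atol else '   <-- check joint order / scaling / frame'
        print(f'{name:13s} max|diff| = {diff:.4f}{flag}')
```

The usual culprits in these comparisons are joint ordering, observation scaling factors, and body-frame versus world-frame velocities.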

Hardware Deployment →

Now things were in motion and we were getting good results. However, notice how all the legs perform the same in both simulations, yet some legs drag when deployed on hardware. Even after using dynamics randomization, some sim-to-real issues were to be expected; there are always more intricacies involved in the hardware. The major suspect was the PID tuning of each motor.
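For reference, legged_gym-style environments compute joint torques in simulation with a simple PD law on position targets, roughly as sketched below (the gains and the 18-joint count are illustrative, not Yuna's actual values). On hardware, each motor's internal PID takes over that role, so if the two don't track targets the same way, individual legs can lag or drag.

```python
# Sketch of the position-target PD law used in legged_gym-style simulation.
# Gains and the 18-joint count (6 legs x 3 joints) are illustrative values.
import torch

NUM_DOF = 18
p_gains = torch.full((NUM_DOF,), 20.0)   # stiffness per joint
d_gains = torch.full((NUM_DOF,), 0.5)    # damping per joint

def pd_torques(actions_scaled, default_dof_pos, dof_pos, dof_vel):
    """Torque that drives each joint toward (default pose + policy action offset)."""
    target = default_dof_pos + actions_scaled
    return p_gains * (target - dof_pos) - d_gains * dof_vel
```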