This project is concerned with training a Proximal Policy Optimization (PPO) based, massively parallel DRL control policy for a hexapod robot lovingly named Yuna (after a game character with different eye colors, since that's what Yuna's LEDs look like :p). The algorithm is based on the following mind-blowing paper:
Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning
But with a few modifications, and a lot of interesting observations, presented in the chronological order of the issues faced and their solutions. I hope you have as much fun reading about the details as I had while embarking on this journey :)
The control policy trained on Isaac Gym
https://www.youtube.com/watch?v=CqyE6Llzk9M
https://www.youtube.com/watch?v=rmp43W2D2XU
Some relevant notes:
Isaac Gym Setup (Ubuntu 22.04/20.04)
Notes on: Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning
After the setup, I could run the Isaac Gym example environments (they look so interesting!) and import my robot Yuna into the legged_gym environment.
However, no matter how much I tuned or changed the reward terms, Yuna just was not able to walk. It turned out the foot joint was set to collapse in the URDF file, and some of the most important reward terms were not working as they were supposed to. Visualizing each of the reward terms in TensorBoard came in really handy and saved a lot of time I might otherwise have spent chasing random directions. Correct visualizations are huge time savers. So with some tuning and reward shaping, I was finally able to train a decent policy:
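To give an idea of how per-term visualization helps, here is a minimal sketch of accumulating each reward term separately so its episode mean can be logged (e.g. via TensorBoard's `add_scalar`) instead of only watching the total reward. The term names, weights, and scales below are illustrative assumptions, not the actual training config:

```python
import numpy as np

def compute_reward_terms(base_lin_vel, target_vel, torques):
    # Hypothetical reward terms for a batch of parallel envs; the real
    # config uses more terms with carefully tuned scales.
    return {
        "tracking_lin_vel": np.exp(-np.sum((base_lin_vel - target_vel) ** 2, axis=-1) / 0.25),
        "torque_penalty": -1e-4 * np.sum(torques ** 2, axis=-1),
    }

class RewardLogger:
    """Accumulates each reward term separately, so each term's episode
    mean can be sent to a TensorBoard writer as its own scalar curve."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.sums = {}
        self.steps = 0

    def step(self, terms):
        # Add this step's per-env values into the running sum per term.
        for name, value in terms.items():
            self.sums.setdefault(name, np.zeros(self.num_envs))
            self.sums[name] += value
        self.steps += 1

    def episode_means(self):
        # Mean per-step contribution of each term, averaged over envs;
        # these are the values you would pass to writer.add_scalar(...).
        return {name: float(s.mean() / max(self.steps, 1))
                for name, s in self.sums.items()}
```

Plotting each curve separately makes it immediately obvious when one term is flat at zero (i.e. not firing at all) or dwarfing the others.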
Now, before deploying the policy on hardware, I felt like confirming things with sim-to-sim transfer first. So after multiple rounds of debugging and comparing the observations of PyBullet and Isaac Gym:
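The observation comparison between the two simulators can be sketched roughly like this: group the flat observation vector into named slices and report where the simulators disagree. The slice layout below is a hypothetical example (the real indices depend on the env config):

```python
import numpy as np

# Hypothetical observation layout; actual indices depend on the env config.
OBS_SLICES = {
    "base_lin_vel": slice(0, 3),
    "base_ang_vel": slice(3, 6),
    "joint_pos": slice(6, 24),   # 18 joints for a hexapod
    "joint_vel": slice(24, 42),
}

def compare_observations(obs_isaac, obs_pybullet, atol=1e-2):
    """Return, per observation group, the max absolute difference between
    the two simulators and whether it exceeds the tolerance."""
    report = {}
    for name, sl in OBS_SLICES.items():
        diff = np.abs(np.asarray(obs_isaac)[sl] - np.asarray(obs_pybullet)[sl]).max()
        report[name] = (float(diff), bool(diff > atol))
    return report
```

Running such a check step by step quickly narrows a mismatch down to a specific group (e.g. joint velocities off due to differing sign or frame conventions) instead of staring at two full observation vectors.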
Hardware Deployment→
Now things were in motion, and we were getting good results. However, notice how all the legs perform the same in both simulations, but some legs drag when the policy is deployed on hardware. Even after using dynamics randomization, some sim-to-real issues were to be expected; there are always more intricacies involved in the hardware. The major suspect was the PID tuning of each motor.
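For context, the dynamics randomization mentioned above can be sketched as randomly scaling each motor's PD gains per environment during training, so the policy cannot overfit to one exact actuator response. The nominal gains and scaling range here are illustrative assumptions, not the values used for Yuna:

```python
import numpy as np

def randomize_motor_gains(num_envs, num_joints, kp_nominal=20.0, kd_nominal=0.5,
                          gain_range=(0.8, 1.2), rng=None):
    """Scale each motor's PD gains by an independent random factor per env,
    a common form of dynamics randomization for actuator modeling error."""
    rng = np.random.default_rng() if rng is None else rng
    kp = kp_nominal * rng.uniform(*gain_range, size=(num_envs, num_joints))
    kd = kd_nominal * rng.uniform(*gain_range, size=(num_envs, num_joints))
    return kp, kd

def pd_torques(kp, kd, q_target, q, qd):
    # Standard PD control law mapping policy joint targets to torques.
    return kp * (q_target - q) - kd * qd
```

If the real motors' effective gains fall outside the randomized range, or if each motor is tuned very differently, dragging legs like the ones seen on hardware are exactly the kind of gap this cannot cover on its own.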