This week I started my onboarding into the lab with Isaac’s help and by going through the lab onboarding documents. I reviewed Python basics and did a few of the mini-projects mentioned in the onboarding documents with numpy and scipy to get familiar with them. I also got started on learning the basics of ROS, which Isaac gave me a really good rundown on during lab hacking hours (held every Friday).

I participated in a pilot study for Jindan’s research, which aims to simulate a human oracle. Jindan is currently collecting data on the self-reported teaching policies that humans use when giving binary feedback to a robot that is learning a basic task, and also on what policy preferences show up in the actual feedback they give the robot across a few episodes of it trying to learn and execute the task. Jindan and I talked about how she decided on a framework to represent teaching policies and how it’s constantly modified based on new insights from the studies. While a simulated human oracle would be extremely helpful for improving time and cost efficiency in reinforcement learning studies, the feedback you give a robot seems very task-dependent, and I am having a tough time imagining how it could be scaled across tasks to really reap the benefits. Maybe modelling the teaching policies themselves would help, as they seem to be more similar across tasks. I need to learn to google this field better. I tried “do teaching policies vary over robot task” and that did not help…

One really cool thing that I noticed during the pilot study was how well Jindan grasped what I was trying to convey and relayed it back to me to confirm. I am completely new to this field, did not know the right terminology to use, and it was my first time thinking about giving feedback to a robot. The structure of the different stages of the study had me asking new questions about my policy; my policy kept changing, and sometimes I didn’t fully know how to word it, but Jindan was really good at distilling the experience. I guess that’s what you become capable of when you spend a lot of time thinking about myriad aspects of a problem. I want to do that too!

I also spoke to Hang about how he looked into reinforcement learning with non-binary feedback and how he broke down the feedback distribution so it could work with algorithms that expect binary feedback (I sketch my guess at the idea below). I also got to attend the HRI lab’s reading group this week. Everyone reads the same paper (this week it was this) and brings their thoughts to the group. As I am new to the field, I am still learning how to deconstruct a paper, and hearing the PhD students’ critiques of the paper was extremely insightful.
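Here is a minimal sketch of what I imagine that breakdown could look like (purely my own guess from our conversation, not Hang’s actual method): treat a continuous feedback score in [-1, 1] as the mean of a Bernoulli distribution over {+1, -1} labels and sample from it, so an algorithm that only understands binary feedback still gets input that preserves the original score on average.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_binary_feedback(score: float, n_samples: int = 1) -> np.ndarray:
    """Map a continuous feedback score in [-1, 1] to binary {+1, -1} labels.

    The score is treated as the mean of a Bernoulli distribution over labels:
    e.g. a score of 0.6 yields +1 with probability 0.8, so the expected
    label equals the original score. (My hypothetical illustration, not
    Hang's actual approach.)
    """
    p_positive = (np.clip(score, -1.0, 1.0) + 1.0) / 2.0
    draws = rng.random(n_samples) < p_positive
    return np.where(draws, 1, -1)

# Mildly positive feedback mostly, but not always, yields +1 labels.
print(to_binary_feedback(0.6, n_samples=10))
```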
The lab had a welcome dinner for me right before hacking hours on Friday! Everyone is extremely kind and welcoming. I am grateful to be here.