Researchers at New York University and Meta have built a system that lets a robot navigate unfamiliar rooms and carry out specific tasks. They used a robot called Stretch, manufactured by Hello Robot, which consists of a mobile base topped by a tall pole with a retractable arm. The robot operated in 10 separate rooms across five different homes.
The researchers first scanned a room containing the robot with the Record3D iPhone app, which uses the phone’s lidar sensor to produce a 3D video that can then be shared with the robot.
The robot’s control system, known as OK-Robot, ran an open-source AI object-detection model over the video frames. Other open-source models then helped the robot recognise both objects in the room, such as a toy dragon, a tube of toothpaste, or a pack of playing cards, and locations such as a chair, a table, or a trash can. The robot could then be commanded to pick up a specified object and move it elsewhere. Its pincer arm completed this task in 58.5 per cent of cases, a figure that rose to 82 per cent in less cluttered rooms. The research has not yet been peer-reviewed.
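To give a flavour of how a command like “pick up the toothpaste” can be matched to objects found in a scan, here is a minimal sketch of open-vocabulary grounding: a text query and each detected object are represented as embedding vectors, and the best match is the object with the highest cosine similarity. The vectors below are hand-written stand-ins; in a real system such as OK-Robot’s pipeline they would come from a vision-language model, and the function names here are illustrative, not the project’s actual API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for objects detected in the room scan.
# In practice these would be produced by a vision-language encoder,
# not written by hand.
object_embeddings = {
    "toy dragon": [0.9, 0.1, 0.2],
    "toothpaste": [0.1, 0.8, 0.3],
    "playing cards": [0.2, 0.3, 0.9],
}

def ground_query(query_embedding, candidates):
    """Return the detected object whose embedding best matches the query."""
    return max(candidates, key=lambda name: cosine(query_embedding, candidates[name]))

# A made-up query vector standing in for the encoded command
# "pick up the toothpaste".
query = [0.15, 0.85, 0.25]
print(ground_query(query, object_embeddings))  # → toothpaste
```

Because the matching is done in a shared embedding space rather than against a fixed label list, the robot can be asked about objects its detector was never explicitly trained to name.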
Recent advances in AI, especially in language and computer vision, have given robotics researchers access to a wealth of open-source models and resources that were unavailable even a few years ago, says Matthias Minderer, a senior computer vision research scientist at Google DeepMind who was not involved in the project. Relying entirely on pre-existing models is uncommon, he notes, which makes their successful deployment here noteworthy.