Spatio-Temporal Deep Learning for Robotic Visuomotor Control

Research paper published by IEEE in the Proceedings of the 4th International Conference on Control, Automation and Robotics (ICCAR 2018), presented at Auckland University of Technology.

To perform accurate and smooth behaviors in dynamic environments with moving objects, robotic visuomotor control should include the ability to process spatio-temporal information. We propose a system that uses a spatio-temporal deep neural network (DNN), with video camera pixels as the only input, to handle all the visual perception and visuomotor control functions needed to perform robotic behaviors such as leader following. Our approach combines: (1) end-to-end deep learning for inferring motion control outputs from visual inputs, (2) multi-task learning for simultaneously producing multiple control outputs with the same DNN, and (3) spatio-temporal deep learning for perceiving motion across multiple video frames. Using driving simulations, we quantitatively show that spatio-temporal DNNs increase driving accuracy and smoothness by improving machine perception of scene kinematics. Experiments with mobile robots on a laboratory test track show real-time performance on embedded hardware comparable to human reaction times to visual stimuli, and indicate that a spatio-temporal deep learning robot can follow a leader for long periods while staying within lanes and avoiding obstacles.
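To make the three ingredients concrete, below is a minimal sketch of this kind of network in PyTorch. It is an illustration only, not the paper's published architecture: the 4-frame window, layer sizes, input resolution, and the `steering_head`/`throttle_head` names are all assumptions. It shows (1) pixels-to-control inference, (2) two task heads sharing one trunk (multi-task learning), and (3) spatio-temporal perception via consecutive frames stacked along the channel axis so motion is visible to the first convolution.

```python
import torch
import torch.nn as nn


class SpatioTemporalNet(nn.Module):
    """Sketch of a multi-task, spatio-temporal visuomotor DNN.

    Assumptions (not from the paper): 4 stacked RGB frames,
    these specific layer widths, and two scalar control heads.
    """

    def __init__(self, n_frames: int = 4):
        super().__init__()
        # Frames stacked along channels: input is (B, n_frames*3, H, W),
        # so the first layer can perceive inter-frame motion directly.
        self.features = nn.Sequential(
            nn.Conv2d(n_frames * 3, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, kernel_size=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One shared trunk, multiple task-specific heads (multi-task learning).
        self.steering_head = nn.Linear(64, 1)  # e.g. steering angle
        self.throttle_head = nn.Linear(64, 1)  # e.g. speed command

    def forward(self, frames: torch.Tensor):
        z = self.features(frames)
        return self.steering_head(z), self.throttle_head(z)


# Usage: a batch of 8 samples, each holding 4 stacked 66x200 RGB frames.
net = SpatioTemporalNet(n_frames=4)
x = torch.randn(8, 4 * 3, 66, 200)
steer, throttle = net(x)
```

Channel stacking is just one way to expose temporal context; recurrent layers or 3D convolutions over the frame axis are common alternatives with the same end-to-end, pixels-in/controls-out structure.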