To improve the safety of autonomous systems, MIT engineers have developed a system that can sense tiny changes in shadows on the ground to determine if there's a moving object coming around the corner.
Autonomous cars could one day use the system to quickly avoid a potential collision with another car or pedestrian emerging from around a building's corner or from in between parked cars. In the future, robots that may navigate hospital hallways to make medication or supply deliveries could use the system to avoid hitting people.
In a paper being presented at next week's International Conference on Intelligent Robots and Systems (IROS), the researchers describe successful experiments with an autonomous car driving around a parking garage and an autonomous wheelchair navigating hallways. When sensing and stopping for an approaching vehicle, the car-based system beats traditional LiDAR -- which can only detect visible objects -- by more than half a second.
That may not seem like much, but fractions of a second matter when it comes to fast-moving autonomous vehicles, the researchers say.
"For applications where robots are moving around environments with other moving objects or people, our method can give the robot an early warning that somebody is coming around the corner, so the vehicle can slow down, adapt its path, and prepare in advance to avoid a collision," says co-author Daniela Rus, director of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science. "The big dream is to provide 'X-ray vision' of sorts to vehicles moving fast on the streets."
Currently, the system has only been tested in indoor settings. Robotic speeds are much lower indoors, and lighting conditions are more consistent, making it easier for the system to sense and analyze shadows.
Joining Rus on the paper are: first author Felix Naser SM '19, a former CSAIL researcher; Alexander Amini, a CSAIL graduate student; Igor Gilitschenski, a CSAIL postdoc; recent graduate Christina Liao '19; Guy Rosman of the Toyota Research Institute; and Sertac Karaman, an associate professor of aeronautics and astronautics at MIT.
Extending ShadowCam
For their work, the researchers built on their system, called "ShadowCam," that uses computer-vision techniques to detect and classify changes to shadows on the ground. MIT professors William Freeman and Antonio Torralba, who are not co-authors on the IROS paper, collaborated on the earlier versions of the system, which were presented at conferences in 2017 and 2018.
For input, ShadowCam uses sequences of video frames from a camera targeting a specific area, such as the floor in front of a corner. It detects changes in light intensity over time, from image to image, that may indicate something moving away or coming closer. Some of those changes may be difficult to detect or invisible to the naked eye, and they depend on various properties of the object and environment. ShadowCam computes that information and classifies each image as containing a stationary object or a dynamic, moving one. If it detects a dynamic image, it reacts accordingly.
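As a rough illustration of that idea, the sketch below tracks the average brightness of one floor patch across a few frames and labels the sequence "dynamic" when the brightness shifts by more than a cutoff. It is not the researchers' code: the function names, the `roi` layout, and the `threshold` value are all assumptions made for this example, and frames are plain nested lists of grayscale values.

```python
from statistics import mean


def mean_intensity(frame, roi):
    """Average pixel intensity inside roi = (top, left, bottom, right)."""
    top, left, bottom, right = roi
    return mean(px for row in frame[top:bottom] for px in row[left:right])


def classify_sequence(frames, roi, threshold=2.0):
    """Label a sequence of at least two frames as 'dynamic' or 'static'.

    `threshold` is a hypothetical tuning value, not a number from the paper.
    """
    levels = [mean_intensity(frame, roi) for frame in frames]
    # Largest frame-to-frame change in the patch's average brightness.
    max_change = max(abs(b - a) for a, b in zip(levels, levels[1:]))
    return "dynamic" if max_change > threshold else "static"
```

A patch that stays at the same brightness is labeled "static", while one that darkens as a shadow creeps in is labeled "dynamic".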
Adapting ShadowCam for autonomous vehicles required a few advances. The early version, for instance, relied on lining an area with augmented reality labels called "AprilTags," which resemble simplified QR codes. Robots scan AprilTags to detect and compute their precise 3D position and orientation relative to the tag. ShadowCam used the tags as features of the environment to zero in on specific patches of pixels that may contain shadows. But modifying real-world environments with AprilTags is not practical.
The researchers developed a novel process that combines image registration and a new visual-odometry technique. Often used in computer vision, image registration essentially overlays multiple images to reveal variations in the images. Medical image registration, for instance, overlaps medical scans to compare and analyze anatomical differences.
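To make the idea of registration concrete, here is a minimal sketch that aligns two small grayscale images by trying every whole-pixel shift in a small window and keeping the one with the lowest mean squared difference. This brute-force search is a stand-in chosen for simplicity, not the technique used in the paper, and the function names and `max_shift` parameter are assumptions for this example.

```python
def shifted_difference(ref, img, dy, dx):
    """Mean squared difference between ref and img shifted by (dy, dx).

    Only pixels where the two images overlap after the shift are compared.
    """
    height, width = len(ref), len(ref[0])
    total, count = 0.0, 0
    for y in range(height):
        for x in range(width):
            sy, sx = y + dy, x + dx
            if 0 <= sy < height and 0 <= sx < width:
                total += (ref[y][x] - img[sy][sx]) ** 2
                count += 1
    return total / count if count else float("inf")


def register_translation(ref, img, max_shift=2):
    """Return the (dy, dx) shift that best aligns img with ref."""
    candidates = [
        (dy, dx)
        for dy in range(-max_shift, max_shift + 1)
        for dx in range(-max_shift, max_shift + 1)
    ]
    return min(candidates, key=lambda s: shifted_difference(ref, img, *s))
```

Once the best shift is known, the two images can be overlaid, and whatever difference remains comes from a change in the scene rather than from camera motion.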
Visual odometry, used for Mars rovers, estimates the motion of a camera in real time by analyzing pose and geometry in sequences of images. The researchers specifically employ "Direct Sparse Odometry" (DSO), which can compute feature points in environments similar to those captured by AprilTags. Essentially, DSO plots features of an environment on a 3D point cloud, and then a computer-vision pipeline selects only the features located in a region of interest, such as the floor near a corner. (Regions of interest were annotated manually beforehand.)
As ShadowCam takes input image sequences of a region of interest, it uses the DSO-image-registration method to overlay all the images from the same viewpoint of the robot. Even as a robot is moving, it's able to zero in on the exact same patch of pixels where a shadow is located to help it detect any subtle deviations between images.
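The selection step can be pictured as a simple filter over the point cloud. The sketch below keeps only the 3D points that sit close to the floor inside a rectangular area. It assumes points are plain `(x, y, z)` tuples in meters with the floor at `z = 0`, which is a simplification for this example rather than DSO's actual output format, and the `max_height` value is hypothetical.

```python
def points_in_region(points, x_range, y_range, max_height=0.05):
    """Keep (x, y, z) points near the floor inside a rectangular region.

    x_range and y_range are (min, max) pairs. max_height is a hypothetical
    tolerance for how far above or below the floor a point may lie.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    return [
        (x, y, z)
        for (x, y, z) in points
        if x_min <= x <= x_max and y_min <= y <= y_max and abs(z) <= max_height
    ]
```

Given a manually annotated rectangle in front of a corner, the filter discards wall and ceiling features and leaves only the floor points that the shadow analysis needs.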
Next is signal amplification, a technique introduced in the first paper. Pixels that may contain shadows get a boost in color that raises the signal-to-noise ratio. This makes extremely weak signals from shadow changes far more detectable. If the boosted signal reaches a certain threshold -- based partly on how much it deviates from other nearby shadows -- ShadowCam classifies the image as "dynamic." Depending on the strength of that signal, the system may tell the robot to slow down or stop.
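A toy version of that amplify-then-threshold step might look like the sketch below. Each frame is a flattened patch of pixel values, every pixel's deviation from its average over time is multiplied by a gain, and the swing in the boosted signal decides the response. The gain, both thresholds, and the "go"/"slow"/"stop" labels are assumptions for this example, and the comparison with nearby shadows that the researchers describe is left out.

```python
from statistics import mean


def amplify(frames, gain=5.0):
    """Boost each pixel's deviation from its temporal mean by `gain`."""
    n_pixels = len(frames[0])
    baseline = [mean(frame[i] for frame in frames) for i in range(n_pixels)]
    return [
        [baseline[i] + gain * (frame[i] - baseline[i]) for i in range(n_pixels)]
        for frame in frames
    ]


def decide(frames, slow_threshold=10.0, stop_threshold=30.0, gain=5.0):
    """Map the strength of the boosted signal to 'go', 'slow', or 'stop'.

    Both thresholds are hypothetical values chosen for this sketch.
    """
    boosted = amplify(frames, gain)
    levels = [mean(frame) for frame in boosted]
    strength = max(levels) - min(levels)  # swing in patch brightness
    if strength >= stop_threshold:
        return "stop"
    if strength >= slow_threshold:
        return "slow"
    return "go"
```

With these settings, a patch whose brightness changes by 3 units between frames yields a boosted swing of 15 and a "slow" response, while a change of 8 units yields a swing of 40 and a "stop".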
"By detecting that signal, you can then be careful. It may be a shadow of some person running from behind the corner or a parked car, so the autonomous car can slow down or stop completely," Naser says.
Tag-Free Testing
In one test, the researchers evaluated the system's performance in classifying moving or stationary objects using AprilTags and the new DSO-based method. An autonomous wheelchair steered toward various hallway corners while humans turned the corner into the wheelchair's path. Both methods achieved the same 70-percent classification accuracy, indicating AprilTags are no longer needed.
In a separate test, the researchers implemented ShadowCam in an autonomous car in a parking garage, where the headlights were turned off, mimicking nighttime driving conditions. They compared its car-detection times with those of LiDAR. In an example scenario, ShadowCam detected the car turning around pillars about 0.72 seconds faster than LiDAR. Moreover, because the researchers had tuned ShadowCam specifically to the garage's lighting conditions, the system achieved a classification accuracy of around 86 percent.
Next, the researchers are developing the system further to work in different indoor and outdoor lighting conditions. In the future, there could also be ways to speed up the system's shadow detection and automate the process of annotating targeted areas for shadow sensing.