Many unmanned aerial vehicles (UAVs) rely on an external motion-capture apparatus that gives the vehicles almost perfect state information at high rates. Outside such instrumented environments, the major challenges in gathering the sensing data necessary for flight are the limited payload, computation, and battery life of the vehicles. Lightweight cameras are a good solution, but they require computationally efficient machine vision algorithms that can run within the limits of these vehicles.

By detecting at a single depth (dark blue) and integrating the aircraft’s odometry and past detections (lighter blue), a full map of obstacles in front of the vehicle can be built quickly.

A novel method for stereo vision computation was developed that is dramatically faster than the state of the art. The method performs a subset of the processing traditionally required for stereo vision, but is able to recover obstacles in real time at 120 frames per second (fps) on a conventional CPU. The system is lightweight and accurate enough to run in real time on aircraft, allowing for true, self-contained obstacle detection.

A standard block-matching stereo system produces depth estimates by finding pixel-block matches between two images. Given a pixel block in the left image, for example, the system will search along the epipolar line in the right image to find the best match. The horizontal offset of the match relative to the block's coordinate in the left image, called the disparity, allows the user to compute the 3D position of the object in that pixel block.
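The disparity-to-position step can be sketched with the standard pinhole-stereo triangulation formulas. This is a generic illustration, not the authors' code; the focal length, baseline, and principal-point values in the example are arbitrary assumptions.

```python
# Minimal sketch: triangulating a 3D point from a block-matching
# disparity using the pinhole stereo model. f is the focal length in
# pixels, baseline is the camera separation in meters, and (cx, cy) is
# the principal point; all values below are illustrative.

def disparity_to_point(u, v, disparity, f, baseline, cx, cy):
    """Triangulate a 3D point (camera frame, meters) from a pixel
    (u, v) in the left image and its disparity in pixels."""
    if disparity <= 0:
        raise ValueError("disparity must be positive to triangulate")
    z = f * baseline / disparity        # depth grows as disparity shrinks
    x = (u - cx) * z / f                # back-project through the pinhole
    y = (v - cy) * z / f
    return (x, y, z)

# Example: f = 400 px, 10 cm baseline, 8 px disparity -> 5 m depth.
print(disparity_to_point(420, 240, 8.0, 400.0, 0.10, 320.0, 240.0))
```

Note how depth is inversely proportional to disparity: halving the disparity doubles the estimated distance, which is why depth resolution degrades far from the cameras.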

One can think of a standard block-matching stereo vision system as a search through depth. As one searches along the epipolar line for a pixel group that matches the candidate block, one explores the space of distances from the cameras. For example, given a pixel block in the left image, one might start searching through the right image with a large disparity, corresponding to an object close to the cameras. As one decreases disparity, pixel blocks that correspond to objects further and further away are examined until reaching zero disparity, where the stereo baseline is insignificant compared to the distance and the obstacle's location can no longer be determined.
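The key idea of detecting at only a single depth can be illustrated by fixing the disparity rather than searching over all of them: each left-image block is compared against the right-image block shifted by one known disparity, so only obstacles at the corresponding depth produce matches. This is a simplified sketch, not the authors' implementation; the sum-of-absolute-differences score, 8x8 block size, and threshold are illustrative assumptions (the real system also rejects texture-poor regions).

```python
# Sketch of single-disparity obstacle detection: instead of a full
# search through depth, test every block at one fixed disparity d,
# i.e. at one fixed distance from the cameras.
import numpy as np

def detect_at_disparity(left, right, disparity, block=8, sad_threshold=200):
    """Return (row, col) coordinates of blocks in rectified grayscale
    images whose left/right contents match at the given disparity."""
    h, w = left.shape
    hits = []
    for r in range(0, h - block, block):
        for c in range(disparity, w - block, block):
            lb = left[r:r + block, c:c + block].astype(np.int32)
            rb = right[r:r + block,
                       c - disparity:c - disparity + block].astype(np.int32)
            # A low sum of absolute differences means the block appears
            # at this disparity, i.e. an object sits at the fixed depth.
            if np.abs(lb - rb).sum() < sad_threshold:
                hits.append((r, c))
    return hits
```

Because only one disparity is evaluated per block instead of the whole search range, the per-frame cost drops by roughly the size of that range, which is what makes high frame rates possible on a conventional CPU.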

Aircraft hardware in the field. A small catapult is used for consistent launches near obstacles.

The algorithm is called “pushbroom stereo” because the detection region is pushed forward, sweeping up obstacles like a broom on a floor (and similar to pushbroom LIDAR systems). This is distinct from a “pushbroom camera,” which is a one-dimensional array of pixels arranged perpendicular to the camera’s motion. These cameras are often found on satellites and can be used for stereo vision.

The system requires relatively accurate odometry over short time horizons. This requirement is not particularly onerous because, unlike many map-making algorithms, the system does not need long-term accuracy. In this case, the odometry is only used until the aircraft catches up to its detection horizon, which on many platforms is 5-10 meters away. On aircraft, a wind-corrected airspeed measurement is sufficient.
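The way short-horizon odometry and past detections combine into a local obstacle map can be sketched as follows. Each detection at the fixed depth is transformed into a world-fixed frame using the current odometry pose; as the aircraft flies forward, the swept region fills in behind the detection horizon. The 2D planar pose and class structure here are assumptions made for the example, not the authors' representation.

```python
# Sketch: accumulating single-depth detections into a world-fixed map
# using odometry. Poses and points are 2D (x forward, y left) for
# clarity; the real system works in 3D.
import math

class PushbroomMap:
    def __init__(self):
        self.obstacles = []          # points in the world frame

    def add_detections(self, pose, points_body):
        """pose = (x, y, yaw) from odometry; points_body are detections
        expressed in the aircraft's body frame at the fixed depth."""
        x, y, yaw = pose
        c, s = math.cos(yaw), math.sin(yaw)
        for bx, by in points_body:
            # rotate body -> world, then translate by the vehicle position
            self.obstacles.append((x + c * bx - s * by,
                                   y + s * bx + c * by))

# Each frame appends new detections; older points need no update
# because they are already world-fixed.
m = PushbroomMap()
m.add_detections((0.0, 0.0, 0.0), [(5.0, 1.0)])   # obstacle 5 m ahead
m.add_detections((2.0, 0.0, 0.0), [(5.0, -1.0)])  # after flying 2 m
print(m.obstacles)  # [(5.0, 1.0), (7.0, -1.0)]
```

Because a stored point is only relied upon until the aircraft reaches it, odometry drift accumulated over the 5-10 meter horizon bounds the map error, which is why short-term accuracy suffices.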

The design and its parameters were chosen to favor sparse detections with few false positives. For obstacle avoidance, not every point on an obstacle needs to be seen, but a false positive might cause the aircraft to take unnecessary risks avoiding a phantom obstacle.

To test the full system with an integrated state estimator, the platform was flown close to obstacles on three different flights, with control inputs, sensor data, camera images, and onboard stereo processing results recorded. During each flight, points on every obstacle were recorded in real time. The state estimate was robust enough to provide online estimation of how the location of the obstacles evolved relative to the aircraft. While these flights were manually piloted, these data would be sufficient for the system to avoid the obstacles autonomously.

Metrics demonstrate that the pushbroom stereo system sacrifices a limited amount of performance for a substantial reduction in computational cost, and thus a gain in speed. Finally, all experiments used identical threshold, scoring, and camera calibration parameters.

This work was done by Andrew J. Barry and Russ Tedrake of Massachusetts Institute of Technology. MIT-0004