Tech Briefs

Automated human activity recognition can provide clues about a subject’s intentions.

LIDAR is a partial 3D standoff sensing method that illuminates a target with rotating or flash laser beams, analyzes the reflected light, and provides both the distance to the target’s surface and the shape of that surface. An array of laser reflections can be used to map the facing side of a target object as a partial point cloud. Unlike a 360° surface model generated by a traditional full-body scanner, the partial point cloud from a LIDAR is a viewing-angle-dependent 3D representation of the target shape. The resolution of these maps depends on the density of the laser detector array; a good image of a human may require hundreds of detection pixels to capture enough detail to clearly detect changes in limb positions.
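To make the idea of a partial point cloud concrete, here is a minimal sketch that turns a grid of per-pixel ranges into 3D points. It assumes a hypothetical sensor model that the brief does not specify: the sensor sits at the origin looking along +z with y up, pixel rays are spaced evenly in azimuth and elevation across the field of view, and None marks a pixel with no return. The function name is illustrative.

```python
import math


def range_image_to_points(ranges, fov_h_deg, fov_v_deg):
    """Convert a 2D grid of LIDAR ranges into a partial 3D point cloud.

    Hypothetical sensor model (an assumption, not taken from the brief):
    the sensor sits at the origin looking along +z, with y up, and pixel
    ray directions are spaced evenly in azimuth and elevation across the
    field of view. A value of None marks a pixel with no laser return.
    """
    rows = len(ranges)
    cols = len(ranges[0])
    fov_h = math.radians(fov_h_deg)
    fov_v = math.radians(fov_v_deg)
    points = []
    for i, row in enumerate(ranges):
        # Elevation of the pixel center; the top row looks upward.
        el = fov_v * (0.5 - (i + 0.5) / rows)
        for j, r in enumerate(row):
            if r is None:
                continue  # no return: this pixel missed the target
            # Azimuth of the pixel center; the left column looks toward -x.
            az = fov_h * ((j + 0.5) / cols - 0.5)
            # Scale the unit ray direction by the measured range.
            points.append((r * math.cos(el) * math.sin(az),
                           r * math.sin(el),
                           r * math.cos(el) * math.cos(az)))
    return points
```

Each pixel that receives a return contributes one point. Pixels that miss the body drop out, which is why the result covers only the side facing the sensor and why detector density limits how much limb detail survives.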

Simulated LIDAR Point Clouds

A LIDAR sensor capturing a series of human body poses over time can provide clues to an observed subject’s intent based on his or her activities. Human observers can often identify specific activities easily and sometimes make a reasonable guess about the subject’s intentions. Computers, however, do not achieve the same results as quickly or as easily.

Automated human activity recognition typically requires the development of complex machine learning algorithms whose performance depends on the size and representativeness of the available training datasets. A large 3D shape database of human pose images covering multiple human subjects and viewing angles would greatly help the development of activity recognition software. With the release of low-cost range cameras, new datasets in the form of 3D depth images have been created for human action analysis and recognition.

Recording a large number of LIDAR images of various human activities with actual research subjects would be unnecessarily difficult and time-consuming. Subjects could certainly be recorded performing activities of interest in a laboratory, but capturing each activity from the full variety of required viewing angles would mean recreating and recapturing it many times, unless a LIDAR sensor were deployed at every viewing angle.

However, multiple LIDARs are not only expensive to acquire but can also interfere with one another. Besides being time-consuming, working with live subjects carries the risk that subjects will not replicate their poses consistently over numerous trials. If, instead, previously captured 3D body scan and motion capture data were available to recreate a human subject’s pose and activity within a virtual LIDAR research environment, these limitations of live-subject LIDAR research could be avoided while preserving the authenticity of the poses and activities being studied.

A study completed in the 3D Human Signatures Laboratory (3DHSL) of AFRL 711th HPW/RHXB recorded the needed sets of scan and motion capture data. A virtual laboratory was then created in which a 3D digital model of a human subject, animated with his or her motion capture data, was introduced for synthetic LIDAR image generation. Because the model captures both the original subject’s shape and motion, it can easily be rotated in increments through 360° to capture simulated LIDAR images for different viewing angles.
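The brief does not say how the rotation is carried out. As a minimal sketch, assume the avatar is available as a list of (x, y, z) vertices with y pointing up. The vertices can then be rotated about the vertical axis in fixed azimuth steps, which is equivalent to walking a fixed sensor around the subject. The function names and the step size are illustrative assumptions.

```python
import math


def azimuth_steps(step_deg):
    """Azimuths (degrees) 0, step, 2*step, ... covering one full turn."""
    if step_deg <= 0:
        raise ValueError("step_deg must be positive")
    n = round(360 / step_deg)
    if not math.isclose(n * step_deg, 360):
        raise ValueError("step_deg must divide 360 evenly")
    return [k * step_deg for k in range(n)]


def rotate_about_vertical(points, angle_deg):
    """Rotate (x, y, z) points about the vertical y axis by angle_deg."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    # Standard right-handed rotation about y; y (height) is unchanged.
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]
```

A capture loop would then look like `for az in azimuth_steps(10): view = rotate_about_vertical(model_points, az)`, with each rotated copy passed to the simulated sensor. The names `model_points` and the 10° step are hypothetical.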

A software script called Shadows (version 1.5.2) automates the process so that, after a brief setup, the computer runs the image data collection largely unattended. This is more efficient and repeatable than using a live human subject throughout the data collection, and if errors occur during collection, the data can easily be regenerated. The simulated LIDAR images are generated mainly by orthographic ray tracing, which traces a ray along the path defined by a mesh vertex normal and stores an array of the locations where (i.e., upon which polygon) the simulated ray “hits” the human model’s outer surface mesh. In this dataset, the resolution of the simulated LIDAR images is roughly 100-by-100 pixels, which is within the range of typical commercial flash LIDAR sensors.
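The brief ties each ray to a mesh vertex normal but does not give the Shadows implementation. The sketch below shows a different, textbook form of orthographic ray casting, offered as an assumption rather than the authors' method:

* One ray is cast per pixel, and all rays are parallel to the viewing direction (here -z).
* Each ray is tested against every mesh triangle with the Möller-Trumbore algorithm.
* Only the nearest hit is kept, so that parts of the body facing away from the sensor are occluded.

Each pixel records the index of the polygon that was hit. The function names, the pixel layout, and the brute-force triangle loop are illustrative.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin, d, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: distance t along the ray to the hit, or None."""
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    p = _cross(d, e2)
    det = _dot(e1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle plane
    inv = 1.0 / det
    s = _sub(origin, v0)
    u = _dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(s, e1)
    v = _dot(d, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(e2, q) * inv
    return t if t > eps else None


def orthographic_scan(vertices, faces, width, height, window, z0):
    """Cast one ray per pixel straight down -z from the plane z = z0.

    window = (xmin, xmax, ymin, ymax) is the sensor footprint. Each pixel
    holds None (no hit) or (face_index, hit_point) for the nearest hit.
    """
    xmin, xmax, ymin, ymax = window
    d = (0.0, 0.0, -1.0)
    image = []
    for i in range(height):
        y = ymax - (i + 0.5) * (ymax - ymin) / height
        row = []
        for j in range(width):
            x = xmin + (j + 0.5) * (xmax - xmin) / width
            origin = (x, y, z0)
            best = None
            for f, (a, b, c) in enumerate(faces):
                t = ray_triangle(origin, d, vertices[a], vertices[b], vertices[c])
                # Keep only the nearest surface, so hidden parts are occluded.
                if t is not None and (best is None or t < best[0]):
                    best = (t, f)
            row.append(None if best is None else (best[1], (x, y, z0 - best[0])))
        image.append(row)
    return image
```

A 100-by-100 scan would call `orthographic_scan(vertices, faces, 100, 100, window, z0)` with a window slightly larger than the avatar's silhouette. The nested loop costs pixels × triangles tests. A production tool would normally add a spatial acceleration structure such as a bounding-volume hierarchy.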

This hybrid experimental/virtual approach enables us to generate partial surface point clouds with complete spherical coverage of viewing angles along different azimuths and elevations. We can also create the same point clouds at different scales to simulate LIDAR images of distant human targets. Unlike many common avatar animations produced by artists, each of our action simulations is individualized to one of the human test subjects. The accompanying figure shows two examples of such point cloud patches.
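Complete spherical coverage means sampling sensor poses over both azimuth and elevation. The sketch below generates one unit viewing direction per pose. It assumes a regular grid of azimuth steps at a chosen list of elevations, because the brief does not give the actual step sizes. The axis convention and function name are hypothetical.

```python
import math


def view_directions(az_step_deg, elevations_deg):
    """Unit vectors pointing from the target toward each virtual sensor pose.

    Hypothetical convention: y is up, azimuth is measured in the x-z plane
    starting from +z, and elevation is measured up from the horizontal.
    """
    n_az = round(360 / az_step_deg)
    dirs = []
    for el in elevations_deg:
        e = math.radians(el)
        # At the poles every azimuth gives the same direction, so keep one.
        if abs(el) == 90:
            azimuths = [0.0]
        else:
            azimuths = [k * az_step_deg for k in range(n_az)]
        for az in azimuths:
            a = math.radians(az)
            dirs.append((math.cos(e) * math.sin(a),
                         math.sin(e),
                         math.cos(e) * math.cos(a)))
    return dirs
```

Each direction fixes one sensor placement. The sensor sits at `center + distance * direction` and looks back along the negated direction. For an orthographic sensor with a fixed pixel pitch, a more distant target can be imitated by shrinking the model relative to the pixel grid, so that fewer pixels land on the body.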

Sixty-eight human subjects, whose data were used for simulated LIDAR image (SLI) generation, were scanned and motion-captured in the AFRL 711th HPW/RHXB 3DHSL facility. Raw data collection consisted of two parts: whole-body scans and optical marker-based motion capture. A whole-body photometric scanning system with nine camera pods was used to capture each subject’s shape in a standing pose. Each pod comprised two black-and-white cameras, one color camera, and one or two speckle pattern projectors.

The data from each pod are used to generate one continuous 3D point cloud, which is then merged with the data from the other pods to create a textured, high-resolution 3D whole-body image. Subjects wear tight-fitting, stretchy clothing so that the scan captures their true body shape.
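The brief says the pod point clouds are merged but not how. A common approach, shown here as an assumption, has two steps:

1. Map every pod's points into a shared frame using that pod's calibrated rigid transform (rotation R, translation t), then concatenate them.
2. Optionally average the points that fall in the same voxel, where the fields of view of neighboring pods overlap.

The calibration values, voxel size, and function names are hypothetical.

```python
import math
from collections import defaultdict


def apply_rigid(R, t, p):
    """Map point p from a pod's local frame into the shared frame: R p + t."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))


def merge_pods(pod_clouds, voxel=None):
    """Merge per-pod clouds given as a list of (R, t, points) tuples.

    With voxel=None the transformed points are simply concatenated.
    Otherwise points sharing a voxel of edge length `voxel` are averaged,
    which thins out duplicates where neighboring pods see the same skin.
    """
    merged = [apply_rigid(R, t, p) for R, t, pts in pod_clouds for p in pts]
    if voxel is None:
        return merged
    bins = defaultdict(list)
    for p in merged:
        bins[tuple(math.floor(c / voxel) for c in p)].append(p)
    return [tuple(sum(q[k] for q in pts) / len(pts) for k in range(3))
            for pts in bins.values()]
```

Averaging per voxel is only one option. A real pipeline might instead fuse overlapping views into a single surface mesh before texturing.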

The three-dimensional motion capture (mocap) data are gathered with a passive optical motion capture system that tracks 68 retro-reflective markers on a tight-fitting garment worn by the volunteer, placed according to a modified Helen Hayes marker set. Marker trajectories are captured during the subject’s activity trials by 18 cameras. Within the capture volume, subjects performed a variety of specified actions, including digging, picking up and putting down an object, throwing, limping with a weighted ankle brace, and running. The motion capture volume is approximately 20 feet long, 15 feet wide, and 8 feet high.
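Mocap output is naturally a sequence of frames, each holding one 3D position per marker. The sketch below converts the stated volume to metric units: 20 × 15 × 8 ft is about 6.10 × 4.57 × 2.44 m. It then flags frames in which a marker is missing or leaves the volume, a check one might run before using a trial to drive an avatar. It assumes positions in meters with the origin at one corner of the volume, because the brief gives neither the coordinate frame nor the file format. All names are hypothetical.

```python
FT_TO_M = 0.3048
# Approximate capture volume from the brief, converted to meters.
VOLUME_M = (20 * FT_TO_M, 15 * FT_TO_M, 8 * FT_TO_M)  # length, width, height
N_MARKERS = 68


def flag_frames(trajectory, volume=VOLUME_M, n_markers=N_MARKERS):
    """Return indices of frames that need attention.

    trajectory: list of frames; each frame is a list of n_markers entries,
    each an (x, y, z) position in meters or None if the marker was occluded.
    Hypothetical convention: one corner of the volume is the origin, z is up.
    """
    flagged = []
    for idx, frame in enumerate(trajectory):
        if len(frame) != n_markers:
            flagged.append(idx)  # wrong marker count for this marker set
            continue
        for m in frame:
            if m is None or not all(0.0 <= c <= lim for c, lim in zip(m, volume)):
                flagged.append(idx)  # marker occluded or outside the volume
                break
    return flagged
```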

Using 3D scans of human shapes to create mesh models, and then animating those models with motion capture data, can yield consistent and repeatable digital human avatars. Coupled with an automatic orthographic ray-tracing script, the virtual laboratory can simulate LIDAR point clouds of human actions quickly and consistently for different viewing angles and scales, and for a large number of subjects. The simulated data can then be used in research on human pose shape retrieval and action recognition from single-view 3D point clouds.
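The brief does not describe how each scanned mesh is bound to its motion capture data. One widely used technique for driving a mesh from a skeleton, linear blend skinning, is shown here only as an illustrative stand-in. Each vertex is moved rigidly by every bone that influences it, and the results are blended with per-vertex weights. The sketch assumes the per-frame bone transforms have already been solved from the marker trajectories. The function names are hypothetical.

```python
import math


def skin_vertex(v, bone_transforms, weights):
    """Linear blend skinning of one rest-pose vertex.

    bone_transforms: list of (R, t), one per bone, each already combining the
    bone's current pose with the inverse of its bind (rest) pose.
    weights: per-bone influence on this vertex; must sum to 1.
    """
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("skin weights must sum to 1")
    out = [0.0, 0.0, 0.0]
    for (R, t), w in zip(bone_transforms, weights):
        for r in range(3):
            # Weighted sum of the vertex as moved rigidly by each bone.
            out[r] += w * (sum(R[r][c] * v[c] for c in range(3)) + t[r])
    return tuple(out)


def skin_mesh(vertices, bone_transforms, vertex_weights):
    """Deform every vertex of the scanned mesh for one mocap frame."""
    return [skin_vertex(v, bone_transforms, w)
            for v, w in zip(vertices, vertex_weights)]
```

Calling `skin_mesh` once per mocap frame gives the sequence of avatar poses that a simulated sensor then samples from each viewing angle.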

This work was done by Jeanne Smith and Iaiah Davenport of Infoscitex Corp., and Huaining Cheng of the Air Force Research Laboratory for the Air Force Materiel Command. AFRL-0241.

This Brief includes a Technical Support Package (TSP).

SIMULATED LIDAR IMAGES OF HUMAN POSE USING A 3DS MAX VIRTUAL LABORATORY (reference AFRL-0241) is currently available for download from the TSP library.
