Reliable online recognition and prediction of human actions and activities in temporal sequences has many potential applications across a wide range of Army-relevant fields, including video surveillance, warfighter assistance, human-computer interfaces, intelligent humanoid robots, unmanned and autonomous vehicles, and the diagnosis, assessment, and treatment of musculoskeletal disorders. A computational approach to action prediction can bring this predictive capability to machines and also promote further research in human prediction and intention sensing.

Clearly, a practical prediction system must respond rapidly based on partial observations, before an action has been completed. This poses a new challenge for computational models and motivates further progress in machine learning. Moreover, action prediction requires modeling temporal structure, which may in turn yield important advances in action recognition. The underlying goal is to enhance the DoD's visual intelligence capabilities by leveraging automatic human activity understanding based on 3D data acquisition platforms.

Human activity recognition using 2D visual information captured by single or multiple cameras has been extensively studied and applied to real-world systems over the past decade. However, an open problem remains: how to generalize existing models and frameworks to robust, viewpoint-independent recognition, and even prediction, of diverse human actions and activities in real environments. Recent advances in 3D motion capture technology, 3D depth cameras using structured light or time-of-flight sensors, and 3D information recovery from 2D images and videos have provided commercially viable approaches and hardware platforms for capturing 3D data in real time, and they are nurturing potential breakthrough solutions to such problems based on 3D data.

The 3D human data acquisition platform used for this research consists of a set of 3D motion capture sensors (e.g., Vicon) and a set of 3D cameras (e.g., Kinect) that are synchronized and integrated so that each can cross-validate the other's data, as shown in the accompanying figure. As illustrated in the computing module (right), new methodologies for 3D motion reconstruction and 3D visual modeling will be developed to bridge the gap between vision and motion data and to form the computational component that drives interactions. The gap between the middle-level and low-level data flows is bridged by parametric, composable, low-dimensional manifold representations. Together, these integrated data acquisition systems and methodologies will link visual representations to quantitative biomechanical assessment of human movements in the form of immersive activities, aiding the development of human models and the progressive parametric refinement of those models.
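The idea of synchronizing two sensor streams so that one can cross-validate the other can be illustrated with a minimal sketch. It assumes both sensors report timestamps on a shared clock, that the motion-capture markers have already been converted to the same joint set and coordinate frame as the depth-camera skeleton, and that a 2 cm disagreement threshold is meaningful. The function names, array layout, and threshold are hypothetical and are not taken from the ARL-0243 platform.

```python
import numpy as np


def resample_to(t_src, x_src, t_ref):
    """Linearly interpolate each column of x_src (sampled at t_src) onto timestamps t_ref."""
    return np.column_stack(
        [np.interp(t_ref, t_src, x_src[:, j]) for j in range(x_src.shape[1])]
    )


def cross_validate(t_mocap, mocap, t_depth, depth, tol=0.02):
    """Hypothetical cross-check of two skeleton streams.

    mocap: (frames, joints, 3) motion-capture joint positions, in metres (assumed).
    depth: (frames, joints, 3) depth-camera joint positions, same joints and frame (assumed).
    Returns the compared timestamps, the per-joint Euclidean error, and a per-frame
    flag that is True where any joint disagrees by more than tol.
    """
    # Compare only depth frames inside the mocap time window, so np.interp never extrapolates.
    overlap = (t_depth >= t_mocap[0]) & (t_depth <= t_mocap[-1])
    t_ref = t_depth[overlap]
    n_joints = depth.shape[1]
    # Flatten (frames, joints, 3) -> (frames, joints * 3) so each coordinate is one column.
    mocap_flat = mocap.reshape(len(t_mocap), -1)
    aligned = resample_to(t_mocap, mocap_flat, t_ref).reshape(len(t_ref), n_joints, 3)
    err = np.linalg.norm(aligned - depth[overlap], axis=2)  # (frames, joints)
    return t_ref, err, err.max(axis=1) > tol


if __name__ == "__main__":
    # Synthetic example: a 100 Hz mocap stream and a 30 Hz depth stream of the same motion.
    t_m = np.linspace(0.0, 1.0, 101)
    t_d = np.linspace(0.0, 1.0, 31)
    mocap = np.stack([np.column_stack([t_m, 0 * t_m, 0 * t_m])] * 2, axis=1)
    depth = np.stack([np.column_stack([t_d, 0 * t_d, 0 * t_d])] * 2, axis=1)
    depth[10, 1, 0] += 0.05  # inject a 5 cm tracking error into one depth frame
    _, _, flags = cross_validate(t_m, mocap, t_d, depth)
    print("frames flagged:", np.flatnonzero(flags))
```

Resampling the higher-rate motion-capture stream onto the depth camera's timestamps, rather than the reverse, avoids inventing depth frames that were never captured. Linear interpolation is a simplification that holds up only when the capture rate is high relative to the speed of the motion.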

This work was done by Yun Fu of Northeastern University for the Army Research Office. For more information, download the Technical Support Package (free white paper) below. ARL-0243


This Brief includes a Technical Support Package (TSP). "3D Data Acquisition Platform for Human Activity Understanding" (reference ARL-0243) is currently available for download from the TSP library.

Aerospace & Defense Technology Magazine

This article first appeared in the October, 2021 issue of Aerospace & Defense Technology Magazine.
