Engineers increasingly rely on simulation to augment and, in some cases, replace costly and time-consuming experimental work. However, current simulation capabilities are sometimes inadequate to capture the phenomena of interest.

Researchers undertook the task of simulating a light autonomous vehicle negotiating a pile of rubble.

In tracked vehicle analysis, for example, the interaction of the track with granular terrain has been difficult to characterize through simulation due to the prohibitively long simulation times associated with many-body dynamics problems. Many-body dynamics is the generic name for dynamic systems containing a large number of bodies, encountered, for instance, when one adopts a discrete representation of the terrain in vehicle dynamics problems.

To illustrate the versatility of the simulation capability, the vehicle was assumed to be equipped with a drilling device used to penetrate the terrain. Shown is a cut-away image of the drilling tool.
However, these many-body dynamics problems can now capitalize on recent advances in the microprocessor industry that are a consequence of Moore's law, which predicts a doubling of the number of transistors per unit area roughly every 18 months. Until recently, access to massive computational power on parallel supercomputers had been the privilege of a relatively small number of research groups in a select number of research facilities, limiting the scope and impact of high-performance computing (HPC).

This scenario is rapidly changing due to a trend set by general-purpose computing on graphics processing unit (GPU) cards. Nvidia's CUDA (compute unified device architecture) library allows the use of the streaming multiprocessors available in high-end graphics cards. In this setup, a latest-generation Nvidia Kepler GPU card reached 1.5 teraflops by the end of 2012, owing to a set of 1,536 scalar processors working in parallel, each following a SIMD (single instruction, multiple data) execution paradigm.

Despite having only 1,536 scalar processors, such a card is capable of managing tens of thousands of parallel threads at any given time. This over-committing of the GPU hardware resources is the cornerstone of a computing paradigm that aggressively attempts to hide costly memory transactions behind useful computation, a strategy that has led, in frictional contact dynamics simulation, to an order-of-magnitude reduction in simulation time for many-body systems.

The challenge of using parallel computing to reduce simulation time and/or increase system size stems, for the most part, from the task of designing and implementing parallel numerical methods specific to many-body dynamics. Designing parallel algorithms suitable for frictional contact many-body dynamics simulation remains an area of active research.

Magnitude of forces in one revolute joint after the track has dropped onto the flat surface. Transient behavior was observed when the torque was applied to the sprocket at 1 s and when the track shoe connected to this joint came into contact with the sprocket at 5 s.
Some researchers have suggested that the most widely used commercial software package for multi-body dynamics simulation, which draws on a so-called penalty or regularization approach, runs into significant difficulties when handling even simple problems involving hundreds of contact events; cases with thousands of contacts become intractable. In penalty or regularization approaches, the frictional interaction is represented by a collection of stiff springs combined with damping elements that act at the interface of the two bodies. The approach embraced by researchers at the U.S. Army TARDEC and the University of Wisconsin-Madison draws on a different mathematical framework.
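The essence of such a penalty formulation can be sketched in a few lines. The following is a minimal illustration, not the model used by any particular package; the stiffness and damping values are illustrative placeholders.

```python
def penalty_normal_force(penetration, penetration_rate,
                         stiffness=1.0e6, damping=1.0e3):
    """Penalty (regularization) contact model: a stiff spring in
    parallel with a damper, acting along the contact normal.

    penetration      -- overlap depth of the two bodies (m); <= 0 means separated
    penetration_rate -- rate of change of the overlap (m/s)
    """
    if penetration <= 0.0:
        return 0.0  # no contact, no force
    force = stiffness * penetration + damping * penetration_rate
    return max(force, 0.0)  # a contact can push bodies apart, never pull
```

The difficulty alluded to above follows directly from this form: making the springs stiff enough to keep penetrations small forces the integrator to take very small time steps to remain stable, and with thousands of simultaneous contacts the cost becomes prohibitive.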

Magnitude of force experienced by five revolute joints as their associated track shoes go around the sprocket.
Specifically, the parallel algorithms rely on time-stepping procedures that produce weak solutions of the differential variational inequality (DVI) problem describing the time evolution of rigid bodies with impact, contact, friction, and bilateral constraints. When compared to penalty methods, the DVI approach has greater algorithmic complexity, but it avoids the small time steps that plague the former approach.
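The velocity-level complementarity condition at the heart of the DVI formulation can be illustrated with a deliberately simple toy problem: a single point mass falling onto a fixed plane. The sketch below is a drastically reduced, single-contact illustration of the idea, not the researchers' production solver, which handles many bodies, friction, and bilateral constraints simultaneously.

```python
def dvi_step(mass, height, velocity, dt, gravity=-9.81):
    """One velocity-level time step for a point mass above a horizontal plane.

    A non-negative normal impulse gamma is applied only when contact is
    active and only as large as needed to keep the post-step normal
    velocity non-negative: a scalar instance of the complementarity
    condition 0 <= gamma  perp  v_n >= 0.
    """
    v_free = velocity + dt * gravity          # velocity ignoring contact
    if height <= 0.0:                         # contact is active
        gamma = max(0.0, -mass * v_free)      # project impulse onto gamma >= 0
        velocity = v_free + gamma / mass
    else:
        gamma = 0.0
        velocity = v_free
    height = height + dt * velocity
    return height, velocity, gamma
```

Note that no stiffness parameter appears: the impulse is obtained by a projection rather than by integrating a stiff spring, which is why the step size is not dictated by contact stiffness, at the price of solving a (in general, large) variational problem at each step.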