For nighttime surveillance, acquiring visible-light imagery is impractical due to the lack of illumination. Thermal imaging, which captures the mid-wave infrared or long-wave infrared radiation naturally emitted by the human body, can be used in low-light conditions to perform surveillance tasks. Identifying individuals captured by thermal imaging would significantly enhance nighttime intelligence-gathering capabilities. However, government watch lists and databases almost exclusively contain visible-light face imagery of individuals of interest. Matching thermal face imagery to the existing databases therefore requires the development of cross-modality face recognition algorithms and methods. Because of the large modality gap caused by the wavelength difference between visible and thermal radiation, thermal-to-visible face recognition is a challenging problem.

The visible and thermal representations of the physical face, as acquired by the visible and thermal sensors.
Face recognition has been an active area of research for the past two decades due to its wide range of applications in law enforcement and verification/authentication systems. The focus of face recognition has primarily been on visible imagery (the 0.35 μm to 0.74 μm wavelength range). Some efforts have been devoted to face recognition using illumination-invariant modalities such as infrared sensors.

The infrared spectrum consists of four main regions: near infrared (NIR; 0.74-1 μm), shortwave infrared (SWIR; 1-3 μm), mid-wave infrared (MWIR; 3-5 μm), and long-wave infrared (LWIR; 8-14 μm). While NIR and SWIR are also referred to as reflected infrared, MWIR and LWIR are naturally emitted by the human body and are commonly referred to as thermal IR. Due to the proximity of the NIR spectrum to the visible spectrum, NIR face images preserve much of the information found in visible face images. However, both NIR and SWIR require active illumination, so they are not very practical for nighttime surveillance.

The natural emission of thermal IR from the human body makes it an ideal modality for nighttime tasks, but the large disparity between the thermal IR and visible spectra results in a wide modality gap that makes thermal-to-visible face recognition a significantly more challenging problem than NIR-to-visible or SWIR-to-visible face recognition. The key to solving thermal-to-visible face recognition is the development of an algorithm or transform space in which the thermal and visible face signatures are well correlated.

This work addresses the problem of matching thermal probe images to visible gallery images. The gallery imagery consists of visible images to simulate government watch lists, and the thermal IR probe imagery simulates suspect imagery acquired during nighttime surveillance operations. This face identification problem is cast as a multimodal face recognition problem. Although several previous studies have dealt with cross-modality NIR-to-visible face recognition, this work is the first to attempt matching thermal face images to visible face images.
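The probe-to-gallery identification setup described above can be sketched as follows. This is only an illustration of closed-set identification, assuming (hypothetically) that thermal probe features and visible gallery features have already been mapped into a common feature space; the function name `rank1_identify` and the use of cosine similarity are assumptions, not the authors' matcher.

```python
import numpy as np

def rank1_identify(probe_feats, gallery_feats, gallery_ids):
    """Assign each probe the identity of its most similar gallery entry.

    probe_feats:   (n_probes, d) array, one feature vector per thermal probe
    gallery_feats: (n_gallery, d) array, one feature vector per visible gallery image
    gallery_ids:   list of n_gallery identity labels
    Assumes both feature sets live in the same space (hypothetical).
    """
    # Normalize rows to unit length so the dot product is cosine similarity.
    p = probe_feats / np.linalg.norm(probe_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    scores = p @ g.T                      # (n_probes, n_gallery) similarity matrix
    best = scores.argmax(axis=1)          # index of the best gallery match per probe
    return [gallery_ids[k] for k in best], scores
```

The returned score matrix can also be used to compute rank-k identification rates by sorting each row instead of taking only the maximum.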

To tackle this problem, various preprocessing techniques, such as self-quotient images and difference-of-Gaussian filtering, were explored, along with various feature transforms, to reduce the variations in each domain and enhance the multimodal matching. In addition, a discriminant modeling function is used to weight the feature vectors by maximizing the covariance between the two modalities using partial least squares (PLS) analysis.
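To make the covariance-maximization idea concrete, here is a minimal sketch of the first pair of PLS weight vectors for two sets of paired features. It relies on the standard result that the unit vectors w and c maximizing cov(Xw, Yc) are the leading singular vectors of the cross-covariance matrix. The function name and the row-paired layout of `X` and `Y` are assumptions for illustration; the brief does not specify how the authors' discriminant model is constructed.

```python
import numpy as np

def first_pls_directions(X, Y):
    """Return unit weight vectors (w, c) that maximize cov(X w, Y c).

    X: (n, p) features from one modality, Y: (n, q) features from the other,
    with row i of X paired with row i of Y (an assumption of this sketch).
    """
    Xc = X - X.mean(axis=0)               # center each feature column
    Yc = Y - Y.mean(axis=0)
    # Leading singular vectors of the cross-covariance give the first PLS pair.
    U, s, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return U[:, 0], Vt[0]
```

Further PLS components are usually obtained by deflating `X` and `Y` and repeating the step; that part is omitted here.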

Since thermal and visible face images have very different signatures, preprocessing is important in solving the thermal-to-visible face recognition problem. For this work, preprocessing consists of two main stages: thermal image normalization, and local variation reduction for both thermal and visible imagery. Dead pixels within the thermal imagery were removed via simple median filtering prior to image normalization. As the first preprocessing step for thermal imagery, each thermal signature is normalized by its mean and standard deviation to reduce the temperature offset and statistical variation across thermal images. The second preprocessing step adjusts the thermal and visible imagery for local variations. For visible imagery, illumination primarily induces the local variations, whereas for thermal imagery, the varying heat distribution within the face produces them. Self-quotient image (SQI) processing and difference-of-Gaussian (DOG) filtering were applied to reduce the local variations in thermal face imagery. SQI emphasizes the edge information in the thermal imagery, while DOG filtering blurs the visible imagery.
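The preprocessing stages above can be sketched with NumPy alone. The filter sizes and sigma values below are hypothetical, since the brief does not state them, and the SQI shown is a simplified isotropic variant (image divided by its Gaussian-smoothed version) that assumes positive pixel intensities; it is not necessarily the exact formulation the authors used.

```python
import numpy as np

def median3x3(img):
    """3x3 median filter with edge replication, to suppress isolated dead pixels."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    shifted = [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.median(np.stack(shifted), axis=0)

def normalize_thermal(img):
    """Subtract the image mean and divide by its standard deviation."""
    return (img - img.mean()) / (img.std() + 1e-8)

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge replication; output has the input shape."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(img, dtype=float), radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def dog_filter(img, sigma_small=1.0, sigma_large=2.0):
    """Difference of Gaussians: a band-pass filter (sigmas are hypothetical)."""
    return gaussian_blur(img, sigma_small) - gaussian_blur(img, sigma_large)

def self_quotient(img, sigma=2.0):
    """Simplified self-quotient image; assumes positive intensities."""
    return img / (gaussian_blur(img, sigma) + 1e-8)
```

A thermal image would typically pass through `median3x3` first, then `normalize_thermal`, and then one of the local-variation filters. Because the normalized image is zero-mean, this simplified `self_quotient` is better applied to the median-filtered image before normalization.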

Of the combinations evaluated, DOG filtering with histogram of oriented gradients (HOG) features performed best. HOG with DOG performs best because DOG filtering makes the images spatially smooth, so the gradient information becomes more stable. Local binary pattern (LBP) features are sensitive to subtle pixel-wise differences, which were lost due to the spatial smoothing during preprocessing.
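To illustrate why smoother gradients help, here is a minimal HOG-style descriptor in NumPy. It is a simplified sketch: each cell is normalized on its own, with no overlapping block normalization or bin interpolation as in full HOG implementations, and the cell size and bin count are hypothetical defaults rather than the values used in this work.

```python
import numpy as np

def hog_cells(img, cell=8, bins=9):
    """Per-cell histograms of unsigned gradient orientation, weighted by magnitude."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)                      # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0   # unsigned orientation in [0, 180)
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    n_y, n_x = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_y, n_x, bins))
    for i in range(n_y):
        for j in range(n_x):
            sl = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            hist[i, j] = np.bincount(bin_idx[sl].ravel(),
                                     weights=mag[sl].ravel(),
                                     minlength=bins)
    # L2-normalize each cell histogram (simplification of block normalization).
    norms = np.linalg.norm(hist, axis=2, keepdims=True)
    return (hist / (norms + 1e-8)).ravel()
```

Since every histogram entry is a sum of gradient magnitudes over a cell, pixel-level noise in the gradients perturbs the descriptor directly, which is consistent with the observation that smoothing the input first stabilizes it.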

This work was done by Jonghyun Choi and Larry S. Davis of the University of Maryland, and Shuowen Hu and S. Susan Young of the Army Research Laboratory. ARL-0145

This Brief includes a Technical Support Package (TSP).
"Thermal-to-Visible Face Recognition" (reference ARL-0145) is currently available for download from the TSP library.
