Tech Briefs

Genetic program-based data mining is used for automated reverse engineering of a system.

Engineers asked to reverse-engineer a system often have no design specifications for it, and the system may not be disassembled or invasively examined. An engineer might attempt to find an input signal that produces a desired output through trial and error, but this would be very time-consuming, and access to experimental resources is very expensive. To deal with this problem, a genetic program (GP)-based data mining (DM) procedure has been invented.

The first digital logic (DL) to be reverse-engineered using the genetic program (GP)-based data mining procedure. This DL is not known to the GP. The GP has access only to a database of input signals to the DL and the measured outputs, as well as a database of rules provided by experts for building the DL.

A genetic program is an algorithm based on the theory of evolution that automatically evolves populations of computer programs or mathematical expressions, eventually selecting one that is optimal in the sense that it maximizes a measure of effectiveness, referred to as a fitness function.

The system to be reverse-engineered is typically a sensor. The sensor is used to create a database of input signals and output measurements, and rules about the likely design properties of the sensor are collected from experts. The rules are used to create a fitness function for the genetic program, and genetic program-based data mining is then conducted. The procedure incorporates into the fitness function not only the experts’ rules but also the information in the database. The information extracted through this process is the internal design specification of the sensor, which can be used to design a signal that will produce a desired output. Determining such signals can be essential to the ultimate determination of control rules for automatic multiplatform coordination.
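As a rough illustration of how a fitness function might combine the database with an expert rule, the sketch below scores a few hand-written candidate designs. Everything in it is an assumption made for illustration: the database values, the candidate list, the `penalty` weight, and the rule that simpler designs (fewer gates) are more likely. It shows only the scoring and selection step, not the evolutionary search itself.

```python
# Hypothetical database of (inputs, measured output) records for an unknown
# two-input device. The values are illustrative, not measured data.
DATABASE = [
    ((0, 0), 0),
    ((0, 1), 1),
    ((1, 0), 1),
    ((1, 1), 0),
]

# Hand-written candidate designs: (label, gate count, function).
# A real GP would evolve such candidates instead of listing them.
CANDIDATES = [
    ("a AND b", 1, lambda a, b: a & b),
    ("a OR b", 1, lambda a, b: a | b),
    ("(a OR b) AND NOT (a AND b)", 4, lambda a, b: (a | b) & (1 - (a & b))),
]


def fitness(candidate, penalty=0.01):
    """Score a candidate: agreement with the database minus a rule penalty.

    The penalty term encodes a hypothetical expert rule that designs with
    fewer gates are more plausible.
    """
    _label, gates, fn = candidate
    matches = sum(1 for inputs, measured in DATABASE if fn(*inputs) == measured)
    accuracy = matches / len(DATABASE)
    return accuracy - penalty * gates


# Selection step: keep the candidate that maximizes the fitness function.
best = max(CANDIDATES, key=fitness)
print(best[0], round(fitness(best), 2))
```

With these illustrative records, the third candidate reproduces every database entry and scores highest despite its larger gate penalty.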

GPs require a terminal set and a function set as inputs. The terminals are the actual variables of the problem; these can include a symbolic variable such as “x” used in building a polynomial, as well as real constants. The function set is a list of functions that can operate on the variables. When the GP is used for data mining, a database of input and output information is also required. When it is used to evolve digital logic (DL), the database contains inputs to the DL together with the measured outputs. The experts’ opinions are reflected in the selection of the inputs and associated outputs included in the database. For the DL case, an additional form of input consisting of “rules” about DL construction is included.
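The terminal set, function set, and construction rules described above can be sketched for the DL case as follows. This is a minimal illustration under stated assumptions, not the procedure's actual implementation: the input names `A`, `B`, `C`, the choice of AND/OR/NOT gates, the nested-tuple tree encoding, and the `MAX_DEPTH` rule are all hypothetical.

```python
import random

# Terminal set: the variables of the problem (hypothetical DL input names).
TERMINALS = ["A", "B", "C"]

# Function set: gate name -> (number of arguments, implementation).
FUNCTIONS = {
    "AND": (2, lambda a, b: a and b),
    "OR": (2, lambda a, b: a or b),
    "NOT": (1, lambda a: not a),
}

# Hypothetical expert rule about DL construction: limit the nesting depth.
MAX_DEPTH = 4


def random_tree(depth, rng):
    """Grow a random expression tree, e.g. ("AND", "A", ("NOT", "B"))."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    arity = FUNCTIONS[name][0]
    return (name,) + tuple(random_tree(depth - 1, rng) for _ in range(arity))


def evaluate(tree, inputs):
    """Compute the tree's output for a dict mapping terminal names to 0/1."""
    if isinstance(tree, str):
        return bool(inputs[tree])
    name, *args = tree
    _arity, fn = FUNCTIONS[name]
    return bool(fn(*(evaluate(arg, inputs) for arg in args)))


def depth(tree):
    """Nesting depth of a tree; a bare terminal has depth 0."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(arg) for arg in tree[1:])


def obeys_rules(tree):
    """Check the hypothetical construction rule."""
    return depth(tree) <= MAX_DEPTH


candidate = random_tree(3, random.Random(0))
print(candidate, evaluate(candidate, {"A": 1, "B": 0, "C": 1}), obeys_rules(candidate))
```

A GP would generate many such trees, score each against the database of inputs and measured outputs, and use checks like `obeys_rules` to favor candidates that are consistent with the experts' rules.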