Genetic Algorithm & Genetic Programming
A genetic algorithm (GA) is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics. Genetic algorithms are a particular class of evolutionary algorithms (also known as evolutionary computation) that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover (also called recombination).
Methodology
Genetic algorithms are implemented as a computer simulation in which a population of abstract representations (called chromosomes or the genotype or the genome) of candidate solutions (called individuals, creatures, or phenotypes)
to an optimization problem evolves toward better solutions.
Traditionally, solutions are represented in binary as strings of 0s and
1s, but other encodings are also possible. The evolution usually starts
from a population of randomly generated individuals and happens in
generations. In each generation, the fitness of every individual in the
population is evaluated, multiple individuals are stochastically
selected from the current population (based on their fitness), and
modified (recombined and possibly randomly mutated) to form a new
population. The new population is then used in the next iteration of
the algorithm.
Commonly, the algorithm terminates when either a maximum number of
generations has been produced, or a satisfactory fitness level has been
reached for the population. If the algorithm has terminated due to a
maximum number of generations, a satisfactory solution may or may not
have been reached.
Genetic algorithms find application in bioinformatics, phylogenetics, computer science, engineering, economics, chemistry, manufacturing, mathematics, physics and other fields.
A typical genetic algorithm requires two things to be defined:
- a genetic representation of the solution domain,
- a fitness function to evaluate the solution domain.
A standard representation of the solution is as an array of bits.
Arrays of other types and structures can be used in essentially the
same way. The main property that makes these genetic representations
convenient is that their parts are easily aligned due to their fixed
size, that facilitates simple crossover operation. Variable length
representations may also be used, but crossover implementation is more
complex in this case. Tree-like representations are explored in Genetic programming and graph-form representations are explored in Evolutionary programming.
The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem dependent. For instance, in the knapsack problem
we want to maximize the total value of objects that we can put in a
knapsack of some fixed capacity. A representation of a solution might
be an array of bits, where each bit represents a different object, and
the value of the bit (0 or 1) represents whether or not the object is
in the knapsack. Not every such representation is valid, as the size of
objects may exceed the capacity of the knapsack. The fitness of
the solution is the sum of values of all objects in the knapsack if the
representation is valid, or 0 otherwise. In some problems, it is hard
or even impossible to define the fitness expression; in these cases, interactive genetic algorithms are used.
Once we have the genetic representation and the fitness function
defined, GA proceeds to initialize a population of solutions randomly,
then improve it through repetitive application of mutation, crossover,
inversion and selection operators.
Initialization
Initially many individual solutions are randomly generated to form
an initial population. The population size depends on the nature of the
problem, but typically contains several hundreds or thousands of
possible solutions. Traditionally, the population is generated
randomly, covering the entire range of possible solutions (the search space). Occasionally, the solutions may be "seeded" in areas where optimal solutions are likely to be found.
Selection
-
During each successive generation, a proportion of the existing population is selected to breed a new generation. Individual solutions are selected through a fitness-based process, where fitter solutions (as measured by a fitness function)
are typically more likely to be selected. Certain selection methods
rate the fitness of each solution and preferentially select the best
solutions. Other methods rate only a random sample of the population,
as this process may be very time-consuming.
Most functions are stochastic
and designed so that a small proportion of less fit solutions are
selected. This helps keep the diversity of the population large,
preventing premature convergence on poor solutions. Popular and
well-studied selection methods include roulette wheel selection and tournament selection.
Reproduction
-
The next step is to generate a second generation population of solutions from those selected through genetic operators: crossover (also called recombination), and/or mutation.
For each new solution to be produced, a pair of "parent" solutions
is selected for breeding from the pool selected previously. By
producing a "child" solution using the above methods of crossover and
mutation, a new solution is created which typically shares many of the
characteristics of its "parents". New parents are selected for each
child, and the process continues until a new population of solutions of
appropriate size is generated.
These processes ultimately result in the next generation population
of chromosomes that is different from the initial generation. Generally
the average fitness will have increased by this procedure for the
population, since only the best organisms from the first generation are
selected for breeding, along with a small proportion of less fit
solutions, for reasons already mentioned above.
Termination
This generational process is repeated until a termination condition has been reached. Common terminating conditions are
- A solution is found that satisfies minimum criteria
- Fixed number of generations reached
- Allocated budget (computation time/money) reached
- The highest ranking solution's fitness is reaching or has reached a
plateau such that successive iterations no longer produce better results
- Manual inspection
- Combinations of the above.
Pseudo-code algorithm
- Choose initial population
- Evaluate the fitness of each individual in the population
- Repeat
- Select best-ranking individuals to reproduce
- Breed new generation through crossover and mutation (genetic operations) and give birth to offspring
- Evaluate the individual fitnesses of the offspring
- Replace worst ranked part of population with offspring
- Until termination
Observations
There are several general observations about the generation of solutions via a genetic algorithm:
- In many problems, GAs may have a tendency to converge towards local optima or even arbitrary points rather than the global optimum
of the problem. This means that it does not "know how" to sacrifice
short-term fitness to gain longer-term fitness. The likelihood of this
occurring depends on the shape of the fitness landscape:
certain problems may provide an easy ascent towards a global optimum,
others may make it easier for the function to find the local optima.
This problem may be alleviated by using a different fitness function,
increasing the rate of mutation, or by using selection techniques that
maintain a diverse population of solutions, although the No Free Lunch
theorem proves that there is no general solution to this problem. A
common technique to maintain diversity is to impose a "niche penalty",
wherein, any group of individuals of sufficient similarity (niche
radius) have a penalty added, which will reduce the representation of
that group in subsequent generations, permitting other (less similar)
individuals to be maintained in the population. This trick, however,
may not be effective, depending on the landscape of the problem.
Diversity is important in genetic algorithms (and genetic programming) because crossing over a homogeneous population does not yield new solutions. In evolution strategies and evolutionary programming, diversity is not essential because of a greater reliance on mutation.
- Operating on dynamic data sets is difficult, as genomes begin to
converge early on towards solutions which may no longer be valid for
later data. Several methods have been proposed to remedy this by
increasing genetic diversity somehow and preventing early convergence,
either by increasing the probability of mutation when the solution
quality drops (called triggered hypermutation), or by occasionally introducing entirely new, randomly generated elements into the gene pool (called random immigrants). Recent research has also shown the benefits of using biological exaptation (or preadaptation) in solving this problem.[1] Again, evolution strategies and evolutionary programming
can be implemented with a so-called "comma strategy" in which parents
are not maintained and new parents are selected only from offspring.
This can be more effective on dynamic problems.
- GAs cannot effectively solve problems in which the only fitness
measure is right/wrong, as there is no way to converge on the solution.
(No hill to climb.) In these cases, a random search may find a solution
as quickly as a GA.
- Selection is clearly an important genetic operator, but opinion is
divided over the importance of crossover versus mutation. Some argue
that crossover is the most important, while mutation is only necessary
to ensure that potential solutions are not lost. Others argue that
crossover in a largely uniform population only serves to propagate
innovations originally found by mutation, and in a non-uniform
population crossover is nearly always equivalent to a very large
mutation (which is likely to be catastrophic). There are many
references in Fogel (2006) that support the importance of
mutation-based search, but across all problems the No Free Lunch theorem holds, so these opinions are without merit unless the discussion is restricted to a particular problem.
- Often, GAs can rapidly locate good solutions, even for difficult search spaces. The same is of course also true for evolution strategies and evolutionary programming.
- For specific optimization problems and problem instantiations,
simpler optimization algorithms may find better solutions than genetic
algorithms (given the same amount of computation time). Alternative and
complementary algorithms include evolution strategies, evolutionary programming, simulated annealing, Gaussian adaptation, hill climbing, and swarm intelligence (e.g.: ant colony optimization, particle swarm optimization).
- As with all current machine learning problems it is worth tuning the parameters such as mutation probability, recombination
probability and population size to find reasonable settings for the
problem class being worked on. A very small mutation rate may lead to genetic drift
(which is non-ergodic in nature). A recombination rate that is too high
may lead to premature convergence of the genetic algorithm. A mutation
rate that is too high may lead to loss of good solutions unless there
is elitist selection. There are theoretical but not yet practical upper
and lower bounds for these parameters that can help guide selection.
- The implementation and evaluation of the fitness function is an important factor in the speed and efficiency of the algorithm.
Variants
The simplest algorithm represents each chromosome as a bit string. Typically, numeric parameters can be represented by integers, though it is possible to use floating point representations. The floating point representation is natural to evolution strategies and evolutionary programming.
The notion of real-valued genetic algorithms has been offered but is
really a misnomer because it does not really represent the building
block theory that was proposed by Holland in the 1970s. This theory is
not without support though, based on theoretical and experimental
results (see below). The basic algorithm performs crossover and
mutation at the bit level. Other variants treat the chromosome as a
list of numbers which are indexes into an instruction table, nodes in a
linked list, hashes, objects, or any other imaginable data structure.
Crossover and mutation are performed so as to respect data element
boundaries. For most data types, specific variation operators can be
designed. Different chromosomal data types seem to work better or worse
for different specific problem domains.
When bit strings representations of integers are used, Gray coding
is often employed. In this way, small changes in the integer can be
readily effected through mutations or crossovers. This has been found
to help prevent premature convergence at so called Hamming walls,
in which too many simultaneous mutations (or crossover events) must
occur in order to change the chromosome to a better solution.
Other approaches involve using arrays of real-valued numbers instead
of bit strings to represent chromosomes. Theoretically, the smaller the
alphabet, the better the performance, but paradoxically, good results
have been obtained from using real-valued chromosomes.
A very successful (slight) variant of the general process of
constructing a new population is to allow some of the better organisms
from the current generation to carry over to the next, unaltered. This
strategy is known as elitist selection.
Parallel implementations of genetic algorithms come in two flavours.
Coarse grained parallel genetic algorithms assume a population on each
of the computer nodes and migration of individuals among the nodes.
Fine grained parallel genetic algorithms assume an individual on each
processor node which acts with neighboring individuals for selection
and reproduction. Other variants, like genetic algorithms for online
optimization problems, introduce time-dependence or noise in the
fitness function.
It can be quite effective to combine GA with other optimization
methods. GA tends to be quite good at finding generally good global
solutions, but quite inefficient at finding the last few mutations to
find the absolute optimum. Other techniques (such as simple hill
climbing) are quite efficient at finding absolute optimum in a limited
region. Alternating GA and hill climbing can improve the efficiency of
GA while overcoming the lack of robustness of hill climbing.
An algorithm that maximizes mean fitness (without any need for the definition of mean fitness as a criterion function) is Gaussian adaptation, See Kjellström 1970[2],
provided that the ontogeny of an individual may be seen as a modified
recapitulation of evolutionary random steps in the past and that the
sum of many random steps tend to become Gaussian distributed (according
to the central limit theorem).
This means that the rules of genetic variation may have a different
meaning in the natural case. For instance - provided that steps are
stored in consecutive order - crossing over may sum a number of steps
from maternal DNA adding a number of steps from paternal DNA and so on.
This is like adding vectors that more probably may follow a ridge in
the phenotypic landscape. Thus, the efficiency of the process may be
increased by many orders of magnitude. Moreover, the inversion operator
has the opportunity to place steps in consecutive order or any other
suitable order in favour of survival or efficiency. (See for instance [1] or example in travelling salesman problem.)
Gaussian adaptation is able to approximate the natural process by an
adaptation of the moment matrix of the Gaussian. So, because very many
quantitative characters are Gaussian distributed in a large population,
Gaussian adaptation may serve as a genetic algorithm replacing the
rules of genetic variation by a Gaussian random number generator
working on the phenotypic level. See Kjellström 1996[3]
Population-based incremental learning is a variation where the population as a whole is evolved rather than its individual members.
Problem domains
Problems which appear to be particularly appropriate for solution by genetic algorithms include timetabling and scheduling problems, and many scheduling software packages are based on GAs. GAs have also been applied to engineering. Genetic algorithms are often applied as an approach to solve global optimization problems.
As a general rule of thumb genetic algorithms might be useful in problem domains that have a complex fitness landscape as recombination is designed to move the population away from local optima that a traditional hill climbing algorithm might get stuck in.
History
Computer simulations of evolution started as early as in 1954 with the work of Nils Aall Barricelli, who was using the computer at the Institute for Advanced Study in Princeton, New Jersey.[4] [5] His 1954 publication was not widely noticed. Starting in 1957 [6], the Australian quantitative geneticist Alex Fraser published a series of papers on simulation of artificial selection
of organisms with multiple loci controlling a measurable trait. From
these beginnings, computer simulation of evolution by biologists became
more common in the early 1960s, and the methods were described in books
by Fraser and Burnell (1970)[7] and Crosby (1973)[8]. Fraser's simulations included all of the essential elements of modern genetic algorithms. In addition, Hans Bremermann
published a series of papers in the 1960s that also adopted a
population of solution to optimization problems, undergoing
recombination, mutation, and selection. Bremermann's research also
included the elements of modern genetic algorithms. Other noteworthy
early pioneers include Richard Friedberg, George Friedman, and Michael
Conrad. Many early papers are reprinted by Fogel (1998).[9]
Although Barricelli, in work he reported in 1963, had simulated the evolution of ability to play a simple game,[10] artificial evolution became a widely recognized optimization method as a result of the work of Ingo Rechenberg and Hans-Paul Schwefel in the 1960s and early 1970s - his group was able to solve complex engineering problems through evolution strategies [11] [12] [13]. Another approach was the evolutionary programming technique of Lawrence J. Fogel, which was proposed for generating artificial intelligence. Evolutionary programming
originally used finite state machines for predicting environments, and
used variation and selection to optimize the predictive logics. Genetic
algorithms in particular became popular through the work of John Holland in the early 1970s, and particularly his book Adaptation in Natural and Artificial Systems (1975). His work originated with studies of cellular automata, conducted by Holland and his students at the University of Michigan. Holland introduced a formalized framework for predicting the quality of the next generation, known as Holland's Schema Theorem.
Research in GAs remained largely theoretical until the mid-1980s, when
The First International Conference on Genetic Algorithms was held in Pittsburgh, Pennsylvania.
As academic interest grew, the dramatic increase in desktop
computational power allowed for practical application of the new
technique. In the late 1980s, General Electric started selling the
world's first genetic algorithm product, a mainframe-based toolkit
designed for industrial processes. In 1989, Axcelis, Inc. released Evolver, the world's second GA product and the first for desktop computers. The New York Times technology writer John Markoff wrote[14] about Evolver in 1990.
Related techniques
- Ant colony optimization
(ACO) uses many ants (or agents) to traverse the solution space and
find locally productive areas. While usually inferior to genetic
algorithms and other forms of local search, it is able to produce
results in problems where no global or up-to-date perspective can be
obtained, and thus the other methods cannot be applied.
- Bacteriologic Algorithms (BA) inspired by evolutionary ecology
and, more particularly, bacteriologic adaptation. Evolutionary ecology
is the study of living organisms in the context of their environment,
with the aim of discovering how they adapt. Its basic concept is that
in a heterogeneous environment, you can’t find one individual that fits
the whole environment. So, you need to reason at the population level.
BAs have shown better results than GAs on problems such as complex
positioning problems (antennas for cell phones, urban planning, and so
on) or data mining.[15]
- Cross-entropy method
The Cross-entropy (CE) method generates candidates solutions via a
parameterized probability distribution. The parameters are updated via
cross-entropy minimization, so as to generate better samples in the
next iteration.
- Cultural algorithm
(CA) consists of the population component almost indentical to that of
the genetic algorithm and, in addition, a knowledge component called
the belief space.
- Evolution strategies
(ES, see Rechenberg, 1971) evolve individuals by means of mutation and
intermediate and discrete recombination. ES algorithms are designed
particularly to solve problems in the real-value domain. They use
self-adaptation to adjust control parameters of the search.
- Evolutionary programming
(EP) involves populations of solutions with primarily mutation and
selection and arbitrary representations. They use self-adaptation to
adjust parameters, and can include other variation operations such as
combining information from multiple parents.
- Extremal optimization (EO) Unlike GAs, which work with a population of candidate solutions, EO evolves a single solution and makes local
modifications to the worst components. This requires that a suitable
representation be selected which permits individual solution components
to be assigned a quality measure ("fitness"). The governing principle
behind this algorithm is that of emergent improvement through
selectively removing low-quality components and replacing them with a
randomly selected component. This is decidedly at odds with a GA that
selects good solutions in an attempt to make better solutions.
- Gaussian adaptation
(normal or natural adaptation, abbreviated NA to avoid confusion with
GA) is intended for the maximisation of manufacturing yield of signal
processing systems. It may also be used for ordinary parametric
optimisation. It relies on a certain theorem valid for all regions of
acceptability and all Gaussian distributions. The efficiency of NA
relies on information theory and a certain theorem of efficiency. Its
efficiency is defined as information divided by the work needed to get
the information[16].
Because NA maximises mean fitness rather than the fitness of the
individual, the landscape is smoothed such that valleys between peaks
may disappear. Therefore it has a certain “ambition” to avoid local
peaks in the fitness landscape. NA is also good at climbing sharp
crests by adaptation of the moment matrix, because NA may maximise the
disorder (average information) of the Gaussian simultaneously keeping the mean fitness constant.
- Genetic programming (GP) is a related technique popularized by John Koza in which computer programs, rather than function parameters, are optimized. Genetic programming often uses tree-based internal data structures to represent the computer programs for adaptation instead of the list structures typical of genetic algorithms.
- Grouping Genetic Algorithm
(GGA) is an evolution of the GA where the focus is shifted from
individual items, like in classical GAs, to groups or subset of items.[17] The idea behind this GA evolution proposed by Emanuel Falkenauer is that solving some complex problems, a.k.a. clustering or partitioning
problems where a set of items must be split into disjoint group of
items in an optimal way, would better be achieved by making
characteristics of the groups of items equivalent to genes. These kind
of problems include Bin Packing,
Line Balancing, Clustering w.r.t. a distance measure, Equal Piles,
etc., on which classic GAs proved to perform poorly. Making genes
equivalent to groups implies chromosomes that are in general of
variable length, and special genetic operators that manipulate whole
groups of items. For Bin Packing in particular, a GGA hybridized with the Dominance Criterion of Martello and Toth, is arguably the best technique to date.
- Harmony search (HS) is an algorithm mimicking musicians behaviors in improvisation process.
- Interactive evolutionary algorithms
are evolutionary algorithms that use human evaluation. They are usually
applied to domains where it is hard to design a computational fitness
function, for example, evolving images, music, artistic designs and
forms to fit users' aesthetic preference.
- Memetic algorithm (MA), also called hybrid genetic algorithm
among others, is a relatively new evolutionary method where local
search is applied during the evolutionary cycle. The idea of memetic
algorithms comes from memes,
which–unlike genes–can adapt themselves. In some problem areas they are
shown to be more efficient than traditional evolutionary algorithms.
- Simulated annealing
(SA) is a related global optimization technique that traverses the
search space by testing random mutations on an individual solution. A
mutation that increases fitness is always accepted. A mutation that
lowers fitness is accepted probabilistically based on the difference in
fitness and a decreasing temperature parameter. In SA parlance, one
speaks of seeking the lowest energy instead of the maximum fitness. SA
can also be used within a standard GA algorithm by starting with a
relatively high rate of mutation and decreasing it over time along a
given schedule.
- Tabu search
(TS) is similar to Simulated Annealing in that both traverse the
solution space by testing mutations of an individual solution. While
simulated annealing generates only one mutated solution, tabu search
generates many mutated solutions and moves to the solution with the
lowest energy of those generated. In order to prevent cycling and
encourage greater movement through the solution space, a tabu list is
maintained of partial or complete solutions. It is forbidden to move to
a solution that contains elements of the tabu list, which is updated as
the solution traverses the solution space.
Building block hypothesis
Genetic algorithms are simple to implement, but their behavior is
difficult to understand. In particular it is difficult to understand
why they are often successful in generating solutions of high fitness.
The building block hypothesis (BBH) consists of
- A description of an abstract adaptive mechanism that performs
adaptation by recombining "building blocks", i.e. low order, low
defining-length schemata with above average fitness.
- A hypothesis that a genetic algorithm performs adaptation by
implicitly and efficiently implementing this abstract adaptive
mechanism.
(Goldberg 1989:41) describes the abstract adaptive mechanism as follows:
- Short, low order, and highly fit schemata are sampled, recombined
[crossed over], and resampled to form strings of potentially higher
fitness. In a way, by working with these particular schemata [the
building blocks], we have reduced the complexity of our problem;
instead of building high-performance strings by trying every
conceivable combination, we construct better and better strings from
the best partial solutions of past samplings.
- Just as a child creates magnificent fortresses through the
arrangement of simple blocks of wood [building blocks], so does a
genetic algorithm seek near optimal performance through the
juxtaposition of short, low-order, high-performance schemata, or
building blocks.
(Goldberg 1989) claims that the building block hypothesis is supported by Holland's schema theorem.
The building block hypothesis has been sharply criticized on the
grounds that it lacks theoretical justification and experimental
results have been published that draw its veracity into question. On
the theoretical side, for example, Wright et al. state that
- "The various claims about GAs that are traditionally made under the name of the building block hypothesis have, to date, no basis in theory and, in some cases, are simply incoherent"[18]
On the experimental side uniform crossover was seen to outperform
one-point and two-point crossover on many of the fitness functions
studied by Syswerda.[19] Summarizing these results, Fogel remarks that
- "Generally, uniform crossover yielded better performance than
two-point crossover, which in turn yielded better performance than
one-point crossover"[20]
Syswerda's results contradict the building block hypothesis because
uniform crossover is extremely disruptive of short schemata whereas one
and two-point crossover are more likely to conserve short schemata and
combine their defining bits in children produced during recombination.
The debate over the building block hypothesis demonstrates that the
issue of how GAs "work", (i.e. perform adaptation) is currently far
from settled. (See the External Links section for more about this)
Applications
- Artificial Creativity
- Automated design, including research on composite material design and multi-objective design of automotive components for crashworthiness, weight savings, and other characteristics.
- Automated design of mechatronic systems using bond graphs and genetic programming (NSF).
- Automated design of industrial equipment using catalogs of exemplar lever patterns.
- Automated design of sophisticated trading systems in the financial sector.
- Building phylogenetic trees.[21]
- Calculation of Bound states and Local-density approximations.
- Chemical kinetics (gas and solid phases)
- Configuration applications, particularly physics applications of
optimal molecule configurations for particular systems like C60 (buckyballs).
- Container loading optimization.
- Code-breaking, using the GA to search large solution spaces of ciphers for the one correct decryption.
- Design of water distribution systems.
- Distributed computer network topologies.
- Electronic circuit design, known as Evolvable hardware.
- File allocation for a distributed system.
- Parallelization of GAs/GPs including use of hierarchical decomposition of problem domains and design spaces nesting of irregular shapes using feature matching and GAs.
- Game Theory Equilibrium Resolution.
- Gene expression profiling analysis.[22]
- Learning Robot behavior using Genetic Algorithms.
- Learning fuzzy rule base using genetic algorithms.
- Linguistic analysis, including Grammar Induction and other aspects of Natural Language Processing (NLP) such as word sense disambiguation.
- Marketing Mix Analysis
- Mobile communications infrastructure optimization.
- Molecular Structure Optimization (Chemistry).
- Multiple population topologies and interchange methodologies.
- Multiple sequence alignment.[23]
- Operon prediction.[24]
- Optimisation of data compression systems, for example using wavelets.
- Protein folding and protein/ligand docking.[25]
- Plant floor layout.
- Representing rational agents in economic models such as the cobweb model.
- RNA structure prediction.[26]
- Scheduling applications, including job-shop scheduling. The objective being to schedule jobs in a sequence dependent
or non-sequence dependent setup environment in order to maximize the
volume of production while minimizing penalties such as tardiness.
- Selection of optimal mathematical model to describe biological systems.
- Software engineering
- Solving the machine-component grouping problem required for cellular manufacturing systems.
- Tactical asset allocation and international equity strategies.
- Timetabling problems, such as designing a non-conflicting class timetable for a large university.
- Training artificial neural networks when pre-classified training examples are not readily obtainable (neuroevolution).
- Traveling Salesman Problem.
- Finding hardware bugs. [27] [28]
References
- ^ Fentress, Sam W (2005). Exaptation as a means of evolving complex solutions. MA Thesis, University of Edinburgh.
- ^ Kjellström, G. (1970). "Optimization of electrical Networks with respect to Tolerance Costs.". Ericsson Technics (3): 157-175.
- ^ Kjellström, G. (January 1996). "Evolution as a statistical optimization algorithm". Evolutionary Theory (11): 105-117.
- ^ Barricelli, Nils Aall (1954). "Esempi numerici di processi di evoluzione". Methodos: 45-68.
- ^ Barricelli, Nils Aall (1957). "Symbiogenetic evolution processes realized by artificial methods". Methodos: 143–182.
- ^ Fraser, Alex (1957). "Simulation of genetic systems by automatic digital computers. I. Introduction". Aust. J. Biol. Sci. 10: 484-491.
- ^ Fraser, Alex; Donald Burnell (1970). Computer Models in Genetics. New York: McGraw-Hill.
- ^ Crosby, Jack L. (1973). Computer Simulation in Genetics. London: John Wiley & Sons.
- ^ Fogel, David B. (editor) (1998). Evolutionary Computation: The Fossil Record. New York: IEEE Press.
- ^ Barricelli,
Nils Aall (1963). "Numerical testing of evolution theories. Part II.
Preliminary tests of performance, symbiogenesis and terrestrial life". Acta Biotheoretica (16): 99-126.
- ^ Schwefel, Hans-Paul (1974). Numerische Optimierung von Computer-Modellen (PhD thesis).
- ^ Schwefel, Hans-Paul (1977). Numerische
Optimierung von Computor-Modellen mittels der
Evolutionsstrategie : mit einer vergleichenden Einführung in die
Hill-Climbing- und Zufallsstrategie. Birkhäuser. ISBN 3764308761.
- ^ Schwefel, Hans-Paul (1981). Numerical
optimization of computer models (Translation of 1977 'Numerische
Optimierung von Computor-Modellen mittels der Evolutionsstrategie'. Wiley. ISBN 0471099880.
- ^ Markoff, John (1989). What's the Best Answer? It's Survival of the Fittest. New York Times.
- ^ Baudry, Benoit; Franck Fleurey, Jean-Marc Jézéquel, and Yves Le Traon (March/April 2005). "Automatic Test Case Optimization: A Bacteriologic Algorithm". IEEE Software: 76-82. IEEE Computer Society.
- ^ Kjellström, G. (Dec. 1991). "On the Efficiency of Gaussian Adaptation". Journal of Optimization Theory and Applications (3): 589-597.
- ^ Falkenauer, Emanuel (1997). Genetic Algorithms and Grouping Problems. Chichester, England: John Wiley & Sons Ltd. ISBN 978-0-471-97150-4.
- ^ Wright, A.H.; et al. (2003). "Implicit Parallelism". Proceedings of the Genetic and Evolutionary Computation Conference.
- ^ Syswerda, G. (1989). "Uniform crossover in genetic algorithms". J. D. Schaffer Proceedings of the Third International Conference on Genetic Algorithms, Morgan Kaufmann.
- ^ Fogel, David B. (2000). Evolutionary Computation: Towards a New Philosophy of Machine Intelligence. New York: IEEE Press, 140.
- ^ Hill
T, Lundgren A, Fredriksson R, Schiöth HB (2005). "Genetic algorithm for
large-scale maximum parsimony phylogenetic analysis of proteins". Biochimica et Biophysica Acta 1725: 19-29. PMID 15990235.
- ^ To
CC, Vohradsky J (2007). "A parallel genetic algorithm for single class
pattern classification and its application for gene expression
profiling in Streptomyces coelicolor". BMC Genomics 8: 49. PMID 17298664.
- ^ Gondro C, Kinghorn BP (2007). "A simple genetic algorithm for multiple sequence alignment". Genetics and Molecular Research 6: 964-982. PMID 18058716.
- ^ Wang
S, Wang Y, Du W, Sun F, Wang X, Zhou C, Liang Y (2007). "A
multi-approaches-guided genetic algorithm with application to operon
prediction". Artificial Intelligence in Medicine 41: 151-159. PMID 17869072.
- ^ Willett P (1995). "Genetic algorithms in molecular recognition and design". Trends in Biotechnology 13: 516-521. PMID 8595137.
- ^ van
Batenburg FH, Gultyaev AP, Pleij CW (1995). "An APL-programmed genetic
algorithm for the prediction of RNA secondary structure". Journal of Theoretical Biology 174: 269-280. PMID 7545258.
- ^ Hitoshi Iba,
Sumitaka Akiba, Tetsuya Higuchi, Taisuke Sato: BUGS: A Bug-Based Search
Strategy using Genetic Algorithms. PPSN 1992:
- ^ Ibrahim, W. and Amer, H.: An Adaptive Genetic Algorithm for VLSI Test Vector Selection
- Bies, Robert R; Muldoon, Matthew
F; Pollock, Bruce G; Manuck, Steven; Smith, Gwenn and Sale, Mark E
(2006). "A Genetic Algorithm-Based, Hybrid Machine Learning Approach to
Model Selection". Journal of Pharmacokinetics and Pharmacodynamics: 196-221. Netherlands: Springer.
- Fraser, Alex S. (1957). "Simulation of Genetic Systems by Automatic Digital Computers. I. Introduction". Australian Journal of Biological Sciences 10: 484-491.
- Goldberg, David E (1989), Genetic Algorithms in Search, Optimization and Machine Learning, Kluwer Academic Publishers, Boston, MA.
- Goldberg, David E (2002), The Design of Innovation: Lessons from and for Competent Genetic Algorithms, Addison-Wesley, Reading, MA.
- Fogel, David B (2006), Evolutionary Computation: Toward a New Philosophy of Machine Intelligence, IEEE Press, Piscataway, NJ. Third Edition
- Holland, John H (1975), Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor
- Koza, John (1992), Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press. ISBN 0-262-11170-5
- Michalewicz, Zbigniew (1999), Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag.
- Mitchell, Melanie, (1996), An Introduction to Genetic Algorithms, MIT Press, Cambridge, MA.
- Rechenberg, Ingo (1971): Evolutionsstrategie - Optimierung
technischer Systeme nach Prinzipien der biologischen Evolution (PhD
thesis). Reprinted by Fromman-Holzboog (1973).
- Schmitt, Lothar M, Nehaniv Chrystopher N, Fujii Robert H (1998), Linear analysis of genetic algorithms, Theoretical Computer Science (208), pp. 111-148
- Schmitt, Lothar M (2001), Theory of Genetic Algorithms, Theoretical Computer Science (259), pp. 1-61
- Schmitt, Lothar M (2004), Theory of Genetic Algorithms II:
models for genetic operators over the string-tensor representation of
populations and convergence to global optima for arbitrary fitness
function under scaling, Theoretical Computer Science (310), pp. 181-231
- Schwefel, Hans-Paul (1974): Numerische Optimierung von Computer-Modellen (PhD thesis). Reprinted by Birkhäuser (1977).
- Vose, Michael D (1999), The Simple Genetic Algorithm: Foundations and Theory, MIT Press, Cambridge, MA.
- Whitley, D. (1994). A genetic algorithm tutorial. Statistics and Computing 4, 65–85.
External links
- ParadisEO A powerful C++ framework dedicated to the reusable design of metaheuristics, included genetic algorithms.
Genetic Programming (GP)
Genetic programming (GP) is an evolutionary algorithm based methodology inspired by biological evolution to find computer programs that perform a user-defined task. Therefore it is a machine learning technique used to optimize a population of computer programs according to a fitness landscape determined by a program's ability to perform a given computational task. The roots of GP begin with the evolutionary algorithms first utilized by Nils Aall Barricelli in 1954 as applied to evolutionary simulations but evolutionary algorithms became widely recognized as optimization methods as a result of the work of Ingo Rechenberg in the 1960s and early 1970s - his group was able to solve complex engineering problems through evolution strategies (1971 PhD thesis and resulting 1973 book). Also highly influential was the work of John Holland in the early 1970s, and particularly his 1975 book. The first results on the GP methodology were reported by Stephen F. Smith (1980)[1] and Nichael L. Cramer (1985),[2]. In 1981 Forsyth reported the evolution of small programs in forensic science for the UK police. John R. Koza
is a main proponent of GP and has pioneered the application of genetic
programming in various complex optimization and search problems [3].
GP is very computationally intensive and so in the 1990s it was
mainly used to solve relatively simple problems. But more recently,
thanks to improvements in GP technology and to the exponential growth in CPU power, GP produced many novel and outstanding results in areas such as quantum computing, electronic design, game playing, sorting, searching and many more.[4] These results include the replication or development of several post-year-2000 inventions. GP has also been applied to evolvable hardware as well as computer programs. There are several GP patents listed in the web site [5].
Developing a theory for GP has been very difficult and so in the
1990s GP was considered a sort of outcast among search techniques. But
after a series of breakthroughs in the early 2000s, the theory of GP
has had a formidable and rapid development. So much so that it has been
possible to build exact probabilistic models of GP (schema theories and
Markov chain models).
Chromosome representation
A function represented as a tree structure
GP evolves computer programs, traditionally represented in memory as tree structures.
Trees can be easily evaluated in a recursive manner. Every tree node
has an operator function and every terminal node has an operand, making
mathematical expressions easy to evolve and evaluate. Thus
traditionally GP favors the use of programming languages that naturally embody tree structures (for example, Lisp; other functional programming languages are also suitable).
Non-tree representations have been suggested and successfully implemented, such as the simpler linear genetic programming which suits the more traditional imperative languages [see, for example, Banzhaf et al. (1998)]. The commercial GP software Discipulus,[6] uses AIM, automatic induction of binary machine code[7] to achieve better performance.[8] MicroGP[9] uses a representation similar to linear GP to generate programs that fully exploit the syntax of a given assembly language.
Genetic operators
Crossover
The main operators used in evolutionary algorithms
such as GP are crossover and mutation. Crossover is applied on an
individual by simply switching one of its nodes with another node from
another individual in the population. With a tree-based representation,
replacing a node means replacing the whole branch. This adds greater
effectiveness to the crossover operator. The expressions resulting from
crossover are very much different from their initial parents.
The following code suggests a simple implementation of individual deformation using crossover:
individual. Children[randomChildIndex] = otherIndividual.Children[randomChildIndex];
Mutation
Mutation affects an individual in the population. It can replace a
whole node in the selected individual, or it can replace just the
node's information. To maintain integrity, operations must be fail-safe
or the type of information the node holds must be taken into account.
For example, mutation must be aware of binary operation nodes, or the
operator must be able to handle missing values.
A simple piece of code:
individual. Information = randomInformation;
or
individual = generateNewIndividual;
Meta-Genetic Programming
Meta-Genetic Programming is the proposed meta learning technique of evolving a genetic programming system using genetic programming itself. [10].
It suggests that chromosomes, crossover, and mutation were themselves
evolved, therefore like their real life counterparts should be allowed
to change on their own rather than being determined by a human
programmer. Meta-GP was proposed by Jürgen Schmidhuber in 1987 [11]; it is a recursive but terminating algorithm, allowing it to avoid infinite recursion.
Critics of this idea often say this approach is overly broad in
scope. However, it might be possible to constrain the fitness criterion
onto a general class of results, and so obtain an evolved GP that would
more efficiently produce results for sub-classes. This might take the
form of a Meta evolved GP for producing human walking algorithms which
is then used to evolve human running, jumping, etc. The fitness
criterion applied to the Meta GP would simply be one of efficiency.
For general problem classes there may be no way to show that Meta GP
will reliably produce results more efficiently than a created algorithm
other than exhaustion. The same holds for standard GP and other search
algorithms, of course.
Notes
- ^ http://www-2.cs.cmu.edu/~sfs/
- ^ http://www.sover.net/~nichael/
- ^ http://www.genetic-programming.com/
- ^ http://www.genetic-programming.com/humancompetitive.html
- ^ http://www.genetic-programming.com/patents.html
- ^ http://www.aimlearning.com
- ^ (Peter Nordin, 1997, Banzhaf et al., 1998, Section 11.6.2-11.6.3)
- ^ http://www.aimlearning.com/aigp31.pdf
- ^ http://www.cad.polito.it/research/microgp.html
- ^ http://www.helpmefigurethisout.com/
- ^ http://www.idsia.ch/~juergen/diploma.html
See also
Bibliography
- Banzhaf, W., Nordin, P., Keller, R.E., Francone, F.D. (1998), Genetic Programming: An Introduction: On the Automatic Evolution of Computer Programs and Its Applications, Morgan Kaufmann
- Barricelli, Nils Aall (1954), Esempi numerici di processi di evoluzione, Methodos, pp. 45-68.
- Crosby, Jack L. (1973), Computer Simulation in Genetics, John Wiley & Sons, London.
- Cramer, Nichael Lynn (1985), "A representation for the Adaptive Generation of Simple Sequential Programs" in Proceedings of an International Conference on Genetic Algorithms and the Applications, Grefenstette, John J. (ed.), Carnegie Mellon University
- Fentress, Sam W (2005), Exaptation as a means of evolving complex solutions, MA Thesis, University of Edinburgh. (pdf)
- Fogel, David B. (2000) Evolutionary Computation: Towards a New Philosophy of Machine Intelligence IEEE Press, New York.
- Fogel, David B. (editor) (1998) Evolutionary Computation: The Fossil Record, IEEE Press, New York.
- Forsyth, Richard (1981), BEAGLE A Darwinian Approach to Pattern Recognition Kybernetes, Vol. 10, pp. 159-166.
- Fraser, Alex S. (1957), Simulation of Genetic Systems by Automatic Digital Computers. I. Introduction. Australian Journal of Biological Sciences vol. 10 484-491.
- Fraser, Alex and Donald Burnell (1970), Computer Models in Genetics, McGraw-Hill, New York.
- Holland, John H (1975), Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor
- Koza, J.R. (1990), Genetic Programming: A Paradigm for Genetically Breeding Populations of Computer Programs to Solve Problems, Stanford University Computer Science Department technical report STAN-CS-90-1314. A thorough report, possibly used as a draft to his 1992 book.
- Koza, J.R. (1992), Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press
- Koza, J.R. (1994), Genetic Programming II: Automatic Discovery of Reusable Programs, MIT Press
- Koza, J.R., Bennett, F.H., Andre, D., and Keane, M.A. (1999), Genetic Programming III: Darwinian Invention and Problem Solving, Morgan Kaufmann
- Koza, J.R., Keane, M.A., Streeter, M.J., Mydlowec, W., Yu, J., Lanza, G. (2003), Genetic Programming IV: Routine Human-Competitive Machine Intelligence, Kluwer Academic Publishers
- Langdon, W. B., Poli, R. (2002), Foundations of Genetic Programming, Springer-Verlag
- Nordin, J.P., (1997) Evolutionary Program Induction of Binary Machine Code and its Application. Krehl Verlag, Muenster, Germany.
- Rechenberg, I. (1971): Evolutionsstrategie - Optimierung
technischer Systeme nach Prinzipien der biologischen Evolution (PhD
thesis). Reprinted by Fromman-Holzboog (1973).
- Schmidhuber, J. (1987). Evolutionary principles in self-referential
learning. (On learning how to learn: The meta-meta-... hook.) Diploma
thesis, Institut f. Informatik, Tech. Univ. Munich.
- Smith, S.F. (1980), A Learning System Based on Genetic Adaptive Algorithms, PhD dissertation (University of Pittsburgh)
- Smith, Jeff S. (2002), Evolving a Better Solution, Developers Network Journal, March 2002 issue
External links
Implementations:
Possibly most used:
Companies:
This article is licensed under the GNU Free Documentation License. It uses material from Wikipedia Encyclopedia article "Genetic Algorithm"
|