All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.

Description

Error Thresholds in Genetic Algorithms

Tags

Transcript

Error Thresholds in Genetic Algorithms
Gabriela Ochoa
gabro@ldc.usb.veDepartmento de Computacion, Universidad Simon Bolivar, Po. Box 89000, Caracas1080-A, Venezuela
Abstract
The
error threshold
of replication is an important notion of the
quasispecies
evolutionmodel; it is a critical mutation rate (error rate) beyond which structures obtained byan evolutionary process are destroyed more frequently than selection can reproducethem. With mutation rates above this critical value, an error catastrophe occurs andthe genomic information is irretrievably lost. Therefore, studying the factors that alterthis magnitude has important implications in the study of evolution. Here we usea genetic algorithm, instead of the quasispecies model, as the underlying model of evolution, and explore the existence of error thresholds on complex landscapes. Wealso study the effect of modifying the most prominent evolutionary parameters on themagnitude of this critical value. Error thresholds were found to depend mainly on theselection pressure and genotype length. Our empirical study veriﬁes the occurrence of errorthresholdsinevolvingpopulationsofbitstringsusingageneticalgorithm. Inthisway, the notion of error threshold is brought from molecular evolution to evolutionarycomputation.
Keywords
Genetic algorithms, error thresholds, quasispecies model, optimal mutation rates, evo-lutionary parameters, NK landscapes, Royal Staircase functions, multiple knapsackproblems.
1 Introduction
Quasispecies theory was derived by Eigen and Schuster (1979) to describe the dynam-ics of replicating nucleic acid molecules under the inﬂuence of mutation and selection.The theory was srcinally developed in the context of pre-biotic evolution (studies of theoriginoflife),butinawidersenseitdescribesanypopulationofreproducingorgan-isms. The Quasispecies model has been acknowledged as extremely useful in studyingthe behavior of populations evolving on a given landscape (Peliti, 2002), and has be-comeastandardmodeltodescribemolecularandviralevolution(Kampetal.,2003). Aquasispecies is deﬁned as the stationary population distribution of replicating macro-molecules under mutation and selection. The most prominent feature of quasispeciesis the existence of an error threshold of replication. If replication were error free, nomutants would arise and evolution would stop. On the other hand, evolution wouldalso be impossible if the error rate of replication were too high (since selection wouldnot be able to maintain the genetic information in the population). The notion of errorthreshold imposes an upper bound for the mutation rate, beyond this critical value anerror catastrophe occurs and the genetic information is irrecoverably lost.Thenotionoferrorthresholdisintuitivelyrelatedtotheideaofanoptimalbalance between
exploitation
and
exploration
in genetic search. Too low a mutation rate impliestoo little exploration; in the limit of zero mutation, no new individuals would arise
c
200X by the Massachusetts Institute of Technology Evolutionary Computation x(x): xxx-xxx
G. Ochoa
and the search process would stagnate. On the other hand, with an excessively highmutation rate, the evolutionary process would degenerate into random search with noexploitation of the information acquired in preceding generations. Any optimal mu-tation rate must lie between these two extremes, but its precise position will dependon the other evolutionary parameters and the characteristics of the problem at hand.It can, however, be postulated that a mutation rate close to the error threshold would be optimal for the problem under study, because it would maximize the search donethrough mutation subject to the constraint of not losing information already gained.Some biological evidence supports the idea that evolution is effective close to the errorthreshold; certain viruses (such as the HIV virus), which are very efﬁcient evolving en-tities, seem to operate very close to their error threshold (Nowak and Schuster, 1992;Bonhoeffer and Stadler, 1993). Moreover, the existence of a relationship between errorthresholds and optimal mutation rates has been suggested before in the evolutionarycomputation community (Hesser and M¨anner, 1991; Kauffman, 1993). Also, this ideais implicitly suggested by Harvey (1991) , who presents a GA framework (SAGA) spe-ciallydesignedforevolvinggeneticallyconvergedpopulationsofvariablelengthgeno-types. Finally, in (Ochoa et al., 1999) we explicitly tested, using an empirical approach,thehypothesizedrelationshipbetweenerrorthreshold,ourresultssuggestedthatthesenotions are indeed correlated.In (Ochoa and Harvey, 1998) we demonstrated empirically the existence of errorthresholds on two simple landscapes (isolated peak, and plateau) using a standardGA; it was also shown that recombination, in those landscapes, shifted error thresh-olds toward lower values. The present study extends those ﬁndings by studying morecomplex landscapes. The division between simple and complex is somewhat arbitrary.The isolated peak landscape is an extreme uncorrelated landscape, the plateau is lessextreme but still highly uncorrelated. This work, on the other hand, explores corre-lated landscapes, and study the effect of modifying the most prominent evolutionaryparameters on the magnitude of error thresholds. This study complements a previouscontribution where consensus sequence plots (see section 3) and error thresholds werepresented as tools for visualizing the structure of ﬁtness landscapes (Ochoa, 2000)Section2describesthetestproblemsused: bothabstractlandscapesandreal-worldapplications. Section 3 describes the
consensus sequence
plots. These plots, borrowedand adapted from theoretical biology, constitute an empirical approach for locatingerror thresholds on general landscapes. Section 4 uses consensus sequence plots, ontwo ﬁxed abstract problems, to explore the effect of changing various evolutionaryparametersonthemagnitudeoferrorthresholds. Theclosingempiricalsection(section5), explores whether error thresholds may be identiﬁed on real-world applications.
2 Test Problems
Two groups of test problems were considered. First, two families of abstract ﬁtnesslandscapes: Royal Staircase functions, and
NK
landscapes. This selection is consistentwith our belief that ruggedness and neutrality are two important landscape featuresfound in real-world applications. The Royal Staircase family of functions is a verysimple class of functions that allows neutrality to be modelled and tuned. The
NK
family of landscapes is a problem-independent model for constructing landscapes thatcan gradually be tuned from smooth to rugged.Second, in order to study whether the issues explored here carried over from ab-stract landscapes to real-world applications, two real-world domains were selected: acombinatorial optimization problem — the Multiple Knapsack problem, and an en-2
Evolutionary Computation Volume x, Number x
Error Thresholds in Genetic Algorithms
gineering problem — the design of an optimal aircraft Wing-Box. This selection wassomewhat arbitrary, but again is consistent with the following criteria. First, both arecomplex problems: the Wing-Box is an engineering design problem based on real dataand constraints, and the Multiple Knapsack is a highly constrained combinatorial op-timization problem known to be
NP
-hard. Second, both problems were available andrelatively easy to implement, and third, both have a natural bit string encoding whichwas a requirement for the study of error thresholds.
2.1 Royal Staircase Functions
The
Royal Staircase
1
class of functions was proposed by Nimwegen and Crutchﬁeldfor analyzing epochal evolutionary search. They justify their particular choice of ﬁt-ness function both in terms of biological motivations and artiﬁcial evolution issues.Although simple, Royal Staircase functions capture some essential elements found oncomplex problems, namely, highly degenerate genotype-to-phenotype maps, and theexistence of extended neutral networks (i.e. sets of equal-ﬁtness sequences that canreach each other via elementary genetic variation steps such as point mutation). Theworking hypothesis is that many real search problems have genotype search spaceswhich decompose into a number of such neutral networks. This idea is supported by observations in problem domains as diverse as molecular folding (Schuster, 1994),evolvablehardware(HarveyandThompson, 1996; Vassilevetal., 1999), andevolution-ary robotics (Harvey et al., 1997). One symptom of evolutionary search in the presenceof neutral networks is the existence of long periods of ﬁtness stasis (search along aneutral network) punctuated by occasional ﬁtness leaps (transitions to a higher neutralnetwork). TheRoyalStaircaseclassofﬁtnessfunctionscapturetheseessentialelementsin a simpliﬁed form (van Nimwegen and Crutchﬁeld, 1998). A formal deﬁnition of theRoyal Staircase class of functions is given below.1. Genotypes are speciﬁed by binary strings
s
=
s
1
s
2
...s
L
,
s
i
∈ {
0
,
1
}
, of length
L
=
NK
, where
N
is the number of blocks and
K
the number of bits per block.2. Starting from the ﬁrst position, the number
I
(
s
)
of consecutive 1s in a string iscounted.3. Theﬁtness
f
(
s
)
ofstring
s
with
I
(
s
)
consecutiveones, followedbyazero, is
f
(
s
) =1+
I
(
s
)
/K
. The ﬁtness is thus an integer between 1 and
N
+1
, corresponding to1 plus the number of consecutive fully-set blocks starting from the left.4. The single global optimum is
s
= 1
L
; namely, the string of all 1s.Fixing
N
and
K
determines a particular problem or ﬁtness landscape.
2.2
NK
Landscapes
The
NK
family of landscapes was introduced by Kauffman (1989) in order to have aproblem-independent model for constructing ﬁtness landscapes that can gradually betuned from smooth to rugged. In the
NK
model,
N
refers to the number of genes inthe genotype (i.e. the string length) and
K
, to the number of genes that inﬂuence aparticular gene
2
. In other words, the ﬁtness contribution of each gene is determined by the gene itself plus
K
other genes in the genotype. According to Kauffman, most
1
These functions are related to the more familiar
Royal Road
functions (Mitchell et al., 1992).
2
Notice that the meaning of parameters
N
and
K
differs from their meaning on the Royal Staircase classof functions.Evolutionary Computation Volume x, Number x
3
G. Ochoa
propertiesofthismodelareindependentofthealphabetsize
A
,hencethesimplestcaseof
A
= 2
(i.e. bit strings) is considered here.By increasing the value of
K
from 0 to
N
−
1
,
NK
landscapes can be tuned fromsmooth to rugged. When
K
is small, neighboring strings will have small differences inﬁtness, because the bits that are different in the two strings will inﬂuence the contribu-tion of only few bits in each string. The extreme case of
K
= 0
yields a single-peakedand smooth ‘Fujiyama’ landscape. When
K
is large, on the other hand, neighboringstringswillhavelargedifferencesinﬁtness, becausethedifferingbitsofthetwostringswill inﬂuence the ﬁtness of a large number of bits in each string. When
K
assumes itslargest possible value (
K
=
N
−
1
), the ﬁtness landscape will be completely random or“uncorrelated”, because changing the value of only one bit changes the ﬁtness contri- butionofallbitsinthestring, sotheoverallﬁtnessofneighboringstringswillbetotallydifferent. The
NK
landscape, however, was invented not to explore the two extremelandscapes, but to have a model which allows the construction of an ordered family of tunable correlated landscapes.
2.3 Multiple Knapsack
The combinatorial optimization problem described here, called the 1/0 multiple knap-sack problem, follows the speciﬁcations given by Khuri et al.(Khuri et al., 1994). Thisproblem is a generalization of the 0/1 simple Knapsack problem where a single knap-sack of capacity
C
, and
n
objects are given. Each object has a weight
w
i
and a proﬁt
p
i
. The objective is to ﬁll the knapsack with objects producing the maximum proﬁt
P
. In other words, to ﬁnd a vector
x
= (
x
1
,x
2
,...,x
n
)
where
x
i
∈ {
0
,
1
}
, suchthat
ni
=1
w
i
x
i
≤
C
and for which
P
(
x
) =
ni
=1
p
i
x
i
is maximized. The multipleversion consists of
m
knapsacks of capacities
c
1
,c
2
,...,c
m
and
n
objects with proﬁts
p
1
,p
2
,...,p
n
. Each object has
m
possible weights: object
i
weighs
w
ij
when consid-ered for inclusion in knapsack
j
(
1
≤
j
≤
m
). Again, the objective is to ﬁnd a vector
x
= (
x
1
,x
2
,...,x
n
)
that guarantees that no knapsack is overﬁlled:
ni
=1
w
ij
x
i
≤
c
j
for
j
= 1
,
2
,...,m
; and that yields maximum proﬁt
P
(
x
) =
ni
=1
p
i
x
i
. This problem leadsnaturallytoabinaryencoding. Eachstring
x
1
x
2
...x
n
representsapotentialsolution. If the
ith
positionhasthevalue1thenthe
ith
objectisinallknapsacks;otherwise,itisnot.Notice that a string may represent an infeasible solution. A vector
x
= (
x
1
,x
2
,...,x
n
)
that overﬁlls at least one of the knapsacks; i.e., for which
ni
=1
w
ij
x
i
> c
j
for some
1
≤
j
≤
m
, is an infeasible string. Rather than discarding infeasible strings, the ap-proach suggested by Khuri et al. (Khuri et al., 1994) is to allow infeasible strings to jointhe population. A penalty term reduces the ﬁtness of infeasible strings. The fartheraway from feasibility, the higher the penalty term of a string.Themultiple-knapsackprobleminstanceselectedherecontains30sacksand60ob- jects(thatisastringlengthof60); itsknownoptimumis7772. Thisinstanceisavailableonline from the OR-library (Beasley, 1990), where it is identiﬁed as “Sento 1”.
2.4 Wing-Box Problem
For the Wing-Box problem (McIlhagga et al., 1996) an industrial partner, BritishAerospace, provided data from a real Airbus wing box. A common problem whendesigning aircraft structures, is to deﬁne structures of minimum weight that can with-stand a given load. Figure 1 sketches the elements of a wing relevant to this problem.The wing is supported at regular intervals by slid ribs which run parallel to the air-craft’s fuselage. On the upper part of the wing, thin metal panels cover the gap sep-arating adjacent ribs. The objective is to ﬁnd the number of panels and the thickness4
Evolutionary Computation Volume x, Number x
Error Thresholds in Genetic Algorithms
of each of these panels while minimizing the mass of the wing and ensuring that noneof the panels buckle under maximum operational stresses. For the experiments in thispaper, the number of panels was set to 50. For encoding an individual, 13 bits were re-quired for the ﬁrst panel, and 3 for each of the others 49 panels, that is
13+3
×
49 = 160
bits.
FuselageTop panelCavityRibsRib pitch
Figure 1:
Relevant elements of a wing. Wing dimensions are ﬁxed. The variable elements arethe number of ribs and the thickness of the top panels.
3 Consensus Sequence Plots
TheworkofBonhoefferandStadler(1993)studiedtheevolutionofquasispeciesontwocorrelated ﬁtness landscapes, the Sherrington Kirkpatrick spin glass and the GraphBipartitioning landscape. The authors described an empirical approach for locatingthresholds on complex landscapes. In this section, this approach is borrowed andadapted. Instead of the quasispecies model, a GA is used as the underlying modelof evolution. The resulting method can be applied for identifying error thresholds inGAs running on general complex landscapes. The approach is to calculate and plotthe consensus sequence at equilibrium for a range of mutation rates. The consensussequence in a population is deﬁned as the sequence of predominant symbols (bits) ineach position; it is plotted as follows: if the majority of individuals has a ‘1’ or ‘0’ ina position
i
the ﬁeld is plotted white or black, respectively. The ﬁeld is plotted grey if the position is undecided. Figure 2, shows an hypothetical population and calculatesits consensus sequence. The plot shown in Figure 2 will correspond to a single linein a full consensus sequence plot where the per bit mutation rate is varied (see Figure
??
for an example of a full plot). The
equilibrium state
is reached when the proportionof different sequences in the population is stationary. This happens when evolutionis simulated for a large enough number of generations. In practice, it is consideredthat the equilibrium is reached when several parameters of the population (e.g. themaximal and average ﬁtness) reach equilibrium. According to Bonhoeffer and Stadler(1993) the error threshold may be approached from
below
or
above
, with both methodsproducing similar results.
3.1 Approaching the error threshold from below
To approach the error threshold from below, the simulation starts with a homogeneouspopulation at the global optimum. This approach requires knowing the optimal string
Evolutionary Computation Volume x, Number x
5

Related Search

Genetic AlgorithmsGenetic Algorithms Updated by Latest Bio AlgoHuman Based Genetic Algorithmscurtains, thresholds in ArtThe Role of Grammar and Error Correction in Terror analysis in SLAGenetic and Immunogenetic Analysis In DiffereGenetic diversity in plants, Plant Molecular Doors, Thresholds, and Closure Systems in RomGenetic Algorithm in Optimization

Similar documents

We Need Your Support

Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks