Introduction
Professor John Holland in 1975 proposed an attractive class of computational
models, called Genetic Algorithms (GA), that mimic the biological
evolution process for solving problems in a wide domain. The mechanisms
under GA have been analyzed and explained later by Goldberg, De
Jong, Davis, Muehlenbein, Chakraborti, Fogel, Vose
and many others. Genetic Aalgorithms has three major applications, namely, intelligent
search, optimization and machine learning. Currently, Genetic Algorithms is used along with
neural networks and fuzzy logic for solving more complex problems. Because of their joint usage in many problems, these together are often referred to by a
generic name: “;soft-computing”.
A Genetic Algorithms operates through a simple cycle of stages:
models, called Genetic Algorithms (GA), that mimic the biological
evolution process for solving problems in a wide domain. The mechanisms
under GA have been analyzed and explained later by Goldberg, De
Jong, Davis, Muehlenbein, Chakraborti, Fogel, Vose
and many others. Genetic Aalgorithms has three major applications, namely, intelligent
search, optimization and machine learning. Currently, Genetic Algorithms is used along with
neural networks and fuzzy logic for solving more complex problems. Because of their joint usage in many problems, these together are often referred to by a
generic name: “;soft-computing”.
A Genetic Algorithms operates through a simple cycle of stages:
- i) Creation of a “;population” of strings,
- ii) Evaluation of each string,
- iii) Selection of best strings and
- iv) Genetic manipulation to create new population of strings.
The cycle of a Genetic Algorithms is presented below
Each cycle in Genetic Algorithms produces a new generation of possible solutions for a
given problem. In the first phase, an initial population, describing
representatives of the potential solution, is created to initiate the search
process. The elements of the population are encoded into bit-strings, called chromosomes. The performance of the strings, often called fitness, is then
evaluated with the help of some functions, representing the constraints of the
problem. Depending on the fitness of the chromosomes, they are selected for a
subsequent genetic manipulation process. It should be noted that the selection
process is mainly responsible for assuring survival of the best-fit individuals.
After selection of the population strings is over, the genetic manipulation
process consisting of two steps is carried out. In the first step, the crossover
operation that recombines the bits (genes) of each two selected strings
(chromosomes) is executed. Various types of crossover operators are found in
the literature. The single point and two points crossover operations are
illustrated
. The crossover points of any two
chromosomes are selected randomly. The second step in the genetic
manipulation process is termed mutation, where the bits at one or more
randomly selected positions of the chromosomes are altered. The mutation
process helps to overcome trapping at local maxima. The offsprings produced
by the genetic manipulation process are the next population to be evaluated.
given problem. In the first phase, an initial population, describing
representatives of the potential solution, is created to initiate the search
process. The elements of the population are encoded into bit-strings, called chromosomes. The performance of the strings, often called fitness, is then
evaluated with the help of some functions, representing the constraints of the
problem. Depending on the fitness of the chromosomes, they are selected for a
subsequent genetic manipulation process. It should be noted that the selection
process is mainly responsible for assuring survival of the best-fit individuals.
After selection of the population strings is over, the genetic manipulation
process consisting of two steps is carried out. In the first step, the crossover
operation that recombines the bits (genes) of each two selected strings
(chromosomes) is executed. Various types of crossover operators are found in
the literature. The single point and two points crossover operations are
illustrated
. The crossover points of any two
chromosomes are selected randomly. The second step in the genetic
manipulation process is termed mutation, where the bits at one or more
randomly selected positions of the chromosomes are altered. The mutation
process helps to overcome trapping at local maxima. The offsprings produced
by the genetic manipulation process are the next population to be evaluated.
Fig.: Mutation of a chromosome at the 5th bit position.
Example: The Genetic Algorithms cycle is illustrated in this example for maximizing a
function f(x) = x2 in the interval 0 = x = 31. In this example the fitness
function is f (x) itself. The larger is the functional value, the better is the
fitness of the string. In this example, we start with 4 initial strings. The fitness
value of the strings and the percentage fitness of the total are estimated in
Table A. Since fitness of the second string is large, we select 2 copies of the
second string and one each for the first and fourth string in the mating pool.
The selection of the partners in the mating pool is also done randomly. Here in
table B, we selected partner of string 1 to be the 2-nd string and partner of
4-th string to be the 2nd string. The crossover points for the first-second and
second-fourth strings have been selected after o-th and 2-nd bit positions
respectively in table B. The second generation of the population without
mutation in the first generation is presented in table C.
Table A:
Table B:
Table C:
function f(x) = x2 in the interval 0 = x = 31. In this example the fitness
function is f (x) itself. The larger is the functional value, the better is the
fitness of the string. In this example, we start with 4 initial strings. The fitness
value of the strings and the percentage fitness of the total are estimated in
Table A. Since fitness of the second string is large, we select 2 copies of the
second string and one each for the first and fourth string in the mating pool.
The selection of the partners in the mating pool is also done randomly. Here in
table B, we selected partner of string 1 to be the 2-nd string and partner of
4-th string to be the 2nd string. The crossover points for the first-second and
second-fourth strings have been selected after o-th and 2-nd bit positions
respectively in table B. The second generation of the population without
mutation in the first generation is presented in table C.
Table A:
Table B:
Table C:
A Schema (or schemata in plural form) / hyperplane or similarity
template is a genetic pattern with fixed values of 1 or 0 at some
designated bit positions. For example, S = 01?1??1 is a 7-bit schema with
fixed values at 4-bits and don’t care values, represented by ?, at the remaining
3 positions. Since 4 positions matter for this schema, we say that the schema
contains 4 genes.
template is a genetic pattern with fixed values of 1 or 0 at some
designated bit positions. For example, S = 01?1??1 is a 7-bit schema with
fixed values at 4-bits and don’t care values, represented by ?, at the remaining
3 positions. Since 4 positions matter for this schema, we say that the schema
contains 4 genes.
Deterministic Explanation of
Holland’s Observation
To explain Holland’s observation in a deterministic manner let us presume the following assumptions.
- i) There are no recombination or alternations to genes.
- ii) Initially, a fraction f of the population possesses the schema S and those
individuals reproduce at a fixed rate r. - iii) All other individuals lacking schema S reproduce at a rate s < r.
Thus with an initial population size of N, after t generations, we find Nf r t
individuals possessing schema S and the population of the rest of the
individuals is N(1 – f) st. Therefore, the fraction of the individuals with
schema S is given by
For small t and f, the above fraction reduces to f (r / s) t , which
means the population having the schema S increases exponentially at a rate
(r / s). A stochastic proof of the above property will be presented shortly, vide
a well-known theorem, called the fundamental theorem of Genetic
algorithm.
individuals possessing schema S and the population of the rest of the
individuals is N(1 – f) st. Therefore, the fraction of the individuals with
schema S is given by
For small t and f, the above fraction reduces to f (r / s) t , which
means the population having the schema S increases exponentially at a rate
(r / s). A stochastic proof of the above property will be presented shortly, vide
a well-known theorem, called the fundamental theorem of Genetic
algorithm.
Stochastic Explanation of Genetic Algorithms
For presentation of the fundamental theorem of Genetic Algorithms, the following
terminologies are defined in order.
terminologies are defined in order.
Definition: The order of a schema H, denoted by O(H), is the number
of fixed positions in the schema. For example, the order of schema H =
?001?1? is 4, since it contains 4 fixed positions.
of fixed positions in the schema. For example, the order of schema H =
?001?1? is 4, since it contains 4 fixed positions.
For example, the schema ?1?001 has a defining length d(H) = 4 – 0 = 4, while
the d(H) of ???1?? is zero.
the d(H) of ???1?? is zero.
Definition: The schemas defined over L-bit strings may be
geometrically interpreted as hyperplanes in an L- dimensional hyperspace
(a binary vector space) with each L-bit string representing one corner point in
an n-dimensional cube.
geometrically interpreted as hyperplanes in an L- dimensional hyperspace
(a binary vector space) with each L-bit string representing one corner point in
an n-dimensional cube.
THE FUNDAMENTAL THEOREM OF GENETIC ALGORITHMS
(Schema theorem)
Let the population size be N, which contains mH (t) samples of schema H at
generation t. Among the selection strategies, the most common is the
proportional selection. In proportional selection, the number of copies of
chromosomes selected for mating is proportional to their respective fitness
values. Thus, following the principles of proportional selection, a string i
is selected with probability
generation t. Among the selection strategies, the most common is the
proportional selection. In proportional selection, the number of copies of
chromosomes selected for mating is proportional to their respective fitness
values. Thus, following the principles of proportional selection, a string i
is selected with probability
where fi is the fitness of string i. Now, the probability that in a single
selection a sample of schema H is chosen is described by
where f H is the average fitness of the mH (t) samples of H.
selection a sample of schema H is chosen is described by
where f H is the average fitness of the mH (t) samples of H.
Thus, in a total of N selections with replacement, the expected number of
samples of schema H in the next generation is given by
where fav is the population average fitness in generation t. The last expression
describes that the Genetic algorithm allocates over an increasing number of
trials to an above average schema. The effect of crossover and mutation can be
incorporated into the analysis by computing the schema survival probabilities
of the above average schema. In crossover operation, a schema H survives if
the cross-site falls outside the defining length d(H). If pc is the probability of
crossover and L is the word-length of the chromosomes, then the disruption
probability of schema H due to crossover is
A schema H, on the other hand, survives mutation, when none of its
fixed positions is mutated. If pm is the probability of mutation and O(H) is the
order of schema H, then the probability that the schema survives is given by
Therefore, under selection, crossover and mutation, the sample size of
schema H in generation (t + 1) is given by
samples of schema H in the next generation is given by
where fav is the population average fitness in generation t. The last expression
describes that the Genetic algorithm allocates over an increasing number of
trials to an above average schema. The effect of crossover and mutation can be
incorporated into the analysis by computing the schema survival probabilities
of the above average schema. In crossover operation, a schema H survives if
the cross-site falls outside the defining length d(H). If pc is the probability of
crossover and L is the word-length of the chromosomes, then the disruption
probability of schema H due to crossover is
A schema H, on the other hand, survives mutation, when none of its
fixed positions is mutated. If pm is the probability of mutation and O(H) is the
order of schema H, then the probability that the schema survives is given by
Therefore, under selection, crossover and mutation, the sample size of
schema H in generation (t + 1) is given by
The above theorem is called the fundamental theorem of GA or the
Schema theorem. It is evident from the Schema theorem that for a given set
of values of d(H), O(H), L, pc and pm , the population of schema H at the
subsequent generations increases exponentially when fH > fav. This, in fact,
directly follows from the difference equation:
Replacing Δ by (E -1), where E is the extended difference operator, we find
Since m H (t) in equation (E-1-K1)mH(t) ≥ 0 is positive, E ≥ (1 +K1 ). Thus, the
solution of (15.12) is given by
where A is a constant. Setting the boundary condition at t = 0, and substituting
the value of K1 by therein, we finally have:
Schema theorem. It is evident from the Schema theorem that for a given set
of values of d(H), O(H), L, pc and pm , the population of schema H at the
subsequent generations increases exponentially when fH > fav. This, in fact,
directly follows from the difference equation:
Replacing Δ by (E -1), where E is the extended difference operator, we find
Since m H (t) in equation (E-1-K1)mH(t) ≥ 0 is positive, E ≥ (1 +K1 ). Thus, the
solution of (15.12) is given by
where A is a constant. Setting the boundary condition at t = 0, and substituting
the value of K1 by therein, we finally have:
Since K is a positive number, and f H / fav > 1, mH (t) grows
exponentially with iterations. The process of exponential increase of mH (t)
continues until some iteration r, when fH approaches fav. This is all about the
proof of the schema theorem.
exponentially with iterations. The process of exponential increase of mH (t)
continues until some iteration r, when fH approaches fav. This is all about the
proof of the schema theorem.
The Markov Model for
Convergence Analysis
For the sake of understanding, let us now consider the population size =
3 and the chromosomes are 2-bit patterns, as presumed earlier. The set S now
takes the following form.
S = {(00, 00, 00), (00, 00, 01), (00, 00, 10), (00, 00, 11),
(00, 01, 00), (00, 01, 01), (00, 01, 10), (00, 01, 11),
…. … ….. …..
…. … …. ….
(11, 11, 00), (11, 11, 01), (11, 11, 10), (11, 11, 11) }
3 and the chromosomes are 2-bit patterns, as presumed earlier. The set S now
takes the following form.
S = {(00, 00, 00), (00, 00, 01), (00, 00, 10), (00, 00, 11),
(00, 01, 00), (00, 01, 01), (00, 01, 10), (00, 01, 11),
…. … ….. …..
…. … …. ….
(11, 11, 00), (11, 11, 01), (11, 11, 10), (11, 11, 11) }
It may be noted that the number of elements of the last set S is 64. In
general, if the chromosomes have the word length of m bits and the number of
chromosomes selected in each Genetic Algorithm cycle is n, then the cardinality of the set S
is 2 ^ mn.
The Markov transition probability matrix P for 2-bit strings of
population size 2, thus, will have a dimension of (16 x 16), where the element
pij of the matrix denotes the probability of transition from i-th to j-th state. A
clear idea about the states and their transitions can be formed from fig.
It needs mention that since from a given i-th state, there could be a
transition to any 16 j-th states, therefore the row sum of P matrix must be 1.
Formally,
for a given i.
general, if the chromosomes have the word length of m bits and the number of
chromosomes selected in each Genetic Algorithm cycle is n, then the cardinality of the set S
is 2 ^ mn.
The Markov transition probability matrix P for 2-bit strings of
population size 2, thus, will have a dimension of (16 x 16), where the element
pij of the matrix denotes the probability of transition from i-th to j-th state. A
clear idea about the states and their transitions can be formed from fig.
It needs mention that since from a given i-th state, there could be a
transition to any 16 j-th states, therefore the row sum of P matrix must be 1.
Formally,
for a given i.
Now, let us assume a row vector πt, whose k-th element denotes the
probability of occurrence of the k-th state at a given genetic iteration (cycle) t;
then πt + 1 can be evaluated by
Thus starting with a given initial row vector π0 , one can evaluate the state
probability vector after n-th iteration πn by
probability of occurrence of the k-th state at a given genetic iteration (cycle) t;
then πt + 1 can be evaluated by
Thus starting with a given initial row vector π0 , one can evaluate the state
probability vector after n-th iteration πn by
The behavior of Genetic Algorihms without mutation can be of the following three types.
i) The GA may converge to one or more absorbing states (i.e., states
wherefrom the GA has no transitions to other states).
ii) The GA may have transition to some states, wherefrom it may
terminate to one or more absorbing states.
iii) The GA never reaches an absorbing state.
Taking all the above into account, we thus construct P as a partitioned matrix
of the following form:
where I is an identity matrix of dimension (a x a) that corresponds to the
absorbing states; R is a (t x a) transition sub-matrix describing transition to an
absorbing state; Q is a (t x t) transition sub-matrix describing transition to
transient states and not to an absorbing state and 0 is a null matrix of
dimension (t x t).
i) The GA may converge to one or more absorbing states (i.e., states
wherefrom the GA has no transitions to other states).
ii) The GA may have transition to some states, wherefrom it may
terminate to one or more absorbing states.
iii) The GA never reaches an absorbing state.
Taking all the above into account, we thus construct P as a partitioned matrix
of the following form:
where I is an identity matrix of dimension (a x a) that corresponds to the
absorbing states; R is a (t x a) transition sub-matrix describing transition to an
absorbing state; Q is a (t x t) transition sub-matrix describing transition to
transient states and not to an absorbing state and 0 is a null matrix of
dimension (t x t).
It can be easily shown that Pn for the above matrix P can be found to
be as follows.
where the n-step transition matrix Nn is given by
As n approaches infinity,
Consequently, as n approaches infinity,
Goodman has shown [8] that the matrix (I – Q) -1 is guaranteed to exist.
Thus given an initial probability vector p 0, the chain will have a transition to
an absorbing state with probability 1. Further, there exists a non-zero
probability that ab sorbing state will be the globally optimal state .
We now explain: why the chain will finally terminate to an absorbing
state. Since the first ‘a’ columns for the matrix Pn, for n→∞, are non-zero and
the remaining columns are zero, therefore, the chain must have transition to
one of the absorbing states. Further, note that the first ‘a’ columns of the row
vector p n for n→∞ denote the probability of absorption at different states,
and the rest of the columns denote that the probability of transition to nonabsorbing
states is zero. Thus probability of transition to absorbing states is
one. Formally,
be as follows.
where the n-step transition matrix Nn is given by
As n approaches infinity,
Consequently, as n approaches infinity,
Goodman has shown [8] that the matrix (I – Q) -1 is guaranteed to exist.
Thus given an initial probability vector p 0, the chain will have a transition to
an absorbing state with probability 1. Further, there exists a non-zero
probability that ab sorbing state will be the globally optimal state .
We now explain: why the chain will finally terminate to an absorbing
state. Since the first ‘a’ columns for the matrix Pn, for n→∞, are non-zero and
the remaining columns are zero, therefore, the chain must have transition to
one of the absorbing states. Further, note that the first ‘a’ columns of the row
vector p n for n→∞ denote the probability of absorption at different states,
and the rest of the columns denote that the probability of transition to nonabsorbing
states is zero. Thus probability of transition to absorbing states is
one. Formally,
No comments:
Post a Comment