Prediction of Soil Liquefaction Using Genetic Programming

American Society of Civil Engineering
Egypt Section (ASCE-EGS)

Egyptian Society of Irrigation Engineers
III Middle East Regional Conference on Civil Engineering Technology

& III International Symposium on Environmental Hydrology 2002
Ezzat A. Fattah¹, Hossam E.A. Ali², Ahmed M. Ebid³
ABSTRACT
In most geotechnical problems, it is difficult to predict soil and structural behavior
accurately because of the large variation in soil parameters and the assumptions inherent in
numerical solutions. Recently, however, many geotechnical problems have been solved using
Artificial Intelligence (AI) techniques, either by presenting new solutions or by improving
existing ones. Genetic Programming (GP) is one of the most recently developed AI techniques
and is based on the Genetic Algorithm (GA) technique.
In this research, the GP technique is used to develop prediction criteria for liquefaction
in cohesionless soils using collected historical records. The liquefaction formula was
developed using special software written by the authors in Visual C++. The accuracy of the
developed formula is also compared with that of earlier prediction methods.
Keywords: Soil Liquefaction, Earthquake Engineering, GA, GP, and AI
INTRODUCTION
Liquefaction is a disastrous phenomenon that occurs in saturated soils and is usually
triggered by seismic or dynamic shaking. It occurs because the soil cannot dissipate the pore
water pressure that builds up under sudden loading quickly enough. The sudden increase in
pore water pressure, together with the dynamically induced stresses, can bring the soil
structure to an unstable condition (Kramer 1996).
For a long time, liquefaction was considered an exclusively earthquake-related
phenomenon. Research over the last three decades has shown that it can also be triggered
under other conditions and that it eventually leads to significant soil densification. The
nature of the soil disturbance and of the densification process is therefore a crucial factor in
liquefaction hazard evaluation.
Recognizing that soil liquefaction can lead to catastrophic damage, many efforts have been
made in the past to understand this complex phenomenon. Many liquefaction assessment
criteria have been established based on empirical, conventional and correlation techniques,
such as those proposed by Seed and DeAlba (1986), Mitchell and Tseng (1990), Stark and
Olson (1995), and Shibata and Teparaksa (1988), among others. The common element in
many of these approaches is the use of data obtained from in-situ tests such as the Cone
Penetration Test (CPT). This is because the CPT, like other in-situ tests, is influenced by
many soil and site variables such as soil density, soil structure, cementation, stress state
and stress history (Robertson and Campanella 1985).
¹ Prof. of soil mechanics, Ain Shams University, Cairo, Egypt
² Teacher of soil mechanics, Ain Shams University, Cairo, Egypt
³ Graduate student, Ain Shams University, Cairo, Egypt
In this research, a promising technique called Genetic Programming (GP), based on the Genetic
Algorithm (GA), is used to develop a formula to predict the liquefaction potential of
sandy soils. The historical records used contain CPT results, the mean diameter of soil
particles, the seismic shear stress ratio, the earthquake magnitude and liquefaction observations.
GENETIC ALGORITHM (GA)
The Genetic Algorithm (GA) is an Artificial Intelligence (AI) technique based on simulating
the natural reproduction process, following Darwin's well-known rule "the fittest survive".
Darwin's theory of natural selection assumes that, within a given population, there are
always some differences between its members. These differences make some members
better suited to the surrounding conditions than others; accordingly, they have better
chances to survive and to produce a next generation with enhanced properties. Generation
after generation, most of the population acquires these suitable properties, while the
unsuitable members eventually disappear. In other words, during the reproduction
process, natural selection increases the fitness of the population, which means that the
population develops to suit the surrounding conditions. In natural reproduction, certain
sequences of DNA characters represent the properties of members; each character is called a
"gene", and each set of genes is called a "chromosome" (Michalewicz, 1992).
The biological reproduction process was first simulated mathematically by John Holland
(1975), where genes and chromosomes are replaced by parameters and solutions,
respectively, and the surrounding conditions are represented by a fitting function. Hence,
according to Darwin's rule, the population develops during the reproduction process to
suit the fitting function (Holland 1975). The most important advantage of the GA technique is
its generality and its applicability to a very wide range of engineering problems, because
the GA technique does not depend on the type of data. Encoding the problem parameters in
genetic form is the first and most important step in a GA solution.
The standard GA procedure consists of four main steps, as depicted in Fig. 1. First, a random
population of solutions is generated and encoded in genetic form. Second, the fitness of each
solution is evaluated using a certain fitting function. Third, the solutions are ranked
according to their fitness and the unsuitable (least qualified) solutions are destroyed.
Finally, new solutions are produced by applying the crossover operator to the surviving
solutions, so that the population size remains constant. A mutation operator may then be
applied, and the cycle starts again by evaluating the fitness of the new solutions, and so on
until the solution accuracy is acceptable (Michalewicz 1992).
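The four steps above can be sketched as a short Python loop. This is a minimal illustration under stated assumptions, not the authors' Visual C++ software: the chromosomes are plain bit strings, the "problem" is a hypothetical target bit pattern whose mismatch count plays the role of the error to be minimized, crossover is single-point, and the population size, survivor fraction and mutation rate are arbitrary choices.

```python
import random


def error(chrom, target):
    """Error to minimize: number of genes that differ from the (hypothetical) target."""
    return sum(g != t for g, t in zip(chrom, target))


def run_ga(target, pop_size=30, survivor_frac=0.35, mutation_rate=0.02,
           max_generations=200, seed=1):
    """Minimal GA loop following the four steps of Fig. 1 (illustrative only)."""
    rng = random.Random(seed)
    n_genes = len(target)
    # Step 1: generate a random population encoded as bit strings (chromosomes)
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(max_generations):
        # Step 2: evaluate fitness and rank the solutions (lower error = fitter)
        population.sort(key=lambda c: error(c, target))
        best_err = error(population[0], target)
        history.append(best_err)
        if best_err == 0:  # accuracy accepted: stop
            break
        # Step 3: destroy the least qualified solutions
        n_survivors = max(2, int(survivor_frac * pop_size))
        survivors = population[:n_survivors]
        # Step 4: refill the population by crossover, then mutate the children
        children = []
        while n_survivors + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_genes)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = survivors + children
    return population[0], history


target = [1, 0] * 10  # hypothetical 20-gene target pattern
best, history = run_ga(target)
print("initial best error:", history[0], "final best error:", history[-1])
```

Because the survivors are copied unchanged into the next generation and only the children are mutated, the best error recorded in `history` can never increase from one generation to the next.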
GENETIC PROGRAMMING (GP)
GP is one of the most recently developed knowledge-based techniques and is a further
development of GA. It can be defined as a Multivariable Interpolation Procedure
(MLP): the basic concept of GP is to find the best fitting surface in hyper-space for a given
set of points. To apply GA for this purpose, the following steps are needed
(Koza, 1994).



Figure 1: Flow chart for GA procedure

First, the fitness evaluation method (the function to be optimized) has to be determined. The
fitness of a surface is represented here by the sum of squared errors (SSE), which has to be
minimized to obtain the best fitting surface. The SSE is calculated as

$$\mathrm{SSE} = \sum \left( \text{GP prediction} - \text{Target output} \right)^2 \qquad (1)$$
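In code, Eq. (1) is a single reduction over paired predictions and target outputs. The function name `sse` and the sample values below are illustrative only.

```python
def sse(predictions, targets):
    """Sum of squared errors, Eq. (1): the quantity the GP search minimizes."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))


# Hypothetical predictions compared against target outputs
print(sse([0.9, 0.2, 0.6], [1.0, 0.0, 1.0]))
```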
Next comes the most important step in GP: encoding the chromosomes (i.e., determining the
number of genes in each chromosome and the arrangement of the genes on the chromosome)
so that a formula can be represented in genetic form. In doing so, some important points
have to be considered:
1. Any set of points in a certain domain of hyper-space can be represented by many surfaces
with different accuracy, depending on the complexity of these surfaces.
2. Any complicated equation can be constructed from certain basic functions (operators)
such as =, +, -, x, /, sin, cos, etc.
3. The simplest case is to use only the five basic operators (=, +, -, x, /) to construct
a polynomial equation.
4. The five basic operators have two inputs and one output, except the operator (=), which has
one input and one output.



Therefore, to create a formula in genetic form, a binary tree structure is constructed
using the five basic operators mentioned above (=, +, -, x, /). This tree structure is
represented graphically in Fig. 2.
Using these operators, any polynomial can be represented in tree form. The more complex
the formula, the more tree levels are needed to represent it. An example of representing a
formula in tree form is shown in Fig. 3.
Figure 2: The five basic operators in GP
Figure 3: Mathematical and genetic representation of binary tree

As shown in Fig. 3, each chromosome consists of two parts: the operators part and the variables
part. The operators part represents the whole tree except level 0 and consists of $2^{n} - 1$
genes, where $n$ is the number of tree levels. The variables part represents only level 0 of the
tree and consists of $2^{n}$ genes. Therefore, the total number of genes on every chromosome is
$2^{n+1} - 1$.
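The gene counts above follow directly if the operators part is stored level by level, starting from the root, and the variables part is appended as level 0: the chromosome is then the array layout of a full binary tree, in which node i has children 2i + 1 and 2i + 2. The sketch below assumes that layout (Fig. 3 may order the genes differently), assumes that the "=" operator passes its left input through unchanged, and guards division by zero by returning 1.0, a common GP convention that the paper does not specify. The helper names `genes_per_chromosome` and `evaluate` are hypothetical.

```python
import operator


def protected_div(a, b):
    """Division guarded against zero denominators (assumed convention)."""
    return a / b if b != 0 else 1.0


# The five basic operators; "=" is assumed to pass its left input through unchanged.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "x": operator.mul,
    "/": protected_div,
    "=": lambda a, b: a,
}


def genes_per_chromosome(levels):
    """Operators part (2^n - 1 genes) + variables part (2^n genes) = 2^(n+1) - 1."""
    return (2 ** levels - 1) + 2 ** levels


def evaluate(chromosome, levels, variables):
    """Evaluate a chromosome stored as a full binary tree in level order.

    Genes 0 .. 2^n - 2 are operators; the remaining 2^n genes are variable
    names (level 0). Node i has children 2i + 1 and 2i + 2.
    """
    if len(chromosome) != genes_per_chromosome(levels):
        raise ValueError("chromosome length does not match the number of levels")
    n_ops = 2 ** levels - 1

    def node(i):
        if i >= n_ops:  # level 0: a variable gene
            return variables[chromosome[i]]
        return OPERATORS[chromosome[i]](node(2 * i + 1), node(2 * i + 2))

    return node(0)


# Two-level tree: (M x D50) / (CPT + CPT), i.e. M.D50 / (2.CPT)
chrom = ["/", "x", "+", "M", "D50", "CPT", "CPT"]
print(evaluate(chrom, 2, {"M": 7.5, "D50": 0.33, "CPT": 3.14}))
```

In the example chromosome, the two `CPT` leaves under the `+` node produce the 2·CPT denominator, so a two-level tree (3 operator genes, 4 variable genes, 7 genes in total) is enough to express M·D50/(2·CPT).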



After the chromosomes have been encoded, the procedures for applying the genetic
operations (crossover and mutation) have to be defined. Mutation is a very simple operation
that replaces some randomly selected genes with a random operator (in the operators part) or
a random variable (in the variables part). Crossover, in contrast, is not as simple, because the
components of the two parts of the chromosome must not be mixed during crossover. In
addition, the new chromosomes generated by crossover must inherit some features from
their parents, which means that they cannot be generated randomly. Two ways of applying
crossover were therefore considered. The first method, suggested by Riccardo (1996), is called
two-point crossover. In this technique, crossover is applied to the two parts of the
chromosome independently: a certain number of genes from the operators part of one
parent are swapped with the corresponding genes of the other parent, and the same operation
is applied to the variables part, as shown in Fig. 4.
Figure 4: Two-point crossover method
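A minimal sketch of two-point crossover, assuming that each part of the chromosome exchanges the genes lying between two random cut points (Fig. 4 may place the cut points differently). The chromosome layout (operators part first, then variables part), the helper names and the example genes are hypothetical.

```python
import random


def two_point_segment_swap(a, b, rng):
    """Swap the genes between two random cut points of two equal-length gene lists."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def two_point_crossover(parent1, parent2, levels, rng):
    """Apply the segment swap separately to the operators part and the variables
    part, so operator genes are never exchanged with variable genes."""
    n_ops = 2 ** levels - 1
    ops1, ops2 = two_point_segment_swap(parent1[:n_ops], parent2[:n_ops], rng)
    var1, var2 = two_point_segment_swap(parent1[n_ops:], parent2[n_ops:], rng)
    return ops1 + var1, ops2 + var2


rng = random.Random(0)
p1 = ["/", "x", "+", "M", "D50", "CPT", "CPT"]
p2 = ["+", "-", "=", "SSR", "M", "D50", "CPT"]
c1, c2 = two_point_crossover(p1, p2, 2, rng)
print(c1)
print(c2)
```

Since genes are swapped position by position within each part, an operator gene can only be replaced by another operator gene, and a variable gene only by another variable gene.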

The second way of applying crossover was proposed by the authors. In this technique, a new
generation of chromosomes is produced by selecting each gene at random from the
corresponding genes of the surviving chromosomes. In other words, the first gene of the new
chromosome is selected randomly from the first genes of the whole set of surviving parent
chromosomes, and likewise for the subsequent genes. This process is depicted in Fig. 5 for
three parents and one child.
Figure 5: Random selection crossover method
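Random selection crossover can be sketched in a few lines: each gene of a child is copied from the same position of a randomly chosen survivor. The sketch mirrors the three-parents, one-child case; the parent chromosomes and the helper name are hypothetical.

```python
import random


def random_selection_crossover(survivors, n_children, rng):
    """Build each child gene by gene: gene k is copied from gene k of a randomly
    chosen surviving parent. Every position draws only from the same position
    of the parents, so operator genes and variable genes never mix."""
    length = len(survivors[0])
    return [[rng.choice(survivors)[k] for k in range(length)]
            for _ in range(n_children)]


rng = random.Random(42)
parents = [
    ["/", "x", "+", "M", "D50", "CPT", "CPT"],
    ["+", "-", "=", "SSR", "M", "D50", "CPT"],
    ["x", "/", "-", "D50", "CPT", "M", "SSR"],
]
child = random_selection_crossover(parents, 1, rng)[0]
print(child)
```

Unlike pairwise crossover, this operator accepts any number of survivors, which is what allows the survivor fraction to be tuned freely.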



In the random selection crossover technique, the number of survivors carried from one
generation to the next can be chosen freely, whereas in the two-point crossover technique the
number of survivors must be half of the population. For this reason, random selection
crossover was chosen in this research to carry out the crossover operation in the developed
software. In practice, the fastest convergence occurs when the number of survivors is 30-40%
of the population. With fewer survivors, the solution may be trapped in a local minimum; on
the other hand, if the number of survivors exceeds 50% of the population, convergence
becomes very slow.
After the previous three steps, GA can be applied to the first, randomly created
generation. Generation after generation, the fitness increases (i.e., the SSE decreases).
After a certain number of generations, the fitness settles at a certain value (the minimum
SSE). At this stage, the corresponding chromosomes represent the best fitting surface for this
number of tree levels (i.e., for this degree of complexity). If the accuracy of this surface is
not sufficient, a larger number of tree levels must be used (Riccardo,
1996).
PREDICTION OF SOIL LIQUEFACTION USING GP
As a multivariable interpolation procedure, GP has a wide range of applications in the
geotechnical field. Various empirical formulas (based on observations or experimental
results) can be enhanced using GP, and correlations between site investigation tests and soil
parameters can be expressed as explicit equations instead of relying on experience or
engineering judgment.
Liquefaction of sand is a good example of applying GP in the geotechnical field, as there is no
formula based on mathematical derivation to predict sand liquefaction. There are, however, a
number of observations of this phenomenon after earthquakes, together with records of the soil
parameters in the affected zones. Many empirical formulas have been developed to relate soil
parameters and the main earthquake characteristics to the liquefaction potential.
To explain the GP procedure used in this research, a simple example of soil liquefaction
is presented. For simplicity, only five observations (records) are interpolated and a tree of
only two levels is used. The observed data are the earthquake magnitude (M), the mean
diameter of sand particles (D50) and the tip resistance from the cone penetration test (CPT).
The data of the five observations are summarized in the following table.
Table 1: Sample of available liquefaction case histories

M     D50 (mm)   CPT (MPa)   State
7.5   0.33        3.14       Liquefaction
6.4   0.40       11.8        No liquefaction
7.8   0.17        1.47       Liquefaction
5.9   0.10        5.7        No liquefaction
7.1   0.26       10.0        Liquefaction



Assume that the value of the function equals 1 in the case of liquefaction and 0 in the case
of no liquefaction; the fitness (sum of squared errors) of each equation can then be
calculated from the following formula:

$$\mathrm{SSE} = \sum \left[ f(M, D_{50}, \mathrm{CPT}) - (1.0 \text{ or } 0.0) \right]^2 \qquad (2)$$
Applying the GP procedure to the five available cases is summarized in Fig. 6, where the
cycles of generating random formulas, calculating their fitness, choosing the survivors and
applying the crossover operator to generate the next generation are presented graphically
until the solution settles on the best fitting formula with the minimum SSE.
Figure 6: Procedure of using GP to predict liquefaction



From Fig. 6, the best fitting surface for these five observations is:

$$P = \frac{M \cdot D_{50}}{2 \cdot \mathrm{CPT}} \qquad (3)$$
where P is the probability of liquefaction. As shown in Table 2, the threshold value of P for
triggering liquefaction is taken as 0.4. Therefore, if P is equal to or larger than 0.4,
liquefaction is likely to occur, and if P is less than 0.4, no liquefaction is expected. The
evaluation of the accuracy of the formula is summarized in Table 2.
Table 2: Accuracy evaluation of the developed formula

M     D50 (mm)   CPT (MPa)   P      Prediction        Observation
7.5   0.33        3.14       0.40   Liquefaction      Liquefaction
6.4   0.40       11.80       0.10   No liquefaction   No liquefaction
7.8   0.17        1.47       0.45   Liquefaction      Liquefaction
5.9   0.10        5.70       0.05   No liquefaction   No liquefaction
7.1   0.26       10.00       0.10   No liquefaction   Liquefaction
From Table 2, only one observation is misclassified. Accordingly, the accuracy of this formula
is 80% (4 of 5 cases). For higher prediction accuracy, a tree with more levels must be used.
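The accuracy and fitness figures for this small example can be checked directly from Table 2. The sketch below uses the P values exactly as tabulated there (rounded), the 0.4 threshold, and the 1/0 target coding of Eq. (2); the variable names are illustrative.

```python
# Tabulated values from Table 2: (P, observed liquefaction?)
TABLE_2 = [
    (0.40, True),
    (0.10, False),
    (0.45, True),
    (0.05, False),
    (0.10, True),
]
THRESHOLD = 0.4  # P >= 0.4 is read as "liquefaction" for Eq. (3)

# Eq. (2): target output is 1.0 for liquefaction and 0.0 for no liquefaction
sse_total = sum((p - (1.0 if observed else 0.0)) ** 2 for p, observed in TABLE_2)

# Classification accuracy: share of records whose prediction matches the observation
correct = sum((p >= THRESHOLD) == observed for p, observed in TABLE_2)
accuracy = 100.0 * correct / len(TABLE_2)

print(f"SSE = {sse_total:.4f}, correct = {correct}/{len(TABLE_2)}, accuracy = {accuracy:.0f}%")
```

Only the last record (P = 0.10, observed liquefaction) falls on the wrong side of the threshold, which gives the 80% accuracy stated above.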
RESULTS
For accurate prediction of liquefaction potential, three trials were carried out using the
developed software, as demonstrated above. The first trial uses a tree of two levels, the
second three levels, and the last four levels. All trials used the same data, compiled by
Olson (1995), which contain 174 records. The results of applying the GP procedure to the
174 available records are summarized in Table 3, which gives the best fitting formula for each
trial together with its SSE and prediction accuracy.
Table 3: The best estimated function for each trial and its fitness

No. of levels   Best estimated function                              SSE     Prediction accuracy (%)
2               P = (M + D50) / (M + CPT)                            25.21   84
3               P = (2·SSR + M + D50) / (M + CPT)                    23.23   85
4               P = (M + 3·D50) / (CPT·M) + 1 / (SSR + D50·CPT)      21.14   88



In Table 3, M is the earthquake magnitude, D50 is the mean diameter of soil particles in
mm, CPT is the tip resistance from the Cone Penetration Test in MPa, SSR is the site seismic
shear stress ratio, and P is the probability of liquefaction: P < 0.5 indicates no
liquefaction and P ≥ 0.5 indicates liquefaction.
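The two- and three-level formulas of Table 3 can be evaluated directly with the 0.5 threshold defined above. The site values below are hypothetical, chosen only to show the calculation, and the function names are illustrative.

```python
def p_two_levels(m, d50, cpt):
    """Two-level formula of Table 3: P = (M + D50) / (M + CPT)."""
    return (m + d50) / (m + cpt)


def p_three_levels(m, d50, cpt, ssr):
    """Three-level formula of Table 3: P = (2.SSR + M + D50) / (M + CPT)."""
    return (2 * ssr + m + d50) / (m + cpt)


def liquefies(p, threshold=0.5):
    """P >= 0.5 is read as liquefaction, P < 0.5 as no liquefaction."""
    return p >= threshold


# Hypothetical site: M = 7.0, D50 = 0.25 mm, CPT = 4.0 MPa, SSR = 0.20
p2 = p_two_levels(7.0, 0.25, 4.0)
p3 = p_three_levels(7.0, 0.25, 4.0, 0.20)
print(f"2 levels: P = {p2:.3f}, liquefaction: {liquefies(p2)}")
print(f"3 levels: P = {p3:.3f}, liquefaction: {liquefies(p3)}")
```

For the two-level formula, the criterion P ≥ 0.5 simplifies algebraically to CPT ≤ M + 2·D50, so under that formula the critical cone resistance (in MPa) is close to the earthquake magnitude for typical sands.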
Furthermore, to determine the range of parameters that yields satisfactory prediction
accuracy, the range of the mean diameter of sand particles is divided into four zones and
the prediction accuracy of the formula is determined for each zone, as shown in Fig. 7. Most
of the misclassifications occurred in very fine soils (D50 < 0.1 mm), which could be
classified as silty sand. Therefore, it is recommended to use this formula for clean medium
to fine sand (D50 > 0.1 mm).
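Computing accuracy per D50 zone only requires grouping the records before counting correct predictions. The sketch assumes that each record carries its D50 value together with a predicted and an observed outcome; the zone boundaries follow Fig. 7, but treating each lower bound as inclusive is an assumption, and the sample records are hypothetical, not the 174 cases used in the paper.

```python
from collections import defaultdict

# D50 zones of Fig. 7 (mm); lower bounds are treated as inclusive (assumption)
ZONES = [
    ("D50 < 0.1", 0.0, 0.1),
    ("0.1 < D50 < 0.2", 0.1, 0.2),
    ("0.2 < D50 < 0.3", 0.2, 0.3),
    ("0.3 < D50", 0.3, float("inf")),
]


def zone_of(d50):
    """Return the name of the zone that contains d50."""
    for name, lo, hi in ZONES:
        if lo <= d50 < hi:
            return name
    raise ValueError(f"D50 out of range: {d50}")


def accuracy_by_zone(records):
    """records: iterable of (d50, predicted, observed) -> {zone: accuracy in %}."""
    hits, totals = defaultdict(int), defaultdict(int)
    for d50, predicted, observed in records:
        zone = zone_of(d50)
        totals[zone] += 1
        hits[zone] += predicted == observed
    return {zone: 100.0 * hits[zone] / totals[zone] for zone in totals}


# Hypothetical (d50, predicted, observed) records
sample = [(0.05, True, False), (0.08, True, True), (0.15, False, False),
          (0.25, True, True), (0.35, False, False), (0.40, True, True)]
print(accuracy_by_zone(sample))
```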
Figure 7: Relation between accuracy of formula and mean diameter of soil particles (accuracy (%) versus number of tree levels, 2 to 4, for D50 < 0.1 mm, 0.1 < D50 < 0.2, 0.2 < D50 < 0.3 and 0.3 < D50)
COMPARISON WITH EARLIER PREDICTION METHODS
To evaluate the accuracy of the developed formula, the liquefaction potential of the 174
records was also predicted using the best-known prediction approaches, namely Seed and
DeAlba (1986), Mitchell and Tseng (1990), Stark and Olson (1995) and Shibata and
Teparaksa (1988). The comparison results are summarized in Table 4 and shown in
Fig. 8.
Table 4: Comparison between the accuracy of GP formula and earlier prediction methods

Prediction method                Total no. of       No. of                Prediction
                                 predicted cases    misclassifications    accuracy (%)
Seed and DeAlba (1986)           174                51                    71
Shibata and Teparaksa (1988)     174                22                    88
Mitchell and Tseng (1990)        174                35                    80
Stark and Olson (1995)           174                21                    88
GP formula (2002)                174                21                    88
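The accuracy column of Table 4 follows from the misclassification counts as (total - misclassified) / total × 100. A short sketch using the counts in Table 4 (the helper name is illustrative):

```python
# Counts from Table 4: (method, total predicted cases, misclassifications)
TABLE_4 = [
    ("Seed and DeAlba (1986)", 174, 51),
    ("Shibata and Teparaksa (1988)", 174, 22),
    ("Mitchell and Tseng (1990)", 174, 35),
    ("Stark and Olson (1995)", 174, 21),
    ("GP formula (2002)", 174, 21),
]


def accuracy_percent(total, misclassified):
    """Prediction accuracy = correctly classified cases / total cases x 100."""
    return 100.0 * (total - misclassified) / total


for method, total, wrong in TABLE_4:
    print(f"{method:30s} {accuracy_percent(total, wrong):5.1f} %")
```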



Figure 8: Comparison between the accuracy of GP formula and earlier prediction methods (accuracy (%) for Seed & DeAlba 1986, Shibata & Teparaksa 1988, Mitchell & Tseng 1990, Stark & Olson 1995 and GP 2002)
CONCLUDING REMARKS
In this research, GP, a promising new knowledge-based approach, was used to develop
liquefaction potential assessment criteria. The conclusions of this research can be
summarized as follows:
1. From the above results, the best fitting formula to predict liquefaction is:

$$P = \frac{M + 3 D_{50}}{\mathrm{CPT} \cdot M} + \frac{1}{\mathrm{SSR} + D_{50} \cdot \mathrm{CPT}}$$

where M is the earthquake magnitude, D50 is the mean diameter of soil particles in mm,
CPT is the tip resistance from the Cone Penetration Test in MPa, SSR is the site seismic
shear stress ratio, and P is the probability of liquefaction: P < 0.5 indicates no
liquefaction and P ≥ 0.5 indicates liquefaction.
2. The new formula provided by GP predicts liquefaction with an accuracy of about
88-90%. It is therefore more accurate than empirical relations such as Seed and DeAlba
(1986) and Mitchell and Tseng (1990), and it achieves the same range of prediction
accuracy as Stark and Olson (1995) and Shibata and Teparaksa (1988).
3. For best accuracy, it is recommended to use this formula for clean medium to fine
sand (D50 > 0.1 mm).



REFERENCES
1. Holland, J. (1975). "Adaptation in Natural and Artificial Systems," University of Michigan Press, Ann Arbor, MI.
2. Koza, J. R. (1994). "Genetic Programming II," MIT Press, Cambridge, MA.
3. Kramer, S. L. (1996). "Geotechnical Earthquake Engineering," Prentice-Hall, Inc.
4. Lade, P. V. (1992). "Static Instability and Liquefaction of Loose Fine Sandy Slopes," Journal of Geotechnical Engineering, Vol. 118, No. 1, pp. 51-71.
5. Michalewicz, Z. (1992). "Genetic Algorithms + Data Structures = Evolution Programs," Springer-Verlag, Berlin Heidelberg New York.
6. Mitchell, J. K. and Tseng, D. (1990). "Assessment of Liquefaction Potential by Cone Penetration Resistance," Proceedings, H. Bolton Seed Memorial Symposium, Berkeley, California, Vol. 2, pp. 335-350.
7. Riccardo, P. (1996). "Introduction to Evolutionary Computation," Collection of Lectures, School of Computer Science, University of Birmingham, UK.
8. Robertson, P. K. and Campanella, R. G. (1985). "Liquefaction Potential of Sands Using the CPT," Journal of Geotechnical Engineering, ASCE, Vol. 111, No. 3, pp. 384-403.
9. Seed, H. B. and De Alba, P. (1986). "Use of SPT and CPT Tests for Evaluating the Liquefaction Resistance of Soils," Proceedings, In Situ '86, ASCE, pp. 281-302.
10. Stark, T. D. and Olson, S. (1995). "Liquefaction Resistance Using CPT and Field Case Histories," Journal of Geotechnical Engineering, Vol. 121, No. 12, pp. 856-869.
