
THE CHANGE EQUATION

(THE FORMULA SYSTEM)

1. User (Autonomous Agent) request(s)/application selections [Morale/Cohesion 3 part format-right-side]
2. Feasibility study [Goals/Objectives 4 part format-right-side]
3. Investigation [Goals/Objectives 3 part format-left-side]
4. Analysis [Norms/Standards 5 part format-left-side]
5. Systems design [Goals/Objectives 4 part format-right-side]
6. Programming [Morale/Cohesion 5 part format-left-side]
7. Systems testing [Power/Authority 3 part format-right-side]
8. Documentation [Norms/Standards 3 part format-left-side]
9. Conversion and implementation [Goals/Objectives 3 part format-right-side]
10. Maintenance [Goals/Objectives 4 part format-left-side]
11. Evaluation [Norms/Standards 3 part format-left-side]

1. Project initiation (Hardware/Software) Power/Authority
2. Project development (The Project) Norms/Standards
3. Project implementation (The User Climate/Autonomous Agent Conditions of Configuration) Goals/Objectives
4. Post project evaluation (The Systems Analysts/Autonomous Agent Activities) Morale/Cohesion

1. Input subsystems [3 part Norms/Standards]
2. Computer subsystems [3 part Norms/Standards]
3. Output subsystems [3 part Norms/Standards]

1. Method Phase-One [5 part Goals/Objectives (The Dictionary of Occupational Titles)]
2. Method Phase-Two [5 part Goals/Objectives (The Dictionary of Occupational Titles)]
3. Method Phase-Three [5 part Goals/Objectives (The Dictionary of Occupational Titles)]
4. Method Phase-Four [5 part Goals/Objectives (The Dictionary of Occupational Titles)]
5. Method Phase-Five [5 part Goals/Objectives (The Dictionary of Occupational Titles)]
Taxonomy Table

Introduction:
One of the most interesting fields in the study of biology is taxonomy. Although there
are other fields such as ecology and embryology, taxonomy is easy to comprehend,
restricted to a small set of structural information, and good to know as reference.
Taxonomy, also called systematics, is the study of the classification of all living
organisms. The current method of taxonomy was started by Carolus Linnaeus and features
organisms arranged into groups within groups within groups, on and on until an organism
is defined within its own species or individual group. This orderly classification helps
scientists in a number of ways. One is that it keeps them in sync with other scientists
because of the existence of a universal system. It also helps scientists identify
evolutionary links between certain species.

How it works:
Originally, when Linnaeus founded taxonomy, organisms were divided based solely on
visible physical characteristics. Now they are separated based on any unique and
defining features, mainly external physical features and secondarily other features
such as feeding habits.

Each organism is named using binomial nomenclature, in which an organism has two words
to its name. The first word is the genus and the second word is the species. For
example, humans are scientifically called Homo sapiens: genus Homo, species sapiens.
The words that make up the names for the individual groups of taxonomy are based on the
Greek or Latin language. This makes for a universal language throughout the world.
Otherwise an English scientist mentioning a "cat" to a Chinese scientist could be
misunderstood because of language differences.

There are international commissions that help filter and record an updated listing of
the classifications. Some names are based on the Latin equivalent of the organism's
characteristics; others have no descriptive meaning at all and are simply named after
their discoverer.

The Origins of Taxonomy:
Classification has been around ever since people paid attention to organisms. One
primeval system was based on "harmful" and "non-harmful" organisms. Then Aristotle
became the first to form a useful system of classification, during the 300s BC. His
was first based on whether or not the organism had red blood. Then he subdivided
organisms such as plants by physical characteristics such as size and features. This
system is somewhat crude by today's standards, yet it lasted about 2,000 years.

Eventually, as communication improved and science advanced to a reasonable point,
modern classification started to develop. Its best-known founder was the Swedish
naturalist Carolus Linnaeus in the 1700s. He developed the system by which organisms
are classified based on their unique characteristics. He also introduced binomial
nomenclature for naming. Linnaeus agreed with other scientists that his work was
somewhat crude, but its purpose and general concepts have been continually applied.
Over time, as evolutionary studies advanced, the classification system has become more
refined, showing different groups and links. And as time goes on, classifications
continue to change and grow.

Kingdom
This is the largest unit of classification. Initially it was thought that there were
only two kingdoms, plants and animals. Eventually microscopes and other tools helped
clarify the existence of other organisms. Now, there are a total of 5 kingdoms.
Animalia - the largest, with over 1 million named species (fish, humans);
Plantae - 350,000 species (trees, grass); Fungi - 100,000 species (mushrooms, lichen);
Protista - 100,000 species (green, golden, brown, and red algae, flagellates);
Monera - 10,000 species (blue-green algae, or cyanobacteria).

Phylum/Division
The next most specific unit of classification. This further divides the kingdom into
20 or so divisions based on very distinct and defining characteristics. For example,
within the Animal Kingdom, a major division is the chordates, which are animals with
notochords. This includes fish, mammals (humans among them), etc. Flowering plants are
placed in the Anthophyta division of the Plant Kingdom.

Class
This further classifies the organism. It separates organisms into categories whose
members are very similar in terms of certain basic features. For example, the class
Mammalia includes all animals that nurse their young with milk, which includes humans,
cows, dolphins, etc. Another class is Reptilia, which includes cold-blooded, scaled
animals.

Order
Organisms of the same order are more similar than those of the same class. A lot of
obvious evolutionary connections can be drawn from looking at the order; only a few
features separate the organisms, marking branches in the evolutionary chain. One
example is that within the class Mammalia, carnivores are separated into the order
Carnivora while insect-eaters are separated into the order Insectivora.

Family
Even more specific, the animals within a family share a very close similarity with
each other. Most will probably have the same behavior patterns, feeding habits, and
general functions. An example is the Cat Family (Felidae), whose members all have
whiskers and sharp claws, and which includes animals such as lions and cats.

Genus
This is the part that makes up the first word of the binomial nomenclature of an
organism. All the organisms within a genus may look very similar to each other.
Organisms of the same genus can sometimes breed with each other, although the
offspring are often not healthy.

Species
The most specific unit of classification is the species. The species makes up all the
organisms and their apparent ancestors and descendants. Members of the species are
very similar to their parents and can freely breed with other members of the same
species without much complication.
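
The nesting of ranks described in this section, from kingdom down to species, can be pictured as a simple lookup table. The following is a minimal sketch in Python, not part of the original material; the names RANKS, HUMAN, binomial_name, and lineage are made up for illustration, and the order and family entries for humans are standard values that the text above does not list.

```python
# Ranks from most general to most specific, as described in this section.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

# Classification of humans. The order and family entries are standard values
# added for illustration; the text above names only the other ranks.
HUMAN = {
    "kingdom": "Animalia",
    "phylum": "Chordata",
    "class": "Mammalia",
    "order": "Primates",
    "family": "Hominidae",
    "genus": "Homo",
    "species": "sapiens",
}


def binomial_name(classification):
    """Build the two-word name: capitalized genus, then lowercase species."""
    genus = classification["genus"].capitalize()
    species = classification["species"].lower()
    return f"{genus} {species}"


def lineage(classification):
    """Return (rank, group) pairs ordered from kingdom down to species."""
    return [(rank, classification[rank]) for rank in RANKS]


if __name__ == "__main__":
    print(binomial_name(HUMAN))
    for rank, group in lineage(HUMAN):
        print(f"{rank:>8}: {group}")
```

Reading the lineage from top to bottom mirrors the "groups within groups" idea: each line narrows the group named on the line before it.
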
The draft sequence of the human genome has been integrated into many
existing resources to facilitate biological discovery. The map below
represents the interconnections between different types of public
biological data available at NCBI.
Cellular Chemistry

Introduction
Hold on to your seat! This document attempts to cover the essentials of several
chemistry courses (general chemistry, organic chemistry, and biochemistry), but only
what a beginning biology student needs to know to survive cell biology, anatomy,
physiology, microbiology, and related biology courses. This document assumes you
know NO chemistry. If that is the case, it is normal to feel a bit overwhelmed as you
study this material, but have courage: many students have studied and survived this
material, and succeeded in their biology studies. You can too!

All organisms are made of cells, and cells are made of organelles and other
subcellular components, which are in turn made of molecules: orderly arrangements of
atoms of the various elements.

Atoms are so small that only 12 grams of carbon, such as a small piece of charcoal,
contains the amazing quantity of about 602,000,000,000,000,000,000,000 atoms!

So imagine how small a single atom is! There are many kinds of atoms, or elements,
such as sodium, oxygen, copper, gold, and carbon. Though atoms differ in
their physical properties, all atoms are similar in structure in that they are
all made of just three varieties of subatomic particles.

The Atomic Structure

The atomic structure is such that an atom has a central region, the nucleus,
composed of protons and neutrons, surrounded by orbiting electrons.
Protons (symbolized as p) have a mass of about 1 atomic mass unit (AMU) and an
electrical charge of +1.
Neutrons (symbolized as n) have a mass of about 1 atomic mass unit (AMU) and have no
electrical charge (they are neutral).
Electrons (e-) orbit the nucleus at various distances, or shell levels. These minute,
fast-moving particles have a mass of almost zero (about 0.0005 atomic mass units
[AMUs], roughly 1/1836 the mass of a proton), and they have an electrical charge of -1.
Normally their number equals the number of protons in the nucleus; in this
way, the atom remains electrically neutral.

Calculating the Structure of an Atom or Molecule


The weight, that is the mass, of an atom or molecule, as well as its net electrical
charge, can be determined if the atom's or molecule's composition of atomic particles
is known. The reverse is also true. Useful formulas for performing such calculations
include the following (where p, n, and e are symbolic for proton, neutron, and
electron, and # is symbolic for 'the number of'):
Net Atomic Mass = (#p + #n)
Net Atomic Charge = (#p - #e)

[Sample atom with mass of 7 and net charge of 0]

Example: an atom with 5 neutrons, 3 protons, and 7 orbiting electrons would have a
net atomic mass of 8 (=5+3) and a net atomic electrical charge of -4 (=3-7).

Example: How many protons and neutrons are there in an atom with a mass of 23
and 12 orbiting electrons if you know that the atom has a net charge of +3?
Solution: Since the charge is +3, there are 3 more protons than there are
electrons (12), so there must be 15 protons. The number of neutrons is 23-15=8.
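
The two formulas and both worked examples above can be checked with a few lines of Python. This is only a sketch; the function names are invented here for illustration and do not come from the original material.

```python
def net_atomic_mass(protons, neutrons):
    """Net Atomic Mass = (#p + #n); electrons are ignored as nearly massless."""
    return protons + neutrons


def net_atomic_charge(protons, electrons):
    """Net Atomic Charge = (#p - #e)."""
    return protons - electrons


def protons_and_neutrons(mass, electrons, charge):
    """Work backwards from mass, electron count, and net charge."""
    protons = electrons + charge   # rearranged from charge = #p - #e
    neutrons = mass - protons      # rearranged from mass = #p + #n
    return protons, neutrons


if __name__ == "__main__":
    # First example: 5 neutrons, 3 protons, 7 electrons.
    print(net_atomic_mass(3, 5), net_atomic_charge(3, 7))
    # Second example: mass 23, 12 electrons, net charge +3.
    print(protons_and_neutrons(23, 12, 3))
```

The third function is simply the two formulas rearranged, which is what the worded solution to the second example does by hand.
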
The Hydrogen Atom
Hydrogen atoms are the simplest of all atoms, having a nucleus with a single proton
and a single orbiting electron. The mass of the H atom is about 1.008 AMU, with the
electron contributing only about 0.0005 atomic mass units. If the electron
is lost from the H atom, then a lone proton, p, remains, and is positively charged.
The resulting particle is a hydrogen ion, electrically charged because the lone
proton's charge is not countered by any electron's negative charge. The hydrogen ion
is symbolized as H+. Hydrogen ions are very important biologically because they are
small and electrically charged, and can cause havoc to protein structure and cell
function; this is particularly critical when H+ ions interact with enzyme proteins,
which are critical for cell metabolism.

pH
The scale used to measure the concentration of H+ ions in a solution (blood,
cytoplasm, etc.) is the pH scale. The pH scale runs from 0 to 14, with 7 neutral,
values below 7 acidic, and values above 7 alkaline or basic.

pH2 ------- pH4 ------------ pH7 ------------ pH11 ------- pH14
(acid pH) ------------ (neutral) ------------ (basic pH)

Acids, that is molecules that release H+ ions, lower pH, and a low pH implies high
concentrations of H+ ions. Bases, that is molecules that capture H+ ions, raise pH,
and a high pH implies low concentrations of H+ ions. Pure water has a neutral pH. Blood
has a pH of about 7.4. Vinegar has a pH of about 2 to 3. Sulfuric acid solutions such
as battery acid have a pH of about 1 or less. Stomach acid has a pH of about 2. Lye
(sodium hydroxide, found in some drain cleaners) creates an extremely alkaline (basic)
pH when added to a solution, resulting in a pH of about 12-14. Cell cytoplasm typically
has a pH close to neutral.

The pH scale is a log scale, based on powers of 10, so that a pH 6 solution has ten
times the acidity of a pH 7 solution, and a pH 5 solution has ten times the acidity of
a pH 6 solution. A pH 5 solution has one hundred times the acidity of a pH 7
solution. Note that low pH implies high levels of H+, and that high pH implies low
levels of H+ (most beginning students confuse this, so make a mental note of the
reverse nature of the pH scale).
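
The powers-of-10 relationship can be made concrete with a short Python sketch. It assumes the standard definition of pH as the negative base-10 logarithm of the H+ concentration in moles per liter, which the text above implies but does not state; the function names are illustrative only.

```python
import math


def ph_from_concentration(h_molar):
    """pH is the negative base-10 log of the H+ concentration (mol/L)."""
    return -math.log10(h_molar)


def relative_acidity(ph_a, ph_b):
    """How many times more H+ solution A contains than solution B."""
    return 10 ** (ph_b - ph_a)


if __name__ == "__main__":
    print(relative_acidity(6, 7))   # one pH unit apart
    print(relative_acidity(5, 7))   # two pH units apart
    print(ph_from_concentration(1e-7))
```

Because the scale is reversed, the solution with the lower pH is always the one with more H+, which is why the exponent is written as ph_b minus ph_a.
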

Empty Space
There is a lot of empty space between an atomic nucleus and its orbiting electrons, and
there is a lot of empty space between each e-. Physicists have estimated that if all
the empty space were removed from all the atoms of all the people on planet
earth, the entire earth's population could be condensed into a container about
the size of a thimble! And a single human being such as yourself could in theory be
shrunk to a speck too small to see without a microscope. In fact, protons and neutrons
are themselves made of even smaller particles, called quarks.
Quarks
Quarks are what actually comprise protons and neutrons. There are six varieties of
quarks, called "flavors": up, down, charm, strange, top, and bottom (no kidding!).
They don't really have a taste, but the physicists who named them decided to make
scientific naming of atomic particles a bit more fun for everyone! A proton is made of
two up quarks and one down quark; a neutron is made of one up quark and two down quarks.

Gravitons
What holds all these subatomic particles together? Within the nucleus, protons and
neutrons are bound by the strong nuclear force, which is carried by particles called
gluons. Gravity, the attraction of all matter for all other matter (the reason your
feet stay attracted to the ground and you do not fly off into outer space, and the
reason the moon orbits the earth), is far too weak to hold a nucleus together.
Gravitons are theoretical particles proposed to carry the force of gravity, but they
have not yet been observed.
Molecules

Molecules are combinations of atoms, held together as a "team" by various forces
called molecular bonds. Like a chain gang, if one atom in a molecule moves in one
direction, the others are obliged to follow; though separate atoms, together they
form a molecule. The molecule illustrated above in 3D is acetic acid (common
vinegar acid)- the red spheres represent oxygen atoms, black carbon atoms, and
white hydrogen atoms.

Bond types that hold atoms or molecules together, or in close proximity, include (in
order of strongest to weakest): covalent, ionic, hydrogen, and Van der Waals Forces.

Covalent Molecular Bonds


This type of bonding occurs when two atoms share their orbiting electrons,
somewhat as if two children were to stand inside two hula-hoops (each hoop being
an orbiting electron) and spin the hoops around themselves. Neither child can leave
the spinning pair of hoops (electrons) that keep them in proximity to each other.
Covalent bonds are strong, and each covalent bond, that is each pair of shared
orbiting electrons between atoms, is symbolized by a straight line drawn between the
atoms. Sometimes two pairs of electrons, that is 4 e-, are shared between two
atoms; then a double covalent bond occurs and this is symbolized by a double line
(=). Three pairs of electrons can also be shared, and that results in a triple
covalent bond, symbolized by...you guessed it, 3 lines drawn between the atoms.
(Quadruple bonds exist but are rare and are not found in biological molecules.)

For example, consider a molecule composed of 2 atoms of hydrogen and 1 atom of
oxygen. A water molecule can be written as H2O, or drawn as H-O-H.

Consider a molecule similar to water, hydrogen sulfide (rotten egg gas- stinky!)
composed of 2 atoms of hydrogen and 1 atom of sulfur. A molecule of hydrogen
sulfide gas can be written as H2S, or drawn as H-S-H. All of the structures below
represent hydrogen sulfide.
H-S-H

Hydrogen Molecular Bonding


Hydrogen bonds are attractions between hydrogen atoms and strongly electron-attracting
(electronegative) atoms, chiefly O, N, and F; S and Cl can take part, but much more
weakly. These atoms can be thought of as electron 'thieves,' stealing the majority of
an electron's orbit time from the covalent bond that they are a part of; as a
consequence, the electron 'thief' atom takes on a partial negative charge density.

Hydrogen atoms, in stark contrast, are very weak at maintaining their electron in
orbit about the hydrogen proton nucleus; a hydrogen atom's electron can be stolen
away most of the time by the 'thief' atom it is covalently bonded to, causing the
hydrogen atom to take on a partial positive charge density (the proton is left exposed
because the orbiting electron is absent from it most of the time). The result is a
partial negative charge density about O, N, or F attracting nearby partial
positive charge densities about H. Voila! A hydrogen bond.

It is hydrogen bonds that cause water molecules to have such strong attractions to
each other, which is why such a high temperature is needed to make water
molecules escape from liquid water as steam.
Van der Waals Forces

These are weak attractions between atoms or molecules that lie very close together.
Alone, each force is weak, but when many act together they become strong, much like
lining up several batteries in series to add up their voltages (such as in a
flashlight).
Van der Waals forces are significant in a cell's DNA, where the coiled DNA molecule
has its flat nucleotide bases stacked on one another. In this way the Van der Waals
forces help hold DNA together in its helical coil arrangement.

Ionic Molecular Bonding

This occurs when there are electrical attractions between electrically charged atoms
or molecules, that is between ions. Ions are atoms or molecules where the number of
protons does not equal the number of orbiting electrons. This creates an electrical
imbalance, so that the atom is now an ion, having either a net positive charge
(cation), or a net negative charge (anion).

Ionic bonding, also known as a salt bond, occurs when a cation (positively charged
atom or molecule) is electrically attracted to an anion (negatively charged atom or
molecule). Table salt, sodium chloride or Na+Cl-, is a common example of a
compound held together by ionic bonds. Often the anionic atom species has stolen
an electron from the cation atom species, creating the charged ions.

Anion(-) :::::: (+)Cation

A negatively charged electrode (the cathode in an electrolysis cell) attracts cations.
A positively charged electrode (the anode in an electrolysis cell) attracts anions.
Don't confuse a cathode with a cation- they have opposite
electrical charges and so attract each other. Likewise with an anode and anions.

Salts are combinations of cations and anions, such as ordinary table salt, Na+Cl-,
but the term salt can be applied to any combination of cation and anion, including
complex and large molecules, such as Tetracycline Hydrochloride (tetracycline
H+Cl-), where the tetracycline is ionized to form a cation, but is kept stable in
solution by combining with a chloride anion (Cl-).

Important Atoms, Ions, and Small Molecules studied in biology include: (memorize
this list!)

• H Hydrogen atom
• H+ Hydrogen ion (pH is a measure of H+ in a solution)
• C Carbon atom (present in almost all cell molecules)
• O Oxygen atom
• Na Sodium atom
• Na+ Sodium ion (vital for cell membrane excitability)
• P Phosphorus atom (don't confuse this with Potassium!)
• K Potassium atom
• K+ Potassium ion (vital for cell membrane excitability)
• Cl Chlorine atom
• Cl- Chloride ion
• S Sulfur atom (present in many proteins)
• N Nitrogen atom (critical for amino acids and proteins)
• Ca++ Calcium ion (bone, cell excitability, and hormone regulation)
• Mn++ Manganese ion (stabilizes cell enzymes)
• Mg++ Magnesium ion (stabilizes cell enzymes)
• CO2 Carbon dioxide gas
• O2 Oxygen gas
• HCO3- Bicarbonate anion
• Zn++ Zinc ion
• Zn Zinc metal

"Wizardry"
Symbolic representations of atoms and bonds are commonly seen, or used, when
observing or drawing chemical structures. When you understand the secrets used by
the wizards who draw molecules, you too will easily be able to decipher
molecular representations! So here are a few rules to commit to memory:

1. Remember that carbon atoms almost always form 4 covalent bonds, so each
carbon atom in a molecule should have 4 bonds associated with it. Look at
the 3D molecule of methane below- can you see the carbon atom (black) and
the hydrogen atoms (white)? The carbon atom has formed 4 covalent bonds,
one with each hydrogen atom (note that hydrogen atoms form only 1
covalent bond with whatever they bond with).

2. If you see a molecular drawing where a carbon has fewer than 4 covalent
bonds, the remaining "unseen" bonds are always hydrogen atoms bonded to
the carbon atom; they are usually not drawn so that wizards can draw
molecules faster. Look at the wizardry representation of a common organic
molecule, benzene.
3. When you see a straight line extending off a carbohydrate molecule (sugar,
starch) into space, with no atoms at the end of the line, it is a wizard's trick
(those sneaky wizards): wizards know that at the end of that line there is
always an oxygen and then a hydrogen atom, this pair otherwise known as a
hydroxyl group (-O-H, or -OH).

4. When you see molecular bonds drawn with angular bends in them, there is
always a carbon atom at the bend or angle, even though the wizards do not
draw it and so it looks like nothing is there; but now you know better!
Common Biological Molecules

The most common biological molecules include:

• Carbohydrates: Have an atomic ratio of approximately 1C:2H:1O, that is 1 carbon
for every oxygen and twice as many hydrogen atoms as either carbon or
oxygen atoms.
o Sugars- glucose, fructose, sucrose, and so on. Important for energy
and for building the nucleic acids that make up genes.
o Starches- animal starch (glycogen) and plant starch.
Starches are simply multiple sugars bonded together with various
branching patterns among the bonds between the sugars. (Cellulose, the
structural material of plant cell walls, is also built from linked sugars.)
• N-containing Molecules
o Amino acids- the building blocks of proteins. The amino
acid shown below is leucine, one of 3 amino acids known as the
branched chain amino acids, natural anabolic nutrients that help build
muscle mass and other tissues.
o Peptides- small proteins; sometimes the term peptide
is used in place of protein. A short peptide is illustrated below in 3D
(some hydrogen atoms are hidden from view).
o Proteins- enzymes, muscle protein, collagen skin protein, and so on.
o Urea, Ammonia- waste products of amino acid and protein metabolism.

• Lipids: substances that are not readily soluble (mixable) in water.
The molecule benzene is illustrated below- it is a ring of 6
carbon atoms (black) with 6 attached hydrogen atoms (white); benzene is a
common solvent used in organic and biochemistry for synthesizing other
molecules, and in industry for cleaning. It is symbolized below as both a 3D
model and a line drawing. Can you use your knowledge of wizardry (see
above) to spot the carbon and hydrogen atoms in the line drawing? (Line
drawings are common because they can be drawn quickly.)

Benzene
C6H6

o Gasoline, oils, grease
o Fatty acids- found in food oils; calorie source, as well as important in
cell membranes.
▪ Dietary fatty acids in foods. A fatty acid is illustrated below.
▪ Prostaglandins- small lipids, derived from fatty acids, that also act as
chemical messengers. Prostaglandin E (PGE) is illustrated below using line
drawing notation (can you spot the 20 carbon atoms using the chemistry
wizardry rules?)
o Triglycerides- common fat calorie storage molecules, made of 3 fatty acids
linked to a glycerol backbone.
o Steroid hormones (estrogen, testosterone)- complex cyclic lipids
used as chemical messengers that travel in the blood to target cells.
o Cholesterol (used to make steroid hormones)

• Organic Acids- abundant during cell metabolism of sugars and fats. Acetic acid
is illustrated below- it is formed during aerobic metabolism of carbohydrates or
lipids within cells.

Carbohydrates

These biological molecules include the sugars and starches. They always contain a
great deal of O, H and C, with a general formula of [C(H2O)]n, that is a ratio of
1C:2H:1O. Carbohydrates are important biologically as nutrients, structural
components, and as antigens. Incidentally, the little n subscript is like an
algebraic variable- it refers to an unspecified number of multiples of the unit to
which it is attached, in this case a unit containing C, H, and O in the specified
ratio of 1:2:1.
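
To see how the n subscript works, the general formula [C(H2O)]n can be expanded for a chosen n. The sketch below is illustrative only; the function name is invented here, and the 1:2:1 ratio applies to simple sugars (sugars joined together lose water, so their ratio is only approximate).

```python
def carbohydrate_formula(n):
    """Expand [C(H2O)]n into a formula with n carbons, 2n hydrogens, n oxygens."""
    if n < 1:
        raise ValueError("n must be a positive whole number")
    return f"C{n}H{2 * n}O{n}"


if __name__ == "__main__":
    print(carbohydrate_formula(6))  # a six-carbon sugar such as glucose
    print(carbohydrate_formula(5))  # a five-carbon sugar such as ribose
```
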

Sugars combine to form disaccharides (two sugar molecules linked together, such as
glucose + fructose forming sucrose, or cane sugar), polysaccharides (simple chains of
sugars), and then starch (chains of sugars with complex branching patterns). The
most common biological sugar is glucose, a six carbon sugar. Naturally occurring
sugars are mostly what chemists call right-handed, or D sugars, as in D-glucose,
D-galactose, D-fructose. Sugars can also be left-handed, or L sugars. D and L refer
to the spatial arrangement of the atoms in the molecule. A related but separate
property is whether a solution of the molecules bends light in a special instrument
to the right or left, that is, whether the molecules are dextrorotatory or
levorotatory; the two labels need not match (D-fructose, for example, is
levorotatory).

Ribose sugar is illustrated below (3D on left and line drawing on the right)- it is the
sugar used for part of cell genetics, that is for making ribosomes, transfer RNA, and
messenger RNA. By removing only one oxygen atom from ribose, a cell can form
deoxyribose, the sugar used to build deoxyribonucleic acid (DNA).

Starches are long chains of sugar molecules with complex branching patterns of
bonding between the sugar molecules. The two principal storage forms encountered in
cells are glycogen and plant starch. Glycogen is animal starch, stored in animal
cells; plant starch is stored in plant cells. Both can serve as reserve nutrient
sources, because sugar molecules can be cleaved off the starch and used for fuel.
Cellulose is a related long chain of sugars made by plants, but it is structural
rather than a fuel reserve: it gives plant cell walls their structural integrity.
Chitin is another structural sugar chain that also contains nitrogen components;
chitin is very strong structurally, and forms the dense protective shell of crabs,
insects, and other animals as well as certain microbes.

Lipids
These are substances that are not soluble in water. Lipids include dietary fats
(cholesterol, fatty acids in margarine and other foods) as well as oils, grease,
gasoline, steroid hormones, prostaglandin hormones, and many other biological
molecules. Structurally lipids are comprised of lots of carbon and hydrogen atoms.
Attached to the lipid at various points may be other atoms such as oxygen, or a side
group such as a hydroxyl group (OH), but the great majority of lipid composition is
that of lots of C and H.
Amino acids and Proteins
Proteins are very important molecules, functioning both as structural components of
cells and as enzymatic molecules that catalyze (speed up) chemical reactions in cells.
Proteins are made of building blocks called amino acids, there being 20 common
amino acids found in the proteins of living things (22 if two rare ones are counted).
Amino acids (and hence proteins) have what chemists call a left handed (L)
configuration, so that naturally occurring amino acids are named L-arginine,
L-leucine, and so on. NutraSweet artificial sweetener (aspartame) is actually a
synthetic substance consisting of only two amino acids bonded together. So why does
it add almost no calories? Because it is roughly 200 times sweeter than table sugar,
so only a tiny amount is needed to sweeten food. All
amino acids (abbreviated as AA) have a generic structure with one end of the AA
having an amino group (-NH2) and the other end of the AA having an organic acid
group (-COOH), sometimes called the carboxyl group.
Hence the name amino acid. Amino acids combine to form small chains of amino
acids called peptides, or even longer AA chains called polypeptides or proteins.
Sometimes the terms peptide, polypeptide, and protein are used interchangeably,
because of the disagreement among scientists as to what constitutes a peptide
versus a polypeptide versus a protein.

The bond that forms between amino acids to form peptides, polypeptides, and
proteins is called the peptide bond, and is formed between amino and carboxyl
groups. During peptide bond formation, water is removed, so the reaction is that of a
dehydration synthesis reaction. The reverse of bond formation is bond breaking, by
addition of water, in what is called a hydrolysis degradation reaction.
Muscle, skin, and connective tissue proteins, as well as intracellular proteins, are all
formed by joining amino acids together with dehydration reactions. During
starvation, hydrolysis of proteins yields free amino acids that are used in
metabolism to help provide energy.
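
The bookkeeping behind dehydration synthesis can be sketched in Python: each peptide bond formed releases one water molecule, so a straight chain of n amino acids releases n - 1 waters. The function names and the approximate molar masses used below (glycine 75.07, alanine 89.09, water 18.02, all in grams per mole) are supplied here for illustration and are not from the original text.

```python
WATER_MASS = 18.02  # approximate molar mass of water, g/mol (assumed value)


def waters_released(num_amino_acids):
    """A straight chain of n amino acids has n - 1 peptide bonds, one water each."""
    return max(num_amino_acids - 1, 0)


def peptide_mass(amino_acid_masses):
    """Mass of a straight-chain peptide: sum of its amino acids minus the water lost."""
    lost = waters_released(len(amino_acid_masses)) * WATER_MASS
    return sum(amino_acid_masses) - lost


if __name__ == "__main__":
    glycine, alanine = 75.07, 89.09  # approximate molar masses, g/mol
    print(waters_released(2))
    print(round(peptide_mass([glycine, alanine]), 2))
```

Hydrolysis is the same arithmetic run backwards: adding one water per bond broken restores the masses of the free amino acids.
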
Cellular Genetics

DNA, RNA, Transcription, Translation, mRNA, tRNA, codon, anticodon

Genetic Encoding

Ciphering of cell information, that is genetic encoding, occurs in the form of
ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) in viruses, but always as DNA
in cells (at least on our planet, but who knows what occurs in other galaxies?).

Structurally, DNA is not that complicated a molecule. It has a simple backbone
consisting of alternating units of the sugar deoxyribose and phosphate (PO4).
Attached to each sugar is a genetic code "letter," a nucleotide base. Two such strands
of DNA are usually bonded together to form what is called a "double helix" of DNA.
The double stranded DNA molecule is twisted to form a helix, appearing as if a ladder
were twisted along its axis.

Biochemically, the nucleotide bases of DNA are known as purines and pyrimidines.
It is the nucleotide bases, or rather their sequence, that constitutes the actual
genetic code for all of a cell's proteins. There are four nucleotides in DNA: adenine,
thymine, guanine, and cytosine. They are abbreviated as A, T, G, and C. As you will
learn soon, a codon, that is a sequence of three bases, codes for a single amino acid.
{ How many bases along a length of DNA would be needed to code for an enzyme
protein composed of 1000 amino acids? Answer: 3000. }
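
The three-bases-per-amino-acid rule behind the answer above can be written out as a short Python sketch. The function names and the sample sequence are made up for illustration; start and stop signals are ignored in the count, as they are in the question.

```python
def bases_needed(num_amino_acids):
    """Each amino acid is coded for by one codon of three bases."""
    return 3 * num_amino_acids


def split_into_codons(sequence):
    """Cut a base sequence into consecutive triplets, dropping any leftover bases."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    print(bases_needed(1000))
    print(split_into_codons("TACGGCATT"))  # hypothetical 9-base stretch
```
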

RNA

RNA is very similar to DNA. In RNA the base uracil, U, is substituted for thymine, T.
So there is no thymine in RNA. Also, RNA uses the sugar ribose, not deoxyribose.

Genes

Along a molecule of DNA are various sequences of nucleotides coding for various
proteins. Each sequence of nucleic acid coding for a protein is called a gene.
Typically there will be hundreds or thousands of genes along a length of DNA,
interspersed with special nucleotide sequences that are start (e.g. TAC) and stop
(e.g. ACT or ATT) signals for gene reading by the cell (written here as they appear
on the DNA strand that is read).

DNA Arrangements

Viruses sometimes have ssDNA for their genome (or dsDNA, or ssRNA, or dsRNA);
however, all cells have double stranded DNA (two strands of DNA twisted on each
other in a helical pattern). Hence the term double helix for the three dimensional
structural description of a double stranded DNA (dsDNA) molecule. (The term
alpha-helix refers to a coiled structure found in proteins, not to DNA.)

Two opposite strands (lengths) of DNA are able to associate because certain
bases have a binding affinity for each other. This is known as
complementary base pairing. A readily pairs with T, and G with C; these are
known as the base pairing rules of DNA. Because of these rules, the base sequence
of one strand of single stranded DNA (ssDNA) determines the sequence of its
complementary strand. Either strand may carry genes, and a given gene is read from
only one of the two strands.
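
The base pairing rules can be sketched in a few lines of Python. This is only an illustration; the names PAIRS and complement are made up for this example and do not come from the notes.

```python
# Base pairing rules from the text: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the partner strand, base by base, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())

print(complement("ATGGAATTC"))  # TACCTTAAG
```

Applying complement twice returns the original strand, which is another way of saying that each strand fully determines its partner.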

When dsDNA is genetically decoded, the helix is unzipped, a gene is "read" off the
appropriate ssDNA by RNA polymerase enzyme (that creates mRNA), and then the
helix is zippered shut.

dsDNA is usually circular in bacteria, but linear in eukaryotic cells. Circular dsDNA
is like taking two lengths of string, twisting them on each other, and then closing off
the ends. Linear dsDNA is like taking two lengths of string and then twisting them on
each other.

A chromosome is a dsDNA molecule coiled around special histone proteins.


Chromosomes are visible in a stained cell when using a light microscope, and are
normally visible when a cell is in the process of division. When a cell is not dividing,
there is less coiling of the dsDNA around the histone proteins, and then the complex
is called chromatin. Chromatin is barely visible in a cell, and is the normal state of
the genetic material in a non-dividing cell.

Gene Decoding

Decoding of a gene to create a gene product involves transcription and translation.


Transcription is the process where RNA polymerase enzyme unzips a region of
dsDNA and reads a gene sequence, creating a copy of the gene sequence in the form
of RNA. This RNA is called messenger RNA, or mRNA. The mRNA then is carried to a
ribosome where it is "read" (decoded).

Translation is the process of building an amino acid chain, that is, a polypeptide
(protein), by way of ribosomal decoding of the mRNA. This involves the ribosome
reading the mRNA and the bonding together of appropriate amino acids coded for by
the mRNA. Just as triplets of nucleotide bases along DNA, called codons, encode for
1 amino acid of a gene product, triplets of mRNA are also codons. Ribosomes read
the mRNA codons one at a time, to determine what amino acid should become part
of the gene product. As a codon is read, the complementary anti-codon nucleotide
sequence of a special amino acid carrier molecule called transfer RNA, or tRNA, base
pairs with the codon, bringing the specific amino acid with it that is coded for by the
mRNA. As each mRNA codon calls its specific amino acid into place through the use
of specific anti-codon complementary tRNA amino acid carriers, the genetically
encoded amino acid sequence for the gene product is brought into place. The ribosome
enzymatically bonds the amino acids together, and a polypeptide, or protein, is built.
The translation process is complete, and the gene has been decoded.

Amino Acids Table

By consulting a table of codons coding for each amino acid, you can decipher a
genetic sequence of DNA nucleotides or mRNA nucleotides to determine the resulting
gene product, that is a protein (structural or enzymatic). For example, the nucleotide
base sequence on mRNA (transcribed from the DNA sequence AGC) of UCG codes for
the amino acid serine.

                 Second nucleotide base of mRNA codon

First base
of codon    U                  C          A           G

U           UUU=phe            UC*=ser    UAU=tyr     UGU=cys
            UUC=phe                       UAC=tyr     UGC=cys
            UUA=leu                       UAA=stop    UGA=stop
            UUG=leu                       UAG=stop    UGG=trp

C           CU*=leu            CC*=pro    CAU=his     CG*=arg
                                          CAC=his
                                          CAA=gln
                                          CAG=gln

A           AUU=ile            AC*=thr    AAU=asn     AGU=ser
            AUC=ile                       AAC=asn     AGC=ser
            AUA=ile                       AAA=lys     AGA=arg
            AUG=met (start)               AAG=lys     AGG=arg

G           GU*=val            GC*=ala    GAU=asp     GG*=gly
                                          GAC=asp
                                          GAA=glu
                                          GAG=glu

(* = any of the four bases U, C, A, or G)

Amino Acid Symbol Amino Acid


Ala Alanine
Asp Aspartic acid
Asn Asparagine
Cys Cysteine
Glu Glutamic acid
Phe Phenylalanine
Gly Glycine
His Histidine
Ile Isoleucine
Lys Lysine
Leu Leucine
Met Methionine
Pro Proline
Gln Glutamine
Arg Arginine
Ser Serine
Thr Threonine
Val Valine
Trp Tryptophan
Tyr Tyrosine
Glu, Gln Glutamic acid, Glutamine
* End Terminator

Cellular Replication

Mitosis, meiosis, budding, binary fission, conjugation


A detailed monograph on cell replication is available for those seeking more in-
depth information.

Bacterial Cell Replication

Binary fission is the normal method of replication among bacteria; in this method of
cell replication, the bacterial cells simply increase their cell mass slightly, replicate
their cellular genome (DNA) and several other cell components, and then each cell
divides equally into two cells.

[ rod-shaped bacterial cell undergoing binary fission ]

Binary fission as a method of cell replication is very efficient, with division possible
every 5 or 10 minutes! Consider the number of cells formed from 1 cell that divides
every 10 minutes: in just a matter of hours millions of cells may form from just a
single cell!
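
The arithmetic behind this claim can be checked with a short sketch. It assumes ideal conditions (every cell divides on schedule and none die), and the function name cells_after is made up for this example.

```python
def cells_after(minutes: int, doubling_time: int = 10, start: int = 1) -> int:
    """Cells present after `minutes`, if every cell divides once per doubling_time."""
    generations = minutes // doubling_time  # completed rounds of binary fission
    return start * 2 ** generations

for hours in (1, 2, 3, 4):
    print(hours, "hours:", cells_after(hours * 60), "cells")
# 1 hours: 64 cells
# 2 hours: 4096 cells
# 3 hours: 262144 cells
# 4 hours: 16777216 cells
```

With a 10 minute division time, one cell passes a million descendants after 20 generations, that is, after 200 minutes.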

Conjugation is another means of bacterial "replication" although the cells do not
really replicate as with binary fission. But conjugation is important for propagation of
bacteria. In conjugation, two bacterial cells meet, form a bridge, and exchange
pieces of their DNA. This allows for sharing of genes among bacteria, even among
different genera.

Eukaryotic Cell Replication

Budding is a simple method of cell replication used principally by yeasts (single
celled fungi). Following DNA replication (genome replication), unequal splitting of a
cell occurs to form two cells. Part of the cell literally pinches off, taking with it
genetic material as well as some cytoplasmic material.

Mitosis is the common form of cell replication for tissue growth and regeneration
among all multi-cellular organisms. The image panel below shows various phases of
mitosis occurring among plant cells of an onion root tip. Each phase of cell division
will be discussed individually.

During cell division, replication of cell genetic and cytoplasmic material occurs,
followed by a highly organized splitting of the cell's contents. The two cells formed
following mitosis, called daughter cells (lower right image in the six-panel image
seen above), are genetically identical, and each has about 1/2 the cell mass of the
original cell; shortly, however, each daughter cell will increase its size to that of a
typical cell of the type from whence each daughter cell originated.

The process of mitosis is divided for human convenience into discrete stages or
phases (also divisible into early, middle, and late phases) known as interphase,
prophase, metaphase, anaphase, telophase, and finally daughter cells.

These six phases of mitosis can be seen in the photo below, if you read the photos as
you would two lines in a book (left to right, then down to the second row and again
left to right).
Animal Cells Plant Cells

Interphase. During interphase cells are busy doing their normal cell activities.
Cell metabolism is occurring. The cell is doing whatever its normal function is
(this depends on the cell's genetic programming). Interphase is actually not
part of the normally listed phases of cell replication.

Prophase. During interphase, the DNA is replicated in preparation for prophase.
A new set of genes (DNA) will be needed for the new cell that will be formed.
As prophase occurs, the DNA coils tightly and becomes visible as chromosomes.
The chromosomes are randomly arranged in the cell. The nuclear membrane
disappears.

Metaphase. During metaphase a cell aligns its chromosomes in the middle
region of the cell. Centrioles at each pole of the cell send out spindle fibers
that grasp each chromosome. The cell is preparing to separate the chromosomes.

Anaphase. During anaphase the cell chromosomes are separated. Spindle fibers
shorten so that the two copies of each replicated chromosome are pulled apart:
one copy is pulled to one end of the cell, and the other copy is pulled to the
other end of the cell.

Telophase. During telophase, separation of chromosomes is complete. The cell
begins to break apart into two cells. The chromosomes begin to uncoil. Nuclear
membranes begin to reform around the chromosomes.

Daughter Cells. When mitosis is complete, the cell divides into two new cells, each resembling
the original interphase cell, but smaller. Two cells now exist as a result of mitosis. Each cell
contains one complete copy of the replicated DNA; within each copy, one strand is original and
one strand is newly synthesized. Each cell has about one half the biomass of the original cell.
Soon each cell will acquire nutrients and grow to the size that is normal for the cell type.

Allium. Seen below are phases of mitosis as seen in tissue sections of onion (Allium)
root tip. Root tips are excellent tissue sections to study to learn mitosis, since root
tips are rapidly growing and thus have many cells in stages of replication. Test your
knowledge: can you spot the cells undergoing mitosis? Can you name the
phase for such cells?

The cell in the very center is in the phase of mitosis known as
anaphase. Notice the chromosomes splitting: half moving to the right,
half moving to the left. The spindle fibers are faintly visible. The cells
to either side of the anaphase cell are in interphase.

This is a very low magnification photograph of onion root tip cells. Can
you spot the cell undergoing metaphase in the center of this tissue
section of about 50 cells? Also, the cell along the bottom, 4th from the
left, is in metaphase.

About 8 cells are seen here. In the lower left is a cell in anaphase. In
the middle and somewhat towards the top is a cell in metaphase
(aligned chromosomes). The other cells are in interphase and
prophase.

The cell in the very center is in the phase of mitosis known as
prophase. The chromosomes are coiled and are randomly arranged in
the cell center. Just above the prophase cell is a cell that is just ending
telophase, with daughter cells forming.

The cell in the upper left is undergoing anaphase (first row, first cell
on left). Move just one cell to the right and down one cell and you will
see a cell in late telophase, with a cell plate having formed down the
middle and with two nuclei of the soon-to-be daughter cells reforming.

Meiosis is a mode of cell replication that, in animals, occurs only in the gonads
(testis and ovary), in order to produce germ cells (sperm and egg cells, not
'germs' such as bacteria). Meiosis is a reduction division, where a cell's content of
genetic material is reduced to form daughter cells having 1/2 the amount of DNA
(and genes) found in regular body cells. Following meiosis, sperm and egg cells
potentially combine during fertilization to form a fertilized egg called a zygote. The
zygote now has the full complement of genetic material (1/2 + 1/2 = 1). When viewed
under the microscope, the stages of meiosis can appear very similar to those of
mitosis, so phases of meiosis will not be shown here.

Tumors

Uncontrolled replication of cells leads to cell overgrowths, that is, tumors. Tumors
can be classified as benign or malignant. Benign tumors are excessive cell
growths that stay in place and usually do not cause significant harm. Malignant
tumors, that is, cancers, are cell growths where the cells are replicating without any
inhibition of cell growth, and they will cause death to the organism if allowed to
continue growing.

Naming Conventions for Tumors

Here are the naming conventions used for the more common tumors:

• Carcinomas are cancers of epithelial tissues (cells lining the surfaces of an
organism).
• Sarcomas are cancers of connective tissues.
• Leukemias are cancers of white blood cells.
• Lymphomas are tumors of the lymph nodes.
• Osteomas are benign tumors of bone.
• Osteosarcomas are sarcomas of bone tissue.
• Neuromas are benign tumors of nerve tissue.
• Leiomyomas are benign tumors of smooth muscle tissue.
• Rhabdomyomas are benign tumors of voluntary (skeletal) muscle.
• Chondromas are benign tumors of cartilage.
• Chondrosarcomas are malignant tumors of cartilage.
• Adenomas are benign tumors of glandular tissue.
• Adenocarcinomas are malignant tumors of glandular tissue.

Look at the photo below: it is from a biopsy of a cancer. Several (3) cells show visible
stages of mitosis (dark coiled chromosomes), suggesting that the tissue is cancerous
(tissues have a certain percentage of their cells undergoing mitosis, called the mitotic
index; when the mitotic index is high, as with the tissue below, a cancer or tumor of
some sort is suspected).
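
The mitotic index described above is a simple ratio, sketched here in Python. The function name is made up for this example, and the total of 50 counted cells is a hypothetical figure; only the 3 mitotic cells come from the notes.

```python
def mitotic_index(mitotic_cells: int, total_cells: int) -> float:
    """Fraction of counted cells that show visible stages of mitosis."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return mitotic_cells / total_cells

# 3 mitotic cells (from the text) out of a hypothetical 50 cells counted.
index = mitotic_index(3, 50)
print(f"{index:.0%} of counted cells are in mitosis")  # 6% of counted cells are in mitosis
```

What counts as a "high" index depends on the tissue, so no cutoff value is assumed here.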

Carcinogens

Agents that can trigger cells to become tumorous include: environmental carcinogens in
food, water, or air; cancer-causing genes called oncogenes that are transmitted by certain
viruses; and inherent oncogenes, triggered by repeated trauma to a cell.

Cellular Arrangements and Tissues

There are four tissue types: nervous tissue, muscle tissue, connective tissue, and
epithelial tissue. All multi-celled animal life forms are composed of various
combinations of these four tissues.

BASIC TISSUE TYPES Example Photo


Nervous Tissue is specialized for creating and conducting
electrical signals, and includes neurons (nerve cells) as part of
its tissue. Neurons are the cells adapted for receiving and
sending electrical signals. Signals are sent to other neurons,
glands, and muscle cells. The photo on the right shows a classic
nerve cell ("neuron") appearance: pointed edges giving it a
quality somewhat like a "ninja star" or thorn.

Muscle Tissue is specialized for cellular contraction, and hence
movement of the organism or parts of the organism. The photo
on the right shows several muscle cells of the heart. Muscle cells
tend to be elongated and red in appearance. Heart muscle has
the characteristic cellular branchings such as are seen in this
tissue section.

Connective Tissue is specialized to connect parts of an
organism. Types of connective tissue include loose (like fascia,
the filmy material you see when you pull the skin off a
chicken), tendons, ligaments, and so on. The
photo on the right shows a section of bone tissue, just one of
the many types of connective tissues (tendons, ligaments, bone,
cartilage, fat, and blood).

Epithelial Tissue lines body surfaces, both internal and
external, and is adapted for protection, secretion, and
absorption. Epithelium is named, that is, classified, according to
its outer cell layer's shape, whether the tissue is one cell thick
("simple") or is layered ("stratified"), whether the outer cells
have cilia, and whether some of the cells are goblet shaped
mucus-secreting cells. The photo on the right is a 3D scanning
electron microscope photo showing several relatively flat
epithelial cells covering a tissue surface.

Remember-- you are only to learn to differentiate the four basic tissue types! You
are NOT expected to learn each of the specific tissue subtypes of the four basic
tissues. So don't panic when you view all the different tissue subtypes.
NOTE: For more experience studying tissues, visit the histology lab center where
you can learn more about cell arrangements and tissue types. Many digital images
are available there for your viewing.

Tissue Development

Development of tissues occurs from primitive embryonic cell layers called germ
layers. There are 3 germ layers that form in the embryonic cell mass:

GERM LAYER                         DEVELOPS INTO...

ectoderm (outer shell of cells)    skin, brain, eye, nerves
mesoderm (middle cell layer)       muscle, bone, vessels, connective tissues
endoderm (inner cell layer)        gut, liver, pancreas

Fertilization and Zygote Formation

When a sperm cell fertilizes an egg cell, a fertilized egg or zygote is formed.

The zygote then divides into 2 cells, then 4 cells, then 8 cells, then 16 cells, then
32 cells, then 64 cells, then 128 cells. Note that growth is at a geometric rate. The
number of cells in a developing embryo increases at a fantastic rate as a single
fertilized cell matures into an embryo and then a fetus (nymphs or larvae in the case
of insects, worms, and so on).

Morula Formation

As cell mass increases from a fertilized egg dividing and with geometric cell mass
increase, the embryo begins looking like a bunch of mulberries (well, sort of if you
use some imagination), so that is what it is called. Except "mulberry" is translated
into Latin, the universal scientific language, to form the word morula.

Blastula Formation

Soon the mulberry (morula) hollows out, forming a hollow cavity, sort of like a
blown-up balloon, and it is then called a blastula.

Gastrula Formation

One end of the blastula invaginates, sort of like pushing your finger into the blown-
up balloon. Now the embryo is said to be a gastrula; gastrulation has occurred. Note
that there are now two cavities- the cavity in the balloon filled with air (call this
cavity #1) and the cavity formed by gastrulation (call this cavity #2). Cavity #1 will
become the thoracic and abdominal cavities, and cavity #2 will become the
gastrointestinal tract (Did you notice the "gast-" prefix in both gastrulation and
gastrointestinal tract?).

Ectoderm

The outer layer of cells, that is the outer skin of the balloon, is what is called the
ectoderm germ layer of cells, and as the embryo continues to grow and differentiate
into a fetus the ectoderm cells will form tissues and organs such as skin, nervous
tissue, brain, and the eye.

Mesoderm

The middle layer of cells, that is the inner skin of the balloon, is what is called the
mesoderm germ layer of cells, and as the embryo continues to grow and differentiate
into a fetus the mesoderm cells will form tissues such as muscle, blood vessels,
cartilage, bone, ligaments, and other connective tissues.

Endoderm

The layer of cells lining the gastrulation cavity (cavity #2), that is the skin of the balloon
surrounding your finger that you poked into the balloon, is what is called the endoderm germ layer
of cells, and as the embryo continues to grow and differentiate into a fetus the endoderm cells will
form epithelium lining the entire gut.

Tissue Components

Tissues are made of matrix and cells. Matrix is the non-cellular material between
tissue cells, secreted by cells; matrix consists of both organic components (such as
collagen and elastic proteins to give tissues strength and elasticity) and inorganic
components (such as water and minerals).

Useful Suffixes. Cells of tissues are named according to their tissue type, but many
cells share common suffixes that reveal clues about their function. "-cytes" are
mature cells that perform common tissue functions. "-blasts" are immature tissue
cells that give rise to other mature tissue cells. "-clasts" are tissue destroying cells.

Homo sapiens Map View

Chromosome: 1
Master map: Genes On Sequence
Total Genes On Chromosome: 955
Region Displayed: 0-220467 Kbp
Genes Labeled: 20    Total Genes in Region: 955

symbol         orient.  cyto.           full name
DKFZP564C186   +        1pter-1p12      DKFZP564C186 protein
SRM            -        1p36-p22        spermidine synthase
PLA2G2A        -        1p35            phospholipase A2, group IIA (platelets, synovial fluid)
PRO2047        +        1               PRO2047 protein
FLJ10468       +        1               hypothetical protein FLJ10468
FAAH           -        1p35-p34        fatty acid amide hydrolase
C8B            -        1p32            complement component 8, beta polypeptide
GADD45A        -        1p31.2-p31.1    growth arrest and DNA-damage-inducible, alpha
PRKCL2         +        1pter-1q31.1    protein kinase C-like 2
LOC51189       +        1pter-1q31.1    ATPase inhibitor precursor
FLJ10330       +        1               hypothetical protein FLJ10330
HPRP3P         +        1q21.1          U4/U6-associated RNA splicing factor
ARNT           +        1q21            aryl hydrocarbon receptor nuclear translocator
JTB            -        1q21            jumping translocation breakpoint
PEA15          +        1q21.1          phosphoprotein enriched in astrocytes 15
F5             -        1q23            coagulation factor V (proaccelerin, labile factor)
FLJ10083       -        1               hypothetical protein FLJ10083
ST16           +        1               suppression of tumorigenicity 16 (melanoma differentiation)
ESRRG          -        1q41            estrogen-related receptor gamma
CHS1           -        1q42.1-q42.2    Chediak-Higashi syndrome 1


[Genome map: Archaeoglobus fulgidus, complete genome, region 49546..99545, 62 protein
coding genes. Legend: coding region on direct strand; coding region on complementary
strand; overlapping region.]

Genetic States
Disease Histogram of Chromosome
Genetic Manipulation
Codons Found In DNA

                          Second Position of Codon
First                                                                 Third
Position   T             C             A               G               Position

T          TTT Phe [F]   TCT Ser [S]   TAT Tyr [Y]     TGT Cys [C]     T
           TTC Phe [F]   TCC Ser [S]   TAC Tyr [Y]     TGC Cys [C]     C
           TTA Leu [L]   TCA Ser [S]   TAA Ter [end]   TGA Ter [end]   A
           TTG Leu [L]   TCG Ser [S]   TAG Ter [end]   TGG Trp [W]     G

C          CTT Leu [L]   CCT Pro [P]   CAT His [H]     CGT Arg [R]     T
           CTC Leu [L]   CCC Pro [P]   CAC His [H]     CGC Arg [R]     C
           CTA Leu [L]   CCA Pro [P]   CAA Gln [Q]     CGA Arg [R]     A
           CTG Leu [L]   CCG Pro [P]   CAG Gln [Q]     CGG Arg [R]     G

A          ATT Ile [I]   ACT Thr [T]   AAT Asn [N]     AGT Ser [S]     T
           ATC Ile [I]   ACC Thr [T]   AAC Asn [N]     AGC Ser [S]     C
           ATA Ile [I]   ACA Thr [T]   AAA Lys [K]     AGA Arg [R]     A
           ATG Met [M]   ACG Thr [T]   AAG Lys [K]     AGG Arg [R]     G

G          GTT Val [V]   GCT Ala [A]   GAT Asp [D]     GGT Gly [G]     T
           GTC Val [V]   GCC Ala [A]   GAC Asp [D]     GGC Gly [G]     C
           GTA Val [V]   GCA Ala [A]   GAA Glu [E]     GGA Gly [G]     A
           GTG Val [V]   GCG Ala [A]   GAG Glu [E]     GGG Gly [G]     G

Codons Found In Messenger RNA

                          Second Position
First                                                           Third
Position   U                 C         A          G          Position

U          UUU Phe           UCU Ser   UAU Tyr    UGU Cys    U
           UUC Phe           UCC Ser   UAC Tyr    UGC Cys    C
           UUA Leu           UCA Ser   UAA Stop   UGA Stop   A
           UUG Leu           UCG Ser   UAG Stop   UGG Trp    G

C          CUU Leu           CCU Pro   CAU His    CGU Arg    U
           CUC Leu           CCC Pro   CAC His    CGC Arg    C
           CUA Leu           CCA Pro   CAA Gln    CGA Arg    A
           CUG Leu           CCG Pro   CAG Gln    CGG Arg    G

A          AUU Ile           ACU Thr   AAU Asn    AGU Ser    U
           AUC Ile           ACC Thr   AAC Asn    AGC Ser    C
           AUA Ile           ACA Thr   AAA Lys    AGA Arg    A
           AUG Met (start)   ACG Thr   AAG Lys    AGG Arg    G

G          GUU Val           GCU Ala   GAU Asp    GGU Gly    U
           GUC Val           GCC Ala   GAC Asp    GGC Gly    C
           GUA Val           GCA Ala   GAA Glu    GGA Gly    A
           GUG Val           GCG Ala   GAG Glu    GGG Gly    G

An explanation of the Genetic Code: DNA is a two-stranded molecule. Each strand is a
polynucleotide composed of A (adenosine), T (thymidine), C (cytidine), and G
(guanosine) residues polymerized by "dehydration" synthesis in linear chains with
specific sequences. Each strand has polarity, such that the 5'-hydroxyl (or 5'-phospho)
group of the first nucleotide begins the strand and the 3'-hydroxyl group of the final
nucleotide ends the strand; accordingly, we say that this strand runs 5' to 3' ("Five prime
to three prime") . It is also essential to know that the two strands of DNA run antiparallel
such that one strand runs 5' -> 3' while the other one runs 3' -> 5'. At each nucleotide
residue along the double-stranded DNA molecule, the nucleotides are complementary.
That is, A forms two hydrogen-bonds with T; C forms three hydrogen bonds with G. In
most cases the two-stranded, antiparallel, complementary DNA molecule folds to form a
helical structure which resembles a spiral staircase. This is the reason why DNA has been
referred to as the "Double Helix".

One strand of DNA holds the information that codes for various genes; this strand is
often called the template strand or antisense strand (containing anticodons). The other,
and complementary, strand is called the coding strand or sense strand (containing
codons). Since mRNA is made from the template strand, it has the same information as
the coding strand. The table above refers to triplet nucleotide codons along the sequence
of the coding or sense strand of DNA as it runs 5' -> 3'; the code for the mRNA would be
identical but for the fact that RNA contains U (uridine) rather than T.

An example of two complementary strands of DNA would be:

(5' -> 3') ATGGAATTCTCGCTC (Coding, sense strand)
(3' -> 5') TACCTTAAGAGCGAG (Template, antisense strand)

(5' -> 3') AUGGAAUUCUCGCUC (mRNA made from Template strand)

Since amino acid residues of proteins are specified as triplet codons, the protein sequence
made from the above example would be Met-Glu-Phe-Ser-Leu... (MEFSL...).
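
The example can be reproduced with a minimal Python sketch. It builds the standard codon table shown earlier from a compact string, then transcribes the template strand and translates the resulting mRNA. The names transcribe, translate and CODON_TABLE are chosen for this illustration only, and the template is assumed to be written 3' -> 5' as in the example.

```python
from itertools import product

# Standard genetic code; codons are generated in U, C, A, G order and matched
# to one-letter amino acid symbols. "*" marks a stop codon.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def transcribe(template: str) -> str:
    """Build the 5'->3' mRNA from a DNA template strand written 3'->5'."""
    pair = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(pair[base] for base in template)

def translate(mrna: str) -> str:
    """Read the mRNA one codon at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = transcribe("TACCTTAAGAGCGAG")
print(mrna)             # AUGGAAUUCUCGCUC
print(translate(mrna))  # MEFSL
```

Reading always starts at the first base here; a real gene finder would first have to locate the start codon and the correct reading frame.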

Practically, codons are "decoded" by transfer RNAs (tRNA) which interact with a
ribosome-bound messenger RNA (mRNA) containing the coding sequence. There are 64
possible codons. 61 of these specify an amino acid and are recognized by tRNAs, each of
which has an anticodon loop (used to recognize codons in the mRNA) and carries a bound
amino acyl residue; the appropriate "charged" tRNA binds to the respective next codon in
the mRNA and the ribosome catalyzes the transfer of the amino acid from the tRNA to the
growing (nascent) protein/polypeptide chain. The remaining 3 codons are used for
"punctuation"; that is, they signal the termination (the end) of the growing polypeptide
chain.

Lastly, the Genetic Code in the table above has also been called "The Universal Genetic
Code". It is known as "universal", because it is used by nearly all known organisms as a
code for DNA, mRNA, and tRNA. The universality of the genetic code encompasses animals
(including humans), plants, fungi, archaea, bacteria, and viruses. However, all rules have
their exceptions, and such is the case with the Genetic Code; small variations in the code
exist in mitochondria and certain microbes. Nonetheless, it should be emphasized that
these variances represent only a small fraction of known cases, and that the Genetic Code
applies quite broadly, including to nearly all known nuclear genes.

Codon Tables

First & Second            Third Position
Position             A      C      G      U
                  _____________________________
AA                |  Lys    Asn    Lys    Asn
AC                |  Thr    Thr    Thr    Thr
AG                |  Arg    Ser    Arg    Ser
AU                |  Ile    Ile    MET    Ile
CA                |  Gln    His    Gln    His
CC                |  Pro    Pro    Pro    Pro
CG                |  Arg    Arg    Arg    Arg
CU                |  Leu    Leu    Leu    Leu
GA                |  Glu    Asp    Glu    Asp
GC                |  Ala    Ala    Ala    Ala
GG                |  Gly    Gly    Gly    Gly
GU                |  Val    Val    Val    Val
UA                |  .      Tyr    .      Tyr
UC                |  Ser    Ser    Ser    Ser
UG                |  .      Cys    Trp    Cys
UU                |  Leu    Phe    Leu    Phe

(. = stop codon)

Another way to look at this is:

NAME            3 Letter       1 Letter       mRNA codons for each Amino Acid
                Abbreviation   Abbreviation

Alanine         Ala             1. A          GCA,GCC,GCG,GCU
Cysteine        Cys             3. C          UGC,UGU
Aspartic Acid   Asp             4. D          GAC,GAU
Glutamic Acid   Glu             5. E          GAA,GAG
Phenylalanine   Phe             6. F          UUC,UUU
Glycine         Gly             7. G          GGA,GGC,GGG,GGU
Histidine       His             8. H          CAC,CAU
Isoleucine      Ile             9. I          AUA,AUC,AUU
Lysine          Lys            11. K          AAA,AAG
Leucine         Leu            12. L          UUA,UUG,CUA,CUC,CUG,CUU
Methionine      Met            13. M          AUG
Asparagine      Asn            14. N          AAC,AAU
Proline         Pro            16. P          CCA,CCC,CCG,CCU
Glutamine       Gln            17. Q          CAA,CAG
Arginine        Arg            18. R          CGA,CGC,CGG,CGU
Serine          Ser            19. S          UCA,UCC,UCG,UCU,AGC,AGU
Threonine       Thr            20. T          ACA,ACC,ACG,ACU
Valine          Val            22. V          GUA,GUC,GUG,GUU
Tryptophan      Trp            23. W          UGG
Tyrosine        Tyr            25. Y          UAC,UAU

Stop Codons     .                             UAA,UAG,UGA - B(2) J(10) O(15) U(21) Z(26)

An example of the multiple codon combinations possible for a single
peptide is spelling my first name (without a termination codon).

To code for 'MARK' there are 1 x 4 x 6 x 2 = 48 combinations, because Met
has 1 codon, Ala has 4, Arg has 6, and Lys has 2. The 16 combinations listed
below use only the AGA and AGG codons for Arg; the four CG* codons for Arg
give 32 more. Other sequences of 4 letters would vary in the number of
possibilities based on the number of codons that could code for a single
amino acid. Some amino acids have up to 6 codons that will be translated
into a single Amino Acid.

M A R K M A R K M A R K M A R K
MET Ala Arg Lys MET Ala Arg Lys MET Ala Arg Lys MET Ala Arg Lys
=============== =============== =============== ===============
AUG-GCU-AGA-AAG AUG-GCU-AGG-AAG AUG-GCU-AGA-AAA AUG-GCU-AGG-AAA
AUG-GCG-AGA-AAG AUG-GCG-AGG-AAG AUG-GCG-AGA-AAA AUG-GCG-AGG-AAA
AUG-GCC-AGA-AAG AUG-GCC-AGG-AAG AUG-GCC-AGA-AAA AUG-GCC-AGG-AAA
AUG-GCA-AGA-AAG AUG-GCA-AGG-AAG AUG-GCA-AGA-AAA AUG-GCA-AGG-AAA
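
The listing can be generated and counted with a short Python sketch. The codon lists are copied from the amino acid table above; the names CODONS and encodings are made up for this illustration. Counting every synonymous codon, including the four CG* codons for Arg, gives 48 sequences, and restricting Arg to AGA and AGG gives the 16 shown.

```python
from itertools import product

# mRNA codons for the four amino acids in M-A-R-K, taken from the table above.
CODONS = {
    "M": ["AUG"],
    "A": ["GCA", "GCC", "GCG", "GCU"],
    "R": ["AGA", "AGG", "CGA", "CGC", "CGG", "CGU"],
    "K": ["AAA", "AAG"],
}

def encodings(peptide: str) -> list[str]:
    """Every mRNA sequence that translates to the given peptide."""
    choices = (CODONS[amino_acid] for amino_acid in peptide)
    return ["-".join(combo) for combo in product(*choices)]

all_mark = encodings("MARK")
print(len(all_mark))  # 48

# Keep only the sequences whose Arg codon is AGA or AGG, as in the listing.
ag_only = [seq for seq in all_mark if seq.split("-")[2].startswith("AG")]
print(len(ag_only))   # 16
```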

Clusters of Orthologous Groups

Clusters of Orthologous Groups of proteins (COGs) were delineated by comparing
protein sequences encoded in 34 complete genomes, representing 26 major phylogenetic
lineages. Each COG consists of individual proteins or groups of paralogs from at least 3
lineages and thus corresponds to an ancient conserved domain. Proteins from two
eukaryotic genomes were assigned to COGs and can be reached from each individual
COG page.

Code  Name                                   Proteins   in COGs

A     Archaeoglobus fulgidus                 2420       1849
O     Halobacterium sp. NRC-1                2058       1404
M     Methanococcus jannaschii               1786       1320
T     Methanobacterium thermoautotrophicum   1873       1375
P     Thermoplasma acidophilum               1479       1176
K     Pyrococcus horikoshii                  2080       1365
      Pyrococcus abyssi                      1767       1443
Z     Aeropyrum pernix                       2722       1169
Y     Saccharomyces cerevisiae               5954       2175
Q     Aquifex aeolicus                       1560       1317
V     Thermotoga maritima                    1858       1507
D     Deinococcus radiodurans                3194       2176
R     Mycobacterium tuberculosis             3924       2468
B     Bacillus subtilis                      4118       2803
      Bacillus halodurans                    4066       2728
C     Synechocystis                          3168       2113
E     Escherichia coli                       4286       3327
      Buchnera sp. APS                       575        559
F     Pseudomonas aeruginosa                 5567       4191
G     Vibrio cholerae                        3834       2745
H     Haemophilus influenzae                 1695       1504
S     Xylella fastidiosa                     2766       1491
N     Neisseria meningitidis                 2081       1455
U     Helicobacter pylori                    1578       1081
      Helicobacter pylori J99                1492       1062
J     Campylobacter jejuni                   1634       1289
X     Rickettsia prowazekii                  836        674
I     Chlamydia trachomatis                  895        631
      Chlamydia pneumoniae                   1053       647
L     Treponema pallidum                     1036       707
      Borrelia burgdorferi                   1637       694
W     Ureaplasma urealyticum                 613        401
      Mycoplasma pneumoniae                  689        423
      Mycoplasma genitalium                  471        376
      Total                                  76,765     51,645

Gene Classification based on COG functional categories

Protein coding genes distribution map


Birgid Schlindwein's

Hypermedia Glossary Of Genetic Terms

Chromosome The term was proposed by Waldeyer (1888) for the individual threads
within a cell nucleus (gk. chroma, colour; soma, body). The self-
replicating genetic structures of cells containing the cellular DNA that
bears in its nucleotide sequence the linear array of genes. In prokaryotes,
chromosomal DNA is circular, and the entire genome is carried on one
chromosome. Eukaryotic genomes consist of a number of chromosomes
whose DNA is associated with different kinds of proteins.

Related Terms:
Nucleus The term introduced by Brown (1833) for the more or less spherical structure
which occurs in cells and stains deeply with basic dyes. The cellular organelle
in eukaryotes that contains the genetic material.
Nucleotide A subunit of DNA or RNA consisting of a nitrogenous base (a purine, adenine
or guanine, or a pyrimidine: thymine or cytosine in DNA, uracil or cytosine
in RNA), a phosphate molecule, and a sugar molecule (deoxyribose in DNA
and ribose in RNA). Depending on the sugar the nucleotides are called
deoxyribonucleotides or ribonucleotides. Thousands of nucleotides are linked
to form a DNA or RNA molecule. See also base pair.
Gene The term coined by Johannsen (1909) for the fundamental physical and
functional unit of heredity. The word gene was derived from De Vries' term
pangen, itself a derivative of the word pangenesis which Darwin (1868) had
coined. A gene is an ordered sequence of nucleotides located in a particular
position (locus) on a particular chromosome that encodes a specific functional
product (the gene product, i.e. a protein or RNA molecule). It includes regions
involved in regulation of expression and regions that code for a specific
functional product. See gene expression, allele.
Prokaryote Cell or organism lacking a membrane-bound, structurally discrete nucleus and
other subcellular compartments. Bacteria are prokaryotes. Compare eukaryote.
See chromosomes.
Eukaryote Cell or organism with membrane-bound, structurally discrete nucleus and other
well-developed subcellular compartments. Eukaryotes include all organisms
except viruses, bacteria, and blue-green algae. Compare prokaryote. See
chromosomes.
Protein A large molecule composed of one or more chains of amino acids in a specific
order; the order is determined by the base sequence of nucleotides in the gene
coding for the protein. Proteins are required for the structure, function, and
regulation of the body's cells, tissues, and organs, and each protein has unique
functions. Examples are hormones, enzymes, and antibodies.

Related Terms:
Genetic code The sequence of nucleotides, coded in triplets (codons) along the mRNA,
that determines the sequence of amino acids in protein synthesis. The DNA
sequence of a gene can be used to predict the mRNA sequence, and the genetic
code can in turn be used to predict the amino acid sequence.

Related Terms:
Nucleotide A subunit of DNA or RNA consisting of a nitrogenous base (purine in
adenine and guanine, pyrimidine in thymine or cytosine for DNA and
uracil or cytosine for RNA), a phosphate molecule, and a sugar molecule
(deoxyribose in DNA and ribose in RNA). Depending on the sugar,
the nucleotides are called deoxyribonucleotides or ribonucleotides.
Thousands of nucleotides are linked to form a DNA or RNA
molecule. See also base pair.
Codon The term proposed by Crick (1963) for the sequence of nucleotides in
DNA or RNA which is responsible for determining that a specific
amino acid shall be inserted into a polypeptide chain. There is more
than one codon for most amino acids. It has now been established that
the codon is a triplet of nitrogenous bases in DNA or RNA that
specifies a single amino acid. See genetic code.
Messenger RNA (mRNA) RNA that serves as a template for protein synthesis or for
synthesis of cDNA. See genetic code.
Amino acid Any of a class of 20 molecules that are combined to form proteins in
living things. The sequence of amino acids in a protein and hence
protein function are determined by the genetic code.
Amino acids contain a basic amino (NH2) group, an acidic carboxyl
(COOH) group and a side chain (R - of a number of different kinds)
attached to an alpha carbon atom.

Thus the general formula is: H2N-CH(R)-COOH.


Protein A large molecule composed of one or more chains of amino acids in a
specific order; the order is determined by the base sequence of
nucleotides in the gene coding for the protein. Proteins are required
for the structure, function, and regulation of the body's cells, tissues,
and organs, and each protein has unique functions. Examples are
hormones, enzymes, and antibodies.
Deoxyribonucleic acid (DNA) The molecule that encodes genetic information. DNA is a
double-stranded molecule held together by weak bonds between base pairs of
nucleotides. The four nucleotides in DNA contain the bases: adenine
(A), guanine (G), cytosine (C), and thymine (T). In nature, base pairs
form only between A and T and between G and C; thus the base
sequence of each single strand can be deduced from that of its partner.
DNA sequence The relative order of base pairs, whether in a fragment of DNA, a
gene, a chromosome, or an entire genome. See base sequence.
Gene The term coined by Johannsen (1909) for the fundamental physical
and functional unit of heredity. The word gene was derived from De
Vries' term pangen, itself a derivative of the word pangenesis which
Darwin (1868) had coined. A gene is an ordered sequence of
nucleotides located in a particular position (locus) on a particular
chromosome that encodes a specific functional product (the gene
product, i.e. a protein or RNA molecule). It includes regions involved
in regulation of expression and regions that code for a specific
functional product. See gene expression, allele.

Related Terms:
Yeast artificial chromosome (YAC) A vector used to clone DNA fragments (up to 400 kb);
it is constructed from the telomeric, centromeric, and replication origin sequences
needed for replication in yeast cells. The inserts can be much larger than
those accepted by other vectors such as plasmids or cosmids. (Cf.
cloning vector).

Related Terms:
Sequence tagged site (STS) Short (200 to 500 base pairs) sequence of genomic DNA that
has a single occurrence in the human genome and whose location and base sequence are
known. Detectable by polymerase chain reaction, STSs are useful for
localizing and orienting the mapping and sequence data reported from many
different laboratories and serve as landmarks on the developing physical
map of the human genome.

An expressed sequence tag (EST) is an STS derived from cDNA.

Related Terms:
Single nucleotide polymorphism (SNP) Sequence polymorphism differing in a single base pair.
Example of a single nucleotide substitution:
Rice cultivars with 18% or less amylose had the sequence
AGTTATA at the putative leader intron 5' splice site, while all
cultivars with a higher proportion of amylose had AGGTATA.
See abstract of publication.
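
The rice example can be checked mechanically. The following is a minimal sketch, not part of the cited publication: the function name `snp_positions` and the variable names are illustrative assumptions. It compares two equal-length sequences base by base and reports where they differ.

```python
def snp_positions(seq_a, seq_b):
    """Return (index, base_a, base_b) for each position where two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Sequences quoted in the glossary entry above.
low_amylose = "AGTTATA"    # cultivars with 18% or less amylose
high_amylose = "AGGTATA"   # cultivars with a higher proportion of amylose

differences = snp_positions(low_amylose, high_amylose)
print(differences)  # a single entry means a single-base substitution
```

Positions are counted from zero, so the one reported difference (T versus G) is the third base of the quoted sequence.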

Genes

The unit of hereditary information that occupies a fixed position (locus) on a
chromosome. Genes achieve their effects by directing the synthesis of proteins.
Genes are composed of deoxyribonucleic acid (DNA), except in some viruses,
which have genes consisting of a closely related compound called ribonucleic
acid (RNA). A DNA molecule is composed of two chains of nucleotides that wind
about each other to resemble a twisted ladder. The sides of the ladder are made
up of sugars and phosphates; the rungs are formed by bonded pairs of
nitrogenous bases. These bases are adenine (A), guanine (G), cytosine (C), and
thymine (T). An A on one chain bonds to a T on the other (thus forming an A-T
ladder rung); similarly, a C on one chain bonds to a G on the other. If the bonds
between the bases are broken, the two chains unwind, and free nucleotides
within the cell attach themselves to the exposed bases of the now-separated
chains. The free nucleotides line up along each chain according to the base-
pairing rule--A bonds to T, C bonds to G. This process results in the creation of
two identical DNA molecules from one original and is the method by which
hereditary information is passed from one generation of cells to the next.
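
The base-pairing rule lends itself to a short illustration. This is a simplified sketch under stated assumptions: the names `PAIRING`, `complement`, and `replicate` are hypothetical, strands are plain strings, and strand orientation and the enzymes involved are ignored.

```python
# Base-pairing rule from the text: A bonds to T, C bonds to G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Build the partner chain for a single strand, one base at a time."""
    return "".join(PAIRING[base] for base in strand)


def replicate(strand):
    """Separate a two-chain molecule and rebuild a partner for each chain.

    Returns the two daughter molecules, each as a (chain, partner) pair.
    """
    partner = complement(strand)
    # Each separated chain attracts free nucleotides according to the pairing rule.
    daughter_1 = (strand, complement(strand))
    daughter_2 = (complement(partner), partner)
    return daughter_1, daughter_2


original = "ATGCCGTA"
daughter_1, daughter_2 = replicate(original)
print(daughter_1)
print(daughter_2)
```

Both daughters contain the same pair of chains, which is why the two new molecules are identical to the original.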

The sequence of bases along a strand of DNA determines the genetic code.
When the product of a particular gene is needed, the portion of the DNA
molecule that contains that gene will split. A strand of RNA with bases
complementary to those of the gene is created from the free nucleotides in the
cell. (RNA has the base uracil [U] instead of thymine, so A and U form base pairs
during RNA synthesis.) This single chain of RNA, called messenger RNA (mRNA),
then passes to the organelles called ribosomes, where protein synthesis takes
place. A second type of RNA, transfer RNA (tRNA), matches up the nucleotides
on mRNA with specific amino acids. Each set of three nucleotides codes for one
amino acid. The series of amino acids built according to the sequence of
nucleotides forms a polypeptide chain; all proteins are made from one or more
linked polypeptide chains.
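
The two steps described above, copying a gene into mRNA and reading the mRNA three nucleotides at a time, can be sketched as follows. This is an illustration only: the function names and the example template are assumptions, the codon table is a small subset of the standard genetic code, and strand orientation is ignored.

```python
# Pairing used during RNA synthesis: RNA has uracil (U) where DNA has thymine.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

# A small subset of the standard genetic code, enough for this example.
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe",
    "GGC": "Gly",
    "GCA": "Ala",
    "UAA": "STOP",
}


def transcribe(template_dna):
    """Build the mRNA whose bases are complementary to the DNA template."""
    return "".join(DNA_TO_RNA[base] for base in template_dna)


def translate(mrna):
    """Read the mRNA in sets of three nucleotides and list the amino acids."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        peptide.append(amino_acid)
    return peptide


template = "TACAAACCGCGTATT"   # hypothetical gene fragment
mrna = transcribe(template)
peptide = translate(mrna)
print(mrna)
print(peptide)
```

The lookup table stands in for the role of tRNA, which matches each set of three nucleotides to its amino acid.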

Experiments indicate that one gene is responsible for the assembly of one
polypeptide chain. This is known as the one-gene-one-polypeptide hypothesis.

Other experiments have shown that many of the genes within a cell are inactive
much or even all of the time. Thus, at any time, it seems that a gene can be
switched on or off. The process by which genes are activated and deactivated in
bacteria has been determined. Bacteria actually have three types of genes:
structural, operator, and regulator. Structural genes code for the synthesis of
specific polypeptides. Operator genes contain the code necessary to begin the
process of transcribing the DNA message of one or more structural genes into
mRNA. Thus, structural genes are linked to an operator gene in a functional unit
called an operon. Ultimately, the activity of the operon is controlled by a
regulator gene, which produces a small protein molecule called a repressor. The
repressor binds to the operator gene and prevents it from initiating the synthesis
of the protein called for by the operon. The presence or absence of certain
repressor molecules determines whether the operon is off or on. As mentioned,
this model applies to bacteria. Gene regulation in higher organisms is less clearly
understood.
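
The on/off logic of the operon model can be expressed as a toy sketch. It is deliberately simplified and rests on assumptions: the function names and gene labels are hypothetical, and the repressor is treated as either bound to the operator or not, with no further molecular detail.

```python
def operon_is_on(repressor_bound):
    """The operator can start transcription only when no repressor is bound to it."""
    return not repressor_bound


def transcribe_operon(structural_genes, repressor_bound):
    """Return the mRNA messages produced for the structural genes of one operon."""
    if not operon_is_on(repressor_bound):
        return []  # repressor blocks the operator: nothing is transcribed
    return [f"mRNA({gene})" for gene in structural_genes]


genes = ["geneA", "geneB", "geneC"]  # hypothetical structural genes in one operon

switched_off = transcribe_operon(genes, repressor_bound=True)
switched_on = transcribe_operon(genes, repressor_bound=False)
print(switched_off)
print(switched_on)
```

Because all structural genes share one operator, they are switched on or off together.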

Mutations occur when the number or order of bases in a gene is disrupted.
Nucleotides can be deleted, doubled, rearranged, or replaced, with each
alteration having a particular effect. A mutation generally has little or no
effect; when it does alter an organism, the change is frequently lethal. A
beneficial mutation will rise in frequency within a population until it becomes the
norm.

The Cell

In biology, the basic unit of which all living things are composed. As the
smallest units retaining the fundamental properties of life, cells are the "atoms"
of the living world. A single cell is often a complete organism in itself, such as
a bacterium or yeast. Other cells, by differentiating in order to acquire
specialized functions and cooperating with other specialized cells, become the
building blocks of large multicellular organisms as complex as the human being.
Although they are much larger than atoms, these building blocks are still very
small. The smallest known cells are a group of tiny bacteria called mycoplasmas;
some of these single-celled organisms are spheres about 0.3 micrometre in
diameter, with a total mass of 10^-14 gram--equal to that of
8,000,000,000 hydrogen atoms. Human cells typically have a mass 400,000
times larger, but even they are only about 20 micrometres across. It would
require a sheet of about 10,000 human cells to cover the head of a pin, and
each human being is composed of more than 75,000,000,000,000 cells.

Figure 1: The initial proposal of the structure of DNA by James Watson and ...

This article discusses the cell both as an individual unit and as a contributing
part of a larger organism. As an individual unit the cell is capable of digesting its
own nutrients, providing its own energy, and replicating itself in order to
produce succeeding generations. It can be viewed as an enclosed vessel
composed of even smaller units that serve as its skin, skeleton, brain, and
digestive tract. Within this vessel innumerable chemical reactions take place
simultaneously, all of them controlled so that they contribute to the life and
procreation of the cell. In a multicellular organism cells specialize to perform
different functions. In order to do this each cell keeps in constant communication
with its neighbours. As it receives nutrients from and expels wastes into its
surroundings, it adheres to and cooperates with other cells. Cooperative
assemblies of similar cells form tissues, and a cooperation between tissues in
turn forms organs, the functional units of an organism.

Special emphasis is given in this article to animal cells, with some discussion of
the energy-synthesizing processes and extracellular components peculiar to
plants. For detailed discussion of the biochemistry of plant cells, see
photosynthesis. For full-length treatment of the genetic events in the cell
nucleus, see heredity.

Contents of this article:

Introduction
The nature and function of cells
The cell as a self-replicating collection of catalysts
The structure of biologic catalysts
Coupled chemical reactions
Photosynthesis: the beginning of the food chain
ATP: fueling chemical reactions
The cell as a replicator of information
DNA: the genetic material
RNA: replicated from DNA
The cell as an organized unit
Intracellular communication
Intercellular communication
The plasma membrane
Chemical composition and structure of the membrane
Membrane lipids
Membrane proteins
Membrane fluidity
Transport across the membrane
Permeation
Membrane channels
Facilitated diffusion
The glucose transporter
The anion transporter
Secondary active transport
Counter-transport
Co-transport
Primary active transport
The sodium pump
Calcium pumps
Hydrogen ion pumps
Transport of particles
Endocytosis
Exocytosis
Internal membranes
General functions and characteristics
Cellular organelles and their membranes
The vacuole
The lysosome
Microbodies
The endoplasmic reticulum
The smooth endoplasmic reticulum
The rough endoplasmic reticulum
The Golgi apparatus
Secretory vesicles
Sorting of products by chemical receptors
The nucleus
Structural organization of the nucleus
DNA packaging
Nucleosomes: the subunits of chromatin
Organization of chromatin fibre
The nuclear envelope
Genetic organization of the nucleus
The structure of DNA
Rearrangement and modification of DNA
Genetic expression through RNA
RNA synthesis
Processing of mRNA
Regulation of genetic expression
Regulation of RNA synthesis
Regulation of RNA after synthesis
The mitochondrion and the chloroplast
Mitochondrial and chloroplastic structure
Metabolic functions
The mitochondrion
Formation of the electron donors NADH and FADH2
The electron-transport chain
The chemiosmotic theory
The chloroplast
Trapping of light
Fixation of carbon dioxide.
Evolutionary origins
The mitochondrion and chloroplast as independent entities
The endosymbiont hypothesis
The cytoskeleton
Actin filaments
Microtubules
Intermediate filaments
Structural relation of the filaments
The cell matrix and cell-to-cell communication
The extracellular matrix
Matrix polysaccharides
Matrix proteins
Cell-matrix interactions
Intercellular recognition and cell adhesion
Tissue and species recognition
Cell junctions
Adhering junctions
Tight junctions
Gap junctions
Cell-to-cell communication via chemical signaling
Types of chemical signaling
Signal receptors
Cellular response
The plant cell wall
Mechanical properties of wall layers
Components of the cell wall
Cellulose
Matrix polysaccharides
Proteins
Plastics
Intercellular communication
Plasmodesmata
Oligosaccharides with regulatory functions
Cell division and growth
Duplication of the genetic material
Cell division
Mitosis and cytokinesis
Meiosis
The cell division cycle
Controlled proliferation
Failure of proliferation control
Cell differentiation
The differentiated state
The process of differentiation
Embryonic differentiation
Adult differentiation
Errors in differentiation
The evolution of cells
The development of genetic information
The development of metabolism
The history of cell theory
Formulation of the theory
Early observations
The problem of the origin of cells
The protoplasm concept
Contribution of other sciences
Bibliography
General works
Nature and function of cells
Special studies in cell morphology
Special studies in cell biology
Evolution

Summary

In biology, the basic unit of which all living things are composed. The cell is the
smallest structural unit of living matter that is capable of functioning
independently. All cells are similar in composition, form, and function. A single
cell can be a complete organism in itself, as in bacteria and protozoans. Groups
of specialized cells are organized into tissues and organs in multicellular
organisms such as the higher plants and animals.

Cells were first observed in the 17th century, shortly after the invention of the
microscope. Their significance, however, was not understood until the early 19th
century, when improvements in microscopy permitted closer observation.

Cells are made up of macromolecules (giant molecules) and various smaller
molecules. The chief macromolecules are nucleic acids (DNA [deoxyribonucleic
acid] and RNA [ribonucleic acid]), proteins, and polysaccharides. DNA comprises
the genetic code that carries the essential character of the organism from
generation to generation. RNA translates the genetic information into proteins,
which carry out vital cell functions. Proteins, for example, recognize and
transport specific molecules into and out of the cell and catalyze all chemical
reactions within the cell. Polysaccharides function as structural molecules in the
rigid cell walls of bacterial and plant cells and as storage molecules in the
glycogen granules of vertebrate muscle cells.

Important among the smaller molecular components of cells are lipids, ATP
(adenosine triphosphate), cyclic AMP (adenosine monophosphate), porphyrins,
and water. Lipids are fatty substances that are a major component of cell
membranes. ATP is the energy currency of the cell; this energy-rich molecule is
formed when the cell needs to store energy and is broken down when the cell
requires energy. Cyclic AMP functions as a regulator of cell activities; porphyrins
are pigments essential for oxidation and photosynthesis. About 70 to 80 percent
of a cell is water, which is vital to the chemistry of life.

There are two distinct types of cells: procaryotic cells, found only in blue-green
algae and in bacteria, and eucaryotic cells, composing all other life forms. A
eucaryotic cell consists of an outer membrane, cytoplasm that contains various
membrane-bound structures (organelles), and a membrane-bound nucleus that
encloses the gene-bearing chromosomes. Procaryotic cells have a cell membrane
and cytoplasm, but they have no nucleus (their genetic material is organized
into a single chromosome) and they lack membrane-bound cytoplasmic
organelles. The molecular composition and activities of the two types of cells,
however, are very similar.

A cell is bounded by a semipermeable membrane (called the plasma membrane)
that enables it to exchange certain materials with its surroundings. The plasma
membrane is made up of a double layer of lipids studded with proteins. Some of
the proteins extend completely through the lipid layer, others only partially
penetrate it, and still others are thought to be completely embedded within the
lipid layer. In plants the membrane is enclosed in a rigid cellulose cell wall.

The space between cells is filled with the extracellular matrix, a gel of
polysaccharides swollen with water molecules in which are suspended protein
fibres that hold cells together to form tissues.

Within the cytoplasm of both procaryotic and eucaryotic cells are ribosomes,
small bodies that are the sites of protein synthesis. In addition, eucaryotic cells
have a variety of separate membrane-bound cytoplasmic organelles with special
functions. These organelles include the endoplasmic reticulum, Golgi apparatus,
lysosomes, mitochondria, and plastids. The endoplasmic reticulum is a network
of channels that functions in the movement of materials within the cell.
Associated with these channels is the Golgi apparatus, which is composed of
sacs that bud off from the endoplasmic reticulum. These sacs transport cell
products from the endoplasmic reticulum to their appropriate locations either
inside or outside the cell. Lysosomes are sacs filled with digestive enzymes; they
are capable of digesting worn-out cell parts or extracellular debris, such as dead
cells or foreign microorganisms that have been engulfed by the cell.
Mitochondria serve as the power plants of the cell; it is within these organelles
that ATP is synthesized. Plastids are found in the cells of most plants but are
absent from animal cells. Of immense importance are the plastids known as
chloroplasts; they contain the machinery for photosynthesis, the process by
which the energy of sunlight is captured to produce carbohydrates.

The nucleus is the control centre of eucaryotic cells. Within this membrane-
bound structure lie the chromosomes, which carry the hereditary material. The
DNA of the chromosomes directs protein synthesis in the cell; the DNA
instructions are carried from the nucleus to the cytoplasm by messenger RNA
(mRNA). Procaryotic cells have no membrane-enclosed nucleus. They do,
however, have nuclear matter consisting of a single chromosome.

A eucaryotic cell divides, or reproduces, to form two genetically identical
daughter cells in a process called mitosis. Prior to mitosis, the chromosomes
replicate, so that there will be a complete set of hereditary instructions for each
daughter cell. During mitosis, the doubled chromosomes are separated, with one
copy of each going to each daughter cell. Among sexually reproducing
eucaryotes, another type of cell division occurs in the formation of sex cells
called gametes (i.e., eggs and sperm). This process is known as meiosis. It
produces four gametes, each of which contains half the number of chromosomes
of the parent cell. When a male gamete and a female gamete unite, they form a
new individual in which the full number of chromosomes is restored.

Procaryotic cells reproduce in various ways, the most common being binary
fission. This process involves replication of the cell's lone chromosome and the
subsequent splitting of the parent cell into two daughters. It thus resembles
mitosis in eucaryotes, but it lacks the special apparatus involved in true mitotic
division.

The two main types of cell death are necrotic cell death, or coagulative necrosis,
and apoptosis, or programmed cell death. Necrosis occurs in a variety of
contexts produced by disease, injury, or accident and is cell death imposed by
external factors. A cell undergoing necrosis typically swells in size before its
lysosomes rupture and the cell's internal contents spill out into extracellular
space.

In response to specific intracellular and extracellular signals, cells can also
undergo programmed cell death. This apoptosis is a normal cellular process that
plays an important role in growth and development. This type of cell death is
marked by the shrinking of the cytoplasm and nucleus, degradation of the
chromosomes, and the final splitting of the nucleus into a number of membrane-
bound fragments.

Approximate Chemical Composition of a Typical Mammalian Cell

Component                                                                 Percent of total cell weight
Water                                                                     70
Inorganic ions (sodium, potassium, magnesium, calcium, chloride, etc.)    1
Miscellaneous small metabolites                                           3
Proteins                                                                  18
RNA                                                                       1.1
DNA                                                                       0.25
Phospholipids and other lipids                                            5
Polysaccharides                                                           2

Biological Development

The progressive changes in size, shape, and function during the life of an
organism by which its genetic potentials (genotype) are translated into
functioning mature systems (phenotype). Most modern philosophical outlooks
would consider that development of some kind or other characterizes all things,
in both the physical and biological worlds. Such points of view go back to the
very earliest days of philosophy.

Among the pre-Socratic philosophers of Greek Ionia, half a millennium before
Christ, some, like Heracleitus, believed that all natural things are constantly
changing. In contrast, others, of whom Democritus is perhaps the prime
example, suggested that the world is made up by the changing combinations of
atoms, which themselves remain unaltered, not subject to change or
development. The early period of post-Renaissance European science may be
regarded as dominated by this latter atomistic view, which reached its fullest
development in the period between Newton's laws of physics and Dalton's
atomic theory of chemistry in the early 19th century. This outlook was never
easily reconciled with the observations of biologists, and in the last hundred
years a series of discoveries in the physical sciences have combined to swing
opinion back toward the Heracleitan emphasis on the importance of process and
development. The atom, which seemed so unalterable to Dalton, has proved to
be divisible after all, and to maintain its identity only by processes of interaction
between a number of component subatomic particles, which themselves must in
certain aspects be regarded as processes rather than matter. Albert Einstein's
theory of relativity showed that time and space are united in a continuum, which
implies that all things are involved in time; that is to say, in development.

The philosophers who charted the transition from the nondevelopmental view,
for which time was an accidental and inessential element, were Henri Bergson
and, in particular, Alfred North Whitehead. Karl Marx and Friedrich Engels, with
their insistence on the difference between dialectical and mechanical
materialism, may be regarded as other important innovators of this trend,
although the generality of their philosophy was somewhat compromised by the
political context in which it was placed and the rigidity with which their later
followers have interpreted it.

Philosophies of the Heracleitan type, which emphasize process and development,
provide much more appropriate frameworks for biology than do philosophies of
the atomistic kind. Living organisms confront biologists with changes of various
kinds, all of which could be regarded as in some sense developmental; however,
biologists have found it convenient to distinguish the changes and to use the
word development for only one of them. Biological development can be defined
as the series of progressive, nonrepetitive changes that occur during the life
history of an organism. The kernel of this definition is to contrast development
with, on the one hand, the essentially repetitive chemical changes involved in
the maintenance of the body, which constitute "metabolism," and on the other
hand, with the longer term changes, which, while nonrepetitive, involve the
sequence of several or many life histories, and which constitute evolution.

As with most formal definitions, these distinctions cannot always be applied
strictly to the real world. In the viruses, for instance, and even in bacteria, it is
difficult to make a distinction between metabolism and development, since the
metabolic activity of a virus particle consists of little more than the development
of new virus particles. In certain other cases, the distinction between
development and evolution becomes blurred: the concept of an individual
organism with a definite life history may be very difficult to apply in plants that
reproduce by vegetative division, the breaking off of a part that can grow into
another complete plant. The possibilities for debate that arise in these special
cases, however, do not in any way invalidate the general usefulness of the
distinctions as conventionally made in biology.

Contents of this article:

Introduction
The scope of development
Types of development
Quantitative and qualitative development
Progressive and regressive development
Single-phase and multiphase development
Structural and functional development
Normal and abnormal development
General systems of development
Development of single-celled organisms
Open and closed systems of development
Blastogenesis versus embryogenesis
Constituent processes of development
Growth
Morphogenesis
Morphogenesis by differential growth
Morphogenetic fields
Morphogenesis by the self-assembly of units
Differentiation
Control and integration of development
Phenomenological aspects
Analytical aspects
Development and evolution
Effect on life histories
Length and timing of the reproductive phase
Recapitulation of ancestral stages
Adaptability and the canalization of development
Genetic assimilation
Bibliography

The Human Body


The physical substance of the human organism, composed of living cells and
extracellular materials and organized into tissues, organs, and systems.

Human anatomy and physiology are treated in many different articles. For
detailed coverage of the body's biochemical constituents, see Proteins;
Carbohydrates; Lipids; Nucleic Acids; Vitamins; and Hormones. For information
on the structure and function of the cells that constitute the body, see Cells. For
detailed discussions of specific tissues, organs, and systems, see Blood;
Circulation and Circulatory Systems: The human cardiovascular system;
Digestion and Digestive Systems; Endocrine Systems: The human endocrine
system; Excretion and Excretory Systems: The human excretory system;
Integumentary Systems: The human skin; Muscles and Muscle Systems; Nerves
and Nervous Systems; Reproduction and Reproductive Systems: The human
reproductive system; Respiration and Respiratory Systems: Human respiration;
Sensory Reception: Human sensory reception; Supportive and Connective
Tissues: The human skeletal system. For a description of how the body
develops, from conception through old age, see Growth and Development,
Biological: Human growth and development.

Many entries describe the body's major structures. For example, see abdominal
cavity; adrenal gland; aorta; bone; brain; ear; eye; heart; kidney; large
intestine; lung; nose; ovary; pancreas; pituitary gland; small intestine; spinal
cord; spleen; stomach; testis; thymus; thyroid gland; tooth; uterus; vertebral
column.

Human beings are, of course, animals--more particularly, members of the class
Mammalia in the subphylum Vertebrata of the phylum Chordata. Like all
chordates, the human animal has a bilaterally symmetrical body that is
characterized at some point during its development by a dorsal supporting rod
(the notochord), gill slits in the region of the pharynx, and a hollow dorsal nerve
cord. Of these features, the first two are present only during the embryonic
stage in the human; the notochord is replaced by the vertebral column, and the
pharyngeal gill slits are lost completely. The dorsal nerve cord is the spinal cord
in human beings; it remains throughout life.

Characteristic of the vertebrate form, the human body has an internal skeleton
that includes a backbone of vertebrae. Typical of mammalian structure, the
human body shows such characteristics as hair, mammary glands, and highly
developed sense organs.

Beyond these similarities, however, lie some profound differences. Among the
mammals, only human beings have a predominantly two-legged (bipedal)
posture, a fact that has greatly modified the general mammalian body plan.
(Even the kangaroo, which hops on two legs when moving rapidly, walks on four
legs and uses its tail as a "third leg" when standing.) Moreover, the human
brain, particularly that part called the neocortex, is far and away the most highly
developed in the animal kingdom. As intelligent as are many other mammals--
such as chimpanzees and dolphins--none have achieved the intellectual status of
the human species.

Contents of this article:


Introduction
Chemical composition of the body.
Organization of the body.
Basic form and development.
Effects of aging.
Change incident to environmental factors.

Summary

The Chemical composition of the body.

Chemically, the human body consists mainly of water and of organic
compounds--i.e., lipids, proteins, carbohydrates, and nucleic acids. Water is
found in the extracellular fluids of the body (the blood plasma, the lymph, and
the interstitial fluid) and within the cells themselves. It serves as a solvent
without which the chemistry of life could not take place. The human body is
about 60 percent water by weight.

Lipids--chiefly fats, phospholipids, and steroids--are major structural
components of the human body. Fats provide an energy reserve for the body,
and fat pads also serve as insulation and shock absorbers. Phospholipids and the
steroid compound cholesterol are major components of the membrane that
surrounds each cell.

Proteins also serve as a major structural component of the body. Like lipids,
proteins are an important constituent of the cell membrane. In addition, such
extracellular materials as hair and nails are composed of protein. So also is
collagen, the fibrous, elastic material that makes up much of the body's skin,
bones, tendons, and ligaments. Proteins also perform numerous functional roles
in the body. Particularly important are those cellular proteins called enzymes,
which catalyze the chemical reactions necessary for life.

Carbohydrates are present in the human body largely as fuels, either as simple
sugars circulating through the bloodstream or as glycogen, a storage compound
found in the liver and the muscles. Small amounts of carbohydrates also occur in
cell membranes, but, in contrast to plants and many invertebrate animals,
humans have little structural carbohydrate in their bodies.

Nucleic acids make up the genetic materials of the body. Deoxyribonucleic acid
(DNA) carries the body's hereditary master code, the instructions according to
which each cell operates. It is DNA, passed from parents to offspring, that
dictates the inherited characteristics of each human being. Ribonucleic acid
(RNA), of which there are several types, helps carry out the instructions encoded
in the DNA.

Along with water and organic compounds, the body's constituents include
various inorganic minerals. Chief among these are calcium, phosphorus, sodium,
magnesium, and iron. Calcium and phosphorus, combined as calcium-phosphate
crystals, form a large part of the body's bones. Calcium is also present as ions in
the blood and interstitial fluid, as is sodium. Ions of phosphorus, potassium, and
magnesium, on the other hand, are abundant within the intracellular fluid. All of
these ions play vital roles in the body's metabolic processes. Iron is present
mainly as part of hemoglobin, the oxygen-carrying pigment of the red blood
cells. Other mineral constituents of the body, found in minute but necessary
concentrations, include cobalt, copper, iodine, manganese, and zinc.

The Organization of the body.

The cell is the basic living unit of the human body--indeed, of all organisms. The
human body consists of more than 75 trillion cells, each capable of growth,
metabolism, response to stimuli, and, with some exceptions, reproduction.
Although there are some 200 different types of cells in the body, these can be
grouped into four basic classes. These four basic cell types, together with their
extracellular materials, form the fundamental tissues of the human body: (1)
epithelial tissues, which cover the body's surface and line the internal organs,
body cavities, and passageways; (2) muscle tissues, which are capable of
contraction and form the body's musculature; (3) nerve tissues, which conduct
electrical impulses and make up the nervous system; and (4) connective tissues,
which are composed of widely spaced cells and large amounts of intercellular
matrix and which bind together various body structures. (Bone and blood are
considered specialized connective tissues, in which the intercellular matrix is,
respectively, hard and liquid.)

The next level of organization in the body is that of the organ. An organ is a
group of tissues that constitutes a distinct structural and functional unit. Thus,
the heart is an organ composed of all four tissues, whose function is to pump
blood throughout the body. Of course, the heart does not function in isolation; it
is part of a system composed of blood and blood vessels as well. The highest
level of body organization, then, is that of the organ system.

The body includes nine major organ systems, each composed of various organs
and tissues that work together as a functional unit. The chief constituents and
prime functions of each system are summarized below. (1) The integumentary
system, composed of the skin and associated structures, protects the body from
invasion by harmful microorganisms and chemicals; it also prevents water loss
from the body. (2) The musculoskeletal system (also referred to separately as
the muscle system and the skeletal system), composed of the skeletal muscles
and bones (with about 206 of the latter in adults), moves the body and
protectively houses its internal organs. (3) The respiratory system, composed of
the breathing passages, lungs, and muscles of respiration, obtains from the air
the oxygen necessary for cellular metabolism; it also returns to the air the
carbon dioxide that forms as a waste product of such metabolism. (4) The
circulatory system, composed of the heart, blood, and blood vessels, circulates a
transport fluid throughout the body, providing the cells with a steady supply of
oxygen and nutrients and carrying away such waste products as carbon dioxide
and toxic nitrogen compounds. (5) The digestive system, composed of the
mouth, esophagus, stomach, and intestines, breaks down food into usable
substances (nutrients), which are then absorbed into the blood or lymph; this
system also eliminates the unusable or excess portion of the food as fecal
matter. (6) The excretory system, composed of the kidneys, ureters, urinary
bladder, and urethra, removes toxic nitrogen compounds and other wastes from
the blood. (7) The nervous system, composed of the sensory organs, brain,
spinal cord, and nerves, transmits, integrates, and analyzes sensory information
and carries impulses to effect the appropriate muscular or glandular responses.
(8) The endocrine system, composed of the hormone-secreting glands and
tissues, provides a chemical communications network for coordinating various
body processes. (9) The reproductive system, composed of the male or female
sex organs, enables reproduction and thereby ensures the continuation of the
species.

Cellular Articles in other Topics:


cytoskeleton
cytoskeleton
from cytoskeleton

division

aging process
Tissue cell loss and replacement
from aging

blastema formation
animal development
from animal development
Cell reproduction
from reproduction

cellular components
Cytology
from morphology

cleavage
Early development
from animal development

cloning
clone
from clone

epidermal differentiation
The epidermis
from skin

fetus growth rate


Types and rates of human growth
from human development

plant growth determination


Origin of the primary organs
from plant development
The contribution of cells and tissues
from plant development

regeneration and cell renewal


Repair and regeneration
from human disease

sexual reproduction specialization


Sex cells
from sex
Hormones
from sex

structural unit of life


Life on Earth
from life
The earliest living systems
from life

vitamin deficiencies
Vitamins
from nutritional disease

physiology
Historical background
from physiology

aging process
human aging
from human aging
aging
from aging
Internal environment: consequences of metabolism
from aging

cellular metabolism
Endocrine system
from human aging

fluid regulation
Regulation of water and salt balance
from excretion

genetic behaviour
genetics
from genetics

hormones
Hormone chemistry.
from hormone

interaction with drugs


General principles
from drug
metabolism
metabolism
from metabolism
Coarse control
from metabolism

circulatory system
Main features of circulatory systems
from circulation

human body
Organization of the body.
from human body

metabolic disease
metabolic disease
from metabolic disease
Disorders of porphyrin metabolism
from metabolic disease

pathology
Characteristics of cell and tissue changes
from animal disease

cancer
cancer
from cancer
ref. [cancer] passim to
ref. [cancer20]

cell death
The "point of no return"
from death

cryosurgical tissue destruction


cryosurgery
from cryosurgery

growth inhibition
Abnormal growth of cells
from human disease

infection
virus
from virus

radiation damage
Radiation injury
from human disease
Major types of radiation injury
from radiation
scientific study

cytology
cytology
from cytology

genetic continuity and organization


Genetics
from zoology

morphology of cells
The study of structure
from biology

observations by

Braun
Braun, Alexander Carl Heinrich
from Braun, Alexander Carl Heinrich

Claude
Claude, Albert
from Claude, Albert

Goodsir
Goodsir, John
from Goodsir, John

Mohl
Mohl, Hugo von
from Mohl, Hugo von

Müller
Müller, Johannes Peter
from Müller, Johannes Peter

Palade
Palade, George E.
from Palade, George E.

tissue culture examination


tissue culture
from tissue culture

structure and function

bacteria ingestion in phagocytosis


phagocytosis
from phagocytosis

difference between animal and plant cells


ref. [animal]
fertilization
fertilization
from fertilization

human respiration
Peripheral chemoreceptors
from respiration, human

lipid structural components


lipid
from lipid
nucleic acid formation
nucleic acid
from nucleic acid

spatial patterns localization
Structural and functional development
from biological development

Information Processing

Figure 1: Structure of an information system.
Figure 2: Document imaging.
Figure 3: A parsing graph.
Figure 4: A semantic network representation.
Figure 5: The architecture of a networked information system.

Query languages

The uses of databases are manifold. They provide a means of retrieving records
or parts of records and performing various calculations before displaying the
results. The interface by which such manipulations are specified is called the
query language. Whereas early query languages were originally so complex that
interacting with electronic databases could be done only by specially trained
individuals, recent interfaces are more user-friendly, allowing casual users to
access database information.

The main types of popular query modes are the "menu," the "fill-in-the-blank"
technique, and the structured query. Particularly suited for novices, the menu
requires a person to choose from several alternatives displayed on the video
terminal screen. The fill-in-the-blank technique is one in which the user is
prompted to enter key words as search statements. The structured query approach
is effective with relational databases. It has a formal, powerful syntax that is
in fact a programming language, and it is able to accommodate logical operators.
One implementation of this approach, the Structured Query Language (SQL), has
the form

select [field Fa, Fb, . . . , Fn]
from [database Da, Db, . . . , Dn]
where [field Fa = abc] and [field Fb = def].
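As a hedged illustration of the select/from/where form, the sketch below runs one such query with Python's built-in sqlite3 module against an in-memory database. The table name `records` and the columns `fa`, `fb`, and `fc` are invented for the example. Note that in practice the from clause usually names tables rather than whole databases.

```python
import sqlite3

def run_query():
    # In-memory database; the table and column names are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (fa TEXT, fb TEXT, fc INTEGER)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [("abc", "def", 1), ("abc", "xyz", 2), ("qrs", "def", 3)],
    )
    # select [fields] from [table] where [condition] and [condition]
    rows = conn.execute(
        "SELECT fa, fb, fc FROM records WHERE fa = ? AND fb = ?",
        ("abc", "def"),
    ).fetchall()
    conn.close()
    return rows

print(run_query())
```

Only the row that satisfies both conditions is returned; the `and` in the where clause is one of the logical operators mentioned in the text.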

Structured query languages support database searching and other operations by
using commands such as "find," "delete," "print," "sum," and so forth. The
sentencelike structure of an SQL query resembles natural language except that
its syntax is limited and fixed. Instead of using an SQL statement, it is possible
to represent queries in tabular form. The technique, referred to as query-by-
example (or QBE), displays an empty tabular form and expects the searcher to
enter the search specifications into appropriate columns. The program then
constructs an SQL-type query from the table and executes it.
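The query-by-example step, turning a filled-in table into an SQL-type query, might be sketched as follows. This is a minimal illustration, not the behaviour of any particular QBE product: the function name `qbe_to_sql`, the table `books`, and its columns are assumptions, and a blank column is represented by `None`.

```python
def qbe_to_sql(table, form):
    """Build a parameterized SELECT from a filled-in tabular form.

    `form` maps column names to the example value typed into that column;
    None means the column was left blank (shown, but not constrained).
    Table and column names are assumed to come from a trusted schema.
    """
    columns = ", ".join(form)
    filled = [(col, val) for col, val in form.items() if val is not None]
    sql = f"SELECT {columns} FROM {table}"
    if filled:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col, _ in filled)
    params = [val for _, val in filled]
    return sql, params

sql, params = qbe_to_sql("books", {"title": None, "author": "Darwin", "year": 1859})
print(sql)
print(params)
```

The values the searcher typed are passed as parameters rather than pasted into the query text, which keeps the generated statement well-formed whatever the user enters.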

The most flexible query language is of course natural language. The use of
natural-language sentences in a constrained form to search databases is allowed
by some commercial database management software. These programs parse the
syntax of the query; recognize its action words and their synonyms; identify the
names of files, records, and fields; and perform the logical operations required.
Experimental systems that accept such natural-language queries in spoken voice
have been developed; however, the ability to employ unrestricted natural
language to query unstructured information will require further advances in
machine understanding of natural language, particularly in techniques of
representing the semantic and pragmatic context of ideas. The prospect of an
intelligent conversation between humans and a large store of digitally encoded
knowledge is not imminent.

Information searching and retrieval

State-of-the-art approaches to retrieving information employ two generic
techniques: (1) matching words in the query against the database index
(key-word searching) and (2) traversing the database with the aid of hypertext or
hypermedia links.

Key-word searches can be made either more general or more narrow in scope by
means of logical operators (e.g., disjunction and conjunction). Because of the
semantic ambiguities involved in free-text indexing, however, the precision of
the key-word retrieval technique--that is, the percentage of retrieved documents
that are actually relevant--is far from ideal, and various modifications
have been introduced to improve it. In one such enhancement, the search
output is sorted by degree of relevance, based on a statistical match between
the key words in the query and in the document; in another, the program
automatically generates a new query using one or more documents considered
relevant by the user. Key-word searching has been the dominant approach to
text retrieval since the early 1960s; hypertext has so far been largely confined
to personal or corporate information-retrieval applications.
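A minimal sketch of key-word searching, assuming a toy three-document collection invented for the purpose: an inverted index supports conjunction and disjunction, a simple match count stands in for the statistical relevance ranking described above, and precision is computed as the fraction of retrieved documents that are relevant. All names here are illustrative.

```python
from collections import defaultdict

DOCS = {  # toy collection, invented for illustration
    1: "query languages for relational databases",
    2: "hypertext links and hypermedia databases",
    3: "natural language query processing",
}

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search_and(index, words):
    """Conjunction: documents containing every word (narrows the search)."""
    sets = [index.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()

def search_or(index, words):
    """Disjunction: documents containing any word (broadens the search)."""
    return set().union(*(index.get(w, set()) for w in words))

def rank(index, words):
    """Order matching documents by how many distinct query words they contain."""
    scores = defaultdict(int)
    for w in words:
        for doc_id in index.get(w, set()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))

def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

index = build_index(DOCS)
print(search_and(index, ["query", "databases"]))
print(rank(index, ["query", "databases"]))
```

Real systems weight terms statistically rather than counting matches, so the ranking function here should be read as a placeholder for that step.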

The exponential growth of the use of computer networks in the 1990s presages
significant changes in systems and techniques of information retrieval. In a
wide-area information service, a number of which began operating at the
beginning of the 1990s on the Internet computer network, a user's personal
computer or terminal (called a client) can search simultaneously a number of
databases maintained on heterogeneous computers (called servers). The latter
are located at different geographic sites, and their databases contain different
data types and often use incompatible data formats. The simultaneous,
distributed search is possible because clients and servers agree on a standard
document addressing scheme and adopt a common communications protocol
that accommodates all the data types and formats used by the servers.
Communication with other wide-area services using different protocols is
accomplished by routing through so-called gateways capable of protocol
translation. The architecture of a typical networked information system is
illustrated in Figure 5. Several representative clients are shown: a "dumb"
terminal (i.e., one with no internal processor), a personal computer (PC), and
Macintosh (trademark; Mac), and NeXT (trademark) machines. They have
access to data on the servers sharing a common protocol as well as to data
provided by services that require protocol conversion via the gateways. Network
news is such a wide-area service, containing hundreds of news groups on a
variety of subjects, by which users can read and post messages.

Evolving information-retrieval techniques, exemplified by an experimental
interface to the NASA space shuttle reference manual, combine natural
language, hyperlinks, and key-word searching. Other techniques, seeking higher
levels of retrieval precision and effectiveness, are studied by researchers
involved with artificial intelligence and neural networks. The next major
milestone may be a computer program that traverses the seamless information
universe of wide-area electronic networks and continuously filters its contents
through profiles of organizational and personal interest: the information robot of
the 21st century.

Contents of this article:

Introduction
General considerations
Basic concepts
Information as a resource and commodity
Elements of information processing
Acquisition and recording of information in analog form
Acquisition and recording of information in digital form
Recording media
Recording techniques
Inventory of recorded information
Primary and secondary literature
Databases
Organization and retrieval of information
Description and content analysis of analog-form records
Description and content analysis of digital-form information
Machine indexing
Semantic content analysis
Image analysis
Speech analysis
Storage structures for digital-form information
Query languages
Information searching and retrieval
Information display
Video
Print
Printers
Microfilm and microfiche
Voice
Dissemination of information
Information systems
Impact of information technology
Analysis and design of information systems
Categories of information systems
Management-oriented information systems
Administration-oriented information systems
Service-oriented information systems
Computer-integrated manufacturing
Transaction-processing systems
Expert systems
Public information utilities
Impact of computer-based information systems on society
Effects on the economy
Effects on governance and management
Effects on the individual
Bibliography
Concepts of information and information systems
Information processing
Organizational information systems
Public information utilities
Impact of information systems
Bibliographic sources

Information only adds value to your organization if people can find the content
they need, when they need it. Your users need the tools to search, navigate and
view mission-critical information—whether it’s stored in a structured database
down the hall, on a Web server across the street, or in a word processing
document saved on a file server half-way around the world. They need an
intuitive solution that can keep up with the increasing amount of information
they create and use every day. They need the power of Verity K2 Enterprise.

Connects the Right Users with the Right Content at the Right Time

The most accurate, scalable infrastructure available to power corporate portals,
Verity K2 Enterprise gives your users the tools they need to turn information
overload into competitive advantage. K2 Enterprise delivers rapid, relevant
information retrieval with Verity’s advanced search, while its Intelligent
Classification features let you organize information the way you organize your
business. This lets your users navigate directly to the information they need
through K2 Enterprise’s advanced user interfaces.

Behind your users’ browsers, K2 Enterprise’s open design ensures rapid
integration with your existing e-business environment, while its scalable
architecture gives your portal unlimited growth and reliable fault-tolerance.
Regardless of how many documents are being searched or how many users are
searching them, K2 Enterprise scales linearly with zero performance
degradation. And its global support extends your portal to 24 languages and
provides the flexibility to distribute content administration to the local offices
that created and know it best.

Advanced Search

If your users can’t find information, they can’t act on it. That’s why the
advanced Verity search, navigation, and viewing technologies that K2 Enterprise
incorporates are so important to the success of your business. Using the robust
Verity Query Language, you can implement these transparently to put the power
of sophisticated queries behind simple, one-word searches. Novice users can get
accurate results without using complex query syntax or understanding your
corporate taxonomy. Features like smart correction of user errors, stemming
expansion, query-by-example and automatic summarization guide your users to
the information they need—even if they misspell search terms or don’t know
where to start looking.
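The general idea behind error correction and stemming expansion can be sketched as below. This is not Verity's implementation, which the text does not describe: the vocabulary is invented, the stemmer is deliberately crude, and spelling correction relies on `difflib.get_close_matches` from the Python standard library.

```python
import difflib

VOCABULARY = ["search", "searching", "searched", "portal", "portals", "taxonomy"]

def naive_stem(word):
    """Strip a few common English suffixes (a deliberately crude stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def correct(word, vocabulary):
    """Return the closest vocabulary word, or the word itself if none is close."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.7)
    return matches[0] if matches else word

def expand(word, vocabulary):
    """Correct the term, then return every vocabulary word sharing its stem."""
    fixed = correct(word.lower(), vocabulary)
    stem = naive_stem(fixed)
    return sorted(v for v in vocabulary if naive_stem(v) == stem)

print(expand("serching", VOCABULARY))
```

A one-word query containing a typing error is thus widened to the related word forms before the index is consulted, which is the effect the feature list describes from the user's side.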

Intelligent Classification

Portals powered by Verity K2 Enterprise can do more than search and retrieve
specific information for your users. They can automatically organize your
information assets to make them easier for your users to browse. Unlike
automatic classification methods that rely solely on statistical clustering
algorithms to group documents, Intelligent Classification combines machine
efficiency with human intellect. Subject matter experts can refine the rules
created by computers to apply business logic that can only be understood by the
human mind.

Advanced User Interfaces

Effective portal solutions make information as easy to find and retrieve for
novice users as it is for experts familiar with advanced search techniques and
corporate taxonomies. Verity K2 Enterprise provides your users with advanced
user interfaces that make both unstructured and structured information assets
readily accessible. For example, you can create directories based on your
corporate terminology through which users can navigate and restrict searches to
find unstructured content. Or you can utilize K2 Enterprise’s parametric search
to let users sort, filter and drill through structured information.
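A parametric search over structured records might look like the following sketch. The product records, field names, and function names are invented for illustration and do not reflect the K2 Enterprise API.

```python
PRODUCTS = [  # invented sample records
    {"name": "Laptop A", "brand": "Acme", "price": 1200, "ram_gb": 16},
    {"name": "Laptop B", "brand": "Acme", "price": 800, "ram_gb": 8},
    {"name": "Laptop C", "brand": "Zenith", "price": 950, "ram_gb": 16},
]

def parametric_search(records, filters=None, sort_by=None, descending=False):
    """Keep records whose fields satisfy every filter, then sort them.

    `filters` maps a field name to a predicate applied to that field's value.
    """
    filters = filters or {}
    hits = [r for r in records if all(test(r[f]) for f, test in filters.items())]
    if sort_by is not None:
        hits.sort(key=lambda r: r[sort_by], reverse=descending)
    return hits

def drill_down(records, field):
    """Count records per value of `field`, as a basis for drill-down links."""
    counts = {}
    for r in records:
        counts[r[field]] = counts.get(r[field], 0) + 1
    return counts

cheap_16gb = parametric_search(
    PRODUCTS, {"ram_gb": lambda v: v >= 16, "price": lambda v: v <= 1000}
)
print([r["name"] for r in cheap_16gb])
print(drill_down(PRODUCTS, "brand"))
```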

Rapid E-business Integration

Verity K2 Enterprise is designed for rapid integration into existing e-business
environments. Its straightforward integration leverages your current
investments, minimizing implementation costs and ensuring project success. The
key is Verity K2 Architecture, which supports technologies such as COM and
Java, and includes a flexible API that provides access to all of its advanced
features. K2 Enterprise also supports the widest range of information and
repositories of any portal solution on the market. These include HTML, XML,
multibyte data, Web and file systems, and ODBC-compliant databases.

Unlimited Growth

Verity K2 Enterprise’s distributed architecture powers your portal with unlimited
growth potential. By brokering searches, you can increase both the amount of
information being searched and the number of users submitting queries—
without any degradation in performance.
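One common way to broker a search, shown here only as an assumption about the general technique and not as the product's design, is to split the collection into partitions, send the query to each, and merge the best hits. The partitions and the scoring rule below are invented, and the partitions are queried one after another for simplicity where a real broker would query them in parallel.

```python
import heapq

def search_partition(partition, term):
    """Score each document in one partition by occurrences of `term`."""
    hits = []
    for doc_id, text in partition.items():
        score = text.lower().split().count(term.lower())
        if score:
            hits.append((score, doc_id))
    return hits

def brokered_search(partitions, term, k=3):
    """Send the query to every partition and keep the k best hits overall."""
    all_hits = []
    for partition in partitions:
        all_hits.extend(search_partition(partition, term))
    # Highest score first; ties are broken by document id for a stable order.
    return heapq.nsmallest(k, all_hits, key=lambda h: (-h[0], h[1]))

PARTITIONS = [
    {"a1": "portal search portal", "a2": "catalog"},
    {"b1": "search", "b2": "portal portal portal"},
]
print(brokered_search(PARTITIONS, "portal"))
```

Adding a partition adds capacity for more documents, but whether performance stays constant as the text claims depends on the hardware behind each partition.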

Fault Tolerant Operation

Verity K2 Enterprise’s brokered search keeps your site up and running by routing
queries to the servers best suited to the task. This distributes load evenly, so
that response time does not suffer because one server sits idle while others are
overloaded, and it isolates hardware failures to deliver uninterrupted service
to your users enterprise-wide, 24 hours a day, seven days a week.
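The routing behaviour described here can be sketched as a broker that sends each query to the healthy server with the fewest queries in progress. The `Server` and `Broker` classes are hypothetical and only illustrate least-loaded routing with failure isolation, not the product's actual mechanism.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.active = 0      # queries currently in progress
        self.healthy = True  # set to False when a failure is detected

class Broker:
    def __init__(self, servers):
        self.servers = servers

    def route(self):
        """Pick the healthy server with the fewest active queries."""
        candidates = [s for s in self.servers if s.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers available")
        chosen = min(candidates, key=lambda s: s.active)
        chosen.active += 1
        return chosen

    def finish(self, server):
        """Record that a query on `server` has completed."""
        server.active -= 1

servers = [Server("s1"), Server("s2"), Server("s3")]
broker = Broker(servers)
print([broker.route().name for _ in range(3)])
```

Marking a server unhealthy removes it from the candidate list, so later queries go to the remaining servers without the user noticing the failure.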

Global Support

Verity K2 Enterprise is the only portal infrastructure that supports true
enterprise-wide and global scale solutions. Features include multiple language
capabilities and built-in flexibility that allows administration to be distributed
across different geographic locations.

Multiple Language Support: All of K2 Enterprise’s components support
multibyte character sets, which allows you to index, classify, search and view
information in 24 Asian and European languages. By partnering with leading
vendors like IBM, Inxight and Basis Technologies, Verity provides best-of-breed
language locales to guarantee that K2 Enterprise always delivers the most
advanced stemming, tokenization and concept extraction available.

Flexible, Distributed Administration: By allowing you to distribute
administration functions across geographic locations, K2 Enterprise puts
administration of content in the hands of the groups that created and
understand it best. Content can be administered on local servers, yet remain
searchable enterprise-wide. Queries are transparently brokered to each local
server, returning relevant results from across your enterprise with the
performance of a single search engine.

The key to success in e-commerce is turning browsers into buyers, faster and
more efficiently than your competitors can. Verity® K2 Catalog gives your e-
commerce portal the power to do just that. By intuitively matching the right
products to the right people, Verity Catalog dramatically increases sales and
creates loyal customers who keep coming back for more.

The most effective, scalable infrastructure available to
power e-commerce portals, Verity K2 Catalog ensures that your customers find
exactly what they’re looking for—and more. Besides providing advanced Verity
search that makes finding products on your site quick and easy, Verity K2
Catalog’s Intelligent Merchandising lets you influence purchasing decisions by
suggesting related products, up-selling and promoting specific merchandise—
adding profitable site stickiness to your online store. Adaptive personalization
features take this a step further by tailoring the online shopping experience
based on customer browsing patterns.

Behind the shelves of your e-store, Verity K2 Catalog’s open design ensures
rapid integration with your existing e-business environment. And its scalable
architecture gives your e-commerce solution the capability to accommodate
unlimited growth of both your catalog and customers with zero performance
degradation. This means your customers can fill their shopping carts fuller and
faster with Verity K2 Catalog—24 hours a day, seven days a week.

Intelligent Merchandising

E-commerce portals powered by Verity Catalog can do more than retrieve
specific products for customers. They can influence purchasing decisions through
sophisticated online merchandising techniques that increase sales and recognize
more revenue. Verity Catalog’s Intelligent Merchandising leverages Verity’s
Intelligent Classification technology to create online aisles through which you
can guide your customers directly to the products you want to sell them. Or you
can employ it to build business rules that promote overstocked products,
recommend items that complement the ones your customers are looking for, or
suggest substitutes for out-of-stock merchandise.
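The three kinds of business rule mentioned above (promote overstocked products, recommend complements, suggest substitutes for out-of-stock items) could be expressed along these lines. The catalog data, field names, and the overstock threshold are all invented for this sketch.

```python
CATALOG = {  # invented sample data
    "tent":    {"stock": 0,   "complements": ["stakes"], "substitute": "tarp"},
    "tarp":    {"stock": 40,  "complements": [],         "substitute": None},
    "stakes":  {"stock": 15,  "complements": [],         "substitute": None},
    "lantern": {"stock": 500, "complements": [],         "substitute": None},
}
OVERSTOCK_THRESHOLD = 100  # assumed cutoff for "overstocked"

def merchandise(product, catalog):
    """Apply three simple business rules to the product a customer is viewing."""
    item = catalog[product]
    suggestions = {"substitute": None, "complements": [], "promotions": []}
    # Rule 1: suggest a substitute when the product is out of stock.
    if item["stock"] == 0 and item["substitute"] in catalog:
        suggestions["substitute"] = item["substitute"]
    # Rule 2: recommend complementary items that are in stock.
    suggestions["complements"] = [
        c for c in item["complements"] if catalog[c]["stock"] > 0
    ]
    # Rule 3: promote overstocked products other than the one being viewed.
    suggestions["promotions"] = sorted(
        name for name, info in catalog.items()
        if info["stock"] > OVERSTOCK_THRESHOLD and name != product
    )
    return suggestions

print(merchandise("tent", CATALOG))
```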

Profitable Site Stickiness

Site stickiness isn’t just about keeping customers on your site longer. It’s about
keeping them longer because they’re spending more money. Verity K2 Catalog
profitably increases your site’s "stickiness" with intuitive, accurate search. This is
one of the key advantages of portals powered by Verity—because if your
customers can’t find what they’re looking for with a few clicks of their mouse,
you’ll lose them to a site where they can.

Rapid E-Business Integration

Verity K2 Catalog is designed to fit within existing e-business environments. Its
rapid integration leverages your current investments by minimizing
implementation costs and decreasing time-to-market. In addition, only Verity K2
Catalog gives administrators the control and flexibility necessary to deliver the
organized, relevant information customers need to make quick, informed
purchasing decisions without costly administrative overhead or expensive
content repurposing.

Adaptive Personalization

Instead of relying on static user profiles, Verity personalizes the online shopping
experience by dynamically adapting to each search based on past queries and
customer preferences. Specific products can be promoted based on previous
purchasing history to provide the right match between products and customers—
whether they’re shopping for themselves or someone else.

Unlimited Growth

Verity K2 Catalog’s scalability, fault-tolerance and wide range of supported data
are the foundation of a solid e-commerce portal. This means your customers can
rely on you to sell them the products they want, when they want them—no
matter how many people are shopping at your site. And you can grow your e-
commerce business one customer—or one million customers—at a time.

Scalability: Expand your catalog and handle more queries as your customer
base grows, without any degradation in performance.

Fault-Tolerant Operation: Verity K2 Catalog’s brokered search keeps your site
up and running by routing queries to the servers best suited to the task. This
distributes load evenly, so that response time does not suffer because one
server sits idle while others are overloaded, and it isolates hardware failures
to deliver uninterrupted service to your customers 24 hours a day, seven days a
week.

Structured and Unstructured Information: Verity K2 Catalog supports the
widest range of both structured and unstructured information and repositories of
any portal solution on the market: HTML, XML, multibyte data, Web and file
systems and ODBC databases.

Multiple Language Support: Optional Verity Locales give K2 Catalog the
power to sell your products in 24 Asian and European languages by recognizing,
filtering, indexing and searching selected international character sets.

SIM is ideally suited to web site content management, especially for web sites
that need:
• Management of structured documents,
• Large data volumes (up to millions of documents),
• Web-based workflow and release control, including the ability to preview
changes and additions in place in the web site,
• Tightly integrated searching and table of contents support,
• Media asset management, where multimedia objects are cataloged with Dublin
Core metadata and managed as a collected resource for the site,
• Dynamic presentation of documents, which allows for customization based
on user needs,
• Hypertext link creation and multimedia object embedding implemented
independently of any word-processing package, greatly reducing integration
costs for new editing packages,
• Hypertext link management that tracks all links, allowing change impact
analysis and easy "what points at me?" checking,
• A choice of editing packages and approaches, including MS Word, XML
editors, SGML editors, HTML fill-in form support, and direct XML editing
through a fill-in form (for administrators).
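The link-management requirement in the list above, tracking every link so that "what points at me?" and change-impact questions can be answered, amounts to keeping a reverse index of links. The sketch below is a generic illustration with a hypothetical `LinkRegistry` class; it does not describe how SIM stores links.

```python
from collections import defaultdict

class LinkRegistry:
    """Track hypertext links in both directions."""

    def __init__(self):
        self.outgoing = defaultdict(set)  # source -> targets
        self.incoming = defaultdict(set)  # target -> sources

    def add_link(self, source, target):
        self.outgoing[source].add(target)
        self.incoming[target].add(source)

    def what_points_at(self, target):
        """Documents that link directly to `target`."""
        return sorted(self.incoming.get(target, set()))

    def impact_of_change(self, target):
        """Documents that reach `target` directly or through other links."""
        affected, stack = set(), [target]
        while stack:
            for source in self.incoming.get(stack.pop(), set()):
                if source not in affected:
                    affected.add(source)
                    stack.append(source)
        return sorted(affected)

registry = LinkRegistry()
registry.add_link("index", "guide")
registry.add_link("guide", "spec")
registry.add_link("faq", "spec")
print(registry.what_points_at("spec"))
print(registry.impact_of_change("spec"))
```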

Public reference sites

To see the output of this web management system, visit the Textile Clothing and
Footwear Australia site at TCFOZ

This site is maintained by non-technical content editors, who create content
using XMetaL.

Another web site running with SIM Web site content management is Standards
Australia, who wrote extensions to the SIM system in the ACE programming
language to meet their particular needs. Standards Australia use MS Word as
their editing package, using the SIM RTF->XML translator to convert and
manage those documents in XML format.

Key Characteristics

Web Server: SIM Web server – multithreaded server – ACE used for application
logic.

Platforms: Windows NT, Solaris

Code Base: ACE (SIM scripting language – object-oriented, Java-like language
with SGML/XML support).

User Interface: All user interfaces are provided with a standard web browser.
Editing package is configurable.

Database Used: SIM Content Management Server – text retrieval database
with SGML/XML native support. ODBC support is included in ACE, so content
from other sources can be integrated.

Authentication Mechanism: The Web content management system currently
maintains its own internal user database, but is being extended to support LDAP
lookup for user authentication.

StyleSheet mechanism: The ACE language is ideally suited to XML->HTML
conversion processing, as it is integrated with SGML and XML parsers (such as
EXPAT, and sgmlp). The Web content management system does not currently
support XSLT for stylesheeting, but the SIM group does have an XSLT engine in
beta test, and it will be added as a supplementary mechanism in the future. One
of the advantages of using ACE for stylesheeting is that it has powerful text
manipulation features as well as XML/SGML support.
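ACE syntax is not shown in this document, so the following sketch uses Python's standard xml.etree.ElementTree module to illustrate the general shape of a programmatic XML->HTML conversion. The element names and the tag mapping are invented for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from source element names to HTML tags.
TAG_MAP = {"article": "div", "title": "h1", "para": "p", "emph": "em"}

def to_html(elem):
    """Recursively convert an element, preserving the order of text and children."""
    tag = TAG_MAP.get(elem.tag, "span")
    parts = [f"<{tag}>", escape(elem.text or "")]
    for child in elem:
        parts.append(to_html(child))
        parts.append(escape(child.tail or ""))
    parts.append(f"</{tag}>")
    return "".join(parts)

SOURCE = "<article><title>Intro</title><para>Uses <emph>XML</emph> &amp; SGML.</para></article>"
print(to_html(ET.fromstring(SOURCE)))
```

Because the conversion is ordinary program code, arbitrary text manipulation can be mixed into it, which is the advantage the text claims for a scripting-language approach over a purely declarative stylesheet.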

Workflow Support: Simple workflow support and release control is included.
Documents can exist in a number of states including draft, pending review,
released, suspended and deleted. Documents can be previewed on the web
while in any status other than released – released documents are visible to other
users. For complex workflow support, the SIM DMS (Document Management
System) is available. This application supports complex workflows conforming to
the Workflow Management Coalition standard, with a web user interface. The SIM
DMS product is separately licensed and is currently in Beta release.
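The simple workflow described above can be pictured as a small state machine. The five states come from the text; the allowed transitions between them are not specified there and are assumed here purely for illustration.

```python
# States are taken from the text; the allowed transitions are assumed.
TRANSITIONS = {
    "draft": {"pending review", "deleted"},
    "pending review": {"draft", "released", "deleted"},
    "released": {"suspended", "deleted"},
    "suspended": {"released", "deleted"},
    "deleted": set(),
}

class Document:
    def __init__(self, title):
        self.title = title
        self.state = "draft"

    def move_to(self, new_state):
        """Change state only if the transition is allowed."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state!r} to {new_state!r}")
        self.state = new_state

    def visible_to_other_users(self):
        """Only released documents are visible to other users."""
        return self.state == "released"

    def previewable(self):
        """Per the text, any status other than released can be previewed."""
        return self.state != "released"

doc = Document("Installation guide")
doc.move_to("pending review")
doc.move_to("released")
print(doc.state, doc.visible_to_other_users())
```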

SIM Documentation Management Solution

One of the keys to successful electronic delivery of technical documentation is
the ability to re-use content, that is, deliver content in a number of different
ways from a single source. This allows the same document and document
components to be used over and over again. Re-use guarantees consistency:
every user sees the same, correct version of a document. Re-use means
efficiency: a document is written once only. Re-use allows for refinement: a
document can be developed over time. It also allows, for example, different
customized views of the same source documentation to be delivered to different
classes of users; similarly, it allows the same source documentation to be
delivered in multiple formats.

A Documentation Management Scenario

Consider, for example, a company that is producing a set of technical documents
that are to be delivered to a number of different clients. Internet based delivery
of the documents is one of the requirements; as a consequence, changes to any
document will be immediately provided to customers via the web. Although the
content to be delivered to each client may be substantially the same, there will
typically be some differences. These differences may result from variations in
the products the technical documents are describing. Also, the clients may wish
to add annotations to their documentation, to reflect, for example, field
knowledge obtained in using the manuals to repair various problems. In these
cases, the annotations may represent valuable intellectual property of each of
the clients and customers will require that access to them be restricted to their
own personnel. Thus the document repository to be delivered to the clients will
generally consist of a core of common content, with additional content that is
private to specific clients.

Documentation Components

Managing database content is more than just storing the raw text of documents
and their accompanying figures. Documents can have internal structure, and
there can be an external structure relating separate documents. For example,
documents are often interlinked in a number of ways and these links are
essential parts of the document content. When searching for documents, users
often scan indexes to browse the terms contained in the document repository;
these terms constitute the vocabulary of the document collection. Sophisticated
users may also need to know the frequency of each of these terms in the
document collection when conducting searches, in order to produce more
effective queries. Documents can also have associated metadata that provides
information about the document, such as author, or status, or security level.
Metadata, too, can be used to drive more productive searches.

Customized Delivery and Effectivity

It is essential that the electronic publishing system deliver the correct document
content, links and vocabulary to each class of users accessing the system. The
need to provide an accurate snapshot of the database contents (i.e. text,
figures, links and vocabulary and term frequencies) for each particular class of
users is referred to as effectivity. Efficient provision of effectivity requires very
sophisticated text database support.
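
The idea of effectivity can be illustrated with a minimal Python sketch. It assumes each component carries a set of user classes allowed to see it (an empty set meaning common core content), and that the vocabulary is recomputed over the visible snapshot only. The names Component, snapshot and vocabulary, and the sample data, are hypothetical and do not describe how SIM implements effectivity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    doc_id: str
    text: str
    audience: frozenset = frozenset()  # empty set = common core, visible to all


def snapshot(components, user_class):
    """Return the components visible to one class of users."""
    return [c for c in components if not c.audience or user_class in c.audience]


def vocabulary(components):
    """Term frequencies computed only over the components passed in."""
    counts = {}
    for c in components:
        for term in c.text.lower().split():
            counts[term] = counts.get(term, 0) + 1
    return counts


# Illustrative repository: one common manual plus private client annotations.
repository = [
    Component("manual-1", "replace the pump seal"),
    Component("note-a", "pump seal fails in cold weather", frozenset({"client_a"})),
    Component("note-b", "use torque wrench on pump", frozenset({"client_b"})),
]

view_a = snapshot(repository, "client_a")
print([c.doc_id for c in view_a])
print(vocabulary(view_a)["pump"])
```

In this sketch, client A's vocabulary contains no terms that occur only in client B's private annotations, which is the property the text asks of an accurate snapshot.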

Automatic Tables of Content

Another requirement for technical documentation is the ability to
dynamically produce tables of content (TOCs) for each document from the XML
document structure and content. Technical documents are often long, so that
when viewing a fragment of a document, it is important to understand the
location of that fragment in the context of the whole document. This can be
achieved by displaying the TOC along with a document fragment, when the
fragment is displayed. Since the documents change over time, it is necessary to
generate these TOCs dynamically when the document is viewed.
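
As a rough illustration, the following Python sketch builds a TOC from the XML structure at viewing time and marks the fragment currently on display. The element names (manual, section, title) and the id attribute are assumptions made for the example, not a schema taken from any particular product.

```python
import xml.etree.ElementTree as ET

# Hypothetical document structure, invented for illustration.
SOURCE = """
<manual>
  <title>Pump Service Manual</title>
  <section id="s1"><title>Safety</title></section>
  <section id="s2"><title>Maintenance</title>
    <section id="s2.1"><title>Seals</title></section>
    <section id="s2.2"><title>Bearings</title></section>
  </section>
</manual>
"""


def build_toc(root, current_id=None):
    """Walk nested <section> elements and return indented TOC lines.

    The section whose id equals current_id is flagged with '>'.
    """
    lines = []

    def walk(element, depth):
        for section in element.findall("section"):
            marker = ">" if section.get("id") == current_id else " "
            title = section.findtext("title", default="(untitled)")
            lines.append(f"{marker} {'  ' * depth}{title}")
            walk(section, depth + 1)

    walk(root, 0)
    return lines


toc = build_toc(ET.fromstring(SOURCE.strip()), current_id="s2.1")
print("\n".join(toc))
```

Because the TOC is derived from the document each time it is viewed, an edit that adds or renames a section is reflected without any separate TOC maintenance.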

Dynamic Update

Technical documentation can involve very large document collections, which
must be updated dynamically. This means that the delivery systems must
provide a scalable solution, one that is able to update and deliver content
efficiently for fast-growing document collections.

Key Points

In summary, some of the key requirements for a technical document delivery
system include:

• Repurpose content; for example, support multiple delivery formats from a
single source,
• Manage all components of documentation, including content, images,
internal structure, links, vocabulary and metadata,
• Support effectivity, namely deliver a database snapshot appropriate to
each class of users,
• Provide dynamic tables of content (TOCs) from the XML document
structure and content,
• Update and deliver documents quickly and efficiently,
• Provide powerful navigation, searching and viewing, and
• Provide scalable solutions.

SIM Legislation Management Solution


The Nature of Legislation

The law is both complex and comprehensive. Not surprisingly, legislation
databases are examples of large, very structured text collections. For example, a
single Act of Parliament might be broken into many tens or hundreds of
numbered sections, which in turn are broken into numbered subsections or
paragraphs or sub-paragraphs. In large Acts these sections are grouped into
chapters, parts, divisions and/or subdivisions, each with a label, and usually a
section or title. A formal system of reference (or citation) allows each
component of the database to be identified clearly and unambiguously.

Amendments to Legislation

An important characteristic of legislation is that it changes over time. Sections or
even larger units can be added, removed or altered. New law may be handed
down to become legislation, creating a new principal Act where no Act previously
existed. Existing legislation may undergo a complete restructuring, creating a
new Act or Acts, replacing those previously in place. In between such creation
and replacement, amending Acts can specify alterations to the principal Acts,
perhaps changing the wording of one or two sections, or replacing complete
sections, or even removing or inserting whole parts or chapters.

Legislation's Temporal Nature

Although only the principal Acts and the amending Acts have legal force, lawyers
and legal researchers need access to the law as it existed during the time period
relevant to their particular problem. From time to time, authorized Government
bodies issue consolidations of particular Acts. A consolidation represents current
law, presenting the principal Act as modified by the relevant amending Acts;
that is, with all additions, deletions, and changes to wording applied, and with all
new components inserted. However, lawyers are often interested in the state of
the law at times other than those for which officially released consolidations are
available. Ideally, they would like to access consolidations of the law at arbitrary
points in time.

Representing Structure with XML


The use of XML solves the problem of how to represent the structured text
inherent in legislation. XML defines an abstract grammar for representation and
exchange of text with tags interspersed throughout the text. A DTD (Document
Type Definition) is a particular XML grammar describing which document
components are valid and what sub-components they can contain. Acts from a
given jurisdiction can be stored in XML in a format satisfying a particular DTD
(which would state that every Act must contain sections and each section must
contain text, or two or more subsections, and so on). One would then describe
how to display a particular Act that satisfied the DTD by describing the
presentation in terms of the DTD. A number of different presentation schemes
can be described for a single DTD so that one might specify a presentation which
only displays the table-of-contents to a specified depth, as well as a presentation
for the whole Act. This is one of the advantages most often cited for using XML:
the ability to reuse the same information for multiple purposes.

Long-term Availability

For information like legislation that continually changes over time, XML provides
a safe format for the archiving of documents. Utilities such as word processors
often use proprietary formats and are unable to read legacy documents, even
those authored by a previous version of the same word processor. These problems
do not exist if XML is used, because only the content and structure of documents
are represented by XML; the presentation of documents is treated separately.

An End-to-End Solution

Because the structure and content of the legislation are available to the
application, in a form separate from presentation information, it is possible to
develop powerful end-to-end solutions, not easily achievable if proprietary data
representation standards are used. Using the Structured Information Manager, a
legislation drafting and access system called EnAct was developed for the State
Government of Tasmania in Australia. EnAct solves the second problem listed
above for legislation databases: the ability to search legislation databases at an
arbitrary point in time and view the correct consolidation of an Act at that point
in time. Note that accessing legislation databases does not only involve viewing
text. Legislation databases consist of a large number of interrelated documents
linked together by hyperlinks. Viewing a consolidation of legislation at a
particular point in time involves retrieving the correct text as well as the correct
hyperlinks at that point in time.
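
One way to picture point-in-time consolidation is to store every wording of every section together with the period during which it was in force, and to select the wordings in force on the requested date. The Python sketch below assumes exactly that model; the SectionVersion class, the consolidate function and the sample Act are invented for illustration and are not a description of how EnAct or SIM stores legislation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SectionVersion:
    section: str                      # citation, e.g. "s2"
    text: str
    valid_from: date                  # date this wording commenced
    valid_to: Optional[date] = None   # exclusive end date; None = still in force


def consolidate(versions, as_at):
    """Return {section: text} for the wording in force on the date as_at."""
    in_force = {}
    for v in versions:
        if v.valid_from <= as_at and (v.valid_to is None or as_at < v.valid_to):
            in_force[v.section] = v.text
    return dict(sorted(in_force.items()))


# Hypothetical history: s2 is amended in 1995 and s3 is repealed in 1992.
history = [
    SectionVersion("s1", "This Act may be cited as the Example Act.", date(1990, 1, 1)),
    SectionVersion("s2", "The fee is $10.", date(1990, 1, 1), date(1995, 7, 1)),
    SectionVersion("s2", "The fee is $25.", date(1995, 7, 1)),
    SectionVersion("s3", "Repealed provision.", date(1990, 1, 1), date(1992, 1, 1)),
]

print(consolidate(history, date(1991, 6, 30)))
print(consolidate(history, date(2000, 1, 1)))
```

Hyperlinks between provisions could be given validity periods in the same way, so that a consolidation retrieves the links in force on the chosen date as well as the text.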

SIM, XML, and Legislation: an Ideal Partnership

The EnAct system exemplifies the direction in which legislation databases will
develop in the future, namely providing access to the correct state of the law
at any point in time. EnAct is able to achieve this goal because the legislation is
maintained in XML, allowing access to the structure and content of data, and
because the SIM document management system, used for the development of
EnAct, efficiently performs the operations on XML content required to achieve
automatic consolidations.

SIM Intelligence Applications

In intelligence applications, it is normal to build and maintain an information
repository fed from a number of sources and then conduct searches in order to
locate relevant information. Such information repositories are in use in both
military and commercial applications. Where the information is highly structured,
conventional database management systems are used to maintain these data
warehouses. Where the information consists of text and metadata, systems with
advanced text database capabilities are required.

Large Scale, Dynamic Applications

In these applications, the information repository can range from a few gigabytes
in size to hundreds of gigabytes or more. The repository may be static or, more
typically, continually growing. For example, in the case of a news feed, more
than one gigabyte of new data can arrive every day. Other
application areas may need to handle even greater dataflow. Some applications
also need to migrate non-current data for archiving. For all large-scale high-load
intelligence applications, high performance hardware/software architectures,
such as multiprocessor Unix workstations, have to be deployed.

Building Information Repositories

The most important task when building an intelligence application is building and
maintaining the information repository. When a new document is inserted into
the repository, every word in the document must be extracted and indexed. This
is a very expensive operation as a document may contain several thousand
words. And, as noted, the amount of information to process can be very large
indeed. SIM has been optimized for just such high volume environments,
handling the update process as efficiently as is possible.
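
The cost described above comes from the structure being maintained: an inverted index that maps every term to the documents and word positions where it occurs. The following minimal Python sketch shows the idea; the InvertedIndex class and the sample documents are illustrative only and make no claim about SIM's internal index format.

```python
import re
from collections import defaultdict


class InvertedIndex:
    def __init__(self):
        # term -> {doc_id: [word positions within that document]}
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        """Extract every word of the document and record where it occurs."""
        for position, term in enumerate(re.findall(r"\w+", text.lower())):
            self.postings[term].setdefault(doc_id, []).append(position)

    def documents(self, term):
        """Set of documents containing the term."""
        return set(self.postings.get(term.lower(), {}))

    def frequency(self, term):
        """Total number of occurrences of the term in the collection."""
        return sum(len(p) for p in self.postings.get(term.lower(), {}).values())


index = InvertedIndex()
index.add("d1", "Fuel convoy sighted near the border")
index.add("d2", "Border crossing closed; convoy delayed at the border")
print(sorted(index.documents("convoy")))
print(index.frequency("border"))
```

Every inserted document touches one posting list per distinct word, which is why insertion is expensive and why the update path matters in high-volume environments.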

Another problem is that new documents may be arriving at the same time as the
database is in use for searching. Although many existing text database systems
support fast batch loading of data as an overnight operation when the database
is off-line, they do not allow updates of the repository during the day when the
database is in use. However, for any organization that requires up-to-date
access to the most recent data, or access to its intelligence 24 hours a day,
seven days a week, this is not acceptable. SIM has been specifically designed to
support concurrent updates and queries, thereby providing 24 hour access to
up-to-date information.

Searching Information Repositories

The reason for building an information repository is to provide access to the data
it contains. Since the document collection can be very large, advanced search
techniques are needed to locate desired information. SIM has been developed to
support just such sophisticated searching. Queries can use Boolean logic, word
position information (such as "same sentence", "same paragraph", "within n
words"), document structure, and ranked relevance queries (where the
documents are returned in order of relevance to the query) to locate target data.
These query types can be combined as required. For example, to achieve high
accuracy when querying a collection, a searcher could combine a Boolean query
with a ranked query to identify a subset of the collection that can then be ranked
against a set of ranking terms. Fuzzy matching is also important: for example, it
can be common to have several alternative spellings (or misspellings) of a word.
SIM provides support for fuzzy matching by computing a distance measure
between two terms, so that the presence of alternate spellings need not
frustrate the user's task.
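
The combination of query types can be sketched in a few lines of Python. The example below assumes a Boolean AND filter whose terms are matched fuzzily by Levenshtein edit distance, followed by ranking on a crude term-count score that stands in for a real relevance measure, plus a "within n words" proximity test. All function names and sample documents are hypothetical; SIM's own distance measure and ranking formula are not given in this text.

```python
import re
from collections import Counter


def tokens(text):
    return re.findall(r"\w+", text.lower())


def edit_distance(a, b):
    """Levenshtein distance computed by dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def within(words, a, b, n):
    """True if terms a and b occur within n words of each other."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


def search(docs, must_have, ranking_terms, max_distance=1):
    """Boolean AND filter with fuzzy matching, then rank by term counts."""
    results = []
    for doc_id, text in docs.items():
        words = tokens(text)
        if all(any(edit_distance(t, w) <= max_distance for w in words)
               for t in must_have):
            counts = Counter(words)
            results.append((sum(counts[t] for t in ranking_terms), doc_id))
    return [doc_id for score, doc_id in sorted(results, key=lambda r: (-r[0], r[1]))]


docs = {
    "d1": "Colour sensor fault reported; sensor replaced",
    "d2": "Color sensor calibrated",
    "d3": "Engine fault reported",
}
print(search(docs, must_have=["color"], ranking_terms=["sensor", "fault"]))
print(within(tokens(docs["d1"]), "sensor", "fault", 1))
```

Here the alternate spelling "colour" still satisfies the Boolean condition on "color", and the surviving subset is then ordered by the ranking terms.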

Repository Management

To maintain large, high-performance information repositories, the quality of a
system's database administration capabilities is of the utmost importance. For
very large repositories, it can be desirable to split the data collection over
multiple databases. SIM has the ability to do just that, while retaining the ability
to search each database in parallel. With critical information collections, it is
necessary to be able to back up repositories efficiently and robustly, and to be
able to monitor and refine database performance. SIM provides administration
utilities that are of the very highest quality and reliability, and that deliver the
finest level of control.

A Proven Track Record

SIM provides an advanced, extremely rich, extremely reliable set of capabilities
that support high-performance, secure intelligence applications. SIM has been
successfully adopted by the Departments of Defense of both Australia and the
U.S.A. for managing and searching large repositories of information.

SIM Knowledge Base Solution


Whether in the form of a human service or embodied as a physical product,
ultimately, knowledge is every corporation's stock in trade. The pooled
knowledge of an enterprise is its fundamental capital, its true wealth.
Knowledge management is about leveraging corporate knowledge: identifying it
wherever it may be found, storing it for re-use, and delivering it to where it is
needed. SIM's advanced content management enables organizations to do just
that. By focusing on content, SIM transforms opaque, non-functional documents
into richly structured information sources. SIM's support of sophisticated
content, structure, time, and metadata querying opens up the organizational
knowledge base. And SIM's high-performance database management and web
delivery enable it to deliver the right information to the right people at the right
time.

A Simple Model

Consider an organization that builds two databases over time and matches the
documents from one against the contents of the other. One database represents
knowledge, the other needs. This simple model of a knowledge base is
applicable to many practical situations.

Sample Applications

For example, the human resources department of an organization might have
one database of tasks needing to be performed, and another describing the
qualifications and expertise of current staff. The department wishes to assign the
most appropriate employee to each task. In order to make the assignments, it is
necessary to match the tasks against the expertise database. In order to
determine where an employee may best be deployed, the complementary action
of matching the expertise of the employee against the database of tasks can
be performed. Similar requirements exist in an employment agency, in
real-estate management, and in other information gathering and analysis
applications.

A Detailed Example

Another example that fits this model is the administration of grant applications
by a research body that is responsible for determining which grant applications
should receive funding. A panel of experts has overall responsibility for
recommending applications for funding. In order to do so, it is necessary to
assess each application; accordingly, the panel must assign each application to
an appropriate expert assessor. There are many types of interaction with such a
system. Grant applications are submitted by applicants or by their organizations.
Assessors, possibly from all over the world, must submit their reports and
update their personal details. Members of the panel require full access to
information about applications and assessors. A team of administrators may
need access for general system maintenance or to generate reports.

Matching Information against Needs

There will typically be tens of thousands of assessors and applications covering a
very wide range of research areas. In line with our simple model, a database of
the submitted grant applications and a database of the assessors who may be
approached to review applications must be built. A difficult task facing the panel
of experts is choosing the appropriate assessors to review an application. SIM
can use advanced relevance matching to help with this problem. In this
approach, the text of an application is matched against the expertise of the
potential assessors. Assessors who describe their expertise in terms similar to
those used in an application are likely to be appropriate reviewers. With a single
relevance query, an application can be matched against the complete database
of assessors, and a ranked list of the closest matching assessors returned. The
stronger the correlation between assessor and application, the higher the
assessor is ranked. The panel of experts can then examine this ranked list and
allocate assessors appropriately.
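
The matching step can be sketched with a standard technique, TF-IDF weighting with cosine similarity, used here as a stand-in because the text does not specify SIM's relevance formula. The function rank_assessors and the sample expertise descriptions are invented for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def rank_assessors(application, assessors):
    """Rank assessor profiles by TF-IDF cosine similarity to an application."""
    profiles = {name: Counter(tokens(text)) for name, text in assessors.items()}
    n = len(profiles)
    # Number of profiles in which each term appears.
    doc_freq = Counter(term for counts in profiles.values() for term in counts)

    def weights(counts):
        # Smoothed inverse document frequency: rarer terms weigh more.
        return {t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1)
                for t, c in counts.items()}

    def cosine(u, v):
        dot = sum(w * v[t] for t, w in u.items() if t in v)
        norm = (math.sqrt(sum(w * w for w in u.values()))
                * math.sqrt(sum(w * w for w in v.values())))
        return dot / norm if norm else 0.0

    query = weights(Counter(tokens(application)))
    scored = [(cosine(query, weights(counts)), name)
              for name, counts in profiles.items()]
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


assessors = {
    "Assessor A": "text retrieval, inverted indexes, ranked queries",
    "Assessor B": "protein folding and molecular dynamics",
    "Assessor C": "database indexes and query optimisation",
}
application = "Efficient ranked retrieval over compressed inverted indexes"
print(rank_assessors(application, assessors))
```

The whole application text acts as the query, so no one has to reduce it to keywords by hand before candidate assessors can be listed.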

Knowledge Base Requirements

To develop such a knowledge base, the three important requirements are:

• A sophisticated content management system with advanced information
retrieval and relevance matching capabilities,
• Web-based access to accommodate users who will be geographically
dispersed throughout the world, and
• High performance, including the ability to handle large volumes of data, and
the ability to cope with heavy, peak interactive loads.

SIM and Knowledge Base

SIM technology has been successfully deployed to build knowledge databases.
Our experience has been that the provision of web-based access has meant that
the application is readily available to users, and the use of relevance matching
between databases has led to significantly improved decision making within the
organization.

SIM, the Structured Information Manager, delivers the enabling technology for
the key components of knowledge management: storing knowledge, ensuring
that it can be located, and delivering it to where it's needed, when it's needed.

SIM Metadata Repository Management Solution

SIM MetaSite is a comprehensive solution for the collection, validation,
classification and searching of metadata. SIM MetaSite forms a metadata
repository which describes a distributed collection of resources, and provides a
powerful browsing and searching interface to that repository.

Both simple and advanced web search interfaces are provided with the SIM
MetaSite product, to satisfy the needs of the general public or specialized users.
Also, two lower-level system interfaces are provided to the repository (one using
HTTP and one using Z39.50) to allow integration of the SIM MetaSite product into
existing environments.

Metadata Repository

SIM MetaSite stores and manages a database of resource metadata. The
resource metadata is stored in a standard XML format (RDF).

The Metadata repository is managed dynamically, and can be updated while
users are querying the system. A web interface is provided for management of
batch operations, and interactive updates, deletes and insertions.

The Metadata repository is searchable and browsable on all Dublin Core fields.
The Metadata repository can also support non-Dublin Core fields in a dynamic
manner – allowing the system to change or evolve as standards change –
without programmer intervention.

SIM MetaSite can handle metadata databases of very large size (>20 Gbytes).
The Metadata repository includes fields for tracking of popularity/usage of
metadata records. This information can be used to improve the visibility of
metadata resources that are visited most often.

In addition to searching on metadata, the Metadata repository is capable of
containing the full text of resources, where searching of full text in combination
with metadata fields is required.

MetaSite User Interface

SIM MetaSite is supplied with a user interface which allows metadata and full
text searching along with thesaurus browsing, all in an easy to use HTML
interface.

The interface allows frames or no-frames, JavaScript or no-JavaScript operation.

The provided user interface is highly configurable, which allows the MetaSite
interface to evolve along with changing or clarified user requirements.
The interface is stateless, yet allows user customization of the operation of the
interface (for example in selection of the thesaurus to be used).

Creation and Loading of Metadata

The MetaSite crawler collects resources from the web in order to build the
metadata repository. Its operation can be controlled in many ways:

• by regular expression for included URLs,
• by regular expression for excluded URLs,
• by MIME type of data to be collected,
• by number of steps that can be followed (depth) from the starting
configuration file URL, and
• by number of steps that can be followed off-site from a valid on-site URL.
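
The controls listed above amount to a small set of rules applied to each candidate URL. The Python sketch below shows one possible shape for them; the CrawlRules class, its field names and the sample patterns are assumptions made for illustration and do not reproduce the MetaSite configuration file format.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class CrawlRules:
    start_url: str
    include: str             # regular expression a URL must match
    exclude: str             # regular expression a URL must not match
    mime_types: tuple        # MIME types worth collecting
    max_depth: int           # steps allowed from the starting URL
    max_offsite_steps: int   # steps allowed away from the starting site

    def allows(self, url, mime_type, depth, offsite_steps):
        """Apply the depth, MIME type, exclude and include tests in turn."""
        if depth > self.max_depth or offsite_steps > self.max_offsite_steps:
            return False
        if mime_type not in self.mime_types:
            return False
        if re.search(self.exclude, url):
            return False
        return re.search(self.include, url) is not None

    def is_offsite(self, url):
        """True if the URL is on a different host from the starting URL."""
        return urlparse(url).netloc != urlparse(self.start_url).netloc


# Hypothetical rule set for a single site.
rules = CrawlRules(
    start_url="http://www.example.org/",
    include=r"^http://",
    exclude=r"/private/|\.zip$",
    mime_types=("text/html", "application/rdf+xml"),
    max_depth=3,
    max_offsite_steps=1,
)
print(rules.allows("http://www.example.org/docs/a.html", "text/html", 1, 0))
print(rules.allows("http://www.example.org/private/b.html", "text/html", 1, 0))
print(rules.is_offsite("http://other.example.com/c.html"))
```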

The crawler is multi-threaded, and can crawl multiple sites simultaneously.
Delays between requests can be configured in order to reduce load on harvested
sites. The crawler conforms to the ROBOTS.TXT standard for inclusion and
exclusion. The crawler understands the RDF standard, and can follow links
expressed in the RDF standard. Because the crawler is open and configurable,
new data types and document types can be supported. Where data is located
locally, the crawler will not duplicate such data, but will record references to the
local data, thus allowing the repository to be populated without data duplication.
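
For readers unfamiliar with the robots.txt convention, the short Python sketch below shows the kind of check a conforming crawler performs before fetching a page. It uses the standard library's urllib.robotparser and an invented robots.txt; the user-agent name "example-crawler" is hypothetical, and this is not a description of the MetaSite crawler's own code.

```python
from urllib.robotparser import RobotFileParser

# An invented robots.txt, parsed from text so that no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(url, user_agent="example-crawler"):
    """True if the site's robots.txt permits this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(may_fetch("http://www.example.org/docs/index.html"))
print(may_fetch("http://www.example.org/private/notes.html"))
```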

The MetaSite crawler also includes a configurable validation program. This
program is usually run after the data has been collected, in a batch mode. The
validation program checks the RDF data for validity, and can perform operations
such as setting default values for metadata fields according to configurable rules
(that is, automatic generation of metadata), checking keyword entries against a
central thesaurus, detecting duplicate data, and translating META tags in
HTML documents into RDF expressed in XML format.
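
The translation from META tags to RDF, together with default values, can be sketched in Python using only the standard library. The example assumes the common convention of META names prefixed with "DC." for Dublin Core fields; the MetaCollector class, the meta_to_rdf function and the sample page are hypothetical and are not the MetaSite validation program.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


class MetaCollector(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        if tag == "meta" and name.lower().startswith("dc.") and attrs.get("content"):
            self.fields.append((name[3:].lower(), attrs["content"]))


def meta_to_rdf(html, resource_url, defaults=None):
    """Build an RDF/XML description of one resource from its META tags."""
    collector = MetaCollector()
    collector.feed(html)
    fields = dict(defaults or {})      # configurable default values
    fields.update(collector.fields)    # harvested values override defaults
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                                {f"{{{RDF_NS}}}about": resource_url})
    for name, value in fields.items():
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


page = '<html><head><meta name="DC.Title" content="Pump Manual"></head></html>'
rdf = meta_to_rdf(page, "http://www.example.org/pump.html",
                  defaults={"language": "en"})
print(rdf)
```

In this sketch a default applies only when the page itself supplies no value for the field.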

Thesaurus Support

MetaSite allows the management of multiple thesaurus databases. These are
used for validation of metadata records, and for browsing and searching within
the user interface. Where multiple thesauri are loaded, users can dynamically
choose which thesaurus to use, depending on their preferences.

Thesaurus access is tightly integrated into the user interface, and helps
significantly in targeting user queries, and in giving the user a sense of the
overall content of the metadata repository. When the user browses through the
thesaurus, the number of records within the metadata repository that
correspond to each category in the thesaurus is displayed. When the user
conducts a search, the search results for each thesaurus category are shown.

Searching accuracy is also enhanced by using the Thesaurus to expand the
user's query to include synonyms for the user's query terms. The query terms
that were included are displayed to the user, and the user can choose to disable
this functionality if they wish.
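
Query expansion of this kind can be sketched as a simple lookup. In the Python example below, the thesaurus contents, the expand_query function and its enabled switch are all invented for illustration; the function returns both the expanded query and the list of added terms so that the additions can be shown to the user.

```python
# Hypothetical thesaurus: preferred term -> synonyms.
THESAURUS = {
    "car": ["automobile", "motor vehicle"],
    "physician": ["doctor"],
}


def expand_query(terms, thesaurus=THESAURUS, enabled=True):
    """Return (expanded terms, terms that were added by the thesaurus)."""
    expanded, added = list(terms), []
    if enabled:
        for term in terms:
            for synonym in thesaurus.get(term.lower(), []):
                if synonym not in expanded:
                    expanded.append(synonym)
                    added.append(synonym)
    return expanded, added


expanded, added = expand_query(["car", "insurance"])
print(expanded)
print(added)
```

Passing enabled=False returns the original terms unchanged, mirroring the user's option to switch expansion off.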

Thesauri are also used for validation of records during the loading process, for
example for checking that restricted vocabularies are adhered to. This can also
be used to map large vocabularies down to restricted vocabularies using the
"alternate" field in the thesaurus. Thesaurus entries are stored in a standard
XML format, making it easy to export and import new thesauri. The thesaurus
records can be maintained dynamically on-line through an
administration web interface. The thesaurus databases themselves are fully
accessible via Z39.50, and the schema used for that access is configurable for
the particular site requirements.

Open and Interoperable

A low-level HTTP interface is provided to SIM MetaSite, which allows embedding
of the MetaSite functionality into other web interfaces. The low-level API allows
access to most of the searching and presentation functionality of SIM MetaSite.
SIM MetaSite is also fully accessible via Z39.50, and the schemas used for that
access can be configured for the particular site requirements; indeed, multiple
Z39.50 schemas can be used simultaneously for MetaSite access.

SIM Web Site Content Management

SIM is ideally suited to web site content management, especially for web sites
that have a need for:

• Management of structured documents,
• Large data volumes (up to millions of documents),
• Web based workflow and release control, including the ability to preview
changes and additions in place in the web site,
• Tightly integrated searching and table of contents support,
• Media asset management, where multimedia objects are cataloged with Dublin
Core metadata and managed as a collected resource for the site,
• Dynamic presentation of documents which allows for customization based
on user needs,
• Hypertext link creation and multimedia object embedding that is
implemented in a completely word-processing package independent
manner, greatly reducing integration costs for new editing packages,
• Hypertext link management that tracks all links, allowing change impact
analysis and easy "what points at me?" checking,
• A choice of editing packages and approaches including MS Word, XML
editors, SGML editors, HTML fill-in form support, and direct XML editing
through a fill-in form (for administrators).
