Natural Language Processing

ASSIGNMENT THREE
Q1. Describe the different levels of knowledge that are necessary for
natural language processing.
Solution: There are five main types of knowledge represented in Artificial
Intelligence systems.
1. Meta Knowledge - knowledge about knowledge and how to acquire it.
2. Heuristic Knowledge - the knowledge of an expert in a field or subject.
3. Procedural Knowledge - knowledge about how to achieve something.
4. Declarative Knowledge - statements that describe a particular object and
its attributes, including some behavior related to it.
5. Structural Knowledge - the relationships that exist between concepts or
objects.
Q2. Define Augmented Transition Network.
Solution: An augmented transition network (ATN) is a type of graph-theoretic
structure used in the operational definition of formal languages. It is used
especially in parsing relatively complex natural languages and has wide
application in artificial intelligence. An ATN can, theoretically, analyze
the structure of any sentence, however complicated.
ATNs build on the idea of using finite state machines to parse sentences.
W. A. Woods in "Transition Network Grammars for Natural Language Analysis"
claims that by adding a recursive mechanism to a finite state model, parsing
can be achieved much more efficiently. Instead of building an automaton for a
particular sentence, a collection of transition graphs is built. Transitions
between these graphs are simply subroutine calls from one state to the
initial state of any graph in the network. A sentence is determined to be
grammatically correct if a final state of the top-level graph is reached by
the last word in the sentence.
This model meets many of the goals set forth by the nature of language in
that it captures the regularities of the language. That is, if there is a process
that operates in a number of environments, the grammar should encapsulate
the process in a single structure. Such encapsulation not only simplifies the
grammar, but has the added bonus of efficiency of operation. Another
advantage of such a model is the ability to postpone decisions. Many
grammars use guessing when an ambiguity comes up. This means that not
enough is yet known about the sentence. By the use of recursion, ATNs solve
this inefficiency by postponing decisions until more is known about a
sentence.
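The idea of transition graphs that call each other like subroutines can be sketched in Python. This is only a rough illustration under stated assumptions: it implements a plain recursive transition network, without the registers and arc tests that make a network "augmented", and the tiny lexicon, the three graphs (`S`, `NP`, `VP`), and the function names `traverse` and `accepts` are hypothetical choices made for the example.

```python
# Hypothetical toy lexicon: word -> word category.
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N",
           "sees": "V", "sleeps": "V"}

# Each graph has a start state, a set of final states, and labelled arcs.
# An arc label is either a word category or the name of another graph.
NETWORKS = {
    "S":  {"start": "s0", "finals": {"s2"},
           "arcs": {"s0": [("NP", "s1")], "s1": [("VP", "s2")]}},
    "NP": {"start": "n0", "finals": {"n2"},
           "arcs": {"n0": [("Det", "n1"), ("N", "n2")], "n1": [("N", "n2")]}},
    "VP": {"start": "v0", "finals": {"v1", "v2"},
           "arcs": {"v0": [("V", "v1")], "v1": [("NP", "v2")]}},
}


def traverse(net_name, state, words, pos):
    """Yield each input position at which graph `net_name` can finish."""
    net = NETWORKS[net_name]
    if state in net["finals"]:
        yield pos
    for label, next_state in net["arcs"].get(state, []):
        if label in NETWORKS:
            # Subroutine call: run the sub-graph, then resume in this graph.
            sub_start = NETWORKS[label]["start"]
            for mid in traverse(label, sub_start, words, pos):
                yield from traverse(net_name, next_state, words, mid)
        elif pos < len(words) and LEXICON.get(words[pos]) == label:
            # Category arc: consume one word of the matching category.
            yield from traverse(net_name, next_state, words, pos + 1)


def accepts(sentence):
    """True if the S graph reaches a final state exactly at the last word."""
    words = sentence.lower().split()
    start = NETWORKS["S"]["start"]
    return any(end == len(words) for end in traverse("S", start, words, 0))


print(accepts("the dog sees a cat"))
print(accepts("the sees dog"))
```

Because `traverse` is a generator, every alternative arc is explored lazily, so a choice between two analyses is only settled once the rest of the sentence has been examined.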
Q6. What are the characteristics of an expert system? List the various
application areas of expert systems.
Solution: Characteristics of an Expert System
Separates knowledge from control.
Possesses expert task-specific knowledge.
Focuses expertise.
Reasons with symbols (which is what all computation is); this relates to the
original problems of machine translation.
Reasons heuristically (cf. expertise above).
Permits inexact reasoning (not essential).
Limited to solvable problems (otherwise the procedure might not terminate).
Application areas: expert systems address areas where the combinatorics is
enormous:
Highly interactive or conversational applications, IVR, voice servers.
Fault diagnosis, medical diagnosis.
Decision support in complex systems, process control, interactive user
guides.
Educational and tutorial software.
Logic simulation of machines or systems.
Knowledge management.
Constantly changing software.
They can also be used in software engineering for rapid prototyping of
applications (RAD): an expert system developed quickly in front of the expert
shows whether the future application is worth programming.
Q7. Explain the rule based system architecture.
Solution: Conventional problem-solving computer programs make use of
well-structured algorithms, data structures, and crisp reasoning strategies to
find solutions. For the difficult problems with which expert systems are
concerned, it may be more useful to employ heuristics: strategies that often
lead to the correct solution, but that also sometimes fail. Expert knowledge is
often represented in the form of rules or as data within the computer.
Depending upon the problem requirement, these rules and data can be
recalled to solve problems. Rule-based expert systems have played an
important role in modern intelligent systems and their applications in
strategic goal setting, planning, design, scheduling, fault monitoring,
diagnosis and so on.
A typical rule-based system has four basic components:
1. A list of rules or rule base, which is a specific type of knowledge base.
2. An inference engine or semantic reasoner, which infers information or
takes action based on the interaction of input and the rule base. The
interpreter executes a production system program by performing the
following match-resolve-act cycle.
Match: In this first phase, the left-hand sides of all productions are
matched against the contents of working memory. As a result a conflict
set is obtained, which consists of instantiations of all satisfied
productions. An instantiation of a production is an ordered list of
working memory elements that satisfies the left-hand side of the
production.
Conflict-Resolution: In this second phase, one of the production
instantiations in the conflict set is chosen for execution. If no
productions are satisfied, the interpreter halts.
Act: In this third phase, the actions of the production selected in the
conflict-resolution phase are executed. These actions may change
the contents of working memory. At the end of this phase,
execution returns to the first phase.
3. Temporary working memory.
4. A user interface or other connection to the outside world through
which input and output signals are received and sent.
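The match-resolve-act cycle can be sketched in a few lines of Python. This is a minimal illustration, not the design of any particular expert system: the `Rule` type, the `run` function, the example facts, and the conflict-resolution strategy (prefer the rule with the most conditions, then the earliest rule) are all assumptions made for the example.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    conditions: frozenset  # left-hand side: facts that must all be present
    additions: frozenset   # right-hand side: facts asserted when the rule fires


def run(rules, facts):
    """Forward-chain until no rule can add anything new to working memory."""
    memory = set(facts)  # temporary working memory
    fired = []
    while True:
        # Match: collect rules whose conditions hold. Rules that would add
        # nothing new are skipped so that the loop terminates.
        conflict_set = [r for r in rules
                        if r.conditions <= memory and not r.additions <= memory]
        if not conflict_set:
            break  # no production is satisfied: the interpreter halts
        # Conflict resolution (assumed strategy): most specific rule wins,
        # ties go to the rule listed first.
        chosen = max(conflict_set, key=lambda r: len(r.conditions))
        # Act: executing the rule changes working memory.
        memory |= chosen.additions
        fired.append(chosen.name)
    return memory, fired


# Hypothetical rule base.
RULES = [
    Rule("flu_rule", frozenset({"has_fever", "has_cough"}),
         frozenset({"flu_suspected"})),
    Rule("rest_rule", frozenset({"flu_suspected"}),
         frozenset({"recommend_rest"})),
    Rule("temperature_rule", frozenset({"has_fever"}),
         frozenset({"check_temperature"})),
]

memory, fired = run(RULES, {"has_fever", "has_cough"})
print(fired)  # ['flu_rule', 'rest_rule', 'temperature_rule']
```

Keeping the rules as data and the cycle in `run` reflects the separation of knowledge from control: the rule list can change without touching the interpreter.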
Q8. Describe and compare the different types of problems solved by
these expert systems: DENDRAL, MYCIN, PROSPECTOR and R1.
Solution:
DENDRAL
Heuristic Dendral is a program that uses mass spectra or other
experimental data, together with a knowledge base of chemistry, to
produce a set of possible chemical structures that may be responsible
for producing the data.
Dendral was an influential pioneering project in artificial intelligence
(AI) of the 1960s, and the name also refers to the expert system
software that it produced. Its primary aim was to help organic chemists
identify unknown organic molecules by analyzing their mass spectra and
using knowledge of chemistry. A mass spectrum of a compound is produced
by a mass spectrometer, and is used to determine its molecular
weight, the sum of the masses of its atomic constituents. For example,
the compound water (H2O) has a molecular weight of about 18, since
hydrogen has a mass of 1.01 and oxygen 16.00 (2 x 1.01 + 16.00 = 18.02),
and its mass spectrum has a peak at 18 units. Heuristic Dendral would use
this input mass and the knowledge of atomic mass numbers and valence
rules to determine the possible combinations of atomic constituents whose
mass would add up to 18. As the weight increases and the molecules
become more complex, the number of possible compounds increases
drastically. Thus, a program that is able to reduce this number of
candidate solutions through the process of hypothesis formation is
essential.
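The generate-and-prune idea can be illustrated with a small Python sketch. It is not Dendral's actual algorithm: it assumes integer nominal masses for four elements only, and it stands in for the valence rules with a single saturation check (a neutral molecule of C, H, N and O can hold at most 2C + N + 2 hydrogens). The names `compositions` and `plausible` are hypothetical.

```python
from itertools import product

# Assumed integer nominal masses for a small set of elements.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16}


def compositions(target):
    """Enumerate every atom count combination whose masses sum to `target`."""
    elements = list(NOMINAL_MASS)
    ranges = [range(target // NOMINAL_MASS[e] + 1) for e in elements]
    results = []
    for counts in product(*ranges):
        mass = sum(n * NOMINAL_MASS[e] for n, e in zip(counts, elements))
        if mass == target:
            results.append({e: n for e, n in zip(elements, counts) if n})
    return results


def plausible(formula):
    """Simplified valence check: hydrogens must not exceed 2C + N + 2."""
    c = formula.get("C", 0)
    h = formula.get("H", 0)
    n = formula.get("N", 0)
    return h <= 2 * c + n + 2


candidates = compositions(18)
print(len(candidates))                            # 4
print([f for f in candidates if plausible(f)])    # [{'H': 2, 'O': 1}]
```

For a mass of 18 the enumeration yields four compositions (H18, H2O, NH4, CH6), and the saturation check leaves only water. For larger masses the candidate list grows quickly, which is why pruning knowledge matters.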
MYCIN
MYCIN was an early expert system that used artificial intelligence to
identify bacteria causing severe infections, such as bacteremia and
meningitis, and to recommend antibiotics, with the dosage adjusted for
the patient's body weight. The MYCIN system was also used for the
diagnosis of blood clotting diseases. MYCIN operated using a fairly
simple inference engine and a knowledge base of ~600 rules. It would
query the physician running the program via a long series of simple
yes/no or textual questions. At the end, it provided a list of possible
culprit bacteria ranked from high to low based on the probability of
each diagnosis, its confidence in each diagnosis' probability, the
reasoning behind each diagnosis (that is, MYCIN would also list the
questions and rules which led it to rank a diagnosis a particular way),
and its recommended course of drug treatment.
PROSPECTOR
Consultation system to assist geologists working in mineral exploration.
Attempts to represent the knowledge and reasoning processes of experts in
the geological domain.
The system has been kept domain independent.
It matches data from a site against models describing regional and local
characteristics favorable for specific ore deposits.
The input data are assumed to be incomplete and uncertain.
PROSPECTOR performs a consultation to determine such things as:
Which model best fits the data?
Where are the most favorable drilling sites located?
What additional data would be most helpful in reaching firmer conclusions?
What is the basis for these conclusions and recommendations?
R1
Rule-based system developed by DEC and CMU to configure VAX computers.
Input is a customer order.
Output is a corrected order with diagrams showing component layout and
wiring suggestions.
1. Check order for missing/mismatched pieces
2. Lay out processor cabinets
3. Put boxes in input/output cabinets and put components in boxes
4. Put panels in input/output cabinets
5. Lay out floor plan
6. Indicate cabling
Q9. What is expert system shell?
Solution: It is a tool for building an expert system: a software package that
includes an inference engine, a knowledge representation language, a user
interface and all the code used by an expert system regardless of the
domain. All we have to do is add the knowledge, i.e., the rules and facts used
by an expert to solve problems in a certain domain.
CLIPS is an example of an expert system shell.
Q10. Give the advantages of expert system architecture based on
decision trees over those of production rules. What are the main
disadvantages?
Solution: Amongst decision support tools, decision trees (and influence
diagrams) have several advantages. Decision trees:
Are simple to understand and interpret. People are able to understand
decision tree models after a brief explanation.
Have value even with little hard data. Important insights can be
generated based on experts describing a situation (its alternatives,
probabilities, and costs) and their preferences for outcomes.
Allow new possible scenarios to be added.
Allow worst, best and expected values to be determined for different
scenarios.
Use a white box model: if a model produces a given result, the conditions
that led to it can be read directly from the tree.
Can be combined with other decision techniques, such as Net Present Value
calculations and PERT 3-point estimations.
Disadvantages of decision trees:
For data including categorical variables with different numbers of
levels, information gain in decision trees is biased in favor of those
attributes with more levels.
Calculations can get very complex, particularly if many values are
uncertain and/or if many outcomes are linked.
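The bias of information gain towards attributes with many levels can be seen in a small Python sketch. The data set below is invented for illustration: `row_id` is a hypothetical attribute with one level per record that carries no real information, while `weather` has two levels. The function names are likewise assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting."""
    total = len(labels)
    remainder = 0.0
    for level in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == level]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Invented toy data: eight records with a yes/no class label.
labels = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
row_id = list(range(8))  # eight levels, one per record
weather = ["sun", "sun", "rain", "rain", "sun", "rain", "rain", "sun"]

print(round(information_gain(row_id, labels), 3))   # 1.0
print(round(information_gain(weather, labels), 3))  # 0.189
```

The record identifier scores the maximum possible gain because every split subset contains a single record, even though it would be useless for classifying new data.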
Q12. Give an example of the use of Meta knowledge in expert
system inference.
Solution: A specific example of an expert system using Meta knowledge is
PXDES, which diagnoses pneumoconiosis, a lung disease, from X-rays. This
expert system uses its inference engine to examine the shadows on
the X-ray. The shadows are used to determine the type and the degree of
pneumoconiosis. The system also includes three other modes: the knowledge
base, the explanation interface, and the knowledge acquisition modes. The
knowledge base mode contains the data of X-ray representations of various
stages of the disease. These elements are in the form of fuzzy production
rules. The explanation interface details the conclusions based on the Meta
knowledge, and the knowledge acquisition mode allows medical experts to add
or change information in the system. This mode also adds details to the
system based on the Meta knowledge. The system hence uses Meta knowledge
to infer.
ASSIGNMENT FOUR
Q1. How does an artificial neural network model the brain? Describe
two major classes of learning paradigms: supervised and
unsupervised learning. What are the features that distinguish these
paradigms from each other?
Solution: The most basic element of the human brain is a
specific type of cell which, unlike the rest of the body, doesn't appear to
regenerate. Because this type of cell is the only part of the body that isn't
slowly replaced, it is assumed that these cells are what provide us with our
abilities to remember, think, and apply previous experiences to our every
action. These cells, all 100 billion of them, are known as neurons. Each of
these neurons can connect with up to 200,000 other neurons, although 1,000
to 10,000 is typical.
The power of the human mind comes from the sheer numbers of these basic
components and the multiple connections between them. It also comes from
genetic programming and learning.
The individual neurons are complicated. They have a myriad of parts, subsystems, and control mechanisms. They convey information via a host of
electrochemical pathways. There are over one hundred different classes of
neurons, depending on the classification method used. Together these
neurons and their connections form a process which is not binary, not stable,
and not synchronous. In short, it is nothing like the currently available
electronic computers, or even artificial neural networks.
These artificial neural networks try to replicate only the most basic elements
of this complicated, versatile, and powerful organism. They do it in a
primitive way.
Supervised learning incorporates an external teacher, so that each
output unit is told what its desired response to input signals ought to be.
During the learning process global information may be required. Paradigms of
supervised learning include error-correction learning, reinforcement learning
and stochastic learning.
An important issue concerning supervised learning is the problem of error
convergence, i.e. the minimization of error between the desired and computed
unit values. The aim is to determine a set of weights which minimizes the
error. One well-known method, which is common to many learning
paradigms, is the least mean square (LMS) convergence.
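Error-correction learning with the LMS rule can be sketched as follows. This is a minimal example under stated assumptions: a single linear unit, a fixed learning rate, and an invented noise-free data set drawn from y = 2x + 1. The function name `lms_train` and its parameters are hypothetical.

```python
def lms_train(samples, learning_rate=0.05, epochs=500):
    """Fit one linear unit by repeatedly nudging weights against the error."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # Computed unit value for the current weights.
            output = bias + sum(w * x for w, x in zip(weights, inputs))
            # The teacher supplies the target; the error drives the update.
            error = target - output
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Invented training set: desired outputs follow y = 2x + 1.
samples = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0), ([3.0], 7.0)]
weights, bias = lms_train(samples)
print(round(weights[0], 3), round(bias, 3))
```

With this data the weight approaches 2 and the bias approaches 1. A learning rate that is too large for the input scale would make the updates diverge instead of converge.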
Unsupervised learning uses no external teacher and is based upon only local
information. It is also referred to as self-organization, in the sense that it
self-organizes data presented to the network and detects their emergent
collective properties. Paradigms of unsupervised learning are Hebbian
learning and competitive learning.
Another aspect of learning concerns whether there is a separate phase, during
which the network is trained, and a subsequent operation phase. We say that a
neural network learns off-line if the learning phase and the operation phase
are distinct. A neural network learns on-line if it learns and operates at the
same time. Usually, supervised learning is performed off-line, whereas
unsupervised learning is performed on-line.
Q2. What is meant by an activation function in an artificial neuron
model? Describe the various activation functions that are employed
and compare their merits and demerits.
Solution: In computational networks, the activation function of a node defines
the output of that node given an input or set of inputs. Most units in a neural
network transform their net inputs by using a scalar-to-scalar function called
an activation function, yielding a value called the unit's activation. Except
possibly for output units, the activation value is fed to one or more other
units. Activation functions with a bounded range are often called squashing
functions. Some of the most commonly used activation functions are:
1) Identity function (Figure 2-2)
The input units use the identity function, f(x) = x. Sometimes the net
input is multiplied by a constant to form a linear function.
Figure 2-2 Identity function

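2) Binary step function (Figure 2-3)
Also known as the threshold function or Heaviside function. The output of
this function is limited to one of two values, depending on whether the net
input $x$ reaches a threshold $\theta$:
$$f(x) = \begin{cases} 1 & \text{if } x \ge \theta \\ 0 & \text{if } x < \theta \end{cases} \qquad (2.4)$$
This kind of function is often used in single-layer networks.
Figure 2-3 Binary step function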

3) Sigmoid function (Figure 2-4)
$$f(x) = \frac{1}{1 + e^{-x}} \qquad (2.5)$$
This function is especially advantageous for use in neural networks
trained by back-propagation, because it is easy to differentiate,
$f'(x) = f(x)(1 - f(x))$, and thus can dramatically reduce the
computational burden of training. It suits applications whose desired
output values are between 0 and 1.
Figure 2-4 Sigmoid function

4) Bipolar sigmoid function (Figure 2-5)
$$f(x) = \frac{1 - e^{-x}}{1 + e^{-x}} \qquad (2.6)$$
This function has properties similar to those of the sigmoid function. It
works well for applications that yield output values between -1 and 1.
Figure 2-5 Bipolar sigmoid function

Activation functions for the hidden units are needed to introduce
non-linearity into the networks. The reason is that a composition of linear
functions is again a linear function. However, it is the non-linearity (i.e.,
the capability to represent nonlinear functions) that makes multi-layer
networks so powerful. Almost any nonlinear function does the job,
although for back-propagation learning it must be differentiable and it
helps if the function is bounded. The sigmoid functions are the most
common choices.
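The four functions can be written out in Python for comparison. The sketch assumes the standard textbook definitions (identity f(x) = x, a step with threshold 0 by default, the logistic sigmoid 1/(1 + e^-x), and the bipolar sigmoid (1 - e^-x)/(1 + e^-x)); the function names are illustrative only.

```python
from math import exp


def identity(x):
    """Unbounded and linear: passes the net input through unchanged."""
    return x


def binary_step(x, threshold=0.0):
    """Outputs 1 at or above the threshold and 0 below; not differentiable."""
    return 1 if x >= threshold else 0


def sigmoid(x):
    """Smooth squashing function with outputs strictly between 0 and 1."""
    return 1.0 / (1.0 + exp(-x))


def sigmoid_derivative(x):
    """The derivative reuses the function value, which keeps training cheap."""
    s = sigmoid(x)
    return s * (1.0 - s)


def bipolar_sigmoid(x):
    """Sigmoid rescaled to the range (-1, 1); equal to 2 * sigmoid(x) - 1."""
    return (1.0 - exp(-x)) / (1.0 + exp(-x))


for x in (-2.0, 0.0, 2.0):
    print(x, identity(x), binary_step(x),
          round(sigmoid(x), 4), round(bipolar_sigmoid(x), 4))
```

The step function gives crisp decisions but no gradient, whereas the two sigmoids are differentiable everywhere and therefore usable with back-propagation. Note that `exp(-x)` overflows for very large negative inputs, so a production version would need a guarded form.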
Q3. What is memory-based learning? Discuss it.
Solution: In machine learning, instance-based learning or memory-based
learning is a family of learning algorithms that, instead of performing explicit
generalization, compare new problem instances with instances seen in
training, which have been stored in memory. Instance-based learning is a kind
of lazy learning.
It is called instance-based because it constructs hypotheses directly from the
training instances themselves. This means that the hypothesis complexity
can grow with the data: in the worst case, a hypothesis is a list of n training
items and the computational complexity of classifying a single new
instance is O(n). One advantage that instance-based learning has over other
methods of machine learning is its ability to adapt its model to previously
unseen data. Where other methods generally require the entire set of training
data to be re-examined when one instance is changed, instance-based
learners may simply store a new instance or throw an old instance away.
A simple example of an instance-based learning algorithm is the k-nearest
neighbor algorithm. Daelemans and Van den Bosch describe variations of this
algorithm for use in natural language processing (NLP), claiming that
memory-based learning is both more psychologically realistic than other
machine-learning schemes and more effective in practice.
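A k-nearest neighbor classifier shows how little machinery memory-based learning needs. The sketch below assumes numeric feature tuples, Euclidean distance, and majority voting; the stored instances and the name `knn_classify` are invented for the example.

```python
from collections import Counter
from math import dist


def knn_classify(memory, query, k=3):
    """Label `query` by majority vote among its k nearest stored instances."""
    # No model is built in advance: every stored instance is compared
    # with the query, so one classification costs O(n) distance computations.
    neighbors = sorted(memory, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Invented training instances: (features, label).
memory = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

print(knn_classify(memory, (1.1, 1.0)))  # a
print(knn_classify(memory, (5.0, 5.2)))  # b

# Adapting to new data is just storing another instance.
memory.append(((9.0, 9.0), "c"))
print(knn_classify(memory, (9.0, 9.1), k=1))  # c
```

The last three lines illustrate the advantage described above: the learner absorbs a new instance without re-examining the rest of the training data.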
Q4. Discuss how neural networks help in solving AI problems. Discuss the
different types of functions of an ANN.
Solution: Neural networks are used to solve a wide variety of problems, some
of which have been solved by existing statistical methods, and some of which
have not. These applications fall into one of the following three categories:
Forecasting: predicting one or more quantitative outcomes from both
quantitative and nominal input data,
Classification: classifying input data into one of two or more categories, or
Statistical pattern recognition: uncovering patterns, typically spatial or
temporal, among a set of variables.
Forecasting, pattern recognition and classification problems are not new.
They existed years before the discovery of neural network solutions in the
1980s. What is new is that neural networks provide a single framework for
solving so many traditional problems and, in some cases, extend the range of
problems that can be solved.
Traditionally, these problems were solved using a variety of widely known
statistical methods:
linear regression and general least squares,
logistic regression and discrimination,
principal component analysis,
discriminant analysis,
k-nearest neighbor classification, and
ARMA and NARMA time series forecasts.
In many cases, simple neural network configurations yield the same solution
as many traditional statistical applications. For example, a single-layer,
feedforward neural network with linear activation for its output perceptron is
equivalent to a general linear regression fit. Neural networks can provide
more accurate and robust solutions for problems where traditional methods
do not completely apply.
A neural network is defined not only by its architecture and flow, or
interconnections, but also by the computations used to transmit information
from one node or input to another node. These computations are determined by
network weights. The process of fitting a network to existing data to
determine these weights is referred to as training the network, and the data
used in this process are referred to as patterns. Individual network inputs are
referred to as attributes and outputs are referred to as classes. Many of the
terms used to describe neural networks are synonymous with common
statistical terminology.
Types of functions:
Step Function
A step function is a function like that used by the original Perceptron. The
output is a certain value, A1, if the input sum is above a certain threshold and
A0 if the input sum is below a certain threshold. The values used by the
Perceptron were A1 = 1 and A0 = 0.
These kinds of step activation functions are useful for binary classification
schemes.
Linear combination
A linear combination is where the weighted sum input of the neuron plus a
linearly dependent bias becomes the system output. Specifically, writing
$\zeta$ for the weighted input sum and $b$ for the bias:
$$y = \zeta + b$$
In these cases, the sign of the output is considered to be equivalent to the 1
or 0 of the step function systems, which enables the two methods to be
equivalent if the step threshold $\theta$ satisfies
$$\theta = -b$$
Continuous Log-Sigmoid Function
A log-sigmoid function, also known as a logistic function, is given by the
relationship:
$$\sigma(t) = \frac{1}{1 + e^{-\beta t}}$$
where $\beta$ is a slope parameter. This is called the log-sigmoid because a
sigmoid can also be constructed using the hyperbolic tangent function
instead of this relation, in which case it would be called a tan-sigmoid. Here,
we will refer to the log-sigmoid as simply sigmoid.
Softmax Function
The softmax activation function is useful predominantly in the output layer of
a classification or clustering system. Softmax functions convert a raw value
into a posterior probability. This provides a measure of certainty. The softmax
activation function is given as:
$$y_i = \frac{e^{\zeta_i}}{\sum_{j \in L} e^{\zeta_j}}$$
where $\zeta_i$ is the raw value of output neuron $i$ and $L$ is the set of
neurons in the output layer.
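A short Python sketch shows how softmax turns the raw values of the output layer into probabilities. It assumes the usual definition (each exponentiated value divided by the sum of exponentials over the output layer); subtracting the maximum first is a common numerical-stability step that does not change the result. The function name and sample values are illustrative.

```python
from math import exp


def softmax(raw_values):
    """Convert raw output-layer values into probabilities that sum to 1."""
    shift = max(raw_values)  # avoids overflow; cancels out in the ratio
    exps = [exp(v - shift) for v in raw_values]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probabilities])
print(sum(probabilities))
```

The largest raw value always receives the largest probability, and the gap between the probabilities indicates how certain the network is about its answer.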
