
Dependency Parsing using Neural Network Classifier

Objective
This project aims to develop a data-driven dependency parser and to improve on the accuracy of the state-of-the-art dependency parser for Telugu. Recent advances in learning methods have improved sequence-learning accuracy; here we use a Recurrent Neural Network variant called Long Short-Term Memory (LSTM) as the classifier, which can improve the overall accuracy of the parser.

Technical Details
Dependency parsing is done by generating a tree whose nodes are the words of the sentence and whose directed links denote the relationship of each word with the other words in the sentence. Each link carries a label which denotes the type of relation between the two words. For example, in the sentence "Ravi ate an apple", the verb "ate" heads an arc to "Ravi" labelled as subject and an arc to "apple" labelled as object.

A transition-based dependency parser consists of two main components: 1. Transition System, 2. Oracle.

1. A transition system is a quadruple S = (C, T, cs, Ct) which maps a sentence to its dependency graph,
where
C is a set of configurations, each of which contains a stack σ of the nodes currently being processed, a buffer β of (remaining) input nodes, and a set A of dependency arcs;
T is a set of transitions (Left-Arc, Right-Arc, Reduce, Shift), each of which is a function t: C → C;
cs is an initialization function, mapping a sentence x = (w0, w1, ..., wn) to an initial configuration;
Ct is the set of terminal configurations of the system, Ct = {c ∈ C | c = (σ, [ ], A)}, i.e. configurations whose buffer is empty.
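
For concreteness, the sketch below shows one possible encoding of such a configuration and of the four (arc-eager) transitions in Python; the class and function names are our own illustrative choices, not taken from any existing parser, and the preconditions on the transitions are omitted.

# Illustrative arc-eager configuration c = (stack, buffer, arcs); words are
# represented by indices, with index 0 reserved for the artificial root w0.
class Configuration:
    def __init__(self, n_words):
        self.stack = [0]                        # sigma: nodes being processed
        self.buffer = list(range(1, n_words))   # beta: remaining input nodes
        self.arcs = set()                       # A: set of (head, label, dependent)

    def is_terminal(self):
        return len(self.buffer) == 0            # c is in Ct iff the buffer is empty

def shift(c, label=None):
    c.stack.append(c.buffer.pop(0))

def reduce(c, label=None):
    c.stack.pop()                               # pop a node that already has a head

def left_arc(c, label):
    dep = c.stack.pop()                         # top of the stack becomes a dependent
    c.arcs.add((c.buffer[0], label, dep))       # of the node at the front of the buffer

def right_arc(c, label):
    c.arcs.add((c.stack[-1], label, c.buffer[0]))
    c.stack.append(c.buffer.pop(0))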

2. Oracle
An oracle for a transition system S = (C, T, cs, Ct) is a function o: C → T.
Given a transition system S = (C, T, cs, Ct) and an oracle o, deterministic parsing can be achieved by the following simple algorithm:
PARSE(x = (w0, w1, ..., wn))
1 c ← cs(x)
2 while c ∉ Ct
3     c ← [o(c)](c)
4 return Gc // the dependency graph defined by the terminal configuration c
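
The same loop can be written directly in Python, reusing the configuration and transition functions sketched above and assuming a hypothetical oracle object (for example, the LSTM classifier proposed here, once trained) whose predict method maps a configuration to a transition function and a relation label:

def parse(words, oracle):
    # words = (w0, w1, ..., wn), with w0 the artificial root node
    c = Configuration(len(words))              # c <- cs(x)
    while not c.is_terminal():                 # while c is not in Ct
        transition, label = oracle.predict(c)  # o(c)
        transition(c, label)                   # c <- [o(c)](c)
    return c.arcs                              # Gc: the arcs of the dependency graph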
The existing MaltParser uses a Support Vector Machine as its oracle (classifier), which is a two-class classifier whose output for one element is independent of the previous elements. Since we have to deal with sequences, a classifier whose output depends on the previous history should be more helpful and more accurate.

Recurrent Neural Network :: RNNs are networks that contain loops, allowing information to persist.

A recurrent unit A takes an input xt and outputs a value ht; the loop allows information to pass from one step to the next. An RNN can therefore be thought of as multiple copies of the same network, each passing a message to its successor. RNNs connect previous information to the current task; sometimes we only need to look at recent information to perform the present task, but there are also cases where we need more context (long-term dependencies). RNNs also suffer from the vanishing or exploding gradient problem, because error signals flowing backwards through time tend to blow up or vanish.
LSTMs are designed explicitly to avoid the long-term dependency problem, remembering information for long periods of time.
The repeating module of a standard RNN has a very simple structure, such as a single tanh layer.
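Written as an equation (the standard formulation, with W and b denoting the layer's weights and bias):

h_t = \tanh(W \cdot [h_{t-1}, x_t] + b)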

LSTMs have the same chain-like structure, but the repeating module is different: instead of a single tanh layer, there are four layers interacting in a very particular way.
The key to LSTMs is the cell state, which runs straight down the entire chain with only some minor linear interactions. The LSTM has the ability to remove or add information to the cell state, regulated by structures called gates.

The first step in our LSTM is to decide what information we are going to throw away from the cell state. This decision is made by a sigmoid layer called the forget gate layer. It looks at ht-1 and xt and outputs a number between 0 and 1 for each number in the cell state Ct-1: a 1 represents "completely keep this", while a 0 represents "completely get rid of this".
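
In the standard formulation (with W_f and b_f the forget gate's weights and bias):

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)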

The next step is to decide what new information we are going to store in the cell state. This has two parts. First, a sigmoid layer called the input gate layer decides which values we will update. Next, a tanh layer creates a vector of new candidate values, C̃t, that could be added to the state. In the next step, we combine these two to create an update to the state.
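
In the standard formulation these two parts are:

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)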

We now update the old cell state Ct-1 to the new cell state Ct. We multiply the old state by ft, forgetting the things we decided to forget earlier, and then add it * C̃t, the new candidate values scaled by how much we decided to update each state value.
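
That is:

C_t = f_t * C_{t-1} + i_t * \tilde{C}_t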

Finally, we need to decide what we are going to output. This output will be based on the cell state, but will be a filtered version of it. First, we run a sigmoid layer which decides what parts of the cell state we are going to output. Then we put the cell state through tanh (to push the values to be between -1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
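
In the standard formulation:

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)
h_t = o_t * \tanh(C_t)

For concreteness, one step of an LSTM cell combining the four gates above can be sketched in NumPy as follows; the function and weight names are our own illustrative choices, and in the project an existing deep-learning library would be used rather than this hand-written version.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    # One LSTM time step; every gate looks at the concatenation [h_{t-1}, x_t].
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(np.dot(W_f, z) + b_f)        # forget gate
    i_t = sigmoid(np.dot(W_i, z) + b_i)        # input gate
    C_tilde = np.tanh(np.dot(W_C, z) + b_C)    # candidate cell values
    C_t = f_t * C_prev + i_t * C_tilde         # update the cell state
    o_t = sigmoid(np.dot(W_o, z) + b_o)        # output gate
    h_t = o_t * np.tanh(C_t)                   # filtered output
    return h_t, C_t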

Advantages of Dependency Parsing


1. Free word order languages can be handled better in a dependency-based framework.

2. Parsing is useful in major NLP applications such as Machine Translation, Dialogue Systems, Question Answering, etc.

References
1. Joakim Nivre. 2008. Algorithms for Deterministic Incremental Dependency Parsing. Computational Linguistics, 34(4).
