You are on page 1of 16

Automated Summarization of Restaurant Reviews

Manoj Pawar, Deepak Mallya


{mpawar,dmallya}@stanford.edu
June 4th, 2010
Introduction
Websites like Yelp.com or we8there.com and other restaurants reviews
websites nowadays have lots of user reviews. Numbers of reviews for each
restaurant are increasing day by day because of accessibility of Internet
everywhere. For each restaurant normally more than 50 reviews are
available. Websites like yelp.com have 5 star rating system along with
reviews but it hardly helps for users to understand what is great and what is
bad for this restaurant. It is also nearly impossible for someone to read all
those reviews. Lot of people wants to make quick decisions about
restaurants for dining or lunch. Reading through all the reviews is not only
cumbersome; it can confuse lot of people about overall positive or negative
sentiments about specific features.

Objective
In this project, we are thinking of providing for each restaurant a set of
features from all available reviews. These features can be anything from
general features like “food”, “ambience” to very specific features like
“Caesar salad”. We are also going to classify them under either Good or Bad
from positive or negative sentiments from reviews.

Overview
We view solution to this problem as 3 steps.
1.feature extraction 2.Sentiment Analysis 3.Classification.

Sentiment
Feature Sentiment
Classificati
ExtractionsFood Analysis
on
Our first
task is to extract important features about the restaurant. Taking out features
like “food”, “ambience”, “staff” etc. is the important part of whole process
of summarization. Once we retrieve these features from reviews we need to
analyze each sentence in which these features are being talked about. Taking
out important sentiment from the sentence and giving it positive or negative
class is second step of the process. Once we are done with each feature and
their sentiments, we need to summarize on these features, classification will
help users to understand what is overall sentiment on a particular feature.

Related Work
Recently many ideas have been proposed on automatic summarization of
reviews. Hu and Liu [1] talked about extracting frequent features and
bootstrapping techniques to analyze sentiments. Popescu and Etzioni [2]
describe relaxation modeling for product features semantics. They have
introduce OPINE system for feature extractions and associating opinions to
these features. Pang and Lee [3] have discussed various different
classification and sentiment analysis method in opining mining. We have
taken simple approach of finding frequents nouns and adjectives from
reviews which gives us a very good idea about features and sentiments on
that. Combination of nouns and adjectives and their pair count helps us to
prune unwanted features. Bootstrapping and WordNet expansions of positive
and negative sentiments along with part of speech structure of the sentence
have helped us in sentiment classification.
Data Collection
We have used reviews provided by Prof Andrew Ng’s lab for our
experiments. We have worked on data of 196 restaurants with a total number
of 99693 reviews. These reviews are from we8there.com. These reviews are
in the following form.

Example:
<Restaurant>
<id> 595 </id>
<Review>
<overallRating> 4.0 </overallRating>
<foodRating> 5.0 </foodRating>
<ambianceRating> 3.0 </ambianceRating>
<serviceRating> 4.0 </serviceRating>
<noiseRating> 2.0 </noiseRating>
<text> Food and service is always top notch at Vinny's - We both had brunch
and it was very well prepared and the service is always attentive, but not
overbearing - Best value in the Windward area in our book </text>
</Review>
..
</Restaurant>

We wrote initial script to retrieve reviews text for each restaurant and
extracted sentences from these reviews. These sentences were then used for
feature extraction and analysis. We found that some of the reviews text
doesn’t have sentence boundaries so we use delimiter like comma,
punctuation mark and hyphen for sentence extractions for further analysis.
As you can see every review has overall rating, food rating etc., but we have
not used these rating, we have just analyzed text part of each review.
Feature Extraction
We extracted feature from 99693 available reviews. For all available
reviews, we ran Stanford POS tagger to tag each word in the reviews with
POS tags. Following is an example of the tagged text for a sentence.
Example Sentence
Food and service is always top notch at Vinny's.

POS tagged Sentence:


Food_NNP and_CC service_NN is_VBZ always_RB top_JJ notch_NN
at_IN Vinny_NNP ._.

Our intuition is that most of the features for the restaurants will be noun (NN
and NNS). We collected counts for each noun on all the available 99693
reviews and sorted them according to their number of occurrences. This
worked pretty well for us, as we were able to get most talked features about
restaurants.
We wrote a Perl script to parse this file and generate a count of all features
for every possible POS Tag. Here are two lines of this file with counts for
NN and JJ for a few features.

Per line we have POStag->Word,WordCount


NN-> food,51179 service,32587 restaurant,20863 time,16012
experience,15294 menu,13775 dinner,11099 place,11097 table,10279
wine,10229 meal,8866 staff,8565 dining,6943 waiter,6738 night,6154
atmosphere,6053 server,5932 everything,5042 dessert,5000 bit,4879
evening,4860 steak,4763 reservation,4698 bar,4530 lunch,4363 salad,4228
ambiance,4091 list,3887 price,3763

JJ-> great,31836 good,30915 excellent,17761 wonderful,9861 nice,9808


special,8322 delicious,7846 outstanding,6443 friendly,5998 attentive,5906
first,5797 little,5678 other,5537 amazing,4391 fantastic,4294 small,4160
perfect,3958 . . .
We extracted top 200 features according to most frequent words from POS
tags "NN" and "NNS". So as we can see from above most people talk in
general about feature "food", "service", "restaurant", "time", "experience"
etc. a lot while commenting in reviews. Also we saw that “restaurant” itself
is a feature, which tells that people talk in general about the restaurant a lot.
Other features are specific to "food", "service", "experience" and what
people commented about a restaurant.

Bigram Feature Extraction


There is a chance that some features can be multiple word, approach
described in previous section won’t be able to handle it. We decided to
separately extract bigram features from POS tagged reviews. We look for
consecutive NN and NNS as bigram independently and take bigrams, which
occur frequently with each other. By doing this we were able to extract
features like “table service”, “wine menu”, “caesar salad” and “dining
experience”. We combined bigram features and unigram features from
previous section with each other to make rich set of features for restaurants.

Automatic Feature Pruning


As we are collecting frequent nouns, we get some high frequency nouns like
“wife”, “husband” etc. But in our approach, while doing sentiment analysis
on each sentences, we check for adjectives associated with these features,
which helps us to prune this features. As these high frequency nouns, which
are not actual feature, wont be having any adjectives associated with them
most of the time. So ratio of adjectives with these features is pretty low. This
indicates that these features are not been opinioned in the reviews. So these
features get pruned automatically at later stage of our approach.
Example:
Feature “wife” - my wife gave it her greatest compliment.
nsubj(“wife”, ”gave”);
As you can see “gave” wont be on our top adjective list. Feature “wife” will
be dropped at sentiment analysis phase.
Sentence Pruning
Once we have most frequent features of restaurants, we do sentence pruning
from reviews text and only extract those sentences, which contain any of the
features from our list of frequent noun features. The motivation is that
reviews contain a lot of sentences, which don't talk about opinions on any
feature of a restaurant. We prune such sentences and only use those
sentences, which talk about any features that we have extracted from our
feature extraction step. We only store those sentences and the feature labels
for these sentences and pass it to the Type Dependency Parser.

Example:
Actual review:
I have forgotten his first name but his last name is Frankel. Our food was
delicious. The restaurant became noisier as the evening progressed but
because we were in one of the intimate side sections, the noise never became
overwhelming.

Data after Sentence Pruning:


food => Our food was delicious.
restaurant => The restaurant became noisier as the evening progressed but
because we were in one of the intimate side sections, the noise never became
overwhelming.

As you can see we have pruned first sentence, as it doesn’t contain in any
extracted feature.

Type Dependencies on Pruned Set of Sentences


We use the pruned set of sentences for each restaurant that contain the
features from the Sentence Pruning step and run type dependency parser on
these sentences for each Restaurant. We have used Stanford Parser to output
type dependency format for each sentence. Stanford parser uses dot,
exclamation mark and hyphen as a sentence boundary. We have used same
boundaries for extracting sentences from reviews. We store type dependency
for each sentence.
Example:
The type dependency generated for a sentence below is as follows
we run it as follows using LexicalParser and output only the dependencies.
java -mx1200m -cp "$scriptdir/stanford-parser.jar:"
edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat
"typedDependencies" $scriptdir/englishPCFG.ser.gz $*

Feature => Sentence pair:


staff => staff was super attentive.
food => the food was brilliant.
ambience => the ambiance and decor excellent.

Ouput of the parser.


nsubj(attentive-4, staff-1)
cop(attentive-4, was-2)
amod(attentive-4, super-3)
det(food-7, the-6)
nsubj(brilliant-9, food-7)
cop(brilliant-9, was-8)
ccomp(attentive-4, brilliant-9)
det(ambiance-12, the-11)
dep(excellent-15, ambiance-12)
conj_and(ambiance-12, decor-14)
dep(excellent-15, decor-14)
ccomp(attentive-4, excellent-15)

We generate separate files(196 files) for each Restaurant with all sentences,
which contain features. They are followed by the Type Dependencies as in
the above format. We call this the dependencies file for each Restaurant.
Once we have these dependencies file we extract the opinions about each
feature by running our SentenceAnalyser script.

Opinion Extraction
We have implemented a Perl script that performs the Opinion Extraction
from the dependencies file for each restaurant. We have automated this task
by having a single script that calls our SentenceAnalyser on each of the
Restaurant dependency files. The script takes three input files and produces
one output file. The first input file is the Restaurant review sentences and the
second file is the feature for each sentence and third file is the Type
dependencies (dependencies file) for each Sentence. The input files have one
to one correspondence on line level.

Restaurant1FeaturesFile Line 1 -> Restaurant1ReviewSentencesFile Line 1


Restaurant1FeaturesFile Line 2-> Restaurant1ReviewSentencesFile Line 2
...
Restaurant1FeaturesFile Line N -> Restaurant1ReviewSentencesFile Line
N
we use the three important type dependencies in doing opinion word
extraction for given feature.

1) nsubj : nominal subject


A nominal subject is a noun phrase which is the syntactic subject of a clause.
The governor of this relation might not always be a verb: when the verb is a
copular verb, the root of the clause is the complement of the copular verb.
Example:-
“Clinton defeated Dole” nsubj(defeated, Clinton)
“The baby is cute” nsubj(cute, baby)

2) amod: adjectival modifier


An adjectival modifier of an NP is any adjectival phrase that serves to
modify the meaning of the NP.
Example
“Sam eats red meat” amod(meat, red)
3) neg: negation modifier
The negation modifier is the relation between a negation word and the word
it modifies.
Example
“Bill is not a scientist” neg(scientist, not)
“Bill doesn’t drive” neg(drive, n’t)

The SentenceAnalyser takes the dependencies files and extracts "nsubj",


"amod" and "neg" tags and interprets them to extract opinions of a feature.
For example consider the sentence below.
"staff was super attentive , the food was brilliant , the ambiance and decor
excellent."
We have already extracted the features like "staff", "food" and "ambiance"
from our feature extraction step. We then look for these features in the nsubj,
amod and neg tags to obtain opinions. For example from the dependencies
above we have only "nsubj" and "amod" tags to identify opinions about the
sentence.
nsubj(attentive-4, staff-1)
amod(attentive-4, super-3)
nsubj(brilliant-9, food-7)
We report opinions as follows for each sentence and store the results in
another file called summaryFile per Restaurants.
"staff" -> "super","attentive"
"food" -> "brilliant"
Even though we identified "staff", "food", "ambiance" and "decor" as
features we could get opinion information only about staff and food looking
at the dependencies.
In the above example, we identified "attentive" as well as "super" to be
opinion words for "staff". We do that by looking at if any of the features has
an nsubj and if so we get the opinion identifier and see if there is any other
adjective modifying the opinion adjective, which in the above case is
“super”, which is modifying "attentive". Here even though "super" seems to
be an intensifying the opinion "attentive" we currently report it as an
"opinion".
We also try to identify negation of opinion words, which will invert the
polarity of an opinion.

Example
"we both felt that the clam chowder broth was really thin and not as creamy
and thick as previous trips"
nsubj(felt-3, we-1)
nsubj(thin-11, broth-8)
nsubj(creamy-15, broth-8)
advmod(creamy-15, as-14)
ccomp(felt-3, creamy-15)
amod(trips-20, previous-19)
prep_as(thin-11, trips-20)

So we extract "broth" and "chowder" as features and identify the opinion


words from the type dependencies below
nsubj(thin-11, broth-8)
neg(creamy-15, not-13)
We identify the opinion words for "broth" as "~creamy" and "thin" as shown
below. We have put a tilde "~" before the opinion word to signify a negation
in polarity.
"broth"->~creamy,thin
We did not get any opinion about "chowder" from the type dependencies.

The Algorithm for the Sentence Analyzer using Type Dependencies is as


follows
O= Opinion Word F= Feature Word I= Intensifier S= Sentence

(F,S)

NO
nsubj(O, amod(O,F
F) )
Yes
Yes

amod(O,I)

Yes

NO
neg
print(O)
(O) Yes

print(~O)

Sentiment Analyzer
Bootstrapping and Expansion using WordNet
We extracted frequent adjectives(JJ), which were generated by the POS
tagger on all the 99693 reviews. Our intuition is most of the opinion words
are adjectives. We labeled 200 words from this extracted frequent adjective
list as good and 200 words as bad. Since these 200 words don’t cover all
opinion words we decided to automatically expand list by using WordNet.
We used Java(JAWS) API of WordNet for expansion problem. We have
generated synsets of for these good and bad words and took only synsets of
these words, which are of the form AdjectiveSatellite. For each of these
synsets we have extracted all the wordForms and produced the expansion set
of good and bad words.

Here is an example of a good and bad word expansion

For opinion Word "excellent"


excellent, first-class, fantabulous, splendid

For opinion Word "elegant"


elegant, graceful, refined

For opinion Word "disappointed"


disappointed, defeated, discomfited, foiled, frustrated, thwarted

For opinion Word "worst"


worst, bad, big, tough, spoiled, spoilt, uncollectible, risky, high-risk,
speculative, unfit, unsound, forged, defective

Once we have this list of Good and Bad opinion words. Making decision on
feature given an opinion word is trivial.
Example:
"staff" -> "super","attentive"
As “super” and “attentive” are both present in our good opinion list, we
increase the count of good for staff.
staff(good=2,bad=0)

Sentiment Classifier
We maintain count of good and bad adjectives for each feature and in the
end depending on count of good opinion vs. bad opinions we assign a class
for feature.

For a given Restaurant


∃ Feature F ∑opinions⊆good > ∑opinions⊆bad = GOOD (F)
else ∑opinions⊆good < ∑opinions⊆bad = BAD (F)

Example:
Restaurant id “595”
food(good:8, bad:0)
service(good:3, bad:0)
breakfast(good:0, bad:1)

So for above example “food” and “service” will get overall sentiment of
good from all the reviews while “breakfast” will go under bad sentiments.
GOOD BAD
Food Breakfast
Service

Error Analysis
As you can see we found out that not all noun can be taken as features. So
nouns like ‘something’ and ‘nothing’ etc. were also produced as output of
feature extraction. We took out top features with extremely high frequency;
this process was able to prune some words, which were not features. But still
there are some unwanted word nouns, which occur frequently.

Here is a list of few sentences, which have a "neg" type dependency, which
helps in inverting polarity. "~" symbol as described earlier inverts the
polarity of the opinion Word.
The format below is in the form
"feature"->"opinion Words"->"sentence".
These are some correct examples.
1) waiter->delightful,~attentive,-> waiter was delightful but
not especially attentive.
2) food->~disappointment,->every time i come here the presentation
and food are never a disappointment.
3) wine->~good->the wine was n't good and the `` wine expert '' lead
us astray.
4) atmosphere->~stuffy,first,class,->the building is historic and the
restaurant's atmosphere is first class but not stuffy.
5) cobbler->~best,->my favorite dessert was the creme brulee but the
mississippi mud pie and the apple cobbler were not the best.
6) grill->~disappoint,->as per usual water grill did not disappoint.
7) food->~fabulous,->service was very good -food was not as fabulous as I
remember.

We found following errors at type dependencies step.

1) service->~attentive,->service was attentive but not over-bearing .


We get an Error here saying that the service is not attentive which is wrong.
This is because the neg type dependencies for this sentence is
nsubj(attentive-3, service-1)
neg(attentive-3, not-5)

Some other sentences with this problem are shown below


2) service->~prompt,->our service was prompt but not professional .
3) place->~crowded,->the place was pretty crowded but not so
overwhelmingly packed so we still had privacy .
4) service->wonderful,unpretentious,~intrusive,->our service was
wonderful - attentive not intrusive and unpretentious .
Here we get an error on unpretentious since it's polarity is not inverted. This
is difficult because there is no "not" before unpretentious but the reviewer
actually means not intrusive and not unpretentious.

Evaluation for Opinion Extraction for Features

We evaluated the opinion's by taking 2 main features "food" and "service"


of restaurant and evaluating them manually and comparing with our system.

Feature "Food"

Number of Sentences containing "Food" feature = 42.


Number of Sentences identifying correct opinion for "Food" feature = 32
Percentage Correct = 32/42 = 76.19%

Feature "service"
Number of Sentences containing "service" feature = 47.
Number of Sentences identifying correct opinion for "service" feature = 36
Percentage Correct = 36/47 = 76.59%

Evaluation for Feature Extraction


We evaluated feature extraction by manually looking at sentences for a
sample of 50 sentences in a restaurant and determined how many features
we missed.

Number of Sentences = 50
Total Features Automatically Extracted Correctly = 68
Total Features Automatically Extracted InCorrectly = 27
Total Features Not Extracted(checked manually) = 10

Precision = 68/95 = 71.57%


Recall = 68/78 = 87.17%

Results:
Restaurant Id: 10381
GOOD BAD
amazing food:11.0 dark bit:4.0

marvelous service:6.0 small beef hash:2.0

great lunch:2.0 other restaurant:1.0

great place:2.0 horrible seating:1.0

friendly ambiance:2.0

great wine list:2.0

unbelievable experience:2.0

superb wait staff:1.0

Restaurant Id: 10636


GOOD BAD

amazing food:30.0 cold lobster soup:1.0

decent service:20.0 slow kitchen:1.0

great experience:8.0 small table:1.0

amazing steak:5.0
satisfied dining experience:4.0

solid wine list:4.0

great time:3.0

professional reservation
process:2.0

refreshing ambiance:2.0

impressive atmosphere:2.0

Conclusions and Future Work


We found combination of NN or NNS with adjectives in the sentence gives
you more accurate feature and opinion mining. We were able to achieve
better precision and recall because of WordNet expansion on opinion
words.
Since our problem definition is little bit different from most of reviews
summarization approaches, we found out that our approach need long
reviews and complete natural sentences to perform well.
In future work, I think if someone needs less features, we can classify
various features under some big feature classes. Features like "food", "drink"
and "salad" can go under bigger class like "Food", and features like "waiter",
"waitress" and "server" can go under "staff" feature class. This work will
basically include feature classification.
In future work, we also plan to use machine-learning classifier for semantic
classifier.
Also we will also like to work on extending this strategy for digital world
product review summarizer.

Contributions
Initially, Manoj worked on converting data from protocol buffer storage
system to xml format. Deepak worked on writing scripts for running
Stanford POS tagger and Type dependency Parser. He also worked on
feature extraction and opinion extraction, WordNet expansion. Manoj
worked on Sentence pruning, Semantic analyzer and classification. We both
worked on doing error analysis and generating result and writing this
document.

References
1. Mining Opinion Features in Customer Reviews" by Hu etc.
2. Extracting Product Features and Opinions from Reviews, Popescu etc.
3. Opinion mining and sentiment analysis Bo Pang and Lillian Lee
4. Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up?
Sentiment classification using machine learning techniques
5.Bo Pang and Lillian Lee, A Sentimental Education: Sentiment Analysis
Using Subjectivity Summarization Based on Minimum Cuts
6. Stanford POS tagger http://nlp.stanford.edu/software/tagger.shtml
7. WordNet lexical database for the English language. Site:
http://WordNet.princeton.edu
8.We8there.com
9.Yelp.com

You might also like