AFFECTIVE COMPUTING AND SENTIMENT ANALYSIS

Editor: Erik Cambria, Nanyang Technological University, Singapore, cambria@ntu.edu.sg
Deep Learning-Based Document Modeling for Personality Detection from Text

Navonil Majumder, Instituto Politécnico Nacional
Soujanya Poria, Nanyang Technological University
Alexander Gelbukh, Instituto Politécnico Nacional
Erik Cambria, Nanyang Technological University

Personality is a combination of an individual's behavior, emotion, motivation, and thought-pattern characteristics. Our personality has great impact on our lives; it affects our life choices, well-being, health, and numerous other preferences. Automatic detection of a person's personality traits has many important practical applications. In the context of sentiment analysis,1 for example, the products and services recommended to a person should be those that have been positively evaluated by other users with a similar personality type. Personality detection can also be exploited for word polarity disambiguation in sentiment lexicons,2 as the same concept can convey different polarity to different types of people. In mental health diagnosis, certain diagnoses correlate with certain personality traits. In forensics, knowing personality traits helps reduce the circle of suspects. In human resources management, personality traits affect one's suitability for certain jobs.

Personality is typically formally described in terms of the Big Five personality traits,3 which are the following binary (yes/no) values:

- Extroversion (EXT). Is the person outgoing, talkative, and energetic versus reserved and solitary?
- Neuroticism (NEU). Is the person sensitive and nervous versus secure and confident?
- Agreeableness (AGR). Is the person trustworthy, straightforward, generous, and modest versus unreliable, complicated, meager, and boastful?
- Conscientiousness (CON). Is the person efficient and organized versus sloppy and careless?
- Openness (OPN). Is the person inventive and curious versus dogmatic and cautious?

Texts often reflect various aspects of the author's personality. In this article, we present a method to extract personality traits from stream-of-consciousness essays using a convolutional neural network (CNN). We trained five different networks, all with the same architecture, for the five personality traits (see the "Previous Work in Personality Detection" sidebar for more information). Each network was a binary classifier that predicted the corresponding trait to be positive or negative.

To this end, we developed a novel document-modeling technique based on a CNN feature extractor. Namely, we fed sentences from the essays to convolution filters to obtain the sentence model in the form of n-gram feature vectors. We represented each individual essay by aggregating the vectors of its sentences. We concatenated the obtained vectors with the Mairesse features,4 which were extracted from the texts directly at the preprocessing stage; this improved the method's performance. Discarding emotionally neutral input sentences from the essays further improved the results.

For final classification, we fed this document vector into a fully connected neural network with one hidden layer. Our results outperformed the current state of the art for all five traits. Our implementation is publicly available and can be downloaded freely for research purposes (see http://github.com/senticnet/personality-detection).
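The one-classifier-per-trait scheme can be illustrated with scikit-learn (a library the article also uses for its baseline features). This is only a sketch: the document vectors and yes/no labels below are random stand-ins for the real 684-dimensional document vectors described later.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
TRAITS = ["EXT", "NEU", "AGR", "CON", "OPN"]

# Stand-in data: 100 documents, 684-dimensional vectors (as in the article),
# with one binary (yes/no) label per Big Five trait.
X = rng.normal(size=(100, 684))
y = {t: rng.integers(0, 2, size=100) for t in TRAITS}

# Five independent binary classifiers, all with the same configuration.
classifiers = {t: LinearSVC().fit(X, y[t]) for t in TRAITS}
predictions = {t: clf.predict(X) for t, clf in classifiers.items()}
```

Each trait is predicted separately; a document's full personality profile is the tuple of the five binary decisions.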

1541-1672/17/$33.00 © 2017 IEEE. IEEE Intelligent Systems. Published by the IEEE Computer Society.
Previous Work in Personality Detection

The Big Five, also known as the Five Factor Model, is the most widely accepted model of personality. Initially, it was developed by several independent groups of researchers. However, it was advanced by Ernest Tupes and Raymond Christal1; J.M. Digman made further advancements,2 and Lewis Goldberg later perfected it.3

Some earlier work on automated personality detection from plain text was done by James Pennebaker and Laura King,4 who compiled the essay dataset that we used in our experiments (see http://web.archive.org/web/20160519045708/http://mypersonality.org/wiki/doku.php?id=wcpr13). For this, they collected stream-of-consciousness essays written by volunteers in a controlled environment and then asked the authors of the essays to define their own Big Five personality traits. They used Linguistic Inquiry and Word Count (LIWC) features to determine correlation between the essay and personality.5

François Mairesse and colleagues used, in addition to LIWC, other features, such as imageability, to improve performance.6 Saif Mohammad and Svetlana Kiritchenko performed a thorough study on this essay dataset, as well as the MyPersonality Facebook status dataset, by applying different combinations of feature sets to outperform Mairesse's results, which they called the Mairesse baseline.7

Recently, Fei Liu and colleagues developed a language-independent and compositional model for personality trait recognition for short tweets.8

On the other hand, researchers have successfully used deep convolutional networks for related tasks such as sentiment analysis,9 aspect extraction,10 and multimodal emotion recognition.11

References

1. E. Tupes and R. Christal, "Recurrent Personality Factors Based on Trait Ratings," tech. report ASD-TR-61-97, Lackland Air Force Base, 1961.
2. J. Digman, "Personality Structure: Emergence of the Five-Factor Model," Ann. Rev. Psychology, vol. 41, no. 1, 1990, pp. 417–440.
3. L. Goldberg, "The Structure of Phenotypic Personality Traits," Am. Psychologist, vol. 48, no. 1, 1993, pp. 26–34.
4. J.W. Pennebaker and L.A. King, "Linguistic Styles: Language Use as an Individual Difference," J. Personality and Social Psychology, vol. 77, no. 6, 1999, pp. 1296–1312.
5. J.W. Pennebaker, R.J. Booth, and M.E. Francis, "Linguistic Inquiry and Word Count: LIWC2007," operator's manual, 2007.
6. F. Mairesse et al., "Using Linguistic Cues for the Automatic Recognition of Personality in Conversation and Text," J. Artificial Intelligence Research, vol. 30, 2007, pp. 457–500.
7. S.M. Mohammad and S. Kiritchenko, "Using Hashtags to Capture Fine Emotion Categories from Tweets," Computational Intelligence, vol. 31, no. 2, 2015, pp. 301–326.
8. F. Liu, J. Perez, and S. Nowson, "A Language-Independent and Compositional Model for Personality Trait Recognition from Short Texts," Computing Research Repository (CoRR), 2016; http://arxiv.org/abs/1610.04345.
9. S. Poria et al., "A Deeper Look into Sarcastic Tweets Using Deep Convolutional Neural Networks," Proc. 26th Int'l Conf. Computational Linguistics, 2016, pp. 1601–1612.
10. S. Poria, E. Cambria, and A. Gelbukh, "Aspect Extraction for Opinion Mining with a Deep Convolutional Neural Network," Knowledge-Based Systems, vol. 108, 2016, pp. 42–49.
11. S. Poria et al., "Convolutional MKL Based Multimodal Emotion Recognition and Sentiment Analysis," Proc. IEEE Int'l Conf. Data Mining, 2016, pp. 439–448.

Overview of the Method

Our method includes input data preprocessing and filtering, feature extraction, and classification. We use two types of features: a fixed number of document-level stylistic features, and per-word semantic features that are combined into a variable-length representation of the input text. This variable-length representation is fed into a CNN, where it is processed in a hierarchical manner by combining words into n-grams, n-grams into sentences, and sentences into a whole document. The obtained values are then combined with the document-level stylistic features to form the document representation used for final classification. Specifically, our method includes the following steps:

- Preprocessing. This includes sentence splitting as well as data cleaning and unification, such as reduction to lowercase.
- Document-level feature extraction. We used the Mairesse baseline feature set, which includes such global features as the word count and average sentence length.
- Filtering. Some sentences in an essay may not carry any personality clues. Such sentences can be ignored in semantic feature extraction for two reasons: first, they represent noise that reduces the classifier's performance, and second, removal of those sentences considerably reduces the input size, and thus the training time, without negatively affecting the results. So, we remove such sentences before the next step.
- Word-level feature extraction. We represent individual words by word embeddings in a continuous vector space; specifically, we experimented with the word2vec embeddings.5 This gives a variable-length feature set for the document: the document is represented as a variable number of sentences, which are represented as a variable number of fixed-length word feature vectors.
- Classification. For classification, we use a deep CNN. Its initial layers process the text in a hierarchical manner. Each word is represented in the input as a fixed-length feature vector using word2vec, and sentences are represented as a variable number of word vectors. At some layer, this variable-length vector is reduced to a fixed-length vector of each sentence, which is a kind of sentence embedding in a continuous vector space. At that level, documents are represented as a variable number of such fixed-length sentence embeddings. Finally, at a deeper layer, this variable-length document

march/april 2017 www.computer.org/intelligent 75


vector is reduced to a fixed-length document vector. This fixed-length feature vector is then concatenated with the document-level features, giving a fixed-length document vector, which is then used for final classification.

When aggregating word vectors into sentence vectors, we use convolution to form word n-gram features. However, when aggregating sentence vectors into the document vector, we do not use convolution to form sentence n-gram features. We tried this arrangement, but the network did not converge in 75 epochs, so we left this experiment to our future work.

Network Architecture

We trained five separate neural classifiers, all with the same architecture, for the Big Five personality traits. The processing flow in our network comprises four main steps:

- word vectorization, in which we use fixed-length word2vec word embeddings as input data;
- sentence vectorization, from sequences of words in each sentence to fixed-length sentence vectors;
- document vectorization, from the sequence of sentence vectors to the document vector; and
- classification, from the document vector to the classification result (yes/no).

Accordingly, the network comprises seven layers: input (word vectorization), convolution (sentence vectorization), max pooling (sentence vectorization), 1-max pooling (document vectorization), concatenation (document vectorization), linear with Sigmoid activation (classification), and two-neuron softmax output (classification). Figure 1 depicts the end-to-end network for two sentences. In the rest of this article, we discuss these steps and layers in detail.

Figure 1. Architecture of our network. The network consists of seven layers. The input layer (shown at the bottom; word embedding size: 300) corresponds to the sequence of input sentences (only two are shown). The next two layers include three parts, corresponding to trigrams, bigrams, and unigrams (200 feature maps each). The dotted lines delimit the area in a previous layer to which a neuron of the next layer is connected; for example, the bottom-right rectangle shows the area comprising three word vectors connected with a trigram neuron.

Input

We represent the dataset as a set of documents: each document d is a sequence of sentences, each sentence s_i is a sequence of words, and each word w_j is a real-valued vector of fixed length known as a word embedding. In our experiments, we used Google's pretrained word2vec embeddings.5

Thus, our input layer is a four-dimensional real-valued array from R^(D×S×W×E), in which D is the number of documents in the dataset, S is the maximum number of sentences in a document across all documents, W is the maximum number of words in a sentence across all documents, and E is the length of the word embeddings.

In implementation, to force all documents to contain the same number of sentences, we padded shorter documents with dummy sentences. Similarly, we padded shorter sentences with dummy words.

Aggregating Word Vectors into Sentence Vectors

We use three convolutional filters to extract unigram, bigram, and trigram features from each sentence. After max pooling, the sentence vector is a concatenation of the feature vectors obtained from these three convolutional filters.
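This sentence-vectorization step (n-gram convolution, ReLU, 1-max pooling, concatenation) can be sketched in NumPy as follows; the filter weights and the input sentence below are random stand-ins for the trained parameters and the real word2vec vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
E, W, N_MAPS = 300, 12, 200               # embedding size, words per sentence, feature maps

sentence = rng.normal(size=(W, E))        # one sentence as W word vectors (word2vec-style)
filters = {n: rng.normal(size=(N_MAPS, n, E)) for n in (1, 2, 3)}
biases = {n: rng.normal(size=N_MAPS) for n in (1, 2, 3)}

def sentence_vector(s):
    """Convolve n-gram filters (n = 1, 2, 3) over the sentence, apply ReLU,
    1-max-pool each of the 200 feature maps, and concatenate into a 600-dim vector."""
    parts = []
    for n, F in filters.items():
        # All n-gram windows of the sentence, flattened: shape (W - n + 1, n * E).
        windows = np.stack([s[i:i + n].ravel() for i in range(len(s) - n + 1)])
        fm = windows @ F.reshape(N_MAPS, -1).T + biases[n]  # feature map, (W - n + 1, 200)
        fm = np.maximum(fm, 0.0)                            # ReLU nonlinearity
        parts.append(fm.max(axis=0))                        # 1-max pooling -> (200,)
    return np.concatenate(parts)                            # (600,)

vec = sentence_vector(sentence)  # 600-dimensional sentence embedding
```

The same filters are applied to every sentence of a document, which matches the parameter sharing described below.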



Convolution. To extract the n-gram features, we apply a convolutional filter of size n × E on each sentence s ∈ R^(W×E). We use 200 n-gram feature maps for each n = 1, 2, 3. So, for each n, our convolutional filter applied on the matrix s is F_n^conv ∈ R^(200×n×E). We add a bias B_n^conv ∈ R^200 to the output of the filter, which gives, for a given sentence, three feature maps FM_n ∈ R^(200×(W−n+1)×1), n = 1, 2, 3. To introduce nonlinearity, we apply the Rectified Linear Unit (ReLU) function to the feature maps FM_n.

Max pooling. Next, we apply max pooling to each feature map FM_n to further down-sample it to a feature map DFM_n ∈ R^(200×1×1), which we flatten to obtain a feature vector of size 200.

Concatenation. Finally, we concatenate the vectors obtained for the three types of n-gram to obtain a vector s ∈ R^600 representing the sentence. We apply convolution and max pooling to each sentence in the document. The network parameters are shared between all sentences of the document. In particular, although we pad all sentences to a common size with dummy words, we do not need to pad all documents to a common size with dummy sentences.

Aggregating Sentence Vectors into a Document Vector

After individual sentences are processed, the document vector is a variable-sized concatenation of all its sentence vectors. We assume that the document has some feature if at least one of its sentences has this feature. Each sentence is represented as a 600-dimensional vector. To obtain the document vector, for each of these 600 features, we take the maximum across all the sentences of the document. This gives a 600-dimensional real-valued vector d_network ∈ R^600 of the whole document.

Adding Document-Level Features to Document Vector

François Mairesse and colleagues developed a document-level feature set for personality detection, consisting of 84 features.4 It comprises the Linguistic Inquiry and Word Count features6; Medical Research Council features7; utterance-type features; and prosodic features. Examples of the features included in this set are the word count and average number of words per sentence, as well as the total number of pronouns, past tense verbs, present tense verbs, future tense verbs, letters, phonemes, syllables, questions, and assertions in the document.

We then concatenated those 84 features, d_Mairesse, with the document vector d_network. This gave the final 684-dimensional document vector d = (d_network, d_Mairesse) ∈ R^684. We also used the feature set d_Mairesse as a baseline in our evaluation.

Classification

For final classification, we use a two-layer perceptron consisting of a fully connected layer of size 200 and the final softmax layer of size two, representing the yes and no classes.

Fully connected layer. We multiply the document vector d ∈ R^684 by a matrix W_fc ∈ R^(684×200) and add a bias B_fc ∈ R^200 to obtain the vector d_fc ∈ R^200. Introducing nonlinearity with Sigmoid activation improved the results:

    d_fc = σ(d · W_fc + B_fc),

where σ(x) = 1/(1 + exp(−x)). We also experimented with ReLU and tanh as activation functions, but they yielded lower results.

Softmax output. We use the softmax function to determine the probability of the document belonging to the classes yes and no. For this, we build a vector

    (x_yes, x_no) = d_fc · W_sm + B_sm,

where W_sm ∈ R^(200×2) and the bias B_sm ∈ R^2, and we calculate the class probabilities as

    P(i | network parameters) = exp(x_i) / (exp(x_yes) + exp(x_no))

for i ∈ {yes, no}.

Training

We use Negative Log Likelihood as the objective function for training. We randomly initialize the network parameters F_1^conv, F_2^conv, F_3^conv, B_1^conv, B_2^conv, B_3^conv, W_fc, B_fc, W_sm, and B_sm. We use Stochastic Gradient Descent with Adadelta8 update rules to tune the network parameters in order to minimize the error defined as negative log likelihood. In our experiments, after 50 epochs, the network converged, with 98 percent training accuracy.

Experimental Results

To evaluate our method, we tested it on a well-known dataset typically used to compare personality detection techniques.

Dataset

We used James Pennebaker and Laura King's stream-of-consciousness essay dataset.6 It contains 2,468 anonymous essays tagged with the authors' personality traits: EXT, NEU, AGR, CON, and OPN. We removed from the dataset one essay that contained only the text "Err:508", and we experimented with the remaining 2,467 essays.

Experimental Setting

In all of our experiments, we used ten-fold cross-validation to evaluate the trained network.
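Taken together, the document-vectorization and classification layers described in this section amount to a short forward pass. The sketch below uses randomly initialized stand-in parameters rather than the trained network, with the dimensions given above (600-dimensional sentence vectors, 84 Mairesse features, a 200-unit Sigmoid layer, and a two-way softmax).

```python
import numpy as np

rng = np.random.default_rng(2)

def classify(sentence_vectors, mairesse, W_fc, B_fc, W_sm, B_sm):
    """Document-level part of the network: 1-max-pool over the sentence vectors,
    concatenate the 84 Mairesse features, apply the Sigmoid fully connected
    layer, then a two-way softmax over the yes/no classes."""
    d_network = sentence_vectors.max(axis=0)          # element-wise max -> (600,)
    d = np.concatenate([d_network, mairesse])         # final document vector, (684,)
    d_fc = 1.0 / (1.0 + np.exp(-(d @ W_fc + B_fc)))   # Sigmoid activation -> (200,)
    x = d_fc @ W_sm + B_sm                            # (x_yes, x_no)
    e = np.exp(x - x.max())                           # numerically stable softmax
    return e / e.sum()                                # (P(yes), P(no))

# Random stand-ins for the trained parameters and for one document's inputs.
probs = classify(
    sentence_vectors=rng.normal(size=(7, 600)),       # a document with 7 sentences
    mairesse=rng.normal(size=84),
    W_fc=rng.normal(size=(684, 200)) * 0.05, B_fc=np.zeros(200),
    W_sm=rng.normal(size=(200, 2)) * 0.05, B_sm=np.zeros(2),
)
```

The element-wise max over sentence vectors is what lets documents contain different numbers of sentences without document-level padding.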



Table 1. Accuracy obtained with different configurations.

Document vector d                        | Filter   | Classifier | Conv. filter sizes | EXT    | NEU    | AGR    | CON    | OPN
N/A                                      | N/A      | Majority   | N/A                | 51.72  | 50.02  | 53.10  | 50.79  | 51.52
Word n-grams                             | Not used | SVM        | N/A                | 51.72  | 50.26  | 53.10  | 50.79  | 51.52
Mairesse^12                              | N/A      | SVM        | N/A                | 55.13  | 58.09  | 55.35  | 55.28  | 59.57
Mairesse (our experiments)               | N/A      | SVM        | N/A                | 55.82  | 58.74  | 55.70  | 55.25  | 60.40
Published state of the art per trait^12  | N/A      | N/A        | N/A                | 56.45  | 58.33  | 56.03  | 56.73  | 60.68
CNN                                      | N/A      | MLP        | 1, 2, 3            | 55.43  | 55.08  | 54.51  | 54.28  | 61.03
CNN                                      | N/A      | MLP        | 2, 3, 4            | 55.73  | 55.80  | 55.36  | 55.69  | 61.73
CNN                                      | N/A      | SVM        | 2, 3, 4            | 54.42  | 55.47  | 55.13  | 54.60  | 59.15
CNN + Mairesse                           | N/A      | MLP        | 1, 2, 3            | 54.15  | 57.58  | 54.64  | 55.73  | 61.79
CNN + Mairesse                           | N/A      | SVM        | 1, 2, 3            | 55.06  | 56.74  | 53.56  | 56.05  | 59.51
CNN + Mairesse                           | N/A      | sMLP/FC    | 1, 2, 3            | 54.61  | 57.81  | 55.84  | 57.30* | 62.13
CNN + Mairesse                           | Used     | sMLP/MP    | 1, 2, 3            | 58.09* | 57.33  | 56.71* | 56.71  | 61.13
CNN + Mairesse                           | Used     | MLP        | 1, 2, 3            | 55.54  | 58.42  | 55.40  | 56.30  | 62.68*
CNN + Mairesse                           | Used     | SVM        | 1, 2, 3            | 55.65  | 55.57  | 52.40  | 55.05  | 58.92
CNN + Mairesse                           | Used     | MLP        | 2, 3, 4            | 55.07  | 59.38* | 55.08  | 55.14  | 60.51
CNN + Mairesse                           | Used     | SVM        | 2, 3, 4            | 56.41  | 55.61  | 54.79  | 55.69  | 61.52
CNN + Mairesse                           | Used     | MLP        | 3, 4, 5            | 55.38  | 58.04  | 55.39  | 56.49  | 61.14
CNN + Mairesse                           | Used     | SVM        | 3, 4, 5            | 56.06  | 55.96  | 54.16  | 55.47  | 60.67

* Best result for each trait.

Preprocessing. We split the text into a sequence of sentences at the period and question mark characters. Then we split each sentence into words at whitespace characters. We reduced all letters to lowercase and removed all characters other than ASCII letters, digits, exclamation marks, and single and double quotation marks.
Some essays in the dataset contained no periods or missing periods, resulting in absurdly long sentences. For these cases, we split each obtained sentence that was longer than 150 words into sentences of 20 words each (except the last piece, which could happen to be shorter).

Extracting document-level features. We used Mairesse and colleagues' library (http://farm2.user.srcf.net/research/personality/recognizer.html) to extract the 84 Mairesse features from each document.4

Sentence filtering. We assumed that a relevant sentence would have at least one emotionally charged word. After extracting the document-level features, but before extracting the word2vec features, we discarded all sentences that had no emotionally charged words. We used the NRC Emotion Lexicon (http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm) to obtain emotionally charged words.9,10 This lexicon contains 14,182 words tagged with 10 attributes: anger, anticipation, disgust, fear, joy, negative, positive, sadness, surprise, and trust. We considered a word to be emotionally charged if it had at least one of these attributes; there are 6,468 such words in the lexicon (most of the words in this lexicon have no attributes).
So, if a sentence contained none of the 6,468 words, we removed it before extracting the word2vec features from the text. In our dataset, all essays contained at least one emotionally charged word.
We also experimented with not removing any sentences and with randomly removing half of each essay's sentences. Randomly removing half of the sentences improved the results as compared with no filtering at all; we do not have a plausible explanation for this fact. Removing emotionally neutral sentences as described earlier further improved the results, producing the best results for all five traits. Filtering also improved the training time by 33.3 percent.

Extracting word-level features. We used the word2vec embeddings5 (http://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit) to convert words into 300-dimensional vectors. If a word was not found in the list, we assigned all 300 coordinates randomly with a uniform distribution in [−0.25, 0.25].

Word n-gram baseline. As a baseline feature set, we used 30,000 features: the 10,000 most frequent word unigrams, bigrams, and trigrams in our dataset. We used the Scikit-learn library to extract these features from the documents.11
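The sentence-filtering step can be sketched as follows. The small word set here is a tiny stand-in for the 6,468 emotionally charged words of the NRC Emotion Lexicon; the real filter loads the full lexicon file instead.

```python
import re

# Stand-in for the NRC Emotion Lexicon's emotionally charged words
# (words carrying at least one of the lexicon's 10 attributes).
CHARGED = {"happy", "afraid", "angry", "love", "terrible", "joy"}

def filter_sentences(essay_sentences):
    """Drop sentences containing no emotionally charged word, as done
    before word2vec feature extraction."""
    kept = []
    for sent in essay_sentences:
        words = re.findall(r"[a-z']+", sent.lower())
        if any(w in CHARGED for w in words):
            kept.append(sent)
    return kept

essay = [
    "I went to the store today.",          # no charged word: removed
    "I was so happy to see my friends.",   # "happy": kept
    "The weather was terrible all week.",  # "terrible": kept
]
kept = filter_sentences(essay)
```

Only the surviving sentences are converted to word2vec vectors and fed to the CNN.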



Classification. We experimented with three classification settings. In the variant marked MLP in Table 1, we used the network shown in Figure 1, which is a multiple-layer perceptron (MLP) with one hidden layer, trained together with the CNN. In the variant marked SVM (support vector machine) in the table, we first trained the network shown in Figure 1 to obtain the corresponding document vector d for each document in the dataset, and then used these vectors to train a polynomial SVM of degree 3. In the variant marked sMLP/MP in the table, in a similar manner we used the vectors d (the max pooling layer) to train a stand-alone MLP (using 50 epochs) with the same configuration as the last two layers in Figure 1 (that is, using the 1-max pooling layer from Figure 1 as input).
In another experiment, we fed to the stand-alone MLP the values from the fully connected layer instead of d; this variant is marked as sMLP/FC in Table 1. For baseline experiments not involving the use of CNN, we used only a linear SVM.

Results

Table 1 shows our results. Our method outperformed the state of the art for all five traits, although with different configurations for different traits.
Using n-grams showed no improvement over the majority baseline: the classifier rejected all n-grams. Applying filtering and adding the document-level (Mairesse) features proved to be beneficial. In fact, the CNN alone without the document-level features underperformed the Mairesse baseline. We attribute this to insufficient training data: our training corpus was only 1.9 million running words.
Contrary to our expectations, applying SVM to the document vector d built with the CNN did not improve the results. Surprisingly, applying a stand-alone MLP to d improved the results. We cannot attribute this to the fact that the system had thus received an additional 50 epochs of training, because the network used to build the document vector d had converged in its 50 epochs of initial training.
Increasing the window size for convolution filters did not seem to consistently improve the results; while the best result for the NEU trait was obtained with 2-, 3-, and 4-grams, even sizes 1, 2, and 3 outperformed the current state of the art.
We also tried several configurations not shown in Table 1, as well as some variations of the network architecture. In particular, in addition to using convolution filters to obtain a vector for each sentence, we tried using convolution filters to obtain the document vector d from the sequence of sentence vectors s_i. However, training did not converge after 75 epochs, so we used a 1-max pooling layer on the array of sentence vectors to obtain the document vector.

In the future, we plan to incorporate more features and preprocessing. We plan to apply the Long Short-Term Memory (LSTM) recurrent network to build both the sentence vector from a sequence of word vectors and the document vector from a sequence of sentence vectors. In addition, we plan to apply our document modeling technique to other emotion-related tasks, such as sentiment analysis or mood classification.13

References

1. E. Cambria, "Affective Computing and Sentiment Analysis," IEEE Intelligent Systems, vol. 31, no. 2, 2016, pp. 102–107.
2. E. Cambria et al., "SenticNet 4: A Semantic Resource for Sentiment Analysis Based on Conceptual Primitives," Proc. 26th Int'l Conf. Computational Linguistics, 2016, pp. 2666–2677.
3. J. Digman, "Personality Structure: Emergence of the Five-Factor Model," Ann. Rev. Psychology, vol. 41, 1990, pp. 417–440.
4. F. Mairesse et al., "Using Linguistic Cues for the Automatic Recognition of Personality in Conversation and Text," J. Artificial Intelligence Research, vol. 30, 2007, pp. 457–500.
5. T. Mikolov et al., "Efficient Estimation of Word Representations in Vector Space," Computing Research Repository (CoRR), 2013; http://arxiv.org/abs/1301.3781.
6. J.W. Pennebaker and L.A. King, "Linguistic Styles: Language Use as an Individual Difference," J. Personality and Social Psychology, vol. 77, no. 6, 1999, pp. 1296–1312.
7. M. Coltheart, "The MRC Psycholinguistic Database," Quarterly J. Experimental Psychology, vol. 33A, 1981, pp. 497–505.
8. M.D. Zeiler, "ADADELTA: An Adaptive Learning Rate Method," Computing Research Repository (CoRR), 2012; https://arxiv.org/abs/1212.5701.
9. S.M. Mohammad and P.D. Turney, "Crowdsourcing a Word-Emotion Association Lexicon," Computational Intelligence, vol. 29, no. 3, 2013, pp. 436–465.
10. S. Mohammad and P. Turney, "Emotions Evoked by Common Words and Phrases: Using Mechanical Turk to Create an Emotion Lexicon," Proc. NAACL-HLT Workshop Computational Approaches to Analysis and Generation of Emotion in Text, 2010, pp. 26–34.
11. F. Pedregosa et al., "Scikit-learn: Machine Learning in Python," J. Machine Learning Research, vol. 12, Oct. 2011, pp. 2825–2830.
12. S.M. Mohammad and S. Kiritchenko, "Using Hashtags to Capture Fine Emotion Categories from Tweets," Computational Intelligence, vol. 31, no. 2, 2015, pp. 301–326.
13. B.G. Patra, D. Das, and S. Bandyopadhyay, "Multimodal Mood Classification Framework for Hindi Songs," Computación y Sistemas, vol. 20, no. 3, 2016, pp. 515–526.

Navonil Majumder is a postgraduate student at the Centro de Investigación en Computación (CIC) of the Instituto Politécnico Nacional, Mexico. Contact him at navo@nlp.cic.ipn.mx.

Soujanya Poria is a research assistant at Temasek Laboratories at Nanyang Technological University. Contact him at sporia@ntu.edu.sg.

Alexander Gelbukh is a research professor at the CIC of the Instituto Politécnico Nacional. Contact him at gelbukh@cic.ipn.mx.

Erik Cambria is an assistant professor in the School of Computer Science and Engineering at Nanyang Technological University. He is also affiliated with the Rolls-Royce@NTU Corporate Lab, A*STAR SIMTech, and MIT Synthetic Intelligence Lab. Contact him at cambria@ntu.edu.sg.

