James W. Pennebaker, Cindy K. Chung, Molly Ireland, Amy Gonzales, and Roger J. Booth
The University of Texas at Austin and The University of Auckland, New Zealand
This article is published by LIWC.net, Austin, Texas 78703 USA in conjunction with the LIWC2007 software program. Correspondence should be sent to Pennebaker@mail.utexas.edu.
Contents
The Development and Psychometric Properties of LIWC2007
The LIWC2007 Framework
The LIWC2007 Main Text Processing Module
The Default LIWC2007 Dictionary
LIWC2007 Dictionary Development
Internal Reliability and External Validity
Base Rates of Word Usage
Comparing LIWC2007 with LIWC2001
Helpful References
analyses each sequentially, writing the output to a single file. Processing time for a page of single-spaced text is typically a fraction of a second on both PC and Mac computers.
LIWC2007 reads each designated text file, one target word at a time. As each target word is processed, the dictionary file is searched, looking for a dictionary match with the current target word. If the target word matches the dictionary word, the appropriate word category scale (or scales) for that word is incremented. As the target text file is being processed, counts for various structural composition elements (e.g., word count and sentence punctuation) are also incremented.
With each text file, approximately 80 output variables are written as one line of data to a designated output file. This data record includes the file name, 4 general descriptor categories (total word count, words per sentence, percentage of words captured by the dictionary, and percent of words longer than six letters), 22 standard linguistic dimensions (e.g., percentage of words in the text that are pronouns, articles, auxiliary verbs, etc.), 32 word categories tapping psychological constructs (e.g., affect, cognition, biological processes), 7 personal concern categories (e.g., work, home, leisure activities), 3 paralinguistic dimensions (assents, fillers, nonfluencies), and 12 punctuation categories (periods, commas, etc.). A complete list of the standard LIWC2007 scales is included in Table 1.
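The word-by-word matching described above can be illustrated with a minimal Python sketch. This is not the LIWC2007 implementation: the `MINI_DICTIONARY` contents, the `analyze` function, the tokenizing regular expression, and the sentence-splitting rule are all assumptions made for illustration, with a handful of example words borrowed from Table 1.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary: word -> category scales it increments.
# The real LIWC2007 dictionary is far larger; these entries are illustrative.
MINI_DICTIONARY = {
    "i": ["funct", "pronoun", "ppron", "i"],
    "the": ["funct", "article"],
    "not": ["funct", "negate"],
    "happy": ["affect", "posemo"],
    "worried": ["affect", "negemo", "anx"],
}


def analyze(text, dictionary=MINI_DICTIONARY):
    """Return one output record (a dict) for a single text."""
    # Assumed tokenizer: runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Assumed sentence rule: any run of ., ! or ? ends a sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter()
    matched = 0
    for word in words:  # one target word at a time
        categories = dictionary.get(word)
        if categories:
            matched += 1
            counts.update(categories)  # a word may increment several scales
    wc = len(words)

    def pct(n):
        return 100.0 * n / wc if wc else 0.0

    record = {
        "wc": wc,                                            # total word count
        "wps": wc / len(sentences) if sentences else 0.0,    # words per sentence
        "dic": pct(matched),                                 # % captured by dictionary
        "sixltr": pct(sum(1 for w in words if len(w) > 6)),  # % words > 6 letters
    }
    for category, n in counts.items():
        record[category] = pct(n)
    return record
```

Calling `analyze("I am happy. I am not worried.")` yields a record with `wc` equal to 7 and `wps` equal to 3.5, with each category expressed as a percentage of total words, which mirrors the one-line-per-file output record described above.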
Table 1. LIWC2007 Output Variable Information
Category                  Abbrev   Examples                    Words in  Alpha:
                                                               category  binary/raw
Linguistic Processes
Word count                wc
Words/sentence            wps
Dictionary words          dic
Words>6 letters           sixltr
Total function words      funct                                464       .97/.40
Total pronouns            pronoun  I, them, itself             116       .91/.38
Personal pronouns         ppron    I, them, her                70        .88/.20
1st pers singular         i        I, me, mine                 12        .62/.44
1st pers plural           we       We, us, our                 12        .66/.47
2nd person                you      You, your, thou             20        .73/.34
3rd pers singular         shehe    She, her, him               17        .75/.52
3rd pers plural           they     They, their, they'd         10        .50/.36
Impersonal pronouns       ipron    It, its, those              46        .78/.46
Articles                  article  A, an, the                  3         .14/.14
[Common verbs] (a)        verb     Walk, went, see             383       .97/.42
Auxiliary verbs           auxverb  Am, will, have              144       .91/.23
Past tense (a)            past     Went, ran, had              145       .94/.75
Present tense (a)         present  Is, does, hear              169       .91/.74
Future tense (a)          future   Will, gonna                 48        .75/.02
Adverbs                   adverb   Very, really, quickly       69        .84/.48
Prepositions              prep     To, with, above             60        .88/.35
Conjunctions              conj     And, but, whereas           28        .70/.21
Negations                 negate   No, not, never              57        .80/.28
Quantifiers               quant    Few, many, much             89        .88/.12
Numbers                   number   Second, thousand            34        .87/.61
Swear words               swear    Damn, piss, fuck            53        .65/.48
Psychological Processes
Social processes (b)      social   Mate, talk, they, child     455       .97/.59
Family                    family   Daughter, husband, aunt     64        .81/.65
Friends                   friend   Buddy, friend, neighbor     37        .53/.12
Humans                    human    Adult, baby, boy            61        .86/.26
Affective processes       affect   Happy, cried, abandon       915       .97/.36
Positive emotion          posemo   Love, nice, sweet           406       .97/.40
Negative emotion          negemo   Hurt, ugly, nasty           499       .97/.61
Anxiety                   anx      Worried, fearful, nervous   91        .89/.33
Anger                     anger    Hate, kill, annoyed         184       .92/.55
Sadness                   sad      Crying, grief, sad          101       .91/.45
Cognitive processes       cogmech  cause, know, ought          730       .97/.37
Insight                   insight  think, know, consider       195       .94/.51
Causation                 cause    because, effect, hence      108       .88/.26
Discrepancy               discrep  should, would, could        76        .80/.28
Tentative                 tentat   maybe, perhaps, guess       155       .87/.13
Certainty                 certain  always, never               83        .85/.29
Inhibition                inhib    block, constrain, stop      111       .91/.20
Inclusive                 incl     And, with, include          18        .66/.32
Exclusive                 excl     But, without, exclude       17        .67/.47
Perceptual processes (c)  percept  Observing, heard, feeling   273       .96/.43
See                       see      View, saw, seen             72        .90/.43
Hear                      hear     Listen, hearing             51        .89/.37
Feel                      feel     Feels, touch                75        .88/.26
Biological processes      bio      Eat, blood, pain            567       .95/.53
Body                      body     Cheek, hands, spit          180       .93/.45
Health                    health   Clinic, flu, pill           236       .85/.38
Sexual                    sexual   Horny, love, incest         96        .69/.34
Ingestion                 ingest   Dish, eat, pizza            111       .86/.68
Relativity                relativ  Area, bend, exit, stop      638       .98/.51
Motion                    motion   Arrive, car, go             168       .96/.41
Space                     space    Down, in, thin              220       .96/.44
Time                      time     End, until, season          239       .94/.58
Personal Concerns
Work                      work     Job, majors, xerox          327       .91/.69
Achievement               achieve  Earn, hero, win             186       .93/.37
Leisure                   leisure  Cook, chat, movie           229       .88/.50
Home                      home     Apartment, kitchen, family  93        .81/.57
Money                     money    Audit, cash, owe            173       .90/.53
Religion                  relig    Altar, church, mosque       159       .91/.53
Death                     death    Bury, coffin, kill          62        .86/.40
Spoken categories
Assent                    assent   Agree, OK, yes              30        .59/.41
Nonfluencies              nonflu   Er, hm, umm                 8         .28/.23
Fillers                   filler   Blah, Imean, youknow        9         .63/.18
Validity (judges) values present in the source, row alignment not recoverable: .52, .79, .87, .70, .44, .21, .53.
Words in category refers to the number of different dictionary words that make up the variable category. Validity (judges) reflects the simple correlations between judges' ratings of the category and the LIWC variable (from Pennebaker & Francis, 1996). Alphas refer to the Cronbach alphas for the internal reliability of the specific words within each category. The binary alphas are computed on the occurrence/non-occurrence of each dictionary word, whereas the raw or uncorrected alphas are based on the percentage of use of each of the category words within the texts. All alphas were computed on a sample of 2800 randomly selected text files from our language corpus.
The LIWC dictionary generally arranges categories hierarchically. For example, all pronouns are included in the overarching category of function words. The category of pronouns is the sum of personal and impersonal pronouns. There are some exceptions to the hierarchy rules:
(a) Common verbs are not included in the function word category. Similarly, common verbs (as opposed to auxiliary verbs) that are tagged by verb tense are included in the past, present, and future tense categories but not in the overall function word categories.
(b) Social processes include a large group of words that denote social processes, including all non-first-person-singular personal pronouns as well as verbs that suggest human interaction (talking, sharing).
(c) Perceptual processes include the entire dictionary of the Qualia category (which is a separate dictionary), which includes multiple sensory and perceptual dimensions associated with the five senses.
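The difference between the binary and raw alphas in Table 1 can be made concrete with a small sketch. The code below applies the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of totals), to a texts-by-words matrix. The function names `cronbach_alpha` and `to_binary` and the sample percentages are hypothetical; the source does not describe how its alphas were computed beyond the definitions above.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for a texts-by-words score matrix (a list of rows)."""
    k = len(rows[0])  # number of dictionary words (items) in the category
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    if k < 2 or total_var == 0:
        raise ValueError("need at least two items and non-constant totals")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def to_binary(rows):
    """Recode each percentage as 1 (word occurred) or 0 (word absent)."""
    return [[1 if value > 0 else 0 for value in row] for row in rows]


# Made-up percentages of use: 4 texts (rows) by 3 category words (columns).
raw = [
    [0.0, 0.0, 0.0],
    [1.2, 0.0, 0.4],
    [2.5, 0.9, 0.0],
    [3.1, 1.4, 0.8],
]
raw_alpha = cronbach_alpha(raw)                 # "raw" or uncorrected alpha
binary_alpha = cronbach_alpha(to_binary(raw))   # occurrence/non-occurrence alpha
```

The same matrix produces two different reliability estimates depending on whether the percentages or their 0/1 recodings are analyzed, which is why Table 1 reports both.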
Step 1. Word Collection. In the design and development of the LIWC category scales, sets of words were first generated for each category scale. Within the Psychological Processes category, for example, the emotion or affective subdictionaries were based on words from several sources. We drew on common emotion rating scales, such as the PANAS (Watson, Clark, & Tellegen, 1988), Roget's Thesaurus, and standard English dictionaries. Following the creation of preliminary category word lists, brainstorming sessions among 3-6 judges were held in which words relevant to the various scales were generated and added to the initial scale lists. Similar schemes were used for the other subjective dictionary categories.
Step 2. Judges' Rating Phases. Once the broad word lists were amassed, words in the Psychological Processes and Personal Concerns and most in the Relativity (excluding verb tense) categories were then rated by three independent judges. In the development of the first LIWC program, the judges were instructed to focus on both the inclusion and exclusion of words in each LIWC dictionary scale list. In the first rating phase, the judges indicated whether each word in the category list should or should not be included in the particular category in question. They were also instructed to include additional words they felt should be included in the category. All category word lists were updated by the following set of rules: 1) a word remained in the category list if two out of three judges agreed it should be included, 2) a word was deleted from the category list if at least two of the three judges agreed it should be excluded, and 3) a word was added to the category list if two out of three judges agreed it should be included. Due to the objective nature of elements in the Standard Language Dimensions category (e.g., articles, pronouns, prepositions), judges' ratings were not collected for the various lists in that category.
The second rating phase involved the discrimination of LIWC category word elements. Judges were given category-level alphabetized word lists (e.g., all Cognitive Process words) and asked to indicate whether each word in the list should or should not be included in the high-level category in question. Judges were then instructed to indicate in which, if any, of the mid-level scale lists the word should be included (e.g., Insight, Causation). All category scale word lists were updated by the following rules: 1) a word remained on the scale list if two out of three judges agreed it should be included and 2) a word was deleted from the scale list if at least two of the three judges agreed it should be excluded. The final percentages of judges' agreement for this second rating phase ranged from 93% agreement for Insight to 100% agreement for Ingestion, Death, Religion, Friends, Relatives, and Humans.
Step 3. Psychometric Evaluation. The initial LIWC judging took place in 1992-1994. A significant LIWC revision was undertaken in 1997 to streamline the original program and dictionaries. Text files from several dozen studies, totaling over 8 million words, were analyzed using the 1997 version of LIWC as well as WordSmith, a powerful word count program used in discourse analysis. Original LIWC categories that were used at very low rates (less than 0.3 percent of words made up the category) or that suffered from consistently poor reliability or validity were omitted. Several new categories, including social processes, several personal concern categories, and the relativity dimensions, were added following the same stringent judge-based procedures described above (including both passes). Finally, once the entire new LIWC dictionary was assembled, any words that were not used at least 0.005 percent of the time in our previous text files or were not listed in Francis and Kucera's (1982) Frequency Analysis of English Usage were excluded.
Step 4. Updates and Expansions. The most recent version, LIWC2007, involved substantial updating of the dictionaries and modification in the dictionary structure. Drawing on several hundred thousand text files made up of several hundred million words from both written and spoken language samples, we sought to identify common words and word categories not captured in the earlier LIWC versions. Examining the 2000 most frequently used words, a group of four judges individually and collectively agreed which new words and new word categories were appropriate for inclusion. Based on recent studies suggesting that function words are particularly relevant to psychological processes, we added the categories of Conjunctions, Adverbs, Quantifiers, Auxiliary Verbs, Commonly used Verbs, Impersonal Pronouns, Total Function Words, and Total Relativity Words. In addition, third person pronouns were divided into 3rd person singular and 3rd person plural. Finally, a large group of punctuation marks have been added as separate categories.
For those who are familiar with LIWC2001, it will be clear that some of the original categories have been removed, primarily because these categories had consistently low base rates and were rarely used: Optimism, Positive Feelings, Communication Verbs, Other References, Metaphysical, Sleeping, Grooming, School, Sports, Television, Up, and Down. The category of Unique Words (also known as Type/Token ratio) has also been removed. This category typically correlates with word count at .80. Note that an alternative default LIWC2001 dictionary is available.
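The two-out-of-three judge rule from Step 2 can be sketched as a simple majority vote. The function names `majority_include` and `update_category`, the data layout, and the handling of unrated words are assumptions for illustration only; the source describes the rules but not any software used to apply them.

```python
def majority_include(votes, needed=2):
    """True when at least `needed` judges voted to include the word."""
    return sum(1 for vote in votes if vote) >= needed


def update_category(current_words, votes_by_word):
    """Apply the two-of-three rule to existing and newly proposed words.

    votes_by_word maps each word (existing or proposed) to three booleans,
    one per judge, where True means "include in this category".
    """
    kept = set()
    for word, votes in votes_by_word.items():
        if majority_include(votes):
            kept.add(word)  # rule 1 (word remains) or rule 3 (word is added)
        # Otherwise at least two of three judges excluded it: rule 2 (delete).
    # Assumption: words that received no ratings are left in place.
    unrated = set(current_words) - set(votes_by_word)
    return sorted(kept | unrated)


# Hypothetical ratings for a positive emotion list.
updated = update_category(
    ["happy", "table"],
    {
        "happy": [True, True, False],   # stays: two judges include
        "table": [False, False, True],  # removed: two judges exclude
        "joy": [True, True, True],      # added: proposed and accepted
    },
)
```

With three judges, failing to reach two "include" votes is the same as having at least two "exclude" votes, so a single threshold covers all three rules.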
words are used. The binary method has the potential to overestimate reliability based on the length of texts; the uncorrected method tends to underestimate reliability based on the highly variable base rates of word usage within any given category.
But be warned: the psychometrics of natural language use are not as pretty as with questionnaires. The reason is obvious once you think about it. Once you say something, you generally don't need to say it again in the same paragraph or essay. The nature of discourse, then, is we usually say something and then move on to the next topic. Saying the same thing over and over again is generally bad form.
Issues of validity are also a bit tricky. We can have people complete a questionnaire that assesses their general moods and then have them write an essay which we then subject to the LIWC program. We can also have judges evaluate the essay for its emotional content. In other words, we can get self-reported, judged, and LIWC numbers that all reflect a participant's anger.
One of the first tests of the validity of the LIWC scales was undertaken by Pennebaker and Francis (1996) as part of an experiment in which first-year college students wrote about the experience of coming to college. During the writing phase of the study, 72 Introductory Psychology students met as a group on three consecutive days to write on their assigned topics. Participants in the experimental condition (n = 35) were instructed to write about their deepest thoughts and feelings concerning the experience of coming to college. Those in the control condition (n = 37) were asked to describe any particular object or event of their choosing in an unemotional way. After the writing phase of the study was completed, four judges rated the participants' essays on various emotional, cognitive, content, and composition dimensions designed to correspond to selected LIWC Dictionary scales.
Using LIWC output and judges' ratings, Pearson correlational analyses were performed to test LIWC's external validity. Results, presented in Table 1, reveal that the LIWC scales and judges' ratings are highly correlated. These findings suggest that LIWC successfully measures positive and negative emotions, a number of cognitive strategies, several types of thematic content, and various language composition elements. The level of agreement between judges' ratings and LIWC's objective word count strategy provides support for LIWC's external validity.
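As a rough illustration of the validity check described above, the following sketch computes a Pearson correlation between a LIWC scale and judges' ratings of the same essays. The function `pearson_r` is a plain textbook implementation, and the two score lists are invented for the example; they are not data from Pennebaker and Francis (1996).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(ss_x * ss_y)


# Invented example: LIWC positive emotion percentages for five essays and
# the mean judge rating of positive emotion for the same five essays.
liwc_posemo = [1.8, 3.3, 0.7, 2.9, 4.1]
judge_rating = [2.0, 5.0, 1.0, 4.0, 6.0]
validity = pearson_r(liwc_posemo, judge_rating)
```

A value of `validity` near 1 would indicate that essays scoring high on the LIWC scale were also rated high by the judges, which is the sense in which the Validity (judges) column of Table 1 should be read.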
represented in these studies. A third class of text was based on 113 highly technical articles in the journal Science published in 1997 or 2007. A fourth sample included 714,000 internet weblogs, or blogs, from approximately 20,000 individuals who posted either on Blog.com in 2004 or LiveJournal.com in the summer and fall of 2001. The fifth sample was based on 209 novels published in English between 1700 and 2004. The American and British novels included best-selling popular books as well as more classic novels. Finally, we analyzed data from seven observational studies in which participants were tape-recorded while engaging in conversations with others. The speech samples ranged from transcripts of people wearing audio recorders over days or weeks, strangers talking in a waiting room, to couples talking about problems, to open-air tape recordings of people in public spaces.
Category                  Emotional  Control  Science   Blogs  Novels  Talking  Grand  Mean
                          writing    writing  articles                          means  SDs
1st pers singular         10.40      8.50     0.12      6.42   2.55    6.30     5.72   2.48
1st pers plural           0.73       0.93     0.37      0.88   0.55    1.09     0.76   0.83
2nd person                0.39       0.20     0.00      1.23   1.29    3.94     1.18   0.93
3rd pers singular         2.01       0.73     0.04      1.48   4.92    1.46     1.77   1.33
3rd pers plural           0.71       0.41     0.28      0.65   0.98    0.84     0.65   0.57
Impersonal pronouns       6.00       3.51     2.36      5.40   4.61    7.89     4.96   1.56
Articles                  4.97       6.63     7.67      5.89   8.21    4.42     6.30   1.95
Common verbs (a)          17.44      13.59    4.98      14.61  13.01   19.94    13.93  2.73
Auxiliary verbs           10.65      7.42     3.90      8.81   7.76    12.38    8.49   2.11
Past tense (a)            5.76       4.55     1.45      3.83   6.29    3.98     4.31   2.25
Present tense (a)         9.16       6.74     2.70      8.68   4.57    13.97    7.64   2.73
Future tense (a)          1.12       1.54     0.37      1.06   1.14    0.99     1.04   0.80
Adverbs                   6.29       4.48     1.35      5.46   3.76    6.22     4.59   1.44
Prepositions              12.94      16.06    12.87     12.06  14.06   9.33     12.89  2.08
Conjunctions              7.39       7.71     4.30      6.39   6.65    5.67     6.35   1.64
Negations                 2.24       0.84     0.40      1.78   1.69    2.92     1.65   0.95
Quantifiers               3.12       2.46     1.93      2.79   2.27    2.23     2.47   0.94
Numbers                   1.31       2.73     7.05      1.96   1.17    1.95     2.70   1.60
Swear words               0.11       0.03     0.00      0.33   0.06    0.37     0.15   0.29
Psychological Processes
Social processes (b)      9.09       5.55     2.61      8.65   12.26   11.75    8.32   2.93
Family                    0.99       0.33     0.08      0.38   0.41    0.24     0.41   0.53
Friends                   0.50       0.42     0.04      0.25   0.17    0.16     0.26   0.32
Humans                    0.84       0.38     0.24      0.79   1.05    0.81     0.69   0.62
Affective processes       6.02       2.57     2.18      5.84   4.89    4.93     4.41   1.59
Positive emotion          3.28       1.83     1.33      3.72   2.86    3.42     2.74   1.27
Negative emotion          2.67       0.71     0.84      2.07   1.98    1.49     1.63   0.91
Anxiety                   0.68       0.21     0.16      0.30   0.44    0.18     0.33   0.33
Anger                     0.66       0.14     0.13      0.76   0.55    0.58     0.47   0.48
Sadness                   0.63       0.14     0.29      0.42   0.57    0.19     0.37   0.37
Cognitive processes       19.66      14.42    11.28     15.97  15.23   15.66    15.37  2.85
Insight                   3.25       1.31     1.82      2.17   1.99    2.34     2.15   1.05
Causation                 1.85       1.28     2.16      1.42   1.02    1.55     1.55   0.84
Discrepancy               2.13       1.08     0.48      1.54   1.52    1.73     1.41   0.79
Tentative                 2.93       2.31     1.33      2.65   2.16    2.36     2.29   1.05
Certainty                 1.73       0.80     0.56      1.40   1.43    1.34     1.21   0.64
Inhibition                0.46       0.38     0.63      0.47   0.61    0.37     0.49   0.39
Inclusive                 5.09       6.37     4.08      4.66   5.35    3.88     4.91   1.54
Exclusive                 3.49       1.71     0.92      2.78   2.22    3.26     2.40   1.06
Perceptual processes (c)  2.08       1.91     1.15      2.27   3.28    2.27     2.16   1.16
See                       0.53       0.83     0.65      0.87   1.26    0.99     0.86   0.79
Hear                      0.44       0.35     0.06      0.65   1.15    0.69     0.56   0.47
Feel                      0.96       0.62     0.24      0.60   0.74    0.48     0.61   0.50
Biological processes      1.95       2.97     1.02      2.05   2.13    1.52     1.94   1.44
Body                      0.51       1.05     0.28      0.75   1.21    0.59     0.73   0.85
Health                    0.93       0.49     0.57      0.54   0.44    0.31     0.55   0.65
Sexual                    0.34       0.05     0.06      0.41   0.18    0.32     0.23   0.39
Ingestion                 0.26       1.44     0.15      0.44   0.36    0.37     0.50   0.65
Relativity                13.77      20.13    10.19     13.75  13.92   12.77    14.09  3.21
Motion                    2.07       3.57     1.21      2.06   2.18    2.69     2.30   1.15
Space                     5.38       7.92     6.08      5.61   6.83    5.46     6.21   1.82
Time                      6.03       8.20     2.65      5.72   4.65    4.34     5.27   1.84
Current Concerns
Work                      2.14       3.74     1.74      1.71   1.01    1.67     2.00   1.40
Achievement               1.63       1.47     1.60      1.45   1.13    0.95     1.37   0.84
Leisure                   0.78       1.86     0.41      1.60   0.69    1.04     1.06   0.84
Home                      0.64       1.86     0.14      0.52   0.63    0.36     0.69   0.62
Money                     0.34       0.56     0.36      0.59   0.51    0.60     0.49   0.54
Religion                  0.17       0.17     0.06      0.34   0.39    0.19     0.22   0.45
Death                     0.18       0.03     0.06      0.15   0.23    0.07     0.12   0.20
Spoken categories
Assent                    0.11       0.07     0.08      0.64   0.19    3.61     0.78   0.76
Nonfluencies              0.19       0.13     0.06      0.32   0.14    0.73     0.26   0.35
Fillers                   0.03       0.01     0.00      0.02   0.00    1.20     0.21   0.35
Punctuation
Total Punctuation         12.19      12.85    33.94     23.80  22.05   49.37    25.70  10.48
Periods                   6.12       6.60     11.73     10.66  5.51    9.81     8.41   4.16
Commas                    2.90       3.24     7.63      4.09   7.36    5.05     5.05   2.16
Colons                    0.05       0.58     0.21      0.73   0.16    0.07     0.30   0.74
Semicolons                0.04       0.03     0.38      0.11   0.63    0.05     0.21   0.41
Question marks            0.17       0.04     0.05      0.60   0.57    2.33     0.63   1.03
Exclamation marks         0.12       0.07     0.00      1.27   0.46    0.21     0.36   0.70
Dashes                    0.32       0.45     2.54      1.11   1.60    0.75     1.13   1.65
Quotation marks           0.27       0.21     0.18      0.71   3.39    0.17     0.82   0.82
Apostrophes               1.69       0.95     0.16      2.37   2.11    3.82     1.85   1.50
Parentheses               0.15       0.20     4.87      0.50   0.05    0.01     0.96   0.56
Other punctuation         0.20       0.29     1.32      1.08   0.14    27.11    5.02   4.87
Note: Grand Means are the unweighted means of the six genres; Mean SDs refer to the unweighted mean of the standard deviations across the six genre categories.
The LIWC dictionary generally arranges categories hierarchically. For example, all pronouns are included in the overarching category of function words. The category of pronouns is the sum of personal and impersonal pronouns. There are some exceptions to the hierarchy rules:
(a) Common verbs are not included in the function word category. Similarly, common verbs (as opposed to auxiliary verbs) that are tagged by verb tense are included in the past, present, and future tense categories but not in the overall function word categories.
(b) Social processes include a large group of words (originally used in LIWC2001) that denote social processes, including all non-first-person-singular personal pronouns as well as verbs that suggest human interaction (e.g., talking, sharing).
(c) Perceptual processes include the entire dictionary of the Qualia category (which is a separate dictionary), which includes multiple sensory and perceptual dimensions associated with the five senses.
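The definition of Grand Means in the note above (the unweighted mean of the six genre means) can be checked against two rows of the table with a short sketch. The `BASE_RATES` structure and the `grand_mean` function are illustrative names, not part of LIWC; the numbers are copied from the 1st pers plural and Articles rows.

```python
GENRES = (
    "Emotional writing", "Control writing", "Science articles",
    "Blogs", "Novels", "Talking",
)

# Genre means for two categories, in the order listed in GENRES.
BASE_RATES = {
    "1st pers plural": (0.73, 0.93, 0.37, 0.88, 0.55, 1.09),
    "Articles": (4.97, 6.63, 7.67, 5.89, 8.21, 4.42),
}


def grand_mean(genre_means):
    """Unweighted mean: each genre counts equally, whatever its sample size."""
    return sum(genre_means) / len(genre_means)


for category, means in BASE_RATES.items():
    print(f"{category}: {grand_mean(means):.2f}")
```

To two decimals the results agree with the Grand Means column (0.76 and 6.30). Because the mean is unweighted, the 113 Science articles influence it as much as the 714,000 blog entries.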
In many ways, Table 3 points to the important role that context plays in people's use of language. Not surprisingly, the topics of writing as reflected in the current concerns category vary substantially as a function of genre. More striking, however, are the large differences in people's use of function words as well as punctuation from genre to genre (cf. Biber, 1988).
Category      LIWC2007 mean  LIWC2007 sd  LIWC2001 mean  LIWC2001 sd  Correlation
Inclusive     4.96           1.90         5.80           1.62         0.72
Exclusive     2.89           1.49         3.56           1.35         0.61
Seeing        0.79           0.72         0.68           0.53         0.61
Hearing       0.56           0.56         0.96           0.77         0.60
Feeling       0.69           0.63         0.44           0.53         0.68
Body          0.77           0.86         0.69           0.81         0.79
Sexual        0.36           0.66         0.33           0.59         0.91
Motion        2.33           1.34         1.54           1.07         0.86
Space         5.86           2.02         3.41           1.41         0.76
Time          5.75           2.40         4.60           2.10         0.93
Occupation    1.87           1.63         2.12           1.55         0.89
Achievement   1.27           0.87         0.78           0.59         0.80
Leisure       1.20           1.05         1.25           1.11         0.67
Home          0.77           0.90         0.73           0.80         0.89
Money         0.49           0.60         0.35           0.46         0.91
Religion      0.23           0.47         0.20           0.43         0.79
Death         0.14           0.32         0.12           0.30         0.96
Assent        0.73           1.28         0.45           0.87         0.92
Nonfluencies  0.30           0.49         0.10           0.38         0.82
Fillers       0.22           0.80         0.21           0.79         0.99
Helpful References
Argamon, S., Koppel, M., Fine, J., and Shimoni, A. R. (2003). Gender, genre, and writing style in formal written texts. Text, 23(3).
Argamon, S., Koppel, M., Pennebaker, J.W., & Schler, J. (in press). Automatically profiling the author of an anonymous text. Communications of the Association for Computing Machinery (CACM).
Baayen, R. H., Piepenbrock, R., & Gulikers, L. (1995). The CELEX Lexical Database [CD ROM]. Philadelphia: Linguistic Data Consortium, University of Pennsylvania.
Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge University Press.
Boroditsky, L. (2001). Does language shape thought? Mandarin and English speakers' conception of time. Cognitive Psychology, 43, 1-22.
Bosson, J.K., Swann, W.B., Jr., & Pennebaker, J.W. (2000). Stalking the perfect measure of implicit self-esteem: The blind men and the elephant revisited? Journal of Personality and Social Psychology, 79, 631-643.
Brewer, M. B., & Gardner, W. (1996). Who is this "We"? Levels of collective identity and self representations. Journal of Personality & Social Psychology, 71, 83-93.
Brown, R. (1968). Words and Things: An Introduction to Language. New York: Free Press.
Bruner, J. S. (1973). Beyond the Information Given: Studies in the Psychology of Knowing. Oxford: W. W. Norton.
Bucci, W. (1995). The power of the narrative: A multiple code account. In J.W. Pennebaker (Ed.), Emotion, Disclosure, and Health (pp. 93-122). Washington, DC: American Psychological Association.
Buchanan, L., Westbury, C., & Burgess, C. (in press). Characterizing semantic space: Neighborhood effects in word recognition. Psychonomic Bulletin & Review.
Campbell, R.S. & Pennebaker, J.W. (2003). The secret life of pronouns: Flexibility in writing style and physical health. Psychological Science, 14, 60-65.
Chambers, J. K., Trudgill, P., and Schilling-Estes, N., eds. (2004). The Handbook Of Language Variation And Change (London: Blackwell).
Chung, C.K., & Pennebaker, J.W. (2005). Assessing quality of life through natural language use: Implications of computerized text analysis. In W.R. Lenderking and D.A. Revicki (Eds.), Advancing health outcomes research methods and clinical applications (pp. 79-94). Washington, DC: Degnon Associates.
Chung, C.K., & Pennebaker, J.W. (2007). The psychological functions of function words. In K. Fiedler (Ed.), Social communication (pp. 343-359). New York: Psychology Press.
Chung, C.K., & Pennebaker, J.W. (in press). Revealing people's thinking in natural language: Using an automated meaning extraction method in open-ended self-descriptions. Journal of Research in Personality.
Cohn, M. A., Mehl, M. R., & Pennebaker, J. W. (2004). Linguistic markers of psychological change surrounding September 11, 2001. Psychological Science, 15, 687-93.
Crammer, K. and Singer, Y. (2003). Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research, 3:951-991.
Damasio, A. R. (1995). Descartes' Error: Emotion, Reason and the Human Brain. New York: Harper Collins.
Davison, K.P., Pennebaker, J.W., & Dickerson, S.S. (2000). Who talks? The social psychology of illness support groups. American Psychologist, 55, 205-217.
Feixas, G., Geldschlager, H., & Neimeyer, R. A. (2002). Content analysis of personal constructs. Journal of Constructivist Psychology, 15, 1-19.
Fiedler, K., & Semin, G. R. (1992). Attribution and language as a socio-cognitive environment. In G. R. Semin and K. Fiedler (Eds.), Language, Interaction, and Social Cognition (pp. 58-78). Thousand Oaks, CA: Sage Publications, Inc.
Fitzsimmons, G. M., & Kay, A. C. (2004). Language and interpersonal cognition: Causal effects of variations in pronoun usage on perceptions of closeness. Personality and Social Psychology Bulletin, 5, 547-557.
Foltz, P. W. (1996). Latent semantic analysis for text-based research. Behavior Research Methods, Instruments & Computers, 28, 197-202.
Francis, W.N., & Kucera, H. (1982). Frequency analyses of English usage: Lexicon and grammar. Boston: Houghton Mifflin.
Gazzaniga, M. S. (2005). The Ethical Brain. New York: Dana Press.
Genkin, A., Lewis, D. D., and Madigan, D. (2006). Large-scale Bayesian logistic regression for text categorization. Technometrics (to appear).
Gill, A. (2003). Personality and language: The projection and perception of personality in computer-mediated communication. Unpublished doctoral dissertation. University of Edinburgh, Edinburgh, Scotland.
Gill, A. J., Oberlander, J., & Austin, E. (2006). The perception of e-mail personality at zero-acquaintance. Personality and Individual Differences, 40, 497-507.
Gortner, E.M., & Pennebaker, J.W. (2003). The anatomy of a disaster: Media coverage and community-wide health effects of the Texas A&M Bonfire tragedy. Journal of Social and Clinical Psychology, 22, 580-603.
Gottschalk, L. A. (1997). The unobtrusive measurement of psychological states and traits. In C. W. Roberts (Ed.), Text Analysis for the Social Sciences: Methods for Drawing Statistical Inferences from Texts and Transcripts (pp. 117-129). Mahwah, NJ: Erlbaum.
Gottschalk, L.A., & Gleser, G.C. (1969). The measurement of psychological states through the content analysis of verbal behavior. Berkeley: University of California Press.
Graesser, A. C., Gernsbacher, M. A., & Goldman, S. R. (2003). Introduction to the Handbook of Discourse Processes. In A. C. Graesser, M. A. Gernsbacher, and S. R. Goldman, Handbook of Discourse Processes (pp. 1-23). Mahwah, NJ: Lawrence Erlbaum Associates.
Graesser, A. C., Lu, S., Jackson, G. T., Mitchell, H., Ventura, M., Olney, A., & Louwerse, M. M. (2004). AutoTutor: A tutor with dialogue in natural language. Behavior Research Methods, Instruments & Computers, 36, 180-193.
Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments & Computers, 36, 193-202.
Graham, L. E., Scherwitz, L., & Brand, R. (1989). Self reference and coronary heart disease incidence in the Western Collaborative Group Study. Psychosomatic Medicine, 51, 137-144.
Graybeal, A., Seagal, J.D., & Pennebaker, J.W. (2002). The role of story-making in disclosure writing: The psychometrics of narrative. Psychology and Health, 17, 571-581.
Groom, C.J., & Pennebaker, J.W. (2005). The language of love: Sex, sexual orientation, and language use in online personal advertisements. Sex Roles, 52, 447-461.
Groom, C.J., & Pennebaker, J.W. (2003). Words. Journal of Research in Personality, 36, 615-621.
Hajek, C., & Giles, H. (2003). New directions in intercultural communication competence. In J. O. Greene and B. R. Burleson (Eds.), Handbook of communication and social interaction skills (pp. 935-957). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.
Halliday, M. A. K., and Matthiessen, C. (2004). An Introduction To Functional Grammar (3rd ed.) (London: Arnold).
Hart, R. P., Jarvis, S. E., Jennings, W. P., & Smith-Howell, D. (2005). Political keywords: Using language that uses us. New York: Oxford University Press.
Hartley, J., Pennebaker, J.W., & Fox, C. (2003). Using new technology to assess the academic writing styles of male and female pairs and individuals. Journal of Technical Writing and Communication, 33, 243-261.
Hartley, J., Sotto, E., & Pennebaker, J. W. (2003). Speaking versus typing: A case-study of the effects of using voice-recognition software on academic correspondence. British Journal of Educational Technology, 34, 5-16.
Hartley, J., Sotto, E. and Pennebaker, J. W. (2002). Style and substance in psychology: Are influential articles more readable than less influential ones. Social Studies of Science, 32, 321-334.
Heberlein, A.S., Adolphs, R., Pennebaker, J.W., & Tranel, D. (2003). Effects of damage to right-hemisphere brain structures on spontaneous emotional and social judgments. Political Psychology, 24, 705-726.
Kanagawa, C., Cross, S. E., & Markus, H. R. (2001). "Who am I?" The cultural psychology of the conceptual self. Personality & Social Psychology Bulletin, 27, 90-103.
Kashima, E. S., & Kashima, Y. (1998). Culture and language: The case of cultural dimensions and personal pronoun use. Journal of Cross-Cultural Psychology, 29, 461-486.
Kashima, E. S., & Kashima, Y. (2005). Erratum to Kashima and Kashima (1998) and reiteration. Journal of Cross-Cultural Psychology, 36, 396-400.
Koppel, M., Schler, J., and Zigdon, K. (2005). Determining an Author's Native Language by Mining a Text for Errors (short paper). Proceedings of KDD, Chicago IL, August 2005.
Koppel, M., Schler, J., Argamon, S., and Pennebaker, J. W. (2006). Effects of age and gender on blogging. Presented at AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs, Stanford, CA, March 2006. Lee, Chang H., Nam, K., & Pennebaker, J.W. (2004). Is writing as much phonological as speaking? Homophone usage across speaking and writing. Psychologia: An International Journal of Psychology in the Orient, 47, 1-9. Lepore, S. J., & Smyth, J. M. (2002). The Writing Cure: How Expressive Writing Promotes Health and Emotional Well-Being. Washington, DC: American Psychological Association. Li, J., Zheng, R., and Chen, H. (2006). From fingerprint to writeprint. Communications of the ACM 49:4 (Apr. 2006), pp. 76-82. Liehr, P., Mehl, M.R., Summers, L.C., & Pennebaker, J.W. (2004). Connecting with others in the midst of a stressful upheaval on September 11, 2001. Applied Nursing Research, 17, 2-9. Liehr, P., Takahashi, R., Nishimura, C., Frazier, L., Kuwajima, I. & Pennebaker, J.W. (2002). Embodied language: Comparison of the cardiac and stroke health experience for Japanese elders. Journal of Nursing Scholarship, 34, 27-32 Lyons, E. J., Mehl, M. R., & Pennebaker, J. W. (2006). Linguistic self-presentation in anorexia: Differences between pro-anorexia and recovering anorexia internet language use. Journal of Psychosomatic Research, 60, 253-256. Markus, H. R., & Kitayama, S. (1991). Culture and the self: Implications for cognition, emotion, and motivation. Psychological Review, 98, 224-253. McAdams, D. P. (2001). The psychology of life stories. Review of General Psychology, 5, 100122. Mehl, M. R., Pennebaker, J. W. (2003). The social dynamics of a cultural upheaval: Social interactions surrounding September 11, 2001. Psychological Science, 14, 579-85. Mehl, M.R., & Pennebaker, J.W. (2003). The sounds of social life: A psychometric analysis of students daily social environments and conversations. Journal of Personality and Social Psychololgy, 84, 857-870. 
Miller, G. A. (1995). The Science of Words. New York: Scientific American Library.
Mitchell, T. (1999). Machine Learning. New York: McGraw-Hill.
Newman, M. L., Pennebaker, J. W., Berry, D. S., & Richards, J. M. (2003). Lying words: Predicting deception from linguistic styles. Personality and Social Psychology Bulletin, 29, 665-675.
Niederhoffer, K. G., & Pennebaker, J. W. (2002). Linguistic style matching in social interaction. Journal of Language and Social Psychology, 21, 337-360.
Nisbett, R. E. (2003). The Geography of Thought: How Asians and Westerners Think Differently. New York, NY: Free Press.
Oberlander, J., & Gill, A. J. (2006). Language with character: A stratified corpus comparison of individual differences in e-mail communication. Discourse Processes, 42, 239-270.
Peng, K., & Nisbett, R. E. (1999). Culture, dialectics, and reasoning about contradiction. American Psychologist, 54, 741-754.
Pennebaker, J. W. (1997). Writing about emotional experiences as a therapeutic process. Psychological Science, 8, 162-166.
Pennebaker, J. W. (2002). What our words can say about us: Towards a broader language psychology. Psychological Science Agenda, 15, 8-9.
Pennebaker, J. W. (2003). The social, linguistic, and health consequences of emotional disclosure. In J. Suls & K. A. Wallston (Eds.), Social psychological foundations of health and illness (pp. 288-313). Malden, MA: Blackwell Publishing.
Pennebaker, J. W., & Campbell, R. S. (2000). The effects of writing about traumatic experience. Clinical Quarterly, 9, 17-21.
Pennebaker, J. W., & Chung, C. K. (2005). Tracking the social dynamics of responses to terrorism: Language, behavior, and the Internet. In S. Wessely & V. N. Krasnov (Eds.), Psychological responses to the new terrorism: A NATO-Russia dialogue. Amsterdam: IOS Press.
Pennebaker, J. W., & Graybeal, A. (2001). Patterns of natural language use: Disclosure, personality, and social integration. Current Directions in Psychological Science, 10, 90-93.
Pennebaker, J. W., & Lee, C. H. (2002). The power of words in social, clinical, and personality psychology. The Korean Journal of Thinking and Problem Solving, 12, 35-43.
Pennebaker, J. W., & Chung, C. K. (in press). Computerized text analysis of Al-Qaeda transcripts. In K. Krippendorff & M. Bock (Eds.), A content analysis reader. Thousand Oaks, CA: Sage.
Pennebaker, J. W., & Francis, M. E. (1996). Cognitive, emotional, and language processes in disclosure. Cognition and Emotion, 10, 601-626.
Pennebaker, J. W., Francis, M. E., & Booth, R. J. (2001). Linguistic Inquiry and Word Count (LIWC): LIWC2001. Mahwah, NJ: Lawrence Erlbaum Associates.
Pennebaker, J. W., Groom, C. J., Loew, D., & Dabbs, J. M. (2004). Testosterone as a social inhibitor: Two case studies of the effect of testosterone treatment on language. Journal of Abnormal Psychology, 113, 172-175.
Pennebaker, J. W., & Ireland, M. (in press). Analyzing words to understand literature. In W. van Peer & J. Auracher (Eds.), New beginnings for the study of literature. Cambridge, UK: Cambridge Scholars Publishing.
Pennebaker, J. W., & King, L. A. (1999). Linguistic styles: Language use as an individual difference. Journal of Personality & Social Psychology, 77, 1296-1312.
Pennebaker, J. W., & Lay, T. C. (2002). Language use and personality during crises: Analyses of Mayor Rudolph Giuliani's press conferences. Journal of Research in Personality, 36, 271-282.
Pennebaker, J. W., Mayne, T., & Francis, M. E. (1997). Linguistic predictors of adaptive bereavement. Journal of Personality and Social Psychology, 72, 863-871.
Pennebaker, J. W., Mehl, M. R., & Niederhoffer, K. (2003). Psychological aspects of natural language use: Our words, our selves. Annual Review of Psychology, 54, 547-577.
Pennebaker, J. W., & Stone, L. D. (2003). Words of wisdom: Language use over the lifespan. Journal of Personality and Social Psychology, 85, 291-301.
Pennebaker, J. W., & Stone, L. D. (2004). Translating traumatic experiences into language: Implications for child abuse and long-term health. In L. J. Koenig, L. S. Doll, A. O'Leary, & W. Pequegnat (Eds.), From child sexual abuse to adult sexual risk: Trauma, revictimization, and intervention (pp. 201-216). Washington, DC: American Psychological Association.
Pennebaker, J. W., Slatcher, R. B., & Chung, C. K. (2005). Linguistic markers of psychological state through media interviews: John Kerry and John Edwards in 2004, Al Gore in 2000. Analyses of Social Issues and Public Policy, 5, 1-9.
Ramirez-Esparza, N., & Pennebaker, J. W. (2006). Do good stories produce good health? Exploring words, language, and culture. Narrative Inquiry, 16, 211-219.
Ramirez-Esparza, N., Pennebaker, J. W., Garcia, F. A., & Suria, R. (2007). La psicología del uso de las palabras: Un programa de computadora que analiza textos en Español [The psychology of word use: A computer program that analyzes texts in Spanish]. Revista Mexicana de Psicología, 24, 85-99.
Rochon, E., Saffran, E. M., Berndt, R. S., & Schwartz, M. F. (2000). Quantitative analysis of aphasic sentence production: Further development and new data. Brain and Language, 72, 193-218.
Rosenberg, S. D., & Tucker, G. J. (1978). Verbal behavior and schizophrenia: The semantic dimension. Archives of General Psychiatry, 36, 1331-1337.
Rude, S. S., Gortner, E. M., & Pennebaker, J. W. (2004). Language use of depressed and depression-vulnerable college students. Cognition & Emotion, 18, 1121-1133.
Scherwitz, L., Berton, K., & Leventhal, H. (1978). Type A behavior, self-involvement, and cardiovascular response. Psychosomatic Medicine, 40, 593-609.
Schiller, R., Tellegen, A., & Evens, J. (1995). An idiographic and nomothetic study of personality description. In J. N. Butcher & C. D. Spielberger (Eds.), Advances in personality assessment (Vol. 10, pp. 1-23). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Schultheiss, O. C., & Brunstein, J. C. (2001). Assessment of implicit motives with a research version of the TAT: Picture profiles, gender differences, and relations to other personality measures. Journal of Personality Assessment, 77 (Special issue: More data on the current Rorschach controversy), 71-86.
Scott, M. (1996). WordSmith. New York, NY: Oxford University Press.
Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1).
Semin, G. R., Rubini, M., & Fiedler, K. (1995). The answer is in the question: The effect of verb causality on the locus of explanation. Personality & Social Psychology Bulletin, 21, 834-841.
Slatcher, R. B., & Pennebaker, J. W. (2006). How do I love thee? Let me count the words: The social effects of expressive writing. Psychological Science, 17, 660-664.
Slatcher, R. B., Chung, C. K., Pennebaker, J. W., & Stone, L. D. (2007). Winning words: Individual differences in linguistic style among U.S. presidential and vice presidential candidates. Journal of Research in Personality, 41, 63-75.
Slobin, D. (1996). From thought and language to thinking for speaking. In J. J. Gumperz & S. J. Levinson (Eds.), Rethinking linguistic relativity (pp. 70-96). New York, NY: Cambridge University Press.
Stiles, W. B. (1992). Describing talk: A taxonomy of verbal response modes. Newbury Park, CA: Sage.
Stirman, S. W., & Pennebaker, J. W. (2001). Word use in the poetry of suicidal and non-suicidal poets. Psychosomatic Medicine, 63, 517-522.
Stone, L. D., & Pennebaker, J. W. (2002). Trauma in real time: Talking and avoiding online conversations about the death of Princess Diana. Basic and Applied Social Psychology, 24, 172-182.
Stone, P. J., Dunphy, D. C., & Smith, M. S. (1966). The General Inquirer: A Computer Approach to Content Analysis. Cambridge, MA: MIT Press.
Tannen, D. (1993). Framing in discourse. London: Oxford University Press.
Van Petten, C., & Kutas, M. (1991). Influences of semantic and syntactic context on open- and closed-class words. Memory & Cognition, 19, 95-112.
Väyrynen, J. J., & Honkela, T. (2005). Comparison of independent component analysis and singular value decomposition in word context analysis. In T. Honkela, V. Könönen, M. Pöllä, & O. Simula (Eds.), Proceedings of AKRR'05, International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (pp. 135-140). Espoo, Finland.
Watson, D., Clark, L. A., & Tellegen, A. (1988). Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology, 54, 1063-1070.
Weber-Fox, & Neville (2001). Sensitive periods differentiate processing of open- and closed-class words: An event-related brain potential study of bilinguals. Journal of Speech, Language, and Hearing Research, 44, 1338-1353.
Weintraub, W. (1989). Verbal behavior in everyday life. NY: Springer.
Winter, D. G., & McClelland, D. C. (1978). Thematic analysis: An empirically derived measure of the effects of liberal arts education. Journal of Educational Psychology, 70, 8-16.
Wolf, M., Horn, A., Mehl, M., Haug, S., Pennebaker, J. W., & Kordy, H. (2008, in press). Computergestützte quantitative Textanalyse: Äquivalenz und Robustheit der deutschen Version des Linguistic Inquiry and Word Count [Computer-aided quantitative text analysis: Equivalence and robustness of the German adaption of the Linguistic Inquiry and Word Count]. Diagnostica.
Zijlstra, H., van Meerveld, T., van Middendorp, H., Pennebaker, J. W., & Geenen, R. (2004). De Nederlandse versie van de Linguistic Inquiry and Word Count (LIWC), een gecomputeriseerd tekstanalyseprogramma [Dutch version of the Linguistic Inquiry and Word Count (LIWC), a computerized text analysis program]. Gedrag & Gezondheid, 32, 273-283.

Portions of the research reported in this manual were made possible by grants from the National Institutes of Health (MH52391). We are deeply indebted to a number of people who helped with different phases of this project: Laura King, Cheryl Hughes, Becky Smith, Kathy Davison, Janie Keller, Mary Sue Hayward, Brooke Novales, Anne Vano, Michael Crow, Sally Dickerson, and Bernard Rimé.