Experiment No. 8: Lexical Diversity

Uploaded by

Sreejith Cherikkallil

Computational linguistics lab report

Copyright:

Attribution Non-Commercial (BY-NC)

Experiment No. 8: Lexical Diversity
14 February 2013

Objective:

To create a vocabulary set for a given text collection and analyse its lexical diversity.

Requirements:

Python 2.7.3, module nltk

Theoretical Background:

Lexical diversity is a measure of vocabulary variation within a written text or a person's speech. The tokens of a text are all the words in it, counted with repetition. The types, or vocabulary, form the set of distinct tokens, in which each word appears exactly once. The ratio of tokens to types gives the average number of times each word is used, and it is taken here as the measure of the lexical diversity of the document.

For example, if a text has 100 words but all of them are the same word, it has only one type. If all of the 100 words are different from each other, it has 100 types. In either case, a 100-word text has 100 tokens.
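The 100-word example above can be checked with a few lines of Python. This is only a sketch: the function name token_type_ratio and the two word lists are made up for illustration and are not part of the original experiment.

```python
def token_type_ratio(tokens):
    """Return tokens / types: the average number of times each distinct word occurs."""
    types = set(tokens)              # the vocabulary: each distinct word once
    if not types:
        raise ValueError("cannot compute a ratio for an empty text")
    return float(len(tokens)) / len(types)


# Hypothetical 100-word texts matching the two extreme cases described above.
same_word = ["word"] * 100                       # 100 tokens, 1 type
all_different = ["w%d" % i for i in range(100)]  # 100 tokens, 100 types

print(token_type_ratio(same_word))       # 100.0
print(token_type_ratio(all_different))   # 1.0
```

Both texts have 100 tokens. Only the number of types differs, so only the ratio changes.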

Algorithm and data structure design:

Input: A set of documents
Output: Vocabulary set for the collection of documents
Data structures: A list, vocabulary
Steps:
1. Open the nltk.book module and select a text file.
2. Print list(textfile).
3. Print len(textfile).
4. Print set(textfile).
5. Print len(set(textfile)).
6. Print float(len(textfile)) / len(set(textfile)).
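The steps above can be sketched as one small function. So that the sketch runs without NLTK or its data, it uses a short hand-made token list in place of a text from nltk.book. The names vocabulary_report and sample_tokens are assumptions introduced here, not names from the original report.

```python
def vocabulary_report(tokens):
    """Follow the listed steps: build the token list, the vocabulary set, and their ratio."""
    tokens = list(tokens)                    # step 2: the tokens as a list
    vocabulary = set(tokens)                 # step 4: duplicates removed
    return {
        "tokens": len(tokens),               # step 3
        "types": len(vocabulary),            # step 5
        "ratio": float(len(tokens)) / len(vocabulary),   # step 6
    }


# Hypothetical stand-in for a text. With NLTK and its book data installed,
# an object such as text1 from `from nltk.book import text1` could be passed instead.
sample_tokens = "the cat sat on the mat and the dog sat on the rug".split()

report = vocabulary_report(sample_tokens)
print(report["tokens"])   # 13
print(report["types"])    # 8
print(report["ratio"])    # 1.625
```

float() is kept from the original steps so that the division is not truncated under Python 2.7, where dividing two integers yields an integer.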

Experimental setup:

The experiment is carried out by analysing nltk.book, which contains several texts from the Gutenberg corpus. The algorithm is implemented in Python. The following Python/NLTK tools are used for the implementation[?]:

len: Returns the length (the number of items) of an object. The argument may be a sequence (string, tuple or list) or a mapping (dictionary).

set: The built-in set type (which replaces the older, deprecated sets module) represents an unordered collection of unique elements. Common uses include membership testing, removing duplicates from a sequence, and computing standard mathematical operations on sets such as intersection, union, difference, and symmetric difference. Calling set(list) eliminates the duplicate entries of the list. This property is used to create a vocabulary set from the tokens.

list: Returns a list whose items are the same and in the same order as the iterable's items. The iterable may be either a sequence, a container that supports iteration, or an iterator object.

The lexical richness of a given text is a count, on average, of how many times each word appears in the text. We get this by dividing the token count by the type count.
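The set operations mentioned above can be tried on two tiny vocabularies. This is an illustrative sketch only: doc_a and doc_b are invented token lists, and comparing two documents is not part of the original experiment.

```python
# Two hypothetical token lists standing in for tokenised documents.
doc_a = ["the", "cat", "sat", "the"]
doc_b = ["the", "dog", "sat", "down"]

vocab_a = set(doc_a)                  # duplicates removed: 3 types from 4 tokens
vocab_b = set(doc_b)

shared = vocab_a & vocab_b            # intersection: words used in both documents
combined = vocab_a | vocab_b          # union: vocabulary of the whole collection
only_in_a = vocab_a - vocab_b         # difference: words used only in doc_a
in_one_only = vocab_a ^ vocab_b       # symmetric difference: words in exactly one document

print(len(doc_a))                     # 4
print(len(vocab_a))                   # 3
print("cat" in vocab_a)               # True (membership test)
print(sorted(shared))                 # ['sat', 'the']
print(sorted(combined))               # ['cat', 'dog', 'down', 'sat', 'the']
print(sorted(only_in_a))              # ['cat']
print(sorted(in_one_only))            # ['cat', 'dog', 'down']
```

The results are passed through sorted() before printing because a set has no defined order.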

Observations/Results:
1. Since the number of types is in the denominator, a high token/type ratio indicates a low diversity of the document.
2. The size of the vocabulary does not grow linearly with the size of the text; it grows more slowly than the number of tokens.
3. The type-token ratio can act as an indicator of a person's vocabulary size.
4. The type-token ratio is also taken as an important indicator of an author's style.
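One way to examine the second observation is to record the vocabulary size after every few tokens. The sketch below shows how such a measurement could be made. The function name vocabulary_growth and the toy token sequence are assumptions for illustration, and a sequence this small is not evidence for the observation itself.

```python
def vocabulary_growth(tokens, step):
    """Return (tokens_read, types_seen) pairs, one after every `step` tokens."""
    seen = set()
    growth = []
    for count, token in enumerate(tokens, 1):
        seen.add(token)                       # adding a repeated word leaves the set unchanged
        if count % step == 0:
            growth.append((count, len(seen)))
    return growth


# Hypothetical toy text: 8 tokens drawn from 4 distinct words.
toy_tokens = "a b a c a b d a".split()

for tokens_read, types_seen in vocabulary_growth(toy_tokens, 2):
    print("%d tokens, %d types" % (tokens_read, types_seen))
# 2 tokens, 2 types
# 4 tokens, 3 types
# 6 tokens, 3 types
# 8 tokens, 4 types
```

On a real text, the same pairs could be plotted to see how quickly new words stop appearing.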

List of major references

1. Magnus Lie Hetland, Beginning Python: From Novice to Professional, 2008.
2. S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python, O'Reilly Media Inc., 2009.
3. Ivan V. Meza-Ruiz, Srinivasan C. Janarthanam, Bonnie Webber, and Chris Gorgolewski, Introduction to NLTK, 16 October 2009.
4. http://nlpdotnet.com/SampleCode/ComputeTypeTokenRatio.aspx
