
Running Head: NATURAL LANGUAGE PROCESSING: The Future of Programming?








Natural Language Processing: The Future of Programming?
Yousef Fareed
IT-103-008
February 20, 2014
George Mason University

"By placing this statement on my webpage, I certify that I have read and understand the
GMU Honor Code on http://oai.gmu.edu/the-mason-honor-code/. I am fully aware of the
following sections of the Honor Code: Extent of the Honor Code, Responsibility of the
Student and Penalty. In addition, I have received permission from the copyright holder for
any copyrighted material that is displayed on my site. This includes quoting extensive
amounts of text, any material copied directly from a web page and graphics/pictures that are
copyrighted. This project or subject material has not been used in another class by me or any
other student. Finally, I certify that this site is not for commercial purposes, which is a
violation of the George Mason Responsible Use of Computing (RUC) Policy posted on
http://copyright.gmu.edu/?page_id=301 web site."
Introduction
Natural language processing is a relatively new, interdisciplinary field that draws on computer
science, computational linguistics, and artificial intelligence to produce software that can aid in
human-machine communication. The purpose of this paper is to discuss and analyze the benefits
as well as the legal, social, ethical, and security issues raised by this development, and to explain
the technology required to implement it. The benefits of this development include improved
machine translation, increased human-machine dialogue, and the perfecting of search engine
software. This paper will analyze social issues such as globalization and legal issues concerning
legal information retrieval. It will also analyze ethical issues such as linguistic discrimination and
security issues involving information security systems.
Background and Technology
Natural language processing began in the mid-20th century, when information technology
played a large role in global competition. During the Cold War, efforts were made to translate
Russian scientific journals into English before experts discovered how difficult it was for
machinery to process human language (Jacobs, 2001). One would need expert knowledge of
different facets of the selected languages, such as morphology and syntax, to produce any
accurate rendition or translation. Natural language processing as a subfield of computer science
originated with the publication of Computing Machinery and Intelligence by Alan Turing. In this
work, Turing formulated the idea that would later become the so-called Turing Test, which was
devised to determine whether a machine had attained human-level intelligence and
communication. This test is central to the implementation of
artificial intelligence as well as to natural language processing technology (Addley & University,
2008).
Benefits
Natural language processing software promises augmented accuracy in natural language
translation. Prolonged research into the field of natural language processing will eventually yield
the creation of a machine that can pass the Turing test and therefore be able to learn, interpret,
and translate languages just as any native speaker can. However, it will be a long time before any
machine reaches that level of communicative complexity, and therefore we must focus on the
actual code development process to produce a viable translator. A memoir of the development of
the SLC (Simulated Linguistic Computer) is presented in the publication Machine Translation:
Just a Question of Finding the Right Programming Language? Anthony Brown, the author of this
publication, discussed his claim that using the 704, a more advanced computer at the time, would
quicken the re-coding of Russian flowcharts for translation into English because of its higher-level
programming language (Brown, 2000, p. 131). This software is evidence of constant improvement
in translation software over the decades. Translation that once consisted of flowcharts has since
evolved into Google's hallmark translation software of today. Perhaps in the future, translation
software may yield a new method combining corpus use with Google's text parsing.
Increased human-machine dialogue has sparked the imagination of many through science
fiction movies and novels that depict robots with human intellectual qualities, capable of
understanding, interpreting, and relaying information to others. There is even a humorous
mention of Reading's professor of cybernetics, Kevin Warwick, who claims that by 2045 all of
humanity will be enslaved by computers (Addley & University, 2008). Though this
statement is jesting at best, workers in the artificial intelligence world are still interested in what
it would be like to experience computers with human-like intelligence so convincing that it is
indistinguishable from human interaction. Natural language processing technology can be a great
asset to this development, as it will offer insight into how a computer's processing system must
be developed in order to carry out a coherent conversation, taking note of the sociological as well
as psychological implications of communicating with a fellow human being.
Perfecting search engine software is crucial to increasing the accuracy and breadth of
information supplied to an internet user of any native language. With Google Inc. being the
world leader in browser and search engine programs, it is not surprising that it has taken up the
task of formulating a working and viable search engine for languages with more complex writing
systems. East Asian languages such as Korean, Chinese, and Japanese face the difficulty of a
predominantly English web, so accommodations must be made so that their speakers may search
documents using their native scripts. Levander (2000) stated that, according to the Internet
Council's State of the Internet Report 2000, non-English speakers account for nearly half of
Internet users worldwide. The number is expected to grow to more than one billion users by 2005,
with more than 700 million users outside of North America. Currently, Chinese, Japanese, and
Korean users account for 5.2%, 7.2%, and 3.6%, respectively, of
all Internet users. The software that Google developed to support East Asian languages was
more complex and required more search tools than the software developed for European
languages. Levander (2000) also mentions how Google needed to develop software that could
determine where one word ends and another begins. It relied on both in-house researchers and
outside help, turning to the Boston linguistic-software firm Teragram Corp., among others. This
accomplishment would not be possible without the deliberate input
of natural language into software in order to program a machine to make an accurate and
comprehensive search.
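To illustrate the word-boundary problem just described, the following is a minimal sketch of greedy longest-match segmentation, a common baseline technique for scripts written without spaces; the tiny dictionary and the segment_greedy function are illustrative assumptions rather than Google's actual implementation.

```python
# Minimal sketch of greedy longest-match word segmentation for scripts written
# without spaces (e.g., Chinese). The dictionary and function are illustrative
# assumptions, not Google's actual segmentation method.

def segment_greedy(text, dictionary, max_word_len=4):
    """Split text by repeatedly taking the longest dictionary match."""
    words = []
    i = 0
    while i < len(text):
        match = text[i]  # fall back to a single character if no match
        for length in range(min(max_word_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        words.append(match)
        i += len(match)
    return words

# Toy dictionary of Chinese words (illustrative only).
toy_dictionary = {"我们", "喜欢", "自然", "语言", "自然语言", "处理"}
print(segment_greedy("我们喜欢自然语言处理", toy_dictionary))
# -> ['我们', '喜欢', '自然语言', '处理']
```

In practice, statistical models are generally used rather than a fixed word list, but the sketch shows why explicit word-boundary detection is required before these scripts can be indexed and searched.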
Social and Legal Issues
With the increased globalization of today's world, communication has become so inclusive
that one message will often contain terms from many other languages through the inclusion of
foreign names, items, and terminology associated with different cultures. An investigation that
tracked internal emails between a German company and its sister location in the United States
found many German inclusions in English-language emails (Ahmed, 2005). As Ahmed (2005) has
discussed, this multilinguality phenomenon, which occurs frequently in today's world, places
great difficulty on text-to-speech, machine translation, and information retrieval systems.
Development of text-to-speech software can relieve computational linguists of the burden of
parsing through every word to devise an accurate rendition of a given written message. Some
sectors, such as corporations around the world, have enlisted the help of automatic speech
recognition (ASR) and text-to-speech technology to improve the customer service experience.
Others, such as government organizations like the IRS, have started using ASR and
text-to-speech for training employees with disabilities (Ahmed, 2005).
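To make the multilinguality problem concrete, the following is a minimal sketch of dictionary-based detection of foreign inclusions in English text, in the spirit of Ahmed's (2005) topic; the word lists and the flag_foreign_words function are illustrative assumptions, not the dissertation's actual method.

```python
# Minimal sketch of flagging foreign (here, German) inclusions in English text
# using small word lists. Purely illustrative; not Ahmed's (2005) actual
# detection method.
import re

ENGLISH_WORDS = {"please", "send", "the", "report", "by", "tomorrow", "thanks"}
GERMAN_WORDS = {"bitte", "danke", "bericht", "morgen", "termin"}

def flag_foreign_words(text):
    """Return tokens that look German rather than English."""
    tokens = re.findall(r"[a-zA-ZäöüÄÖÜß]+", text.lower())
    return [t for t in tokens if t in GERMAN_WORDS and t not in ENGLISH_WORDS]

print(flag_foreign_words("Please send the Bericht by morgen, danke!"))
# -> ['bericht', 'morgen', 'danke']
```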
Legal information retrieval is a legal issue not only because it is practical in a law setting and
aids in the prosecution of an individual, but also because the legality of the practice itself can be
questioned. The machine uses a database's index, much like a book's, to perform quick searches
for relevant legal material to be used in a specific court setting, e.g., precedential cases. The issue
at hand is that legal searches, unlike web searches, must account for precision as well as recall
because the information the machine retrieves is not always sought out by the court. Hogan,
Bauer, and Brassil (2010) state the following:
Legal review represents an information need known as exhaustive search (Rosenfeld and
Morville 2002) in which both recall and precision must be simultaneously optimized.
Furthermore, approaches emphasizing ranked results, such as those developed for web search,
make for difficult integration with legal review tasks that require a binary assessment. Such
integrations often focus on identifying the appropriate cutoff value in order to optimize results,
with very deep cutoff values being required in the legal task in order to ensure high recall (Oard
et al. 2009). (Hogan, Bauer, & Brassil, 2010)
Considering this need, it is imperative that litigators devise software that not only balances
precision and recall but also supports a binary assessment so that a court can come to an unbiased
agreement.
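As a concrete illustration of the precision and recall trade-off described in the quoted passage, the following sketch computes both measures for a retrieved set of documents at two different cutoffs; the document identifiers and relevance labels are invented for illustration.

```python
# Minimal sketch of precision and recall at a cutoff for a document review
# task. The document IDs and relevance labels are invented for illustration.

def precision_recall(retrieved, relevant):
    """Compute precision and recall for a set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

relevant_docs = {"case-12", "case-47", "case-88", "case-90"}

# A shallow cutoff keeps precision high but misses relevant cases (low recall).
print(precision_recall(["case-12", "case-47"], relevant_docs))        # (1.0, 0.5)
# A deep cutoff recovers every relevant case but pulls in irrelevant ones.
print(precision_recall(["case-12", "case-47", "case-88", "case-90",
                        "case-03", "case-55"], relevant_docs))        # (0.667, 1.0)
```

A deeper cutoff raises recall at the cost of precision, which is exactly the tension the authors describe for legal review.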
Ethical Issues
Ethical issues that affect the global internet community also play a role in natural language
processing. The aforementioned accomplishments of search engine software can also yield biases
about which languages should be processed and which should not. It is important to note that
English is the most widely used language around the world, particularly in programming, and
therefore minority languages are often at the mercy of dominant languages.
Google Inc. has already launched search tools in 10 European languages, so one would expect to
see more languages from other areas around the world implemented in the Google search engine
system (Levander, 2000). This issue can lead to linguistic discrimination if left unchecked. With
Romanian being only the 10th most widely spoken language in Europe, this coverage is hardly a
significant measure considering the other languages that have not been represented. Languages
such as Telugu of southern India, with over 76 million speakers, are poorly supported in Google's
system, as is evident from their poor translation renditions and lack of online texts. This is
striking because Telugu is renowned in India for its extensive classical literature that dates back
centuries. If natural language processing is to be fair in all respects, it must accommodate not
only widely spoken languages such as English, Mandarin, and Telugu, but also minority languages
such as Native American languages.
Security Concerns
Information security is a growing field of cybersecurity that utilizes natural language
processing to improve security measures, notably in issues that involve human aspects. One
high-risk facet of information vulnerability is password authentication, which has been a problem
since passwords were first put into place. As the first line of defense against security breaches,
passwords hold a critical place in information security. As Topkara (2007) states in her
dissertation:
The conflicting requirements of more entropy for increased security, and less entropy
for increased memorability have created the password usability problem [2]. Passwords are
usually seen by employees as burdens before doing real work. They are almost always disabled if
company policy allows or otherwise weakened against the policy. It has been repeatedly shown
that users' preference for easily memorizable passwords creates major vulnerabilities in systems
[3-5] in the form of the two step attack process of (i) gaining access as the user with a weak
password, followed by (ii) escalation of privilege (e.g., through buffer overflow [6]) or other
forms of system subversion.
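To put the entropy trade-off in the quotation into numbers, the short sketch below computes the entropy, in bits, of a few hypothetical password schemes; the schemes are illustrative assumptions and are not drawn from Topkara's dissertation.

```python
# Worked example of password entropy in bits: entropy = length * log2(alphabet size).
# The schemes below are illustrative assumptions, not taken from Topkara (2007).
import math

def entropy_bits(alphabet_size, length):
    """Entropy of a password drawn uniformly at random from the alphabet."""
    return length * math.log2(alphabet_size)

print(entropy_bits(26, 8))    # 8 random lowercase letters  -> ~37.6 bits
print(entropy_bits(94, 8))    # 8 random printable ASCII    -> ~52.4 bits
print(entropy_bits(94, 12))   # 12 random printable ASCII   -> ~78.7 bits
# A single word chosen from a 20,000-word dictionary has only ~14.3 bits:
print(entropy_bits(20000, 1))
```

The more memorable schemes carry far less entropy, which is the usability problem the quotation describes.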
This explanation outlines Topkara's (2007) work in generating automatic password
mnemonics by creating memorable sentences that encode random passwords. A password
mnemonic generator outputs a sentence that encodes a random password, thereby making it
easier to remember. The only issue is that these mnemonic generators may enable hackers to
decipher patterns after they are exposed to a few of the software's generated passwords. If a
potential hacker learns of the software, it is not long before an algorithm is implemented to
pinpoint where the strings of the password are located. This has been done in the past through
brute-force cracking attacks.
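As an illustration of the general idea, the following is a minimal sketch of a mnemonic generator that encodes each password character as a word beginning with that character; the word table and encoding scheme are illustrative assumptions and are far simpler than the sentence-generation approach in Topkara's (2007) work.

```python
# Minimal sketch of a password mnemonic: each password character is encoded
# as a word beginning with that character. Illustrative only; much simpler
# than the natural-language sentence generation in Topkara (2007).
import secrets

# Tiny word table keyed by first character (illustrative assumption).
WORDS = {
    "a": "apples", "b": "bears", "c": "carry", "d": "dance", "e": "every",
    "f": "fine", "g": "green", "h": "happy", "k": "kites", "m": "morning",
    "n": "near", "p": "purple", "r": "rivers", "s": "seven", "t": "tigers",
}

def generate_password_and_mnemonic(length=6):
    """Pick a random password over the word table's keys and build a mnemonic."""
    alphabet = list(WORDS)
    password = "".join(secrets.choice(alphabet) for _ in range(length))
    mnemonic = " ".join(WORDS[ch] for ch in password)
    return password, mnemonic

pw, sentence = generate_password_and_mnemonic()
print(pw)        # e.g. "sbgtma"
print(sentence)  # e.g. "seven bears green tigers morning apples"
```

Because the mapping from words back to characters is fixed, anyone who learns the word table can recover the password from the sentence, which illustrates the pattern-exposure risk noted above.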
Conclusion
As we have discussed in this paper, the field of natural language processing encompasses many
far-reaching developments. The ability of computers to use higher-level programming languages
to translate text from one language to another will greatly influence how translation is done in the
future by not only speeding up the process but also automating it. Some experts believe the Turing
Test will be passed in the very near future as natural language processing is increasingly used in
developing artificial intelligence. In addition, search engine queries may very well become
commands for a machine to follow, as code is today. Social needs regarding
accommodation for globalization are being met with increased attention, as is legal information
retrieval software. Linguistic discrimination may disappear in computational circles if languages
are given equal status on the web as well as in software. Lastly, security concerns regarding
password cracking are being addressed with randomly generated passwords and mnemonics.
Some critics are skeptical of the potential of language software companies such as Google to
deliver breakthroughs in information technology (Safran, 2013). However, as this paper argues,
natural language processing is software that must be adopted by contemporary large businesses if
it is to see a brighter future in information technology.



References
1. Jacobs, P. (2001). Natural language processing: A brief history for scientists and
skeptics. SunServer, 15(2), 1, 19. Retrieved February 20, 2014 from ProQuest
http://search.proquest.com/docview/224862942?accountid=14541
This trade journal discusses some of the history of natural language processing,
focusing on the origins and the evolution of natural language processing throughout the
years. This source strengthens this paper because it highlights the successes of natural
language processing over the years.
2. Addley, E., & University, R. (2008, Oct 13). Lost for words: Computers fail the Turing
thought test. The Guardian. Retrieved February 20, 2014 from ProQuest
http://search.proquest.com/docview/244302940?accountid=14541
This newspaper article tells the history of the Turing Test as well as explains its
application in natural language processing. This source aids the paper in that it shows
how efficiency of the software can be tested.
3. Ahmed, B. U. (2005). Detection of foreign words and names in written text (Order No.
3172339, Pace University). ProQuest Dissertations and Theses, 172 p. Retrieved
February 20, 2014 from ProQuest
http://search.proquest.com/docview/305390146?accountid=14541. (305390146).
This dissertation discusses the problems that the mixing of languages in documents
poses for translation software. This source supports the paper because it offers strong
evidence of globalization influencing natural language processing.
4. Brown, A. F. R. (2000). Machine translation: Just a question of finding the right
programming language? Retrieved February 26, 2014 from ProQuest
http://search.proquest.com/docview/85542338?accountid=14541
This book narrates Brown's memoirs and tells the story of his ideas on using
higher-level programming languages for translation software. This source supports the
argument because it is a primary source and contains valid research in the field of
translation software.
5. Levander, M. (2000, Oct 18). Google launches Asian language-search tools. Wall Street
Journal. Retrieved February 26, 2014 from
http://search.proquest.com/docview/398775453?accountid=14541
This newspaper article highlights the news of Google devising search engine
software for East Asian languages. This source helps prove the argument because it lists
statistics relating to Google's progress in this field.
6. Hogan, C., Bauer, R. S., & Brassil, D. (2010). Automation of legal sensemaking in
e-discovery. Artificial Intelligence and Law, 18(4), 431-457. Retrieved February 26, 2014
from ProQuest. doi:http://dx.doi.org/10.1007/s10506-010-9100-1
This scholarly journal article discusses the legal implications of information retrieval
as well as its implementation in natural language processing for automated use. This
source supports the paper because it uses a legal setting and standpoint to reflect the
relationships between different fields.
7. Safran, N. (2013, October 08). 3 reasons natural language processing (not Google Glass) is
the future of search. Search Engine Watch. Retrieved February 26, 2014 from
http://searchenginewatch.com/article/2299178/3-Reasons-Natural-Language-Processing-
Not-Google-Glass-is-the-Future-of-Search
This news article presents the point of view of a writer on the site who conveys his
thoughts on the future of natural language processing as well as Google software. This
source supports the paper because it verifies claims made in the introductory paragraph.
