
© rennie walker 2002, 2003, 2010

www.renniewalker.com


TABLE OF CONTENTS

Introduction

A Model of Taxonomy and its 4 Functional Elements

A Model of Communication

A Model of Special Languages – Thing, Definition, Term

An OntoLexical Layers Model

Taking the Notion – An Introduction to the Set of Perspectives on Paradigms

February 2003 - A Brief History of Dictionaries: Meanings and Their Words

May 2003 - A Brief History of Newspapers: All the News that's Fit to Find

Review Essays - Introduction

A Practical Course in Terminology Processing by Juan C. Sager. Part 1: Why Terminology?

A Practical Course in Terminology Processing by Juan C. Sager. Part 2: Models Let us Devise Methodologies

About

Affirmation


Introduction to Essays on Taxonomy, Terminology and the Approach of Juan Sager

History of These Essays

In 2002 and 2003, after I had left Sageware Inc., an early thought leader in the field of
content categorization by software into rules-based taxonomies, I wrote a short series of
essays and published them on my then website semanthink.com. semanthink.com is
long gone. (Although there is a good archive of semanthink.com in the Wayback
Machine – but without the images, and more of that later.) Still to this day I occasionally
get asked about one or other of these essays, or about themes and points of interest
covered in them. So, I have decided to re-publish them “permanently”.

They were fun to write. I was working on exciting engagements and there was then, in
2002, much less guidance of a holistic theoretical nature which wove together all of what
I considered to be the relevant paradigms that affected how we should think about, thus
how we should approach analyzing, business problems of content description, modeling
of subject domains etc. No map … some map fragments … and some fragments that
most definitely should be part of the overall map. So, I wrote my own map! And so this
map became important, indispensible, to my work. My work in solving business
problems surfacing at various points in the content supply chain was theoretically based
in large part on the kinds of ideas contained in these essays (and also in some other
essays that just never made it to completion due to work pressures).

When I re-read them today, I still “like” them. And my current work as a consultant is
still to an extent predicated on these ideas, even though the information ecosystem today
is overlay upon overlay of applications and content types that did not exist back in
2002/2003. Their durability, to me, is because I did attempt to write at an abstract level,
at a level of first principles – with a strong view towards business analysis and project
design. However, they are old. They were never written, truly, with an eye and mind to
“publishing” – they were “just web content” to me, then. So … Caveat Emptor … Happy
Trails … and enjoy.

Editorial Changes

Many of the original essays had graphics and witty (at least to me) comix interspersed
through the content to make a point. Alas, the originals of these are lost to me, and as
mentioned, they are not available in the Wayback Machine archive. So these are
omitted, along with any references in the text to them.

On re-reading the essays and re-formatting the content, I was tempted to make some
minor changes. It’s a natural reaction in a way. But I resisted and have made no
editorial changes of any kind. However, I did do my best to re-find URLs referenced in
the footnotes where an URL had changed. Mostly I succeeded in this.

The Essays Themselves

The essays are of a couple of different types.


I was then, and still am today, forceful about the need for us to have models that we can
use as the basis for business analysis and data gathering to fulfill business requirements.
So, for example, a series of these essays were models for conceptual understanding and
as the basis of analysis – e.g. the model of taxonomy (the functional elements of
taxonomy) that I wrote for myself, and the model of communication that I abstracted
from the work of the terminology thought leader Juan Sager.

Some others were part of a series that I called “Perspectives on Paradigms”. These were
short, high-level and strategic thought pieces about paradigms that were changing,
before my very eyes, in the world that I worked in. A couple of these are included
because the themes, which cover the history of the particular paradigms, e.g.
newspapers, dictionaries, are still intrinsically interesting. Some of the others don’t wear
so well over time since the paradigm change has been resolved – these are omitted.

There was also a series of review essays on key books that were informing my work. Only
one of these review essays was completed – a review of “A Practical Course in
Terminology Processing” by Juan Sager, which had a profound impact on how I
strategized, communicated to teams, wrote business requirements and approached
gathering data to solve business problems. Several other review essays were begun (and
nearly finished), but these are not included.

semanthink.com

The color scheme used here is the original semanthink.com color scheme. ☺ I’ve also
included the content item about my approach which was in the About directory of the
site – it’s old, but accurately reflects how I described myself back then. It is included for
completeness’ sake only.

rennie walker

clayton, california

june 2010


A Model of Taxonomy and its 4 Functional Elements

Understanding how Taxonomies do what they Do

The primary purpose of a taxonomy is to represent a domain. It does this by defining all
the concepts of the domain (i.e. giving boundaries) and relating the concepts to each
other. The representation is always in service of a particular user community. In fact, the
representation is how this community "sees" this part of the world, and aligns with how
this community does their "work" in this part of the world. This term "represent" is key.
Just as any map is not the territory itself, so any taxonomy is not the domain itself. The
concept of taxonomy is not expressly or intrinsically linked to a purpose of "organizing"
documents (or "leading to" documents). This constraint on purpose is shared also by
ontologies. Whereas classification schemes developed within the library science world,
such as Dewey and UDC, are expressly purposed to organize information resources. 1

Of course, taxonomies are extremely powerful tools for organizing content. But that is
because they have been designed to model a user domain, not because they are designed
to capture content characteristics. Documents that include discussion of domain
concepts can be associated with nodes from the taxonomy model, and so using the
communication tool of taxonomy we can organize documents for a user community. But
in fact, what we are really doing is using the taxonomy as a specification of the concepts
that need to be recognized (by an application or through editorial workflow). And
successful subject recognition allows the recognized subjects to be arranged in the map
of the taxonomy that the user community requires. The fact that these discussions of
subjects are wrapped in documents is purely artifactual.

There is often a great deal of semantic variance around the concept of "taxonomy". We'll
just briefly touch on this here, since the foreground task is to isolate functionality. But
we'll note in passing, that while the word "taxonomy" is often used as a technical term,
the model of special languages tells us that terms have consensual definitions in
technical communities. And while Alice did say "... at least I mean what I say - that's the
same thing (as saying what I mean), you know" during her adventures in Wonderland,
she's both right and wrong. Poor Alice! She's right, in the fact that the model of special
languages sees terms and definitions as interchangeable. But she's wrong, immediately
she steps outside the context of special subjects and the special languages that are used
by communities to communicate their special subject interests.

How Taxonomies "Work": Their Functional Space and Functional Elements

On the surface, taxonomies seem to be simple tools. Here is a general statement. The
better we understand how any tool is "supposed" to work, the better we can design them.
Taxonomies are no exception. The more we understand what taxonomies do, and how,
the better we can manage project teams to design, apply and maintain them.

There are two sets of parameters that we need to carry foreground, at all times, about
taxonomies. One set of parameters gives us information about the "functional space"
within which taxonomies operate. The other set of parameters is the set of individual
functional elements that do the "work", i.e. carry the information that makes the
taxonomy useful (and usable) to users. The functional space describes the set of
parameters within which the taxonomy does its work. The set of functional elements
describes the set of parameters by which the taxonomy does its work.

Defining Taxonomy Usability

These two sets of parameters are not decoupled, either in design or in use. The functional
space is where the taxonomy operates as a communication tool. The individual
functional elements are the elements that actually carry the information that needs to
be communicated. Taxonomy usability, in a nutshell, is about measuring the information
that a taxonomy carries while it performs as a communication tool for a particular user
community.

Understanding Taxonomies: Their Functional Space

Taxonomies function on different levels and in different ways. These "levels" and "ways"
are the functional space within which the taxonomy does its work. In the early stages of
designing information access projects a number of important characteristics of
taxonomies at the level of being a tool are often overlooked completely. Every taxonomy
designed to be used within a total content management/information access environment
is a -

Cognitive tool
Communication tool
Culturally predicated tool
Model (and a way, or template, to model)
Special language variant
Terminology

Taxonomies are cognitive tools. They are cognition-facing. A taxonomy is a cognitive
tool in the sense that it speaks to the intrinsic component of human cognition in each of
us whose purpose is classificatory. A taxonomy offers one arrangement, plus a set of
definitions, of a conceptual domain that is useful to at least one user community. This
representation aligns with the result of the intrinsic human classification function, which
has in turn been socialized within that technical community. As designers, it is our task
to elicit the shared conceptual map of the community of practice and re-engineer it into a
taxonomy.

Taxonomies are communication tools. Within any model of communication there are
senders of communication and recipients, and the purpose of the sender is to change the
knowledge state of the recipient. Taxonomies play a variety of roles as a communication
tool. Taxonomies communicate what is related to what. They communicate the
definitions of the concepts. They communicate using the language of the community of
practice. Etc. Taxonomies play a role in users changing their state of knowledge when
they are seeking information in an information access environment such as an enterprise
intranet. Any model of communication should be able to explain the parameters of what
degrades effective communication. In a very general sense, this whole article is about
what can degrade taxonomic communication.

Taxonomies are also tools that reflect a "cultural" view of domains. Every user
community in every technical domain is a culture, whether it is capital markets,
insurance or manufacturing in-flight entertainment systems. This culture manifests itself
in its desired attribution of domain scope and granularity, the selection of terminology to
label concepts, the definitions of the concepts (which can in turn, depending on the
definition, impact preferred relationships). And so on.

Taxonomies are models. They are also ways to model. To build a taxonomy is to follow
one way of building particular models that serve particular purposes. In our case we
want ways to model sets of concepts, so we can apply subject recognition of those
concepts to sets of documents, and so, finally, give knowledge workers access to
documents, based upon the known (i.e. recognized) subjects that they discuss.

Any taxonomy is a particular typed variant of a special language. Special languages
are technical communication tools. The lexicon of any special language substitutes for
the full definitions of the concepts of the particular domain and additionally carries
information about the organization of the discipline. Taxonomies are closely aligned with
the model of special languages. Taxonomies do this by formalizing the elements of any
special language that they include. If a taxonomy includes any elements of a special
language, then the element in the taxonomy carries (and has to carry) the same
definition as that special language. It has to, otherwise the community of users will have
two definitions for one concept. When a taxonomy includes elements of a special
language it also has to take and carry the same relationships as the terms that it takes.

Taxonomies are terminologies. Remove the definitions and the relational information
and we have a technical lexicon.

These six characteristics of the functional space are all integrated into designing the
technical elements of the taxonomy and their components.

Understanding Taxonomies: Their Key Functional Features

Technically, a taxonomy is an intersection of four different sets of data. These are the
four functional elements of any taxonomy -

Domain Scope
Concept Definitions
Concept Relationships
Concept Labels

It is these four functional elements that "do the work" of a taxonomy being a
communication tool. These four sets of data have to be collected in any taxonomy design
project for there to be any "taxonomy design".
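As a purely illustrative sketch (in Python; the definitions shown are invented for illustration), a single node that carries all four functional elements might look something like this -

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TaxonomyNode:
    """One concept, carrying all four functional elements of a taxonomy."""
    label: str                                   # Concept Label: the name that substitutes for the definition
    definition: str                              # Concept Definition: the concept's boundaries, in the users' special language
    parent: Optional["TaxonomyNode"] = None      # Concept Relationship: the "kind of" link to the parent
    children: List["TaxonomyNode"] = field(default_factory=list)

    def add_child(self, child: "TaxonomyNode") -> "TaxonomyNode":
        """Attach a child; by design the child is a 'kind of' this node."""
        child.parent = self
        self.children.append(child)
        return child

# Domain Scope is the explicit record of which concepts are "in" - the boundary decisions.
root = TaxonomyNode("Marketplace Events",
                    "Occurrences in the marketplace reported as business news.")
capital = root.add_child(TaxonomyNode("Capital Structure Events",
                                      "Events that impact or change a company's capital structure."))
scope = {root.label, capital.label}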

Understanding Taxonomies: The Functional Feature of Domain Scope

The set of concepts that comprises the taxonomy is the scope of the taxonomy. Scope can
be thought of as having two aspects -

Extent (its boundaries)
Granularity

Both of these are equally important in designing taxonomies that are usable, and so
successful in business terms and in terms of project completion. They are like two
dimensions of the conceptual space they aim to model - a dimension of extent and a
dimension of depth.

One part of the scope is the extent of the domain; its "boundaries" in "conceptual space".
At a pragmatic level these boundaries are defined in a binary way; "is this concept in or
out?" These binary decisions are always in the service of user requirements. In the very
early stages of taxonomy design in particular, sets of these kinds of binary decisions get
queued and are dealt with in the methodology of the project. In the very early stages of
the project these kinds of decisions might be dealt with by methods of designated expert
review, which is less resource intensive than involving numbers of the user community.

The second part of the scope is to discover and designate the level of granularity of the
taxonomy; how 'discrete' are our concepts to be? We can formalize this design task as -
when does a child node become a leaf node, the end of the branch? Are these leaf nodes
granular enough? Or do they bundle "kinds of" that the users require to be
disaggregated?

These two aspects of taxonomy scope are orthogonal in terms of the workflow of the
information access design project, at least to a substantial extent. We can certainly define
the boundaries of the domain and move on, if project milestones or time management
issues are pivotal, and deal with issues of granularity later. But we cannot really settle on
the level of granularity and then move on. Because of the lingering question: "move on to
what exactly?" The extent of the domain is still undefined.

Both aspects of scope are always iterative in the design stages. And both can be, and
should be expected to be, iterative in the ongoing maintenance stages after the taxonomy
is implemented.

In its function as a cognitive tool, the taxonomy scope tells the user community what
things, and their kinds, are to be found. In its function as a communication tool, the
feature of scope communicates boundary, what is to be found and what is not, and at
what level of granularity. In its function as a cultural translation, the feature of scope is a
mapping of the user community's view of the domain or world. Scope is language-
independent, so it plays no role as a variety of special language, nor does scope indicate
terminology.

Understanding Taxonomies: The Functional Feature of Node Relationships

Each concept within the taxonomy is one node in a set of relationships. There are two
kinds of relationships -

Subsumption
Siblingship

The subsumption relationship binds the parent and child together. The relationship
between siblings of the same parent, what makes each child a child and what
differentiates each child from the other siblings, is siblingship.

The subsumption relationship is one of "kind of". Any child is a "kind of" its parent. For
example, Pinot Noir is a kind of grape (it is also a kind of wine). And a grape is a kind of
… And a Persian cat (like our pet cat Baby Cakes) is a kind of cat (Felis Catus). And Felis
Catus is a variety (or "kind of") of ...

This "kind of" relationship is one of the key definitional parameters of any taxonomy.
This categorically differentiates taxonomy from classification. If the representation is not
based on the "kind of" relationship being the sole subsumption relationship, then we do
not have a taxonomy. We have some other kind of representation, that can be both
highly required and utilitarian, but it is not a taxonomy.

"Kind of" is user defined. This can sound strange at first. In fact "kinds" are always
defined to meet user needs. (Because we are not in the business of seeking out abstract
Platonic ideals, but engineer and deliver useable cognitive maps.)

"Kind of", from the point of view of the child, is technically defined as the intersection of
what is the key characteristic of its parent together with what differentiates it from each
of its siblings. The relationship between a node and its siblings is one of differentiation,
or discrimination. Each of the siblings is a "kind of" its parent, but each is
differentiated from the other by its position along some kind of dimension that is the
differentiating dimension. This basis for the differentiation is also user defined.

This is a Linnaean model of taxonomy design. The model was originally designed (by
Linnaeus) to type biological organisms into classes.

Understanding Taxonomies: Node Relationships: an Example

"Kind of" typing has near universal applicability. Here is an example of typing events
into a Linnaean hierarchy to show both the utility of the underlying model and the fact
that the definition of "kind of" is user driven and not absolute.

Business news carried by newswires and business newspapers is a description of the
occurrence of events in the marketplace, and their impacts, implications and
consequences. We can create a taxonomy of these events. Let us say we have a user
community of asset managers and they want to taxonomize the universe of business
events into a number of parents. This community decides that two of these parent are the
set of events that impact/change a company's capital structure and the set of events that
are changes in a company's behavior at the operational level.

An IPO, a stock split and a bond issue all impact a company's capital structure. And a
facility closure or a round of layoffs impact a company at the operational level.

For this user community we can create a taxonomy that looks something like this -

Marketplace Events
    ...
    Capital Structure Events
        Debt Capital Events
        ...
        Equity Capital Events
            IPOs
            Rights Issues
            Secondary Offerings
            Stock Buy-Backs
            Stock Splits
            Stock Options Plans Introductions & Revisions
        ...
    Operational Events
        ...
        Facilities Openings and Closures
            Facility Closures
            Facility Openings
        Labor Force Events
            Layoffs and Furloughs
            Short-Time Working Arrangements
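Purely as an illustration (a Python sketch; the elided "..." branches are dropped), the same arrangement can be written down as nested data, and the walk from broad to narrower discussed below falls out naturally -

# The asset managers' taxonomy as nested dicts; every child is a "kind of" its parent.
marketplace_events = {
    "Capital Structure Events": {
        "Debt Capital Events": {},
        "Equity Capital Events": {
            "IPOs": {}, "Rights Issues": {}, "Secondary Offerings": {},
            "Stock Buy-Backs": {}, "Stock Splits": {},
            "Stock Options Plans Introductions & Revisions": {},
        },
    },
    "Operational Events": {
        "Facilities Openings and Closures": {"Facility Closures": {}, "Facility Openings": {}},
        "Labor Force Events": {"Layoffs and Furloughs": {}, "Short-Time Working Arrangements": {}},
    },
}

def browse(tree, depth=0):
    """Traverse the 'kind of' links from broad to narrower, one indented line per node."""
    for label, kinds in tree.items():
        print("  " * depth + label)
        browse(kinds, depth + 1)

browse({"Marketplace Events": marketplace_events})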

For a different user community practicing a different special subject we might very well
arrange all the instances under the root of "Marketplace Events" quite differently. But we
would still make it be a taxonomy, where the subsumption relationship is "kind of". And
the set of relationships would reflect the special subject point of view (i.e. the cultural
view) of this other user community.

Understanding Taxonomies: Node Relationships: Usability Vectors

This "kind of" relationship instantiates a usability vector that is often critical to user
communities; the ability to browse from broad to narrower, or narrow to broader. This
parameter is a semantic parameter - the semantics of moving from parent to child, or
child to parent. If our users don't understand the semantics of this relationship, then
they are not going to know exactly what they are moving to. This raises a whole set of
issues around user expectations.

Without the formality of the "kind of" relationship there can be no meaningfulness to
traversing from broad to narrower, or narrow to broader. In fact, without the formality of
"kind of" we (and our users) cannot move from broad to narrower, and vice versa. Sure,
we can move from "something" to "something", but that is all we can say and know. The
semantics of traversing a classification scheme (as we define it here) are very different
from traversing the semantics of a Linnaean taxonomy. The semantics of traversing a
classification scheme are "fuzzier". Ultimately, it is the semantics of traversing
classification scheme relationships that allows us to differentiate classification from
taxonomy in a formal sense.

Because we are native taxonomists in our psychology we understand what broad to
narrower and narrow to broader means. We understand the cognitive linearity of it all.
The allowance to do this is highly utilitarian for knowledge workers in seek mode
working with information interfaces that represent conceptual spaces. But this
relationship between "broad" and "narrow", and the utility to traverse conceptual space
from broad to narrower, is completely dependent on the subsumption relationship that
binds parent to child being of the one formal "kind of" type. If the subsumption
relationship is not "kind of", then what does it mean to navigate the hierarchical tree,
what are the semantics of this?

"Kind of" relationships are intuitively understood because we are cognitive beings. We
group every thing in the world into kinds of things: trees, buildings, animals, movies
etc. This means that any taxonomy, as a thing, is usually easily learnable/teachable with
reference to those who need to know about them to build them and maintain them. This
is because we all have the innate capacity to do this cognitive operation, and we do it
both continuously and continually (as well as universally).

Whilst all user communities, from our point of view as information access designers, are
cultures (and so their point of view is "cultural"), and whilst the results of cognitive
grouping into "kinds" are always cultural, the fact that we do group things into kinds is not
cultural. 2

In its function as a cognitive tool, the set of taxonomy node relationships tells the user
community how things, and their kinds, are grouped. In its function as a cognitive tool it
also communicates the underlying principles of exactly what parameters relate and
differentiate concepts. In its function as a communication tool, the feature of node
relationships makes kinds and their grouping principle explicit, and removes any variance
that there might be about where to find what. For it to function as a communication tool
with a high clarity, it must communicate clearly the semantics of the relationships. In its
function as a cultural translation, the feature of relationships is a mapping of the user
community's view of the domain or world of their special subject. The feature of node
relationships is aligned with special languages. Special languages communicate
relationships between concepts. The feature of node relationships is also aligned with a
terminology point of view. Many special subjects have rules (sometimes formal) or
heuristics that guide types of lexical generation to create terms so that the terms carry,
i.e. indicate, the subsumption relationship.

Understanding Taxonomies: The Functional Feature of Definition

A step that is often overlooked in taxonomy design is the step of defining the nodes.
Without definitions the taxonomy is incomplete and unusable (because, how can we
consistently, through either editorial workflow or a categorization application, associate
documents with nodes if there is no definition of the node?).

Pretty much every taxonomy we build includes substantial proportions of the special
language that is used to communicate discourse about the domain. All the users of a
special language know, and agree on, the definitions of the concepts. Of course,
definitions evolve as domains evolve, but this gives rise to a maintenance issue and not a
pure-play ontological issue. Because all users of a special language know their own
language definitions it is our task as designers to follow their lead (and needs).

Definition of a concept directly impacts precision/recall measures from the user's point
of view. Mismatches between the definition of a concept in the taxonomy and the
definition of the same concept in the subject recognition application or workflow layer
directly impact the user. Definitions must be explicit. If they are implicit, then who
knows what?

At a user experience level, if the definition of the concept is broader than the definition
used by the user community then, with a perfectly working categorization platform or
workflow, the user community will experience the document set attached to the
taxonomy node as exhibiting low precision. The categorization topic (or workflow) will
execute correctly and consistently return a set of documents that covers a broader range
than the user requires. The converse also leads to a predictable outcome. If the definition
of the concept is narrower than the definition used by the community of practice, the
users will experience the document set attached to the taxonomy node as exhibiting low
recall.
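A small, invented worked example of the two measures (precision is the share of returned documents that the users consider relevant; recall is the share of relevant documents that are returned) -

def precision_recall(returned, relevant):
    """precision = |returned & relevant| / |returned|; recall = |returned & relevant| / |relevant|"""
    hits = returned & relevant
    return len(hits) / len(returned), len(hits) / len(relevant)

# Definition broader than the community's: everything relevant comes back, plus extras -> low precision.
print(precision_recall(returned={"d1", "d2", "d3", "d4"}, relevant={"d1", "d2"}))   # (0.5, 1.0)

# Definition narrower than the community's: only "correct" documents come back, some are missed -> low recall.
print(precision_recall(returned={"d1"}, relevant={"d1", "d2", "d3"}))               # (1.0, 0.33...)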

Definitions also impact user "satisfaction". If the user says "I know what I mean, and I
name what I know", then the user in question is not going to want labels - which should
be referencing the same definition as the user community - that include or exclude other
aspects of the definition. Just as precision/recall is the basis for a formal, non-emotional
metric, so this dilution/distillation is the basis for an emotional metric.

With subject recognition applications that are lexically based (as opposed to statistically
based) the definition is, in a very real sense, distinct from the code. In a semantic
network model of subject recognition there will be sets of synonyms combined together
plus vectors of weighting. In a rules-based approach there will be sets of lexical elements,
again usually synonyms, plus the rules. But this code has to be shaped by the definition
of the node - it does not shape the definition. Additionally, if we strip out everything
from this code except the lexical elements themselves then we are not left with
grammatical, syntactically useful or indeed semantically meaningful expressions that can
pass for definitions. We need the definition so that we can generate sets of appropriate
lexical elements and configure the rules of the application to give us the correct
semantics of combining different sets of lexical elements.
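A minimal sketch (in Python) of such a rules-based subject recognition object, for the "Facility Closures" node of the earlier example; the definition, synonyms and rule here are invented for illustration, but they show how the definition shapes the code rather than the other way around -

import re

facility_closures = {
    "definition": "A company ceasing operations at a plant, office or other site.",
    "lexical_elements": {
        "site":   ["plant", "factory", "facility", "office", "site"],
        "action": ["closure", "closing", "close", "shut down", "mothball"],
    },
}

def recognizes(document, topic):
    """Rule: the document must contain at least one 'site' term and one 'action' term."""
    text = document.lower()
    found = lambda words: any(re.search(r"\b" + re.escape(w) + r"\b", text) for w in words)
    return all(found(words) for words in topic["lexical_elements"].values())

print(recognizes("Acme will shut down its Ohio plant next quarter.", facility_closures))   # True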

In its function as a cognitive tool, the functional feature of definition tells precisely why
any thing is this particular kind of thing. In its function as a communication tool, the
feature of definition makes this set of defining rules explicit (so open to discussion and
modification). In its function as a cultural translation, the feature of definition is a
mapping of the user community's view of each of the things of its domain or world of
focus. In its function as a special language variant, the definition will have a
corresponding special language term - if the definition exists within the special subject.

Understanding Taxonomies: The Functional Feature of the Node Label

"Exactly so," said Alice.

"Then you should say what you mean," the March Hare went on.

"I do," Alice hastily replied; "at least - at least I mean what I say - that's the same thing,
you know."

"Not the same thing a bit!" said the Hatter. "You might just as well say that 'I see what I
eat' is the same thing as 'I eat what I see'!"

"You might just as well say," added the March Hare, "that 'I like what I get' is the same
thing as 'I get what I like!" 3

Poor Alice! Temporarily disoriented (and wouldn't we terminographers and taxonomy
designers also be) after following the White Rabbit, she is muxing ip communication
intent, labels and definitions all in one mad brew. Not her fault. But, confusing facets is
much worse, from a usability point of view, than mixing metaphors. Luckily for her (and
us) the Mad Hatter (and surprisingly also the March Hare) appears to have attended at
least one first class taxonomy workshop, and gives all evidence of having read Juan
Sager's classic on terminology. He is certainly putting her straight!

The node labels are the names for the concepts. We directly incorporate two aspects of
terminology work in assigning labels to taxonomy nodes.

Firstly, taxonomies share with special languages the fact that the names substitute for
the definitions. In special languages terms substitute for definitions through the property
of special reference. In information access design work taxonomy designers rarely or
never use the technical term (which we could borrow from "classical" terminology work)
"special reference". Perhaps we could. Taxonomy node labels do carry the property of
special reference. The design issue does get complicated when we are taking substantial
proportions of the lexicon of special languages and also incorporating labels that are not part of that lexicon. In
this case, as the taxonomy becomes socialized and accepted we have actually created
special reference. In a very real sense our taxonomy becomes an expression of a special
language that we have created.

Secondly, special languages and terminology work pay a great deal of attention to
creating lexical derivations and compounding. There are often many "rules" or heuristics
within a special subject that as a set constrains a typology of forms of derivation and
compounding that the practitioners of that special subject can use. This is particularly so
in many of the sciences.

Leveraging the ability to create lexical element types is partly formal in its application
and partly based upon a set of heuristics. Lexical derivation, lexical compounding and
shared lexical elements actively carry information that conveys the cognitive notion of
relationship type. Here is an example of using terminology selection to drive level
identification. These kinds of rules are common in any well-formed taxonomy and
indicate (and should indicate) that we are in the genus-specific-varietal model of
Linnaean type taxonomy -

Marketing Channels
    Advertising
        Newspaper Advertising
            Business Newspaper Advertising
            General Newspaper Advertising
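A small, purely illustrative check of that heuristic (a Python sketch): where labels are formed by compounding, a child's label extends its parent's label, and the shared lexical element signals the level -

pairs = [
    ("Marketing Channels", "Advertising"),                        # top link: not a compound of its parent
    ("Advertising", "Newspaper Advertising"),
    ("Newspaper Advertising", "Business Newspaper Advertising"),
    ("Newspaper Advertising", "General Newspaper Advertising"),
]

def compound_signals_level(parent, child):
    """Heuristic: the child label keeps the parent label as its head and adds a differentiator."""
    return child.endswith(parent) and child != parent

for parent, child in pairs:
    print(f"{child!r} lexically signals 'kind of' {parent!r}: {compound_signals_level(parent, child)}")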

These two core communication properties of definition and shared special reference
contribute to the success (or otherwise) of the taxonomy as a communication tool. These
two properties are user-centric and user-critical. We can sum all this up by asserting that
the taxonomy has to speak the same (special) language to the user community as that
community speaks to itself, and has to be designed to do this.

In its function as a cognitive tool, the functional feature of node label operates
semantically. In its function as a communication tool, the feature of node label carries
those semantics clearly and precisely. In its function as a cultural translation, we want to
be talking the same special language as our user community. Where we can, we can use
their rules of lexical derivation and compounding. Node labels align with the model of
special languages and terminology theory. Node labels either are terms, or become terms
because they have no choice.


Notes and References

1. For example, a particular "treatment" of a topic is implicit in the world of documents.
The Universal Decimal Classification includes such document forms as bibliographies,
dictionaries, "general reference works" and serial publications. Whereas the concepts of
any domain are independent of the notion of being communicated in any documentary
form. For a brief introduction to the UDC see the summary classification at the UDC
Consortium website.

2. For a high-level exposition of the cultural dimensions of categorizing things and topics
see -

Women, Fire and Dangerous Things: what categories reveal about the mind / George
Lakoff. - University of Chicago Press, 1990.

and

The Geography of Thought: how Asians and Westerners think differently … and why /
Richard Nisbett. - Free Press, 2003. Richard Nisbett is the Theodore M. Newcomb
Distinguished University Professor at the University of Michigan. His personal website
gives more context to his work.

For an implicit alignment of anthropology with technically-based communities see -


English Special Languages: principles and practice in science and technology / Juan
Sager et al. Oscar Brandstetter Verlag, 1980.

3. Both the Alice books are available online through Project Gutenberg. This quote
comes from Chapter VII A Mad Tea Party of "Alice's Adventures in Wonderland"

Saying what is meant and the meaning of what is said are congruent in a special
languages environment. (Remember, terms are substitutes for definitions.) We never
want the presenting layers of knowledge organization to break this congruency.


A Model of Communication

A Focus on Communication

The semanthink point of view is that the enterprise intranet is both a holistic
information access/delivery environment and a holistic communication tool. What
makes the intranet function well as a communication tool are the ways it represents and
organizes concepts and the ways it successfully recognizes what subjects documents
discuss.

Ultimately, information finding happens because of good communication.

Information environments, such as enterprise intranets, are supposed to be
communication environments. In fact, we would say that the environment is the
communication tool. Intranets are communication tools. Communication tools need
communication expertise to build them. As a communication tool, each intranet is built
up from a number of different functional elements, each a tool in its own right.

A taxonomy, for instance, is a communication tool. What does it communicate? It
communicates a representation of a domain. How does it communicate this? Any
taxonomy is composed of four separate functional elements. These four functional
elements together communicate the representation. Integrating any user community's
special language (i.e. its terminology) into the taxonomies they need is a core
requirement of taxonomy design. You can read more in the semanthink Model of
Taxonomy essay in this set of essays.

Similarly, a content categorization application is a communication tool. Although it may
not seem like that to the knowledge worker end user. Partly this is because the end
results of its categorization functionality are usually delivered through intermediary
tools such as search engines or browsable classification schemes. What do categorization
applications communicate? They communicate the fact that a certain document
discusses particular subjects. How do they do this? They recognize meanings, not
words or strings. The core functional elements of lexically-based categorization
platforms are subject recognition objects consisting of words and phrases.

How Individuals in Business Communities Communicate

Every community of interest is technical; baseball fans, americana music lovers,
insurance salesforce personnel, research biologists, anthropologists, investment bankers
specializing in M&A advisory services are all technical communities and all technical
cultures. Every technical community speaks in, and uses, a "special language". Because
every business subject is "special". In fact, peer communities invent or generate their
own special language. This development is largely "organic", but scientific communities
can have formal rules and formal standardization procedures that constrain (in a
positive way) the development of particular special languages. Special languages are built
around sets of terminology. But they are "more" than the lexicon of the terminology.


The Business Requirement: Why Do We Want a Model like This?

This model of the communication dynamic is abstracted from Juan Sager's classic of
terminology work 1. You can read some different aspects about it, and its wider context,
in the semanthink Review Essay on Sager's book.

The model sets up the transaction of information in a context of purposes and
motivations, using terminology as the medium. These purposes and motivations revolve
around states of knowledge that can be changed in a number of different ways. As
information access designers our knowledge organization schemes need to reflect this
typology of knowledge state changes.

The model is "formal", but science as a method is "formal". And so is business, in the
sense that there are constraints imposed in pursuit of profitability. Formal is good!
Formality gives every parameter its place and weight for the particular task at hand.

As information access designers, we need a model that tells us useful information about
the "act" or event of communication. What kinds of information do we need to know
about the event? We are lacking a purpose for why people want to "know" or "find", and
why people want to communicate or tell. In terms of users "wanting to know", they
obviously have tasks to perform. But, there is a meta level above the level of particular
tasks. Similarly with the motivation to communicate.

Basically, what is the purpose of communication? We need to have a way to understand
that users have states of knowledge. These knowledge states vary over time, vary
according to information "taken in" and processed. If knowledge states vary between
peers, then they certainly vary between any individual user and the total information
access environment as one gestalt. If users are in states of knowing that they want to
change, one way or the other (but usually not to get more ignorant, but more informed!),
we need to know what parameters support that transformation and what inhibit or
destroy it. So, we could usefully use a total model of communication that explicitly -

makes a change of a personal knowledge state foreground and central
defines a common typology of the ways personal knowledge states are changed
exposes the requirements that make any information transaction effective, from the recipient's point of view
factors in the self-evident truth that assumptions and pre-suppositions about information recipients exist, from the communicator, and that these should be dealt with

The Model: Communicator, Recipient and Knowledge States

This is how Sager describes his model.

"In a model of specialist communication we assume the existence of at least two


specialists in the same discipline, who are jointly involved in a particular situation where
the sender (speaker or writer) is motivated to transmit a linguistic message which
concerns the topic of his choice and which he expects a recipient (reader or listener) to

  17
 

receive. We assume that the sender's motivation arises from a need or desire to affect in
some way the current state of the knowledge of the recipient." [page 99]

So, we have actors, a sender and a recipient. We have motivations, a sender's
motivation that is to inform, and a recipient's current non-optimal knowledge state
about an aspect of the domain. Sager calls the sender's motivation the "intention … to
transmit information which will have an effect on the current knowledge configuration of
the intended recipient". [page 99]

Now we have the notion of a "knowledge configuration" or state of knowledge. This gives
us three communication parameters so far: actors, intentions, knowledge states.

The knowledge state of the communication recipient can be changed in different ways.
Here are some types of knowledge configuration change (there are others, but these ones
serve well to give the flavor of it all). The sender's communication can -

augment the recipient's knowledge
confirm the recipient's knowledge
modify the recipient's knowledge
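Purely as an aid to thinking (a Python sketch, not anything taken from Sager's text), the parameters introduced so far - actors, intention, knowledge states, and the typed ways those states change - can be written down like this -

from dataclasses import dataclass, field
from enum import Enum

class KnowledgeChange(Enum):
    """The typed ways a recipient's knowledge configuration can be changed."""
    AUGMENT = "augment"   # add knowledge the recipient did not have
    CONFIRM = "confirm"   # leave the configuration as it is, now known to be current
    MODIFY = "modify"     # overwrite knowledge the recipient already held

@dataclass
class Actor:
    name: str
    knowledge: dict = field(default_factory=dict)   # concept -> the actor's current understanding

@dataclass
class Message:
    sender: Actor
    recipient: Actor
    intention: KnowledgeChange
    content: dict                                   # items selected from the sender's store of knowledge

def transmit(message):
    """Apply the sender's intention to the recipient's knowledge state."""
    if message.intention is KnowledgeChange.AUGMENT:
        for concept, understanding in message.content.items():
            message.recipient.knowledge.setdefault(concept, understanding)
    elif message.intention is KnowledgeChange.MODIFY:
        message.recipient.knowledge.update(message.content)
    # CONFIRM changes nothing; the recipient simply learns that what they know is current.

analyst = Actor("sender", {"IPO": "the first public sale of a company's shares"})
reader = Actor("recipient")
transmit(Message(analyst, reader, KnowledgeChange.AUGMENT, dict(analyst.knowledge)))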

The Four Parameters of the Model

We now have four parameters in play: actors, intentions, knowledge states, and ways the
knowledge states can be changed. With the idealized parties in place, and the parameters
in play, the communication event then happens. The "parts" of the model are now made
to "work" by the participants.

"Basing himself on presuppositions about the recipient's knowledge and assumptions


about his expectations, the sender decides his own intentions, and then selects items
from his own store of knowledge, chooses a suitable language, encodes the items into a
text and transmits the entire message toward the intended recipient ... Perfect
communication can be said to have occurred when the recipient's state of knowledge
after the reception of the text corresponds exactly to the sender's intention in originating
the message." [page 100]

Enterprise intranets can be modeled as exactly conforming to all these elements of this
quotation. There is a difference between augmentation (learning "more"), confirmation
(knowing that what you know is what there is) and modification ("changing").
Sometimes these knowledge state changes can be related to specific document types -
new regulations supersede old, as an instance of necessarily required modification, or,
today's newsfeed will be different from yesterday's (hopefully), and so augmentation
begins again (as if it would ever stop!). Senders, and in an enterprise intranet
environment this includes both authors, obviously, and those who design the knowledge
representations and navigations, have intent. The intent of this second group may not be as obvious as the
clear role we can ascribe to authors, but intention is built into representation and
navigation layers, whether we are aware of it or not.

We might not want to use the label "perfect communication" to describe what we would
consider to be optimal communication outcomes, because the word "perfect" carries
bothersome connotations, but we do aim for intranet users to get what they need for task
resolution.

Intention, Knowledge Selection and Choice of Language

The communication event has happened. But our model is not complete to our
satisfaction. We still want to understand what makes the difference between "good"
communication and "bad" communication. Sager tells us that: "The achievement of
successful communication is fundamentally dependent on the three choices the sender
has to make in formulating his message". [page 102]

These choices are -

intention
selection of knowledge
choice of language

Intention

Intention can, at times, be clear from the type of document. A checklist is usually used to
confirm knowledge. Instructions are intended to augment. User manuals can augment,
confirm or modify.

More commonly, though, intention is expressed through terminology choices and
choices from general vocabulary. Here senders have to be careful to align their choice of
terms with what they intend the impact to be on the recipient's knowledge state. For
instance, an undefined or vague term will not augment a knowledge worker's knowledge
state. We can sum all this up in Sager's words.

"The sender must choose an intention that is commensurate with the recipient's
expectation. For the communication to succeed, the recipient must capture the sender's
intention either from the message or from the situation, and his interpretation of the
intention must be accurate." [page 102]

Knowledge States

There are a number of important parameters around the sender's selection of knowledge.
For instance, the sender must either have some kind of prior knowledge about a
recipient’s current state of knowledge or make correct presuppositions. If the intent is to
augment or modify the recipient's knowledge state, then, obviously, the sender needs to
know more about the topic than the recipient does. Otherwise, no transfer is going to
happen.


Language Selection

The use of accepted standardized terms makes the communication economic, precise
and appropriate, or not. In the area of choice of language, the sender must choose an
appropriate special subject language, or general language.

Taking the Model Into Project Work

In technical communication there are many variations on the theme of modeling the
dynamics between those who want to send messages and the intended recipients, and
what kinds of outcomes, or end states, arise. We like the communication model used in
the terminology processing work of Sager. It is simple, formal, and speaks to what the
heart of technical communication is, namely the purpose being to change someone's
state of knowledge of defined concepts. Information access systems exist to rebalance
unbalanced user knowledge states. Knowledge workers operate in a constant flux of
knowledge states. This flux repeats itself continually - users seek, find, use. We explicitly
design these kinds of systems, at the semantic and ontological levels, to ensure this
rebalancing occurs effectively.

Terms and general vocabulary operate differently from each other in a model of
communication. As information access designers we want to use terminology when
possible, general vocabulary when we have to. Our focus is on designing around the
pitfalls that general vocabulary usage creates in enterprise intranet environments.

We make the model work for us at the user's granular level of making lexical choices at
each of the intermediate decision-making points that are faced. Not only do we know
that we use a tool called a special language (because our users do) but we now have
enough information to apply the tools of special languages and terminologies. If the
choices to be made are in the special language they are used to using, then choices will be
easier (and faster). Their special language will convey the augmentation, confirmation
and modification.

At the level of the link to the individual document we want, as much as is possible, for
our users to be able to tell if any document is going to augment, confirm or modify their
current state of knowledge. This augmentation, confirmation or modification is not just
needed at the level of links to individual documents. When taxonomies, and more
commonly, classification schemes are browsable they guide users to sets of documents
discussing the same subject. In this way users choose where in the domain space they
want to augment, confirm or modify. Keeping up to date is just another, shorthand, way
of augmenting, confirming or modifying one's knowledge state.

The model lets us think about enterprise intranets, or any information access
environment, in a different way. Because the user seems to always be doing all the
"work", of searching, navigating, browsing and so on, we can forget that we need to think
about the environment doing much of the work. We can also reframe the environment as
being "active". It is a sender or communicator. To be sure, it's only a sender when the
user wants it to be (but maybe that's the best kind of sender!). When the environment is
a sender it had better follow all the good practices that all good communicators practice.
This is the "how" of communicating. These good practices include differentiating
between general language and special languages, understanding that terms are the
objects and that a special language is the "wrapper", that intent counts, and that the
assumptions that the sender makes are in a mirrored relationship to the expectations
that recipients have and that both sets of assumptions and expectations have to be
worked with. And all this is lexical and ontological work.

We use this model in conjunction with the models of special languages and
concept/definition/term. Together they form a powerful triad. We use the OntoLexical
Layers Model for both strategic and project management purposes, but we really put this
triad to work at the project design and activity definitions levels. We use all of them to
keep us focused "on message" and to design projects that deliver.

Which is all good. But the real purpose of these models, as all models, is to make us think
before we practice.

Re-Purposing the Model: the User as Recipient

In Sager's model the communication is direct. In some ways his model is a sender-
centric model. It focuses on the intent, choices and actions of the sender. Working with
designing and re-designing intranets we derive much insight from pivoting the model to
be recipient-centric, because intranet users carry many qualities of Sager's recipients. We
can focus on the users as information seekers - active recipients rather than active
senders. But the model still helps point us towards what we should do as we work to
provide information access to documents for users. Intranet users require all that Sager's
recipients require for successful information transactions to take place.

The situational factor of the "inequality of knowledge" dynamic interests us. The sender
is expected to have a greater knowledge of the subject than the recipient. The document
collection holds the "answer", the user knows that the "answer" is out there to a greater
or lesser degree of precision. But the document set is not the sender. All the presentation
tools of navigation, coupled with the semantic and ontological back-end, together are the
sender. And the navigation is the means to approach and retrieve it.

Re-Purposing the Model: The Intranet as Sender

Another aspect of the communication dynamic interests us. In Sager's model the
communication event is a one-to-one, direct, personal communication. But from the
point of view of the individual intranet user the intranet communication process can
appear as an instance of a many-to-one, indirect, impersonal process. And the "many"
carries multiple connotations: many documents, a variety of navigation, multiple ways to
search etc. The "lucky" recipient in Sager's model is given one communications
deliverable, one document (or speech). This kind of communication was always push,
until the internet/intranet era. Now users spend much time (and effort) in trying to
"pull" information from out of the complexity. Users actively want, and require, to be
communicated to. The information access system is the sender. It could usefully conform
to all the constraints that Sager's senders conform to.

It is this indirectness of the intranet environment, coupled with the scale of publishing,
which causes and compounds some of the problems of "hard-to-locate" content. There is
certainly intention aplenty in an intranet environment. Authors, navigation designers,
taxonomy engineers are all intending that best-fit documents be communicated to users.
It just depends how explicit this intention is, and how crafted. We try to integrate all
these parameters of communication. We integrate by working with concept identification
and concept naming. We build this intermediate layer to speak the language of the user.
Regardless of the language of the author. And in doing so we work with the parameters
that Sager's sender has to work with -

intention (we know a great deal about our users and their requirements)
knowledge (concept identification and definition is our key deliverable)
language (terminologies and vocabularies, special languages and general
languages are our tools)

We should always remember that we are designing information access layers to impact
the user's current knowledge state that the user herself wants to change: to augment,
confirm or modify.

Terminology, and lexical normalization against terminology, clarify this whole process of
communication. The information transaction breaks when there is a mismatch in either
the sender's or recipient's lexical choices or knowledge. In an intranet, the mismatch is
between the lexicon of the user and the system. Not only is the overall model similar, in
ways that we can work with effectively in our information access design, it is also
identical with the outcomes that we would like to occur in intranet information
transactions.
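As a purely illustrative sketch (the terms and variants here are invented), lexical normalization simply maps the user's lexical choices onto the system's terminology so that both sides resolve to the same concept -

# A toy terminology: each preferred term carries the variants users actually type.
terminology = {
    "Layoffs and Furloughs": ["layoffs", "redundancies", "downsizing", "furloughs"],
    "Facility Closures": ["plant closure", "site shutdown", "factory closing"],
}

# Invert it into a normalization table: variant -> preferred term.
normalization = {variant.lower(): term
                 for term, variants in terminology.items()
                 for variant in variants + [term]}

def to_term(user_words):
    """Map the user's lexical choice onto the system's term, where we recognize it."""
    return normalization.get(user_words.lower(), user_words)

print(to_term("redundancies"))    # Layoffs and Furloughs
print(to_term("site shutdown"))   # Facility Closures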

What Degrades Effective Communication?

Last, but definitely not least, we are rarely allowed to forget that successful transmission
of the message can fail. The type of failure that concerns us as information access
designers is the instance of incompatibility between sender's and receiver's lexical and
knowledge structures. The chosen language must be appropriate. Lexical elements carry
ontological and semantic information. There are responsibilities on the sender, which
accrue to (or are inherited by) the holistic intranet in our use of the model.

"The sender must choose a language and sublanguage which he assumes the recipient to
have command of; the recipient must be able to recognize and understand the linguistic
forms chosen in order to analyze the message." [page 104]

Sager is clear how failure of communication can occur.

"The linguistic forms to be used in a message must therefore be planned at every level in
such a way that the greatest degree of clarity can be achieved by means of systematic and
transparent designations of concepts and by the avoidance of ambiguity at every level of
expression." [page 104]

And again.

"Accurate transmission can be impeded by incompatibility of the sender's or recipient's

  22
 

lexical and knowledge structure ... the analytical use of language, i.e. the application of
inference rules in the process of comprehension, works only if designations are regularly
patterned and if both interlocutors know the rules of designation. The linguistic forms to
be used in a message must therefore be planned at every level in such a way that the
greatest degree of clarity can be achieved by means of systematic and transparent
designations of concepts and by the avoidance of ambiguity at every level of expression."
[page 104]

Notes and References

1. A Practical Course in Terminology Processing by Juan C. Sager. 1990, John Benjamins B.V.

At time of writing you can buy through the John Benjamins site, which normally has stock
to hand and is weeks quicker in fulfillment than Amazon.


A Model of Special Languages – Thing, Definition, Term

The Business Requirement: Why Do We Want a Model like This?

Effective taxonomy use, in enterprise intranet environments, is one link in a chain of
communication. Effective taxonomy use is also part of a larger process, but the
metaphor of "chain" immediately gives us a context of a sequence of dependencies.
Obviously, what goes "before" is then critical. What goes "before" as we set in motion a
"Taxonomy First" approach to information access?

An understanding, acknowledgement, and re-engineering of the special languages that
your communities of interest use, to communicate peer-to-peer about their shared
special subjects, is one of these "before" parameters.

Why? It is the semanthink point of view that enterprise intranets are communication
tools. To understand communication in enterprise intranet environments holistically we
have to understand each of the parameters that together make up the process. Which, of
course, is the baseline purpose in devising and using models. One of the key elements of
communication is the medium of communication. Since we are focused on
designing information environments that deliver ontological and semantic information,
we need first to understand how communities convey, to their peers, these two kinds of
information. We need to understand this medium first before we can move on to
understanding two different perspectives. One of these perspectives is that, if we understand how ontological and semantic information is mediated, we will begin to
know where to look for the lexical elements that we want to build into our information
access environments. The other perspective is that, if we understand how ontological and
semantic information is conveyed, we can build into our communication tools, such as
taxonomies, a re-engineered version of the special language in question, to convey the
very same ontology and semantics. In doing both of these tasks successfully we will
mediate between members of a community and the documents they need in the moment
for the task at hand, rather than disintermediate them.

The answer to the implied question above is that members of a community who share the practice of a special subject communicate by utilizing a "special language".

The Environment that Mediates, or Disintermediates, Resources and Users

Today's business documents are written for specialized communication. Every community of practice is a technical environment from a lexical point of view; investment banking, catering, human resources, logistics, project management are all technical environments. Today's business documents use terminology to convey facts and analyses to communities that practice the same special subject. This means that today's business documents are essentially written in numerous special languages.

It's important to remember that the concept of "special language" is a model. In fact, it is a working model. It is a model to be used when doing so seems to work for the project at hand. It's also important to remember that users in technical communities are highly unlikely to be aware that they are using a special language, for two reasons. The obvious reason is that this is only a model, and not necessarily "true". The other is that being self-reflective about what makes one good at what one does is very different from being good at it.

Special languages are really part of Juan Sager's larger model of communication 1. We've
extracted the concept from there and wrapped it in its own model. This makes the overall
set of models easier to both teach and apply (and think about). We discuss some aspects
of Sager's work in the Review Essay of his key book.

It's hard to overestimate the importance of the concept of a special language. How do
peers in a technical environment communicate? They communicate through the medium
of a special language. The special language of any community of interest is both the
lexical wrapper for the special concepts of that particular domain and the set of related
definitions of these concepts. This is non-dynamic, in the sense that the metamodel of a
special language - its concept of definition, term, and concept itself - remains the same. A
special language is a tool, not an event. In the act or event of communication there is a
dynamic interplay of actors and parameters. It is within this dynamic that special
languages play their role as a communication tool. We will come to this shortly.

Defining a Special Language

We would say that all business communities of interest are cultural, in the widest sense
of the word, and so technical. And we would constrain "cultural" to characterize the
combination of shared subject of interest, way of conceptualizing it and way of
communicating about it. Sager comes to the same conclusion, but says this in his own
way. In short, the language used in any business, scientific or technical community is a
special subject language. Sager uses the expression "special languages" to indicate that
communication occurs in a "technical" environment. This is how he defines a special
language -

"Special languages have been defined as semi-autonomous, complex, semiotic systems based on and derived from general language; their effective use is restricted to people
who have received a special education and who use these languages for communication
with their professional peers and associates in the same or related fields of knowledge."
[page 105]

All of these parameters characterize corporate intranet environments. Communities that work together receive a "special education", whether it is highly formal or so informal
that it passes unnoticed as a phenomenon. Special languages are semi-autonomous.
Large proportions of the lexical elements, the "rules" for compounding phrases, and the
meanings that lexical elements convey, are different from the general vocabulary we all
use at the periphery of our special subject and outside our special subject. Special
languages are complex to outsiders. Special languages are semiotic - the item of
terminology in any special language "substitutes" completely for the definition of the
concept, regardless of the complexity of the concept (to insiders or outsiders). And
"effective use" only comes from an alignment of knowing what you are talking about
together with knowing how to talk about it.

"The lexicon of a special subject language reflects the organizational characteristics of the discipline by tending to provide as many lexical units as there are concepts conventionally established in the subspace and by restricting the reference of each such
lexical unit to a well-defined region. Besides containing a large number of items which
are endowed with the property of special reference the lexicon of a special language also
contains items of general reference which do not usually seem to be specific to any
discipline or disciplines and whose referential properties are uniformly vague or
generalized. The items which are characterized by special reference within a discipline
are the 'terms' of that discipline, and collectively they form its 'terminology'; those which
function in general reference over a variety of sublanguages are simply called 'words',
and their totality the 'vocabulary'." [page 19]

The lexicon carries an ontology, or a representation of the domain. Polysemy is highly and explicitly constrained - one concept to one lexical unit is a highly efficient way of
communicating.

Special languages differ from general languages in that they use terms in addition to
words. Terms as linguistic expressions, and hence special languages, carry two core
properties of communication in general - the property of definition and the property of
shared special reference. These two communication properties, of course, carry over into
information access environments.

"In special communication terms are considered substitute labels for definitions because
only a full and precise definition is the proper linguistic representation of a concept."
[page 109]

"Only if both interlocutors in a speech act know the special reference of a term, and, by
implication, that they are using terms rather than words, can special communication
succeed." [page 105]

"Special reference" refers to the structure of concepts in any special subject, as opposed
to general knowledge. Special subjects have a need for delineating the relationships
between concepts more strictly, so with much less fuzziness or flexibility, than in general
knowledge. But this delineation is one of difference in degree only. This is how Sager
puts it.

"Within a subject field, some or all of the included dimensions may assume increased
importance, with a greater need for distinction between a larger number of concepts
along a given axis: at the same time, the necessity to avoid overlap between concepts, i.e.
intersecting regions, will tend to reduce the degree of flexibility admissible in the
delimitation of the bounds of any given concept. There is thus a difference of degree
between the intradisciplinary structure of concepts in the bounded subspace of a special
subject or discipline and the less well-defined, less "disciplined" structure of "general
knowledge". This does not mean that general knowledge cannot contain well-defined
facts; but only that disciplines have a greater need for more rigorous constraints on the
overall delineation of items of knowledge. The referential function at the extremes of this
distinction is classified as "special reference" and "general reference" respectively." [page
19]

How Special Languages Communicate

"In special communication terms and standardized terms make a critical contribution to
achieving complete and effective communication. This they do by making the choice of
language, knowledge and intention more systematic and hence easier. In order to
establish criteria for evaluating the effectiveness of communication in special languages
we can postulate three objectives or properties:" [page 105] -

Economy
Precision
Appropriateness

From our point of view of project design these "three objectives or properties" are a way
for us to model the use of lexical choices, and to model how each type of choice
contributes to successful or failed communication.

The Lexical Expression of Economy

Economy is really of two dimensions -

conciseness
co-ordination of content and intent

In term formation there is economy in simply juxtaposing lexical elements to create terms rather than using neologisms (which require traversing a learning curve).

The Lexical Expression of Precision

Precision in working with terminology (as opposed to its technical sense of a metric used
in information retrieval) is a measure of the accuracy with which knowledge and intent
are conveyed by a set of lexical elements. Precision works with two parameters: precision
of reference and precision of syntactic relationships between referents.

The Lexical Expression of Appropriateness

Appropriateness is a measure of the effectiveness of the intention as it is expressed and understood in the message. Economy and precision are in a dynamic relationship.

The Functional Role of Terms in Communication

"Terms can, of course, only be used as such if the user already possesses the
configuration of knowledge which determines the role of the term in a structured system.
The limiting case of this restriction is the requirement that a new term be learned
contemporaneously with new knowledge, e.g. through text books; a term acquired
without awareness of the conventional configuration of knowledge to which it relates is
communicatively useless." [page 20]

"Within any given language, a wide range of phonological, grammatical and lexical
variation is available, but, within the range of possible variations, the social norm
operates to determine criteria for the selection of codes, whose phonological,
grammatical and lexical properties may be functions of the situation in which
communication takes place. In general, diversification at the level of phonology and
grammar is most evident in regional and social variation and is therefore of marginal
interest for terminology. Variation on the lexical level is most characteristic of special
languages, the linguistic subsystem selected by an individual whose discourse is to be
centered on a particular subject field." [page 18]

Of course, we don't need enterprise knowledge workers to explicitly know (or even
suspect) that they are constantly using special languages when they communicate with
peers or customers. Or when they use an intranet. But those of us who design for
knowledge workers do need to know this. We also need to understand that when peers in
a community of interest communicate they "know" that they are using terms. Another
way to approach this point is to think of it as a process of default values. Technical peers
"default" to using particular lexical elements (terms) whenever they are in technical
communication. This default takes precedence over natural language usage. This default
is communication using special languages.

The Business Requirement: Why Do We Want a Model like This?

We needed to know what the actual tool is that peers in a community of interest use to
communicate. Now that we do know, we see that it is a language tool called a special
language. This tool of a special language combines two different types of lexical
elements, "terms" and "general vocabulary". The terms of the special language carry two
particular functional properties. Terms are substitutes, or a shorthand, for conceptual
definitions. Terms carry information about the conceptual or ontological structure of the
particular domain, including "special reference", which is information about the
precision of delimitation between "adjacent" concepts.

The Model of Special Languages tells us which communication tool our users use. This in turn tells us that the information access environment that we are designing should also use special languages, as effectively as our users do. This means that we will
use terms where terms are required and general vocabulary when it is called for. And
that we will know how to decide which situation is which. Every type of
presentation that the user interacts with will be an opportunity to use a special language,
or not - navigation schemes, browsable classifications and taxonomies, search results
rendering and so on.

Now that we know the core ways in which this tool functions, it is not a large leap of
intuition, or pragmatic project management sense, to see where we are headed, in
working with lexical information. This model is linked with our other models in a
synergistic way.

The model of term/concept/definition tells us that unless we have these three inter-related functional elements in perfect alignment for every concept that the user community requires, we are going to build unwanted outcomes into the environment. From an information retrieval point of view this will mean issues with precision and recall. From a usability point of view this will mean all kinds of user
behaviors that usually indicate frustration, lostness and so on. From a business point of
view there are going to be efficiency consequences. These will, as always, be hard to
measure.

The Model of Communication tells us both about intent of communicators, and states of
knowledge that need rebalancing. We want to understand the total information access
environment as a communicator. Basically, "how" it communicates to the user. Once we
understand this we can begin to harvest the implications of how the information access
environment works. Once we can isolate these implications clearly, and since the
communication medium is words as representations of concepts, we can then explicitly
design our lexical layers (and the workflows that they depend upon).

These three models taken together give us a model that -

distinguishes terminology from general vocabulary
distinguishes a special language from a general language
defines how a word becomes a term
places the definition of concepts (named by terms) at the interface of the communication transaction

When we add the fourth, the OntoLexical Layers Model, we can see where words, terminology, and representations of domains occur, and how they should function. Taken together (and used together) we will have at our fingertips a map of four
models that explain to us the where, how and why of optimally using lexical elements in
an information access environment such as an enterprise intranet.

Bringing the terminology dimension to our ontological and knowledge engineering work
is one of the bases of our information access design. Particularly, we want to open up the
black box of the process of terminology and vocabulary choices at a granular level.
Deciding that a word is an item of terminology is an ultra-granular decision-making
process. And this decision-making process is reprised thousands of times in designing
for information access. So, we need a model to support the intent of our decision-making
processes, and, in truth, to keep us focused (and "on message") on the importance of
selecting, every time, the optimal lexical elements that are to be communication tools.
And lastly, if we have a model that makes sense, then, we can teach (or socialize) the skill
that the model describes.

Are there Exceptions?

Are there exceptions to special languages being used? The answer is "no". If you want to
use a different model, then you can model the total information access environment
without using the concept of special languages. But you may find that you will need to
accommodate certain ideas that the concept of special languages gives you.

Notes and References

1. A Practical Course in Terminology Processing by Juan C. Sager. 1990, John Benjamins B.V.

At time of writing you can buy through the John Benjamins site, which normally has stock to hand and is weeks quicker in fulfillment than Amazon.

The OntoLexical Layers Model

Semantics and Ontology: the Core Communication Problem

People, knowledge workers included, are native ontologists. We all classify our worlds in
highly skillful and nuanced ways. There are associated costs when this native ability is
translated into a human language. People, knowledge workers included, use the
semantically rich tool of human languages to communicate definitions, precision and
differentiation, and how concepts relate each to the other. We pay some costs for this
semantic richness.

Many of these costs are a result of polysemy. In the world of numbers "247" always
"means" 247 (by and large). In the world of words, lexical objects such as "promote",
"coupon" and "board" don't unambiguously declare, in their isolation, where we are and
what we are about. Fortunately (although it could not be otherwise) we rarely use words
in such strict isolation. There are always contexts that give us our ontological place and
our task at hand.

Meanings and conceptual relationships are what we work with in information access
environments. Returning to the ever-present background issue of "information overload"
as an angle that never goes away, we don't have "too much" semantic and ontological
richness. Actually, we have too little of that which produces semantic and ontological
richness. Too little of this means too much of that - that overload stuff, that is. Too little
of effective knowledge organization and effective subject recognition.

How can we process and organize this ontological and semantic richness? We have two
types of tools. To work with ontological richness and complexity we use knowledge
organization tools. A taxonomy is an instance of a knowledge organization tool. To work
with semantic richness and complexity we use subject recognition tools. People are
excellent subject recognition tools. To work at scale we can use a categorization
application from one of numerous vendors.

Of course, there is much "more" to an optimal information access environment than just
knowledge organization and subject recognition. Choice of technology applications does
count. As does their conceptual/semantic implementation. Training, the transfer of
knowhow, is key. Workflow processes have to be defined and managed. Maintenance
never goes away. Because language lives, and so constantly develops, and because
subject domains change their scope and sets of relationships, maintenance is key also.
And, finally, project design makes for success, as opposed to some halfway house
between failure and the need to keep a low profile.

What Models will Help Us?

It's the semanthink thesis that, when working with document sets or corpora, at scale,
two of the pivotal functions to "get right" are knowledge organization and subject
recognition. In a massive document environment, failure to do this creates a non-
negotiable bottleneck. We can define "massive" as that quantity of documents that will always increase in number, day by day, while an inordinately large group of people are processing them manually! Sounds like an entropic nightmare with a sly attitude.

Words are everywhere in intranet environments. Our general vocabulary, and a sub-class
of this, the set of technical terms used to communicate about any special subject, act as
communication media. We want to use these terminologies to connect users to
documents in information access environments. Since lexical elements are the
intermediary between users and documents, we need a set of models to help us
understand how words operate as carriers of both semantic and ontological information.

And then there are the actors - authors, knowledge workers, taxonomy designers, and so
on. Each of those has particular skills, needs and responsibilities.

The interaction of documents, words and people needs an overall model that gives us
each their "place" in the overall scheme of things. The OntoLexical Layers Model shows
us "where" words and terms "are", performing different communication functions within
different functional tools.

The OntoLexical Layers Model

Our task as information access designers is to build systems that deliver meaning to the
user. "Delivering meaning" means that the user finds what is required and expected, with
little or no "noise". The noise is whatever would be irrelevant or unexpected.

What are the layers that lie "behind" the user interface layer? Let's borrow (or, re-use in
knowledge management terms) a story from the semanthink Review Essay on Sager's
book.

"Once (and still), upon a time, the user sits in front of the screen. The user knows the
task at hand. There is a chance that one of the links currently showing on the screen will
lead to a document that resolves this task. The user is focused on two or three links in
particular. Actually, the user is focused on the words contained in these links. This small
moment is a moment of decision; to click through, or not."

In this little story, if the user could look behind the pixels of the screen, an informationscape something like the one outlined below would be seen. These layers each
play an individual function in connecting users to content. The layers are sometimes
process and sometimes application.

Author Layer

This is where information life begins.

The role of terminology in the Author Layer: Authors know what they are writing
about. They know the special languages of their audience(s).

Content Layer

The Content Layer is the universe of content, the repositories of the intranet, the
newsfeeds and their processing etc. The intranet document set, or corpus, is a container
of (lexically described) concepts. Documents are wrappers for information on concepts.
Documents are also of different kinds that approach the discussion of subjects in
different ways. The Content Layer is also a lexicon of (as yet) undifferentiated and unprocessed vocabulary. This lexicon contains terminology (such as names of things), potential terminology (with variants from which a preferred term must be chosen) and lexical elements that are expressions of concepts that will require a term to label them. It is within the Content Layer that what the user seeks resides - it may be a number of documents, a
single document or a part of a document.

The role of terminology in the Content Layer: Terminology exists in this layer,
but it is undifferentiated from general vocabulary. We are at a pre-processing stage with regard to terminology compilation.

Subject Recognition Layer

Subject recognition is the process of recognizing the lexical elements (words and
phrases) in texts that are used to represent (and discuss) any particular concept.

There are different ways to characterize the small number of different and competing
technologies used within categorization platforms to recognize the occurrence of
particular subjects in documents. For our purposes here we can simply divide them into
two families.

One set of technologies explicitly programs human semantics and syntax into the
categorization application. An example of this approach is semantic networks. In a
semantic network, words and phrases are grouped into sets of synonyms. Sets of
synonyms are associated together to provide the elements required to build (and so
match in the texts to be categorized) complex, compound concepts that require more
than one synonym set or semantic type (noun, verb etc). We can characterize these kinds
of approaches as knowledgebase-based approaches. The product model is to build a
knowledgebase of meanings and their words, and to program this knowledgebase into
the application.
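
By way of a toy illustration (and only that), here is a minimal Python sketch of the knowledgebase-based idea: synonym sets stand in for concepts, and a document is said to discuss a concept when any member of the set occurs in it. The concept names, synonyms and sample sentence are invented for the purpose; real platforms of this family add morphological handling, proximity operators, part-of-speech typing and much more.

# A toy knowledgebase: each concept maps to a set of synonymous
# words and phrases that signal its presence in a text.
KNOWLEDGEBASE = {
    "interest rate derivative": {"interest rate swap", "swaption", "rate cap"},
    "credit risk": {"credit risk", "counterparty risk", "default risk"},
}

def recognize_concepts(text: str) -> set:
    """Return the concepts whose synonym sets match the text (lexical matching only)."""
    lowered = text.lower()
    return {
        concept
        for concept, synonyms in KNOWLEDGEBASE.items()
        if any(phrase in lowered for phrase in synonyms)
    }

# Example: the sentence mentions "swaption" and "counterparty risk",
# so both concept labels are returned.
print(recognize_concepts("The desk priced a swaption against counterparty risk limits."))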

When people "do" subject recognition against documents, they do it lexically only. It's
important to note that people are also subject recognition experts - ask any cataloguer!

The other set of subject recognition technologies, from our terminology-based information access point of view, does not program human semantics and syntax into applications. Instead these technologies rely on building mathematical or statistical models that
characterize the content of documents. Examples of these technologies include Bayesian
probabilistic methods, Latent Semantic Indexing and other vector approaches. This
approach does not treat words as lexical elements but as strings of characters, with no
semantic meaning or syntactic purpose. Complex sets of relationships between character
strings are computed for documents, sets of documents and corpora. Models are derived
from the products of these computations that characterize the "conceptual signature" of
any document - what it is, or might be, about. These document classifications have to be
validated or accepted by content managers or knowledge workers to ascribe "meaning".
This set of approaches is terminology-free. Character strings are not accounted for
semantically or syntactically, and so cannot be terms. Concepts and their definitions exist only insofar as knowledge workers pragmatically accept or reject classifications based on the underlying mathematical or statistical models that have been generated.
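
For contrast, here is an equally minimal sketch of the terminology-free family, assuming nothing more than word counts: documents are reduced to term-frequency vectors and compared by cosine similarity, so "aboutness" becomes numerical closeness between character strings rather than recognition of terms. The sample documents are invented; real implementations use far richer weighting and dimensionality reduction.

import math
from collections import Counter

def vector(text: str) -> Counter:
    # Words are treated purely as character strings; no semantics or syntax.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    "quarterly revenue and revenue guidance for the retail division",
    "retail division revenue fell against guidance",
    "new logistics routing software for the warehouse",
]

# Documents whose vectors are "close" are grouped; a knowledge worker must still
# accept the grouping and supply a label before it carries any meaning.
for i, d1 in enumerate(docs):
    for j in range(i + 1, len(docs)):
        print(i, j, round(cosine(vector(d1), vector(docs[j])), 2))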

While it is true to say this classification process is terminology-free, there are important
requirements for terminology in the overall information access design that incorporates
statistical modeling methods. Sets of documents that are classified as discussing the
same concept require labels for their set. Sets of classified documents need to be related to other sets of documents, in one meaningful way or another, to create a taxonomy or classification scheme. These labels of the taxonomy or classification will need to show or indicate the nature of these hierarchical relationships.

The Subject Recognition Layer is shown "deeper" in the model than the Metadata
Scheme Layer. While the metadata scheme contains the elements that contain the
instances of, for example, the labels of a domain taxonomy, the documents in the
Content Layer are not identified with their "metadata signature" until the subject
recognition processing step takes place. So in this sense the Subject Recognition Layer is
"deeper", in terms of process, than the Metadata Scheme Layer.

The role of terminology in the Subject Recognition Layer: If the underlying technology of the subject recognition application is not lexically-based, but involves
building statistical or mathematical models, then terminology as a class of lexical
elements is not a parameter in this environment. If the underlying technology is
lexically-based, such as approaches founded on semantic networks or rules-based topics,
then both terminology and terminology variance can be re-engineered into the subject
recognition knowledgebase.

Subject recognition is a process. But once you have created subject recognition executables, based on any categorization technology, and loaded them into a categorization
application, these executables are part of the platform layer. The deliverable of subject
recognition is sets of meta-tags associated with individual documents. The subject
recognition layer is required to recognize the concepts needed by the user communities
when they are discussed in particular documents. It is not required to recognize "all" the
concepts in the document set. Where do these concepts that require recognition come
from?

Metadata Scheme Layer

The metadata scheme contains all the kinds of descriptions to be applied to documents
to support both user requirements for information access and content management
requirements for maintenance and management reporting information. Taxonomies,
ontologies, topic maps and other Knowledge Organization schemes belong in the
metadata scheme, as elements of the scheme. Designing the metadata scheme is a
process. The metadata scheme itself is a platform layer.

The role of terminology in the Metadata Scheme Layer: Any controlled vocabularies in the metadata scheme are terminology, and require design, management
and maintenance. Uncontrolled vocabularies, such as author names or search engine
input misspellings, still need to be managed by rules and require variance to be
normalized.
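
As a small, invented illustration of the kind of rule-based management meant here, the sketch below maps observed variants - author-name forms, search-input misspellings - onto normalized values; the variant table is hypothetical and stands in for whatever controlled vocabulary workflow is actually in place.

# Invented examples of variance that a metadata scheme has to absorb:
# author-name forms and search-input misspellings mapped to normalized values.
VARIANT_TO_NORMALIZED = {
    "j. c. sager": "Sager, Juan C.",
    "juan sager": "Sager, Juan C.",
    "taxonmy": "taxonomy",
    "terminolgy": "terminology",
}

def normalize(value: str) -> str:
    """Return the normalized form of a metadata value, or the value unchanged."""
    return VARIANT_TO_NORMALIZED.get(value.strip().lower(), value)

print(normalize("Juan Sager"))   # -> Sager, Juan C.
print(normalize("taxonmy"))      # -> taxonomy
print(normalize("ontology"))     # -> ontology (no rule, passed through unchanged)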

Knowledge Organization Layer

The process of engineering a knowledge organization scheme includes how we identify concepts required by the user communities. Note that we say "identify" here, whilst it is
the function of the subject recognition process to "recognize" the concept so identified.
The Knowledge Organization Layer is where the Subject Recognition Layer "obtains" the
concepts that need to be recognized. Within the knowledge organization scheme the
identified concepts are arranged in relational-hierarchical relationships to each other.
We arrange these identified concepts in knowledge organization schemes such as
ontologies and taxonomies. Ontologies, taxonomies and other knowledge organization
systems are integrated into the enterprise metadata schema and become part of it. There,
they are part of the platform layer.

The Knowledge Organization Layer represents the users' world. It does not, and should
not, represent the total abstract world of the collective of concepts lexically described in
the document set. This is why the Knowledge Organization Layer and the subject
recognition process have to be tightly coupled. Apart from identification (and so required
recognition) of concepts, it is important to be clear about the definitions of these
concepts. Definitional tensions in the Knowledge Organization Layer lead inevitably to
application tensions in the Subject Recognition Layer - if we don't define the scope,
inclusions and exclusions of a taxonomy node, how can we build a subject recognition
executable to correctly identify discussion of the concept in texts?

The three fundamental steps of terminology processing - concept identification, concept definition and concept naming - are also three of the key stages in any development methodology of a knowledge organization scheme.
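
To make this concrete, here is a minimal sketch of what a record in the Knowledge Organization Layer might carry, reflecting those three steps (identification, definition, naming) together with the scope notes and relationships that the subject recognition step depends upon. The field names and the sample concept are illustrative only, not a prescribed schema.

from dataclasses import dataclass, field

@dataclass
class ConceptRecord:
    preferred_term: str          # the name (terminology) for the concept
    definition: str              # without this, the concept "does not exist"
    scope_note: str = ""         # what the node includes and excludes
    broader: list = field(default_factory=list)    # hierarchical relationships
    variants: list = field(default_factory=list)   # lexical variance to be recognized

# An invented node: enough information for a subject recognition executable to be built.
node = ConceptRecord(
    preferred_term="content categorization",
    definition="Assignment of documents to concepts in a classification scheme by software.",
    scope_note="Includes rules-based and statistical methods; excludes manual cataloguing.",
    broader=["information access"],
    variants=["document categorization", "auto-categorization"],
)
print(node.preferred_term, "->", node.broader)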

The role of terminology in the Knowledge Organization Layer: The Knowledge Organization Layer requires terminology to name the concepts. Without a
name the concept cannot be communicated. Without a definition the concept does not
exist.

Terminology Layer

Terminology identification and standardization is a process. The process of terminology identification and application allows concepts to be named. Terms are stored in the
metadata scheme. Taxonomy node names are usually terminology. Other metadata
element types, such as document object types and audience characterization types, are
terminology.

Navigation Presentation Layer

The presentation of concepts to users occurs in navigation schemes and search results
rendering. This presentation layer requires terminology, usually in the form of
terminologies that describe different kinds of concepts, to enable users to connect to
particular concepts discussed in particular documents. Concepts are then displayed to
user communities in the Navigation Presentation Layer, using terminology, where
appropriate, and general vocabulary.

User

The user in a sense is a "container" of tasks, just as the content is a container of concepts
that are discussed. In the context of this model of layers the user performs tasks that
require the locating of information. So, information finding is a precursor task to the task
of using it. That means we must design information access solutions focused on user
seeking becoming user finding, as quickly and efficiently as possible. In the context of
Sager's model of communication, we pivot the metaphor of communicator and recipient
so that the information environment that we design is the "communicator" and the user
is the recipient of communication.

Why Do We Want a Model like This?

We want a model like the OntoLexical Layers Model to be able to take the total
information access environment apart, conceptually, so we can analyze both the purposes
and the functions of each of the layers. We want to do this for two kinds of reasons - to
make our work more precise at the strategic level and at the project design and
management level.

We use the model in strategic planning to provide a placeholder for all the elements,
outside budget and politics, required to deliver information access strategies.
Strategically, we need to look at the requirements of all the information-using
communities on the one hand. We also want to understand and audit what
applications, workflow, outputs and quality measures we have on the other hand. Putting
these two sets of facts together lets us plan and prioritize.

At the project design and project management level we want to be able to deliver
workflow processes and configured/customized content handling applications. When we
look at each of the layers individually we discover all the variable parameters that are
configurable for our information access design. Each of the layers carries its own set of
parameters we identify to -

ascertain status
define maintenance and management issues
identify tasks (that may later become projects)
specify required applications functionalities
specify required workflow processes

There are many other ways to model this environment. But any model is only ever
devised to serve the total process of creating deliverables that are effective, and this
model gives us all the parameter placeholders that we need at both an abstract level and
a granular level.

Perspectives on Paradigms

Project Management as a Paradigm-free Environment

The emerging space of bringing organization to enterprise content management is overflowing with paradigms in transition. There is a complexity of paradigms emerging
and fading, each impacting single elements of the overall problems/solutions space that
we all work with. This makes our tasks exciting and hopeful, but also ultra-complex. We
need to ground ourselves in understanding this inevitable flux of paradigms. Then, we
will be well placed to deliver solutions that are successful. There is a risk, using "risk" as a technical project management term, in not including a perspective on paradigms in our understanding of the overall information access environment.

Although semanthink is about delivering information access solutions that convey the
semantic and ontological information required by user communities, semanthink is
really (between you and me) about project design. If we get project design right, then
we own a deserved momentum. And project design incorporates either solutions to, or at
least awareness of, all the parameters we need to figure with and configure. These
include, amongst many, redefining skillsets, understanding the functional elements
of tools such as taxonomies, deciding what quality is in relation to semantic and ontological information (and assessing it), and understanding above all that information
organization is communication. All of these are impacted by paradigm changes
currently in play.

semanthink is about designing information access projects that deliver success. Don't
think for a minute that project and program management is a paradigm-free
environment. It's not. Knowing which paradigms are operating in a project space is one
of the keys to designing successful projects. Somewhere just after strategic vision ends
and just before implementation begins is a cusp. Sometimes our awareness of operating
paradigms slips through this gap. It's the moment where we want strategy and
implementation to intersect, not diverge. These early moments are when we frame our
project design parameters, knowingly or not. The more explicit this framing is, then the
better our chance that our projects will kick off on a robust intellectual footing, and that
we can make the difference between success and struggling to achieve it. This
series of perspectives is about bringing some operating information industry paradigms
from background to foreground.

There are two sets of content, of course. Content originated by the enterprise itself, and
content from publishers that serve the enterprise. Perspectives on Paradigms also aims
to discuss some of these paradigm dynamics from the point of view of publishers. After
all, they are part of the problem of information access. The more alignment there is
between publishers-for-the-enterprise and intranets that require to incorporate
publisher content the better. A problem shared, in this context so far, is not a problem
solved/halved. But some solutions that publishers could adopt, over time, will remove
some of the problem areas that intranet managers face. Publishing is one of the
industries that has a huge amount to offer (and a huge amount of gain to accrue) if it
were to align product development with solutions to the endemic enterprise problems of
information organization and categorization.

Taking the Notion – An Introduction to the Set of Perspectives on Paradigms

I've taken the notion of "paradigm" from Thomas Kuhn's 1 famous (paradigmatic even)
work on the history of science. While the practices of science and business are markedly different in all kinds of ways, the concept of paradigm, and paradigm shifts or changes, is now commonly applied by analysts, business thought leaders and CEOs in understanding (and being the best responder to) the dynamic of business shifts.

The Problem with Paradigms: Paradigms Influence Projects

For instance, you want to implement an enterprise document categorization solution where the underlying subject recognition technology is lexically-based, as opposed to statistically-based. One emerging paradigm here is that the future is semantic. The difference, then, between success and still reaching for success will depend upon the lexical skills available to your project team. Or, suppose you want to investigate the
feasibility of implementing an application to help you build your enterprise taxonomy, or
parts of it. The re-emerging paradigm here is that of the re-centralization of active
human intelligence at the heart of the decision-making process. But the type of human
intelligence that the project will require is a skill with building ontological
representations of areas of interest. Another useful skillset for the project team here
might well be a familiarity with terminology practices. The era of buying the software,
booting up the software CD and sitting back to leisurely browse your new enterprise
taxonomy is over. (In fact, that era never really arrived.)

What is a Paradigm?

What is a paradigm? It's a shared worldview, a way to understand a slice of the larger world, and one that provides ways to model problems and solutions, at all levels.

Paradigms become Foreground and Background

Paradigms emerge and fade. To be replaced, of course, by others. Some paradigms are
not true, but are conceits, or marketing hype. For instance, software that can
automatically "understand" text was certainly marketed as a paradigm shift. But the
underlying paradigm just wasn't there. Desire does not a paradigm make.

The business model of content categorization companies has changed markedly since the
late 90s. And amongst all the growing pains that new and emerging business segments
face, and the ebb and flow of business advantage in the marketplace, there was one ...
quite definite ... paradigm change.

Categorization Vendors

Then, vendors such as Verity and Convera sold powerful "functional shells" that could, if
programmed lexically, recognize in documents any subjects that you would wish to
define. By "functional shell" I mean that the application contained all the subject
recognition functionality ... except the lexical content, the missing and essential "magic
sauce". So, where was this lexical content, to build subject recognition rules and
semantic networks, going to come from? Or more importantly, from the point of view of
project and program managers, how was this mission-critical lexical content going to be
delivered, what project activities needed to be defined, what resources and timelines
allocated, and so on.

Now, Verity, Convera, and others including InXight, Entrieva and nStein bundle lexical
content with their subject recognition applications. How is this all going to play out in
the marketplace? Well, not all lexical content is created equal. That's a telling early point
we can make. There are vast gaps in domain coverage when we knit together all the
available good domain vocabularies that can be re-engineered into first-class
business-facing taxonomies and subject recognition objects. Perhaps the history of
technical dictionaries has something useful that we can intuit about the coming play of
forces. And all of this means? ... that there are a whole host of project design issues
emerging anew. Exactly.

The Publishing Industry

Similarly, publishers of must-have trade, professional and business news face interesting
options if (or when) paradigms emerge that suggest taxonomies and semantic networks,
or other sets of subject recognition objects, may be of huge value in cementing a different
kind of relationship between their content and their customers. (Cemented business
relationships are good, because cement is hard to break to let in competitor publishers.)
The days of business magazines just "telling it like it is" are a long-gone paradigmatic past, of course.
the ways customer communities of users think and work, and categorized precisely and
finely ... is that part of the new publishing model, or not? So, there is a whole host of
issues emerging around this dynamic of publishers being part of the problem or part of
the solution.

What Lies Between? ... Interoperability

Ontological interoperability is a paradigm waiting to happen. Collaboration requires ontological and semantic interoperability. But how can the global enterprise achieve
ontological and semantic interoperability (internally even) if the origination of their
knowledge organization and subject recognition layers is fragmented? But, enough ...

Point of View

Perspectives on Paradigms aims to provide a take on some of the different vibrant
paradigms that come into play in developing projects to deliver a total information
access environment. Particularly those that impact these two key dependencies for
creating an optimized enterprise information access environment, the design of the
knowledge organization and subject recognition layers.

References

1. The Structure of Scientific Revolutions. 3rd edition / Kuhn, Thomas S. University of Chicago Press, 1996 -

An excellent outline of and study guide to "The Structure of Scientific Revolutions" is available from Professor Frank Pajares of Emory University.

It's only fair to point out that there is genuine controversy in history of science and
philosophy of science communities about Kuhn's concept of paradigm. Begin with a
Scientific American review of Steve Fuller's book on Thomas Kuhn - Thomas Kuhn: a
Philosophical History for Our Times / Steve Fuller. University of Chicago Press, 2000 -

And Steve Fuller's book is available from - University of Chicago Press

Perspectives on Paradigms

February 2003

A Brief History of Dictionaries: Meanings and Their Words

We are, right now, inside the transition point between two paradigms that define the
knowledge economy in a truly major way. The transition impacts every company playing
in the global economy. One of these paradigms is fading from the foreground, and one is
emerging. As they do. The cycle of the one fading has been long. Centuries.

But first, let's visit the business problem.

The Current Business Problem

Recognizing the occurrence of concepts in texts automatically 1 is a key dependency in creating an information access environment at scale. But, more than this, it is also a
problem that is a "bottleneck problem" - solve it, and we can move on; not solve it, and
we can't move on. What does the unsolved subject recognition issue prevent us moving
on to?

If we cannot recognize concepts in texts, we cannot automatically associate documents with any domain taxonomy, even if it is the best taxonomy in the world, nor can we
group and relate them ontologically. If we cannot recognize concepts in texts, we cannot
automatically leverage the definitions of users' ways of thinking into any application's
way to retrieve and differentiate documents based upon the subjects that they discuss.

That's the business problem from the "outside". "Inside" the business problem, solving it
at the project level, we realize that there are implementation issues, hence project design
issues. These project design issues are all centered around working with words. Using
words to find words. Or, to say it another way, putting words into a categorization
application to find words, and so the concepts they signify, in documents. All this at a
scale that matches the requirements of the global knowledge economy.

At the project level we are going to need to be focused on skillsets (lexical skillsets) and
deliverables (lexical deliverables). Ways with words, and ways through words.

The Paradigm Transition Defined

I've often been in presentations where the presenter goes back to the Stone Age, or some
other ancient time, and stakes out the beginning of a paradigmatic vector. For instance,
cave paintings as the emergence of semiotics, or the ancient library at Alexandria as a
beginning benchmark for information overload etc. And I've smiled.

So ... my version goes something like this ...

There are three extreme paradigm shift milestones in human culture becoming dependent on text as an integral part of human culture itself. These are -

invention of the printing press
invention of the dictionary
invention of lexically-based subject recognition models such as semantic networks

The defining tipping point of this transition is the move away from a focus on words and
their meanings towards a focus on meanings and their words.

The Invention of the Printing Press

The invention of the printing press, amongst other effects, drove fundamental changes in
literacy (and the concept of literacy), authoring, mass communication, and what subjects
were deemed candidates for writing about.

The Invention of the Dictionary

The invention of the dictionary 2 allowed absolutely essential normalization at the level
of the individual printed word. The functionality of the word as a conveyer of meaning,
and a communication device between writer and reader, was enhanced through this kind
of normalization. Spellings were normalized. Pronunciation was normalized, through the
adoption of a standard code that translated phonetic symbols coupled with
morphological units. Meanings, at the level of the word, were normalized, and
polysemous senses pointed out. The invention of the dictionary allowed translation (i.e.
interoperability) between languages. All this created manifest cultural impacts, such as
providing a platform for education and science, which are not really relevant in an
information access design context.

But all this was at the level of the word, each word largely in isolation from the others.
How is the concept of the dictionary as a tool limited (and limiting) in an information
access design context? It is true texts are words. But what we really want to do with texts
is to recognize the concepts that they discuss regardless of the ways the same concept
may be worded.

As the scale of information publishing increased, new tools were needed. Because scale
created issues around finding. Dictionaries were not enough on several dimensions.
Dictionaries specify usage, how we can write. But they don't help in finding ways to word
concepts. Dictionaries help us to normalize what we communicate, "push" out. But they
don't overly help us with what we want, as knowledge workers, to "pull" in. Lexical
knowledgebases, such as semantic networks and other sets of subject recognition objects,
are designed around the notion of meanings and their words, and so help us find.

Where we are on the paradigm change curve today is dependent upon text(s), but we are not yet able to leverage their full content exhaustively and precisely.

The Invention of Lexically-Based Subject Recognition

The invention of lexically-based categorization models is the last piece of the puzzle. It is
the emergent paradigm that allows finding in large document sets and corpora. Just as
dictionaries allowed normalization at the level of the word, so lexically-based subject
recognition knowledgebases offer, essentially, normalization at the level of the individual
meaning. They give us the ability to work with meanings and their words. We put sets of
words together and so build models that recognize subjects in texts.

There are a whole range of platforms that allow us to work with meanings and their
words. Verity delivers quality subject recognition results using a rules-based platform.
Convera, with its RetrievalWare 8.0 product, delivers quality subject recognition results
using a platform based upon a semantic networks model. Although the approach is
slightly different in each case, both allow the combination of sets of lexical elements that
together model the way to recognize the occurrence of a complex topic in a text. Both,
when implemented, are, from the point of view of this paradigm, a lexical knowledgebase
whose semantic/syntactic objects can be combined.
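
As an invented illustration of what "combining semantic/syntactic objects" can amount to, the sketch below builds a compound topic out of two synonym sets that must both be present in a text. The sets, the topic and the sample sentences are made up, and the real platforms named above offer far richer operators - proximity, weighting, semantic typing - than a bare boolean AND.

# Two invented synonym sets ("semantic objects") combined into one compound topic.
MERGERS = {"merger", "acquisition", "takeover"}
REGULATION = {"regulator", "antitrust", "competition authority"}

def mentions(text: str, synonym_set: set) -> bool:
    lowered = text.lower()
    return any(phrase in lowered for phrase in synonym_set)

def topic_regulated_merger(text: str) -> bool:
    """Compound topic: the text must match BOTH synonym sets."""
    return mentions(text, MERGERS) and mentions(text, REGULATION)

print(topic_regulated_merger("The takeover was referred to the competition authority."))  # True
print(topic_regulated_merger("The takeover completed in March."))                         # False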

We can also look at this transition as being from origination-focused to reuse-focused. The dictionary specifies usage parameters for "correctly" (culturally) applying words in
content and a lexically-based categorization application allows reuse of content by
enabling finding.

How Far Away is the Future-is-Semantic?

The emerging paradigm is that the Future-is-Semantic. It has been emerging for a while,
but paradigms do emerge over time.

Where are we really positioned time-wise, from a global economic point of view, in the
emerging Future-is-Semantic paradigm? Are we at the Dr Johnson 3 stage in
dictionaries, or better, or worse? Dr Samuel Johnson is often considered (including his
own self-promotion) to have written the "first" dictionary (of the English language). He
didn't. But it was the first widely adopted and widely modeled one.

From my perspective, I see the global economy as being pre- the Dr Johnson stage in
semantic knowledgebases. Not so pre-, but definitely pre-.

Language is a precise and utilitarian tool. We don't necessarily need to make it "better",
more precise and more utilitarian. Nor could we, since the imposition of rules and
regulations to govern language use is impossible at best. The communication function of
language is basically OK. So the "push" side of language works. Where we do need to focus is in making finding, the "pull", better. We are going to have to become
better at using language to find language.

How to Spend Project Dollars

It's not even that the global economy needs more precise and more utilitarian
applications for finding subjects discussed in texts. We have these (and they are getting
manifestly better all the time). What we have is fine. We just need to make them work to
their full potential.

So, at this stage in the emerging paradigm how can you wisely spend your dollars
allocated to taxonomy and subject recognition in your information access environment?

Really the question should be more along the lines of - what kinds of project activities do
I need to explicitly design and incorporate into the implementation? Suppose you have
chosen an application whose functionality fits in with what you want to deliver to users.
What now? Well, now it's all lexical, not IT. Taxonomy/categorization software is largely
purchased through IT departments and budgets, but the skillsets required to create
lexical deliverables are … lexical skillsets. Understanding the lexical nature of the task of implementing Verity, Convera and their peers is a mind-set issue. Once the application is
integrated into the overall enterprise platform then we are immediately in a language
paradigm transition, not a silicon wafer paradigm transition nor a software paradigm
transition.

If we want a mantra to guide the scope of these project activities how about "Meanings
and their Words"? This is precisely the mantra that indicates we are no longer in the
dictionary paradigm, because the dictionary paradigm mantra is "Words and their
Meanings". It is also the mantra that indicates we are no longer in the information
technology paradigm, because the IT paradigm is all about APIs and repositories and
network protocols - words are numbers, bits and bytes.

Knowing that you are in a language paradigm allows you to focus on, and make explicit,
the lexical tasks that have to be carried out at the project level. Knowing that you are in a
language paradigm means that you will have to allocate dollars, by either internal or
external resource, to working with words, not just APIs. This means lexical expertise has
to be incorporated into the project, whether it is in-house or consultancy-based.

Postludes

Some of the emerging normalization at the level of the meaning is unique and some aims
to be standard. Intra-corporate is (largely) unique. For companies to interoperate within a domain, we require standardized taxonomies and standardized subject
recognition.

Interestingly, we could assert that taxonomy is a tool that is necessarily required at this
third shift in the paradigm, where we have reached the need for lexically-based subject
recognition models. We need the tool of taxonomy to give us the ontological architecture
that underlies our classifications of documents. Taxonomies built as faceted taxonomies
are ideally suited to taxonomic indexing and allow for the post-coordination of complex
subjects. The functional taxonomy elements of scope and node definition, together with
the set of nodes and their relationships, gives the specification for the semantic
knowledgebase we need to build for our subject recognition application.
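
A small, invented sketch of the post-coordination that a faceted taxonomy allows: each facet is indexed separately, and a complex subject is composed at query time by intersecting facet values rather than being pre-built as a node. The facets and documents are made up for illustration.

# Invented faceted index: document id -> value chosen from each facet.
INDEX = {
    "doc-1": {"industry": "retail", "topic": "supply chain", "region": "Europe"},
    "doc-2": {"industry": "retail", "topic": "pricing", "region": "Europe"},
    "doc-3": {"industry": "banking", "topic": "supply chain", "region": "Asia"},
}

def post_coordinate(**facet_values: str) -> list:
    """Compose a complex subject at query time by intersecting facet values."""
    return [
        doc_id
        for doc_id, facets in INDEX.items()
        if all(facets.get(facet) == value for facet, value in facet_values.items())
    ]

# "Retail supply chains in Europe" is not a pre-built node; it is coordinated on demand.
print(post_coordinate(industry="retail", topic="supply chain", region="Europe"))  # ['doc-1']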

Notes and References

1. The word "automatically" is often incredibly vague as to its actual meaning. The word
"automatically" is often used as a term - i.e. given the kind of constrained meaning that
occurs in peer groups working within a special subject area. But in fact, vendors and
customers often use "automatically" in quite different ways. And this, in its turn, is part
of the larger dynamic of the fudging of the line between genuine terminology and general
vocabulary (which always fails). All content categorization applications require the
expenditure of the resource of human time to configure them lexically and ontologically.
It just varies along the parameters of when in the workflow, from the customer's point of
view, and doing exactly what tasks or activities. We will go into "automatically", in the
depth that it deserves, elsewhere and at a different time.

2. For both a complete and "proper" history of the dictionary as a functional tool see -
The English Dictionary from Cawdrey to Johnson 1604-1755 / De Witt T. Starnes and
Gertrude E. Noyes. John Benjamins Publishing Company, 1991.

3. For more on Dr Johnson and his dictionary, and more on him as a literary and
lexicographical genius begin at the homepage of Jack Lynch, Assistant Professor at
Rutgers University.

Perspectives on Paradigms

May 2003

A Brief History of Newspapers: All the News that's Fit to Find

The news industry 1, particularly that which sells business and industry news, entered a
distinct paradigm shift a while back. It's been sitting in this paradigm shift for some
time, and adoption of that which creates real and progressive information utility for
readers, and so value for originators, is patchy.

As always and usual, it's all about those two core layers of information access -
knowledge organization and subject recognition. And this particular paradigm shift that
publishers are sitting through revolves around what part they want to play in making
content easier to find. Knowledge organization and subject recognition are, or are going
to be, (depending on when you are looking from) two of the non-negotiable
functionalities of all the text information handling products that together provide the
platform for the knowledge economy.

The future is always a set of what ifs. But it's never too early to think about the
possibilities, which lead to probabilities, which lead to business models or business
decisions. What eventually happens in the news publishing space will impact every
enterprise that employs knowledge workers whose job it is to re-engineer the facts of
news content.

We can enter the paradigm by asking ourselves a simple question. Is a taxonomy of any
monetary value to a business or professional news publisher? Maybe. It depends how
you model the business opportunity. Monetizing it may not be direct.

The Current Business Problem

The business problem for knowledge workers (aka readers) always remains the same.
First they have to find. The business problem, from the point of view of news publishers,
is a bit more complex. Do news publishers want to make it (a bit of) their business to
make individual items of content easier to find, or do they want to give that business
away to KO/SR intermediaries?

What if …

What if value propositions change as paradigms shift?

If we strip out advertising, editorializing and opinion promotion, and entertainment, and
focus only on news collection and publishing and return to the pure roots of the original
communication tool, we return to news finding, selection and publishing.

Traditionally, advertising apart, all the money was made on selling the day's news once,
although certain parameters of the news (its audience, which created its market and so its
circulation, for instance) drove the rate card for advertisers. Then came, in addition,
online databases, which were re-sellers and aggregators. And then a different generation
of aggregators. So, there are plenty of channels for any news originator to sell its content
multiple times.

But channel is only one part of the value proposition. What if other factors were going to
be important? What kinds of factors might these be? Knowledge workers spend a lot of
time seeking, and not necessarily finding (which is a lot worse than spending time and
finding what one needs, but still costly). They also spend a lot of time reading material
they don't need, or want, to read. This kind of reading is just another form of seeking, of
sifting through. So, knowledge worker behavior, and the unmeasurable cost of it, will be a
factor. We all know that information overload is a complaint, but what if it became a
factor that drove content purchasing decisions?

What if … Taxonomies were Important?

Let's make up a scenario. Here are the actors 2 -

A publisher which specializes in one, or more, special subjects
Its set of corporate customers
The domain of the publisher's customers (hence the domain of the publisher)
A taxonomy of the domain

And here's what is happening in the business space. The publisher has competitors. The
publisher has multiple channels of delivery to the customers - print, web, multiple
flavors of aggregators. The customers are frustrated. But this frustration is generalized
against information hassle in general and is not especially targeted at our publisher.
There are a number of emergent taxonomy/categorization applications vendors. While
knowledge organization and subject recognition have been with us since the great library
at Alexandria, these vendors are whizzing through paradigm shift upon paradigm shift 3
of their own. All in all, everybody's a bit unhappy, if they are aware of their situation.
And, then, there is the thing about interoperability. Interoperability is still a Great
Problem to come. The thing about interoperability hangs on the answer to
questions like this one: "Do we (whoever we are) want each and every domain to have
more than one taxonomy, or not?" This is the publishing problem, or opportunity.

What can history tell us?

A Brief History of Text Information Tools

Throughout the history of the printed word as a carrier of information we see the
emergence of new information-communicating tools as new information-finding needs
become foreground enough for some entrepreneur or visionary to step in. This is not a
new phenomenon. This has always been the case since the invention of the printing press
ushered in the era of text becoming intrinsic to human culture. In no order of any
timeline some of these are newspapers, dictionaries, thesauri, encyclopaedias,
classification schemes, cataloguing schemes, catalogues, abstracts, indexes, citation
indexes and so on. These are all tools. Specifically they are all communication tools.
More are on the way today - markup languages, interchange formats, ontologies to
support the semantic web, and so on.

As a set, built up over time and each addressing specific as-then unmet requirements,
their functional elements are all to do with semantics, ontology and finding. Because the
user problems are all to do with finding semantics and ontology.

A Brief History of Newspapers: All the News that's Fit to Find

The worth of business news to users has long since moved on from the traditional value
proposition of finding stories, making a selection and working this into journalism.

News users' requirements of news are different now than at the time of the first corantos
of the early 1600s. In fact, news users' requirements are additional, not entirely new. It's often
forgotten that with the information explosion the weight of ways of working with
information has changed. A lot of information use by knowledge workers is collating and
analyzing material that is not today's news, but is historical. This way of knowledge
working makes knowledge organization and subject recognition foreground. Without
these two functions knowledge work is hard to do.

The question for business and professional news publishers is "what part of this business
do I want?"

Fit to Find

So, to find and remain focused on a more contemporary value proposition, we need a
new mantra. The business news model now is not just "all the news that's fit to print" but
now has to include "all the news that's fit to find". (The play on words is intentional.)
News originators know this. But it's all the parameters that make the dynamic complex
that interest those of us who look from the point of view of categorization and
taxonomy.

What if … a Taxonomy could have a Monetizable Value?

If you are a vertically-focused business news publisher what does a taxonomy buy you?
Well, if you give this technical community 4 a domain representation that works for
them, and those you consider to be your competitive peers don't, this is worth
something, possibly.

Let's return to our little scenario and see what might have happened since last we looked.

Well … the publisher made the business decision to contract an information access
design consultancy to build a domain taxonomy. It was a faceted taxonomy, to enable
post-coordination of elementary concepts to form complex topics. They then tagged their
news (and several years of legacy news) against the taxonomy. They used a
categorization vendor for this. And built a workflow around their digital asset
management (DAM) system to categorize and tag every day's news. They then had the
ability to design new information content products, based on classifications, and
potentially personalized classifications, derived from the faceted taxonomy.

The line of reasoning behind the business decision went something like this.

Firstly, the publisher decided that one of the antidotes to information hassle, from their
customers' point of view, is the availability of precision and recall. This "availability"
doesn't just happen like business magic, of course. Particularly since precision and recall
lie in the eyes of customers. But if you give a set of customers, who all work in the same
general special subject area, their representation of their business world then you have
given them precision and recall. All the publisher had to do was insist that their
categorization vendor could recognize the subjects of the taxonomy, up to the level of
whatever the acceptability criteria were. Which opens up another interesting vista for
content originators to think about information overload: if you are not part of the
solution, then … are you part of the problem? In a more distant future than the next
couple of years, will the information vending model be to sell information with
knowledge organization, or information without knowledge organization?

Secondly, the publisher realized that monetizing a product enhancement can be more
than just charging for it, more than just being able to measure a new and differentiable
cash flow attributable precisely to the enhancement. If the new taxonomy-based
approach to product development took off, and they had every belief in this, then it
would impact their competitors, particularly the two that they collectively lost the most
sleep over. And so this competitive differentiation was how they would monetize the
benefits of the taxonomy. Their customers would value the classifications of the content
that they had to work with every day as knowledge workers. The customers could port
the taxonomy into their news portal. The taxonomy becomes a lock to lock in customers.

So far it looks like win/win/win, aka win³. Everybody could be happy. Publisher,
categorization vendor, knowledge worker customers. Everybody sticks to what they do
best.

But there's more. More happiness all round in prospect. The happiness that
interoperability brings.

Community or Domain?

We correctly separate the community, which is a set of users, from the domain, which is
a "thing", the field that the users operate within. However, this separation should
disappear at the taxonomy layer. It is taxonomy that brings both together, so that all user
tasks that begin with ontological and semantic information are fulfilled.

Every technical community does have a shared view of its business world. Most likely,
this representation of the business world is not (yet) in the form of a taxonomy. There
are remarkably few "good" special subject taxonomies out there. Which fact is an
opportunity for whoever thinks they can take it successfully.

The background to all of this is the notion that each technical community (or domain)
only really requires one "good" representation scheme to model the domain. One gives
interoperability, unambiguously. Two, or more, may, or may not, give true
interoperability.

A taxonomy, such as the one our publisher above built and integrated into product
development, tells us something about the business of the enterprises whose knowledge
workers use it. Maybe it would be licensable to categorization vendors that port
knowledge organization schemes formatted to their technology. In which business case it
would become even more ubiquitous.

Text is Different from Data?

Of course it is. It is the differences that create the semantic and ontological information
with many-to-many references. But it's instructive to look at what is happening in the
world of data. Data ontologies are all going to be standardized and interoperable 5. A lot
of business taxonomies are going to include many of these concepts that data elements
quantify, because documents are written about these concepts, not just data crunched.
So, if it's data, our data structures are interoperable, but if it's text, then our taxonomies
and subject recognition objects are not? That seems hard to argue for, from this point of
view.

Who Does What when Paradigms Change?

Paradigm changes are all about "what's going on?" They are also all about "who's going
to do what?"

Vendors

A number of categorization vendors are successfully working the publisher space. This is
an interesting model that is different from addressing vertical markets, i.e. domains.
Publishers can publish in any domain.

Convera has accrued a long list of publisher customers. Convera is actively marketing
"cartridges" to all its customers, each of which covers a different domain. A Convera
cartridge is a bundle of a taxonomy and a semantic network to recognize the concepts in
the taxonomy.

nStein is actively targeting the publisher space. In their nserver suite of products they
bundle the IPTC news taxonomy. A publisher can publish in any domain, of course, but
their business needs are all identical: to make their content objects more findable and
information products more usable and to take part in the revolution to do away with
information overload by in-building the potential for precision and recall into their
content objects. And to be able to develop new kinds of content products.

Postlude

News publishers currently have an opportunity to print all the news that's fit to find. Do
they want to be in the taxonomy business, or not? Opportunities don't last forever, of
course. Content categorization applications can do the finding, if some other party does
the knowledge organization building. The enterprise will be able to look after itself, by
purchasing the appropriate categorization application and then using (internal or
bought-in) lexical and ontological skillsets to integrate publisher content with intranet
environments.

Notes and References

1. In the continuing spirit of these PoP histories being histories with an ultra-short
recollection span you can concurrently begin with a timeline of early milestones in the
UK from the British Library.

... and Mitchell Stephens, Professor of Journalism and Mass Communication at New
York University has posted a fine introductory encyclopaedia article.

2. This isn't based upon a strict use case reading of the scenario. Is the domain an actor?
More likely not. But the domain gives rise to the creation, and design, of the taxonomy,
and is part of the model of the dynamics.

3. What is a good collective noun for a set of paradigm shifts?

4. "Technical Community" is used in a technical sense. All communities of users that


share a common business task or role are technical. See the semanthink Review Essay of
Juan Sager's classic text.

5. As examples of what is happening in the world of data interoperability see some of the
following -
XBRL - eXtensible Business Reporting Language
FpML - Financial products Markup Language
MDDL - Market Data Definition Language

Review Essays

Project Management and Information Over-and-Under-Load

One of the persistent tasks of semanthink is "information overload" and how to solve it.
Information overload is a symptom, and like many symptoms it doesn't show its
underlying causes clearly.

In fact, the term "information overload" has probably run the course of its short but very
useful life. Information overload is not "just" about "too much". True, there is too much,
but what causes us and users problems with information overload is really a combination
of two powerful and different processes: entropic 1 information "disorganization" and
active information "misorganization".

We think that "information overload" for knowledge workers is really a symptom of the
absence or inadequacy of information access design. From this point of view knowledge
workers are at the literal mercy of either entropy or (non-optimized) design. Or both. In
corporate intranet environments we solve information overload by working with the
underlying structural causes. One of these structural causes is lack of effective
metadata design. How does this in turn cause business problems? It is within the
metadata element set that we want to place information about the concepts that
documents discuss. Without those concepts being defined, named, recognized and
associated with individual documents, through some application of subject recognition
or categorization upstream, it is hard, downstream, to enable information access, at the
level of the concept, to document users.
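
To make the point concrete, here is a minimal, hypothetical sketch (not a prescribed element set) of what it means to place concept-level information in the metadata element set and use it downstream; every field name and value below is invented.

    # Hypothetical sketch: a metadata element set carrying concept-level subjects
    # assigned by an upstream categorization or subject recognition step.
    document_metadata = {
        "title": "Quarterly outlook for retail banking",
        "document_type": "analyst report",          # from a controlled vocabulary
        "audience": "relationship managers",        # from a controlled vocabulary
        "subjects": ["retail banking", "interest rate risk"],   # taxonomy node labels
    }

    def find_by_subject(collection, concept):
        # Downstream access at the level of the concept, not the keyword.
        return [doc for doc in collection if concept in doc["subjects"]]

    print(find_by_subject([document_metadata], "interest rate risk"))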

If information overload is bad for knowledge workers, it can be even worse for those
tasked with solving information access problems for groups of users in enterprise
intranets. There is too much in the world to read for information access project and
program managers, just as there is for all of us. "Too much to read" is a very different
problem from too much to sift through to find what you really need for the task at hand.
Additionally, and more problematic, project and program managers working with
taxonomy and content categorization projects suffer from information "underload" - too
little of what might be helpful to them in project design. Hence the semanthink site.

Project Design Based upon Models and Methodologies

What are the basic dynamics of the business problem of "information overload"? How
can we open this black box so we can find variable parameters that we can work with and
optimally configure? One of the persistent thematic solutions of semanthink is the
advocacy of models, methodologies and project design, all in synergistic
combination. It is this synergistic combination that is too little written about. Which is
only to be expected, because of our place in the larger business cycle regarding which
business management problems are adopted as problems to solve. And while it is only to
be expected, it is also time to remedy this state.

Why a Review Essay?

The books reviewed here are largely "old", i.e. they are not new releases into the
information flux. For instance, "A Practical Course in Terminology Processing" 2 by Juan
C. Sager was published in 1990. The question "why these reviews, then?" obviously needs
answering. semanthink review essays aim to take some key but largely under-
appreciated, or put aside and never returned to (because of overload), books that have
important things to say about models and methodologies that can be incorporated into
information access project design. By choice, these won't be the best sellers.

There are a number of goals for these review essays. These goals are interlinked. One set
of goals is to do with improving the overall economic contribution that taxonomy and
content categorization applications make to corporate economic performance; i.e. let's
implement the knowledge economy and create economic benefits. These goals are aimed
to appeal at a level of the importance of the richness of ideas.

More importantly, another set is concerned with improving the performance of our
information access deliverables; i.e. let's implement the knowledge economy and create
efficiency and effectiveness benefits. These goals are aimed to appeal at a level of
practical design and implementation.

Business Contexts and Project Design

Is a "review essay" different from a vanilla book review? It's more applicability-based. A
review essay allows the writer to move outside the boundaries of the concepts of the book
to apply what it has to teach to areas it was not originally conceived for, or that were not
possible at the time. The books reviewed here all share one aspect in common, that is
non-negotiable. They all contain at least one (or one part of a) model or methodology
that we can usefully re-engineer and use to enrich our design of taxonomy and content
categorization projects.

These review essays are written from the point of view of working within any enterprise
intranet or newsfeed environment, but much of the modeling is equally applicable to e-
commerce and e-learning environments, and information transaction environments in
general.

The Economic Justification of Information Access Design

This is what Juan C. Sager says about terminology improving business performance -

"The primary function [of terminology] is the collection of terminological information


which is undertaken in order to improve communications and its economic
justification lies in this objective".[page 208]

Improved communication and an economic payback make a compelling juxtaposition that is
worth working towards. Improved communication and an economic payback are what
actually implementing the knowledge economy is all about.

The Current Emergent (and Paradigmatic) Phenomenon of Lists

Lists of things are everywhere. From a sales point of view, they can be a highly effective
way to cross-sell. All credit to Amazon!

Lists are classification in the raw. Anyone who doubts that classifying aspects of the
world is highly personal regarding what is utilitarian only has to either read "Women,
Fire and Dangerous Things" 3, or visit Amazon. What makes a list? A theme. (What
makes a good list is a different question entirely.)

So, thinking about a question along the lines of "what are the most important under-
appreciated books that can enrich our models of information access design?" we start to
hit against the parameters of what makes a good list. Size of the list is one parameter.
Some numbers are just too small. "The ten most important jazz albums of all time"?
Hmm. That would not work for a lot of people. Ten is too small a number and the span of
jazz in years and genres is too vast. Which might indicate that time span covered is an
important parameter.

And so it is that size and time span are two ways to turn a fine idea into a poor
deliverable. There is one final parametric trick or tweak. The deliverable may well be a
list of books, but the impetus is actually a list of ideas. semanthink review essays aim to
take ideas that are vital to understand, re-engineer and integrate into information
access design projects. Under-appreciating useful books is one thing. Under-appreciating
useful ideas is quite another.

Once there is more than one review essay then we will certainly have a list. The final size
of the list is not known. And as far as time span is concerned they will likely be "old", as
we have mentioned (or at least non-current). And as far as genre is concerned they most
likely will not be mainstream information architecture, usability etc. So, we can safely
say that semanthink review essays are not ("rarely", may be safer) going to cover recent
hit books. Current and mainstream presumably have a certain momentum of attention,
and they won't really need more of the same from here.

Books or ideas? Now there is an interesting dynamic. Books are only wrappers, after all.
Wrapping ideas in a book "review" is a way to review ideas, and bring certain ideas from
the background to the foreground. Foreground/background is a key notion in the gestalt
psychology understanding of perception and the need of the moment. From background
to foreground is one of the key dynamics that we use in our work, whether it is project
design, training or evangelizing. That which is important but overlooked or under-
appreciated always needs to be brought to the foreground for attention and decision.

And lastly, synthesis and re-use. Knowledge management is all about synthesis and re-
use. Under-used ideas that help us with our varied lexical and ontological work need to
be brought to the foreground.

Happy and engaging reading!

Notes and References

1. There is little descriptive research on the natural entropy of non-organized
information asset sets. (But we do know what it looks like when we experience it!)
Research carried out by IBM, Alta Vista and Compaq, on the web as the subject (which is
a hyper-linked environment, but otherwise non-organized) shows the web as "bow tie"
shaped and essentially partitioned. The study can be accessed at the IBM Almaden
Research Center.

It is our assumption at semanthink that this model will apply to most corporate intranets
before a project program to optimize information access, and that the partitions would
tend to be descriptively comparable.

(Incidentally, this type of study tends to disprove the "six degrees of separation"
metaphor.)

2. A Practical Course in Terminology Processing by Juan C. Sager. 1990, John Benjamins B.V.

At time of writing you can buy through the John Benjamins site, which normally has stock
to hand and is weeks quicker in fulfillment than Amazon.

3. Women, Fire, and Dangerous Things: what categories reveal about the mind by
George Lakoff. 1987, University of Chicago Press.

Review Essay May 2003

A Practical Course in Terminology Processing by Juan C. Sager

Part 1: Why Terminology?

Essay Theme

This review essay is written primarily for project and program managers responsible for
the scoping and design of taxonomy and content categorization projects for enterprise
intranets. These tasks of scoping and design require models of different aspects of the
communication that takes place in enterprise intranets. In this essay we extract three
core models from Juan Sager's classic of terminology work. 1 We re-engineer each of
these models and use them to understand a different aspect of the communication that
needs to happen, between user and information environment, to ultimately connect
users to the documents they really require.

In this essay we focus on terminology. Terms and words from general vocabulary occur
in different functional tools, in different communication roles, in different layers of the
total information access environment. The persistent question, from a project design
point of view, is "how do they get there?" For instance, how are taxonomy nodes and
concepts in controlled vocabularies named? Through what process, and using what
methodologies? We do it by incorporating the methods, models and theories of "classic"
terminology work into the information access design process. More precisely, we base
our methodologies and project activity definitions upon a set of models, including the
three highlighted here.

A Focus on Communication

One of the key functional communication elements in the information access
environment, that connects users to documents, is the choice of term for each and every
concept in the domain. What makes this key? Users use words to find words; they use
words in information access schemes (such as taxonomies and classification schemes) to
find words in documents that indicate or denote particular concepts. Or, looking the
other way, the total information access environment uses words (in information access
schemes) to communicate indications of concepts in documents to users. But, users are
more than "just" users - they are also members of a community of technical peers. These
communities create, and re-create, their own terminologies continually. Terms in the
information access environment, in a taxonomy or classification for instance, are
required to be their terms.

This is how it all began ...


Once (and still), upon a time, the user sits in front of the screen. The user knows the task
at hand. There is a chance that one of the links currently showing on the screen will lead
to a document that resolves this task. The user is focused on two or three links in
particular. Actually, the user is focused on the words contained in these links. This small
moment is a moment of decision; to click through, or not.

The Business Problem

Whenever the user genuinely does not know what the document is "behind" a link, then
our user is effectively asking a question of the form: "What (kind of
document/information) will I find if I click on this link? Will it help me?" It's a question
that is asked hundreds of thousands of times each day in every corporate intranet in the
world.

We, however, being information access designers, are focusing on the words in the links
and asking different questions. "How did these words get there?" And, "are they working
well?" We ask these questions because much, but not all, of the user's cognitive decision-
making process is dependent upon the words in these links, and how they work.

There is a whole host of sub-questions to both those questions of ours. The first question
is to do with the process of identifying and selecting words for this important
communication role. That is, what methods (or heuristics), project design processes and
theories (or ideas) were used to design the workflow that delivered those words? The
second question is to do with the success (or otherwise) of these words as
communication tools for particular communities of users with particular tasks to resolve.
That is, are these words successfully mediating between user and topic, by both correctly
identifying the concept and labeling it in a way most useful to the user? In this little
scenario we will never know. But, this review essay is an introduction to how words that
carry key communication functionalities should get into any information access layer.
And also about how words selected as terminology can be made to "work".

"Words" and "Terms"

One of these sets is obviously a subset of the other, one way or the other. What is the
difference between "a word" and "a term"?

Every community of interest is technical; baseball fans, Americana music lovers,
insurance salesforce personnel, research biologists, workplace anthropologists,
investment bankers specializing in M&A advisory services are all technical communities
and all technical cultures. Every technical community speaks in, and uses, a "special
language". 2 A special language is a set of terms, each of which carries a precise and
consensual definition, and which "substitutes" exactly for the concept in communication
between technical peers. Terms are one of the differentiators between special languages
and general language.

Terms differ from general vocabulary, even though both are "words". Terms are
"technical". They declare concepts in technical communities. And they declare those
concepts monosemously. In most technical communities terms carry only one meaning;
they are not polysemous in the way that much natural language is. The meaning of any
term has to be "defined" and the use of the term "standardized" if it is to be an item of
terminology. This standardization is not necessarily formal, although in some scientific
areas it is. It is defined and standardized within the technical community and culture
that uses it to declare a concept. This cultural consensual definition and standardization
of technical communication does not apply to general vocabulary.

Both words and terms are used in the layers of the total information environment that sit
between users and documents. In some layers they are completely separate, or excluded,
from each other. But this is rare, and situational. More often, there is not a complete
separation between terms and general language. For instance, some thesauri are "just"
terminology, but even here they have to reference the non-preferred (non-
terminological) lexical elements. Usually, the functional element of the taxonomy that is
the node label is terminology, but not always. Document titles and descriptions mix and
match terminology and natural vocabulary. As do documents themselves.

And precisely here is the genesis of what separates "good" automatic subject recognition
from "poor" subject recognition. We need to know both the special language and general
language lexical variance to be able to combine them for accurate subject recognition.
Similarly, here is the genesis of "good" communication or "poor" communication. If we
mix and mismatch terms and general vocabulary in our presentations to users we can
create communication breakdowns at the semantic and ontological levels.

Why an Understanding of Terminology is Important in Project Design and Development

Because of these characteristics of special languages, deciding that a lexical element is an 
item of terminology carries consequences in the way that it is managed, maintained and 
applied. An understanding of terminology enables us to effectively deploy terminology in 
applications. But, before that, before we can apply terminology, we have to collect it. 
Knowing what terminology does, or what it is "supposed" to do, clarifies what we need to do 
to collect it. We are going to get the understanding of what terminology is "supposed" to do 
by looking at three models that together model how terminology makes successful 
communication. These models are at the heart of Sager's book and will shape how we define 
our project activities. These are a -

model of the relationship between concept, definition and its term
model of special languages
model of communication

To work at the project design level we do need to begin with models and methodologies.
Models are based on theories and methodologies are based on the inputs we have to
work with and the deliverables we want to produce. We derive our inputs from what our
models tell us should be happening.

Without these models (or any models) project design in our kind of work sometimes
carries a similarity to alchemy - in this case the alchemy of vocabulary into terminology -
rather than applied communication science. This alchemy is a black box we want to
open, so as to discover processes and hence methods. Building models is one way to open
any black box.

Where is Terminology?

Terminology is everywhere in the intranet environment. Content managers and content
publishers use terminology in many of the layers of the total information access
environment. Here are some of the layers that connect users to documents (and documents
to users) where we can apply terminology -

Classification Schemes
Document Titles and Descriptions
Navigation Systems
Subject Recognition Executables
Taxonomies
Thesauri and Controlled Vocabularies

Some of those deliverables that consume high-quality terminology are elements of the
overall metadata scheme, such as taxonomy or ontology. Other elements of the metadata
scheme that are controlled vocabularies will require terminology. For instance, if we
carry out an audience characterization study we will need terminology to name the
audience types. Or, if we want to type documents as kinds of wrappers of content (such
as FAQ, glossary entry, analyst report, forms, committee minutes etc) we will need
controlled terminology for this. Or, common workflow tasks that can be unambiguously
associated with particular documents or document types will also need terminology
design.

Thesauri and controlled vocabularies, used for search engine results optimization and
other processes, need to be based on terminology principles. Here the idea is to collect,
as exhaustively as is still cost effective, the lexical variance of the ways to denote the
concept alongside the "technical" term and normalize against the technical term.
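
A minimal, hypothetical sketch of that idea: the lexical variants that denote a concept are collected alongside the "technical" term, and occurrences are normalized against it (the entries below are invented, not taken from any real vocabulary).

    # Hypothetical sketch: normalizing lexical variance against the technical term.
    controlled_vocabulary = {
        "myocardial infarction": ["heart attack", "MI", "cardiac infarction"],
    }

    # Invert to a lookup from any variant (or the term itself) to the preferred term.
    normalize = {}
    for term, variants in controlled_vocabulary.items():
        normalize[term] = term
        for variant in variants:
            normalize[variant] = term

    print(normalize["heart attack"])   # myocardial infarction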

Our work in terminology, in part, strongly influences our promotion of best practice
technical communication in the authoring of these two fundamental user-facing
information architecture elements, document descriptions and titles. Titles and
descriptions of documents that use appropriately chosen terminology items will
communicate a certain clarity or precision, in support of user click-on decisions,
compared to those that use general vocabulary alone.

Other uses of terminology occur in functional elements of the total information access
environment that are outside the metadata scheme. For example, terminology is also a
key aspect we take into account when naming navigation and classification labels so that
they conform to the consensus of the community of users. Contextual links can also
benefit from having terminology designed into them. Lexically-based and
knowledgebase-based approaches to subject recognition in documents, whether these
are rules-based or aim for a semantic network model, benefit from the assimilation of
terminology principles.

Outside information access design work terminologies support interoperability through
translation from one language to another. In the global economy customers of global
companies (and the companies themselves) are caught in a dynamic of
globalization/localization. Terminologies are used to bridge the kinds of gaps that occur
because of this dynamic.

What is Terminology?

Let's now begin at the beginning and understand what terminology is, from Sager
himself. There are a number of senses of the lexical element "terminology", and most of
us will be familiar with this one -

"A terminology, i.e. a coherent, structured collection of terms, is a representation of an


equally coherent, but possibly differently structured system of concepts."[page 114]

We would all recognize this definition - the noun meaning a collection of words that
together form a particular way of describing a part of the world at large for a particular
community that works within that world. However, there are two other senses not so
commonly recognized or expected. This is how Sager defines these meanings.

"Terminology is the study of and the field of activity concerned with the collection,
description, processing and presentation of terms, i.e. lexical items belonging to
specialized areas of usage of one or more languages."[page 2]

When we separate out the three meanings of the word "terminology" this is what we 
get 3 ‐

It is an activity: the set of methods and methodologies used to collect, describe and 
present terms. The activity is what terminologists carry out. 
It is a theory to explain the relationships between concepts and terms.
It is a noun meaning a vocabulary of a special subject area.

These three meanings are all used in project work to create information access
environments. As an activity we use it to help us set the tasks of project activity
definitions. As a theory we use it to robustly design our projects at the project level. As
the noun, it faces two ways. To project managers, a "terminology" is a deliverable. To a
particular user community it is their own communication tool - the set (or sets) of terms
that they use to name and relate their concepts.

Life Lived One Click at a Time

Links and their labels sit between users and documents. The user is in front and the
document is behind. In "front" of what, and "behind" what? Terminology and general
vocabulary. A cognitive decision by the user and a click are all it takes to move from the
link to the document. Because it's a cognitive decision, we have to ask ourselves "what
supports the user's decision to click through or not?"

Let's go back to the beginning ...

... The user sits in front of the screen. The user knows the task at hand. There is a chance
that one of the links currently showing on the screen will lead to a document that
resolves this task. The user is focused on two or three links in particular. Actually, the
user is focused on the words contained in these links ...

This is a small moment in time (but repeated vast numbers of times in the corporate
world each day). The decision of this moment is binary, no or yes. The user has
expectations of the link. "Will it … or won't it?" We see the task of all text labels as
making this user decision effortlessly binary. "Yes, it will!" Or, "No, it won't!"

As project managers or designers, we can now add a set of additional questions,
requiring answers, to our earlier ones. "How do I define and collect terminology?" "What
instances of general vocabulary do I need to collect?" "And why?" "Where can I mix and
match terms and general vocabulary?" "And how will this impact the functionality of the
tool where I do this?" "Where shouldn't I mix and match terms and general vocabulary?"

Life lived one click at a time is a highly existential, binary life fraught with stress because
of lack of information about information. Robust concept-focused information access
design schemes give decision support to the user at this very granular level. Every time.
Users do have expectations. Users do have information tasks to resolve. The quality of
terminology and vocabulary in the layer that presents information choices to users is a
key driver of the effectiveness, or not, of the total information access environment.

Notes and References

1. A Practical Course in Terminology Processing by Juan C. Sager. 1990, John Benjamins B.V.

At time of writing you can buy through the John Benjamins site, which normally has stock
to hand and is weeks quicker in fulfillment than Amazon.

2. See also English Special Languages by Juan C. Sager, David Dungworth and Peter F
McDonald. Oscar Brandstetter Verlag.

3. When we talk about special languages and terminology we are ourselves using a
special language and a set of terms. There are some other distinctions that are worth
noting.

One particular area of confusion highlighted by the POINTER Project is that of the
differences between these terms - terminology and lexicology, and terminography and
lexicography.

"While lexicology is the study of words in general, terminology is the study of special-
language words or terms associated with particular areas of specialist knowledge.
Neither lexicology nor terminology is directly concerned with any application.
Lexicography, however, is the process of making dictionaries, most commonly of
general-language words, but occasionally of special-language words (i.e. terms). Most
general-purpose dictionaries also contain a number of specialist terms, often embedded
within entries together with general-language words. Terminography (or often
misleadingly "terminology"), on the other hand, is concerned exclusively with compiling
collections of the vocabulary of special languages. The outputs of this work may be
known by a number of different names - often used inconsistently - including
"terminology", "specialized vocabulary", "glossary", and so on."[section 1.2, page 31]

POINTER Final Report / POINTER (Proposals for an Operational Infrastructure for
Terminology in Europe)

It is also available online at Citeseer, in a variety of formats including pdf.

Review Essay May 2003


A Practical Course in Terminology Processing by Juan C. Sager

Part 2: Models Let us Devise Methodologies

An Introduction to Models: Redefining the Business Problem

Models let us devise methodologies, methodologies enable us to properly scope project
activities, and through projects we implement solutions.

Models of processes are highly useful. They define what elements, actors and
parameters are essential to building the process, whatever the process is. They include
all of these, and show how they inter-operate. This inter-operation requires
communication. Communication requires a medium. The medium, in our case of
organizing enterprise intranet content, is lexical. Understanding what the medium
should contain lets us design activities to engineer precisely what needs to go exactly
where. We take these kinds of models, and their parameters, and build methodologies
and design project activities from them.

What Models will Help Us?

We are going to be looking at how to leverage "naturally occurring" terminologies to
connect users to documents in information environments, particularly enterprise
intranet environments. Since lexical elements are the intermediary between users and
documents, we need a set of models to help us understand how words and terms operate
as carriers of both semantic and ontological information.

We are going to walk through three models extracted from "A Practical Course in
Terminology Processing". We are going to re-engineer these to our purposes of project
design and project activity definition. These are a -

model of the relationship between concept, definition and its term
model of special languages
model of communication

Why do we want to build these particular kinds of models? Here are some of the kinds of
questions that need answers if we are to build effectively communicating information
environments. And the projects to deliver them. Users and the total information
environment interact. To what purposes? How? What do users "carry" within (a bit like
programming) that governs their use of semantics and their classifications of the objects
and events of their world? How do they use these in a responsive and interactive
information environment? What do users share in common with their peers? How is this
mediated? How does the information environment get its semantic and ontological
programming? How do we ensure that these are mapped to the users views? What
causes breakdowns in communication of precision and recall between users and the
environment? How does this happen? And so on.

These three models are only part of the overall set of models we use in designing
information access systems. We never, or rarely, use these three models in isolation. For
instance, there is a semanthink model called the OntoLexical Layers Model. It models
the layers in the information access environment that carry conceptual information
through using lexical elements as signs. It shows us "where" words and terms "are",
performing different communication functions within different functional tools. We will
need to bear this model in mind as part of the bigger overall context. This is described
at length in the Models part of the semanthink site. Other models that we habitually use
we will not even touch on here.

Model 1: A Model of Concepts, their Definitions and their Terms

The Business Requirement: Why Do We Want a Model like This?

This model is so simple that it is sometimes overlooked precisely because it is so simple.
But its purpose is crucial.

The label for any concept is always cultural, because any and every community of users is
a culture. This means that labels are both consensual and shared.

Any concept, its definition, and its possible labels must be de-coupled so that the term
can be chosen to be most user-centric in any layer of the content access
platform. Without having access to this kind of model we might not realize the
importance of this de-coupling. How any concept is defined is different from what it is
labeled, and what delivers these are different project activities.
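
A hypothetical sketch of that de-coupling (the identifiers, definition and labels are invented): the concept and its definition are held once, while the labels that can represent it are held separately, so the choice of label can be made per layer.

    # Hypothetical sketch: concept, definition and candidate labels kept de-coupled.
    concept = {
        "id": "C042",
        "definition": "A scheme of categories, each with a defined scope, used to organize content.",
        "labels": {
            "preferred_term": "taxonomy",             # for the technical community
            "navigation_label": "Browse by topic",    # for a general-audience layer
            "variants": ["classification scheme"],
        },
    }

    def label_for(concept, layer):
        # Choose the most user-centric label for a given presentation layer,
        # falling back to the preferred term.
        return concept["labels"].get(layer, concept["labels"]["preferred_term"])

    print(label_for(concept, "navigation_label"))   # Browse by topic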

The Model

"The primary objects of terminology, the terms, are perceived as symbols which
represent concepts. Concepts must therefore be created and come to exist before terms
can be formed to represent them. In fact, the naming of a concept may be considered the
first step in the consolidation of a concept as a socially useful or usable entity."[page 22]

Sager is telling us that although we all know "terminology" to be about "words", in one
way or another, terminology is actually concerned with compiling and labeling
concepts. Concepts need names of course, if we are to communicate about them. But
the name of the concept is not the concept itself, it only represents the concept to the
user. Words are simply what we use to label concepts. This means that the terminology
work we do is concept-oriented work.

"Through the activity of definition we fix the precise reference of a term to a concept,
albeit by linguistic means only; at the same time it creates and thereby declares
relationships to other concepts inside a knowledge structure … We expand the
knowledge structure of a subject field by the addition of new concepts for which we have
to create linguistic forms before they can be used in special subject discourse."[page 21]

More precisely, terminology work is concerned with concepts, their names and their
definitions. Terms are precise references to defined concepts, i.e. they "substitute" for
defined concepts. We isolate concepts, collect and compile them, define their boundaries
and represent them lexically, through the medium of terms. The term connects users to
concepts, whether they are communicating or seeking, pushing information or pulling
information.

Additionally, Sager points out to us that no concept exists in isolation from others. The
cognitive world is made up of domains - concepts in relationship sets. Concepts declare
relationships to other concepts.

"A theory of terminology is usually considered as having three basic tasks: it has to
account for sets of concepts as discrete entities of the knowledge structure; it has to
account for sets of interrelated linguistic entities which are somehow associated with
concepts grouped and structured according to cognitive principles; it has, lastly, to
establish a link between concepts and terms, which is traditionally done by
definitions."[page 21]

Taking the Model into Project Design

There are other ways to model this set of relationships. The semiotic triangle would
model this as word, concept and reference. But this simple theory of reference works for
us, especially in the highly lexical environment of an intranet.

Naming is one issue for information access projects. We need to know that much of what
users look for is information discussing concepts. These concepts can be named in two
ways. We can either use a technical term in a terminology, or we can use natural
language labels. What counts is the way that users are naming particular concepts. We
need to come to know this.

Defining is another key issue for both knowledge organization and subject recognition.
The importance of definitions is often overlooked. After the process of making the
definition explicit we have a deliverable, a definition. This deliverable is one of the four
functional elements of a taxonomy. These kinds of definitions are linguistic descriptions
of concepts. These kinds of definitions interface between users and content. Their
business purpose, within our applications, is to support users in making decisions about
which kinds or which items of content to retrieve, or where to go within a knowledge
scheme to find that particular concept-predicated content.

Apart from taxonomies, we will use definitions of concepts in subject recognition
applications. For instance, if our categorization platform is based upon a semantic
networks model, these definitions will specify the synonym sets we will intersect for the
subject recognition to be accurate. Without definitions we cannot build accurate subject
recognition models that categorize against individual taxonomy nodes, for without
definitions what are we trying to recognize? Our goal in content categorization is to make
sure that the concepts we categorize documents against are defined the same way that
users are expecting taxonomy nodes to be defined.
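
As a deliberately simplified, hypothetical sketch (not the method of any particular categorization product), a definition can be turned into synonym sets whose intersection with the document text drives recognition of the taxonomy node; the node, sets and sample text below are invented.

    # Hypothetical sketch: recognize a subject only when the synonym sets derived
    # from the node's definition co-occur in the document text.
    node = "mergers and acquisitions advisory"
    synonym_sets = [
        {"merger", "acquisition", "takeover"},    # the transaction concept
        {"advisory", "advice", "adviser"},        # the service concept
    ]

    def recognizes(text, synonym_sets):
        words = set(text.lower().split())
        return all(words & synset for synset in synonym_sets)

    doc = "The bank expanded its takeover advisory team in Europe"
    print(node, "->", recognizes(doc, synonym_sets))   # True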

For these two reasons of knowledge organization and subject recognition we have to
make explicit these definitions for the application and the environment. It is also
important to remember that these definitions belong to the users within a particular
technical community, even if they have only been made explicit by our project work.

Model 2: A Model of Special Languages

Special languages are really part of Sager's larger model of communication. We've
extracted the concept from there and wrapped it in its own model. This makes the overall
set of models easier to both teach and apply (and think about).

It's hard to overestimate the importance of the concept of a special language. How do
peers in any technical environment communicate? They communicate through the
medium of a special language. The special language of any community of interest is both
the lexical wrapper for the special concepts of that particular domain and the set of
related definitions of these concepts along with their terms. Terms and their definitions
are interchangeable in a special language. In the act or event of communication there is a
dynamic interplay of actors and parameters. It is within this dynamic that special
languages play their role. A special language is a tool, not an event.

The Model

We would say that all business communities of interest are cultural, in the widest sense
of the word, and so technical. And we would constrain "cultural" to characterize the
combination of shared subject of interest, way of conceptualizing it and way of
communicating about it. Sager comes to the same conclusion, but says this in his own
way. In short, the language used in any business, scientific or technical community is a
special subject language.

This is how Sager defines a special language -

"Special languages have been defined as semi-autonomous, complex, semiotic systems


based on and derived from general language; their effective use is restricted to people
who have received a special education and who use these languages for communication
with their professional peers and associates in the same or related fields of
knowledge."[page 105]

All of these parameters characterize corporate intranet environments.

"The lexicon of a special subject language reflects the organizational characteristics of


the discipline by tending to provide as many lexical units as there are concepts
conventionally established in the subspace and by restricting the reference of each such
lexical unit to a well-defined region. Besides containing a large number of items which
are endowed with the property of special reference the lexicon of a special language also
contains items of general reference which do not usually seem to be specific to any
discipline or disciplines and whose referential properties are uniformly vague or
generalized. The items which are characterized by special reference within a discipline
are the 'terms' of that discipline, and collectively they form its 'terminology'; those which
function in general reference over a variety of sublanguages are simply called 'words',
and their totality the 'vocabulary'."[page 19]

Special languages differ from general languages in that they use terms in addition to
words. Terms as linguistic expressions, and hence special languages, carry two core
properties of communication in general - the property of definition and the property of
shared special reference. These two communication properties, of course, carry over into
information access environments.

"In special communication terms are considered substitute labels for definitions because
only a full and precise definition is the proper linguistic representation of a
concept."[page 109]

This property of "special reference" effectively means that terms are substitute
definitions. Terms are obviously more efficient, in terms of lexical overhead, than the full
definition required to capture any particular concept.

"Only if both interlocutors in a speech act know the special reference of a term, and, by
implication, that they are using terms rather than words, can special communication
succeed."[page 105]

"Special reference" also refers to the structure of concepts in any special subject, as
opposed to general knowledge. Special subjects have a need for delineating the
relationships between concepts more strictly, so with much less fuzziness or flexibility,
than in general knowledge. But this delineation is one of difference in degree only. This
is how Sager puts it.

"Within a subject field, some or all of the included dimensions may assume increased
importance, with a greater need for distinction between a larger number of concepts
along a given axis: at the same time, the necessity to avoid overlap between concepts, i.e.
intersecting regions, will tend to reduce the degree of flexibility admissible in the
delimitation of the bounds of any given concept. There is thus a difference of degree
between the intradisciplinary structure of concepts in the bounded subspace of a special
subject or discipline and the less well-defined, less "disciplined" structure of "general
knowledge". This does not mean that general knowledge cannot contain well-defined
facts; but only that disciplines have a greater need for more rigorous constraints on the
overall delineation of items of knowledge. The referential function at the extremes of this
distinction is classified as "special reference" and "general reference" respectively."[page
19]

Special reference ensures complete communication.

"In special communication terms and standardized terms make a critical contribution to
achieving complete and effective communication."[page 105]

Taking the Model into Project Design

We need to understand that users in technical communities know two things about their
special language. Firstly, they "know" when they are using their special language. This
knowledge is intuitive, since the concept of "special language" is a concept belonging to
terminologists and communication experts - i.e. users almost certainly don't know that
others have labeled what they are doing "using a special language". (The term "special
language" is part of our special language as information access designers.) Nor do we
need enterprise knowledge workers to explicitly know (or even suspect) that they are
constantly using special languages when they communicate with peers or customers. Or
when they use an intranet. But those of us who design for knowledge workers do need to
know this.

Secondly, users in technical communities also know how much ambiguity or polysemy exists in their special language (very little). But those of us who design for knowledge workers need to know this also. Why should knowledge workers produce little ambiguity and little imprecision when they communicate directly, but experience (much) more of both when there is an intermediate system?

Another way to approach this point is to think of it as a process of default values.


Technical peers "default" to using particular lexical elements (terms) whenever they are
in technical communication. This default takes precedence over natural language usage.
This default is communication using special languages.

The Business Requirement: Why Do We Want a Model like This?

Models have to be synergistic with their peer models.

We needed to know what the actual tool is that peers in a community of interest use to
communicate. Without knowing what the tool was, our project design options were both
limited and unfocused. Now that we do know, we see that it is a language tool called a
special language. This tool of a special language combines two different types of lexical
elements, "terms" and "general vocabulary". The terms of the special language carry two
particular functional properties. Terms are substitutes, or a shorthand, for conceptual
definitions. Terms carry information about the conceptual or ontological structure of the
particular domain, including "special reference", which is information about the
precision of delimitation between "adjacent" concepts.

Now that we know the core ways in which this tool functions, it is not a large leap of
intuition, or pragmatic project management sense, to see where we have come from and
where we are headed, in working with lexical information.

The OntoLexical Layers model tells us where instances of any special language lexical
elements can, or should, exist.

The model of term/concept/definition tells us that unless we have these three inter-related functional elements in perfect alignment for every concept that the user community requires, we are going to build unwanted outcomes into the environment. From an information retrieval point of view these will be problems with precision and recall. From a usability point of view these will be all the kinds of user behavior that usually indicate frustration, lostness and so on. From a business point of view there are going to be efficiency consequences. These will, as always, be hard to measure.
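
A rough sketch of what checking that alignment might look like in practice (the function, the data and the idea of an "alignment audit" are mine, for illustration only): every concept the user community requires should carry exactly one term and one definition, and anything less shows up as a problem to fix before it becomes a precision, recall or usability problem.

    def audit_alignment(required_concepts, entries):
        """Report concepts whose term/concept/definition alignment is broken.

        required_concepts: concept ids the user community requires
        entries: (term, concept_id, definition) tuples from the terminology
        """
        by_concept = {}
        for term, concept_id, definition in entries:
            by_concept.setdefault(concept_id, []).append((term, definition))

        problems = []
        for concept_id in required_concepts:
            found = by_concept.get(concept_id, [])
            if not found:
                problems.append((concept_id, "no term and no definition"))
            elif any(not definition for _, definition in found):
                problems.append((concept_id, "term present, definition missing"))
            elif len({term for term, _ in found}) > 1:
                problems.append((concept_id, "competing terms for one concept"))
        return problems

    # Hypothetical terminology data, purely for the example.
    report = audit_alignment(
        required_concepts=["C-0001", "C-0002", "C-0003"],
        entries=[
            ("term sheet", "C-0001", "A summary of the principal terms of a deal."),
            ("deal summary", "C-0001", "A summary of the principal terms of a deal."),
            ("tranche", "C-0002", ""),
        ],
    )
    for concept_id, issue in report:
        print(concept_id, "->", issue)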

The model of special languages tells us the communication tool that our users use. This in turn tells us that the information access environment that we are designing should also use our users' special languages, as effectively as they do. This means that we will use terms where terms are required and general vocabulary when it is called for. And that we will know how to decide which situation is which. Every type of presentation that the user interacts with will be an opportunity to use a special language, or not - navigation schemes, browsable classifications and taxonomies, search results rendering and so on.

Model 3: A Model of Communication

The Business Requirement: Why Do We Want a Model like This?

Let's now look at Sager's model of the communication dynamic. It is idealistic, or "perfect", but that does not necessarily count against this model, or models in general. Its idealistic point of view is communication between scientific/technical peers. But that is fine, because every intranet environment uses its own enterprise-specific terminology. From our point of view, the intranets of insurance companies and investment banks are "as technical" as a scientific/technical environment. The model sets up the transaction of information in a context of purposes and motivations, using terminology as the medium. It is "formal", but science as a method is "formal". But so is business, in the sense that there are constraints imposed in pursuit of profitability.

We need a model that tells us useful information about the "act" or event of communication. What kinds of information do we need to know about the event? We are lacking a purpose for why people want to "know" or "find", and why people want to communicate or tell. Basically, what is the purpose of communication? We need a way to understand that users have states of knowledge. These knowledge states vary over time, and vary according to the information "taken in" and processed. If knowledge states vary between peers, then they certainly vary between any individual user and the total information access environment as one gestalt. If users are in states of knowing that they want to change, one way or the other (usually not to become more ignorant, but more informed), we need to know what parameters support that transformation and which inhibit or destroy it. So, we could usefully use a total model of communication that -

explicitly puts a change of a personal knowledge state at the center
defines the common ways in which personal knowledge states can be impacted
exposes the requirements of the information recipient (or seeker) that must be met for any information transaction to be effective
explicitly points out that assumptions and pre-suppositions about information recipients exist, and so these should be dealt with

All of these elements are included in Sager's model of communication.

The Model: Communicator, Recipient and Knowledge States

This is how Sager describes his model.

"In a model of specialist communication we assume the existence of at least two


specialists in the same discipline, who are jointly involved in a particular situation where
the sender (speaker or writer) is motivated to transmit a linguistic message which
concerns the topic of his choice and which he expects a recipient (reader or listener) to
receive. We assume that the sender's motivation arises from a need or desire to affect in
some way the current state of the knowledge of the recipient."[page 99]

So, we have actors, a sender and a recipient. We have motivations: the sender's motivation to inform, and the recipient's current, non-optimal knowledge state about an aspect of the domain. Sager calls the sender's motivation the "intention … to transmit information which will have an effect on the current knowledge configuration of the intended recipient".[page 99]

Now we have the notion of a "knowledge configuration" or state of knowledge. This gives
us three communication parameters so far: actors, intentions, states.

The recipient's knowledge states can be changed in different ways. Here are some of the "effects" on the recipient's current knowledge configuration (there are others, but these serve well to give the flavor of it all). The sender's communication can -

augment the recipient's knowledge
confirm the recipient's knowledge
modify the recipient's knowledge

We now have four parameters in play: actors, intentions, states, and ways the states can
be changed. With the idealized parties in place, and the parameters in play, the
communication event then happens. The "parts" of the model are now made to "work" by
the participants.
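
Those four parameters can be held in a very small data model. The sketch below is only illustrative (the class names, fields and the Effect enumeration are mine, not Sager's): a message carries an intended effect, and the recipient's knowledge state is changed, or merely confirmed, accordingly.

    from dataclasses import dataclass, field
    from enum import Enum

    class Effect(Enum):
        AUGMENT = "augment"   # add knowledge the recipient did not have
        CONFIRM = "confirm"   # confirm knowledge the recipient already holds
        MODIFY = "modify"     # correct or revise knowledge the recipient holds

    @dataclass
    class Message:
        concept: str
        intended_effect: Effect

    @dataclass
    class Recipient:
        known_concepts: set = field(default_factory=set)

        def receive(self, message: Message) -> None:
            """Apply the sender's intended effect to the knowledge state."""
            if message.intended_effect in (Effect.AUGMENT, Effect.MODIFY):
                self.known_concepts.add(message.concept)
            # CONFIRM leaves the knowledge state unchanged.

    reader = Recipient()
    reader.receive(Message(concept="special reference", intended_effect=Effect.AUGMENT))
    print(reader.known_concepts)   # {'special reference'}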

"Basing himself on presuppositions about the recipient's knowledge and assumptions


about his expectations, the sender decides his own intentions, and then selects items
from his own store of knowledge, chooses a suitable language, encodes the items into a
text and transmits the entire message toward the intended recipient … Perfect
communication can be said to have occurred when the recipient's state of knowledge
after the reception of the text corresponds exactly to the sender's intention in originating
the message."[page 100]

The communication event has happened. But our model is not complete to our satisfaction. We still want to understand what makes the difference between "good" communication and "bad" communication. Sager tells us that "The achievement of successful communication is fundamentally dependent on the three choices the sender has to make in formulating his message". [page 102]

These choices are -

intention
selection of knowledge
choice of language

Intention can, at times, be clear from the type of document. Sager gives examples of
some document types. A check-list is usually used to confirm knowledge. Instructions
are intended to augment. User manuals can augment, confirm or modify.

More commonly, though, intention is expressed through the use of terms. Here senders have to be careful to align their choice of terms with the impact they intend to have on the recipient's knowledge state. For instance, an undefined term will not easily serve to augment someone's knowledge state. We can sum all this up in Sager's words.

"The sender must choose an intention that is commensurate with the recipient's
expectation. For the communication to succeed, the recipient must capture the sender's
intention either from the message or from the situation, and his interpretation of the
intention must be accurate."[page 102]

There are a number of important parameters around the sender's selection of knowledge.
For instance, the sender must either have some kind of prior knowledge about a
recipient’s current state of knowledge or make correct presuppositions. If the intent is to
augment or modify the recipient's knowledge state, then, obviously, the sender needs to
know more about the topic than the recipient does. Otherwise, no transfer is going to
happen.

Whether or not accepted, standardized terms are used determines whether the communication is economical, precise and appropriate. In the area of choice of language, the sender must choose an appropriate special subject language, or general language.

The Business Requirement: Why Do We Want a Model like This?

In technical communication there are many variations on the theme of modeling the dynamics between those who want to send messages and the intended recipients, and what kinds of outcomes, or end states, arise. We like the communication model used in the terminology processing work of Sager. It is simple, formal, and speaks to the heart of technical communication, namely the purpose of changing someone's state of knowledge of defined concepts. Information access systems exist to rebalance unbalanced user knowledge states. Knowledge workers operate in a constant flux of knowledge states. This flux repeats itself continually - users seek, find, use. We explicitly design these kinds of systems, at the semantic and ontological levels, to ensure this rebalancing occurs effectively. We use lexical elements as the intermediary.

Sager observes how terms, distinct from general vocabulary, operate in a model of communication. We, of course, don't want to forbid general vocabulary. We want to use
terminology when possible, general vocabulary when we have to. Our focus is on
designing around the pitfalls that general vocabulary usage creates in an environment
such as the enterprise intranet, which from our point of view of conceptual
communication is quintessentially a technical environment.

We make the model work for us at the user's granular level of making lexical choices at each of the intermediate decision-making points that are faced. Not only do we know that we use a tool called a special language (because our users do), we now have enough information to apply the tools of special languages and terminologies. As much as possible we want to support our users in achieving the change of knowledge state that they require.

At the level of the link to the individual document we want, as much as possible, our users to be able to tell whether any document is going to augment, confirm or modify their current state of knowledge. Do you remember the little story of our user at the beginning, scanning links to documents? This augmentation, confirmation or modification is not just needed at the level of links to individual documents. When taxonomies, and more commonly, classification schemes are browsable, they guide users to sets of documents discussing the same subject. In this way users choose where in the domain space they want to augment, confirm or modify. Keeping up to date is just another, shorthand, way of augmenting, confirming or modifying one's knowledge state.

The model lets us think about enterprise intranets, or any information access
environment, in a different way. Because the user seems to always be doing all the
"work", of searching, navigating, browsing and so on, we can forget that we need to think
about the environment doing much of the work. We can also reframe the environment as
being "active". It is a sender or communicator. To be sure, it's only a sender when the
user wants it to be (but maybe that's the best kind of sender!). When the environment is
a sender it had better follow all the good practices that all communicators practice. This
is the "how" of communicating. These good practices include differentiating between
general language and special languages, understanding that terms are the objects and
that a special language is the "wrapper", that intent counts, and that the assumptions
that the sender makes are in a mirrored relationship to the expectations that recipients
have and that both sets of assumptions and expectations have to be worked with. And all
this is lexical and ontological work.

Lexical and ontological work is granular work. Graphic designers work with one pixel to the left or right, or one point or em here or there. We work with a word or the turn of a phrase. Or with how one concept relates to another in an ontological space: is it parent or child or sibling, are we sure the relationship type is "kind of", and is its label reified correctly? And then, can we collect the lexical variance to build a subject recognition object that recognizes with high precision and high recall? This granular work is reprised thousands of times in designing for information access.
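
As a hedged sketch of that last step (the class and rule below are invented; real categorization platforms each have their own rule syntax and matching machinery): a subject recognition object collects the lexical variants of one concept and tags a document when enough of that evidence appears.

    import re

    class SubjectRecognizer:
        """Tag a document with a concept when enough lexical evidence appears."""

        def __init__(self, concept_id, variants, min_hits=2):
            self.concept_id = concept_id
            # One pattern per lexical variant, matched on word boundaries.
            self.patterns = [re.compile(r"\b" + re.escape(v) + r"\b", re.IGNORECASE)
                             for v in variants]
            self.min_hits = min_hits   # evidence threshold, tuned for precision

        def recognizes(self, text):
            hits = sum(len(p.findall(text)) for p in self.patterns)
            return hits >= self.min_hits

    # Hypothetical rule and document, purely for the example.
    cdo_rule = SubjectRecognizer(
        concept_id="C-1101",
        variants=["collateralized debt obligation", "CDO", "CDOs"],
    )
    doc = ("The fund's exposure to CDOs grew as each collateralized debt "
           "obligation matured.")
    print(cdo_rule.recognizes(doc))   # True: two pieces of lexical evidence

The evidence threshold is the kind of granular decision the paragraph above describes: raise it and precision tends to rise while recall falls, lower it and the reverse happens.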

While we use the OntoLexical Layers model for both strategic and project management purposes, we only really want the other three models, together, to work for us at the project design and activity definition levels. We use them to keep us focused "on message" and to design projects that deliver.

Which is all good. But the real purpose of these models, as with all models, is to make us think before we practice.

Re-Purposing the Model: the User as Recipient

In Sager's model the communication is direct. In some ways his model is a sender-centric model. It focuses on the intent, choices and actions of the sender. In designing and re-designing intranets we derive much insight from pivoting the model to be recipient-centric, because intranet users carry many of the qualities of Sager's recipients. We
can focus on the users as information seekers - active recipients rather than active
senders. But the model still helps point us towards what we should do as we work to
provide information access to documents for users. Intranet users require all that Sager's
recipients require for successful information transactions to take place.

The situational factor of the "inequality of knowledge" dynamic interests us. The sender is expected to have a greater knowledge of the subject than the recipient. The document collection holds the "answer"; the user knows that the "answer" is out there, to a greater or lesser degree of precision. But the document set is not the sender. All the presentation tools of navigation, coupled with the semantic and ontological back-end, together are the sender. And the navigation is the means to approach and retrieve it. The user wants augmentation, confirmation, modification etc.

Re-Purposing the Model: The Intranet as Sender

Another aspect of the communication dynamic interests us. In Sager's model the
communication event is a one-to-one, direct, personal communication. But from the
point of view of the individual intranet user the intranet communication process can
appear as an instance of a many-to-one, indirect, impersonal process. And the "many"
carries multiple connotations: many documents, a variety of navigation, multiple ways to
search etc. The "lucky" recipient in Sager's model is given one communications
deliverable, one document (or speech). This kind of communication was always push,
until the internet/intranet era. Now users spend much time (and effort) in trying to
"pull" information from out of the complexity. Users actively want, and require, to be
communicated to. The information access system is the sender. It could usefully conform
to all the constraints that Sager's senders conform to.

It is this indirectness of the intranet environment, coupled with the scale of publishing,
which causes and compounds some of the problems of "hard-to-locate" content. There is
certainly intention aplenty in an intranet environment. Authors, navigation designers,
taxonomy engineers are all intending that best-fit documents be communicated to users.
We try to integrate all these parameters of communication. We integrate by working with
concept identification and concept naming. We build this intermediate layer to speak the
language of the user. Regardless of the language of the author. And in doing so we work
with the parameters that Sager's sender has to work with -

intention (we know a great deal about our users and their requirements)
knowledge (concept identification and definition is our key deliverable)
language (terminologies and vocabularies, special languages and general
languages are our tools)

We should always remember that we are designing information access layers to impact the user's current knowledge state that the user herself wants to change; to augment, confirm or modify.

Terminology, and lexical normalization against terminology, clarify this whole process of communication. The information transaction breaks when there is a mismatch in either the sender's or the recipient's lexical choices or knowledge. In an intranet, the mismatch is between the lexicon of the user and the lexicon of the system. Not only is the overall model similar enough to work with effectively in our information access design, its outcomes are also exactly the outcomes that we would like to occur in intranet information transactions.
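
A minimal sketch of that normalization, assuming an invented synonym ring and function name: the user's lexical variants are folded onto the preferred terms that the system indexes against, so the two lexicons stop mismatching.

    # Illustrative synonym ring: each lexical variant maps to a preferred term.
    PREFERRED_TERMS = {
        "termsheet": "term sheet",
        "deal summary": "term sheet",
        "nda": "non-disclosure agreement",
        "confidentiality agreement": "non-disclosure agreement",
    }

    def normalize(expression: str) -> str:
        """Fold a user-supplied expression onto the terminology's preferred term."""
        cleaned = expression.strip().lower()
        return PREFERRED_TERMS.get(cleaned, cleaned)

    print(normalize("Termsheet"))                  # 'term sheet'
    print(normalize("confidentiality agreement"))  # 'non-disclosure agreement'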

What Degrades Effective Communication?

Last, but definitely not least, we are rarely allowed to forget that transmission of the message can fail. The type of failure that concerns us as information access designers is incompatibility between the sender's and the receiver's lexical and knowledge structures. First, the chosen language must be appropriate.

"The sender must choose a language and sublanguage which he assumes the recipient to
have command of; the recipient must be able to recognize and understand the linguistic
forms chosen in order to analyze the message." [page 104]

Sager is clear how failure of communication can occur.

"The linguistic forms to be used in a message must therefore be planned at every level in
such a way that the greatest degree of clarity can be achieved by means of systematic and
transparent designations of concepts and by the avoidance of ambiguity at every level of
expression." [page 104]

And again.

"Accurate transmission can be impeded by incompatibility of the sender's or recipient's


lexical and knowledge structure … the analytical use of language, i.e. the application of
inference rules in the process of comprehension, works only if designations are regularly
patterned and if both interlocutors know the rules of designation. The linguistic forms to
be used in a message must therefore be planned at every level in such a way that the
greatest degree of clarity can be achieved by means of systematic and transparent
designations of concepts and by the avoidance of ambiguity at every level of expression."
[page 104]

Conclusion

Terminology is a "big" subject. And an exhaustive discussion of how to apply terminology

  73
 

knowhow to enterprise intranets is too big a subject for one essay. Juan Sager's book is
one of the fundamental texts of "classic" terminology work. Because of this, assimilating
the ideas it contains, and re-purposing them, is a solid foundation for beginning any
project to build the deep layers of information access design.

Lots of book reviews conclude with a recommendation that the book in question is "essential reading" for such and such an audience. This review concludes differently. "A Practical Course in Terminology Processing" really is essential reading for project and program managers who are going to have to implement lexical projects in information access environments. More than just essential reading, there are models here that can be re-engineered and applied to lexical projects. Their application will make the difference between success and still (after all those months) not reaching that success. For those who need to come to terms with terms, this is the one.

It is also more than just a book to read. It is a resource that you can refer to again and
again. Who knows how many expertly designed taxonomy and categorization projects
are currently successfully underway? But this book is how to begin, how to create one of
those, and how to stay successful to the end.

About semanthink.com

About Me

[This last section is the content item about my approach which was in the About
directory of the site – it’s old, 2002, but accurately reflects how I described myself back
then. It is included for completeness’ sake only. I describe myself a bit differently today
on renniewalker.com ]

I'm Rennie Walker. I'm a consultant with both KAPS Group and KCurve. I specialize in
information access design within two particular layers of the total knowledge
management problems/solutions environment. I prefer to call these two issues
"knowledge organization" and "subject recognition", and I use these expressions
constantly. Knowledge organization is the design of conceptual maps, or representations,
of domains. These knowledge organization schemes are usually formalized as
taxonomies or ontologies. Subject recognition is the ability of applications (or people) to
recognize the concepts that documents discuss. These concepts are specified by, and
organized in, the taxonomy.

KAPS Group is a knowledge architecture consulting company. KCurve is a consulting group that designs information access solutions for internet and corporate intranet
content. I work with the clients of these two companies to design and implement
information access solutions. I also work with the taxonomy/categorization and search
technology partners of KAPS Group and KCurve to implement, lexically and
ontologically, their taxonomy, classification and categorization applications.

I am a Convera Certified Taxonomy Developer. I have taken the InXight Technical Partner SmartDiscovery program. I am both a psychologist and information scientist. I
have a Masters in psychology from Edinburgh University, in Scotland, and following that
I completed a post-graduate Diploma in Librarianship, from what is now a unit of the
University of Wales.

About semanthink.com

The aim of semanthink.com is that it will contain all that I know and use in my client work, about

what taxonomies are, what they actually "do", and how they do this
the project design process for building taxonomies
using a taxonomy as a specification for creating subject recognition executables
(with a variety of categorization applications) so that we can associate documents
with taxonomy nodes
writing and modeling sets of lexically-based subject recognition executables for
content categorization applications that can use them
teaching all of the above

semanthink.com is my personal professional site. Here I take the space to write in depth about subjects that can be very "small". I personally see a requirement for this focus on the "small" sometimes with the clients I work with. Though it is my personal
professional site, I am hugely indebted to the colleagues that I discuss ideas and client
solutions with nearly every day. And obviously, to our clients as well, who talk us through
their information access design issues and who give us professional inspiration.

The semanthink.com Audience

Semanthink.com is written for those tasked with making strategic decisions about
enterprise content organization and categorization, and those tasked with implementing
such solutions. The intent is that CIOs, content managers, IT managers, project and
program managers, and technical groups involved in enterprise knowledge organization
and subject recognition, will find something useful at some time in the semanthink.com
site. semanthink.com is not meant to be hip, or new, or catalytic. It is designed as a
straightforward resource for these people. "These people", you, may have a requirement
to find the kinds of questions you need to ask of yourself and your applications and
professional services vendors, so you can implement the knowledge economy. Or, if you
already have the questions you may be looking for some answers.

Consolidation and Memory in an Information-Rich World

One of the issues that interests me as a psychologist is how we deal with (and can
possibly deal with) this information-rich world that we are constantly adding to and
building on. The psychological tasks that press to the foreground for me in my own
everyday work are consolidation of personal knowledge and remembering
what we do actually know. How can we continually consolidate what we know so
that we become steadily "more" expert on our chosen domain? How can we remember
the breadth and depth of what we "know" about our chosen domain?

With this in mind, the semanthink.com site is quite consciously designed around the
theme of consolidation that the individual needs. In a very real sense, the
semanthink.com site is part of my professional memory. And, because we live in a wired
world, semanthink.com becomes a memory object that you can plug into, if you wish.
Part of remembering and knowing what we know is organization of what we want to
know. Which applies to every knowledge worker in the world. So, semanthink.com also
focuses on consolidation of and around some of the key ideas critical to connecting
users to the documents (and only the documents) that they really need in their moment
of need. So many of us across the knowledge economy are working together on an
antidote to information disorder and information misorder. This is intended to be part of
the antidote. So the focus will remain always on the same small number of themes.

Themes

A number of themes inform the way that I work.

From psychology comes a focus on cognition. People communicate, categorize, create ontological structures and semantics, and build languages. These are all native human skills. Hence my focus on taxonomies as communication tools. And on subject recognition giving knowledge workers in special subject areas the semantics (and accuracy) that they themselves use every day in peer communications.

From information science comes a focus on what information is, and how to organize
its origination and description to align it with human cognitive structures - and so create
a process. In fact, I see information science as a discipline in service to human cognition.

From my time as an information management professional in the strategy consulting and investment banking communities comes a focus on the business problems of
finding information. These years gave me an acute familiarity with the ease with which
facts become lost in information disorder and information misorder. These
years also gave me an acute sense of the role that lexical variance plays, particularly in
the English language which is the one I am most proficient at, in communicating or
trying to find information.

My years at Sageware Inc., a Mountain View, CA-based early thought leader in the content categorization space, reprised a great deal of what had gone on before. At
Sageware we worked to meld a lexical approach to subject recognition, with a logical
functional "shell" of our application, together with a professional services practice that
gathered knowledge organization requirements from our customers. My work since then,
and today, is putting aspects of this three-part solution model together to create
knowledge organization and subject recognition tools that work for special subject
communities.

Project management design became a focus out of necessity, after a while. There's an old
story. I heard it as an Irish joke, but here's a (slightly) Scottish version. I've also come
across variants in collections of Sufi and Zen stories, which is probably where its true
origination lies.

"A youngster is lost in the countryside, and is wandering hither and thither, trying to get
to the destination agreed yesterday with friends. The youngster scrambles through yet
one more hedgerow and sees an oldster sitting on a stile. 'Oh, I hope you can help me',
says the youngster. 'I'm lost and want to get to Such-and-Such a place'. The oldster sits
chewing and musing for a bit. Then turns to the youngster. 'Aye, well if Ah were you,
laddie/lassie (you choose the interactive bit here) Ah wuldnie start frae here.'"

The semanthink.com Purpose

It's always good to know where you're going. It's equally and exactly important to know
where you begin from. I see project design, when the project aims to deliver knowledge
organization or subject recognition deliverables, as beginning with a precise and total
perspective. This precise and total perspective includes, particularly, models and
methodologies, and knowledge we can take to design working models and
methodologies. Otherwise we start from the wrong place. All the semanthink.com
content aims to make some applicable contribution to this precise and total perspective
that informs the design of these kinds of projects. Some of this content will prove richer in this usefulness than other content, but what is required always depends on the need and the moment.

The semanthink.com Rationale

My focus on information access design is with two particular layers of the total
information access problems/solutions environment -

The Knowledge Organization Layer
The Subject Recognition Layer

I see the issues that arise from the lack of these two functioning layers as creating an
information access bottleneck. There is only so far we can go in creating an information
access environment without addressing and solving the issues created by the absence of
these two layers. Until we do, we hit a glass ceiling, to do with precision and recall, and
the absolute extremes of these where finding particular information is actually
systemically impossible.
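
Precision and recall carry their usual information retrieval sense here; as a reminder of what the glass ceiling is measured in (the numbers below are invented for the example):

    def precision(relevant_retrieved, retrieved):
        """Fraction of the retrieved documents that are actually relevant."""
        return relevant_retrieved / retrieved if retrieved else 0.0

    def recall(relevant_retrieved, relevant):
        """Fraction of all the relevant documents that were retrieved."""
        return relevant_retrieved / relevant if relevant else 0.0

    # Hypothetical search: 40 documents retrieved, 25 of them relevant,
    # out of 100 relevant documents in the collection.
    print(precision(25, 40))   # 0.625
    print(recall(25, 100))     # 0.25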

The functional elements of the knowledge organization layer are all metadata.
Taxonomies, faceted taxonomies, ontologies and other knowledge organization
metaphors are all ways to represent a domain to users who work within that domain.
Other knowledge organization tools include vocabularies, terminologies and
classification schemes. The individual functional components of each of these tools must meet the total set of user requirements. The set of user requirements can be both wide
and complex, but will normally include such parameters as domain scope, formal
conceptual relationships in the representation model and the use of terminology in
labeling etc.

The functional elements of the subject recognition layer are essentially sets of lexical
elements (often in combination) that categorization platforms, of one kind or another,
use to identify the subjects that any document discusses. Through subject recognition we
can associate documents with taxonomy nodes and with classification schemes.
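
To make the two layers concrete, here is a toy sketch (all names and data invented): a taxonomy node belongs to the knowledge organization layer, and the association of documents with that node is the work of the subject recognition layer.

    from dataclasses import dataclass, field

    @dataclass
    class TaxonomyNode:
        """Knowledge organization layer: one concept in the domain representation."""
        node_id: str
        preferred_label: str
        parent_id: str = ""                              # "" for a root node
        documents: list = field(default_factory=list)    # filled by subject recognition

    def associate(node, doc_id, doc_text, evidence):
        """Subject recognition layer: attach a document to a node when any of
        the node's lexical evidence occurs in the document text."""
        if any(term.lower() in doc_text.lower() for term in evidence):
            node.documents.append(doc_id)

    node = TaxonomyNode("N-17", "credit derivatives", parent_id="N-3")
    associate(node, "doc-001", "A primer on credit default swaps.",
              evidence=["credit derivative", "credit default swap"])
    print(node.documents)   # ['doc-001']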

semanthink.com tries quite consciously to take, and to give, a different perspective on taxonomies (and knowledge organization schemes in general). Similarly with optimizing
subject recognition platforms. We also work hard to make any paradigms that seem to be
at play in early stages of project design explicit (so that we can discuss them).

Why Focus on Project Design?

We absolutely require robust project design. We require information access project designs that tightly couple a highly analytical methodology with, particularly, project
activity definitions and project risk mitigation. For instance, if we don't have an analysis
of how taxonomies "work", then how can we adequately (or at all) define project
activities to re-engineer data to make them work? If we don't have a roadmap for what
we are doing, then risk lies around every corner (or in every email and meeting). Even in
"simple" taxonomy development projects risk arises from different types of variables.
Conceptual variance, arising from words (like "taxonomy") that carry different meanings
for different participants, leads to risk of confusion. The "Black Box Effect", of not
breaking problems down into the optimal elements to work with, leads to the risk of
being unable to define exactly what to do (and to do it).

Implementing the Knowledge Economy: the Business Thesis

As we resolve the business problems associated with implementing integrated knowledge organization and subject recognition solutions, we will be in a position to effectively
implement the Knowledge Economy. It's my deep perception that enterprise content that
is not optimally organized and categorized, coupled with the scale of content creation
and content purchasing within the enterprise, causes systemic business problems around
the effective use and, equally paramount, re-use of unstructured information resources.

Perspective: The Metaphor of the Last Mile

Information access design, as a core business function, has its own "Last Mile" issue.
Within telecoms/media the Last Mile verbal shorthand allowed us, at the time, to discuss
and "issue-ize" the problems of getting data from the efficient backbone pipes into
customers homes - this distance being the Last Mile.

Within information access design the issue is analogous. We call our equivalent of pipes,
"shells". So, search/indexing engines, categorization applications and portal and
content/document management applications are all shells in that they provide the
enterprise-wide infrastructure platform for many different kinds of information
transactions, information workflow processes and information presentation outputs and
metaphors, all based upon a robust features and functions set. But, they are purchased
"empty". Empty of the content to immediately, off-the-shelf, organize and recognize the
topics and concepts in your content that are required by your user communities. The
information access Last Mile is robust knowledge organization and subject recognition functionality. And these are predicated upon considered methodologies and project designs.

Perspective: The Motivation of the Last Mile

It is only when we can associate, with a high degree of precision, documents to concepts,
or concepts to documents, that we then have the ability to connect information users to
documents. This is our business motivation.

These tasks of knowledge organization and subject recognition, which a knowledgeable person can personally execute easily and precisely on small numbers of documents, become a major implementation effort when a platform of applications is required to execute the same kinds of knowledge organization and subject recognition tasks at enterprise scale. And "automatically".

The complication in all of this, of course, from the point of view of knowledge organization, is knowing what the subjects are, how to relate them and what to label them. And the corollary complication, from the point of view of subject recognition, is the syntactic and semantic nature of the languages in which we write our textual
communications to each other. This is no small matter when we want to implement
robust, precise and effective subject recognition capabilities.

semanthink.com, most categorically, takes a The-Future-is-Semantic kind of perspective on what will lead to enterprise knowledge management success. Which is
why I focus on that which absolutely needs to be incorporated into our working lives (as
information access designers) in the way of models of communication, cognitive
categorization, terminology and special languages. We need to either program our
semantics into our solutions, or implement solutions that implicitly leverage our
semantics (accurately and usefully).

Competitive Advantage

My colleagues and I work with companies to creatively and powerfully solve business
problems. We consider all of what we do to be in the service of competitive necessity,
competitive differentiation or competitive advantage for those that we work for.

We take the view that in the knowledge economy managing companies for ongoing
success will mean that management in general and management of knowledge will align
more closely than they do at the present, and at increasingly senior levels within the
enterprise. We are aware of the distinction between vision and evangelism, but we like the thesis stated this simply.

We view ourselves as business problem solving experts. We just happen to specialize in knowledge organization and subject recognition. One of my favorite business mantras
currently is Knowledge Organization is Power.

Easy to Publish, Hard to Find

In the Knowledge Economy's current state, it is both simple and easy to publish/send
any document. But it is not nearly as simple, often, given the current status of
information and its management in most enterprises, to find and retrieve any particular
item of content, or any particular related set of content items.

Knowledge workers require and want to find documents for what the document contains
- facts, ideas, theories, opinions, analyses, predictions, forecasts and so on. Documents
are containers of these. Documents discuss concepts. And the code that these concepts
are written in is text. So, knowledge workers who seek discussion of concepts need a
means to retrieve the document, or set of documents, that discuss whatever the subject
of their current interest and requirement is.

There is a manifest decoupling in the corporate knowledge economy between the two
information tasks of publishing and finding. This decoupled supply and demand cycle is
where the high-level systemic breakdown in the knowledge economy takes place. That
publishing is easy, is good. That finding is hard, is bad.

Affirmation

This Breath

In this Moment

I dedicate to the Benefit of All Creation.

This Moment

In this day

I dedicate to the Benefit of All Creation.

This Day

In this Life

I dedicate to the Benefit of All Creation.

This Life

Of this Thread of Eternal Creation I Am

I dedicate to the Benefit of All Creation.

Acknowledging that Gratitude is my Sustenance,

Acknowledging that Giving is my Receiving,

Acknowledging that “my” Stillness

Is “my” Expansion,

Acknowledging that my Heart is the One Heart,

And that my “mind” is of the One Mind,

I dedicate “myself” to All Creation.

Om Shanti Shanti Shanti Om Om Shanti
