Representations of Emoji
IUC 36
Presenter:
Dr. Vinton G. Cerf
Vice President and Chief Internet Evangelist, Google
Dr. Cerf will discuss the problem of curating digital content on the order of
centuries. Unicode has a role to play although there are very complex issues
relating to format and structure of digital objects, interpretation of content,
intellectual property management, perhaps even patents and other legal
framework questions. The problems are both technical and legal.
Outline
● A brief history of emoji
● Encoding: Shift JIS and Unicode
● Mapping and unification
● Emoji in Unicode 6
● Problems:
○ variation selectors
○ regional indicators
○ counting
● Best practices
Emoji down the ages
What if you were tasked with preserving the following texts
to be passed down for posterity?
awesome! :-)
yay! ☺
History:
● popularised on Japanese mobile devices
● extension of Japanese character sets
● carrier-specific standards
"Early" history in Japan
Three major cell phone operators supported emoji:
● NTT DoCoMo
● au/EZweb by KDDI
● SoftBank
Problems:
● each operator had its own set of emoji
● they were encoded differently
● no interoperability between them
Examples of emoji
Supplementary PUA-A:
0xF0000 - 0xFFFFD
Supplementary PUA-B:
0x100000 - 0x10FFFD
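As a rough illustration of the ranges above, a code point can be tested against the Private Use Areas with a simple lookup. This is a minimal sketch: the name `pua_block` and the table layout are chosen here for illustration, and the BMP PUA range (U+E000 to U+F8FF) is added from the Unicode Standard rather than from this slide.

```python
from typing import Optional

# Private Use Area ranges (inclusive). PUA-A ends at U+FFFFD because the
# last two code points of each plane are noncharacters.
PUA_RANGES = {
    "BMP PUA": (0xE000, 0xF8FF),
    "Supplementary PUA-A": (0xF0000, 0xFFFFD),
    "Supplementary PUA-B": (0x100000, 0x10FFFD),
}

def pua_block(code_point: int) -> Optional[str]:
    """Return the name of the PUA block containing code_point, or None."""
    for name, (first, last) in PUA_RANGES.items():
        if first <= code_point <= last:
            return name
    return None

print(pua_block(0xE63E))   # a BMP private-use code point
print(pua_block(0xFE000))  # a supplementary private-use code point
print(pua_block(0x263A))   # WHITE SMILING FACE is a standard character
```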
Encoding is carrier-specific
Each carrier used different values to encode emoji. For
example...
NTT DoCoMo:
● Shift JIS: 0xF89F - 0xF9FC
● Unicode: 0xE63E - 0xE757 (BMP PUA)
● JIS points for e-mail
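The DoCoMo ranges above lend themselves to a quick range check. The sketch below only tests whether a value falls inside the stated ranges; it does not assume that every value in a range is assigned, and it does not attempt the Shift JIS to Unicode mapping itself. The function names are hypothetical.

```python
# Ranges as quoted on the slide (inclusive).
DOCOMO_SJIS_RANGE = (0xF89F, 0xF9FC)
DOCOMO_PUA_RANGE = (0xE63E, 0xE757)

def in_docomo_sjis_range(two_bytes: bytes) -> bool:
    """True if a two-byte Shift JIS sequence lies in the DoCoMo emoji range."""
    if len(two_bytes) != 2:
        return False
    value = int.from_bytes(two_bytes, "big")
    return DOCOMO_SJIS_RANGE[0] <= value <= DOCOMO_SJIS_RANGE[1]

def in_docomo_pua_range(ch: str) -> bool:
    """True if a single character lies in the DoCoMo BMP PUA emoji range."""
    return len(ch) == 1 and DOCOMO_PUA_RANGE[0] <= ord(ch) <= DOCOMO_PUA_RANGE[1]

print(in_docomo_sjis_range(b"\xf8\x9f"))  # first value of the Shift JIS range
print(in_docomo_pua_range("\ue63e"))      # first value of the PUA range
```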
Sent:
Displayed:
Carrier-to-carrier mapping
SoftBank Disney au by KDDI DoCoMo
Source: SoftBank
Emoji support spreads...
Emoji began to be supported in web mail and other
devices:
● Yahoo! Japan Web Mail (2006)
● Gmail (2008)
● iPhone OS 2.2 (2008)
● Android apps (2009)
Google emoji
Provides a unified representation of the three emoji sets:
● union of all the emoji characters
● cross-mapping
○ combine identical characters
○ a few dozen map to existing Unicode characters
● about 700 new characters
○ using PUA
○ outside BMP (U+FExxx)
Idea:
● support legacy systems by converting between other encodings and Unicode
[Diagram labels: KDDI, SoftBank, DoCoMo]
Google PUA mapping table
Converting at boundaries
[Diagram: Gmail (Google PUA) exchanging mail with KDDI, DoCoMo and SoftBank; convert to/from Unicode at each boundary]
Emoji in Gmail
Uses mapping table to convert
between PUA and carrier encoding.
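Converting at the boundary can be pictured as a pair of table lookups: carrier code points are mapped to the unified private-use code points on the way in, and back to the recipient's carrier on the way out. The sketch below is not Gmail's implementation. The table rows, the function names, and the choice to work on already-decoded Unicode strings are all assumptions made for illustration, and the code point values are placeholders that should be checked against the real mapping table.

```python
# Illustrative rows only: each row gives one emoji's code point in the
# unified (Google PUA) set and in each carrier's PUA. Values are placeholders.
MAPPING_ROWS = [
    {"google": 0xFE000, "docomo": 0xE63E, "kddi": 0xE488, "softbank": 0xE04A},
    {"google": 0xFE001, "docomo": 0xE63F, "kddi": 0xE48D, "softbank": 0xE049},
]

def build_tables(rows):
    """Build per-carrier lookup tables in both directions."""
    to_unified, from_unified = {}, {}
    for row in rows:
        for carrier, code_point in row.items():
            if carrier == "google":
                continue
            to_unified.setdefault(carrier, {})[code_point] = row["google"]
            from_unified.setdefault(carrier, {})[row["google"]] = code_point
    return to_unified, from_unified

TO_UNIFIED, FROM_UNIFIED = build_tables(MAPPING_ROWS)

def convert_inbound(text: str, carrier: str) -> str:
    """Carrier code points -> unified code points; other text is unchanged."""
    return text.translate(TO_UNIFIED[carrier])

def convert_outbound(text: str, carrier: str) -> str:
    """Unified code points -> carrier code points; other text is unchanged."""
    return text.translate(FROM_UNIFIED[carrier])

# A message from a DoCoMo handset is stored in unified form, then
# converted again when delivered to a SoftBank handset.
stored = convert_inbound("hello \ue63e", "docomo")
delivered = convert_outbound(stored, "softbank")
```

Keeping one unified form internally means each carrier needs only one pair of tables, rather than a separate table for every carrier-to-carrier combination.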
Authors:
● Markus Scherer, Mark Davis, Kat Momoi, Darick Tong
(Google)
● Yasuo Kida, Peter Edberg (Apple)
The Proposal
Restrictions:
● Source separation rule (strict rule)
● Reuse existing Unicode symbols
● Separate generic symbols
● Abstract characters (no specific colours or animation)
● Unify semantically identical symbols, but disunify visually similar but semantically different symbols
● Unify Unicode with least-marked most-common symbol
Source: Unicode Technical Committee Subcommittee on Encoding of Symbols
Proposal accepted
In 2010, the new emoji were accepted into Unicode 6.
Germany/Ireland counter-proposal:
● encode 256 characters for ISO 3166 country codes
Compromise:
● encode twenty-six "regional indicator symbols" (A-Z)
● spell out the two-letter country codes
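Spelling out a country code with regional indicator symbols amounts to shifting each letter A-Z onto the block that starts at U+1F1E6 (REGIONAL INDICATOR SYMBOL LETTER A). A minimal sketch, with a hypothetical function name:

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A

def spell_region(code: str) -> str:
    """Spell a two-letter country code with regional indicator symbols."""
    code = code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError("expected a two-letter A-Z code, got %r" % code)
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)

flag_de = spell_region("DE")
print([hex(ord(c)) for c in flag_de])  # ['0x1f1e9', '0x1f1ea']
```

Whether the pair is drawn as a flag or as two letters is up to the font and rendering system; the encoded text is the same either way.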
Possible ambiguity
... ...
or ... ...?
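One plausible reading of this ambiguity is that a run of regional indicators carries no delimiters, so the pairing depends on where a reader starts. The sketch below assumes that reading; the helper names and the sample run are made up for illustration.

```python
REGIONAL_INDICATOR_A = 0x1F1E6
REGIONAL_INDICATOR_Z = 0x1F1FF

def indicator_letters(text: str) -> str:
    """Map each regional indicator in text back to its letter A-Z."""
    return "".join(
        chr(ord(c) - REGIONAL_INDICATOR_A + ord("A"))
        for c in text
        if REGIONAL_INDICATOR_A <= ord(c) <= REGIONAL_INDICATOR_Z
    )

def pair_from(letters: str, start: int):
    """Group letters into two-letter codes, beginning at index start."""
    return [letters[i:i + 2] for i in range(start, len(letters) - 1, 2)]

# Four regional indicators in a row, spelling F R U S.
run = "\U0001F1EB\U0001F1F7\U0001F1FA\U0001F1F8"
letters = indicator_letters(run)
print(pair_from(letters, 0))  # ['FR', 'US']
print(pair_from(letters, 1))  # ['RU']
```

Pairing from the start of the run gives one result, while a process that lands in the middle of the run (for example a cursor movement or a line break) can see a different country code.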
Thank you!
Q&A