Challenges and
What Computational Intelligence Techniques May Offer
Ah-Hwee Tan
(http://www.ntu.edu.sg/home/asahtan)
School of Computer Engineering
Nanyang Technological University
Big Data Analytics Symposium
London, UK
13 September 2013
Outline
Big Data Analytics
Computational Intelligence Techniques
Web Data Analytics
Sources of Big Data
Traditionally, mostly produced in scientific fields such as
astronomy, meteorology, genomics, physics, biology, and
environmental research.
With the rapid development of IT and the consequent decrease
in the cost of collecting and storing data, big data has been
generated from almost every industry and sector as well as
governmental departments, including retail, finance, banking,
security, audit, electric power, and healthcare.
Recently, big data over the Web (big Web data for short),
which includes all the context data, such as user-generated
content, browser/search log data, deep web data, etc.
Value     Metric
1000      kB    kilobyte
1000^2    MB    megabyte
1000^3    GB    gigabyte
1000^4    TB    terabyte
1000^5    PB    petabyte
1000^6    EB    exabyte
1000^7    ZB    zettabyte
1000^8    YB    yottabyte
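The units in the table are successive powers of 1000. As a minimal sketch of that arithmetic (the function name `format_bytes` and the one-decimal rounding are illustrative assumptions, not part of the talk), a raw byte count can be expressed in the largest unit that fits:

```python
# Decimal byte units from the table: 1000^1 = kB ... 1000^8 = YB.
UNITS = ["kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def format_bytes(n_bytes: int) -> str:
    """Express a byte count using the largest decimal unit that fits."""
    if n_bytes < 1000:
        return f"{n_bytes} B"
    value = float(n_bytes)
    unit = "B"
    for unit in UNITS:
        value /= 1000.0          # move up one row in the table
        if value < 1000.0:
            break
    return f"{value:.1f} {unit}"


print(format_bytes(2_500_000_000))   # 2.5 GB
print(format_bytes(1000 ** 5))       # 1.0 PB
```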
Velocity
Time-sensitive data, data that grow exponentially or even at
rates that overwhelm the well-known Moore's Law
Variety
From structured data into semi-structured and
completely unstructured data of different types, such as
text, image, audio, video, click streams, log files.
Veracity
Are the results meaningful for the given problem space?
Volatility
How long do you need to look/store this data?
Computational Intelligence
Flagship Events of
Computational Intelligence
World Congress on Computational Intelligence
(Australia 2012, Beijing 2014)
IEEE Symposium on Computational Intelligence
(Singapore 2013, Florida, USA 2014)
IEEE Symposium on Computational Intelligence
in Big Data (IEEE CIBD'2014)
Self-Organizing Neural Networks for
Personalized Web Intelligence
Organize
(clustering/categorizing)
Putting things in perspectives
Track
Constant monitoring
Approaches to
Organizing/Analyzing
Clustering
Organizing information into groups based on
similarity functions and thresholds
e.g. BullsEye, NorthernLight, Vivisimo
Categorization
Organizing information into a predefined set of
classes
e.g. Yahoo!, Autonomy Knowledge Server
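The contrast between the two approaches can be illustrated with a small sketch. It is not the method used by any of the systems named above; the Jaccard similarity on keyword sets, the threshold value, and the keyword-overlap rule are assumptions chosen for brevity.

```python
from __future__ import annotations


def jaccard(a: set, b: set) -> float:
    """Similarity between two keyword sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(docs: list[set], threshold: float = 0.3) -> list[list[int]]:
    """Clustering: groups emerge from a similarity function and a threshold.
    Each document joins the first cluster whose seed document is similar
    enough; otherwise it starts a new cluster."""
    clusters: list[list[int]] = []
    for i, doc in enumerate(docs):
        for members in clusters:
            if jaccard(doc, docs[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def categorize(docs: list[set], classes: dict[str, set]) -> list[str]:
    """Categorization: every document is assigned to one of the predefined
    classes, here the one sharing the most keywords with it."""
    return [max(classes, key=lambda c: len(doc & classes[c])) for doc in docs]


docs = [
    {"neural", "network", "learning"},
    {"neural", "network", "clustering"},
    {"stock", "market", "finance"},
]
classes = {
    "AI": {"neural", "learning", "clustering"},
    "Business": {"stock", "finance"},
}
print(cluster(docs))               # groups found without predefined classes
print(categorize(docs, classes))   # labels drawn from the predefined classes
```

Note that `categorize` always returns one of the predefined labels, even for a document that shares no keywords with any class, whereas `cluster` would open a new group for it.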
Which is better?
Clustering
Pros
Unsupervised/self-organizing, require no training
or predefinition of classes
Able to identify new themes
Cons
Users have no control
Ever changing cluster structure
Difficult to navigate and track
Categorization
Pros
Good control on classes
Every info assigned to one or more classes
of interests
Cons
Requires learning (supervised) and/or
definition of classification rules/knowledge
Every info has to be assigned to one or
more classes
Good control but lacks flexibility to handle
new information
User-configurable Clustering
(Tan & Pan, PAKDD 2002)
Information organization and content
management
Online incremental clustering + user-defined
structure (preferences)
Reduces to a clustering system if no user
indication given
Allows personalization in a direct,
intuitive, and interactive manner
Control + flexibility
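The combination of online incremental clustering with a user-defined structure can be sketched as follows. This is not the algorithm of Tan & Pan; keyword sets, Jaccard similarity, the threshold value, and the class name `ConfigurableClusterer` are stand-ins assumed here only to show how user-defined clusters and system-created clusters can coexist.

```python
def jaccard(a: set, b: set) -> float:
    """Similarity between two keyword sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b) if a | b else 0.0


class ConfigurableClusterer:
    """Online incremental clustering that can start from user-defined
    clusters. With no user clusters it behaves as a plain clusterer."""

    def __init__(self, threshold: float = 0.25):
        self.threshold = threshold
        self.prototypes = []   # one keyword set per cluster
        self.labels = []       # user label, or None if system-created

    def add_user_cluster(self, label: str, keywords: set) -> None:
        """Record a user preference as a labeled cluster."""
        self.prototypes.append(set(keywords))
        self.labels.append(label)

    def add(self, doc: set) -> int:
        """Assign one document and return the index of its cluster."""
        best, best_sim = -1, 0.0
        for j, proto in enumerate(self.prototypes):
            sim = jaccard(doc, proto)
            if sim > best_sim:
                best, best_sim = j, sim
        if best >= 0 and best_sim >= self.threshold:
            self.prototypes[best] |= doc   # incremental update, no retraining
            return best
        self.prototypes.append(set(doc))   # new theme: create a new cluster
        self.labels.append(None)
        return len(self.prototypes) - 1


portfolio = ConfigurableClusterer()
portfolio.add_user_cluster("Text Mining", {"text", "mining"})
print(portfolio.add({"text", "mining", "tools"}))   # joins the user cluster
print(portfolio.add({"stock", "market"}))           # opens a new cluster
```

If `add_user_cluster` is never called, every cluster is created by `add`, which corresponds to the case where the system reduces to a plain clustering system.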
[Figure: F1 input fields for the Information Vector and the Preference Vector]
FOCI Architecture
[Figure: architecture diagram. Components: Intranet/Internet, Users, CI Portfolio, Domain-Specific Knowledge, Content Gathering, Content Analysis, Content Management, Content Publishing, Visualization Front End]
Personalization Functions
Marking/labeling (selected) clusters
Personal interpretation
Inserting Clusters
Indicate preference on groupings
Merging clusters
Indicate preferences on similarities
Splitting clusters
Indicate preferences on differences
...
Information Clustering
A portfolio created by a meta-search of 4 search engines
with a query on Text Mining
A Personalized Portfolio
after <=19 personalization operations
(mainly labeling and creating clusters)
Organizing New Information
Without the Personalized Portfolio
Based on Personalized Portfolio
Summary
A fusion neural network algorithm, called fusion ART, has
been proposed for integrating clustering and categorization
Has been applied to competitive intelligence on the web.
Compared with existing works, fusion ART offers advantages.
Lei Meng, Ah-Hwee Tan and Dong Xu
IEEE Transactions on Knowledge and Data Engineering, 2013
Introduction
The popularity of social websites has led to a great
increase in web multimedia documents
Massive number: billions of images and articles online
Diversity: diverse content and booming emerging topics
Multi-modal descriptors: images, text, category, tags, comments
[Figure: example image with its descriptors. Category: Birds. Keywords from surrounding text: wild, bird, beach, tree, vacation, animal, mar, sunny, playa, nayarit, arena, ave, water, vacaciones, hollyday, pelicano.]
Introduction
Clustering of web multimedia data is challenging
Scalability to big data
Difficulty in integrating multi-modal feature data
Ambiguity in deciding the number of categories
Rich but noisy meta-information: semantic gap of images, noisy tags
[Figure: two example images with their tags. Birds: wild, bird, beach, tree, vacation, animal, mar, sunny, playa, nayarit, arena, ave, water, vacaciones, hollyday, pelicano. Beach: ocean, blue, sea, summer, vacation, sun, man, beach, water, yellow, fun, sand, play, funny, adult, humor, lifestyle, sunny, resort.]
Problem Statement
We define the theme discovery of web multimedia data
as a heterogeneous data co-clustering problem, which
identifies the semantic categories of data patterns
through the fusion and recognition of multiple types of
features.
[Figure: "Apple" example. Multiple descriptions: category, tag, user description, surrounding text. Categories: Fruits, Products, Movies.]
Proposed Approach
A self-organizing neural network approach to Heterogeneous
Data Co-clustering
Based on Fusion Adaptive Resonance Theory (Fusion ART)
Fuses an arbitrary number of feature modalities
Adaptively tunes the weights for different feature modalities
Two different learning functions, one for primary data, such as
images and articles, and one for meta-information, to handle
short and noisy text
Incremental fast learning
Does not need the number of clusters to be given
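How a multi-channel ART network clusters incrementally without a preset number of clusters can be sketched as below. The sketch follows the commonly published fusion ART operations (a weighted choice function over channels, a per-channel vigilance test, and fuzzy-AND learning) but is deliberately simplified: complement coding is omitted, the channel weights `gamma` are fixed, and the separate meta-information learning function and adaptive weight tuning listed above are not implemented. The class name and all parameter values are assumptions for illustration.

```python
def fuzzy_and_norm(x, w):
    """L1 norm of the element-wise minimum of two non-negative vectors."""
    return sum(min(a, b) for a, b in zip(x, w))


class FusionARTSketch:
    """Simplified multi-channel ART clustering with fast learning."""

    def __init__(self, gamma, rho, alpha=0.01, beta=1.0):
        self.gamma = gamma     # per-channel contribution weights (fixed here)
        self.rho = rho         # per-channel vigilance thresholds
        self.alpha = alpha     # choice parameter, avoids division by zero
        self.beta = beta       # learning rate; 1.0 means fast learning
        self.clusters = []     # each cluster holds one weight vector per channel

    def _choice(self, x, w):
        """Weighted sum over channels of |x ^ w| / (alpha + |w|)."""
        return sum(
            g * fuzzy_and_norm(xk, wk) / (self.alpha + sum(wk))
            for g, xk, wk in zip(self.gamma, x, w)
        )

    def _matches(self, x, w):
        """Vigilance test: |x ^ w| >= rho * |x| must hold in every channel."""
        return all(
            fuzzy_and_norm(xk, wk) >= r * sum(xk)
            for r, xk, wk in zip(self.rho, x, w)
        )

    def learn(self, x):
        """Present one pattern (a list of channel vectors); return its cluster."""
        order = sorted(
            range(len(self.clusters)),
            key=lambda j: self._choice(x, self.clusters[j]),
            reverse=True,
        )
        for j in order:
            w = self.clusters[j]
            if self._matches(x, w):
                # Resonance: move the weights toward the fuzzy AND of x and w.
                self.clusters[j] = [
                    [(1 - self.beta) * b + self.beta * min(a, b)
                     for a, b in zip(xk, wk)]
                    for xk, wk in zip(x, w)
                ]
                return j
        # No existing cluster passes the vigilance test: create a new one.
        self.clusters.append([list(xk) for xk in x])
        return len(self.clusters) - 1


# Two channels per pattern: a toy visual vector and a toy tag vector.
net = FusionARTSketch(gamma=[0.5, 0.5], rho=[0.5, 0.5])
patterns = [
    [[1.0, 0.0], [1.0, 1.0, 0.0]],
    [[0.9, 0.1], [1.0, 1.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0, 1.0]],
]
print([net.learn(p) for p in patterns])
```

Each pattern is processed once and only the stored cluster weights are consulted, which is why new data can be added without revisiting old data.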
Experiments
NUS-WIDE data set
36784 images of 18 categories
Visual features: Grid color moment, Edge direction histogram, and
wavelet texture
Textual features of surrounding text: 1142 words (7 words per image on
average)
20 Newsgroups data set
12826 text documents of 10 categories
Textual features of document content: over 60k words (800 words per
document on average)
Textual features of category: 3 labels per document on average
GHF-ART with adaptively tuned weight values (γ_SA) achieves the best
performance in 5 classes and overall, and performs close to the best
results obtained with fixed weight values
Textual features of surrounding text are assigned higher weights than visual
features
The value of γ_SA stabilizes in [0.7, 0.8] as the number of patterns increases
Large fluctuations may result from the generation of new clusters
GHF-ART and Fusion ART incur a very small increase in time cost
For 23284 images, GHF-ART completes the clustering process in 10 seconds
Experiments on 20 Newsgroups Data Set
Clustering performance comparison using document content
and category information
Summary
A heterogeneous data co-clustering algorithm, called GHF-ART,
is proposed to discover the themes of web multimedia data
via their rich but heterogeneous descriptors.
Compared with existing works, GHF-ART has advantages in
Strong noise immunity: a learning function for meta-information is
proposed to handle noise;
Adaptive channel weighting: a well-defined weighting algorithm is
proposed to identify the important feature modalities for a better fusion of
multi-modal features in the overall similarity measure;
Low time complexity: GHF-ART performs real-time search and match
of patterns, resulting in a linear time complexity for big data;
Incremental clustering manner: GHF-ART may adapt to dynamic
web multimedia data sets by incrementally clustering new patterns based
on the learnt cluster structure without referring to the old data.
Aging in Place:
Opportunities and Challenges
Ah-Hwee Tan
(http://www.ntu.edu.sg/home/asahtan)
School of Computer Engineering
Nanyang Technological University
Aging in Place
the ability to live in one's own home and community
safely, independently, and comfortably, regardless of
age, income, or ability level - Center for Disease
Control, Dec 2011
Motivation
Global aging population creates silver challenges
Most adults would prefer to age in place
78 percent of adults between the ages of 50 and 64
report that they would prefer to stay in their current
residence as they age
Unobtrusive sensing device detects: the elder keeps walking around at an irregular
pace.
Social signal processing indicates: the elder has been silent for an unusually long
time.
Cognitive analysis result: "Your mother may be feeling anxious now"
"I need to call my mother now"
Silver Challenges
Vision
To enable the elderly to maintain an active, healthy and
engaging lifestyle in their own homes, supported by
an age-friendly intelligent environment providing
all-round comprehensive tender care
Round-the-clock day-to-day health and wellness
monitoring
Cognitive Support and recommendation to products
and services
Companionship and emotional support
Support for maintaining/stimulating social
interaction
- Proactive, natural interaction
Approach and Methodology
To support active living of the elderly
through an intelligent multi-agent environment
with ubiquitous access, natural interface, and
all-rounded comprehensive care
Key Technologies
A Multi-Agent Collaborative
Care Environment
Isabel
(Personal Nurse)
Small talk
Recommendations
for healthcare
products and services
Alfred
(The Butler)
Small talk
User modeling
Social and travel
advisory
Frank
(Robot Dog)
Activity sensing
Pattern modeling
Why Multi-Agent?
Unobtrusive sensing and monitoring: agents
of different characteristics and capabilities
Ubiquitous access to information and
services: agents in different platforms and
locations
Three's a party: more opportunities for
cognitive stimulation and social interaction
Cognitive Support: information and
recommendation on (healthcare) products, services,
skills and activities
Adaptive User Modelling
Identity and profile
Interests and preferences
Behaviour model: Time, space, activity
Knowledge and skills
Social network: Family and friends
Methods for Model Building
Explicit: User specification
Implicit: User actions, choices, conversation
Cognitive Support:
Product/Service Recommendation
Domain knowledge:
Healthcare, Travel, Cooking
Delivery modes:
- Question & Answer
- Proactive recommendation
- Conversation
Personal Touch:
Personalized, context sensitive, small talks
Challenges in Big Living Analytics
Volume: huge amount of data through bio
sensing, motion sensors, wearable/mobile sensors
for health monitoring and activity tracking
Thank you!
JOINT UBC-NTU RESEARCH CENTRE