
Affective Storytelling

Automatic Measurement of Story Effectiveness from
Emotional Responses Collected over the Internet
Daniel McDuff
PhD Proposal in Media Arts & Sciences
Affective Computing Group, MIT Media Lab
djmcduff@media.mit.edu

June 6, 2012

Executive Summary
Emotion is key to the effectiveness of narratives and storytelling, whether in influencing memory, likability or persuasion. Stories, even if fictional, have the ability to induce a genuine emotional response. However, understanding of the role of emotions in storytelling and advertising effectiveness has been limited by the difficulty of measuring emotions in real-life contexts. Video advertising is a ubiquitous form of short story, usually 30-60 seconds long and designed to influence, persuade and engage, in which emotional content is frequently used; it will be one of the focuses of this thesis. The lack of understanding of the effects of emotion in advertising results in large amounts of wasted time, money and other resources.
Facial expressions, head gestures, heart rate, respiration rate and heart rate variability can in-
form us about the emotional valence, arousal and attention of a person. In this thesis I propose to
demonstrate how automatically detected naturalistic and spontaneous facial responses and physio-
logical responses can be used to predict the effectiveness of stories.
I propose a framework for automatically measuring facial and physiological responses to content (e.g. video advertisements) over the Internet, in addition to self-report and behavioral measures, in order to understand the role of emotions in story effectiveness. Specifically, I will present analysis of the first large-scale dataset of facial, physiological, behavioral and self-report responses to video content collected “in-the-wild” using the cloud. I will develop models for evaluating the effectiveness of stories (e.g. likability, persuasion and memory) based on the automatically extracted features. This work will be evaluated on its success in predicting measures of story effectiveness that are useful in the creation of content, whether in copy-testing or content development.

Affective Storytelling
Automatic Measurement of Story Effectiveness from
Emotional Responses Collected over the Internet
Daniel McDuff
PhD Proposal in Media Arts & Sciences
Affective Computing Group, MIT Media Lab

Thesis Committee

Rosalind Picard
Professor of Media Arts and Sciences, MIT Media Lab
Thesis Supervisor

Jeffrey Cohn
Professor of Psychology
University of Pittsburgh

Ashish Kapoor
Senior Research Scientist
Microsoft Research, Redmond

Thales Teixeira
Assistant Professor of Business Administration
Harvard Business School

Abstract
Emotion is key to the effectiveness of narratives and storytelling, whether in influencing memory, likability or persuasion. Stories, even if fictional, have the ability to induce a genuine emotional response. However, understanding of the role of emotions in storytelling and advertising effectiveness has been limited by the difficulty of measuring emotions in real-life contexts. Video advertising is a ubiquitous form of short story, usually 30-60 seconds long and designed to influence, persuade and engage, in which emotional content is frequently used; it will be one of the focuses of this thesis.
Facial expressions, head gestures, heart rate, respiration rate and heart rate variability can
inform us about emotional valence, arousal and attention. In this thesis I propose to demon-
strate how automatically detected naturalistic and spontaneous facial responses and physiolog-
ical responses can be used to predict the effectiveness of stories. The results will be used to
inform the creation and evaluation of new content.
I propose a framework for automatically measuring facial and physiological responses to content (e.g. video advertisements) over the Internet, in addition to self-report and behavioral measures, in order to understand the role of emotions in story effectiveness. Specifically, I will present analysis of the first large-scale dataset of facial, physiological, behavioral and self-report responses to video content collected “in-the-wild” using the cloud. I will develop models for evaluating the effectiveness of stories (e.g. likability, persuasion and memory) based on the automatically extracted features.

1 Introduction
There remains truth in Ray and Batra’s [28] statement: “an inadequate understanding of the role
of affect in advertising has probably been the cause of more wasted advertising money than any
other single reason.” This statement applies beyond advertising to many other forms of media and
is due in part to the lack of understanding about how to measure emotion. This thesis proposal
deals with evaluating the effectiveness of emotional content in storytelling and advertising beyond
the laboratory environment using remotely measured facial and physiological responses. I will an-
alyze challenging ecologically valid data collected over the Internet in the same contexts in which
the media would normally be consumed and build a framework and set of models for automatic
evaluation of effectiveness based on affective responses.
The face is one of the richest channels for communicating affective and cognitive information [11]. In addition, physiological reactions, such as changes in heart rate and other vital signs, are partially controlled by the autonomic nervous system and as such are manifestations of emotional processes [36]. Recent work has demonstrated that both facial behavior and physiological information can be measured directly from videos of the human face, and thus emotional valence and arousal can be measured remotely.
Previous work has shown that many people are willing to engage and share visual images from their webcam over the Internet, and that these images and videos can be used to train automatic learning algorithms [32, 34, 22]. Moreover, webcams are now ubiquitous and have become a standard component of many media devices, laptops and tablets. In 2010, the number of camera phones in use totaled 1.8 billion, which accounted for a third of all mobile phones (http://www.economist.com/node/15865270). In addition, about half of the videos shared on Facebook every day are personal videos recorded from a desktop or phone camera (http://gigaom.com/video/facebook-40-of-videos-are-webcam-uploads/).
Traditionally, consumer testing of video advertising, whether by self-report, facial response or physiology, has been conducted in laboratory settings. Lab-based studies, while controlled, are subject to bias from the presence of an experimenter and other factors (e.g. comfort with the context) unrelated to advertising interest that may impact the participant's emotional experience [35]. Conducting experiments outside a lab-based context can help avoid such problems.
Self-report is the current standard measure of affect: people are typically interviewed, asked to rate their feelings on a Likert scale, or asked to turn a dial to quantify their state (affect dial approaches). While convenient and inexpensive, self-report is problematic because it is subject to bias from the context, increased cognitive load and other factors of little relevance to the stimulus being tested [30]. Self-report has a number of drawbacks, including the difficulty people have in accessing information about their emotional experiences and their willingness to report feelings even when they did not have them [8]. For many people, the act of introspection is challenging to perform in conjunction with another task and may itself alter the state being reported [21]. Although affect dial approaches provide a higher-resolution report of a subject's response than a post-hoc survey, subjects are often required to view the stimuli twice in order to introspect on their emotional state.
Unlike self-report, facial expressions and physiological responses are implicit, non-intrusive
and do not interrupt a person’s experience. In addition, as with affect dial ratings, facial and
physiological responses allow for continuous and dynamic representation of how affect changes
over time. This represents much richer data than can be obtained via a post-hoc survey. A
small number of marketing studies consider the measurement of emotions via physiological [6],
facial [18] or brain responses [3]. However, these are invariably performed in laboratory settings
and are restricted to a limited demographic.
Advertising and online media are global: movie trailers, advertisements and other content can now be viewed the world over via the Internet, not just on selected television networks. It is
important that marketers understand the nuances in responses across a diverse demographic and a
broad set of geographic locations. For instance, advertising that works in certain cultural contexts
may not be effective in others. A majority of the studies of emotion in advertising have only
considered a homogeneous subject pool, such as university undergraduates or a group from one
location. There is evidence to suggest that emotions can be universally expressed on the face [10]
and our framework allows for the evaluation of advertising effectiveness across a large and diverse
demographic much more efficiently than is possible via lab-based experiments.
The aim of the proposed research is to utilize a framework for measuring facial, physiological,
self-report and behavioral responses to commercials over the Internet in order to understand the
role of emotions in advertising effectiveness (e.g. likability, persuasion and sales) and to design
an automated system for predicting success based on these signals. This incorporates first-in-the-
world studies of measurement of these parameters via the cloud and allows the robust exploration
of phenomena across a diverse demographic and a broad set of geographic locations.
2 Contributions
The main contributions of this thesis are described below:

1. To use a custom cloud-based framework for collecting a large corpus of response videos to online media content (advertisements, movie trailers, etc.) with ground-truth measures of success (sharing, likability, persuasion and sales), and to collect data from a diverse population responding to a broad range of content.

2. To automatically analyze facial responses, gestures and physiological reactions using com-
puter vision algorithms.

3. To design, train and evaluate a set of models for predicting key measures of story/advertisement
effectiveness based on facial responses, gestures and physiological features automatically
extracted from the videos.

4. To propose generalizable emotional profiles that describe an effective story/advertisement in order to practically inform the development of new content.

5. To implement a system (demo) that incorporates the findings into a fully automated classi-
fication of a response to a story/advertisement. The predicted label will be the effect of the
story in changing likability/persuasion.

3 Background and Related Work


3.1 Storytelling, Marketing and Emotion
Emotion is key to the effectiveness of narratives and storytelling [15]. Stories, even if fictional,
have the ability to induce a genuine emotional response [14]. However, there are nuances in the
emotional response to narrative representations compared to everyday social dialogue [25], and therefore context-specific models need to be designed.
Marketing, and more specifically advertising, makes much use of narratives and stories. The
role of emotion in marketing and advertising has been considered extensively since early work by
Zajonc [37] that argued that emotions function independently of cognition and can indeed over-
ride it. It is widely held that emotions play a significant part in the decision-making process of
purchasing and advertising is often seen as an effective source of enhancement of these emotional
associations [24]. In advertising the states of amusement, surprise and confusion are of particular
interest and measurement of valence and arousal should be useful in distinguishing between these
states.
In a study of TV commercials, Hazlett and Hazlett [18] found that facial responses, measured using facial electromyography (EMG), were a stronger discriminator between commercials and were more strongly related to recall than self-report measures. Lang [20] found that pha-
sic changes in heart rate could act as an indication of attention and tonic changes could act as an
indication of arousal. The combination of physiology and facial responses is likely to improve
recognition of emotions further still.

Sales are arguably the key measure of advertising success, and predicting behavioral measures of success from responses will be our main focus. However, the success of an advertisement varies from person to person and sales figures at this level are often not available; therefore I will also consider other measures of success, in particular liking, memory (recall and recognition) and persuasion. “Ad liking” was found to be the best predictor of sales success in the Advertising Research Foundation Copy Research Validity Project [17]. Biel [5] and Gordon [13] state that likability is the best predictor of sales effectiveness. Explicit memory of advertising (recall and recognition) is one of the most frequently used metrics for measuring advertising success. Independent studies have demonstrated the sales validity of recall [17, 24]. Indeed, recall was found to be the second best predictor of advertising effectiveness (after ad liking) as measured by increased sales in the same project [17].
Behavioral measures such as ad zapping or banner click-through rates are frequently used to measure success. Teixeira et al. [33] show that inducing affect is important in engaging
viewers in online video adverts and in reducing the frequency of “zapping” (skipping the adver-
tisement). They demonstrated that joy was one of the states that stimulated viewer retention in the
commercial. With our web-based framework I can test behavioral measures (such as sharing or click-through) outside the laboratory in natural consumption contexts.

3.2 Facial Actions, Physiology, and Emotions


Charles Darwin was one of the first to demonstrate universality in facial expressions in his book,
“The Expression of the Emotions in Man and Animals” [9]. Since then a number of other studies
have demonstrated that facial actions communicate underlying emotional information and that
some of these expressions are consistent across cultures [10].
There are two main approaches to coding facial displays: “sign judgment” and “message judgment.” “Sign judgment” involves the labeling of facial muscle movements or actions, such as those defined in the FACS [12] taxonomy, whereas “message judgments” are labels of a human observer's perceptual judgment of the underlying state. In this proposal I focus on “sign judgments”, specific action unit intensities, as they are objective and not open to contextual variation.
The Facial Action Coding System (FACS) [12] is the most comprehensive labeling system.
FACS 2002 defines 27 action units (AU) - 9 upper face and 18 lower face, 14 head positions
and movements, 9 eye positions and movements and 28 other descriptors, behaviors and visibility
codes [7]. The action units can be further defined using five intensity ratings from A (minimum)
to E (maximum). More than 7000 AU combinations have been observed [29].
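As a concrete illustration of how sign-judgment codes could be stored for analysis, the Python sketch below represents coded action unit events with the A-E intensity scale mapped to ordinal values. The data class, field names and coded events are hypothetical stand-ins, not part of FACS or of any tool described in this proposal.

from dataclasses import dataclass

# A-E intensity scale defined by FACS 2002, mapped to ordinal values for convenience.
FACS_INTENSITY = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}  # minimum ... maximum

@dataclass
class ActionUnitEvent:
    """A single coded facial action (sign judgment), e.g. AU12 at intensity D."""
    au: int          # FACS action unit number (e.g. 12 = lip corner puller)
    intensity: str   # one of "A".."E"
    onset_s: float   # time the action starts, in seconds
    offset_s: float  # time the action ends, in seconds

# Hypothetical coding of one short response video:
events = [
    ActionUnitEvent(au=1, intensity="B", onset_s=2.1, offset_s=3.0),    # inner brow raiser
    ActionUnitEvent(au=2, intensity="B", onset_s=2.1, offset_s=3.0),    # outer brow raiser
    ActionUnitEvent(au=12, intensity="D", onset_s=8.4, offset_s=12.7),  # lip corner puller
]

# Convert the ordinal intensity to a number, e.g. for plotting or feature extraction.
peak_smile = max((FACS_INTENSITY[e.intensity] for e in events if e.au == 12), default=0)
print(peak_smile)  # -> 4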
Physiological changes, such as heart rate (HR), respiration rate (RR) and heart rate variability (HRV), are partially controlled by the autonomic nervous system and are important in describing emotional responses in the real world [16]. Physiological changes can contain information about
both the emotional arousal and valence of a person.
By measuring facial responses, gestures, HR, RR and HRV we are able to capture elements
of both the valence and arousal dimensions of emotion. In addition, we can capture levels of
viewer attention. These three dimensions are likely to be important in predicting effectiveness
from responses.

3.3 Remote Measurement of Facial Actions and Physiology
The first example of automated facial expression recognition was presented by Suwa et al. [31].
Over the past 20 years there have been significant advances in the state of the art in action unit
recognition [38]. Our preliminary work has shown that certain actions, such as smiles, can be
accurately detected in low resolution, unconstrained videos collected via the Internet [23].
We have shown that heart rate (HR), respiration rate (RR) and heart rate variability (HRV) can
be measured remotely using camera based technology [26, 27]. This method has been validated
on webcam videos with a resolution of 640x480 pixels and a frame rate of 15 fps (correlation
with contact sensor measurements for HR: r=1.00; for RR: r=0.94; for HRV HF and LF: 0.94;
all correlations p<0.001). Video of this quality should be obtainable over the Internet using our
framework.

3.4 Machine Learning for Affective Computing


The interpretation of facial and physiological responses is a challenging pattern recognition problem. The data are ecologically valid but noisy, and require state-of-the-art techniques in order to achieve strong performance in predicting measures of likability, persuasion or sales. The aim is to take advantage of the huge quantities of data (thousands of video responses) that can be collected using our web-based framework to design models that generalize across a range of content, gender, age and cultural demographics and a broad set of locations. In hierarchical Bayesian models, prior information can be used in a tiered approach to make context-specific predictions. I plan to implement state-of-the-art models, the first examples to be trained on ecologically valid data collected via the Internet.
Increasingly, the importance of considering temporal information and dynamics of facial ex-
pressions has been highlighted. Dynamics can be important in distinguishing between the underlying meanings of expressions [2, 19]. I will implement a method that considers temporal responses to commercials, taking advantage of the rich moment-to-moment data that can be collected using automated facial and physiological analysis. Hidden Markov Models and Conditional Ran-
dom Fields have been shown to be effective at modeling affective information. With multimodal
information the coupling of multiple models may improve the predictions. Hierarchical Bayesian
models have been used to model the interplay of emotions and attention on behavior in adver-
tising [33]. These techniques provide the ability to describe the data temporally and in terms of
multiple modalities.
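As a sketch of how such a temporal model might be applied to frame-level outputs, the example below fits a Gaussian hidden Markov model to smile-probability tracks using the hmmlearn package. This is an assumption for illustration only: the thesis does not commit to a particular library, and the tracks here are synthetic stand-ins for real responses.

import numpy as np
from hmmlearn import hmm  # assumed dependency; any HMM implementation would serve

# Each response is a frame-by-frame smile-probability track (values in [0, 1]).
# Synthetic tracks are used purely to illustrate the fitting interface.
rng = np.random.default_rng(0)
tracks = [np.clip(rng.normal(0.2, 0.1, size=(450, 1)) +
                  (np.arange(450)[:, None] > 300) * 0.5, 0, 1)
          for _ in range(20)]

# hmmlearn expects one stacked observation matrix plus per-sequence lengths.
X = np.vstack(tracks)
lengths = [len(t) for t in tracks]

# A 3-state HMM; state semantics (e.g. neutral, onset, sustained smile) are learned, not fixed.
model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=100, random_state=0)
model.fit(X, lengths)

# Decode the most likely state sequence for one response; state-occupancy statistics
# (time spent smiling, time of onset) could then feed an effectiveness model.
states = model.predict(tracks[0])
print(np.bincount(states, minlength=3) / len(states))

Coupled or hierarchical extensions of this idea would be needed to combine facial, head-gesture and physiological streams, as discussed above.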

4 Proposed Research
4.1 Aim
I propose to analyze story effectiveness based on the emotional responses of viewers, using facial and physiological responses measured over the Internet. The technology allows for the remote measurement of affect via a webcam, and I will design a custom framework and set of models for automatic evaluation of advertising effectiveness based on this research. The dependent measures will be based on established metrics for story and advertising success, including sales, persuasion,

sharing and likability. Achieving this aim will involve the identification of generalizable facial action and physiological features and models that are adaptable to different contexts. This work is the first large-scale study to consider physiological and facial responses measured “in-the-wild” via the cloud to understand the impact of emotional content in storytelling and advertising and how to use it to maximum effect. Figure 4 shows a summary of the proposed framework, which is based on Barrett et al.'s dual-process model of emotion [4]. The valence, arousal and attention of the viewer may be represented by latent variables within the trained models rather than being predicted explicitly.

4.2 Methodology
I will use a web-based framework for collecting responses over the Internet. The first iteration of this framework was presented in [22] and is shown in Figure 1. This framework allows the efficient collection of thousands of naturalistic and spontaneous responses to online videos. Figure 2(a) shows example frames from data collected via this framework. Recruitment of participants has initially been performed by creating a social interface that allows people to share a graph of their automatically analyzed smile response with others, but recruitment can also be performed via Mechanical Turk, or another crowd marketplace, with financial incentives. The latter will be used for more in-depth studies in which voluntary participation is difficult to obtain.
The facial response videos, an example of which is shown in Figure 2(b), will be analyzed
using automated facial action unit detection algorithms developed by Affectiva or MIT. As an
example, Affectiva’s AU12 algorithm is based on Local Binary Pattern (LBP) features with the
resulting features being classified using decision tree classifiers. This outputs a frame-by-frame
measurement of smile probability. An example of the smile probability output is also shown in
Figure 2(b). Although the algorithms will be trained with binary examples (e.g. AU12 vs. non-
AU12) the probability outputs tend to be positively correlated with the intensity of the action,
as shown in Figure 2(b). However, we must acknowledge that this interpretation may not always be
accurate. Classifiers for AU1+2 (Frontalis/eyebrow raise), AU4 (Corrugator/brow furrow) and
AU12 (Zygomatic Major/smile) will be used in addition to any others that are available by the
time that analysis is performed. AU1+2, AU4 and AU12 should capture the main components of
surprise, confusion and amusement responses. Head turning, tilting and general motion will be
calculated through the use of a head pose detector and facial feature tracker. The intention is to
capture information about the attention of the viewers.
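Affectiva's production classifiers are not reproduced here; the sketch below is only a generic stand-in with the same shape of computation, assuming scikit-image, scikit-learn and synthetic training data: LBP histograms computed from grayscale face crops are fed to a tree-ensemble classifier whose positive-class probability is used as the frame-level smile score.

import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.ensemble import RandomForestClassifier

def lbp_histogram(gray_face, P=8, R=1.0):
    """Uniform LBP histogram for a grayscale (uint8) face crop."""
    lbp = local_binary_pattern(gray_face, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

# Synthetic stand-in for labeled training crops (AU12 present vs. absent).
rng = np.random.default_rng(0)
faces = rng.integers(0, 256, size=(200, 48, 48), dtype=np.uint8)
labels = rng.integers(0, 2, size=200)  # 1 = smile present, 0 = absent

X = np.array([lbp_histogram(f) for f in faces])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

# Frame-by-frame "smile probability" for a new response video: one score per frame.
new_frames = rng.integers(0, 256, size=(30, 48, 48), dtype=np.uint8)
smile_track = clf.predict_proba([lbp_histogram(f) for f in new_frames])[:, 1]
print(smile_track.shape)  # (30,)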
Heart rate, respiration rate and heart rate variability features are calculated using a non-contact
method described in [26, 27]. Figure 3 shows graphically how our algorithm can be used to extract
the blood volume pulse (BVP) and subsequently HR, RR and HRV information from the RGB
channels in a video containing a face. Specifically, the facial region within each video frame is
segmented automatically and a spatial average of the RGB color values is calculated for the region
of interest (ROI). For a given time window (typically 20-30s) the raw RGB signals are normalized
and detrended. A blind source separation technique (Independent Component Analysis (ICA)) is
then used to calculate a set of source signals. The source signal with the strongest BVP signal is
filtered and used to calculate the HR, RR and HRV. This method has been validated against contact
sensors and proven to be accurate.
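A minimal sketch of the pipeline just described is given below, assuming NumPy, SciPy and scikit-learn; the window length, filter settings and the simple band-power heuristic for selecting the pulse source are illustrative choices, not the exact parameters of [26, 27].

import numpy as np
from scipy.signal import detrend, butter, filtfilt, periodogram
from sklearn.decomposition import FastICA

def heart_rate_from_rgb(rgb_means, fps=15.0):
    """Estimate heart rate (bpm) from per-frame mean RGB values of the face ROI.

    rgb_means: array of shape (n_frames, 3), spatial averages of R, G, B over the ROI.
    """
    # 1. Detrend and z-normalize each color channel over the analysis window.
    x = detrend(rgb_means, axis=0)
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

    # 2. Blind source separation: recover three independent source signals.
    sources = FastICA(n_components=3, random_state=0).fit_transform(x)

    # 3. Select the source with the strongest power in a plausible pulse band (0.75-4 Hz)
    #    and band-pass filter it.
    b, a = butter(3, [0.75, 4.0], btype="bandpass", fs=fps)
    best, best_power = None, -np.inf
    for s in sources.T:
        f, p = periodogram(s, fs=fps)
        band_power = p[(f >= 0.75) & (f <= 4.0)].sum()
        if band_power > best_power:
            best, best_power = filtfilt(b, a, s), band_power

    # 4. Heart rate = dominant in-band frequency of the selected source, in beats per minute.
    f, p = periodogram(best, fs=fps)
    band = (f >= 0.75) & (f <= 4.0)
    return 60.0 * f[band][np.argmax(p[band])]

# Example: a 30 s window at 15 fps with a synthetic 1.2 Hz (72 bpm) pulse in the green channel.
t = np.arange(0, 30, 1 / 15.0)
rgb = np.random.default_rng(0).normal(0, 0.5, (len(t), 3))
rgb[:, 1] += np.sin(2 * np.pi * 1.2 * t)
print(round(heart_rate_from_rgb(rgb)))  # approximately 72

Respiration rate and HRV are derived from the same separated source in [27]; this sketch stops at heart rate.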
There will be limitations involved in collecting data over the Internet; the uncontrolled nature of this research presents several challenges. Firstly, “clean” data is not always available: motion

6
[Figure 1 diagram: client-side flow (1. homepage/introduction, 2. consent to webcam access, 3. media playback with simultaneous Flash capture of webcam frames sent to the server, 6. self-report questions, 7. behavioral measures such as sharing/click-through recorded) and server-side steps (4. webcam footage stored, 5. video processed to calculate facial and physiological responses).]

Figure 1: Overview of the user experience and the web-based framework used to crowdsource the facial videos. The video from the webcam is streamed in real time to a server where automated facial expression analysis is performed. All video processing can be performed on the server side.

and context of the users will vary considerably and result in greater noise within our measurements
than if the data were collected in a laboratory. In addition, the video recordings are likely to have
a lower frame rate and resolution compared to those that could be collected in a laboratory. As a result, some of the more subtle and faster micro-expressions may be missed and the physiological
measurements will be noisier. Secondly, detailed and reliable profiles of the participants may be
difficult to ensure in all cases. In order to address these weaknesses we will compare the results
obtained against those from analyses of datasets collected within controlled laboratory settings.
The computer vision methods for extracting facial and physiological response features will be
validated in controlled studies with ground truth measures and against videos of differing qualities
in order to ensure reliability on data collected over the Internet. Specifically, I intend to recruit a
number of subjects (10-20) and record video that matches those collected over the Internet with
ground truth measures of physiology. The accuracy of the system can be characterized under these
conditions. The AU detection algorithms will be tested against hand labeled examples of frames
collected over the Internet as shown in [23].
By performing analysis online we can collect data from large populations with considerable
representation from diverse subgroups (gender/age/cultural background). We will recruit 150 par-
ticipants for the second study proposed below and a similar number for the subsequent studies. In
these cases recruitment will be possible through existing market research participant pools. How-
ever, recruitment can also occur through a variety of other mechanisms (such as voluntary participation and paid crowd marketplaces), with age, gender and cultural background recorded via self-report.
The extracted features will be collected alongside self-report responses, as these are the current
standard, and behavioral metrics. In order to minimize effects due to primacy and recency the
order in which advertisements are presented will be randomized. I plan to collaborate with MIT

[Figure 2 panels: (a) example ad and response frames; (b) smile-probability track (y-axis: smile probability, 0-1; x-axis: time, 0-30 s) for one response.]

Figure 2: a) Example frames of data collected using a web-based framework similar to that
described in Figure 1. b) A series of frames from one particular video, showing an AU12
(smile/amusement) response. The smile track demonstrates how greater smile intensity is posi-
tively correlated with the probability output from the classifier.

[Figure 3 panels: (a) automated face tracking, (b) separation into red, green and blue channels, (c) raw RGB traces, (d) separated source signals, (e) analysis of the BVP source to obtain heart rate, respiration rate and heart rate variability (HF/LF).]

Figure 3: Graphical illustration of our algorithm for extracting heart rate, respiration rate and heart
rate variability from video images of a human face as described in [27].

[Figure 4 diagram: Stimuli (story/narrative) -> Emotion (valence, arousal, attention) -> Measured Response (physiology: HR, RR, HRV; facial behavior; head gestures) -> Effect (likability, memory, persuasion, purchase, sharing); controlled processing is also indicated.]

Figure 4: Schematic of the proposed research model. Inspired by Barrett et al.’s dual-process view
of emotion [4]. The measured responses will capture information about the valence, arousal and
attention of the viewer and will be used to predict the effects of the story/narrative.

Media Lab member companies in order to obtain sales data related to the advertisements.

4.3 Studies
I propose to carry out a series of studies in this research. A preliminary study has already been
performed and was the first-in-the-world attempt to collect facial responses to videos on a large
scale over the Internet. This involved testing three commercials which appeared during the 2011
Super Bowl. The website was live for over a year and can be found at [1]. Visitors to the website
were asked to opt-in to watch short videos and have their facial expressions recorded and analyzed.
Immediately following each video, visitors completed a short self-report questionnaire. The videos
from the webcam were streamed in real time at 15 frames per second and a resolution of 320x240 to a server where automated facial expression analysis was performed. Approximately 7,000 videos were collected in this study. These data will be used to build models for predicting advertising liking
purely from automatically measured behavior. In addition, I will investigate whether advertising
liking can be predicted effectively from only a subset of the response (e.g. the first 25% or 50%).
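As a sketch of how this partial-response question could be operationalized (the truncation helper and the particular summary features below are hypothetical choices, not a committed design), a response track can be cut to its first 25% or 50% before features are computed and passed to the same prediction models.

import numpy as np

def summary_features(track, fraction=1.0, fps=15.0):
    """Summary features of a frame-level response track (e.g. smile probability),
    computed from only the first `fraction` of the response."""
    n = max(1, int(len(track) * fraction))
    t = np.asarray(track[:n], dtype=float)
    return {
        "mean": t.mean(),
        "peak": t.max(),
        "peak_time_s": float(np.argmax(t)) / fps,
        "rise": t[-1] - t[0],    # net change over the observed window
        "auc": t.sum() / fps,    # area under the track, in probability-seconds
    }

# Compare features from the full response with those from the first 25% and 50%.
track = np.concatenate([np.zeros(150), np.linspace(0, 0.9, 150), np.full(150, 0.9)])
for frac in (0.25, 0.5, 1.0):
    print(frac, summary_features(track, fraction=frac))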
The second study will extend the framework and methodology used in the first study to a much
greater number of commercials and I will extend the self-report questioning to cover more in-depth
questions. Specifically, I will be collecting and analyzing data for 150 viewers and 16 commercials
(with each viewer watching a subset of the commercials). Video recordings of the participants'
responses to the content will be collected and analyzed as described in the Methodology section.
Self-report measures of persuasion, likability and familiarity will be recorded (post-viewing Likert-
scale reports). Pre- and post-launch sales data for the products will be available. The videos
collected in this study will be of a similar quality as above (resolution: 320x240, frame rate: 15
fps). This dataset will allow me to extend the modeling carried out in the preliminary study to
build and evaluate models for predicting likability, persuasion and sales.

The third study I propose will collect and analyze data for a set of advertisement concepts across different product ranges. This will involve approximately 100 viewers watching multiple (2 or 3) advertisement concepts. Self-report measures of persuasion, likability and familiarity will be recorded. This study will compare similar but distinct advertising concepts for the same product. I will investigate the ability of measured emotional responses to distinguish between the efficacy of subtly different concepts for the same product.
The structure of the latter two studies will allow for richer data to be collected and a more
controlled experimental design whilst still allowing us to collect naturalistic and spontaneous data
“in-the-wild”. I will investigate the role of facial behavior and head gestures, HR, RR and HRV in
predicting the variables of persuasion, likability and sales. The dimensions of valence, arousal and
attention will be modeled as latent variables within the model.
As described above I will be carrying out small-scale lab based studies to evaluate the accuracy
of the physiological measurement under a greater range of conditions. This will involve a smaller
number of participants (10-20) viewing content on a computer or laptop whilst a video is recorded
of their face. The method will be evaluated by its correlation with, and accuracy when compared
to, measurements from contact sensors. Data for 16 participants have already been collected; if necessary, further data collection can be performed. For these experiments recruitment can be from
the local community.
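As an illustration of the comparison just described (the numbers below are placeholders, not results), per-window estimates from the camera-based method and from contact sensors can be aligned in time and compared by correlation and absolute error.

import numpy as np
from scipy.stats import pearsonr

def agreement_metrics(remote_hr, contact_hr):
    """Correlation and error metrics between camera-based and contact-sensor heart rate.

    remote_hr, contact_hr: per-window estimates (bpm), aligned in time.
    """
    remote_hr = np.asarray(remote_hr, dtype=float)
    contact_hr = np.asarray(contact_hr, dtype=float)
    r, p = pearsonr(remote_hr, contact_hr)
    mae = np.mean(np.abs(remote_hr - contact_hr))            # mean absolute error, bpm
    rmse = np.sqrt(np.mean((remote_hr - contact_hr) ** 2))   # root mean square error, bpm
    return {"r": r, "p": p, "mae_bpm": mae, "rmse_bpm": rmse}

# Hypothetical per-window estimates from one participant's session:
print(agreement_metrics([71, 74, 78, 80, 76], [70, 75, 77, 81, 75]))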

4.4 Plan for Completion of the Research


Table 1 shows my tentative plan for completion of the research described in this proposal.

Timeline                   Work                                       Progress

January-March 2011         Analysis of data from preliminary study    completed
April-June 2012            Design of studies                          ongoing
September-November 2012    Implementation of studies                  planned
November 2012-March 2013   Analysis of data collected                 planned
March 2013                 First thesis outline                       planned
April-June 2013            Complete analysis of study data            planned
July 2013                  Second thesis outline                      planned
August-December 2013       Thesis writing                             planned
January-February 2014      Thesis defense                             planned

Table 1: Plan for completion of my doctoral thesis research.

4.5 Human Subjects Approval


The protocol for all studies will be approved by the Massachusetts Institute of Technology Com-
mittee On Use of Humans as Experimental Subjects (COUHES).

4.6 Collaborations
I will be collaborating with Thales Teixeira at Harvard Business School on the modeling of effec-
tiveness based on emotional responses. I will be working at Affectiva for one semester in order to
complete parts of the data collection described. I will be building on the data collection framework
and using the facial action unit detection algorithms.

5 Biography
Daniel McDuff is a PhD candidate in the Affective Comput-
ing group at the MIT Media Lab. McDuff received his bachelor's degree, with first-class honors, and master's degree in engineering from Cambridge University. Prior to joining the Media
Lab, he worked for the Defense Science and Technology Labo-
ratory (DSTL) in the UK. He is interested in using computer vi-
sion and machine learning to enable the automated recognition
of affect, particularly in the domain of storytelling and advertis-
ing.

Email: djmcduff@mit.edu
Web: media.mit.edu/~djmcduff

References
[1] Web address of data collection site: http://www.forbes.com/2011/02/28/detect-smile-webcam-affectiva-mit-
media-lab.html.
[2] Z. Ambadar, J.F. Cohn, and L.I. Reed. All smiles are not created equal: Morphology and timing of smiles
perceived as amused, polite, and embarrassed/nervous. Journal of nonverbal behavior, 33(1):17–34, 2009.
[3] T. Ambler, A. Ioannides, and S. Rose. Brands on the brain: Neuro-images of advertising. Business Strategy
Review, 11(3):17–30, 2000.
[4] L.F. Barrett, K.N. Ochsner, and J.J. Gross. On the automaticity of emotion. Social psychology and the uncon-
scious: The automaticity of higher mental processes, pages 173–217, 2007.
[5] A.L. Biel. Love the ad. buy the product? Admap, September, 1990.
[6] P.D. Bolls, A. Lang, and R.F. Potter. The effects of message valence and listener arousal on attention, memory,
and facial muscular responses to radio advertisements. Communication Research, 28(5):627–651, 2001.
[7] J.F. Cohn, Z. Ambadar, and P. Ekman. Observer-based measurement of facial expression with the Facial Action
Coding System. Oxford: NY, 2005.
[8] R.R. Cornelius. The science of emotion: Research and tradition in the psychology of emotions. Prentice-Hall,
Inc, 1996.
[9] C. Darwin, P. Ekman, and P. Prodger. The expression of the emotions in man and animals. Oxford University
Press, USA, 2002.
[10] P. Ekman. Facial expression and emotion. American Psychologist, 48(4):384, 1993.
[11] P. Ekman, W.V. Freisen, and S. Ancoli. Facial signs of emotional experience. Journal of Personality and Social
Psychology, 39(6):1125, 1980.

[12] P. Ekman and W.V. Friesen. Facial action coding system. 1977.
[13] W. Gordon. What do consumers do emotionally with advertising? Journal of Advertising research, 46(1), 2006.
[14] M.C. Green. Transportation into narrative worlds: The role of prior knowledge and perceived realism. Discourse
Processes, 38(2):247–266, 2004.
[15] M.C. Green, J.J. Strange, and T.C. Brock. Narrative impact: Social and cognitive foundations. Lawrence
Erlbaum, 2002.
[16] H. Gunes, M. Piccardi, and M. Pantic. From the lab to the real world: Affect recognition using multiple cues and
modalities. Affective computing: focus on emotion expression, synthesis, and recognition, pages 185–218, 2008.
[17] R.I. Haley. The ARF Copy Research Validity Project: Final report. In Transcript Proceedings of the Seventh Annual ARF Copy Research Workshop, 1990.
[18] R.L. Hazlett and S.Y. Hazlett. Emotional response to television commercials: Facial EMG vs. self-report. Journal of Advertising Research, 39:7–24, 1999.
[19] M. E. Hoque and R.W. Picard. Acted vs. natural frustration and delight: many people smile in natural frustration.
In Automatic Face & Gesture Recognition and Workshops (FG 2011), 2011 IEEE International Conference on.
IEEE, 2011.
[20] A. Lang. Involuntary attention and physiological arousal evoked by structural features and emotional content in TV commercials. Communication Research, 17(3):275–299, 1990.
[21] M.D. Lieberman, N.I. Eisenberger, M.J. Crockett, S.M. Tom, J.H. Pfeifer, and B.M. Way. Putting feelings into
words. Psychological Science, 18(5):421, 2007.
[22] D. McDuff, R. El Kaliouby, and R. Picard. Crowdsourced data collection of facial responses. In Proceedings of
the 13th international conference on Multimodal Interaction. ACM, 2011.
[23] D. J. McDuff, R. E. Kaliouby, and R. W. Picard. Crowdsourcing Facial Responses to Online Videos. IEEE
Transactions on Affective Computing, 2012.
[24] A. Mehta and S.C. Purvis. Reconsidering recall and emotion in advertising. Journal of Advertising Research,
46(1):49, 2006.
[25] B. Parkinson and A.S.R. Manstead. Making sense of emotion in stories and social life. Cognition & Emotion,
7(3-4):295–323, 1993.
[26] M.Z. Poh, D.J. McDuff, and R.W. Picard. Non-contact, automated cardiac pulse measurements using video
imaging and blind source separation. Optics Express, 18(10):10762–10774, 2010.
[27] M.Z. Poh, D.J. McDuff, and R.W. Picard. Advancements in noncontact, multiparameter physiological measure-
ments using a webcam. Biomedical Engineering, IEEE Transactions on, 58(1):7–11, 2011.
[28] M.L. Ray and R. Batra. Emotion and persuasion in advertising: What we do and don’t know about affect.
Graduate School of Business, Stanford University, 1982.
[29] K.R. Scherer and P. Ekman. Methodological issues in studying nonverbal behavior. Handbook of methods in
nonverbal behavior research, pages 1–44, 1982.
[30] N. Schwarz and F. Strack. Reports of subjective well-being: Judgmental processes and their methodological
implications. Well-being: The foundations of hedonic psychology, pages 61–84, 1999.
[31] M. Suwa, N. Sugie, and K. Fujimora. A preliminary note on pattern recognition of human emotional expression.
In International Joint Conference on Pattern Recognition, pages 408–410, 1978.
[32] G.W. Taylor, I. Spiro, C. Bregler, and R. Fergus. Learning Invariance through Imitation. In Proceedings of IEEE
Conference on Computer Vision and Pattern Recognition, 2011.
[33] T. Teixeira, M. Wedel, and R. Pieters. Emotion-induced engagement in internet video ads. Journal of Marketing
Research, (ja):1–51, 2010.

[34] J. Whitehill, G. Littlewort, I. Fasel, M. Bartlett, and J. Movellan. Toward practical smile detection. Pattern
Analysis and Machine Intelligence, IEEE Transactions on, 31(11):2106–2111, 2009.
[35] F.H. Wilhelm and P. Grossman. Emotions beyond the laboratory: Theoretical fundaments, study design, and
analytic strategies for advanced ambulatory assessment. Biological Psychology, 84(3):552–569, 2010.
[36] P. Winkielman, G.G. Berntson, and J.T. Cacioppo. The psychophysiological perspective on the social mind.
Blackwell handbook of social psychology: Intraindividual processes, pages 89–108, 2001.
[37] R.B. Zajonc. Feeling and thinking: Preferences need no inferences. American psychologist, 35(2):151, 1980.
[38] Z. Zeng, M. Pantic, G.I. Roisman, and T.S. Huang. A survey of affect recognition methods: Audio, visual, and
spontaneous expressions. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(1):39–58, 2009.

Committee Biographies
Jeffrey Cohn
Professor of Psychology
University of Pittsburgh

Jeffrey Cohn is Professor of Psychology at the University of Pittsburgh and Adjunct Faculty
at the Robotics Institute at Carnegie Mellon University. He has led interdisciplinary and inter-
institutional efforts to develop advanced methods of automatic analysis of facial expression and
prosody; and applied those tools to research in human emotion, social development, non-verbal
communication, psychopathology, and biomedicine. He co-chaired the 2008 IEEE International
Conference on Automatic Face and Gesture Recognition (FG2008) and the 2009 International
Conference on Affective Computing and Intelligent Interaction (ACII2009). He has co-edited
two recent special issues of the Journal of Image and Vision Computing. His research has been
supported by grants from the National Institutes of Health, National Science Foundation, Autism
Foundation, Office of Naval Research, Defense Advanced Research Projects Agency, and the Tech-
nical Support Working Group.

Ashish Kapoor
Senior Research Scientist
Microsoft Research, Redmond

Ashish Kapoor is a researcher with the Adaptive Systems and Interaction Group at Microsoft
Research, Redmond. He focuses on machine learning and computer vision with applications in user modeling, affective computing and human-computer interaction. Ashish received his PhD from the MIT Media Lab; his doctoral thesis looked at building discriminative models for pattern recognition with incomplete information (semi-supervised learning, imputation, noisy data, etc.). Most of his earlier work focused on building new machine learning models for affect
recognition. A significant part of that work involved automatic analysis of non-verbal behavior
and physiological responses.

Thales Teixeira
Assistant Professor of Business Administration
Harvard Business School

Thales Teixeira is Assistant Professor in the Marketing Department of the Harvard Business School.
His research focuses on the economics of attention. He explores the rules of (implicit) transaction
of attention in a marketplace in which consumer attention is a scarce resource, arguably even
scarcer than money or time. His work has also appeared in Marketing Science. He received his
PhD in Business from the University of Michigan and holds a Master of Arts in Statistics (University
of Sao Paulo, Brazil) and a Bachelor of Arts in Administration (University of Sao Paulo, Brazil).
Before entering academia, he consulted for companies such as Microsoft and Hewlett-Packard. At
Harvard, he teaches an MBA course in Marketing.

