
2018 Proceedings of PICMET '18: Technology Management for Interconnected World

Impact of External Stimuli on Social Media Engagement: A SME Perspective
Olumide Adebayo¹, Elif Kongar², Nasir Jamil Sheikh¹
¹ Department of Technology Management, School of Engineering, University of Bridgeport, Bridgeport, CT USA
² Departments of Mechanical Engineering and Technology Management, School of Engineering, University of Bridgeport, Bridgeport, CT USA

Abstract: This paper aims to define and analyze the factors contributing to the frequency and rate of social media platform utilization. The focus of the research is on social media big data and weather as an external force. Weather-related data are juxtaposed with Twitter metadata, and various data mining algorithms are used to determine the relationships between an external condition and the level of user engagement on social media. The research findings indicate that weather attributes, alone and in combination with others, have a direct influence on the level of user engagement. While there are several leading social media platforms in the USA, the research used Twitter data collected over a one-month period. The highlights of this research will help organizations that run social media platforms estimate resource utilization based on provided weather forecasts. The paper illustrates the drivers of social media engagement while providing data sourcing and social media platform organizations with directions for resource allocation and provisioning.

I. INTRODUCTION

Today, one is not considered part of cyberspace unless s/he is participating in Social Media (SM) [1]. The Internet has become a staple of our daily living [2], while engagement in social media has greatly increased across all age groups [3] over the past few years. Contrary to the body of knowledge on cause and effect, the common belief is that people simply use social media because it is the current buzz. For an organization providing social media platform services, dealing with uncertainties with regard to new technology is a major concern [4]. A major part of this uncertainty can be attributed to the level of engagement. Almost all small-to-midsize enterprises (SMEs) have an account with Facebook, Twitter, or both, but the most important thing when it comes to marketing is the firm’s performance [5]. This category of businesses is the major driving force for any thriving economy [6]. The vast majority of them fail to realize the magnitude of creativity and innovation that they can derive from SM, either because they are ill-equipped or because they simply lack the necessary knowledge.

Many external forces influence individuals on a daily basis. These include, but are not limited to, weather, trending news [7], the economy, and socio-political factors. In this study, the focus is on the weather as an external force and on Twitter, a popular social media platform. Both weather-related data and the data obtained from Twitter are fed into data mining algorithms to investigate the relationship between weather conditions and the level of social media engagement. The research findings indicate that various weather attributes, alone or in combination with others, have a direct influence on the level of engagement. While there are a few other leading social media platforms in the US, the research was limited to Twitter and to data collected over a one-month period. The highlights of this research will help an organization that runs a social media platform with project resource allocation based on various forecasts of external factors that affect social media engagement.

A. Big Data

Big data is not a simple discrete technology, but rather a phenomenon resulting from the vast amount of raw information generated across society and collected by commercial and government organizations [8]. It is a phenomenon defined in terms of three V’s. The first V, volume, refers to the magnitude of data. In big data, data size is on the order of petabytes. This number, however, is relative, and what is considered big data today may not meet the threshold in the future [9]. Variety is another dimension of big data and refers to the varying structure of the data set. Traditional data sets are usually structured and contain plain text, while big data sets are likely to contain a combination of text, images, audio, video and several other unstructured types of data. The third V is velocity, representing the rate or speed at which data is created and ingested. The proliferation of digital devices such as smartphones and sensors has led to an unprecedented rate of data creation and has become a major force driving the need for real-time analytics and evidence-based planning [9]. The literature offers a few other V’s to describe big data, even though volume, velocity and variety are the three generally agreed-upon dimensions by which big data is defined.

B. Social Media

Social Media “have become essential communication tools for millions of people - in short, they have become ubiquitous” [10]. Social Media is defined as “a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of User Generated Content” [1]. It is a place for people to share ideas, provide feedback, and state opinions and interests [11, 12]. Currently, top social media sites include Facebook, YouTube, Twitter and LinkedIn [13].

C. Engagement

The concept of engagement is still subject to interpretation, and over the years researchers have held varying opinions about

978-1-890843-37-3 ©2018 PICMET


its definition [14]. This is also true in social media related research. On some social media platforms, engagement is measured in terms of views, indicating how many people or how many times users read a post or watched a video. Other platforms disregard the number of views and measure engagement in terms of clicks, i.e., retweet, like, share, and save. Others measure engagement in terms of contributions, i.e., posts, messages, blogs and tweets.

This research defines engagement as a combination of the last two approaches. This is an important distinction, since views do not always reflect active human actions and tend to introduce more noise than meaningful data.

D. Association Rules

Association rule mining aims to detect all the rules included in the database using predetermined minimum support and confidence constraints [15]. This method has been extensively studied in the literature [16-18], and as a result, several association rule algorithms such as Apriori [15], Close [19], Eclat, Clique [16] and CBA [20] have been developed over the years. This research utilizes the Apriori algorithm due to its speed and simplicity. The Apriori algorithm employs the minimum support (minsup) concept [15] and hence provides relatively faster solutions. A minimum support is the number of data cases a rule must cover to be acceptable within a data set. This value is user-defined and, while it can appear arbitrary, choosing it relies on years of experience and requires familiarity with the data set.

II. RELATED WORK

The importance of social media or social networking big data in business intelligence and decision-making cannot be overemphasized. Social media big data utilization has proven itself to be crucial for any business that wishes to survive in today’s economy. The utilization involves capturing, analyzing and utilizing collected big data in a timely manner [11]. Today’s businesses need to not only value customers but also continuously monitor the impact of social media on their customer base [21].

The literature offers a large number of studies involving social media engagement [10, 22-25]. The majority of these focus primarily on measuring, capturing, transforming and monetizing user engagement and are driven by specific communication programs such as media relations and crisis management [23]. Therefore, previous research tends to investigate how people are engaging rather than why they are engaging. In one of the rare studies focusing on the latter, Khan [25] explored the reasons for social media engagement. Similarly, Kietzmann et al. [12] proposed a framework that defined social media by using functional social media building blocks. These studies, however, do not aim at addressing the motivation or the rationale behind social media engagement.

Looking at cause and effect in social media engagement, it is worth noting that Mackie highlighted a group of causes that “are insufficient but necessary part of a condition which is itself unnecessary but sufficient for the result” [26]. On their own, these conditions are unlikely to lead to the result, but when combined with other factors, events, situations or circumstances they snowball into the result under investigation. Patrick Suppes [27] took it further and emphasized that the concept of direct remote causes was consistent with relativity. This alludes to the fact that some event might be a direct cause of an effect, but its ability to have the desired impact is relative to all other factors lining up appropriately. This is similar to the conclusion reached in [28] while looking into the effects of various external variables on mobile service adoption.

III. METHODOLOGY

There are hundreds of social media sites, platforms or networks as of today, each with its own unique focus and attributes. This research focuses on Twitter engagement for a number of reasons:
• Ease of use: The Twitter search Application Program Interfaces (APIs) are relatively easy to use.
• Availability of pre-built tools: Twitter embodies several built-in tools for easier data retrieval.
• Flexibility in structure: The Twitter data structure provides more flexibility for data manipulation.

Similarly, despite the fact that there are several sources of weather related data, this research utilizes forecast.io, a global weather service, for its a) reputation as the source of reliable weather data served on the iOS platform, b) ease of use, and c) clean and simple interface.

A. Data Collection

This paper utilizes Spring XD as a container for extracting tweets from Twitter. Spring XD is a Java container that supports data streaming for big data purposes. The Twitter stream was set up to search for tweets that contain the five (5) vowels: a, e, i, o, u. This allowed for continuous and uninterrupted extraction of data over a four (4) week period using an Amazon EC2 free-tier server. A total of seven hundred thousand (700,000) tweets were extracted as a result.

A custom Java application was used to extract weather data for each of the fifty-two states covering the same period. Weather data included minimum and maximum temperatures for the day along with the humidity index. The link between geo-tagged tweets and weather data is the state-based geographical coordinates. This data is publicly available on the Internet. Figure 1 presents the heat map of tweets by US state for the 33,000 cleaned tweets. Red indicates states where fewer than 1,000 tweets were registered, and the intensity of green represents counts from 1,000 to 5,000+.

Figure 1 Tweet Heat Map

B. Data Cleaning

One of the major big data related issues this research dealt with was geo-tagging. Geo-tagging is the process of identifying geographical information from the data and assigning spatial coordinates, viz., longitude and latitude, to each data point [29]. Even with the recent heightened interest
in Location-Based Social Network (LBSN), this is still a great challenge [30]. With Twitter, not all tweets that are geo-enabled carry geographical information. There is a need to actually parse each tweet to extract the geographical information. The fact that tweets are returned as JavaScript Object Notation (JSON) data objects was useful during this step. There were three (3) possible attributes that could contain geographical information, viz., geo, place and coordinates [31].

Tweets without values in any of these attributes were discarded as unusable. Furthermore, only the tweets with identifiable United States geography were included in the dataset. Following these steps, the number of tweets was reduced to thirty-three thousand (33,000).

C. Data Integration and Transformation

After identifying tweets originating from various locations in the United States via the geo-tags in the tweets, there was a need to match the identified locations to actual states within the US. This was done via iterative programming and by building a corpus of location names used by tweeters. As shown in Table 1, the process involved more than a simple find and replace.

TABLE 1 SAMPLE STATE TRANSFORMATION
Original                      Transformed
Not Self-centered, Texas      TX
Lilburn, ga                   GA
Between some thighs, La       LA
New Orleans , La!             LA

Apriori association rule mining is designed for categorical data, which prohibits its direct use on numerical datasets [15]. In order to ensure the applicability of the rule, a number of transformations are conducted in the database. For instance, in the original dataset, the daily temperature for a given coordinate was expressed as a range with its lowest and highest values. The daily temperature is then converted to its categorical equivalent using the corresponding average temperature as shown in Table 2.

TABLE 2 AVERAGE TEMPERATURE TRANSFORMATION
Average Daily Temperature          Category
Less than 65 (<65)                 Low
Between 65 and 80 (65-80)          Medium
Greater than 80 (>80)              High

The temperature range categorization above is based on [32], which surveyed what different respondents felt were temperature ranges. The category transformation for humidity is shown in Table 3 below.

TABLE 3 AVERAGE HUMIDITY TRANSFORMATION
Humidity Index                     Category
Less than 0.3 (<0.3)               Low
Between 0.3 and 0.5 (0.3-0.5)      Medium
Greater than 0.5 (>0.5)            High

Figure 2 provides a visual for the raw user social media engagement based on various daily weather conditions. The chart is stacked, and each cluster is associated with varying humidity levels and corresponding average daily temperatures. Each color band represents the actual count of tweets after data cleaning. The blue band represents the count of tweets from regions with low humidity or temperature, whereas green represents the high band and red represents the medium band.

IV. ANALYSIS AND RESULTS

After applying the Apriori algorithm for association rules [33], it is observed that both high temperature and high humidity days resulted in high tweet counts. This result does not guarantee that high temperature and high humidity days will always produce similar results, even though there were no instances where high temperature or high humidity days were associated with low tweet counts. The presence of such an association would have undermined any theory of cause and effect that this study was pursuing.

There were no associations between low humidity and high tweet count, or between low temperature and high tweet count, at the support level (0.2) that was experimented with. The absence of these nullifying associations supports the hypothesis that there are external factors affecting the level of engagement in social media. A 0.2 minimum support level means that an association rule must hold for at least 20 percent of the data set. A minsup of 0.2 has historically been used when dealing with social media data sets [34].

Figure 2 Stacked Clustered Plot of Tweet Count By Weather Condition
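As a rough illustration of the transformations in Tables 2 and 3 and of how a rule is checked against a minimum support of 0.2, the following Python sketch bins temperature and humidity into categories and computes the support of a single rule. It is not the implementation used in this study: the function names, the treatment of boundary values as Medium, the pre-labeled tweet-count levels, and the six toy observations are all assumptions made for illustration only.

```python
# Illustrative sketch only; names, boundary handling and data are assumptions.

def temperature_category(avg_temp):
    """Table 2 bins: below 65 Low, 65 to 80 Medium, above 80 High."""
    if avg_temp < 65:
        return "Low"
    if avg_temp <= 80:
        return "Medium"
    return "High"


def humidity_category(humidity):
    """Table 3 bins: below 0.3 Low, 0.3 to 0.5 Medium, above 0.5 High."""
    if humidity < 0.3:
        return "Low"
    if humidity <= 0.5:
        return "Medium"
    return "High"


def to_record(avg_temp, humidity, tweet_level):
    """Turn one observation into a set of categorical items."""
    return frozenset({
        "temp=" + temperature_category(avg_temp),
        "humidity=" + humidity_category(humidity),
        "tweets=" + tweet_level,  # hypothetical pre-assigned tweet-count level
    })


def rule_support(records, antecedent, consequent):
    """Fraction of records that contain every item of the rule."""
    if not records:
        return 0.0
    items = set(antecedent) | set(consequent)
    hits = sum(1 for record in records if items <= record)
    return hits / len(records)


MIN_SUPPORT = 0.2

# Toy observations (average temperature, humidity index, tweet-count level).
# These values are invented and are not taken from the study's data set.
OBSERVATIONS = [
    (85.0, 0.70, "High"),
    (90.0, 0.60, "High"),
    (78.0, 0.65, "High"),
    (70.0, 0.40, "Low"),
    (60.0, 0.20, "Low"),
    (82.0, 0.55, "Low"),
]
records = [to_record(*obs) for obs in OBSERVATIONS]

if __name__ == "__main__":
    support = rule_support(records, {"humidity=High"}, {"tweets=High"})
    print("support:", support, "meets minsup:", support >= MIN_SUPPORT)
```

A full Apriori run would additionally enumerate candidate itemsets level by level and prune those below the minimum support; the sketch only shows how a single candidate rule is scored once the data are categorical.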


TABLE 4 ASSOCIATION RULE RESULT
Association                                   Support
Average temperature => high tweet count       0.2
High Humidity => high tweet count             0.4
High Temperature => high tweet count          0.2

This study determined that there was most likely a positive correlation between the prevailing weather conditions and the level of engagement in social media. While it cannot be conclusively stated that a high tweet count is the result of high humidity or high average daily temperature, it can be claimed that days with high average temperature or a high humidity index are associated with high tweet counts. It should also be noted that the analysis does not indicate any association between low humidity or low average temperature days and low tweet count.

Since there is no association between low humidity and high tweet count or low temperature and high tweet count, the notion of a correlation cannot be rejected. A possible reason for this is that on hot or humid days, people tend to stay away from outdoor activities in favor of indoor activities. This in turn results in high utilization of social media applications.

Figure 3 presents humidity at varying levels, viz., low, medium and high, with the corresponding numbers of tweets for each category. Despite minor overlaps, it can be observed from the figure that the varying levels of humidity correspond to different levels of tweet counts.

[Plot: "Humidity vs. Number of Tweets"; x-axis: Number of Tweets (0 to 1500); y-axis: Humidity (0 to 1); series: Medium Humidity, High Humidity, Low Humidity]

Figure 3 Scatter Plot of Tweet Counts By Varying Humidity Levels

In order to understand the relationship between tweet counts and levels of humidity, the average and standard deviation of each data set are calculated (Table 5).

TABLE 5 AVERAGE AND STANDARD DEVIATION OF TWEET COUNTS FOR VARYING LEVELS OF HUMIDITY
               Low      Medium    High
Mean (μ)       13.41    73.03     36.76
Std. dev.      17.71    122.78    73.27
Observations   49       62        739

As can be observed from the table, the medium level of humidity has the highest average, whereas the low level of humidity has the lowest average number of tweets. This can be attributed to the significantly larger number of observations at the high humidity level (739) as opposed to 49 for low humidity and 62 for medium humidity. Furthermore, the standard deviation of the medium humidity level is notably high (122.78), pointing to a more dispersed data set, whereas the high humidity level has a large number of small tweet counts, most likely due to its higher number of observations.

In order to further investigate these results, a one-way (single factor) ANOVA test is conducted at the α = 0.05 significance level using the hypotheses:
H0: μ1 = μ2 = μ3, all the means are the same
H1: at least one of the means differs from the others.

ANOVA test results are provided in Table 6.

TABLE 6 SINGLE FACTOR ANOVA RESULTS

The p-value (0.000107) is below the significance level of 0.05, indicating a real difference among the means. Therefore, the test concludes that the mean tweet counts of the humidity levels are not all the same. Further analysis can be conducted with a larger data set with equal observations for each level to investigate the correlation between different external conditions and the tweet counts.

V. DISCUSSIONS AND FUTURE RESEARCH

With the ability to digitally collect, store and analyze large amounts of data, today’s industry can be best defined with the term Industry 4.0 [35], emphasizing the importance of integrating digital technology into operations and business models. Recent technological advances coupled with the digitalization of operations have had a significant impact on nearly every aspect of industrial procedures. These changes have also markedly affected market dynamics and competition over the past few years. Enterprises are now expected to successfully utilize digital technology to add value to their operations to remain competitive in today’s complex markets. With this motivation, this study demonstrated the utilization of crowd-sourced data to better understand and predict customer behavior. Analyzed and implemented appropriately, user involvement in social media might be a powerful factor for establishing better online practices for any industry, small or large, that utilizes online products and services.

In terms of future work, there are a number of limitations to this study. The first limitation involves the data size and accuracy. The research utilized data gathered over a four-week period during significant events occurring in the United States, e.g., publicized shootings and two political conventions. It would be the focus of future research to observe the changes in the outcome when the data collection is expanded to a longer period where no extraordinary circumstances are present.
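The single factor ANOVA reported in Section IV can be approximately re-derived from the summary statistics in Table 5 alone. The Python sketch below does this under two stated assumptions: that the standard deviations in Table 5 are sample standard deviations (n - 1 denominator), and that the three humidity levels are the only groups. The function names are illustrative, and the p-value uses a closed form for the upper tail of the F distribution that holds only when the numerator has 2 degrees of freedom, as it does for three groups.

```python
# Illustrative sketch only; assumes Table 5 reports sample standard deviations.

def one_way_anova_from_summary(groups):
    """groups: list of (n, mean, sample_std). Returns (F, df_between, df_within)."""
    k = len(groups)
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-group sum of squares recovered from each group's sample variance.
    ss_within = sum((n - 1) * std ** 2 for n, _, std in groups)
    df_between = k - 1
    df_within = total_n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def f_upper_tail_two_numerator_df(f_stat, df_within):
    """P(F > f_stat) for an F distribution with exactly 2 numerator df."""
    return (1.0 + 2.0 * f_stat / df_within) ** (-df_within / 2.0)


# (observations, mean, standard deviation) for Low, Medium, High humidity.
TABLE_5 = [
    (49, 13.41, 17.71),
    (62, 73.03, 122.78),
    (739, 36.76, 73.27),
]

if __name__ == "__main__":
    f_stat, df_b, df_w = one_way_anova_from_summary(TABLE_5)
    p_value = f_upper_tail_two_numerator_df(f_stat, df_w)
    print("F =", round(f_stat, 2), "df =", (df_b, df_w), "p =", p_value)
```

Under these assumptions the sketch gives an F statistic of about 9.24 with (2, 847) degrees of freedom and a p-value of about 0.000107, in line with the p-value reported in Section IV.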
Despite the fact that Facebook is the #1 social media site [13], this research opted to use Twitter, the #3 social media site, for reasons stated earlier. Utilizing data from the top five (5) social media sites would validate or refute the findings. While this remains a target, the vast majority of social media platforms do not lend themselves easily to third party data aggregation and analysis. This is most likely due to the fact that most providers build out with speed to market in mind before considering alternative utilization of data. Research would be greatly enhanced if technologists, in the design of social media platforms, could design for research in addition to usability. Designing for research in this case would require platforms to be built with a robust third party data accessibility feature, even with limited access if needed due to confidentiality and other restrictions imposed on collected data. A viable method would be providing social media data to multiple users via a masked data lake, where sensitive, private, personal and/or legal data points are masked out.

The social media data set evaluated for this research included only the data points that were identifiable as originating from one of the fifty (50) contiguous states of the United States and Hawaii. These are limitations that were mindfully placed on this research for the sake of expediency. The research can be expanded to investigate global social media utilization. Including additional external factors, other than the weather, would increase the depth of this research.

VI. CONCLUSIONS

Social Media networks and marketing grow by leaps and bounds daily and provide a deep wealth of innovative ideas that SMEs can tap into for competitive advantage. The challenges of SMEs are well documented in the literature. It is also well known that SMEs are risk-averse in nature, even in instances where the organizations have adequate resources to improve business operations. Dahnil et al. [36] identify ‘users’ as one of the risk factors for SMEs when it comes to embracing social media marketing. While a thorough analysis of the “social media user” is not the purpose of this study, this work aimed at investigating some of the likely external factors influencing a user in social media networks.

Acknowledging that social media users do not act in a vacuum but are influenced by external factors that can be monitored independently can add significant value for SMEs that are contemplating initiating a social media platform. Furthermore, identification of these external factors would provide valuable information to SMEs that are reluctant to adopt a social media marketing strategy for growth.

REFERENCES

[1] A. M. Kaplan and M. Haenlein, "Users of the world, unite! The challenges and opportunities of Social Media," Business Horizons, vol. 53, pp. 59-68, 2010.
[2] C. K. Lee, Y. C. Lee, W. L. Wu, and Y. C. Lin, "Blogger effect: User behavior during the theme park selection process," in Proceedings of PICMET '14 Conference: Portland International Center for Management of Engineering and Technology; Infrastructure and Service Integration, 2014, pp. 3179-3183.
[3] K. Freberg, "Intention to comply with crisis messages communicated via social media," Public Relations Review, vol. 38, pp. 416-421, 2012.
[4] T. A. Tran and T. Daim, "A taxonomic review of methods and tools applied in technology assessment," Technological Forecasting and Social Change, vol. 75, pp. 1396-1405, 2008.
[5] D. Öztamur and İ. S. Karakadılar, "Exploring the Role of Social Media for SMEs: As a New Marketing Strategy Tool for the Firm Performance Perspective," Procedia - Social and Behavioral Sciences, vol. 150, pp. 511-520, 2014.
[6] G. A. N. Vásquez and E. M. Escamilla, "Best Practice in the Use of Social Networks Marketing Strategy as in SMEs," Procedia - Social and Behavioral Sciences, vol. 148, pp. 533-542, 2014.
[7] G. De Francisci Morales, A. Gionis, and C. Lucchese, "From chatter to headlines: harnessing the real-time web for personalized news recommendation," in Proceedings of the fifth ACM international conference on Web search and data mining, 2012, pp. 153-162.
[8] D.-H. Shin, "Demystifying big data: Anatomy of big data developmental process," Telecommunications Policy.
[9] A. Gandomi and M. Haider, "Beyond the hype: Big data concepts, methods, and analytics," International Journal of Information Management, vol. 35, pp. 137-144, 2015.
[10] C. Oh, Y. Roumani, J. K. Nwankpa, and H.-F. Hu, "Beyond likes and tweets: Consumer engagement behavior and movie box office in social media," Information & Management, vol. 54, pp. 25-37, 2017.
[11] M. Mayeh, R. Scheepers, and M. Valos, "Understanding the role of social media monitoring in generating external intelligence," in ACIS 2012: Location, location, location: Proceedings of the 23rd Australasian Conference on Information Systems 2012, 2012, pp. 1-10.
[12] J. H. Kietzmann, K. Hermkens, I. P. McCarthy, and B. S. Silvestre, "Social media? Get serious! Understanding the functional building blocks of social media," Business Horizons, vol. 54, pp. 241-251, 2011.
[13] (2016). Top 15 Most Popular Social Networking Sites. Available: http://www.ebizmba.com/articles/social-networking-websites
[14] M. Boekaerts, "Engagement as an inherent aspect of the learning process," Learning and Instruction, vol. 43, pp. 76-83, 2016.
[15] R. Agrawal and R. Srikant, "Fast algorithms for mining association rules," in Proc. 20th Int. Conf. Very Large Data Bases, VLDB, 1994, pp. 487-499.
[16] M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, "New Algorithms for Fast Discovery of Association Rules," in KDD, 1997, pp. 283-286.
[17] S. Singh, R. Garg, and P. Mishra, "Review of apriori based algorithms on mapreduce framework," arXiv preprint arXiv:1702.06284, 2017.
[18] G. Li, Y. Hu, H. Chen, H. Li, M. Hu, Y. Guo, et al., "Data partitioning and association mining for identifying VRF energy consumption patterns under various part loads and refrigerant charge conditions," Applied Energy, vol. 185, pp. 846-861, 2017.
[19] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, "Efficient mining of association rules using closed itemset lattices," Information Systems, vol. 24, pp. 25-46, 1999.
[20] B. Liu, W. Hsu, and Y. Ma, "Integrating classification and association rule mining," in Proceedings of the fourth international conference on knowledge discovery and data mining, 1998.
[21] A. J. Kim and E. Ko, "Do social media marketing activities enhance customer equity? An empirical study of luxury fashion brand," Journal of Business Research, vol. 65, pp. 1480-1486, 2012.
[22] S.-U. Yang and M. Kang, "Measuring blog engagement: Testing a four-dimensional scale," Public Relations Review, vol. 35, pp. 323-324, 2009.
[23] H. Jiang, Y. Luo, and O. Kulemeka, "Social media engagement as an evaluation barometer: Insights from communication executives," Public Relations Review, vol. 42, pp. 679-691, 2016.
[24] H.-J. Paek, T. Hove, Y. Jung, and R. T. Cole, "Engagement across three social media platforms: An exploratory study of a cause-related PR campaign," Public Relations Review, vol. 39, pp. 526-533, 2013.
[25] M. L. Khan, "Social media engagement: What motivates user participation and consumption on YouTube?," Computers in Human Behavior, vol. 66, pp. 236-247, 2017.
[26] J. L. Mackie, "Causes and Conditions," American Philosophical Quarterly, vol. 2, pp. 245-264, 1965.
[27] P. Suppes, "A probabilistic theory of causality," 1970.
[28] E. Polat, N. Basoglu, and T. Daim, "Effects of Adaptivity and Other External Variables on Mobile Service Adoption," in System Sciences, 2009. HICSS '09. 42nd Hawaii International Conference on, 2009, pp. 1-10.
[29] R. C. Pasley, P. D. Clough, and M. Sanderson, "Geo-tagging for imprecise regions of different sizes," presented at the Proceedings of the 4th ACM workshop on Geographical information retrieval, Lisbon, Portugal, 2007.
[30] M. b. Khalifa, R. P. Díaz Redondo, A. F. Vilas, and S. S. Rodríguez, "Identifying urban crowds using geo-located Social media data: a Twitter experiment in New York City," Journal of Intelligent Information Systems, vol. 48, pp. 287-308, 2017.
[31] M. Dredze, M. J. Paul, S. Bergsma, and H. Tran, "Carmen: A twitter geolocation system with applications to public health," in AAAI workshop on expanding the boundaries of health informatics using AI (HIAI), 2013, pp. 20-24.
[32] (2016). New Survey Reveals What Temperature Is Too Hot To Enjoy in the Lower 48 States. Available: https://weather.com/news/news/how-hot-is-too-hot-survey
[33] C. Borgelt and R. Kruse, "Induction of association rules: Apriori implementation," in Compstat, 2002, pp. 395-400.
[34] V. Bhatnagar, Data Mining in Dynamic Social Networks and Fuzzy Systems: IGI Global, 2013.
[35] C. Baur and D. Wee, "Manufacturing’s next act," McKinsey Quarterly, Jun. 2015.
[36] M. I. Dahnil, K. M. Marzuki, J. Langgat, and N. F. Fabeil, "Factors influencing SMEs adoption of social media marketing," Procedia - Social and Behavioral Sciences, vol. 148, pp. 119-126, 2014.
