Abstract-Since 2000, the Internet has become a primary advertising and marketing channel for businesses. With the recent advances in mobile computing and wireless networking, mobile advertising is becoming popular because mobile devices provide an effective advertising platform. It is widely believed that big data analytics creates new opportunities and new needs for advertisers. However, most existing advertisement solutions primarily use Behavior Targeting (BT) technology to provide static services, which cannot satisfy the requirements of real-time, fast big data processing. The purpose of this project is to develop a new big data analytics service for advertising and marketing based on emerging big data technologies, data mining algorithms, and machine learning solutions. The primary objective of this project is to provide real-time and static on-demand services that help advertisers and publishers decide when, what, where, to whom, and how to place advertisements. In addition, this project requires solutions to analyze the collected big advertising data, discover customer behavior patterns, and establish an innovative model for advertising recommendation and trend prediction. The system will be developed based on advanced machine learning and data mining algorithms, NoSQL database technologies, and visualization techniques. This service will allow advertisers and publishers to reduce their costs while improving their effectiveness.

Keywords-advertising analytics; social network; big data

I. INTRODUCTION

Business intelligence and analytics and the related field of big data analytics have become increasingly important in both the academic and the business communities over the past decade. Meanwhile, big data brings opportunities and provides a solid foundation for advanced analysis and business intelligence recommendation.

Many companies have started to use big data technology for targeted advertising and marketing. Ad-placement companies are building their own systems to analyze consumer data for better ad targeting, while cloud-based startups, such as BloomReach, are trying to take the guesswork out of figuring out what content consumers really want to see on commercial web sites with pure SaaS offerings. In academic research, [1] built a personalized recommender system for a user at a particular location based on the user's preferences learned from location history; it employed a collaborative filtering model and used both the user's location data and the reviews of local people. [2] pointed out that the sparsity of the user-item matrix is a major problem for collaborative filtering recommender systems and therefore combined offline modeling and online recommendation to provide proper recommendations for users. [3] employed Bayesian networks, in which an expert sets up the network structure and the parameters are obtained from the training dataset. According to [4], the rapid growth of email subscribers, web sites, and news groups requires effective filtering mechanisms; it proposed a new filtering mechanism that combines the advantages of content-based and collaborative filtering approaches. [5] used time and space constraints to deduce user preferences and perform complete location recommendation. [6] provided a general introduction to big data, big data analytics, and popular big data analytics platforms in marketing. [7] gave a real case study showing that Hadoop was not well suited to real-time big data processing and provided a customized in-memory data engine to handle low-latency processing.

Although big data coming from online and mobile applications contains essential information, it is still not easy for advertisers to find the potential rules and opportunities hidden in the high volume of data.

The goal of our work is to recommend to advertisers when, what, where, and how to place advertisements based on online advertising data, so as to achieve the maximum marketing value by utilizing big data analytics technologies.

This paper analyzes big data in the area of advertising and marketing in order to make intelligent marketing decisions. Big data in the project are obtained from Twitter, Yelp, Facebook, and other social network websites and applications. From this large amount of data, an algorithm focusing on online advertising strategies for commercial products will be generated to match the market need.

The primary technologies of big data, such as MongoDB, the Zarkov MapReduce framework, Mahout machine learning for clustering and classification, in-memory databases, artificial intelligence, and data mining, will be applied in the system to perform big data analytics. Because in-memory database technology is becoming increasingly feasible, it can potentially provide real-time analyses of big data for marketing and advertising. The system will perform big data analytics by using machine learning technologies to enable companies to move away from intuitive advertising to data-based decision-making. As a result, machine learning and artificial intelligence are playing key roles in helping to create potential value by providing enterprises with intelligent analysis of advertising and marketing strategies, and to capture structured
Many techniques are used in big data analytics, including association rule learning, data mining, cluster analysis, crowdsourcing, machine learning, text analytics, classification, data fusion, network analysis, optimization, predictive modeling, regression, spatial analysis, time series analysis, and others. Which ones are used depends on the type of data being analyzed, the technology available, and the research questions to be solved.

The online advertising market is growing fast. Unlike traditional advertising systems, it focuses on publishing a particular advertisement to a particular group of people. In addition, online advertisements can be published on multiple screens, for example, PC, mobile phone, tablet, and Internet TV.

[8] provided a fast data processing framework called MapUpdate. Similar to the Map-Reduce framework, MapUpdate has two major components: Map and Update. [9] proposed a novel approach to provide suggestions by analyzing session and concept based queries. In order to support efficient pattern mining, they developed a new query cluster mining algorithm with a new data structure, the concept sequence suffix tree. [10] provided a big data visualization and analytics platform called Stat! to support exploratory queries, giving users an interactive tool for fast development and feedback. [11] introduced the big data ecosystem at LinkedIn, especially its data pipeline architecture. It used Kafka to persist data into the offline system, then constructed and executed the data. Finally, the analyzed data were sent back to the online system. [12] provided a new temporal algorithm to address the limitations of BT-style applications that deal with temporal data. Those limitations include the difficulty of transforming a query into a temporal one and the difficulty of analyzing data in real time. In order to improve temporal queries over the traditional BT application, a new architecture called TiMR is proposed. [13] proposed the emerging practice of Magnetic, Agile, and Deep (MAD) data analysis as a radical departure from traditional Enterprise Data Warehouses and Business Intelligence. The

• Analyze product/brand feedback and evaluate advertising performance: It will enable advertisers to query existing product and brand feedback from social media, and evaluate advertising performance after an advertisement is placed.

There are two main user roles in our system: administrator and advertiser. The system administrator can perform the following operations.

• View and approve new user registrations
• View and edit the users in the system
• Control the data sources of the system

The advertiser is the main user of the system. The following are the advertiser's primary responsibilities.

• Input product information based on system requirements
• Select the data sources for advertisement evaluation
• Select the strategy for the product advertisement

Users can get the recommendation results once they finish setting up their product information. The results will be visualized by a radar chart, and several locations will be shown on the map to help users make their decisions. After applying the system's recommendation, they can also review changes in their product feedback in the AD Feedback component; several charts will be used to illustrate the differences before and after applying the recommendation strategies. Meanwhile, we also allow users to configure the feedback data sources in the data configuration component. Several data sources are provided, such as Facebook, Yelp, and Twitter. In the future, more cooperating social media companies will be added. Another important component is AD Management, where users can view, edit, and delete each product they have entered.
Fig. 1. System Architecture

B. System Architecture

The following is the proposed technical solution architecture. In this solution, there are five primary components: user interface, data collection, database, map-reduce solution, and machine learning/data mining algorithms.

1) Data retrieving: There are two major big data sources. One is social media, such as Twitter, Yelp, and Facebook. The other is tracking data from existing advertising publishers such as Yahoo and Putara. The data will be collected through the published web service APIs of social media and existing publishers. For the data collection service, the Yelp, Twitter, and Facebook data fetch APIs are available for the system to collect data. By registering a developer service account, data from Yelp and Twitter can be obtained. Following best practice, our data collection APIs are designed as RESTful web services and are exposed to third-party components. The Yelp academic dataset is also obtained from Yelp in order to collect large-scale data for the next processing step in our system. The academic data come from the state of Arizona and include user reviews, rankings, and business object data.

2) Data storage and Map-Reduce Computing: MongoDB is chosen as the persistence database after comparing it with various other database options. MongoDB is used for historical persistent data processing: it will be used for advanced analysis based on the persistent historical data. Typical applications include long-term advertising and marketing trend recommendation, reporting, and performance analysis. MongoDB is an open-source NoSQL document database whose strengths are high-performance unstructured data analysis and high scalability. MongoDB has a strong sharding capability that enables horizontal partitioning of a database. As a result, it supports load balancing and fault tolerance. It has built-in aggregation and map-reduce support, which makes its data processing capability stronger.

3) Ads Analytics and Recommendation Service and Algorithms: In combination with the in-memory database Redis, various classification algorithms will be used to mine big data and generate reasonable results. Natural language processing (NLP) and sentiment analysis technologies will also be used to process the data from social media such as Twitter.

Meanwhile, in this project, both classification and clustering algorithms will be used. The key to achieving excellent effectiveness in machine learning is to use multiple algorithms together and to train the model using a large amount of input data.

In the system, the probabilistic framework for texts needs to be investigated, as well as the naive Bayesian equations for their classification, so that the equations and models suit the business needs.

There are two primary categories of recommendation. One is content-based recommendation. The other is collaborative filtering recommendation.

In this advertising recommendation system, the statistics-based collaborative filtering recommendation approach will be used. By using the statistical information of an area, an advertisement will be recommended to that area, or many advertisements in one area will be ranked for recommendation.

Weka is an open source machine learning toolkit developed at the University of Waikato, New Zealand. In this project, the Weka machine learning toolkit will be used for all chosen machine learning algorithms: naive Bayesian, SVM, and decision tree.

IV. SYSTEM DESIGN AND IMPLEMENTATION

A. Client Design and Implement

There are two roles in this system: advertiser and administrator.

The administrator role is set for the system owner to manage the system. Its major job is adding, editing, and deleting users. It can approve or deny an advertiser registration request. Currently, it can only show advertiser information; we link each advertiser's advertisements or product information, which makes it easy for the administrator to manage all advertiser and product profiles. More features will be added in the next version.

The advertiser role is set for regular use of the system. When a user logs in as an advertiser for the first time, the system provides a wizard to help the user finish the first-time product setup. As shown in the diagram, there are four major components for the advertiser: AD planning, data configuration, AD management, and AD feedback.

B. Middle-Tier Design and Implement

1) System Call Design: Sequence diagrams for the Big Data Analytics Service System for Advertising and Marketing are introduced in Figure 3.
Fig. 2. Advertiser Navigation

• At this point, only negation words, value words, and emoticons remain. Emoticons are treated as value words, because they are included in the value word list with weights. The classifier goes through the list of words, summing the weight of each value word in the statement. If it comes across a negation word, it changes the sign of the weight of the next value word.
• After iterating through all the words in the statement, the classifier generates the total weight of the statement. If the value is greater than zero, the statement is classified as positive. If the value is less than zero, the statement is classified as negative. Otherwise, if the value is zero, the statement is neutral. The challenge of this approach is to find appropriate word sets for stop words, negation words, and sentiment value words. In addition, these word sets are domain sensitive. For example, the word "forward" may be quite valuable in the advertising industry, but not in other industries such as chemicals.
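The weighted-word procedure described in the bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word lists, the weights, the tokenizer, and the names `tokenize` and `classify` are hypothetical and are not the key word sets or code used in the paper.

```python
import re

# Hypothetical word sets for illustration only; real sets are domain sensitive.
STOP_WORDS = {"the", "a", "an", "is", "was", "this", "it", "to", "and"}
NEGATION_WORDS = {"not", "no", "never"}
VALUE_WORDS = {
    "good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0,
    ":)": 1.0, ":(": -1.0,  # emoticons are treated as weighted value words
}


def tokenize(statement):
    """Split a statement into lowercase words and simple emoticons."""
    return re.findall(r"[:;]-?[()dp]|[a-z']+", statement.lower())


def classify(statement):
    """Return 'positive', 'negative', or 'neutral' from summed word weights."""
    total = 0.0
    negate = False
    for token in tokenize(statement):
        if token in STOP_WORDS:
            continue  # stop words carry no sentiment
        if token in NEGATION_WORDS:
            negate = True  # flip the sign of the next value word
            continue
        weight = VALUE_WORDS.get(token)
        if weight is None:
            continue  # neither a negation word nor a value word
        total += -weight if negate else weight
        negate = False
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["Great food :)", "This is not good :(", "The food was served"]:
        print(text, "->", classify(text))
```

In this sketch the negation flag stays set until the next value word is reached, which is one reading of "changes the sign of weight of the next value word"; a stricter variant could reset the flag after a single token.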
After investigation and testing, the following key word sets were chosen based on their generality and good test results.
$$\mathrm{DIFF}(P, L) = \frac{1}{n} \sum_{i=1}^{n} \left| P_i - d_i \right| \qquad (4)$$
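Read as a mean absolute difference, Equation (4) can be computed as in the following minimal sketch. The surviving text does not define P, L, or d_i, so the function assumes only that `p` and `d` are equal-length numeric sequences; the name `diff` and its arguments are hypothetical.

```python
def diff(p, d):
    """Mean absolute difference between two equal-length numeric sequences."""
    if len(p) != len(d):
        raise ValueError("p and d must have the same length")
    if not p:
        raise ValueError("sequences must be non-empty")
    # Sum |p_i - d_i| over all positions, then divide by n.
    return sum(abs(pi - di) for pi, di in zip(p, d)) / len(p)


if __name__ == "__main__":
    print(diff([3, 5, 2], [1, 5, 4]))
```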
Fig. 6. Advertisement Effectiveness Analysis

V. CONCLUSION

This paper presented a novel approach to a location-based ad recommendation system using current state-of-the-art technologies. The project provides a decision-based approach to handle various use cases associated with pushing relevant ads towards end-users. The objective of the project is to go through the whole process of complete testing and benchmarking, which would enable us to put forward a scalable big data ad processing platform in the current market. The project also provides a pilot data analytics approach for merchants to view their end-users. Once this approach has been beta-tested based on the reviews of the merchants, we would improve the data analytics component of our system.

The scope of the current project is very large. Many features can be added to make the system more feature rich and more scalable in dealing with real-time data. Currently our system handles only offline modeling and training, but as the system scales we intend to provide online modeling and training of datasets to provide an enriched user experience.

REFERENCES

[1] J. Bao, Y. Zheng, and M. F. Mokbel, "Location-based and preference-aware recommendation using sparse geo-social networking data," in Proceedings of the 20th International Conference on Advances in Geographic Information Systems. ACM, 2012, pp. 199-208.
[2] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, "Lcars: A location-content-aware recommender system," in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013, pp. 221-229.
[3] M.-H. Park, J.-H. Hong, and S.-B. Cho, "Location-based recommendation system using bayesian user's preference model in mobile devices," in Ubiquitous Intelligence and Computing. Springer, 2007, pp. 1130-1139.
[4] M. Claypool, A. Gokhale, T. Miranda, P. Murnikov, D. Netes, and M. Sartin, "Combining content-based and collaborative filters in an online newspaper," in Proceedings of ACM SIGIR workshop on recommender systems, vol. 60. Citeseer, 1999.
[7] G. Mishne, J. Dalton, Z. Li, A. Sharma, and J. Lin, "Fast data in the era of big data: Twitter's real-time related query suggestion architecture," in Proceedings of the 2013 international conference on Management of data. ACM, 2013, pp. 1147-1158.
[8] W. Lam, L. Liu, S. Prasad, A. Rajaraman, Z. Vacheri, and A. Doan, "Muppet: Mapreduce-style processing of fast data," Proceedings of the VLDB Endowment, vol. 5, no. 12, pp. 1814-1825, 2012.
[9] H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen, and H. Li, "Context-aware query suggestion by mining click-through and session data," in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2008, pp. 875-883.
[10] M. Barnett, B. Chandramouli, R. DeLine, S. Drucker, D. Fisher, J. Goldstein, P. Morrison, and J. Platt, "Stat!: an interactive analytics environment for big data," in Proceedings of the 2013 international conference on Management of data. ACM, 2013, pp. 1013-1016.
[11] R. Sumbaly, J. Kreps, and S. Shah, "The big data ecosystem at linkedin," in Proceedings of the 2013 international conference on Management of data. ACM, 2013, pp. 1125-1134.
[12] B. Chandramouli, J. Goldstein, and S. Duan, "Temporal analytics on big data for web advertising," in Data Engineering (ICDE), 2012 IEEE 28th International Conference on. IEEE, 2012, pp. 90-101.
[13] J. Cohen, B. Dolan, M. Dunlap, J. M. Hellerstein, and C. Welton, "Mad skills: new analysis practices for big data," Proceedings of the VLDB Endowment, vol. 2, no. 2, pp. 1481-1492, 2009.
[14] A. Bindra, S. Pokuri, K. Uppala, and A. Teredesai, "Distributed big advertiser data mining," in Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on. IEEE, 2012, pp. 914-914.
[15] A. R. Location-based advertising and marketing report. bi-lba2-ps.pdf. [Online]. Available: http://www.berginsight.com/ReportPDF/ProductSheet/
[16] O. M. Four charts that illustrate the transformation of personal computing. [Online]. Available: http://www.technologyreview.com/view/508651/four-charts-that-illustrate-the-transformation-of-personal-computing/
[17] C. H. How your audience uses mobile now. [Online]. Available: http://heidicohen.com/67-mobile-facts-from-2013-research-charts/
[18] J. M. T. Process real-time big data with twitter storm. [Online]. Available: http://www.ibm.com/developerworks/library/os-twitterstorm/index.html?cmp=dw&cpb=dwope&ct=dwnew&cr=dwnen&ccy=zz&csr=110912
[19] B. Chandramouli et al. Temporal analytics on big data for web advertising. [Online]. Available: http://research.microsoft.com/apps/pubs/?id=150002
[20] G. A. Spark, storm and real time analytics. [Online]. Available: http://www.infoq.com/news/2014/01/Spark-Storm-Real-Time-Analytics
[21] J. M. T. Spark, an alternative for fast data analytics. [Online]. Available: http://www.ibm.com/developerworks/library/os-spark/
[22] S. B. Kotsiantis, I. Zaharakis, and P. Pintelas, "Supervised machine learning: A review of classification techniques," 2007.