00 800 6 7 8 9 10 11
(*) The information given is free, as are most calls
(though some operators, phone boxes or hotels may charge you).
ISBN 978-92-79-66962-0
doi:10.2767/797031
The European Network of Public Employment Services was created following a Decision of the European Parliament and Council in June 2014 (DECISION No 573/2014/EU). Its objective is to reinforce PES capacity, effectiveness and efficiency. This activity has been developed within the work programme of the European PES Network. For further information: http://ec.europa.eu/social/PESNetwork.
This activity has received financial support from the European Union Programme for Employment and Social Innovation "EaSI" (2014-2020). For further information please consult: http://ec.europa.eu/social/easi
LEGAL NOTICE
This document has been prepared for the European Commission however it reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.
PRACTITIONERS TOOLKIT
MARCH 2017
Contents
CHAPTER 1. INTRODUCTION
1.1 What is this toolkit about?
1.2 Who this toolkit is for
1.2.1 PES with little or no experience working with data
1.2.2 PES with some or intermediate levels of experience
1.2.3 PES with advanced levels of experience
1.2.4 Tactical or operational managers
1.3 Strategic managers
1.4 Scope of this toolkit
1.5 Reading guide
APPENDICES
Appendix 1 | Safe Harbor De-identification types
Introduction
more data, but advancements in computing power as well as the tools and algorithms to analyse data allow organisations to use data in entirely new ways. This also applies to Public Employment Services (PES). A few PES are exploring the use of Big Data to improve efficiency and effectiveness of processes, improve customer satisfaction and/or innovate in order to transform how the PES functions. This toolkit could help them on this journey. Most PES, however, are at the start of this journey and this toolkit can also help PES who want to start using their data in better and smarter ways.
[Infographic. Recoverable content: 44x growth in data production, 2009-2020****; 26.000.000.000 connected devices on the Internet of Things (IoT) in 2020*****; the amount of data produced doubles every year***; timeline axis running from 2010 to 2030.]
* http://www.csc.com/insights/flxwd/78931-big_data_universe_beginning_to_explode
** http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation
*** https://www.technologyreview.com/business-report/big-data-gets-personal/
**** http://www.gartner.com/newsroom/id/2684616
***** http://www.emc.com/about/news/press/2011/20110628-01.htm
With the expected future growth of data production, several challenges arise:
• How to unlock the potential of data we already have to improve efficiency, effectiveness, customer satisfaction or any other goal?
• How to set up the infrastructure in the organisation now so that we gain experience with data analytics before we drown in a sea of data?
• How to integrate data analytics into the DNA of the organisation so that organisational decision making improves and organisational agility increases?

This toolkit aims to help organisations, specifically PES, to get started with finding answers to these questions.

Even though the topic of this toolkit is data, the true goal is to help PES transform their existing data into Information, Knowledge, and subsequently Wisdom. In each stage of transformation of data, the data is enriched, as the schema below illustrates.
| Stage | What is added | Description | Example |
|---|---|---|---|
| Data (Raw data) | None | This is plain data: the numbers as you would extract them from any system. | 63 [just the plain number] |
| Information | Adding meaning | When transforming data into information, we add basic meaning to the data. Very often this happens by adding units, variables and definitions. | 63% of lower educated clients are unhappy with PES service levels. |
| Knowledge | Adding context | When adding context, we are able to make sense of the data; it starts telling a story. | 63% of lower educated clients are unhappy with PES service levels, compared to 48% of higher educated clients. |
| Wisdom | Adding application | Turning knowledge into action is the last step of the process. By combining data points, you can create actionable results. | 63% of lower educated clients are unhappy with PES service levels, compared to 48% of higher educated clients. This correlates with their evaluation of language difficulty on the PES website. Changing the language level could solve this problem. |
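The first three stages of the schema can be illustrated with a few lines of Python. This is only a sketch: the survey records below are made up so that they reproduce the percentages used in the example, and the function and variable names are assumptions, not part of the toolkit.

```python
# Hypothetical survey records: (education_level, is_unhappy).
# The counts are invented so that they reproduce the 63% and 48% of the example.
records = (
    [("lower", True)] * 63 + [("lower", False)] * 37
    + [("higher", True)] * 48 + [("higher", False)] * 52
)


def unhappy_share(records, level):
    """Percentage of clients at the given education level who are unhappy."""
    group = [unhappy for lvl, unhappy in records if lvl == level]
    return 100 * sum(group) / len(group)


# Data: a bare number without any meaning attached.
raw = sum(1 for lvl, unhappy in records if lvl == "lower" and unhappy)

# Information: the number with a unit, a variable and a definition.
lower = unhappy_share(records, "lower")
information = f"{lower:.0f}% of lower educated clients are unhappy with PES service levels"

# Knowledge: context added by comparing with another group.
higher = unhappy_share(records, "higher")
knowledge = f"{information}, compared to {higher:.0f}% of higher educated clients"

print(raw)
print(information)
print(knowledge)
```

The last stage, Wisdom, is deliberately left out: deciding what to do with the comparison (for example changing the language level of the website) is a judgement for people in the organisation, not something the code produces.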
The toolkit is based on the thematic review workshop on modernising PES through data and IT systems¹. The workshop revealed a need to understand the topic of data in more detail and explore how PES can benefit from advancements in this field. The core concept in this toolkit is smart data. By using the concept of smart data, we want to prevent a bias towards data being a normative goal in itself. Smart data is seen as the sum of:
• (Big) Data [the data itself]
• Utility [the potential utility derived from the data]
• Semantics [the semantic understanding of the data]
• Data Quality [the quality of the data collected]
• Security [the ways data are managed securely]
• Data Protection [how privacy and confidentiality are guarded]

These different topics are woven throughout the body of this toolkit. Similarly, to stay focused, we streamline this toolkit along the lines of the PDCA (Plan, Do, Check, Act) Cycle. The first content chapter [2] focuses on how to get started and create a plan. The following chapters [3-5] focus on the actual doing. While we focus throughout the toolkit on the proper checks and balances, chapter 6 is specifically devoted to the topic of checking and evaluating. As this is a practitioners toolkit, most content in this toolkit is actionable, but once again, specific action points after the analytical process are discussed in the final chapter.

1.2 Who this toolkit is for

The primary audience for this toolkit consists of PES who have little to no experience working with (big) data. As this is a practitioners toolkit, managers on tactical and operational levels may benefit the most from its content, for example managers who have been tasked with analytics or data science. However, there will be uses for other audiences as well. Below we lay out how the different audiences could benefit from this toolkit.

1.2.1 PES with little or no experience working with data

This toolkit can serve as a starting point for those PES who want to get started with data. It gives an overview of relevant actions and provides practical tips on how and where to get started.

PES with little or no experience are advised to start reading at: Chapter 2 Getting started with data

1.2.2 PES with some or intermediate levels of experience

For PES with some experience with data, the sections on advanced analytics and the various examples provided throughout this toolkit may be of particular interest. Furthermore, the sections on reporting and presenting data may provide new insights. Lastly, the parts on data security and protection could serve as a good refresher on matters related to security.

PES with some experience are advised to start reading at: Chapter 3 Organising data

1.2.3 PES with advanced levels of experience

PES that have much experience working with data may still benefit from this toolkit. First, the toolkit may provide some new insights, especially regarding more novel developments in the advanced analytics section. Furthermore, the toolkit can serve as an introduction for new employees who join data teams. Lastly, even advanced PES may find interesting examples from other PES throughout this toolkit.

The most relevant section for PES with advanced levels of experience will be section 4.4.

1.2.4 Tactical or operational managers

Tactical or operational managers tasked with setting up data functions and/or capabilities within PES may benefit the most from this toolkit. The entire toolkit should be relevant for this audience. The toolkit will help you get familiar with much of the jargon used in the world of data analytics and should give you enough guidance to get started. While this toolkit is meant to provide a generic introduction and practical tips and tricks, most sections will provide links to other resources that could help you further.
This toolkit is about the use of data that PES collect in their systems or through other methods. Data currently stored in production systems or data warehouses are examples of this type of data. We also include data that are available in a PES that do not stem from primary processes, such as research data (e.g. data sets from survey research), data shared from other organisations (e.g. educational data) or even reports, books, etc. In sum, we focus on all data the PES already has and less so on data the PES would need to collect to achieve certain goals. Conducting research (e.g. how to conduct surveys, interviews, focus groups, etc.) is not part of this toolkit.

In terms of analytics, the focus is on the more novel types of analytics and specifically those more commonly associated with big data. Where appropriate, we will discuss more traditional types of analytics (such as more common statistical methods) and tools (such as Excel, SPSS, SAS, etc.), but given the abundance of (online) resources, we will link to those resources instead of providing more comprehensive information in this toolkit.

2. Organising data. Cleaning and describing data. Ensuring the quality of data, integration of data sources. How to secure data and protect privacy and confidentiality.

3. The actual analysis of data. Different types of data analysis including statistical methods, data mining and advanced topics such as artificial intelligence and machine learning.

4. Presenting and reporting of data. What are novel ways to present results and what are considerations when reporting outcomes? Also discussion of open data.

5. Evaluation, continuation and implementation. How to go from a small pilot or experiment to a broader implementation? What are the key technical and organisational considerations?
Getting started
meeting those goals?
Does the organisation simply want to learn about working with data so that experience
process. In this case, the PES could formulate the expectation or theory that the new matching system is working improperly. This theory could be translated into a series of concrete expectations or hypotheses that could be tested using data (e.g. the matching algorithms do not include certain important variables, or the weight of certain variables in calculating matches is too high/low). If such hypotheses exist, a data team could start working on testing these hypotheses and work closely with the business process owners in the PES to collect and analyse data.

Concept: Data analytics²
The whole process of formulating data goals, collecting, organising, analysing and presenting (big) data.

² Some equate data analytics to data analysis; we decide to differentiate for the sake of clarity (and in line with many publications dealing with big data). We see analysis as the process of analysing data, whereas we see analytics as a much broader term involving the setting of goals, collection, organisation, analysis and presentation of data.

2.1.2 Inductive approaches

The set-up of inductive approaches is different. There is no clear understanding of a problem and the focus is on:

1. Learning from the data and/or finding ways to generate value from this data.

2. Learning about analytics and gaining knowledge about the process and (smart) data applications.

Only after certain patterns have been discovered in the data, by analysing high numbers of observations, can hypotheses and eventually theories be formulated regarding the discovered patterns (which could be tested further in subsequent rounds of deductive analysis).

In such a situation, the analytics function would take more the shape of a laboratory or experimental unit. This means it operates more on a stand-alone basis, has more freedom to experiment and is less tied
to specific business processes. The following table compares the two approaches.

Deductive vs Inductive

| | DEDUCTIVE | INDUCTIVE |
|---|---|---|
| Primary focus | Solving business problems | Learning and innovation |
| Management focus | Integration with the business, bridging parts of the organisation. Introducing the team to the organisation (so they understand the processes). | Shielding the data team from the business, to make sure they can focus on their work. Making sure data is accessible and the team works with the data. |
| Position in the organisation | Close to/part of business processes | Independent/removed from the business processes |
| Team values | Validity, robustness, value driven | Creativity, making mistakes, trial and error |
| Team composition | Focused, mostly on data engineers and scientists | Broad, including social scientists & people with creative profiles |

PES wanting to start with analytics are typically better off starting with inductive approaches. These typically allow for smaller scale experiments that let both the data team and the PES grow accustomed to working with data and slowly turn into a data driven organisation. Creating a small team that operates relatively independently from the organisation in order to prove value in the long term is a good starting point. Once the team has gained experience and shown value, it could be brought into the organisation more and start shifting to more deductive approaches.

2.2 Creating your data team

Crucial to the success of working with data is the composition of the data team. Relevant questions in this respect are: what is the approach we are taking (inductive, deductive, or a combination)? How many resources can we make available? What is the time pressure to deliver results? In this section we discuss team leadership (and variations therein) and several tiers of potential positions within the team.

2.2.1 Leadership

In smaller scale settings, the team will be smaller and team leadership will generally have a less senior position. In this case a senior data scientist or manager of data analytics would be a fitting role to lead the team. As for the position in the organisation, a role under one of the following functional leadership roles is possible:
• CDO (Chief Data Officer) [or equivalent]
  In practice, only large (data) mature organisations will have a leading data position. The CDO is responsible for governance and utilisation of data across the entire organisation. This means that the CDO oversees all data initiatives and coordinates all analytics activities within the organisation.
• CIO (Chief Innovation Officer) [or equivalent]
  The first interpretation of CIO is that of Innovation Officer. This role is concerned with innovation and change management within the organisation. If the data team is positioned under the Innovation Officer, the focus of the data team will most likely be on more inductive approaches, trying to create innovative data-driven solutions.
• CIO (Chief Information Officer) [or equivalent]
  The second interpretation is that of Information Officer. This role is reserved for the highest ranking officer who is responsible for information technology and computer systems inside the organisation. If positioned under a CIO, the data team will probably be focused more on supporting the technology role in the organisation and hence have a more deductive orientation.
• CTO (Chief Technology Officer) [or equivalent]
  The CTO is in charge of (the broader) technology used by the organisation, but could also focus on core technologies if technology is important in client facing processes (with the increasing levels of automation used in PES, that seems to apply here). If positioned under a CTO, the data team will probably focus on deductive, client oriented and technology related issues.
• Director of R&D [or equivalent]
  The last role is that of the director of R&D. While there is some overlap with the Chief Innovation Officer's role, the Chief Innovation Officer is typically occupied with more short term change management and the implementation of innovations. The director of R&D is typically charged with longer term research and development. If positioned here, the data team will be more experimental and focused on the development of longer term innovations.

Strategic insight
The purpose of this overview of roles is not to prescribe where the data team should be positioned. Rather, it is meant to raise awareness that the position in the organisation will affect the expectations one should have of the team and what the focus of the team will be. This could affect the hiring of team members as well.

2.2.2 Data team members

Depending on the (desired) size, workload and focus of the team, different types of team members are needed to start a successful data analytics practice. We divide these types into three tiers:
• Key roles
  These are the must have members as you start building your data team.
• Secondary roles
  These are the good to have roles and will become more relevant as the team grows in size.
• Tertiary roles
  These are the nice to have roles that add value to the team, but are less critical than the others. They will most likely become relevant once the team reaches high levels of maturity and has a large size.

We can define three key roles in working with the actual data:
• Data engineers
  The data engineer typically sets up and works with the data infrastructure. That means that they set up databases and work on Extraction, Transformation and Loading (ETL, see below) tasks, they support data analysts and data scientists in their roles, and lastly they make sure the system works smoothly and performs well. Very often they have a background in software engineering.
• Data analysts
  Data analysts are the professionals in the organisation who query and process data. Furthermore, they typically create data reports, summaries and visualisations. This role is more closely associated with data mining and statistical analysis.
• Data scientists
  The data scientist is the most important role in the context of this toolkit. The key role of the data scientist is to generate valuable and actionable insights from the data and help solve problems in the organisation using data. Data scientists apply (mostly) advanced analytics, such as machine learning, to data.

Secondary roles:
• Social scientists
  Team members with a social science background (e.g. sociology, psychology, communication, marketing, etc.) can perform two important roles on the team. The first is to help create theories and hypotheses that can be answered/tested using deductive approaches. The second is to help make sense of the outcome of analysis when more inductive approaches are taken. In this way, social scientists hold a key role in translating organisational goals and/or problems into the actual data work and subsequently translating the outcome of the data work into implications and actions for the organisation.
• Software engineers
  Even though there is a wealth of tools available to organise, analyse and present data, very often an organisation will discover that the tools do not fit its needs entirely. Software engineers build custom service solutions (in conjunction with the rest of the team) that help maximise results across the organisation. The difference with data engineers is that the software engineer in this context typically works on more front end (or customer) facing solutions (for example dashboards or mobile applications to access data and outcomes).

Some tertiary roles (that we won't discuss in detail):
• Graphical and/or interface designers
  To help design useful and usable applications and visualisations.
[Figure: infrastructure diagram. Recoverable labels: Presentation Server, Open data, Database, Web Server.]
Working with data from the warehouse directly could pose a risk regarding the integrity of this archive. The second is that the warehouse may contain sensitive data or PII [see below] that should not be used for analytics purposes.

Creating a dedicated analytics database
The third and last scenario is that of (a) dedicated database(s) solely for the purpose of analytics.

In this case, relevant data are pulled from the warehouse (and other sources) and, after being sanitised, loaded into an analytics database. This database would then be the primary source for the data team to work from. This database could be supported by several types of servers to aid in the analytics:
• Analytics servers
  Servers whose sole purpose it is to run (computational) analytics. These servers typically have lots of processing power to run heavy and complicated analytics.
• Distributed computing servers
  When datasets become too large, it may no longer be feasible to analyse them on a single server. In this situation it is common to set up a distributed computing environment. What this entails is (conceptually) straightforward: a dataset is broken down into smaller pieces, sent to different servers (computers), analysed on these servers, and the results of the analysis are bundled back into one single outcome. Many commercial distributed environments are available (such as Amazon Web Services (AWS), Google Compute Engine, and many others), but depending on the sensitivity of the data, a PES could consider setting up its own distributed environment.
• Presentation and visualisation servers
  These servers typically run applications to show results (such as dashboards) and/or the data catalogue.

Concept: PII
PII stands for Personally Identifiable Information. This is information that helps identify individual people. Classic examples of PII are names, addresses and unique person identifiers such as social security or citizen numbers. One problem with PII and analytics is that, as more and more different types of data are being combined, it becomes increasingly possible to indirectly pinpoint individuals. It is important, especially as data are being opened (see section 5.4), to ensure that individuals cannot be identified.

2.4 Creating a data-catalogue

In order for any organisation to start answering questions, it is important to know what information the organisation already has. This prevents the organisation from collecting the same information again and it yields an overview of data stored within the organisation that can be used for other purposes.

A data catalogue is an instrument that provides an overview of (all) data present in the organisation, (semantically) explains the data and variables, and adds meta-data to these data sets. This means that the catalogue contains descriptions of the variables and the nature of the data.
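To make the idea of a catalogue entry more concrete, the sketch below models one entry as a small Python data structure with a simple text search. It is an illustration under stated assumptions: the class names, field names (such as `update_frequency` and `contains_pii`) and the example dataset are hypothetical and not prescribed by this toolkit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Variable:
    """Description of one variable (column) in a dataset."""
    name: str
    dtype: str
    description: str
    contains_pii: bool = False


@dataclass
class CatalogueEntry:
    """Meta-data describing one dataset held by the organisation."""
    dataset: str
    source_system: str
    description: str
    record_count: int
    update_frequency: str
    variables: List[Variable] = field(default_factory=list)

    def pii_variables(self):
        """Names of the variables flagged as personally identifiable."""
        return [v.name for v in self.variables if v.contains_pii]


def search(catalogue, term):
    """Return entries whose name, description or variable descriptions mention the term."""
    term = term.lower()
    hits = []
    for entry in catalogue:
        texts = [entry.dataset, entry.description]
        texts += [v.description for v in entry.variables]
        if any(term in text.lower() for text in texts):
            hits.append(entry)
    return hits


# Hypothetical entry; all values are invented for illustration.
catalogue = [
    CatalogueEntry(
        dataset="client_satisfaction_survey",
        source_system="research data",
        description="Survey on satisfaction with PES service levels",
        record_count=200,
        update_frequency="yearly",
        variables=[
            Variable("citizen_number", "string", "Unique person identifier", contains_pii=True),
            Variable("education_level", "category", "Highest completed education"),
            Variable("satisfaction", "integer", "Self-reported score, 1 (low) to 5 (high)"),
        ],
    ),
]

for entry in search(catalogue, "satisfaction"):
    print(entry.dataset, entry.record_count, entry.pii_variables())
```

Flagging PII at the variable level is one possible design choice; it lets the data team see at a glance which fields need sanitising before a dataset is used for analytics.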
Smart data = (Big) Data + Utility + Semantics + Data Quality + Security + Data Protection
Semantics refers to the meaning of data. Knowing exactly what data you have and what the data can tell you about reality is an important aspect of smart data. Semantics here does not just refer to descriptions of data and variables, but also to their meaning in real life. For example, does a measure or proxy of customer satisfaction really signify satisfaction of clients in real life?

A data catalogue can be compared to an index in a library. A library is a collection of books, written by different people on different subjects at different points in time, of different lengths and for different audiences. An index in the library contains the overview of exactly what you can find in the library. Furthermore, the index allows you to find the resources you need.

The process of creating a data catalogue is pretty straightforward. Members of the data team will have to document all available data in the organisation and use meta-data to describe this data. For example, they have to:
• Create an overview of all data sources, databases and datasets available and the nature of these data sets (e.g. what kind of databases do we have?).
• Document and describe all variables within these datasets.
• Document the number of records and changes in these records (e.g. how often are databases updated?).

While the primary function of a data catalogue is to document the data stored in the organisation's databases, more types of data could be included in a data catalogue, such as:
• Research data (such as survey data sets).
• Externally available (analytics) data (such as what is collected through Google Analytics or other trackers).
• Other relevant data (such as social media use, relevant (technical) documents).

While a static data catalogue is possible (literally a catalogue describing data), most organisations choose to create a database based data catalogue with a searchable front-end (often using web technologies). While technically an open data catalogue, the catalogue at http://catalog.data.gov/dataset provides a good example of what a typical data catalogue looks like.

2.5 Costs and budgeting

We finish this chapter by focusing on the costs associated with starting an analytics practice. Budgeting for analytics activities proves to be a challenging task. A study by Gartner (http://www.gartner.com/newsroom/id/2637615) found that more than half of all analytics projects failed because they were not completed within budget or on schedule. A key reason for this is that it is very difficult to estimate the costs of early analytics projects, for example because:
• Early on, it is often unclear which information the organisation has and how useful this information is.
• Organising and cleaning data take up considerable amounts of time (and resources) and organisations tend to underestimate the amount of time it takes to get all data organised.
• Especially with inductive projects, the expected outcomes are difficult to foresee. This creates many uncertainties regarding timeline and costs.

Nevertheless, it is possible to create a budget and allocate costs for the execution of analytics. The cost of any analytics endeavour typically breaks down into the following:
• The set-up cost for the relevant infrastructure (databases, analytics servers, data extraction, transformation & loading, other hardware and software). The good news regarding these costs is that, while still significant, the cost of acquiring, storing and managing data keeps on going down. Especially when using scalable solutions (see 2.3) it is possible to set up a relatively affordable infrastructure.
• The recurring/ongoing costs for the infrastructure (power, licensing fees, maintenance, etc.). The total sum of these costs depends on many factors, such as the use of freely available or commercial software, the size of the server park, etc.
• Personnel costs. This entails the costs of the data team, such as hiring and salary costs. While analytics is driven by technology, human labour still makes up a large portion of the total cost of any analytics initiative (in most cases the bulk). The reason for this is that organising, sanitising, cleaning and subsequently analysing and interpreting data is a very labour intensive process (also see chapter 3). The more ambitious the project, the higher the labour costs will be.

However, the degree to which these costs occur depends on a number of factors. Most notably, the following choices will affect the cost:
• In-house or outsourcing?
  The list of costs above assumes the organisation will want to own its infrastructure and have all personnel on staff. There are, however, a number of alternative scenarios:
• Use of external or cloud infrastructure?
  Many organisations host their analytics servers elsewhere, often in the cloud. The benefits of using external providers for storage and/or analytics solutions are that a) you only pay for the capacity needed, b) it is easy to scale the capacity needed up or down and c) you don't have to worry (as much) about administration, maintenance, etc. However, there are a number of drawbacks. The first is a lower level of control over the infrastructure (which could for example conflict with the needs of the organisation when certain requirements cannot be met [for example when a certain tool does not work on the platform provided]). The second is that certain legal requirements could prohibit the organisation from storing data outside of the organisation, the government and/or the country. It is wise to consider the specific legal requirements (see also section 3.5) before making this decision.
• Internal data team or external service provider?
  Organisations that want to engage in analytics activities face an important (cost related) decision when it comes to the personnel aspect: hire a data team or outsource the personnel aspect. Several (consulting) providers offer analytics services that could take care of the personnel requirement (and often the infrastructure as well). Furthermore, the organisation could consider using freelance personnel or virtual (job) marketplaces, but given the sensitivity of the data often involved these options are often not realistic. The benefits of using an external provider are that a) it could be cost effective if analytics are only needed on a project basis (and not continuously), b) you don't have to worry about training or capacity. Downsides are that a) the organisation builds little experience and knowledge about analytics internally, b) costs can be (much) higher in the long run. So if the organisation plans to seriously build analytics capabilities, creating the capacity internally seems the right course of action.
• Hybrid or not?
  Lastly, the organisation could choose a mix: doing part of the infrastructure internally (such as data storage) and part externally (such as running certain analyses on an external server provided by a third party), and hiring part of the data team while using an external provider for specific expertise areas. In practice, most advanced organisations use some kind of hybrid version. Reasons for this are that a) analytics needs can become very specific (or advanced) and using an external provider is more cost-effective than building the capability internally for just one specific type of analysis, b) depending on the nature and scope of the analytics activities, the organisation may only need a certain amount of capacity to run the normal business, but during peak times or for specific activities extra capacity could be needed, in which case partnering with a third party is reasonable.
Furthermore, the following are relevant considerations when it comes to costs:
• As mentioned above, many projects overrun their budgets. This means that there is a tendency to underestimate costs. For this reason it is advisable to either lower project requirements or not be overly optimistic when starting with analytics.
• Costs of analytics do not scale linearly. Not only is there typically a large start-up cost involved, but as analytics activities become more advanced, the complexity of the work increases as well. Especially when a multitude of data sources are being used and models become more complicated, costs can rise at a higher than linear rate.
• The best way to keep costs manageable is to start small. Smaller datasets can be analysed on relatively cheap hardware and smaller teams are needed for smaller projects. In that sense, if budgets are non-existent or small, it is advisable to start a smaller analytics practice and grow this over time as the team proves its value.
PRACTICAL EXAMPLE
In May 2016, the Executive Office of the United States President released a report on the opportunities of Big Data. The report contains a case study on the potential of Big Data for employment. As the key problem in employment related to data, the report recognises that traditional hiring practices may unnecessarily filter out applicants whose skills match the job opening. To solve this problem, big data is seen as an opportunity: big data can be used to uncover or possibly reduce employment discrimination. For example, big data analytics can be used:
• To prevent affinity bias or "like me" bias in the hiring process (for example where hiring managers tend to select candidates like them or whom they like).
• To find potential job candidates who otherwise might have been overlooked based on the more traditional educational or workplace-experience related job requirements. For example, by looking at the skills and knowledge areas that have made other employees successful, a matching system could use pattern matching to recognise the characteristics that made current employees successful and thus need to be looked for in future employees.

Large data analytics systems could help prevent biases often seen in traditional hiring practices that could lead to discrimination. An algorithm could be designed to not look at factors like age, gender, race or any other factor, whereas it is much more difficult for a human to block out such (implicit) factors.

Beyond supporting or recommending matching/hiring decisions, advanced algorithms create the possibility of solving long-term employment challenges related to discrimination, such as the wage gap or occupational segregation, for example by going beyond formal job qualifications and finding the person for the job based on cultural or other factors.

Using data analytics, new kinds of candidate scores or matching scores can be created by using diverse and new sources of information on job candidates. The report mentions how one employment research firm found the distance employees commute to work to be one of the strongest predictors of how long customer service employees will stay with their jobs. Such variables and data could be used to improve matching algorithms for specific (customer service) job vacancies.

Finally, machine learning based algorithms could help decide what kinds of employees are likely to be successful by reviewing the past performance of existing employees of certain companies or job seekers who worked for certain firms, or by analysing the preferences of hiring managers as shown by their past decisions. This could also apply to such things as employee turnover and the likelihood that certain people will retain jobs in certain industries.
Organising data
sanitised and how existing data from within and outside of the organisation can be extracted, transformed and loaded into a database that can be used for analytics.

Getting started
The following might be useful to get started on organising and cleaning data:
4. The data need to be transformed. Even though transformation is part of the ETL process (see section 2.3), additional transformation can be needed while preparing for data analytics. For example, decimal places could have to be fixed or floating numbers could have to be converted to integers. Another example is to transform unstructured data into structured data.

Concept: Structured vs. Unstructured data
Structured data is data with a high level of organisation (and formatting). For example, a table with records, variables and labelled data (such as a table with jobseekers' demographic information) is structured information. In its most simple form, unstructured data is data lacking such organisation. A database with PDFs or photocopies of jobseekers' resumes is an example of unstructured data.

5. Missing data might have to be imputed. Imputation is the process of substituting missing data with calculated values. In databases where data is missing, scientists may choose to run algorithms to impute these missing data points. One way to do this is by looking at patterns in the data. For example, if people with similar characteristics consistently have the same value regarding a certain variable, the likelihood increases that the missing values are similar.

It is extremely important that data are cleaned and sanitised properly. For this reason, PES starting to work with data could create peer review processes to make sure data are being reviewed after being organised by other team members.

Smart data = (Big) Data + Utility + Semantics + Data Quality + Security + Data Protection
Data Quality refers to various aspects of the data itself. These are a) completeness (do I have enough data about everybody to make claims), b) cleanness (are the data well cleaned, sanitised and maintained), c) validity (do the data have high levels of validity) and d) impact (have they been analysed in such a way that they retain their validity and create relevant meaning for the organisation).

3.2 Describing data & data characteristics

Descriptive analytics help us understand the data. When describing data, we look at how the data is distributed (e.g. normal, power-law, linear), what the key characteristics of the data are (e.g. the mode, median and mean) and we check for such things as the outliers in the data. Descriptives are important because they:
• Help us understand the nature of the data (e.g. what is the nature of the variables).
• Help us draw initial conclusions about the data (e.g. based on the distribution of data we can observe that certain jobseekers have certain characteristics).
• Help us prepare for further analysis (e.g. by removing outliers that distort the data).

Furthermore, descriptive statistics are another way to check the quality of the data and the organisation of the data. For example, it could show inconsistencies in the data collected and areas where data have been improperly transformed.
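The imputation and descriptive steps described in this section can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library: it fills a missing value with the most common value among records with a similar characteristic, then reports the mode, median and mean and flags outliers with the common 1.5 x interquartile-range rule. The records, field names and the choice of outlier rule are assumptions made for this sketch, not prescriptions from the toolkit.

```python
from collections import Counter, defaultdict
from statistics import mean, median, mode, quantiles

# Hypothetical jobseeker records; None marks a missing value.
records = [
    {"region": "north", "sector": "retail", "months_unemployed": 3},
    {"region": "north", "sector": "retail", "months_unemployed": 4},
    {"region": "north", "sector": None, "months_unemployed": 5},
    {"region": "south", "sector": "care", "months_unemployed": 2},
    {"region": "south", "sector": "care", "months_unemployed": 4},
    {"region": "south", "sector": "care", "months_unemployed": 48},
]


def impute_by_group(rows, group_key, target_key):
    """Fill missing target values with the most common value in the same group."""
    seen = defaultdict(Counter)
    for row in rows:
        if row[target_key] is not None:
            seen[row[group_key]][row[target_key]] += 1
    filled = []
    for row in rows:
        new_row = dict(row)  # copy, so the original records stay untouched
        if new_row[target_key] is None and seen[new_row[group_key]]:
            new_row[target_key] = seen[new_row[group_key]].most_common(1)[0][0]
        filled.append(new_row)
    return filled


def describe(values):
    """Return mode, median, mean and IQR-based outliers for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mode": mode(values),
        "median": median(values),
        "mean": mean(values),
        "outliers": [v for v in values if v < low or v > high],
    }


clean = impute_by_group(records, "region", "sector")
summary = describe([row["months_unemployed"] for row in clean])
print(clean[2]["sector"])
print(summary)
```

Here the single very long spell of unemployment pulls the mean well above the median, which is the kind of signal that would prompt a reviewer to check whether the value is a genuine case or a data-entry error before it is removed.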
• Convincing different stakeholders of opening up their data silos.
• Getting cooperation from IT departments to actually get the data.
• Coordinating privacy and security risks with relevant stakeholders.
• Working with relevant stakeholders to understand the nature of the data (i.e. add to the data-catalogue).
• Making agreements around updates of the data or SLAs (i.e. how often and to what standards data are being shared).

Concept: Service Level Agreement (SLA)
An agreement specifying the quality of services delivered from one (part of an) organisation to another. For example about the uptime of servers or the refresh rate of data.

The stronger the mandate of the leader of the data team and the higher the position in the organisation, the more (formal) organisational power the team has in getting the data it needs. This is an important consideration when starting with data analytics. A more experimental team focused on inductive approaches may have more difficulty in integrating data sources from across the organisation than a team with a position more closely tied to business processes.

This applies even more strongly to data integration across organisations. If the success of the data team depends on data from other organisations, it will need a stronger position in the organisation, preferably with support from the highest levels of leadership in the organisation. Logical data sharing partners for PES include:

• Other governments:
  - Ministries of education (or similar)
    For example regarding data about the future workforce, which could be helpful in predictive models for future matching applications or unemployment forecasting.
  - Tax agencies (or similar)
    Regarding financial data about job seekers (e.g. for benefit fraud detection).
  - Social security institutions (or similar)
    Regarding social security or benefit information. This is especially relevant when there is a legal obligation for data collection and/or sharing.
  - Statistics bureaus
    For various types of information such as population mobility (which could be used to fine-tune job recommendations) or household developments (which could impact the labour force).
  - Regional or local governments
    For data regarding specific local or regional circumstances (e.g. local employment initiatives).
• Businesses:
  For example regarding job developments (are businesses going to add or remove positions?) and their future needs.
• Other organisations:
  Such as foundations working in the labour market (for example, overviews of activities could help in interpreting specific labour market fluctuations).

PRACTICAL EXAMPLE
The X-Road is Estonia's infrastructure that connects databases from a multitude of governmental agencies. It is best described as a distributed service bus which allows databases to interact, making integrated e-services possible. This, however, also creates opportunities for data integration that can be used for data analytics.
Currently, 219 databases are connected to X-Road and these result in over 1700 services being offered. By integrating data sources, citizens only have to supply many pieces of information once and it already allows fraud prevention through analytics.
Please find further information in the following link.

3.5 Security and Data Protection

Security and Data Protection are two other key elements from the Smart Data equation. Protecting (user) data and having good security should be among the highest priorities of the data team, as well as of the leadership of the team and the parts of the organisation involved in the data used by the team. Several types of security are important:
Virtual access security
Once being close to a machine (or accessing it remotely), how easy is it to gain access? Are (safe) passwords in place? Is data encrypted? The following can help in improving virtual access security:
• Develop guidelines for the encryption of data (especially on those devices with easier physical access).
• Develop policies for the use of firewalls and anti-virus software.
• Have strict protocols regarding passwords, password sharing and password changes.
• Limit the use of APIs (and other ways to access data) to non-sensitive data and/or open data.

Smart data = (Big) Data + Utility + Semantics + Data Quality + Security + Data Protection
Security refers to the ways the data are being securely stored and managed. This applies not only to the physical security (who can access servers and related systems?) but also virtual security (who has access to data secured on systems?). Security is important to make sure systems are not being hacked and data do not leak and are not stolen.

In addition, the following can help with security in general:
• Make sure software is always up to date.
• Have a regular security meeting to discuss security related matters and refresh the memories of the data team's members.
• Make security training a standard part of the onboarding of (new) data-team members, so that a culture of security awareness is instilled from the start.

For purposes of this toolkit, we break down data protection in two topics: privacy and confidentiality. While privacy applies to the person that needs to be protected, confidentiality applies to the person's data. When data can be used to identify a person, privacy issues may arise. When data about a person can be used maliciously (for example by judging a person using data that was supposed to be collected anonymously), confidentiality issues can arise.

Guarding the privacy of individuals and the confidentiality of their data is important to ensure no harm is done to any individual or organisation. The main consideration regarding privacy consists of the applicable laws and regulations. On an EU level, the following are important:
• Directive 95/46/EC | Protection of personal data.
  "Directive 95/46/EC sets up a regulatory framework which seeks to strike a balance between a high level of protection for the privacy of individuals and the free movement of personal data within the European Union (EU). To do so, the Directive sets strict limits on the collection and use of personal data and demands that each Member State set up an independent national body responsible for the supervision of any activity linked to the processing of personal data" [quoted from URL below]
  URL: http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV%3Al14012
• Regulation (EU) 2016/679 | General Data Protection Regulation
  This Regulation is set to replace Directive 95/46/EC. It was adopted on 27 April 2016 and enters into application on 25 May 2018.
  More info: http://ec.europa.eu/justice/data-protection/index_en.htm
Besides applicable European regulations, each Member State will have its own applicable laws and regulations. These should be consulted before starting analytics projects. When data across multiple countries are being collected, laws from multiple countries may apply. Special care should also be given to the (cloud) storage of data in countries other than the home country and/or outside of the EU.

Next to abiding by the law, the following good practices can help to establish good data practices and ensure privacy protection:

• Implement solid de-identification protocols and capabilities.
  This consists of the removal (or replacement) of PII (see section 2.3), as well as having checks and balances in place that ensure that no PII enters the process at any time and that individuals cannot be identified using analytics (e.g. by combining a multitude of variables, it could be possible to narrow data sets down to individuals). The U.S. Department of Health and Human Services (HHS) describes two (commonly accepted) methods to de-identify information [5]:
  - Expert Determination method:
    In this scenario qualified experts apply statistical or scientific principles to render information not individually identifiable.
  - Safe Harbor method:
    The complete removal of 18 types of data related to individuals from the dataset (see Appendix 1 for an overview).

• Always use privacy impact assessments (PIA).
  PIAs are tools used to identify and mitigate privacy risks. Section 3, Article 35 of the new EU General Data Protection Regulation [Data protection impact assessment and prior consultation] already requires controllers to carry out a data protection impact assessment prior to risky processing operations. However, a good practice could be to assess the privacy impact for any project related to individual people and/or cases. At the very least, such a PIA should:
  - Assess whether the information used complies with all (privacy-related) legal and regulatory requirements.
  - Make an inventory of potential risks of working with PII.
  - Assess processes for handling information to reduce or mitigate potential privacy risks.
  - Investigate the consent methods (see below) used to ask individuals permission for the use of their data.
  - Record the outcomes of the assessment and make them available.
  - Implement solutions for any risks or problems discovered.

• Implement privacy by design principles (also required according to Article 25 of the new EU General Data Protection Regulation).
  This means that any project, process or service needs to be designed from the start to adhere to the strictest possible privacy considerations. For example, when creating an analytics database, PII should never be included in such a database in the first place. This prevents the data team from working with PII at all.

To safeguard confidentiality, two actions are important:
• Use consent procedures when collecting information. For example, ask people for consent to use their data when they register as unemployed, or when they fill out surveys.
• Actively inform individuals of the purposes for which their data is being used (very often this happens in conjunction with the consent procedure).

[5] See http://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html
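The concern that combining a multitude of variables can narrow a dataset down to individuals can be checked with a simple count. The sketch below is a minimal illustration and not the HHS Safe Harbor or Expert Determination method itself: it drops a few direct identifiers and then reports the size of the smallest group of records that share the same combination of remaining attributes (the idea behind k-anonymity). All field names and records are hypothetical.

```python
from collections import Counter

# Hypothetical field names; a real register will use different ones.
DIRECT_IDENTIFIERS = {"name", "email", "national_id"}
QUASI_IDENTIFIERS = ("birth_year", "postcode_area", "occupation")


def strip_direct_identifiers(rows):
    """Return copies of the rows without directly identifying fields."""
    return [
        {key: value for key, value in row.items() if key not in DIRECT_IDENTIFIERS}
        for row in rows
    ]


def smallest_group_size(rows, keys=QUASI_IDENTIFIERS):
    """Size of the smallest group of rows sharing one combination of the given keys."""
    counts = Counter(tuple(row[key] for key in keys) for row in rows)
    return min(counts.values())


raw = [
    {"name": "A", "email": "a@example.org", "birth_year": 1980,
     "postcode_area": "10", "occupation": "nurse"},
    {"name": "B", "email": "b@example.org", "birth_year": 1980,
     "postcode_area": "10", "occupation": "nurse"},
    {"name": "C", "email": "c@example.org", "birth_year": 1975,
     "postcode_area": "20", "occupation": "welder"},
]

deidentified = strip_direct_identifiers(raw)
k = smallest_group_size(deidentified)
if k < 2:
    print("At least one record is unique on its quasi-identifiers; review before analysis.")
```

A group size of 1 means that somebody who knows a person's birth year, postcode area and occupation could single out that record even though the name and e-mail address are gone, which is the situation the checks and balances above are meant to catch.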
Analysing data
(such as statistical methods and data mining), but also discuss novel and innovative types of analytics such as machine learning and artificial intelligence. The focus of these innovative types is not necessarily on how PES are using these types, but on potential use cases for the future.
4.1 Overview
Before we start our overview of techniques to work with data, we first give an overview of some common analytical approaches and their differences/overlap [6]. As the graph makes clear, there is an abundance of methods, tools and approaches available to transform data into value. The specific use case of each approach depends on the goal (see Chapter 2) the PES wants to achieve.
[Figure: overview of analytical approaches and their overlap. Labels: Statistics (collection, organization, analysis, interpretation, and presentation of data); Quantitative (quantitative data collection and analysis); Qualitative (qualitative data collection and analysis); Data Mining; Machine Learning (constructing algorithms that learn from and make predictions based on data); Deep Learning (subset of machine learning that attempts to model high level abstractions in data); Artificial Intelligence (creation of intelligence exhibited by machines).]
The following table gives more detail of each approach and potential use cases for PES.
Eye tracking
Where respondents use a device that tracks their eye movements. This helps understand how respondents navigate products, which parts draw attention, etc. This could be used in development stages of new (online) tools to help understand how people navigate pages.

[8] One could argue that most approaches are quantitative and therefore fall in this bucket. However, we use quantitative statistics in a social science context, i.e. referring to the analysis of quantitative data collected through social science research methods, such as surveys.
PRACTICAL EXAMPLE
The US State of New Mexico Department of Workforce Solutions (DWS) noticed that many benefits applicants made mistakes (purposefully or not) while applying for (unemployment) benefits, resulting in improper benefits payments.
DWS partnered with Deloitte to conduct a two stage project. The first stage was to use quantitative statistical analysis to model suspect behaviours. The next step was to gently nudge individuals into more desirable behaviours. The key to this was to design and test communications and notifications for claimants at three moments: 1) during the vetting process for eligibility, 2) when individuals report work and earnings, and 3) while determining an action plan to seek new employment.
By field testing different types of communications, DWS was able to analyse the best working solution and subsequently implement this. DWS was able to substantially influence claimants' behaviour. The State successfully increased accurate reporting while reducing improper payments.
(see https://www2.deloitte.com/us/en/pages/deloitte-analytics/articles/business-analytics-case-studies.html)

Plus-minus methods
Where respondents are asked to mark parts of a product (e.g. an online tool, or physical product) they like or dislike. Subsequently respondents are asked to elaborate on their choices. Often these are used to test brochures and other physical products, but they could be used in online product environments as well.

Like quantitative methods, qualitative research remains valuable in the age of big and/or smart data. The key function of qualitative methods is to help make sense of the world and/or get a deeper understanding of phenomena that simply cannot be generated through other means of analysis. In innovation settings, qualitative methods are most commonly used in conjunction with other methods, throughout the innovation process.

Getting started
4.3 Data mining & KDD

Data Mining and Knowledge Discovery in Databases (KDD) are closely related (yet different) types of analytics. Their relationship can be seen in figure 7. We could argue that KDD is a way of turning information (see figure 7) gathered through data mining into valuable knowledge.

Data mining refers to the application of specific algorithms in order to extract patterns from data. Data mining was invented when datasets became too large to be analysed by humans, so researchers invented ways to condense large datasets and extract useful types of information. Therefore the key difference (in this context) between data mining and the statistical methods discussed in the previous section is that data mining is aimed at the automation of the analysis and presentation of results.

Figure 7: KDD process

Two common applications of data mining are:

• Automated prediction of trends and behaviours.
  For example, based on previous purchases, marketers can estimate the likelihood of customers buying other products or when they are likely to buy the same product again.
  Within PES, this could for example be used to:
  - Estimate the likelihood that (certain) job seekers find (certain) jobs in a certain period of time.
  - Estimate the probability of benefit fraud occurring among certain groups of people with benefits.
  - Forecast unemployment based on historical data.

• Automated discovery of previously unknown patterns.
  By combining different variables and many types of data, data mining can be used to discover patterns in data that were previously unknown. For example, this type of data mining is used in marketing to discover if customers who purchase certain products are also buying other products (for example, Amazon uses this to give recommendations to customers: people who bought this product also bought that product).
  Within PES, this could for example be used to find out:
  - If job seekers with certain similar aspects on their resumes are more likely to find jobs quicker.
  - If certain combinations of job seeker characteristics would also make them a good fit for vacancies not directly fitting with their past experiences.

Nowadays, data mining is often used in conjunction with more advanced types of analytics (as we will discuss below). For example, certain probabilities of occurrences discovered using data mining can be used as inputs for machine learning models. Likewise, machine learning could be used to improve algorithms used for data mining.

KDD in many ways is a follow-up step to data mining. It is important to mention here as it illustrates two points:
• While data mined can have value in itself, the true value lies in the interpretation of data and its transformation into knowledge.
• It requires extra effort to turn data into interpretative knowledge (and actionable wisdom).

To move from data mining to KDD, PES can do two things:
• Enrich the data mined (e.g. by combining variables) so that more value is created or patterns become more obvious. For example, by combining unemployment trends with labour market seasonality, it becomes possible to correct for seasonal variations in unemployment and assess the true trend (if any).
• Use experts to interpret results. Making sense of results can be extremely difficult without the proper subject matter expertise and knowledge of the context in which the analysis takes place. It is for these reasons that having social scientists (and other experts) on the data team can enhance its value multifold.

Getting started

The following methods, tools and/or applications can be used for data mining and/or KDD:
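As one small illustration of the enrichment step described in this section (correcting unemployment figures for seasonal variation), the sketch below computes a simple seasonal index per period and divides it out. This ratio-to-overall-mean approach is an assumption chosen for brevity; it is not a method named by the toolkit, it mixes trend and seasonality when the series has a strong trend, and statistical offices use more elaborate procedures. The figures are made up.

```python
def seasonal_indices(series, period=12):
    """Average level of each season relative to the overall mean of the series."""
    overall = sum(series) / len(series)
    indices = []
    for season in range(period):
        season_values = series[season::period]  # every value belonging to this season
        indices.append((sum(season_values) / len(season_values)) / overall)
    return indices


def deseasonalise(series, period=12):
    """Divide each observation by the index of the season it falls in."""
    indices = seasonal_indices(series, period)
    return [value / indices[i % period] for i, value in enumerate(series)]


# Hypothetical quarterly counts of registered unemployed (in thousands) for two
# years, with the same seasonal shape in both years and no underlying trend.
quarterly = [120, 100, 80, 100, 120, 100, 80, 100]
adjusted = deseasonalise(quarterly, period=4)
print([round(value, 1) for value in adjusted])
```

With the seasonal swing removed the adjusted series is flat, which matches how the hypothetical data were built: any movement that remains after adjustment in real data is a candidate for the "true trend" an expert would then interpret.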
4.4 Advanced Analytics

4.4.1 Artificial Intelligence

Artificial intelligence (AI) is used to create smarter technologies that can make decisions or support decision making. The main goal of AI is to create technologies that are so smart that they can think and act like humans. The Turing test is the benchmark for AI after which it can be considered human smart.

Artificial Intelligence is a broad concept that encompasses machine learning and deep learning, and intersects with other types of analytics, such as data mining and statistics. In this toolkit we restrict ourselves to the intelligent applications of AI where the application exhibits certain levels of smartness based on learning and creativity.

[...] all the potential moves of a game of Go requires tremendous processing power and the ability of AlphaGo to crunch the numbers is a testament to the progress made in computational power. The second is that AlphaGo beat a human against the expectations of (many) AI experts, who did not expect the application to be sophisticated enough in terms of reasoning and decision making. If you want to see more current examples of how AI is developing, https://aiexperiments.withgoogle.com/ is a good website to get inspiration.

Figure 8: Components of Artificial Intelligence [figure not reproduced; surviving label: (Big) Data]

In the table above, we describe several existing applications of AI and how PES could develop similar applications of AI.

Given the novelty of AI and the lack of experiences of PES (or other governments) that could be readily implemented within PES, it seems advisable to start small. Smaller scale experiments with AI allow PES to explore the possibilities while reducing the amount of data needed and the complexity of the algorithms. The following tools could be of help.
4.4.2 Machine Learning

Machine learning is used to create better functioning algorithms and models by learning from ongoing analysis. Machine learning is a subset of artificial intelligence and there is disagreement about the exact difference between the two concepts. We see the difference as machine learning being mostly used to analyse large volumes of data, discover patterns in these data and subsequently learn from the data. Artificial intelligence goes one step further and includes systems that can make decisions, combine elements, reason and thus show behaviour comparable to human thinking. Herein also lies a key difference between machine learning and data mining/KDD. In machine learning there is a clear emphasis on learning from the data and the applied analysis for future iterations.

Several types of machine learning exist; in the table below we list a number of common types of machine learning:

Machine learning is being widely applied in commercial settings and healthcare. Many virus scanners on computers rely on machine learning and so do smart thermostats that learn about your heating preferences and adjust heating cycles accordingly.

In the following table we list some more applications of machine learning from other domains and how these could be applied within PES.

[9] See https://www.skype.com/en/features/skype-translator/
Within PES, machine learning has not been widely applied. One notable exception is the application of learning algorithms at the Flemish PES (VDAB) (see below).

Several of the tools already mentioned can be used for machine learning (such as R & Python). The table below lists more relevant tools.

PRACTICAL EXAMPLE
The Flemish PES (VDAB) is working to improve job matching by using Big Data to recommend vacancies to jobseekers when they access the VDAB's vacancy system. In 2016 a Recommender system was developed based on a twofold objective: finding out which users are interested in which vacancies (by looking at what they click on, what they read, open and look at); and predicting a jobseeker's interest in other vacancies (by looking at what similar users have looked at, analysing behaviours). VDAB seeks to make both accurate and extended recommendations to jobseekers through this system, looking to open the pool of jobs that a jobseeker could find interesting based on a range of preferences that are expressed in the vacancy.
The Recommender system is currently being tested on a set of sub-users. This system has been developed by VDAB's Innovation Lab. More information can be found about the Lab on a specific fiche accessible in the PES Practices website.

4.4.3 Deep Learning

Deep learning is used to explore data that is highly unstructured and abstracted and tries to create abstractions from this data. Deep learning is a subset of machine learning (which, as explained above, is a subset of artificial intelligence). In our view [10], the key difference between machine learning and deep learning is that deep learning focuses heavily on unstructured and abstract data as well as the combination of many layers of data. Machine learning tends to focus on structured data and discovering patterns in data that are well organised.

[10] Once again, many different interpretations exist, so the reader may have come across different definitions. We have tried to create an easy to understand common definition.

Deep learning is for example used to automatically organise and tag photos. Companies like Google and Facebook can recognise people and locations in photos and can use this information to tag and categorise the photos.

Thinking along these lines, a potential application for deep learning in PES is to have algorithms learn from job seekers' CVs or resumes and discover useful information. For example, formatting styles, fonts used, colours and pictures could tell us something about the job seeker that could be useful when recommending jobs or to help them optimise their resumes. Similarly, recordings (with consent and for
research purposes) could be used to analyse tone of voice and emotions and this could be used to personalise service delivery processes.

Getting started

As with machine learning, many of the tools already mentioned can be used for deep learning (such as R & Python). The table above lists more relevant tools and/or specific packages.

In the previous section we have described what currently (in our view) the most important and promising types of analytics are for PES. However, many subtypes, combinations and derivations of these main classes exist. In this section we briefly mention several of these types of analytics. For each type, we define and explain the concept, describe how it relates to other types of analytics, and outline how it may be of value for PES.

Based on models, trying to extrapolate from previous data points to future data points. Many recommender systems are based on predictive analytics and so are well known examples such as weather forecasts.

Within PES, this could, for example, be used in:
• Unemployment or labour market forecasting
• Job seeker profiling applications (e.g. by estimating job seekers' developments and training needs)
• Matching applications (e.g. by predicting the ease with which vacancies can be filled).

2. Natural language processing (NLP)

NLP refers to a broad class of methods to interpret the language of normal people, or natural language (and translate that into other types of language). This is used by speech recognition and translation software.

Within PES, this could be used to:
• Better understand customer service communication (and for example create content that better aligns with clients' language)
• Interpret jobseekers' resumes, and better match the languages used by jobseekers and employers
• Create better classification schemes (e.g. ESCO) by mapping jargon and technical terms to human language.

This is a special class of machine/deep learning focused on understanding or learning from (digital) images. As mentioned above, the underlying deep learning algorithms are used to tag or categorise content.

Within PES, this could be used to:
• Analyse profile pictures used for resumes and make recommendations for job seekers' pictures to better match certain jobs.

Closely related to natural language processing, speech recognition is used to understand spoken language. Combined with natural language processing and other types of machine learning and AI, this could be used to create social robots and/or chat bots. Currently, speech recognition is used to transcribe spoken communication. This helps create archives of communication and allows organisations to understand content (such as questions customers have and their accompanying emotions).

Within PES, this has the following potential applications:
• Understanding tone of voice and emotions in customer service interactions to better understand perceived problems and obstacles
• Allow communication to be continued and stored on other channels
• Better understand word choices and use these to update web and written content.
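The idea of extrapolating from previous data points to future data points, described earlier in this section in connection with unemployment or labour market forecasting, can be illustrated with the simplest possible model: a straight line fitted by ordinary least squares and extended a few periods ahead. This is a minimal sketch under the assumption that a linear trend is adequate; the function names and the monthly figures are hypothetical, and real forecasts would normally account for seasonality and uncertainty as well.

```python
def fit_line(values):
    """Least squares fit of value = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(values, steps):
    """Extend the fitted line for the given number of future periods."""
    intercept, slope = fit_line(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]


# Hypothetical monthly counts of registered unemployed (not real data).
history = [1000, 1020, 1040, 1060, 1080]
print(forecast(history, steps=2))
```

Because the hypothetical history rises by exactly 20 per month, the fitted line simply continues that rise; with real data the gap between the fitted line and the observations gives a first impression of how much trust the extrapolation deserves.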
Presenting
ten) reports (given the abundance of resources on reporting). We do focus on (interactive) visualisations and dashboards as well as sharing data with
Be more appealing and actionable
One of the bigger problems with traditional reports is their linear nature and their often dry writing, which does not compel readers to read the entire report, let alone follow up on its recommendations. Many interactive tools try to solve this problem by offering personalised insights and more dynamic routes to explore outcomes.

5.2 (Interactive) Visualisations

We start this overview with a short discussion of the role of visualisations. Visualisations are a powerful way to show findings and interesting patterns in data. The key benefit of visualisations is that they make it possible to:
Show relationships and developments (over time) more easily than text.
Convey information, and help many people remember it, better than text (depending on their learning styles).

However, visualisations also have drawbacks, most notably:
Purely visual representations may lead to false conclusions, for example when a graph suggests a relationship between variables while in real life there is no significant relationship.
Complex graphs can be overwhelming and distract from the message that the sender wants to convey. The volume aspect of big data in particular can cause problems when trying to visualise too many data points at once.
Interpreting results very often still requires skills and (domain) expertise. When only visualisations are presented, it can be difficult to judge the value of the results.
While great for presenting results and data, visualisations are not the best vehicle to raise concerns, list points for discussion or describe context.

In general, when information is very complex and much contextual information is needed, simple visualisations as a stand-alone way of reporting are not the best way to convey a message. For these reasons, the following seem good use cases for (interactive) visualisations:
In production environments when the information is simple enough to be understood without too much contextual information. For example, visual elements could be used to show the extent to which job seekers match certain jobs.
In situations when summaries of information are presented (for example in evaluations of pilots). Condensed versions of information (like summaries) reduce the complexities of information and make it easier to use visualisations.
When room for contextual information is available (e.g. in dashboards or interactive web pages), more complex information can be transmitted using visualisations, provided the contextual information allows the information to be interpreted correctly.
When experts (e.g. from the data team) are available to help interpret information, more complex visualisations can be used and the experts can help resolve any ambiguities.

Getting started

There are many ways to create visualisations; the following methods, tools and/or applications can be used:
Most spreadsheet software (e.g. Microsoft Excel, LibreOffice Calc) includes basic graph functionality as well as rudimentary capabilities for interactivity.
Most statistical software (e.g. SPSS, SAS) has rudimentary visualisation capabilities that allow for basic manipulations.
Many packages or programming languages can be used for data visualisations. Some of the most common ones are jQuery, Chart.js or other libraries for JavaScript. For some (interactive) examples see http://bl.ocks.org/mbostock.
Many online services are built on top of these packages (and other code) and make it possible to create visualisations online without the need for coding (e.g. Datawrapper.de).
Certain dedicated software tools exist for data visualisations. Tableau (.com) is a well-known example, but others exist, such as Sisense (.com). Very often these tools blur the line between simple visualisation tools and (online) dashboards that can be used to analyse and manipulate data.
PRACTICAL EXAMPLE
WollyBI (wollybi.com) is a spin-off from the Department of Statistics and Quantitative Methods (CRISP), University of Milan-Bicocca. Together with a regional public employment service, CRISP created the WollyBI platform as a means to visually explore the labour market in various regions in Italy based on vacancies, location and skills. It currently allows users to visualise, in a user-friendly manner, various analyses of over 1.5 million job vacancies.
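One of the drawbacks listed in this section is that a graph can suggest a relationship that is not significant. As a minimal sketch of how a data team might check this before publishing a chart, the example below estimates how often a correlation at least as strong as the observed one would arise by chance, using a permutation test. The data, the variable names and the number of permutations are illustrative assumptions, not part of the toolkit.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no linear relationship to test
    return cov / sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Share of random re-pairings that correlate at least as strongly as the data."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_strong = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        # Small tolerance so floating-point noise does not hide exact ties.
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            at_least_as_strong += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_strong + 1) / (n_permutations + 1)


# Hypothetical data: weeks since registration vs. applications sent.
weeks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
applications = [2 * w for w in weeks]          # clearly related to weeks
unrelated = [2, 1, 0, 1, 2]                    # no linear link to 1..5

print("related:  ", permutation_p_value(weeks, applications))
print("unrelated:", permutation_p_value([1, 2, 3, 4, 5], unrelated))
```

A small p-value suggests the pattern in the chart is unlikely to be chance alone; a large one is a warning that the visual impression may be misleading. Correlation is only one possible check, and domain expertise is still needed to interpret the result.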
5.3 Interactive Tools & Dashboards

In this section we discuss (novel) ways to present data using interactive (web) tools and online dashboards. The benefit of these tools is that they allow the user to interact with the data and thus to a) personalise the data to fit his/her needs (e.g. through searching, sorting, and filtering), b) explore patterns more easily while going from one section to another (related) section and c) get help and contextual information more easily (e.g. through embedded help functions).

We define a dashboard in this context as an interactive tool, most often based on web technologies, that works directly on top of data sources, allows users to manipulate and visualise information, and provides additional textual and contextual information.

Interactive tools and dashboards aim to resolve many of the issues existing with stand-alone visualisations. For example:
Dashboards typically allow for more sophisticated types of data manipulation such as sorting, searching and filtering.
Dashboards can include additional information explaining data points, which helps users interpret information.
Dashboards can include contextual information that helps users understand the setting in which the data was collected and analysed. This can increase understanding of the data.
Dashboards can include recommendations or conclusions that build upon the data presented.
Dashboards allow linking between different sections and thus allow for more dynamic routing of information.
Dashboards can link to information outside of the dashboard, allowing users to reach other relevant subjects.
Dashboards can integrate communication tools (e.g. chat applications) or link to them, so that the user of the dashboard can contact the data team or other support staff for help with using the dashboard and/or interpreting information.

In many cases dashboards are cloud based or otherwise accessible on the internet or intranet. This makes it easy to share information and allows flexible access policies. Furthermore, because dashboards typically work on top of analytics data sources, they allow for easy discrete or continuous data updates.

The risk exists of trying to add everything to a dashboard, leading to something called feature creep, namely adding every single possible data point and type of information. This could severely distract from the goal which should be achieved.
Maintaining access rights requires resources, and solid policies for this have to be created.
When dashboards are open (i.e. without user authorisation or accessible without credentials), extra care needs to be taken to prevent misinterpretation and abuse of data.
Even with the possibilities to explain and add context, interpreting data remains complicated and even with the best dashboard, experts may still be needed. The risk exists that dashboards are used to replace instead of complement experts.
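The searching, sorting and filtering that this section describes as typical dashboard manipulations can be sketched in a few lines. The example below is only an illustration of the idea: the `Vacancy` record, its fields and the sample data are hypothetical, and a real dashboard would run such operations against its underlying data source rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vacancy:
    title: str
    region: str
    match_score: float  # hypothetical 0-1 score of how well a jobseeker matches


def search(vacancies, keyword):
    """Keep vacancies whose title contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [v for v in vacancies if keyword in v.title.lower()]


def filter_by_region(vacancies, region):
    """Keep vacancies located in the given region."""
    return [v for v in vacancies if v.region == region]


def sort_by_match(vacancies):
    """Order vacancies from the best to the weakest match."""
    return sorted(vacancies, key=lambda v: v.match_score, reverse=True)


vacancies = [
    Vacancy("Data analyst", "North", 0.82),
    Vacancy("Nurse", "South", 0.40),
    Vacancy("Junior data engineer", "North", 0.67),
    Vacancy("Data analyst", "South", 0.91),
]

# A personalised view: "data" jobs in the North, best matches first.
view = sort_by_match(filter_by_region(search(vacancies, "data"), "North"))
for v in view:
    print(f"{v.title:<22} {v.region:<6} {v.match_score:.2f}")
```

Keeping each manipulation as a small, separate function mirrors how dashboard controls are usually combined: the user can apply any of them in any order without the others.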
Evaluation &
ing both the outcomes and the process are often neglected parts of any analytical process. However, they are extremely important in judging the quality
6.1 Evaluation
Note: this section does not focus on evaluation as a research method. It focuses on the evaluation of any analytics process.
EVALUATION FOCUS
             Process oriented           Outcome oriented
Continuous   Process flow evaluations   System outcome evaluations
Ad-hoc       Evaluations                Assessments
For example, one could compare the quality of human-generated vacancy matches with those of a new analytics platform after a certain amount of time and a certain number of matches made. The comparison of the two ways of matching could tell the organisation which one performs better (and should therefore be implemented).

For any data analytics project that starts small, at some point in time the organisation has to decide whether this project (whether it is a research activity, pilot or experiment) will move from being a small-scale pilot to being rolled out in the entire organisation.

This decision regarding continuation, full scale-up or implementation in the organisation is important for the organisation. It will likely impact how (parts of) the organisation work, could impact service delivery for jobseekers and/or employers and could consume valuable resources in the organisation. Besides these organisational aspects, there are several technical considerations when scaling up initiatives. Here are some examples of both:

Organisational considerations:
Very often an expansion or scale-up of a project or implementation of a product requires training and support for new staff members. This requires resources and time that may not be directly available and need to be planned.
When the purpose of a data-team is solely to focus on experimenting and innovation,
Perhaps most important, and often overlooked, is dealing with resistance in the organisation, as well as creating a data-driven or innovation-oriented culture in the organisation. Without the support from the employees in the organisation, any innovation is doomed to fail. Proper communication plans and cultural initiatives are essential in the scale-up or implementation of any new tool. Current attitudes and cultures should be taken into account when making implementation decisions.

PRACTICAL EXAMPLE
The Finnish PES developed a new statistical profiling tool that was implemented in the organisation in 2007. The profiling tool was part of an integrated IT system that calculated a risk estimate for the jobseeker at registration using administrative data. The new model was found to be 90 per cent effective at estimating the likelihood of a jobseeker being unemployed for over 12 months. However, case workers did not think the tool was useful and did not trust the results from the tool. As a result the tool was withdrawn from the production environment (see Kurekov, 2014).
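Returning to the earlier example of comparing human-generated vacancy matches with those of a new analytics platform: one possible way to make that comparison concrete is a two-proportion z-test on the share of matches judged successful. This is a sketch under stated assumptions, not a method prescribed by the toolkit. What counts as a "successful" match (for instance, a match that led to a placement) and the counts used below are hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = successes_a / total_a
    p_b = successes_b / total_b
    # Pooled proportion under the assumption that both methods perform equally.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("all outcomes identical; the test is undefined")
    z = (p_a - p_b) / std_err
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


# Hypothetical counts: successful matches out of all matches made.
human_successes, human_total = 120, 400
platform_successes, platform_total = 150, 400

z, p = two_proportion_z_test(
    human_successes, human_total, platform_successes, platform_total
)
print(f"human rate:    {human_successes / human_total:.3f}")
print(f"platform rate: {platform_successes / platform_total:.3f}")
print(f"z = {z:.2f}, p = {p:.4f}")
```

A negative z here means the platform's success rate is higher than the human one. As the practical example above shows, a statistically better tool can still fail if staff do not trust it, so such a test informs the scale-up decision rather than settling it.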
Appendices
Appendix 1 | Safe Harbor De-identification types
This list gives an overview of the HHS types of PII that need to be removed from data sets following the Safe Harbor method. For the full text, consult http://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html
a) Names
d) Telephone numbers
f) Fax numbers
h) Email addresses
p) Account numbers
r) Certificate/license numbers
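As a minimal sketch of what removing such identifiers from a data set could look like, the example below drops fields from a record by name. The field names and the sample record are hypothetical, and the mapping covers only the identifier types listed above. It does not detect identifiers hidden in free text and is not a complete implementation of the Safe Harbor method; the full HHS text remains the reference.

```python
# Hypothetical field names, mapped to the identifier types listed above.
IDENTIFIER_FIELDS = frozenset({
    "name",            # a) Names
    "phone",           # d) Telephone numbers
    "fax",             # f) Fax numbers
    "email",           # h) Email addresses
    "account_number",  # p) Account numbers
    "license_number",  # r) Certificate/license numbers
})


def strip_identifiers(record, identifier_fields=IDENTIFIER_FIELDS):
    """Return a copy of the record without the listed identifier fields."""
    return {
        key: value
        for key, value in record.items()
        if key not in identifier_fields
    }


# Hypothetical jobseeker record.
jobseeker = {
    "name": "Jane Example",
    "email": "jane@example.org",
    "phone": "000-000-0000",
    "region": "North",
    "months_unemployed": 7,
}

print(strip_identifiers(jobseeker))
```

Returning a new dictionary instead of modifying the record in place keeps the original data intact, which is useful when the identified and de-identified versions are stored under different access policies.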
KE-02-17-300-EN-N