
Leveraging Artificial Intelligence

To Enable a Pro-Active Security Posture
Introduction
With today's media buzz, artificial intelligence (AI) may appear to
be a silver bullet for cyber security. Certainly, many security
applications have urgent requirements that can be addressed
with such technology: operations, malware analysis, incident
response, vulnerability management, and compliance, to name a
few. Large amounts of available data supplement this: records
of network traffic, usage logs and external indicators (darkweb,
open source, DNA, etc.) are constantly growing and will grow
even faster as new systems come online.

Analyzing large sets of data using AI techniques can lead to
improved insight and potentially better decisions, making
enterprises more secure. But the field is still young, and there
are many challenges that must be considered along the way. Of
specific concern are cleaning and preparation, causality, model
transparency and decision support.

www.cyr3con.ai
Addressing the big four
Cleaning and preparation
A survey of data scientists conducted by researchers at Sandia National
Laboratories found that data collection and cleaning/preparation were the most
time-consuming tasks for a big data analyst. Each of these tasks consumes 20
percent of the time in a typical project (median values). Individually, either could
consume up to 70 percent of the time on some projects. The study results also
highlighted that most practitioners viewed these two steps (cleaning and
preparation) as the most critical in the process.

Tasks related to data cleaning and preparation can be highly dependent on the type
of data as well. For instance, consider a database of individual threat actors that
records a variety of attributes. Normalizing their information (e.g., whether the threat
actor is identified by a Jabber id, screen name, bitcoin wallet, etc.) can easily become
a tricky and time-consuming task. But it is necessary for proper intelligence analysis,
as identities must be understood in a consistent manner.
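
As a rough illustration of the normalization problem described above, the sketch below canonicalizes identifiers and then groups records that share at least one of them. The field names (`jabber`, `screen_name`, `bitcoin_wallet`), the lower-casing rule and the sample records are all assumptions made for this example; they do not describe the CYR3CON API or any real data set.

```python
from collections import defaultdict


def normalize_identifier(kind, value):
    """Return a canonical (kind, value) pair for one raw identifier."""
    kind = kind.strip().lower()
    value = value.strip()
    # Assumption: Jabber ids and screen names are matched case-insensitively,
    # while wallet addresses are kept verbatim because case can matter there.
    if kind in {"jabber", "screen_name"}:
        value = value.lower()
    return kind, value


def group_actors(records):
    """Group record indexes that share any normalized identifier (union-find)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    first_seen = {}  # normalized identifier -> index of first record using it
    for idx, record in enumerate(records):
        for kind, value in record.items():
            key = normalize_identifier(kind, value)
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Hypothetical records collected from three different sources.
records = [
    {"jabber": "Ghost@Example.org ", "screen_name": "gh0st"},
    {"screen_name": "GH0ST", "bitcoin_wallet": "1AbC"},
    {"jabber": "other@example.org"},
]
print(group_actors(records))
```

Records 0 and 1 end up in the same group because their screen names match once case is ignored. In practice the matching rules would need to be decided per identifier type and per source.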

Figure 1. Social network analysis enabled by the
CYR3CON API allows for understanding threat actor
relationships even based on data from disparate sources.
The CYR3CON API takes care of many common cleaning
tasks so developers can focus on writing software as
opposed to organizing and preparing data.

Causality
Many pundits extol the benefits of AI and machine learning, leading to the belief that
the more information you have, the higher-quality results you will obtain from a
given algorithm. This is not necessarily true. In fact, large datasets increase the
likelihood of chance correlations. For instance, “Hilarious graphs prove that
correlation isn't causation,” an amusing piece published by media giant Fast
Company, shows that there exists a correlation between the number of films
featuring actor Nicolas Cage and the number of drowning deaths in swimming pools,
two obviously unrelated things.
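
The effect of data volume on chance correlations can be seen in a small simulation. The sketch below is only an illustration under assumed conditions: every series is pure random noise, the sample size of 12 observations and the candidate counts are arbitrary, and the helper names are hypothetical. It reports the strongest correlation found between an unrelated target and a growing pool of candidate series.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(target, candidates):
    """Largest absolute correlation between the target and any candidate."""
    return max(abs(pearson(target, series)) for series in candidates)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_obs = 12              # assumption: e.g. twelve yearly observations
target = [rng.gauss(0, 1) for _ in range(n_obs)]
candidates = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(1000)]

# The candidate pools are nested, so the strongest match can only grow.
for k in (10, 100, 1000):
    print(k, round(strongest_chance_correlation(target, candidates[:k]), 2))
```

None of the candidate series has anything to do with the target, yet searching more of them can only raise the best match found, which is the same mechanism behind the Nicolas Cage example.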

Causality can be addressed on many levels. The first check is easy: is the data
supporting the AI algorithm actually indicative of what you are going to predict? If
the method cannot pass this simple check, then the entire approach should be
reconsidered.
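
One possible way to run a first sanity check of this kind is a permutation test: shuffle the target many times and see how often the shuffled data looks as strongly related to the input as the real data does. This is a minimal sketch, not a method taken from the paper. The function names and the two sample series are hypothetical, and a low p-value only suggests that an association is unlikely to be chance; it does not establish causation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(feature, target, n_permutations=2000, seed=0):
    """Share of shuffles whose |correlation| is at least the observed one."""
    rng = random.Random(seed)
    observed = abs(pearson(feature, target))
    shuffled = list(target)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(feature, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical weekly counts: hacker-forum mentions vs. observed exploits.
mentions = [0, 1, 1, 2, 3, 3, 4, 5, 6, 7]
exploits = [0, 0, 1, 1, 2, 2, 3, 3, 5, 6]
print(permutation_p_value(mentions, exploits))
```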

Figure 2. CYR3CON Priority is used by enterprises to identify which vulnerabilities are most likely to be
exploited. CYR3CON Priority is more effective than other approaches because it leverages data that is
predictive in nature: that obtained from the discussions of malicious hackers.

Transparency
Many machine learning approaches, when applied to data, produce a prediction or
attempt to label, classify or organize data in a certain way. These algorithms
provide results that are correct only with a certain probability. So an outcome that a
machine learning algorithm predicts with an associated probability of 0.7 is only likely,
not guaranteed. After all, many machine learning algorithms simply provide a
mathematical model, and as an old adage goes, “all models are wrong; some are
useful.” One thing that machine learning approaches generally do not provide
is an explanation of how they arrived at a particular conclusion. This is
where secondary analysis may be required for further understanding, something
that is often ignored in practice. Even so, it is possible to gain a better
understanding by examining the attributes of the data. For example, one can
examine which attributes of the data are most diagnostic, study the cases
where machine learning algorithms fail, or even use a symbolic machine learning
approach, which will often tell the analyst how it arrived at a given conclusion.
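
To make the idea of a diagnostic attribute and a self-explaining symbolic model concrete, here is a minimal sketch of a one-rule learner: it picks the single attribute whose value-to-label rule makes the fewest mistakes on the training rows, and it can state which attribute value led to each prediction. The attribute names (`poc_available`, `vendor`) and the six sample rows are invented for illustration, and this is not a description of how CYR3CON DarkMention works.

```python
from collections import Counter, defaultdict


def one_rule(rows, label_key):
    """Return (attribute, rule, errors) for the most diagnostic attribute."""
    best = None
    attributes = [key for key in rows[0] if key != label_key]
    for attr in attributes:
        # Count labels seen for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[label_key]] += 1
        # Rule: each attribute value predicts its majority label.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in rows if rule[row[attr]] != row[label_key])
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


def explain(attr, rule, row):
    """Say which attribute value produced the prediction for one row."""
    value = row[attr]
    return f"predicted {rule[value]!r} because {attr} = {value!r}"


# Hypothetical training rows about vulnerabilities.
rows = [
    {"poc_available": "yes", "vendor": "A", "exploited": "yes"},
    {"poc_available": "yes", "vendor": "B", "exploited": "yes"},
    {"poc_available": "no", "vendor": "A", "exploited": "no"},
    {"poc_available": "no", "vendor": "B", "exploited": "no"},
    {"poc_available": "yes", "vendor": "A", "exploited": "yes"},
    {"poc_available": "no", "vendor": "B", "exploited": "no"},
]
attr, rule, errors = one_rule(rows, "exploited")
print(attr, errors)
print(explain(attr, rule, {"poc_available": "yes", "vendor": "B"}))
```

A single rule is far too simple for real use, but it shows the trade-off: the model is weak, yet every prediction comes with a reason the analyst can inspect.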

Figure 3. CYR3CON DarkMention makes predictions in a transparent
manner.

Decision support
The current state of the art for machine learning can provide interesting insights into
the data. These results are generally descriptive or predictive, however; they do not
give insight into a course of action that one may take based on the data. The good
news is that recent trends in artificial intelligence explore this problem of
decision making, including the case of doing so under adversarial conditions. Even
if such techniques are not available for a given domain, it is important for data
analysts to ask the question “Who cares?” with regard to a particular result. This goes
beyond what the data says to address what courses of action it implies we should
take.
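
The gap between a prediction and a course of action can be sketched with a simple prioritization rule. The example below is a hypothetical illustration, not the method used by any CYR3CON product: it assumes each vulnerability comes with a predicted exploitation probability, an impact score and a patching effort in hours, and it greedily fills a fixed time budget in order of expected loss avoided per hour. All identifiers and numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Vulnerability:
    name: str                   # placeholder identifier, not a real CVE
    exploit_probability: float  # assumed output of a predictive model
    impact: float               # assumed business impact score
    patch_hours: float          # assumed effort needed to remediate


def plan_patching(vulns, hours_available):
    """Greedy plan: highest expected loss avoided per hour goes first."""
    ranked = sorted(
        vulns,
        key=lambda v: v.exploit_probability * v.impact / v.patch_hours,
        reverse=True,
    )
    plan = []
    remaining = hours_available
    for vuln in ranked:
        if vuln.patch_hours <= remaining:
            plan.append(vuln.name)
            remaining -= vuln.patch_hours
    return plan


vulns = [
    Vulnerability("VULN-A", 0.7, 10, 4),
    Vulnerability("VULN-B", 0.1, 10, 1),
    Vulnerability("VULN-C", 0.4, 2, 8),
    Vulnerability("VULN-D", 0.9, 5, 2),
]
print(plan_patching(vulns, hours_available=6))
```

The prediction alone only says which vulnerabilities are likely to be exploited. Turning it into a plan requires extra inputs such as impact, effort and available time, which is why the “Who cares?” question matters.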

Figure 4. CYR3CON Priority is designed for decision support by helping vulnerability management
teams understand what vulnerabilities are most likely to be used in attacks.

Closing thoughts
We need to be mindful that AI and machine
learning, while highly useful, do not
represent a silver bullet for every problem
in cybersecurity. Even the best algorithms
will not provide good results if the analyst
does not understand how to properly use,
apply, or interpret the results. In the
end, solid analysis depends on the human,
not solely the machine.

www.cyr3con.ai
(833) 229-0110 | sales@cyr3con.ai
