
Neo4j in Depth

Max De Marzi

About Me
Max De Marzi - Neo4j Field Engineer

My Blog: http://maxdemarzi.com
Find me on Twitter: @maxdemarzi
Email me: maxdemarzi@gmail.com
GitHub: http://github.com/maxdemarzi

TLDR:

Property Graph Data Model

What you already know

The Problem

all JOINs are executed every time you query (traverse) the relationship
executing a JOIN means to search for a key in another table
with Indices executing a JOIN means to lookup a key
B-Tree Index: O(log(n))
more entries => more lookups => slower JOINs

[Figure, shown twice: a People table (Max, id 143), an Attend join table pairing person id 143 with conference ids including 326, 725 and 981, and a Conferences table listing NoSQL Now, Big Data TechCon and Chariot Data IO.]
A Property Graph

Nodes
uid: MDM, name: Max
uid: BDTC, where: Burlingame
uid: NSN, where: San Francisco
uid: CDIO, where: Philadelphia

Relationships
member (three relationships, one to each conference node)

Neo4j Secret Sauce

Pointers instead of look-ups
Fixed-size records for fast access
Do all your Joining on creation
Spin spin spin through this data structure
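
The contrast between index look-ups and pointers can be sketched in a few lines of Python. This is a toy in-memory illustration, not Neo4j's storage format: the sorted-list search via `bisect` merely stands in for a B-tree index, and the pairing of conference ids with conference names is an assumption made for the example.

```python
import bisect

# Relational style: resolve each join-table row by searching a sorted key
# list (a stand-in for a B-tree index, O(log n) per lookup).
conference_ids = [326, 725, 981]  # sorted primary keys
conference_rows = ["NoSQL Now", "Big Data TechCon", "Chariot Data IO"]  # assumed id order
attend = [(143, 326), (143, 725), (143, 981)]  # (person_id, conference_id)


def conferences_by_join(person_id):
    names = []
    for pid, cid in attend:  # scan the join table
        if pid == person_id:
            pos = bisect.bisect_left(conference_ids, cid)  # one index lookup per row
            names.append(conference_rows[pos])
    return names


# Graph style: the "join" happens once, at creation time, by storing
# direct references to the neighbouring records.
class Node:
    def __init__(self, **props):
        self.props = props
        self.out = []  # direct pointers to neighbours


max_node = Node(uid="MDM", name="Max")
for name in conference_rows:
    max_node.out.append(Node(name=name))  # join on creation


def conferences_by_pointer(person):
    # No key search at query time: just follow the stored pointers.
    return [n.props["name"] for n in person.out]


joined = conferences_by_join(143)
followed = conferences_by_pointer(max_node)
```

Both functions return the same three conferences; the difference is that the first repeats a key search for every row on every query, while the second only walks references that were stored when the data was written.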

Relational Databases Can't Handle Relationships Well

Cannot model or store data and relationships without complexity
Performance degrades with number & levels of relationships, and database size
Query complexity grows with need for JOINs
Adding new types of data and relationships requires schema redesign, increasing time to market
making traditional databases inappropriate when relationships are valuable in real-time

Slow development
Poor performance
Low scalability
Hard to maintain

NoSQL Databases Don't Handle Relationships

No data structures to model or store relationships
No query constructs to support relationships
Relating data requires JOIN logic in the application
No ACID support for transactions
making NoSQL databases inappropriate when relationships are valuable in real-time

Real-Time Query Performance

Performance must hold steady with scale
Neo4j is 1000x faster
Reduces minutes to milliseconds

[Chart: Response Time (y-axis) against Connectedness and Size of Data Set (x-axis), with one curve for Relational and Other NoSQL Databases and one for Neo4j. The x-axis runs from 0 to 2 hops, 0 to 3 degrees and thousands of connections up to tens to hundreds of hops, thousands of degrees and billions of connections.]

Re-Imagine Your Data as a Graph


Neo4j is an enterprise-grade graph database that enables you to:
Model and store your data as a graph
Query relationships with ease and in real-time
Seamlessly evolve applications to support new requirements by adding new kinds of data and relationships

Agile development
High performance
Vertical and horizontal scale
Seamless evolution

Neo4j Overview
Company
Neo Technology, Creator of Neo4j
80 employees with HQ in Silicon Valley, London, Munich, Paris and Malmö
$45M in funding from Fidelity, Sunstone, Conor, Creandum, Dawn Capital

Product
Neo4j - World's leading graph database
1M+ downloads, adding 50k+ per month
150+ enterprise subscription customers including over 50 of the Global 2000

Neo4j: The Graph Database Leader


2000 2003 2007 2009 2011 2013 2014 2015

Technical Leadership
Invented property graph model
First native graph DB in 24/7 production
Contributed first graph DB to open source
Introduced Cypher, a declarative query language for property graphs
Published O'Reilly book on Graph Databases
Extended graph data model to labeled property graph

Commercial Leadership
First Global 2000 Customer
GraphConnect, first conference for graph DBs
150+ customers
50K+ monthly downloads
500+ graph DB events worldwide

Funding
$2.5M Seed Round from Sunstone and Conor
$11M Series A from Fidelity, Sunstone and Conor
$11M Series B from Fidelity, Sunstone and Conor
$20M Series C led by Creandum, with Dawn and existing investors

Neo4j Leads the Graph Database Revolution


Graph analysis is possibly the single most effective competitive
differentiator for organizations pursuing data-driven operations
and decisions after the design of data capture.
Forrester estimates that over 25% of enterprises will be using
graph databases by 2017
Neo4j is the current market leader in graph databases.
1. IT Market Clock for Database Management Systems, 2014
2. TechRadar: Enterprise DBMS, Q1 2014
3. Graph Databases and Their Potential to Transform How We Capture Interdependencies (Enterprise Management Associates)

Two-Minute Video Demo


Building a Recommendation Engine in 2 Minutes with Neo4j
Developer Experience: Neo4j UI with Cypher Query Language

https://www.youtube.com/watch?v=qbZ_Q-YnHYo

Neo4j Key Product Features


Native Graph Storage
Ensures data consistency and performance

The Graph Query Language: Cypher
Requires 10x to 100x less code than SQL

Native Graph Processing
Millions of hops per second, in real time

Scalability and High Availability
Vertical and horizontal scaling optimized for graphs

Whiteboard Friendly Data Modeling
Model data as it naturally occurs

Built-in ETL
Seamless import from other databases

High Data Integrity
Fully ACID transactions

Integration
Drivers and APIs for popular languages

Property Graph Model Components

Nodes
The objects in the graph
Can have properties
Can be labeled

Relationships
Relate nodes by type and direction
Can have properties

[Figure: a PERSON node (name: Dan, born: May 29, 1970, twitter: @dan), a PERSON node (name: Ann, born: Dec 5, 1975) and a CAR node (brand: Volvo, model: V70), connected by LOVES (in both directions), LIVES WITH, DRIVES and OWNS relationships; one relationship carries the property since: Jan 10, 2011.]
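
A minimal Python sketch of these components, assuming plain in-memory objects rather than any Neo4j API. The node properties come from the figure; which node drives and which owns the car, and which relationship carries `since`, are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                  # nodes can be labeled
    props: dict = field(default_factory=dict)    # nodes can have properties


@dataclass
class Relationship:
    start: Node                                  # direction: start -> end
    type: str                                    # relationships have a type
    end: Node
    props: dict = field(default_factory=dict)    # and can have properties


dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

rels = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "LIVES_WITH", ann),
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),  # assumed endpoints
    Relationship(ann, "OWNS", car),                               # assumed endpoints
]


def outgoing(node, rel_type):
    """Return the end nodes of relationships of one type leaving `node`."""
    return [r.end for r in rels if r.start is node and r.type == rel_type]


loved_by_dan = [n.props["name"] for n in outgoing(dan, "LOVES")]
owned_by_ann = [n.props["brand"] for n in outgoing(ann, "OWNS")]
```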

Triple Store/RDF Model


Resource Description Framework
Subject, Predicate, Object
Standard Data Model
Names for subjects, predicates, objects must be URIs
Names must be Global
No properties on the Relationships
Like 3rd Normal Form for Relational Databases (but really more like 5th/6th)

Property Graph Data Model (Movies)

RDF Data Model (Movies)

Property Graph Vs Triple Store

Property Graph is a more generic case of the Triple Store
Lack of properties on relationships for Triple Stores reduces (or complicates) their expressive power
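
To make the point about relationship properties concrete, here is a small Python sketch under stated assumptions: the `http://example.org/` namespace, the predicate names and the `drive1` resource are all invented for illustration, and the intermediate-resource pattern shown is just one common workaround, not the only one.

```python
# Property graph: the property sits directly on the relationship.
edge = {
    "start": "Dan",
    "type": "DRIVES",
    "end": "Car",
    "props": {"since": "Jan 10, 2011"},
}

# Triple model: every fact is (subject, predicate, object). To say *when*
# the driving started, an extra resource has to represent the relationship.
EX = "http://example.org/"  # hypothetical namespace
triples = [
    (EX + "Dan", EX + "drives", EX + "Car"),
    (EX + "drive1", EX + "driver", EX + "Dan"),
    (EX + "drive1", EX + "vehicle", EX + "Car"),
    (EX + "drive1", EX + "since", "Jan 10, 2011"),
]


def objects(subject, predicate):
    """All objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


since_from_graph = edge["props"]["since"]
since_from_triples = objects(EX + "drive1", EX + "since")[0]
```

One edge with one property becomes four triples here, which is the extra complexity the slide refers to.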

Query Languages
Graph Databases:
Cypher - declarative, pattern matching, easy to understand
Gremlin - imperative, step driven, math inspired
Native APIs (Java, REST)

Triple Stores:
SPARQL (standard)
PROLOG (or prolog-like languages)

General Use Cases


Graph Databases:
Local Queries (anchor on a node or set of nodes then traverse)
Realtime (<20ms) requirements
Complex, deep traversals
Flexible graph models

Triple Stores:
Global Queries (find pattern in large volume of information)
Browsing Content
Inference Discovery

How do you model Flight Data?


How do you model Comic Books?

How do you model a world where anything can happen?

Graph Databases allow Model Flexibility

Watch the presentation at:


https://vimeo.com/79399404

Java CORE API


Direct access to Nodes and
Relationships

Java Core API


Step by Step from GraphDatabaseService
Start a transaction (reads and writes)
findNode(Label, Property, Value)
findNodes(Label, Property, Value)
findNodes(Label)
getNodeById(Long)
getRelationships(Direction, Type)
getProperty(Property, (optional) Default Value)

Example (get the friends of a user)
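
A rough Python analogue of those steps, using a toy in-memory graph rather than Neo4j's Java API. The class and method names only mirror the calls listed above; the `User` label, the `FRIENDS` relationship type and the usernames `alice` and `bob` are assumptions made up for illustration, and transactions are left out.

```python
OUTGOING, INCOMING, BOTH = "OUTGOING", "INCOMING", "BOTH"


class Node:
    def __init__(self, label, **props):
        self.label, self.props, self.rels = label, props, []

    def get_property(self, key, default=None):
        return self.props.get(key, default)

    def get_relationships(self, direction, rel_type):
        return [
            r for r in self.rels
            if r.type == rel_type and (
                direction == BOTH
                or (direction == OUTGOING and r.start is self)
                or (direction == INCOMING and r.end is self)
            )
        ]


class Relationship:
    def __init__(self, start, rel_type, end):
        self.start, self.type, self.end = start, rel_type, end
        start.rels.append(self)  # both endpoints keep a reference
        end.rels.append(self)

    def other_node(self, node):
        return self.end if node is self.start else self.start


class Graph:
    def __init__(self):
        self.nodes = []

    def create_node(self, label, **props):
        node = Node(label, **props)
        self.nodes.append(node)
        return node

    def find_node(self, label, key, value):
        for n in self.nodes:
            if n.label == label and n.props.get(key) == value:
                return n
        return None


db = Graph()
me = db.create_node("User", username="maxdemarzi")
alice = db.create_node("User", username="alice")
bob = db.create_node("User", username="bob")
Relationship(me, "FRIENDS", alice)
Relationship(bob, "FRIENDS", me)

# Step by step: find the user, walk FRIENDS in both directions, read a property.
user = db.find_node("User", "username", "maxdemarzi")
friends = sorted(
    r.other_node(user).get_property("username")
    for r in user.get_relationships(BOTH, "FRIENDS")
)
```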

Traversal API

Describe Traversals

Traversal API

Start with the Simple Defaults (order, relationships, depth, uniqueness, etc.)
Custom Expanders
Where should I go next?
Custom Evaluators
I've gone there; should I accept this path?

Traversal API Example
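
The expander and evaluator roles can be illustrated with a small breadth-first traversal in Python. This is a sketch of the idea only, not the Neo4j Traversal API: the adjacency dictionary, the function names and the choice of node-global uniqueness are assumptions for the example.

```python
from collections import deque

# Toy directed graph as adjacency lists (hypothetical data).
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}


def traverse(start, expander, evaluator, max_depth):
    """Breadth-first traversal returning every path the evaluator accepts."""
    results, seen = [], {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if evaluator(path):              # "I've gone there; should I accept this path?"
            results.append(path)
        if len(path) - 1 >= max_depth:   # depth limit reached, do not expand further
            continue
        for nxt in expander(path):       # "Where should I go next?"
            if nxt not in seen:          # uniqueness: visit each node at most once
                seen.add(nxt)
                queue.append(path + [nxt])
    return results


def expander(path):
    # Follow every outgoing edge of the last node on the path.
    return graph[path[-1]]


def evaluator(path):
    # Accept only paths that are exactly two hops long.
    return len(path) - 1 == 2


paths = traverse("a", expander, evaluator, max_depth=2)
```

Because each node is visited once, only the first two-hop route to `d` is reported; a path-level uniqueness rule would also return the route through `c`.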

Cypher Query Language

ASCII Art Pattern Matching

Cypher: Powerful and Expressive Query Language

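[Figure: a node Dan connected to a node Ann by a LOVES relationship]

MATCH (:Person { name:"Dan"} ) -[:LOVES]-> (:Person { name:"Ann"} )

Each parenthesized element is a node; Person is its label and name is its property.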

Express Complex Queries Easily with Cypher


SQL Query

Cypher Query
MATCH (boss)-[:MANAGES*0..3]->(sub),
(sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate,
count(report) AS Total

Find all direct reports and how many people they manage, up to 3 levels down

Hello World Recommendation


Movie Data Model

Cypher Query: Movie Recommendation


What are the Top 25 Movies
that I haven't seen
with the same genres as Toy Story
given high ratings
by people who liked Toy Story
MATCH (watched:Movie {title:"Toy Story"}) <-[r1:RATED]- () -[r2:RATED]-> (unseen:Movie)
WHERE r1.rating > 7 AND r2.rating > 7
AND watched.genres = unseen.genres
AND NOT( (:Person {username:"maxdemarzi"}) -[:RATED|WATCHED]-> (unseen) )
RETURN unseen.title, COUNT(*)
ORDER BY COUNT(*) DESC
LIMIT 25
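
The logic of this query can be traced on a toy data set in plain Python. The sketch below only mirrors the query's filters and counting; the movies, raters and ratings are made up, and nested dictionaries stand in for the graph.

```python
from collections import Counter

# Hypothetical data: genres per movie, and ratings per person.
genres = {
    "Toy Story": ("Animation", "Comedy"),
    "Toy Story 2": ("Animation", "Comedy"),
    "Shrek": ("Animation", "Comedy"),
    "Heat": ("Crime",),
}
ratings = {
    "ann": {"Toy Story": 9, "Toy Story 2": 9, "Shrek": 8, "Heat": 10},
    "bob": {"Toy Story": 8, "Shrek": 9},
    "cat": {"Toy Story": 5, "Toy Story 2": 10},
    "maxdemarzi": {"Toy Story": 9, "Toy Story 2": 8},
}


def recommend(me, watched, limit=25, threshold=7):
    seen_by_me = set(ratings.get(me, {}))      # NOT (me)-[:RATED|WATCHED]->(unseen)
    counts = Counter()
    for person, rated in ratings.items():
        if rated.get(watched, 0) <= threshold:  # r1.rating > 7
            continue
        for movie, rating in rated.items():
            if (
                movie != watched
                and rating > threshold                 # r2.rating > 7
                and genres[movie] == genres[watched]   # same genres
                and movie not in seen_by_me
            ):
                counts[movie] += 1                     # COUNT(*)
    return counts.most_common(limit)                   # ORDER BY ... DESC LIMIT


top = recommend("maxdemarzi", "Toy Story")
```

Here `cat` is ignored for rating Toy Story too low, `Heat` is dropped for its genres, and `Toy Story 2` is dropped because the user has already rated it.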

Movie Data Model

Cypher Query: k-NN Recommendation


What are the Top 25 Movies
that Zoltan Varju has not seen
using the average rating
by my top 3 neighbors
MATCH (m:Movie) <-[r:RATED]- (b:Person) -[s:SIMILARITY]- (p:Person {name:'Zoltan Varju'})
WHERE NOT( (p) -[:RATED|WATCHED]-> (m) )
WITH m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS recommendation
ORDER BY recommendation DESC
RETURN movie, recommendation
LIMIT 25
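
The same k-nearest-neighbour idea in plain Python, as a sketch on invented data: for each unseen movie, take the ratings of the three most similar people who rated it and average them. The names, similarity scores and ratings are assumptions; only the steps follow the query.

```python
# Hypothetical similarity of each neighbour to the target person.
similarity = {"ann": 0.9, "bob": 0.8, "cat": 0.5, "dan": 0.2}
ratings = {
    "ann": {"Alien": 10, "Heat": 6},
    "bob": {"Alien": 8},
    "cat": {"Alien": 6, "Heat": 8},
    "dan": {"Alien": 1, "Brazil": 9},
}
seen = {"Brazil"}  # movies the target person already rated or watched


def knn_recommend(k=3, limit=25):
    by_movie = {}
    for person, sim in similarity.items():
        for movie, rating in ratings.get(person, {}).items():
            if movie not in seen:                        # WHERE NOT (p)-->(m)
                by_movie.setdefault(movie, []).append((sim, rating))
    scored = []
    for movie, pairs in by_movie.items():
        pairs.sort(key=lambda p: p[0], reverse=True)     # ORDER BY similarity DESC
        top_k = [rating for _, rating in pairs[:k]]      # COLLECT(rating)[0..3]
        scored.append((movie, sum(top_k) / len(top_k)))  # REDUCE(...) / LENGTH(...)
    scored.sort(key=lambda item: item[1], reverse=True)  # ORDER BY recommendation DESC
    return scored[:limit]


recommendations = knn_recommend()
```

The least similar neighbour's low rating for `Alien` is ignored because only the top three neighbours count.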

Neo4j Interface

Server, Service, Library

High Speed Fraud - 1000 R/S

http://maxdemarzi.com/2014/02/12/online-payment-risk-management-with-neo4j/

High Speed Fraud - 8000 R/S

http://maxdemarzi.com/2014/02/27/neo4j-at-ludicrous-speed/

High Speed Fraud - 28000 R/S

http://maxdemarzi.com/2014/03/10/its-over-9000-neo4j-on-websockets/

Neo4j

Additional Features

Neo4j Clustering
Architecture Optimized for Speed & Availability at Scale
Clustering Features:
Master-slave replication with master re-election and failover
Each instance has its own local cache
Horizontal scaling & disaster recovery

Performance Benefits:
No network hops within queries
Real-time operations with fast and consistent response times
Cache sharding spreads cache across cluster for very large graphs

[Figure: a load balancer in front of three Neo4j instances]

Getting Data into Neo4j


Cypher-Based LOAD CSV Capability
Transactional (ACID) writes
Initial and incremental loads of up to 10 million nodes and relationships
Command-Line Bulk Loader neo4j-import
For initial database population
For loads with 10B+ records
Up to 1M records per second

4.58 million things and their relationships
Loads in 100 seconds!

Neo4j Fits into Your Enterprise Environment


[Figure: architecture diagram with these labeled parts: Application; Databases; End User; ETL; Graph Database Cluster (three Neo4j instances) for Data Storage and Business Rules Execution; Bulk Analytic Infrastructure (Graph Compute Engine, Hadoop, EDW) for Data Mining and Aggregation; Data Scientist; Ad Hoc Analysis]

Value from Relationships: Common Use Cases

Internal Applications
Master Data Management
Network and IT Operations
Fraud Detection

Customer-Facing Applications
Real-time Recommendations
Graph-based Search
Identity and Access Management

Open Corporates
Uses Neo4j


https://skillsmatter.com/skillscasts/4097-case-study-how-opencorporates-uses-neo4j-to-provide-insight

Open Source Examples

http://maxdemarzi.com/2012/10/18/matchesare-the-new-hotness/

What are the Top 10 Jobs for me
that are in the same location I'm in
for which I have the necessary qualifications

Partial Subgraph Search
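
A small Python sketch of the job question, assuming sets of skills in place of graph relationships. The person, jobs and skills are invented, and ranking by the number of required skills is an assumption, since the slide does not say how the top 10 are ordered.

```python
# Hypothetical person and job postings.
me = {"location": "Chicago", "skills": {"java", "neo4j", "sql"}}
jobs = [
    {"title": "Graph Engineer", "location": "Chicago", "requires": {"java", "neo4j"}},
    {"title": "DBA", "location": "Chicago", "requires": {"sql", "oracle"}},
    {"title": "Data Engineer", "location": "Austin", "requires": {"sql"}},
    {"title": "Analyst", "location": "Chicago", "requires": {"sql"}},
]


def top_jobs(person, postings, limit=10):
    matches = [
        job for job in postings
        if job["location"] == person["location"]   # same location I'm in
        and job["requires"] <= person["skills"]    # every requirement is a skill I have
    ]
    # Assumed ranking: jobs that use more of my skills come first.
    matches.sort(key=lambda job: len(job["requires"]), reverse=True)
    return [job["title"] for job in matches[:limit]]


my_jobs = top_jobs(me, jobs)
```

The search is "partial" in the sense that a job only has to match a subset of the person's skills, not all of them.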

Recommend Love
Find your soulmate in the graph
Are they energetic?
Do they like dogs?
Have a good sense of humor?
Neat and tidy, but not crazy about it?
What are the Top 10 Potential Mates for me
that are in the same location
are sexually compatible
have traits I want
want traits I have

Love Recommendation

Two Party Partial Subgraph Search

http://maxdemarzi.com/2013/04/19/match-making-with-neo4j/
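
The two-party version checks the match in both directions. The Python sketch below covers only the location and trait criteria and leaves the compatibility criterion out; the people, traits and the scoring by number of matched traits are assumptions for illustration.

```python
# Hypothetical profiles: traits each person has and traits they want.
people = {
    "me": {"location": "Chicago", "has": {"energetic", "funny"}, "wants": {"likes dogs", "tidy"}},
    "pat": {"location": "Chicago", "has": {"likes dogs", "tidy"}, "wants": {"funny"}},
    "sam": {"location": "Chicago", "has": {"likes dogs"}, "wants": {"rich"}},
    "lee": {"location": "Austin", "has": {"likes dogs", "tidy"}, "wants": {"funny", "energetic"}},
}


def top_mates(name, limit=10):
    me = people[name]
    scored = []
    for other_name, other in people.items():
        if other_name == name or other["location"] != me["location"]:
            continue
        they_have = len(me["wants"] & other["has"])   # have traits I want
        i_have = len(other["wants"] & me["has"])      # want traits I have
        if they_have and i_have:                      # both sides must match
            scored.append((they_have + i_have, other_name))
    scored.sort(reverse=True)
    return [other_name for _, other_name in scored[:limit]]


mates = top_mates("me")
```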

Real-Time Recommendations with Neo4j


Social
Recommendations

Products
and Services

Content

Routing

Walmart BUSINESS CASE


Needed online customer recommendations to keep pace with competition
Data connections provided predictive context, but were not in a usable format
Solution had to serve many millions of customers and products while maintaining superior scalability and performance

World's largest company by revenue
World's largest retailer and private employer
SF-based global e-commerce division manages several websites
Founded in 1969
Bentonville, Arkansas

Walmart SOLUTION
Brings customers, preferences, purchases, products and locations into a graph model
Uses connections to make product recommendations
Solution deployed across Walmart divisions and websites

Global Courier BUSINESS CASE


Needed to replace aging B2B and B2C parcel routing system whose requirements include:
24x7 availability
Peak loads of 5M parcels per day, 3K per second
Support for complex and diverse software stack
Predictable performance with linear scalability
Daily changes to logistics networks
Route from any point to any point
Single point of truth for entire network

World's largest courier
480,000 employees
55 billion in revenue
Needed new B2C and B2B parcel routing system for its logistics practice
Legacy system neither supported the full network nor the shift to online demands

Global Courier SOLUTION


Neo4j provides the ideal domain fit since a logistics network is a graph
High availability and performance via Neo4j clustering
Greatly simplified Cypher queries for routing versus relational SQL queries
Flexible data model that reflects the real logistics world far better than relational
Easy-to-grasp whiteboard-friendly model

eBay BUSINESS CASE


Needed an offering to compete with Amazon Prime
Enable customer-selected delivery inside 90 minutes
Calculate best route option in real-time
Scale to enable a variety of services
Offer more predictable delivery times

C2C and B2C retail network
Full e-commerce functionality for individuals and businesses
Integrated with logistics vendors for product deliveries

eBay Now SOLUTION


Acquired UK-based Shutl, a leader in same-day delivery
Used Neo4j to create eBay Now
1000 times faster than the prior MySQL-based solution
Faster time-to-market
Improved code quality with 10 to 100 times less query code

Classmates BUSINESS CASE


Develop new social networking capabilities to monetize yearbook-related offerings
Show all the people I know in a yearbook
Show yearbooks my friends appear in most often
Show sections of a yearbook that my friends appear most in
Show me other schools my friends attended

Online yearbook connecting friends from school, work and military in US and Canada
Founded as Memory Lane in Seattle

Classmates SOLUTION
Neo4j provides a robust and scalable graph database solution
3-instance cluster with cache sharding and disaster-recovery
18ms response time for top 4 queries
100M nodes and 600M relationships in initial graph, including people, images, schools, yearbooks and pages
Projected to grow to 1B nodes and 6B relationships

National Geographic BUSINESS CASE


Improve poor performance of PostgreSQL app
Increase user engagement by linking to 100+ years of multimedia content
Improve targeting by understanding subscribers' interests better
Recommend content and services to users based on their interests

Non-profit scientific and educational institution founded in 1888
Covers geography, archaeology, natural science, environment and historical conservation
Journals, online media, radio, TV, documentaries, live events and consumer content and goods

National Geographic SOLUTION


Enabled complex real-time analytics across eight million users and a century of content
Delivered robust performance by eliminating triple-nested SQL joins
Cross-refers users among content, live events, travel, goods and causes
Neo4j solution much less cumbersome and easier to maintain than previous SQL system

Curaspan BUSINESS CASE


Improve poor performance of Oracle solution
Support more complexity including granular, role-based access control
Satisfy complex Graph Search queries by discharge nurses and intake coordinators
Find a skilled nursing facility within n miles of a given location, belonging to health care group XYZ, offering speech therapy and cardiac care, and optionally Italian language services

Leader in patient management for discharges and referrals
Manages patient referrals
4600+ health care facilities
Connects providers, payers via web-based patient management platform
Founded in 1999 in Newton, Massachusetts

Curaspan SOLUTION
Met fast, real-time performance demands
Supported queries span multiple hierarchies including provider and employee-permissions graphs
Improved data model to handle adding more dimensions to the data such as insurance networks, service areas and care organizations
Greatly simplified queries, reducing multi-page SQL statements to one Neo4j function

FiftyThree BUSINESS CASE


Add social capabilities to digital-paper app
Support social collaboration across millions of users in new Mix app
Enable seamless interaction between social and content-asset networks
Ensure new apps are robust, scalable and fast

Maker of Paper, one of the top apps in Apple's App Store, with millions of users
Based in New York City

FiftyThree SOLUTION
Neo4j data model ideal for social network, content management and access control
Users create, publish and share designs simply
Easy to develop and evolve Neo4j-based app
Integrates well with FiftyThree EC2 architecture

See the Neo4j solution in action


Betting the Company (Literally) on a Graph Database
http://aseemk.com/talks/neo4j-lessons-learned#/

App Store Editors' Choice
2012 iPad App of the Year
Apple Best Apps of 2014

Users Love Neo4j

jQuery Inventor

Heroku Founder

THANK YOU
