Neo4j is a high-performance graph database. It is an ACID-compliant transactional database with native graph storage and processing.

The following figure shows the basic concepts in Neo4j:

The Labeled Property Graph Model

The model consists of nodes, relationships, properties, and labels:

  • Nodes are the main data elements and are connected to each other via relationships
  • Nodes can have zero or more properties (key/value pairs) and zero or more labels
  • Relationships are directed, have exactly one type, and can have zero or more properties
  • Properties are named values, where the name (key) is a string
  • Labels are used to group nodes into sets
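To make the four building blocks concrete, here is a minimal in-memory sketch of a labeled property graph in Python. The class and method names (PropertyGraph, add_node, add_rel, nodes_with_label) and the `since` property are hypothetical illustrations; this is not Neo4j's storage format or API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set


@dataclass
class Node:
    id: int
    labels: Set[str] = field(default_factory=set)              # zero or more labels
    properties: Dict[str, Any] = field(default_factory=dict)   # string key -> value


@dataclass
class Relationship:
    id: int
    type: str      # each relationship has exactly one type, e.g. "KNOWS"
    start: int     # id of the start node: relationships are directed
    end: int       # id of the end node
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory labeled property graph (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.rels: List[Relationship] = []

    def add_node(self, labels: Set[str],
                 properties: Optional[Dict[str, Any]] = None) -> Node:
        node = Node(len(self.nodes), set(labels), dict(properties or {}))
        self.nodes[node.id] = node
        return node

    def add_rel(self, start: Node, rel_type: str, end: Node,
                properties: Optional[Dict[str, Any]] = None) -> Relationship:
        rel = Relationship(len(self.rels), rel_type, start.id, end.id,
                           dict(properties or {}))
        self.rels.append(rel)
        return rel

    def nodes_with_label(self, label: str) -> List[Node]:
        # A label groups nodes into a set, so it works like a filter.
        return [n for n in self.nodes.values() if label in n.labels]


g = PropertyGraph()
emil = g.add_node({"Person"}, {"name": "Emil", "from": "Sweden"})
johan = g.add_node({"Person"}, {"name": "Johan"})
g.add_rel(emil, "KNOWS", johan, {"since": 2010})  # hypothetical property value
print([n.properties["name"] for n in g.nodes_with_label("Person")])  # ['Emil', 'Johan']
```

Properties live in plain dictionaries because the model puts no schema on them: any node or relationship can carry any set of string keys.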

Neo4j lets users query and manipulate the stored graphs with Cypher, a SQL-like query language.

The Graph Query Language: Cypher

Cypher is a graph query language that allows for expressive and efficient querying of graph data. Like SQL, Cypher is a declarative query language: users state what they want done to their graph data (such as match, insert, update, or delete) without describing exactly how to do it.

Most of Cypher's keywords are inspired by SQL (e.g. WHERE and ORDER BY). Its pattern matching borrows from SPARQL.

Cypher examples

  1. Create a Person node in the database:

    CREATE (ee:Person {name: "Emil", from: "Sweden"})
    
    • CREATE clause to create data
    • () parentheses to indicate a node
    • ee:Person a variable ‘ee’ and label ‘Person’ for the new node
    • {} curly braces to add properties to the node
  2. Find the node representing Emil:

    MATCH (ee:Person) WHERE ee.name = "Emil" RETURN ee;
    
    • MATCH clause to specify a pattern of nodes and relationships
    • (ee:Person) a single node pattern with label ‘Person’ which will assign matches to the variable ‘ee’
    • WHERE clause to constrain the results
    • ee.name = "Emil" compares name property to the value “Emil”
    • RETURN clause used to request particular results
  3. Find Emil’s friends:

    MATCH (ee:Person)-[:KNOWS]-(friends)
    WHERE ee.name = "Emil" RETURN ee, friends
    
    • -[:KNOWS]- matches “KNOWS” relationships (in either direction)
    • (friends) will be bound to Emil’s friends
  4. Johan is learning to surf. He wants to find a friend’s friend who already does:

    MATCH (js:Person)-[:KNOWS]-()-[:KNOWS]-(surfer)
    WHERE js.name="Johan" AND surfer.hobby = "surfing"
    RETURN DISTINCT surfer
    
    • () empty parentheses match an intermediate node without binding it to a variable
    • DISTINCT because the same surfer can be reached through more than one path
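What the patterns in examples 3 and 4 match can be sketched in plain Python over a small KNOWS graph. Only Emil and Johan come from the examples; the other people, the hobby values, and all helper names are hypothetical. This is not how Neo4j executes Cypher; it only mirrors the semantics: an undirected hop, an anonymous middle node, the rule that one pattern may not traverse the same relationship twice, and DISTINCT as set semantics. All nodes are assumed to carry the Person label.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical sample data: node id -> properties (every node is a Person).
people: Dict[str, Dict[str, str]] = {
    "emil": {"name": "Emil"},
    "johan": {"name": "Johan"},
    "ian": {"name": "Ian"},
    "rik": {"name": "Rik", "hobby": "surfing"},
    "allison": {"name": "Allison", "hobby": "surfing"},
}
# KNOWS relationships as (start, end); the list index serves as relationship id.
knows: List[Tuple[str, str]] = [
    ("johan", "emil"), ("johan", "ian"), ("emil", "rik"),
    ("ian", "rik"), ("ian", "allison"),
]

# -[:KNOWS]- ignores direction, so index every relationship from both ends.
neighbors: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
for rel_id, (a, b) in enumerate(knows):
    neighbors[a].append((rel_id, b))
    neighbors[b].append((rel_id, a))


def nodes_named(name: str) -> List[str]:
    # MATCH (p:Person) WHERE p.name = $name
    return [pid for pid, props in people.items() if props.get("name") == name]


def friends_of(name: str) -> List[str]:
    # MATCH (ee:Person)-[:KNOWS]-(friends) WHERE ee.name = $name RETURN friends
    return [people[f]["name"] for ee in nodes_named(name) for _, f in neighbors[ee]]


def surfers_two_hops_from(name: str) -> Set[str]:
    # MATCH (js:Person)-[:KNOWS]-()-[:KNOWS]-(surfer)
    # WHERE js.name = $name AND surfer.hobby = "surfing"
    # RETURN DISTINCT surfer  (the set drops surfers reached via several paths)
    result: Set[str] = set()
    for js in nodes_named(name):
        for rel1, middle in neighbors[js]:        # the anonymous () node
            for rel2, surfer in neighbors[middle]:
                if rel2 == rel1:                  # a pattern may not reuse a relationship
                    continue
                if people[surfer].get("hobby") == "surfing":
                    result.add(people[surfer]["name"])
    return result


print(friends_of("Emil"))                       # ['Johan', 'Rik']
print(sorted(surfers_two_hops_from("Johan")))   # ['Allison', 'Rik']
```

Rik is reachable from Johan through both Emil and Ian, which is exactly the duplicate that DISTINCT removes in the Cypher version.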

Graph algorithms with Neo4j

Neo4j has an efficient graph algorithms library that wraps graph algorithms as Cypher procedures. It currently contains implementations of the following algorithms:

  • Centralities
    • Page Rank (algo.pageRank)
    • Betweenness Centrality (algo.betweenness)
    • Closeness Centrality (algo.closeness)
  • Community Detection
    • Louvain (algo.louvain)
    • Label Propagation (algo.labelPropagation)
    • Weakly Connected Components (algo.unionFind)
    • Strongly Connected Components (algo.scc)
    • Triangle Count / Clustering Coefficient (algo.triangleCount)
  • Path Finding
    • Minimum Weight Spanning Tree (algo.mst)
    • Single Source and All Pairs Shortest Path (algo.shortestPath and algo.allShortestPaths)

The Neo4j graph algorithms library is implemented in Java on top of the Neo4j Java APIs.
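As an illustration of what a centrality procedure such as algo.pageRank computes, here is a minimal power-iteration PageRank sketch in Python. It uses the textbook normalized formulation (scores sum to 1, nodes without outgoing links spread their score evenly), so its scores are not necessarily on the same scale as the library's output. The function name, the adjacency-dict input, and the sample links are assumptions of this sketch.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 20) -> Dict[str, float]:
    """Power-iteration PageRank (hypothetical helper, not the library's API).

    `graph` maps every node to the nodes it links to; every link target must
    also be a key. Scores are normalized so that they sum to 1.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Nodes without outgoing links hand their score to everyone equally.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for v in nodes:
            targets = graph[v]
            for w in targets:
                new_rank[w] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank


links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
scores = pagerank(links, iterations=100)
print(sorted(scores, key=scores.get, reverse=True))  # ['a', 'b', 'c', 'd']
```

Node "a" ranks highest because it receives links from both "c" and "d", while "d" has no incoming links and keeps only the teleport share (1 - 0.85) / 4.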

Example: Single Shortest Path

  • Find the shortest path over relationships of any type, in either direction, between Kevin Bacon and Meg Ryan

    MATCH p = shortestPath(
    (bacon:Person {name:"Kevin Bacon"})-[*]-(meg:Person {name:"Meg Ryan"})
    ) RETURN p
    

    Result: the shortest path from Bacon to Meg Ryan
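For an unweighted pattern like this, shortestPath() can be thought of as a breadth-first search that ignores relationship type and direction, as -[*]- does. Below is a minimal Python sketch; the edge list is hypothetical sample data loosely modeled on a movie graph, and shortest_path is an illustrative helper, not a Neo4j API. Like shortestPath(), it returns a single path even when several shortest paths exist.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def shortest_path(edges: List[Tuple[str, str, str]], source: str,
                  target: str) -> Optional[List[str]]:
    """Unweighted shortest path via breadth-first search (hypothetical helper).

    Edges are (start, type, end) triples; like -[*]- the search ignores both
    relationship type and direction. Returns the node sequence, or None.
    """
    adjacency: Dict[str, List[str]] = {}
    for start, _rel_type, end in edges:
        adjacency.setdefault(start, []).append(end)
        adjacency.setdefault(end, []).append(start)
    previous: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:   # first visit = shortest distance
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical sample data.
edges = [
    ("Kevin Bacon", "ACTED_IN", "Apollo 13"),
    ("Tom Hanks", "ACTED_IN", "Apollo 13"),
    ("Tom Hanks", "ACTED_IN", "Sleepless in Seattle"),
    ("Meg Ryan", "ACTED_IN", "Sleepless in Seattle"),
]
print(shortest_path(edges, "Kevin Bacon", "Meg Ryan"))
# ['Kevin Bacon', 'Apollo 13', 'Tom Hanks', 'Sleepless in Seattle', 'Meg Ryan']
```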

Example: Betweenness Centrality

  • Compute the betweenness centrality of all nodes with the User label, following outgoing MANAGE relationships

    CALL algo.betweenness.stream('User','MANAGE',{direction:'out'})
    YIELD nodeId, centrality
    RETURN nodeId, centrality ORDER BY centrality DESC LIMIT 20;
    

    results:

    ╒════════╤════════════╕
    │"nodeId"│"centrality"│
    ╞════════╪════════════╡
    │1217    │4           │
    ├────────┼────────────┤
    │1219    │2           │
    ├────────┼────────────┤
    │1218    │0           │
    ├────────┼────────────┤
    │1220    │0           │
    ├────────┼────────────┤
    │1221    │0           │
    ├────────┼────────────┤
    │1222    │0           │
    └────────┴────────────┘
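Betweenness centrality counts, for each node, the fraction of shortest paths between other pairs of nodes that pass through it. Below is a minimal sequential sketch of Brandes' algorithm, a standard way to compute it, for an unweighted directed graph (matching direction:'out' above). It is not necessarily how algo.betweenness is implemented, and the function name and the small MANAGE-style hierarchy are hypothetical.

```python
from collections import deque
from typing import Dict, List


def betweenness(graph: Dict[str, List[str]]) -> Dict[str, float]:
    """Brandes' algorithm for unweighted, directed graphs (hypothetical helper).

    `graph` maps every node to the list of nodes it points to (outgoing edges).
    """
    centrality = {v: 0.0 for v in graph}
    for s in graph:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        stack: List[str] = []
        predecessors: Dict[str, List[str]] = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        distance = {v: -1 for v in graph}
        sigma[s], distance[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Phase 2: accumulate dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    return centrality


# Hypothetical hierarchy: alice manages bob and carol; bob manages dave and erin.
org = {"alice": ["bob", "carol"], "bob": ["dave", "erin"],
       "carol": [], "dave": [], "erin": []}
print(betweenness(org))
# {'alice': 0.0, 'bob': 2.0, 'carol': 0.0, 'dave': 0.0, 'erin': 0.0}
```

Only bob lies between other nodes (on alice → dave and alice → erin), so he is the only node with a non-zero score, much like the few non-zero rows in the result table.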
    

RDBMS vs Neo4j

  • Data storage
    • RDBMS: fixed, pre-defined tables with rows and columns; connected data is often split across tables
    • Neo4j: graph storage with index-free adjacency, which speeds up transactions and processing over relationships
  • Query performance
    • RDBMS: performance suffers as the number and depth of JOINs grow
    • Neo4j: traversals follow stored relationships instead of computing JOINs, so performance degrades far less as the number or depth of relationships grows
  • Query language
    • RDBMS: SQL
    • Neo4j: Cypher
  • Processing at scale
    • RDBMS: scales out through replication; a scale-up architecture is possible but costly, and complex data relationships are hard to exploit at scale
    • Neo4j: the graph model naturally scales for pattern-based queries; the scale-out architecture maintains data integrity via replication
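The index-free adjacency point in the comparison above can be illustrated with a toy two-hop query in Python. The relational version joins an edge table with itself, scanning the whole table on each hop; the graph version follows per-node neighbor lists, so a hop costs time proportional to the node's degree. The table, data, and function names are hypothetical, and neither function reflects how an actual RDBMS or Neo4j stores or indexes data.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical directed "knows" edge table, as a relational schema might store it.
friendships: List[Tuple[int, int]] = [(1, 2), (2, 3), (2, 4), (4, 5)]


def two_hops_relational(person: int) -> Set[int]:
    # Self-join of the table with itself: each hop scans the whole edge table
    # (a real RDBMS would use an index, paying an index lookup per hop instead).
    first = {b for a, b in friendships if a == person}
    return {b for a, b in friendships if a in first}


# Index-free adjacency: each node holds direct references to its neighbors,
# so a hop touches only that node's own relationships.
adjacency: Dict[int, List[int]] = {}
for a, b in friendships:
    adjacency.setdefault(a, []).append(b)


def two_hops_graph(person: int) -> Set[int]:
    return {c for b in adjacency.get(person, []) for c in adjacency.get(b, [])}


print(two_hops_relational(1), two_hops_graph(1))  # {3, 4} {3, 4}
```

Both functions return the same answer; the difference is that the relational version's work grows with the size of the edge table, while the adjacency version's work grows only with the neighborhoods it actually visits.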

Competitors

OrientDB

OrientDB is an open-source, multi-model NoSQL DBMS that supports native graphs, documents, full-text search, reactivity, geospatial data, and object-oriented concepts.

OrientDB vs Neo4j

  • Performance

    According to an independent benchmark by the Tokyo Institute of Technology and IBM Research, OrientDB was 10x faster than Neo4j on graph operations across all the workloads tested.

  • Query Language

    Neo4j uses Cypher, while OrientDB uses SQL with extensions for manipulating trees and graphs.

    For example, retrieve an actor’s name and their movies, starting from the actor vertex:

    • OrientDB SQL SELECT
    SELECT name, out('ACTS_IN').title FROM Person WHERE name='Robin'
    
    • Neo4j Cypher
    MATCH (act:Person {name: 'Robin'})-[:ACTS_IN]->(movie) RETURN act.name, movie.title
    
  • Graph Algorithms

    OrientDB has some built-in SQL functions for graph algorithms, for example shortestPath(), dijkstra(), and astar(). Neo4j has a separate algorithm library that supports more algorithms.

    For example, in OrientDB, find the shortest path between vertices #8:32 and #8:10, crossing only outgoing edges:

    SELECT shortestPath(#8:32, #8:10, 'OUT')
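The direction argument above restricts which edges a path may cross. A weighted search such as dijkstra() can honor the same restriction; here is a generic Dijkstra sketch in Python. The function name, the (from, to, weight) edge tuples, the 'OUT'/'IN'/'BOTH' handling, and the sample weights are assumptions modeled on the query above, not OrientDB's implementation. With all weights set to 1.0 it reduces to an unweighted shortest path.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, float]  # (from, to, weight): hypothetical edge layout


def dijkstra(edges: List[Edge], source: str, target: str,
             direction: str = "OUT") -> Optional[Tuple[float, List[str]]]:
    """Weighted shortest path (hypothetical helper); direction is OUT, IN or BOTH."""
    if direction not in ("OUT", "IN", "BOTH"):
        raise ValueError(f"unknown direction: {direction}")
    adjacency: Dict[str, List[Tuple[str, float]]] = {}
    for a, b, w in edges:
        if direction in ("OUT", "BOTH"):
            adjacency.setdefault(a, []).append((b, w))   # follow the edge forwards
        if direction in ("IN", "BOTH"):
            adjacency.setdefault(b, []).append((a, w))   # follow the edge backwards
    best: Dict[str, float] = {source: 0.0}
    previous: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            path = [node]
            while path[-1] != source:
                path.append(previous[path[-1]])
            return dist, path[::-1]
        if dist > best[node]:
            continue  # stale queue entry; a shorter distance was already found
        for neighbor, weight in adjacency.get(node, []):
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return None


# Hypothetical weighted edges between record ids.
edges = [("#8:32", "#8:11", 1.0), ("#8:11", "#8:10", 1.0), ("#8:32", "#8:10", 5.0)]
print(dijkstra(edges, "#8:32", "#8:10"))         # (2.0, ['#8:32', '#8:11', '#8:10'])
print(dijkstra(edges, "#8:32", "#8:10", "IN"))   # None
```

Following only incoming edges from #8:32 finds nothing, because every edge in the sample data leaves #8:32 rather than arriving at it.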
    

ArangoDB

ArangoDB is a native multi-model database, written in C++, with flexible data models for documents, graphs, and key-value pairs.

It uses the ArangoDB Query Language (AQL), which is similar to SQL in purpose.

Some examples:

  • INSERT a single row/document

    • SQL
    INSERT INTO users (name, gender) 
    VALUES ("John Doe", "m");
    
    • AQL
    INSERT { name: "John Doe", gender: "m" } 
    INTO users
    
  • UPDATE a single row/document

    • SQL
    UPDATE users 
    SET name = "John Smith"
    WHERE id = 1;
    
    
    • AQL
    UPDATE { _key: "1" }
    WITH { name: "John Smith" }
    IN users
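The INSERT and UPDATE examples above act on a collection of schemaless documents rather than on rows with fixed columns. A minimal Python sketch of those semantics follows: INSERT stores any attributes and generates a _key when none is given, and UPDATE merges the given top-level attributes into the stored document, keeping the rest. The Collection class and the counter-based key generation are hypothetical simplifications, not ArangoDB's API or its key generator, and nested objects are not merged here.

```python
import itertools
from typing import Any, Dict

Document = Dict[str, Any]


class Collection:
    """Toy document collection keyed by _key (hypothetical, not ArangoDB's API)."""

    def __init__(self) -> None:
        self.docs: Dict[str, Document] = {}
        self._next_key = itertools.count(1)  # assumption: simple counter-based keys

    def insert(self, doc: Document) -> Document:
        # Schemaless insert: any attributes are accepted; a _key is generated
        # only when the document does not supply one.
        stored = dict(doc)
        if "_key" not in stored:
            stored["_key"] = str(next(self._next_key))
        self.docs[stored["_key"]] = stored
        return stored

    def update(self, key: str, changes: Document) -> Document:
        # Partial update: top-level attributes in `changes` overwrite or extend
        # the stored document; attributes not mentioned are kept.
        self.docs[key].update(changes)
        return self.docs[key]


users = Collection()
users.insert({"name": "John Doe", "gender": "m"})
print(users.update("1", {"name": "John Smith"}))
# {'name': 'John Smith', 'gender': 'm', '_key': '1'}
```

The gender attribute survives the update because only name was supplied, which is the behavior the SQL UPDATE ... SET example also has for untouched columns.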
    

Neo4j vs OrientDB vs ArangoDB

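  • Data model
    • Neo4j: graph
    • OrientDB: document, object, graph, key-value, geospatial
    • ArangoDB: document, graph, key-value
  • Data format
    • Neo4j: JSON
    • OrientDB: table
    • ArangoDB: JSON
  • Data storage
    • Neo4j: native Neo4j graph storage
    • OrientDB: PLocal/Memory
    • ArangoDB: MMFiles/RocksDB
  • Transaction model
    • Neo4j: ACID
    • OrientDB: ACID
    • ArangoDB: ACID
  • Query language
    • Neo4j: Cypher
    • OrientDB: SQL with extensions
    • ArangoDB: AQL
  • Graph computation support
    • Neo4j: built-in, plus a separate algorithm library
    • OrientDB: built-in
    • ArangoDB: built-in
  • Graph algorithms
    • Neo4j: rich
    • OrientDB: limited
    • ArangoDB: limited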

Reference