I have needed to store a lot of data in my time and I’ve used a lot of the big contenders: PostgreSQL, MySQL, SQLite, Redis, and MongoDB. While I’ve built up extensive experience with these tools, I wouldn’t say that any of them have ever made the task fun. I fell in love with Ruby because it was fun and because it let me do more powerful things by not getting in my way. While I didn’t realize it, the usual suspects of data persistence were getting in my way. But I’ve found a new love: let me tell you about Neo4j.
What is Neo4j?
Neo4j is a graph database! That means that it is optimized for managing and querying connections (relationships) between entities (nodes), as opposed to a relational database, which organizes data in tables.
Why is this great? Imagine a world with no foreign keys. Each entity in your database can have many relationships referring directly to other entities. If you want to explore the relationships there are no table or index scans, just a few connections to follow. This matches up well with the typical object model. It is more powerful, though, because Neo4j, while providing a lot of the database functionality that we expect, gives us tools to query for complex patterns in our data.
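To make "just a few connections to follow" concrete, here is a toy pure-Ruby model in which every node keeps direct references to its relationships, so a traversal is a pointer hop rather than a lookup. It is a sketch under that assumption, not a description of how Neo4j lays out data on disk; the `Node` and `Relationship` classes and the sample data are hypothetical.

```ruby
# Toy in-memory graph (illustration only, not Neo4j's storage engine):
# each node holds direct references to its relationships, so following
# a connection is a pointer hop rather than a table or index scan.
Relationship = Struct.new(:type, :from, :to)

class Node
  attr_reader :labels, :props, :outgoing, :incoming

  def initialize(*labels, **props)
    @labels = labels   # a node can carry several labels
    @props = props     # properties such as title or name
    @outgoing = []
    @incoming = []
  end

  # Create a directed, typed relationship and register it on both ends.
  def relate(type, other)
    rel = Relationship.new(type, self, other)
    outgoing << rel
    other.incoming << rel
    rel
  end

  # Follow outgoing relationships of one type to their end nodes.
  def out_nodes(type)
    outgoing.select { |rel| rel.type == type }.map(&:to)
  end

  # Follow incoming relationships of one type back to their start nodes.
  def in_nodes(type)
    incoming.select { |rel| rel.type == type }.map(&:from)
  end
end

# Hypothetical sample data.
asset = Node.new(:Asset, title: "ICC World Cup 2015")
category = Node.new(:Category, name: "Classification")
asset.relate(:HAS_CATEGORY, category)

asset.out_nodes(:HAS_CATEGORY).map { |node| node.props[:name] }    # => ["Classification"]
category.in_nodes(:HAS_CATEGORY).map { |node| node.props[:title] } # => ["ICC World Cup 2015"]
```

Because each relationship is registered on both endpoints, it can be followed in either direction without any extra lookup, which is the property the `:out`/`:in` associations below rely on.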
Introducing ActiveNode
To connect to Neo4j we’ll be using the neo4j gem. You can find instructions for connecting to Neo4j in your Rails application in the gem’s documentation. Also, the app with the code shown below is available as a running Rails app in this GitHub repository (use the sitepoint Git branch). When you’ve got your database up and running, use the rake load_sample_data command to populate your database.
Here is a basic example of an Asset model from an asset management Rails app:
app/models/asset.rb
class Asset
  include Neo4j::ActiveNode

  property :title

  has_many :out, :categories, type: :HAS_CATEGORY
end
Let’s break this down:
- The neo4j gem gives us the Neo4j::ActiveNode module, which we include to make a model.
- The class name Asset means that this model will be responsible for all nodes in Neo4j labeled Asset (labels play a similar role to table names, except that a node can have many labels).
- We have a title property to describe the individual nodes.
- We have an outgoing has_many association for categories. This association helps us find Category objects by following HAS_CATEGORY relationships in the database.
With this model we can perform a basic query to find an asset and get its categories:
2.2.0 :001 > asset = Asset.first
=> #<Asset uuid: "0098d2b7-a577-407a-a9f2-7ec4153cfa60", title: "ICC World Cup 2015 ">
2.2.0 :002 > asset.categories.to_a
=> [#<Category uuid: "91cd5369-605c-4aff-aad1-b51d8aa9b5f3", name: "Classification">]
Anybody familiar with ActiveRecord or Mongoid will have seen this hundreds of times. To get a bit more interesting, let’s define a Category model:
class Category
  include Neo4j::ActiveNode

  property :name

  has_many :in, :assets, origin: :categories
end
Here our association has an origin option to reference the categories association on the Asset model. We could instead specify type: :HAS_CATEGORY again if we wanted to.
Creating Recommendations
What if we wanted to get all assets that share a category with our asset?
2.2.0 :003 > asset.categories.assets.to_a
=> [#<Asset uuid: "d2ef17b5-4dbf-4a99-b814-dee2e96d4a09", title: "WineGraph">, ...]
So what just happened? ActiveNode generated a query to the database which specified a path from our asset to all other assets which share a category. The database then returned just those assets to us. Here’s the query that it used:
MATCH
asset436, asset436-[rel1:`HAS_CATEGORY`]->(node3:`Category`),
node3<-[rel2:`HAS_CATEGORY`]-(result_assets:`Asset`)
WHERE (ID(asset436) = {ID_asset436})
RETURN result_assets
Parameters: {ID_asset436: 436}
This is a query language called Cypher, which is Neo4j’s equivalent to SQL. Note particularly the ASCII art style of parentheses surrounding node definitions and arrows representing relationships. This Cypher query is a bit more verbose because ActiveNode generated it algorithmically. If a human were to write the query it would look something like:
MATCH source_asset-[:HAS_CATEGORY]->(:Category)<-[:HAS_CATEGORY]-(result_assets:Asset)
WHERE ID(source_asset) = {source_asset_id}
RETURN result_assets
Parameters: {source_asset_id: 436}
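To make the pattern concrete, here is a pure-Ruby sketch that walks the same path over a hypothetical list of relationships. It also mimics a Cypher rule worth knowing: within one MATCH pattern a relationship is never used twice, which is why the source asset does not come straight back through the edge it started from. An asset that shares two categories with the source shows up twice, once per matching path, just as the Cypher query above returns one row per path; `uniq` plays the role of RETURN DISTINCT. The data and method name are assumptions for illustration.

```ruby
# Hypothetical sample data: [start node, relationship type, end node].
RELS = [
  ["Ruby",          :HAS_CATEGORY, "Programming"],
  ["Ruby",          :HAS_CATEGORY, "Web"],
  ["Ruby on Rails", :HAS_CATEGORY, "Programming"],
  ["Ruby on Rails", :HAS_CATEGORY, "Web"],
  ["WineGraph",     :HAS_CATEGORY, "Web"],
].freeze

# Walks (source)-[:HAS_CATEGORY]->(category)<-[:HAS_CATEGORY]-(result).
# Relationships are compared by position, so the second hop never reuses the
# first hop's relationship, mirroring Cypher's uniqueness rule within a MATCH.
def assets_sharing_a_category(source, rels = RELS)
  rels.each_with_index.flat_map do |(from, type, category), i|
    next [] unless from == source && type == :HAS_CATEGORY

    rels.each_with_index.filter_map do |(other, other_type, other_category), j|
      other if j != i && other_type == :HAS_CATEGORY && other_category == category
    end
  end
end

assets_sharing_a_category("Ruby on Rails")      # => ["Ruby", "Ruby", "WineGraph"]
assets_sharing_a_category("Ruby on Rails").uniq # => ["Ruby", "WineGraph"]
```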
I find Cypher easier and more powerful than SQL, but we won’t worry too much about Cypher in this article. If you want to learn more later you can find great tutorials and a thorough refcard.
As you can see, we can use Neo4j to span across our entities. Big deal! We can also do this in SQL with a couple of JOINs. While Cypher seems cool, we’re not breaking any major ground yet. What if we wanted to use this query to make some asset recommendations based on shared categories? We’ll want to sort the assets to rank those with the most categories in common. Let’s create a method on our model:
class Asset
  ...
  Recommendation = Struct.new(:asset, :categories, :score)

  def asset_recommendations_by_category(common_links_required = 3)
    categories(:c)
      .assets(:asset)
      .order('count(c) DESC')
      .pluck('asset, collect(c), count(c)')
      .reject { |_, _, count| count < common_links_required }
      .map { |other_asset, categories, count| Recommendation.new(other_asset, categories, count) }
  end
end
There are a few interesting things to note here:
- We are defining variables as part of our chain to use later (c and asset).
- We are using the Cypher collect function to give us a result column containing an array of the shared categories (see the table below). Also note that we are getting full objects, not just columns/properties:
asset | collect(c) | count(c) |
---|---|---|
#<Asset> | [#<Category>] | 1 |
#<Asset> | [#<Category>, #<Category>, …] | 4 |
#<Asset> | [#<Category>, #<Category>] | 2 |
… | … | … |
Did you notice that there is not a GROUP BY clause? Neo4j is smart enough to realize that collect and count are aggregation functions, and it groups by the non-aggregation columns in our result (in this case that’s just the asset variable).
Take that SQL!
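The implicit grouping can be sketched in plain Ruby to show what Cypher does for us: the rows coming out of the MATCH are grouped by the non-aggregated column (the asset), then collect and count are applied within each group. The input rows and asset names below are hypothetical, and unlike the model method this version sorts in Ruby rather than in the database.

```ruby
Recommendation = Struct.new(:asset, :categories, :score)

# Hypothetical rows produced by the MATCH before aggregation:
# one [other_asset, shared_category] pair per matched path.
ROWS = [
  ["Asset A", "Graphs"],
  ["Asset B", "Ruby"],
  ["Asset B", "Graphs"],
  ["Asset B", "Databases"],
  ["Asset C", "Ruby"],
  ["Asset C", "Web"],
].freeze

# Mirrors RETURN asset, collect(c), count(c): group_by on the asset stands in
# for the GROUP BY that Cypher infers from the non-aggregated column.
def recommendations_by_category(rows, common_links_required = 3)
  rows
    .group_by(&:first)
    .map { |asset, pairs| Recommendation.new(asset, pairs.map(&:last), pairs.size) }
    .reject { |rec| rec.score < common_links_required }
    .sort_by { |rec| -rec.score }
end

recommendations_by_category(ROWS, 2).map(&:asset) # => ["Asset B", "Asset C"]
```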
As a last step, we can make recommendations based on more than just categories in common. Imagine that we have the following sub-graph in Neo4j:
In addition to shared categories, let’s account for how many creators and viewers assets have in common:
class Asset
  ...
  Recommendation = Struct.new(:asset, :score)

  # Each OPTIONAL MATCH multiplies the rows per asset, so we count DISTINCT
  # nodes to keep, say, two shared viewers from doubling the category count.
  # order comes before limit to match Cypher's RETURN ... ORDER BY ... LIMIT.
  def secret_sauce_recommendations
    query_as(:source)
      .match('source-[:HAS_CATEGORY]->(category:Category)<-[:HAS_CATEGORY]-(asset:Asset)').break
      .optional_match('source<-[:CREATED]-(creator:User)-[:CREATED]->asset').break
      .optional_match('source<-[:VIEWED]-(viewer:User)-[:VIEWED]->asset')
      .order('score DESC')
      .limit(5)
      .pluck(
        :asset,
        '(count(DISTINCT category) * 2) + (count(DISTINCT creator) * 4) + (count(DISTINCT viewer) * 0.1) AS score'
      ).map do |other_asset, score|
        Recommendation.new(other_asset, score)
      end
  end
end
Here we delve deeper and start forming our own query. The structure is the same but, rather than finding just one path between two assets via a shared category, we also specify two more optional paths. We could make all three paths optional, but then Neo4j would need to compare our asset with every other asset in the database. By using a match rather than an optional_match for our path through Category nodes, we require that there be at least one shared category. This vastly limits our search space.
In the diagram there is one shared category, zero shared creators, and two shared viewers. This means that the score between “Ruby” and “Ruby on Rails” would be:
(1 * 2) + (0 * 4) + (2 * 0.1) = 2.2
Also note that we’re doing a calculation (and sorting) on a count aggregation of these three paths. That’s so cool to me that it makes me tingle a little to think about it…
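To see where the 2.2 comes from, and why counting DISTINCT nodes matters once several OPTIONAL MATCH clauses are chained, here is a plain-Ruby sketch. It rebuilds the rows Cypher would produce for the “Ruby” / “Ruby on Rails” pair: the rows from each match multiply, and an OPTIONAL MATCH that finds nothing still contributes one row holding null (nil here). The node names are placeholders, since the diagram's details are assumed. A plain count over those rows counts the single shared category once per viewer row, while distinct counts give the expected score.

```ruby
WEIGHTS = { category: 2, creator: 4, viewer: 0.1 }.freeze

# Weighted sum used by secret_sauce_recommendations.
def score(counts)
  WEIGHTS.sum { |kind, weight| counts.fetch(kind, 0) * weight }
end

# Example from the diagram: one shared category, no shared creators,
# two shared viewers (node names are placeholders).
categories = ["shared_category"]
creators = []
viewers = ["viewer_1", "viewer_2"]

# MATCH rows multiply with each OPTIONAL MATCH; an empty optional match
# still yields one row, holding nil (Cypher's null).
optional = ->(nodes) { nodes.empty? ? [nil] : nodes }
rows = categories.product(optional.call(creators), optional.call(viewers))
# => [["shared_category", nil, "viewer_1"], ["shared_category", nil, "viewer_2"]]

# count(x) skips nulls; count(DISTINCT x) also skips repeats.
column = ->(i) { rows.map { |row| row[i] }.compact }
naive_counts    = WEIGHTS.keys.each_with_index.to_h { |kind, i| [kind, column.call(i).size] }
distinct_counts = WEIGHTS.keys.each_with_index.to_h { |kind, i| [kind, column.call(i).uniq.size] }

score(naive_counts)    # => 4.2, the category is counted once per viewer row
score(distinct_counts) # => 2.2, matching (1 * 2) + (0 * 4) + (2 * 0.1)
```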
Easy Authorization
Let’s tackle another common problem. Suppose your CEO comes by your desk and says “We’ve built a great app, but customers want to be able to control who can see their stuff. Could you build in some privacy controls?” It seems simple enough. Let’s just throw on a flag to allow for private assets:
class Asset
  ...
  property :public, default: true

  def self.visible_to(user)
    query_as(:asset)
      .match_nodes(user: user)
      .where("asset.public OR asset<-[:CREATED]-user")
      .pluck(:asset)
  end
end
With this you can display all of the assets which a user can see either because the asset is public or because the viewer owns it. No problem, but again not a big deal. In another database you could just do a query on two columns/properties. Let’s get a bit crazier!
The Product Manager comes to you and says “Hey, thanks for that, but now people want to be able to give other users direct access to their private stuff”. No problem! You can build a UI to let users add and remove VIEWABLE_BY relationships for their assets and then query them like so:
class Asset
  ...
  def self.visible_to(user)
    query_as(:asset)
      .match_nodes(user: user)
      .where("asset.public OR asset<-[:CREATED]-user OR asset-[:VIEWABLE_BY]->user")
      .pluck(:asset)
  end
end
That would have been a join table otherwise. Here you just throw in another path by which users can have access to an asset. You take a moment to appreciate Neo4j’s schemaless nature.
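As a rough mental model, the WHERE clause in visible_to acts as a filter that asks one yes/no question per asset: is it public, did this user create it, or is it viewable by this user? A pure-Ruby sketch, with a hypothetical `AssetRecord` struct and made-up users standing in for nodes and relationships:

```ruby
# Hypothetical in-memory stand-in for Asset nodes and their relationships.
AssetRecord = Struct.new(:title, :public, :creator, :viewable_by, keyword_init: true)

# Same three alternatives as the WHERE clause in visible_to:
#   asset.public OR asset<-[:CREATED]-user OR asset-[:VIEWABLE_BY]->user
# Each asset is tested once, so it is returned at most once.
def visible_to(assets, user)
  assets.select do |asset|
    asset.public || asset.creator == user || asset.viewable_by.include?(user)
  end
end

ASSETS = [
  AssetRecord.new(title: "Ruby", public: false, creator: "alice", viewable_by: ["bob"]),
  AssetRecord.new(title: "Ruby on Rails", public: false, creator: "carol", viewable_by: []),
  AssetRecord.new(title: "WineGraph", public: true, creator: "carol", viewable_by: []),
].freeze

visible_to(ASSETS, "bob").map(&:title)  # => ["Ruby", "WineGraph"]
visible_to(ASSETS, "dave").map(&:title) # => ["WineGraph"]
```

Each new way of granting access is one more `||` branch here, which mirrors how the Cypher version grows by one more path.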
Satisfied with your day’s work, you lean back in your chair and sip your afternoon coffee. Of course, that’s when the Social Media Customer Care Representative drops by to say “Users love the new feature, but they want to be able to create groups and assign access to groups. Can you do that? Oh, also, could you allow for an arbitrary hierarchy of groups?” You stare deeply into their eyes for a few minutes before responding: “Sure!”. Since this is starting to get complicated, let’s look at an example:
If both of the assets are private, your code so far gives Matz and tenderlove access to “Ruby” and DHH access to “Ruby on Rails”. To add group support, you start by following directly assigned groups:
class Asset
  ...
  def self.visible_to(user)
    query_as(:asset)
      .match_nodes(user: user)
      .where("asset.public OR asset<-[:CREATED]-user OR asset-[:VIEWABLE_BY]->user OR asset-[:VIEWABLE_BY]->(:Group)<-[:BELONGS_TO]-user")
      .pluck('DISTINCT asset')
  end
end
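That was pretty easy, since you just needed to add another path. It’s two hops, sure, but that’s old hat for us by now. Tenderlove and Yehuda will be able to see the “Ruby on Rails” asset because they are members of the “Railsists” group. Also note that some users now have multiple paths to an asset (like Matz to Ruby via the Rubyists group and via the CREATED relationship). Because these paths are only tested as conditions in the WHERE clause, each asset still produces a single row, so returning DISTINCT asset is a cheap safeguard here; it becomes essential if you move these paths into a MATCH, where every matching path produces its own row.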
Specifying an arbitrary path through a hierarchy of groups takes you a bit more time, though. You look through the Neo4j documentation until you find something called “variable length relationships” and give it a shot:
class Asset
  ...
  def self.visible_to(user)
    query_as(:asset)
      .match_nodes(user: user)
      .where("asset.public OR asset<-[:CREATED]-user OR asset-[:VIEWABLE_BY]->user OR asset-[:VIEWABLE_BY]->(:Group)<-[:HAS_SUBGROUP*0..5]-(:Group)<-[:BELONGS_TO]-user")
      .pluck('DISTINCT asset')
  end
end
There, you’ve done it! This query will find assets accessible to a group and traverse any chain of zero to five HAS_SUBGROUP relationships, finally ending on a check to see if the user is in the last group. You’re the hero of the story and your company showers you with bonuses for getting the job done so quickly!
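The bounded traversal can be sketched in plain Ruby as a breadth-first walk. Read literally, `(:Group)<-[:HAS_SUBGROUP*0..5]-(:Group)` starts at the group the asset is shared with and moves against the arrows, so it reaches that group plus its ancestors up to five levels up; members of any of those groups get access. The hierarchy below is hypothetical ("Developers" is made up), and if you want sharing with a parent group to cascade down to its subgroups instead, reverse the arrow in the pattern.

```ruby
require "set"

# Hypothetical hierarchy: parent group => its subgroups (HAS_SUBGROUP edges).
HAS_SUBGROUP = {
  "Developers" => ["Rubyists"],
  "Rubyists"   => ["Railsists"],
}.freeze

# Groups matching (shared)<-[:HAS_SUBGROUP*0..max_hops]-(group): the shared
# group itself plus its ancestors up to max_hops levels, found breadth-first.
def groups_reaching(shared_group, max_hops = 5)
  parents = Hash.new { |hash, key| hash[key] = [] }
  HAS_SUBGROUP.each { |parent, children| children.each { |child| parents[child] << parent } }

  reached = Set[shared_group] # zero hops: the group itself
  frontier = [shared_group]
  max_hops.times do
    frontier = frontier.flat_map { |group| parents[group] }.uniq.reject { |group| reached.include?(group) }
    break if frontier.empty?

    reached.merge(frontier)
  end
  reached
end

# The user can see the asset if they belong to any of those groups.
def sees_via_groups?(user_groups, shared_group, max_hops = 5)
  groups_reaching(shared_group, max_hops).intersect?(Set.new(user_groups))
end

sees_via_groups?(["Developers"], "Railsists")    # => true (two hops up)
sees_via_groups?(["Developers"], "Railsists", 1) # => false (only one hop allowed)
```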
Conclusion
There are many awesome things that you can do with Neo4j (including using its amazing web interface to explore your data with Cypher) which I’m not able to cover. Not only is it a great way to store your data in an easy and intuitive way, it provides a lot of benefits for efficient querying of highly connected data (and believe me, your data is highly connected, even if you don’t realize it). I encourage you to check out Neo4j and give it a try for your next project!
Frequently Asked Questions about Using Neo4j in Your Next Ruby App
What are the benefits of using Neo4j with Ruby?
Neo4j is a graph database that provides a flexible and efficient way to store, process, and query data. When used with Ruby, a dynamic, open-source programming language, it allows developers to create powerful, data-driven applications. The combination of Ruby’s simplicity and Neo4j’s powerful graph processing capabilities makes it an excellent choice for developing complex applications that require efficient data handling and manipulation.
How do I get started with Neo4j in Ruby?
To get started with Neo4j in Ruby, you first need to install the ‘neo4j’ gem. This can be done by adding gem 'neo4j' to your Gemfile and running bundle install. Once the gem is installed, you can establish a connection to your Neo4j database using the Neo4j::Session.open method.
How do I perform CRUD operations in Neo4j using Ruby?
CRUD operations in Neo4j using Ruby can be performed with ActiveNode models from the ‘neo4j’ gem, or with its successor, the ActiveGraph gem. These provide a set of methods that allow you to create, read, update, and delete nodes and relationships in your Neo4j database. For example, to create a new node, you can use the create method, like so: Person.create(name: 'John Doe').
How can I use Cypher queries in Ruby?
Cypher is Neo4j’s query language, and it can be used in Ruby through the query method provided by the ‘neo4j’ gem. This method allows you to write and execute Cypher queries directly from your Ruby code. For example, to find all people named ‘John Doe’, you could use the following code: Neo4j::Session.query("MATCH (p:Person {name: 'John Doe'}) RETURN p").
What are the best practices for using Neo4j with Ruby?
When using Neo4j with Ruby, it’s important to follow best practices to ensure your application is efficient and maintainable. These include using indexes to speed up queries, keeping your Cypher queries as simple as possible, and using the ActiveGraph gem to handle CRUD operations.
How can I handle relationships in Neo4j using Ruby?
Relationships in Neo4j can be handled in Ruby through the associations you declare on your models (ActiveNode in the ‘neo4j’ gem, or its ActiveGraph successor). These let you create, query, and manipulate relationships between nodes. For example, with the has_many :categories association on the Asset model shown earlier, asset.categories << category creates a HAS_CATEGORY relationship between the two nodes.
How can I optimize my Neo4j queries in Ruby?
Optimizing your Neo4j queries in Ruby can be done by using indexes, keeping your queries as simple as possible, and avoiding expensive operations like full graph scans. Additionally, you can use the EXPLAIN and PROFILE keywords in your Cypher queries to understand how they are being executed and identify potential performance issues.
How can I handle errors in Neo4j using Ruby?
Errors in Neo4j can be handled in Ruby using standard error handling techniques. The ‘neo4j’ gem provides a set of custom error classes that you can use to catch and handle Neo4j-specific errors. For example, you can use a begin-rescue block to catch and handle a Neo4j::ActiveNode::Labels::RecordNotFound error.
Can I use Neo4j with Ruby on Rails?
Yes, you can use Neo4j with Ruby on Rails. The ‘neo4j’ gem integrates with Rails through ActiveModel (validations, callbacks, and so on), so you can use Neo4j as your database in a Rails application with a model API that will feel familiar if you know ActiveRecord.
How can I secure my Neo4j database when using it with Ruby?
Securing your Neo4j database when using it with Ruby can be done by following standard security practices. These include using strong, unique passwords for your database, enabling encryption for data in transit and at rest, and keeping your Neo4j and Ruby software up to date. Additionally, the ‘neo4j’ gem provides support for using SSL to secure your database connections.
Brian Underwood is a developer advocate for Neo4j and one of the maintainers of the neo4j.rb project. He is currently traveling the world with his wife and three-year-old son. You can find him as cheerfulstoic on GitHub, Twitter, Google+, or his website.