Data Science: Getting Started with Neo4j and Gephi
This post shows how to use two tools, Neo4j and Gephi, to explore and visualize data as graphs.
What is Neo4j?
Neo4j is a graph database that stores data as nodes and the relationships between them. According to its vendor, it combines native graph storage, a scalable architecture optimized for speed, and ACID compliance, so it offers fast reads and writes while protecting data integrity and keeping relationship-based queries predictable.
Let's start the demo.
I am using the Neo4j Browser from the official Neo4j website, where you can also try it out or download it.
For demo purposes, I used the Movies dataset, which is available by default in Neo4j. After selecting the dataset, I ran the following queries.
1. Show movies that were released after the year 2006.
Query:
MATCH (m:Movie) WHERE m.released > 2006 RETURN m
2. Show movies released after 2002, limiting the result to 3 movies.
Query:
MATCH (m:Movie) WHERE m.released > 2002 RETURN m LIMIT 3
3. The following query returns up to 5 results, each made of a person, the DIRECTED relationship, and the movie that person directed, for movies released after the year 2007.
Query:
MATCH (p:Person)-[d:DIRECTED]->(m:Movie) WHERE m.released > 2007 RETURN p, d, m LIMIT 5
4. To list the people stored in the database, use the following query, which limits the output to 10 people.
Query:
MATCH (p:Person) RETURN p LIMIT 10
5. To check whether a movie with a particular title is present, use the following query.
Query:
MATCH (m:Movie {title: 'A Few Good Men'}) RETURN m
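The queries above can also be run from a script instead of the Neo4j Browser. Below is a minimal sketch, assuming the official `neo4j` Python driver is installed and a Neo4j instance with the Movies dataset is reachable. The helper names `movies_released_after` and `run_query` are my own for illustration, and the query returns `m.title` rather than the whole node so that the result is a plain list of strings.

```python
def movies_released_after(year, limit=None):
    """Build a parameterized Cypher query and its parameter map.

    Hypothetical helper: it mirrors queries 1 and 2 above, but passes the
    year and the limit as query parameters instead of hard-coding them.
    """
    query = "MATCH (m:Movie) WHERE m.released > $year RETURN m.title AS title"
    params = {"year": year}
    if limit is not None:
        query += " LIMIT $limit"
        params["limit"] = limit
    return query, params


def run_query(uri, user, password, query, params):
    """Run a query and return the movie titles as a list of strings.

    Assumes the third-party `neo4j` package (pip install neo4j).
    """
    from neo4j import GraphDatabase  # imported lazily; not in the stdlib

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            return [record["title"] for record in session.run(query, params)]
    finally:
        driver.close()


# Example call (URI and credentials are placeholders, adjust to your setup):
# query, params = movies_released_after(2002, limit=3)
# titles = run_query("bolt://localhost:7687", "neo4j", "<password>", query, params)
```

Passing values as parameters keeps the query text fixed and avoids building Cypher by string concatenation.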
What is the Gephi tool?
Gephi is a tool for data analysts and scientists keen to explore and understand graphs. … The goal is to help data analysts to make hypotheses, intuitively discover patterns, isolate structure singularities or faults during data sourcing.
You can download Gephi from its official website.
In this demo, I chose the simple lesmiserables.gml dataset and performed some basic Gephi operations on it. So let's get started.
1. First, open Gephi and click New Project. Then choose File->Open and load the dataset of your choice, as shown below.
The import report in the image shows that there are no issues and lists the nodes and edges found in the dataset.
2. Below is how all the nodes and edges are displayed after the dataset is loaded.
3. Then open the Layout panel, choose ForceAtlas, and click the Run button.
4. Next, we can differentiate the nodes by a ranking such as In-Degree, Out-Degree, or Degree and show them in different colors. To do this, choose Nodes->Ranking at the top of the left pane and pick a ranking. In the image below, Degree is chosen.
5. To view the data table, click Window->Data Table.
6. Next, we generate a degree distribution graph for Degree, In-Degree, and Out-Degree, and also get the Average Degree value over all the nodes. To do this, choose the Statistics tab in the right pane and run Average Degree in the Network Overview section.
7. Clicking the Run button shown in the above image produces the Average Degree report.
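To make the numbers in that report less of a black box, here is a minimal sketch of what degree, average degree, and degree distribution mean. It assumes an undirected graph and uses a small made-up edge list rather than lesmiserables.gml. It is not Gephi's implementation, only the textbook definitions written out.

```python
from collections import Counter

# Hypothetical undirected edge list (not taken from lesmiserables.gml).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]


def degrees(edge_list):
    """Count how many edges touch each node in an undirected graph."""
    degree = Counter()
    for source, target in edge_list:
        degree[source] += 1
        degree[target] += 1
    return degree


def average_degree(degree):
    """Mean degree over all nodes that appear in at least one edge.

    For an undirected graph this equals 2 * (number of edges) / (number of nodes).
    """
    return sum(degree.values()) / len(degree)


def degree_distribution(degree):
    """Map each degree value to the number of nodes that have it."""
    return Counter(degree.values())


node_degrees = degrees(edges)
print(dict(node_degrees))                                  # {'A': 3, 'B': 2, 'C': 2, 'D': 2, 'E': 1}
print(average_degree(node_degrees))                        # 2.0
print(sorted(degree_distribution(node_degrees).items()))   # [(1, 1), (2, 3), (3, 1)]
```

Note that nodes with no edges never show up in an edge list, so this sketch would leave them out of the average. For a directed graph, you would count `source` and `target` separately to get Out-Degree and In-Degree.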