On top of new features in the latest release of the graph database, Neo4j is being tuned for IBM hardware while Neo's query language Cypher gets its own open-source project.
By eliminating the Java virtual machine-based object cache, release 2.3 is designed to offer higher concurrent performance at scale. There is also better Cypher performance through improvements to the query planner, according to Neo.
"Neo4j 2.3 marks the culmination of close to two years of engineering effort to move the entire database cache into a different layer, out of the JVM and off heap from the JVM and onto a low-level in-memory page cache that we designed especially for handling graph workloads," Neo Technology products VP Philip Rathle said.
"Java is extremely good and efficient at many things. What it's not so great at is if inside your Java virtual machine you're attempting to store the database cache. Then what happens with very high workloads is that you end up churning data in the cache. We've had customers with gigabytes of data per second coming into and out of the JVM."
The resulting garbage-collection process causes delays and can force users into complex Java tuning.
"You really want your database just to perform and just to be robust and reliable. So by moving the cache off the JVM heap, we're seeing inside our own internal tests as well as some customers that have been part of the beta program much better scalability and just general behaviour as the datasets get very big," Rathle said.
The previous 2.2 release offered improvements in concurrent write throughput, with 10 times more data being pushed in.
"This does a similar thing for reads. We've seen in our testing with realistic customer workloads increases of up to seven times on larger machines with multi-threaded workloads," he said.
The open-source Neo4j graph database is used by businesses such as eBay, Wal-Mart and UBS. Graph databases use nodes and the connections between them to describe networks and contexts.
Last year, Forrester Research predicted that just over a quarter of enterprises would be using such databases by 2017 to support next-generation business applications that need connected datasets.
In Neo4j 2.3, the query planner, which uses the pattern described in the Cypher query to calculate the fastest and cheapest way of obtaining the information required, has been improved.
"Even for a simple query there may literally be hundreds of millions of ways the database could go about getting the data. The best path could differ quite a lot, based on the data and how it's connected and how much of one type of thing you have versus another," Rathle said.
"The query planner that we released earlier this year was our first foray into cost-based planning, which takes account of both the shape of the query and the shape and quantity of data. We've made it a lot better in 2.3. It supports a number of new algorithms for really some important queries, including some that are really common in recommendations."
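To make the idea of cost-based planning concrete, here is a minimal sketch in Python. It is not Neo4j's planner: the labels, counts, average degrees and cost formula are all invented for illustration. It only shows how statistics about the data, and not just the shape of the query, can decide which end of a pattern is the cheaper place to start.

```python
# Toy illustration of cost-based planning; not Neo4j's planner.
# All labels, counts and the cost formula are hypothetical.

# Assumed statistics: how many nodes carry each label.
LABEL_COUNTS = {"Person": 1_000_000, "Product": 5_000}

# Assumed average number of PURCHASED relationships per node of each label.
AVG_DEGREE = {"Person": 3.0, "Product": 600.0}


def estimate_cost(start_label):
    """Estimated rows touched: scan every start node, then expand its relationships."""
    nodes = LABEL_COUNTS[start_label]
    return nodes + nodes * AVG_DEGREE[start_label]


def choose_start(candidates):
    """Return the candidate label with the lowest estimated cost."""
    return min(candidates, key=estimate_cost)


# Pattern: (Person)-[PURCHASED]->(Product); a planner could start from either end.
best = choose_start(["Person", "Product"])
print(best, estimate_cost(best))
```

With these made-up numbers both starting points expand the same three million relationships, but scanning 5,000 products first is cheaper than scanning a million people. Change the counts and the preferred plan changes with them.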
Other improvements in release 2.3 include graph searches enhanced with string matching, and a database-enforced schema that ensures specified properties always exist for given nodes and relationships. There is also a fully supported Neo4j data integration library for the Spring Framework.
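The property-existence rule can be pictured with a small Python sketch. This is not Neo4j's API: the `TinyGraph` class, its method names and the `ConstraintViolation` exception are hypothetical. It simply shows what it means for the store itself, and not the application, to refuse data that lacks a required property.

```python
# Toy illustration of a database-enforced "property must exist" rule.
# Not Neo4j's API; class and method names are hypothetical.

class ConstraintViolation(Exception):
    """Raised when a node lacks a property its label requires."""


class TinyGraph:
    def __init__(self):
        self.nodes = []
        self.required = {}  # label -> set of property names that must exist

    def require_property(self, label, prop):
        """Register a rule: every node with this label must have this property."""
        self.required.setdefault(label, set()).add(prop)

    def create_node(self, label, **props):
        """Store the node only if it carries every required property."""
        missing = self.required.get(label, set()) - set(props)
        if missing:
            raise ConstraintViolation(f"{label} is missing {sorted(missing)}")
        node = {"label": label, **props}
        self.nodes.append(node)
        return node


graph = TinyGraph()
graph.require_property("Person", "name")
graph.create_node("Person", name="Ada")      # accepted
try:
    graph.create_node("Person", age=36)      # rejected: no "name"
except ConstraintViolation as err:
    print(err)
```

Because the check lives in the store, every writer is held to the same rule and later queries can rely on the property being present.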
On the operations side, Neo4j brings official support for Docker containers and PowerShell, along with a Mac installer and launcher.
Under the partnership with IBM, the high-end Power8 system with large in-memory capacity will be offered with the Neo4j database, aimed particularly at fraud detection, large-scale recommendations and Internet-of-Things applications.
"Our engineering teams are working with IBM's hardware engineering teams to optimise both Neo4j and the Power8 platforms to be able to process extremely large graphs much faster and at much higher rates than anything that's been possible. We expect it to push the state of the art by probably 10 times," Rathle said.
The goal of the openCypher open-source project will be the creation of a language specification, a reference implementation, a technology compatibility kit, and reference documentation.
Initial supporters of the project include Oracle, Apache Spark firm Databricks, Tableau, GraphAware, GrapheneDB, Graph Story, GraphGrid, Information Analysis Incorporated, Linkurious, Structr and Tom Sawyer Software.
"The query language is something that we've spent years refining - it's actually our third attempt at a query language. It's really created a strong base of adoption, where most people learning graph databases today learn using the Cypher query language," he said.