fixes #1364

Merged
merged 1 commit into from
May 13, 2022

10 changes: 4 additions & 6 deletions docs-2.0/1.introduction/0-0-graph.md
@@ -2,7 +2,7 @@

People from tech giants (such as Amazon and Facebook) to small research teams are devoting significant resources to exploring the potential of graph databases to solve data relationships problems. What exactly is a graph database? What can it do? Where does it fit in the database landscape? To answer these questions, we first need to understand graphs.

Graphs are one of the main areas of research in computer science. Graphs can efficiently solve many of the problems that exist today. This topic will start with graphs to explain the advantages of graph databases and their great potential in modern application development, and then describe the differences between distributed graph databases and several other types of databases.
Graphs are one of the main areas of research in computer science. Graphs can efficiently solve many of the problems that exist today. This topic will start with graphs and explain the advantages of graph databases and their great potential in modern application development, and then describe the differences between distributed graph databases and several other types of databases.

## What are graphs?

@@ -30,7 +30,7 @@ Simply put, graph theory is the study of graphs. Graph theory began in the early

[^171]: Source of the picture: https://medium.freecodecamp.org/i-dont-understand-graph-theory-1c96572a1401.

To solve this problem, the great mathematician Euler by abstracting the four regions of the city into points and the seven bridges connecting the city into edges connecting the points, proved that the problem was unsolvable. The simplified abstract diagram is as follows [^063].
To solve this problem, the great mathematician Euler proved that the problem was unsolvable by abstracting the four regions of the city into points and the seven bridges connecting the city into edges connecting the points. The simplified abstract diagram is as follows [^063].

![image](https://user-images.githubusercontent.com/42762957/91538126-e578b900-e949-11ea-980c-5704254e8063.png)
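
Euler's argument can be checked mechanically. The sketch below is an illustration and not part of the changed docs: the region labels `A` to `D` and the bridge list are assumptions that follow the usual presentation of the puzzle. It relies on the standard degree rule that a connected graph has a walk crossing every edge exactly once only if zero or two vertices have odd degree.

```python
from collections import Counter

# Hypothetical labels: A is the island, B and C are the two river banks,
# D is the eastern land mass. Each pair is one of the seven bridges.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

def odd_degree_vertices(edges):
    """Return the vertices touched by an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)

def has_euler_walk(edges):
    """Degree test only; assumes the graph is connected."""
    return len(odd_degree_vertices(edges)) in (0, 2)

odd = odd_degree_vertices(BRIDGES)
print(odd, has_euler_walk(BRIDGES))
```

All four regions have odd degree under this encoding, so the test fails, which matches Euler's conclusion that no such walk exists.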

@@ -48,15 +48,13 @@ From a mathematical point of view, graph theory studies the relationships betwee

In real life, there are many examples of property graphs.

For example, Qichacha or BOSS Zhipin use graphs to model business equity relationships. A vertex is usually a natural person or a business, and the edge is the equity relationship between a person and a business. The properties on vertices can be the name, age, ID number, etc. of the natural person. The properties on edges can be the investment amount, investment time, position such as director and supervisor.

![image](https://docs-cdn.nebula-graph.com.cn/books/images/enterprise-relations.png)
For example, Qichacha or BOSS Zhipin use graphs to model business equity relationships. A vertex usually represents a natural person or a business, and the edge represents the equity relationship between a person and a business. The properties on vertices can be the name, age, ID number, etc. of the natural person. The properties on edges can be the investment amount, investment time, position such as director and supervisor.
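
To make the equity example concrete, here is a minimal sketch of a property graph in plain Python. The class names, field names, and sample values are all hypothetical and are not taken from any particular graph database.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    vid: str                      # unique vertex ID
    vtype: str                    # e.g. "person" or "business"
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    src: str                      # ID of the source vertex
    dst: str                      # ID of the destination vertex
    etype: str                    # kind of relationship
    properties: dict = field(default_factory=dict)

# Made-up sample data: a natural person holding equity in a business.
person = Vertex("p1", "person", {"name": "Alice", "age": 45})
business = Vertex("c1", "business", {"name": "Example Co."})
holds = Edge("p1", "c1", "holds_equity",
             {"investment_amount": 1_000_000, "position": "director"})

def out_edges(edges, vid):
    """Return the edges that start at the given vertex."""
    return [e for e in edges if e.src == vid]

for e in out_edges([holds], "p1"):
    print(e.etype, e.dst, e.properties["position"])
```

Properties live on both vertices and edges, which is what distinguishes a property graph from a bare set of points and lines.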

A vertex can be a listed company and an edge can be a correlation between listed companies. The vertex property can be a stock code, abbreviation, market capitalization, sector, etc. The edge property can be the time-series correlation coefficient of the stock price [^T01].

[^T01]: https://nebula-graph.com.cn/posts/stock-interrelation-analysis-jgrapht-nebula-graph/

The graph relationship can also be similar to the character relationship in a TV series like Game of Thrones [^s-01]. Vertices are the characters. Edges are the interactions between the characters. Vertex properties are the character's names, ages, camps, etc., and edge properties are the number of interactions between two characters.
The graph relationship can also be similar to the character relationship in a TV series like Game of Thrones [^s-01]. Vertices stand for the characters. Edges represent the interactions between the characters. Vertex properties are the character's names, ages, camps, etc., and edge properties are the number of interactions between two characters.

![image](https://docs-cdn.nebula-graph.com.cn/books/images/game-of-thrones-01.png)

5 changes: 3 additions & 2 deletions docs-2.0/1.introduction/0-1-graph-database.md
@@ -38,11 +38,11 @@ Until recently, graph databases and related graph technologies were ranked in th

[^Gartner2]: https://www.gartner.com/smarterwithgartner/gartner-top-10-data-and-analytics-trends-for-2021/

It can be noted that Gartner's predictions match the DB-Engines ranking well. There is usually a period of rapid bubble development, then a plateau period, followed by a new bubble period due to the emergence of new technologies, and then a plateau period. And so on in a spiral.
It can be noted that Gartner's predictions match the DB-Engines ranking well. There is usually a period of rapid bubble development, then a plateau period, followed by a new bubble period due to the emergence of new technologies, and then a plateau period again.

### Market size of graph databases

According to statistics and forecasts from Verifiedmarketresearc[^ver], fnfresearch[^fnf], MarketsandMarkets[^mam], and Gartner[^gar], the global graph database market size to grow from about USD 0.8 billion in 2019 to USD 3-4 billion by 2026, at a Compound Annual Growth Rate (CAGR) of about 25%, which corresponds to about 5%-10% market share of the global database market.
According to statistics and forecasts from Verifiedmarketresearc[^ver], fnfresearch[^fnf], MarketsandMarkets[^mam], and Gartner[^gar], the global graph database market size is about to grow from about USD 0.8 billion in 2019 to USD 3-4 billion by 2026, at a Compound Annual Growth Rate (CAGR) of about 25%, which corresponds to about 5%-10% market share of the global database market.

![Image](https://www.verifiedmarketresearch.com/wp-content/uploads/2020/10/Graph-Database-Market-Size.jpg)
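
The quoted figures can be sanity-checked with the compound growth formula. This is only an arithmetic sketch using the numbers stated in the paragraph (USD 0.8 billion in 2019, about 25% CAGR, 2026 as the target year). The function name is made up for illustration.

```python
def project(start_value, cagr, years):
    """Compound a starting value at a fixed annual growth rate."""
    return start_value * (1 + cagr) ** years

# USD billions; seven years of growth from 2019 to 2026.
size_2026 = project(0.8, 0.25, 2026 - 2019)
print(round(size_2026, 2))
```

The result falls inside the USD 3-4 billion range given in the forecast, so the start value, growth rate, and end value are mutually consistent.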

@@ -104,6 +104,7 @@ Although this network model greatly improved productivity, its performance has b
In the first public release of Neo4j (Neo4j 1.4, 2011), the data model consisted of vertices and typed edges. Vertices and edges have properties. The early versions of Neo4j did not have indexes. Applications had to construct their search structure from the root vertex. Because this was very unwieldy for the applications, Neo4j 2.0 (2013.12) introduced a new concept, labels on vertices. Based on labels, Neo4j can index some predefined vertex properties.

"Vertex", "Relationship", "Property", "Relationships can only have one label.", "Vertices can have zero or multiple labels.". All these concepts form the data model definitions for Neo4j property graphs. With the later addition of indexing, Cypher became the main way of interacting with Neo4j. This is because the application developer only needs to focus on the data itself, not on the search structure that the developer built himself as mentioned above.

#### The creation of Gremlin

Gremlin is a graph query language based on Apache TinkerPop, which is close in style to a sequence of function (procedure) calls. Initially, Neo4j was queried through the Java API. Applications could embed the query engine as a library and then use the API to query the graph.
32 changes: 16 additions & 16 deletions docs-2.0/1.introduction/0-2.relates.md
@@ -55,7 +55,7 @@ Technically speaking, as a semi-structured unit of information, a document in a

#### Graph Store

The last class of NoSQL databases is graph databases. Nebula Graph, is also a graph database. Although graph databases are also NoSQL databases, graph databases are fundamentally different from the above-mentioned NoSQL databases. Graph databases store data in the form of points, edges, and attributes. Its advantages include high flexibility, support for complex graph algorithms, and can be used to build complex relational graphs. We will discuss graph databases in detail in the subsequent topics. But in this topic, you just need to know that a graph database is a NoSQL type of database. Common graph databases include Nebula Graph, Neo4j, OrientDB, etc.
The last class of NoSQL databases is graph databases. Nebula Graph, is also a graph database. Although graph databases are also NoSQL databases, graph databases are fundamentally different from the above-mentioned NoSQL databases. Graph databases store data in the form of vertices, edges, and properties. Its advantages include high flexibility, support for complex graph algorithms, and can be used to build complex relational graphs. We will discuss graph databases in detail in the subsequent topics. But in this topic, you just need to know that a graph database is a NoSQL type of database. Common graph databases include Nebula Graph, Neo4j, OrientDB, etc.

## Graph-related technologies

@@ -67,12 +67,12 @@ Take a look at a panoramic view of graph technology in 2020 [^lan].

There are many technologies that are related to graphs, which can be broadly classified into these categories:

- Infrastructure: including graph databases, graph computing (processing) engines, graph deep learning, cloud services, etc.
- Infrastructure: Graph databases, graph computing (processing) engines, graph deep learning, cloud services, etc.

- Applications: including visualization, knowledge graph, anti-fraud, cyber security, social network, etc.
- Applications: Visualization, knowledge graph, anti-fraud, cyber security, social network, etc.


- Development tools: including graph query languages, modeling tools, development frameworks, and libraries.
- Development tools: Graph query languages, modeling tools, development frameworks, and libraries.

- E-books [^info] and conferences, etc.

@@ -110,9 +110,9 @@ A graph system usually includes a complex data pipeline [^biggraph]. From the da

Graph databases and graph processing systems have different origins and specialties (and weaknesses).

- (Online) The graph database is designed for persistent storage management of graphs and efficient subgraph operations. Hard disks and network) are the target operating devices, physical/logical data mapping, data integrity, and (fault) consistency are the main goals. Each request typically involves only a small part of the full graph and can usually be done on a single server. Request latency is usually in milliseconds or seconds, and request concurrency is typically in the thousands or hundreds of thousands. The early Neo4j was one of the origins of the graph database space.
- (Online) The graph database is designed for persistent storage management of graphs and efficient subgraph operations. Hard disks and network are the target operating devices, physical/logical data mapping, data integrity, and (fault) consistency are the main goals. Each request typically involves only a small part of the full graph and can usually be done on a single server. Request latency is usually in milliseconds or seconds, and request concurrency is typically in the thousands or hundreds of thousands. The early Neo4j was one of the origins of the graph database space.

- (Offline) The graph processing system is for high-volume, parallel, iterative, processing, and analysis of the full graph. Memory and network are the target operating devices. Each request involves all graph vertices and requires all servers to be involved in its completion. The latency of a single request is in the range of minutes to hours (days). The request concurrency is in single digits. Google's Pregel [^Pregel] represents the typical origin of graph processing systems. Its point-centric programming abstraction and BSP's operational model constitute a programming paradigm that is a more graph-friendly API abstraction than the previous Hadoop Map-Reduce.
- (Offline) The graph processing system is for high-volume, concurrency, iteration, processing, and analysis of the full graph. Memory and network are the target operating devices. Each request involves all graph vertices and requires all servers to be involved in its completion. The latency of a single request is in the range of minutes to hours (days). The request concurrency is in single digits. Google's Pregel [^Pregel] represents the typical origin of graph processing systems. Its point-centric programming abstraction and BSP's operational model constitute a programming paradigm that is a more graph-friendly API abstraction than the previous Hadoop Map-Reduce.

[^Pregel]: G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the International Conference on Management of data (SIGMOD), pages 135–146, New York, NY, USA, 2010. ACM

@@ -127,7 +127,7 @@ For large-scale graph data, it is difficult to store it in the memory of a singl

As the volume of data increases, a single server is no longer enough. For example, 100 billion data items already exceed the capacity of any commercially available server.

There is another option is to shard data and place each shard on a different server to increase reliability and performance. For NoSQL systems, such as key-value or document systems, the sharding method is intuitive and natural. Each record and data unit can usually be placed on a different server based on the key or docID.
Another option is to shard data and place each shard on a different server to increase reliability and performance. For NoSQL systems, such as key-value or document systems, the sharding method is intuitive and natural. Each record and data unit can usually be placed on a different server based on the key or docID.
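
For key-value or document records, the placement rule can be as small as a hash of the key modulo the number of shards. The sketch below assumes that simple scheme, and the function name and sample keys are hypothetical. Real systems often use consistent hashing or range partitioning instead.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a record key (or docID) to a shard index with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Made-up keys, distributed over four shards.
keys = ["user:1", "user:2", "doc:alpha", "doc:beta"]
placement = {key: shard_for(key, 4) for key in keys}
print(placement)
```

Each record is placed independently of every other record, which is why this works well for keys and docIDs. A graph offers no such independence, because an edge ties two vertices together and they may hash to different servers.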

However, the sharding of data structures like graphs is usually less intuitive, because usually, graphs are "fully connected" and each vertex can be connected to any other vertex in usually 6 hops.

@@ -139,15 +139,15 @@ When distributing the entire graph data across multiple servers, the cross-serve

[^gpml]: https://livebook.manning.com/book/graph-powered-machine-learning/welcome/v-8/

Usually, graphs have a clear power-law distribution. A small number of vertices have much denser neighboring edges than the average vertices. While processing these vertices can usually be within the same server, reducing cross-network access, also means that these servers will be far more stressed than the average.
Usually, graphs have a clear power-law distribution. A small number of vertices have much denser neighboring edges than the average vertex. Although these vertices can usually be processed within a single server, which reduces cross-network access, the load on that server will be far heavier than average.

![](https://docs-cdn.nebula-graph.com.cn/books/images/Power_Law_Distribution.png)

![](https://docs-cdn.nebula-graph.com.cn/books/images/singleserver.png)

The common graph sharding methods are as follows:

- Biased application-level sharding: The application layer senses and controls which shard each vertex and edge should locate on, which can generally be judged based on the type of points and edges. A set of vertices of the same type is placed on one sharding and another set of vertices of the same type is placed on another sharding. Of course, for high reliability, the sharding itself can also be made multiple copies. When used by the application, the desired vertices and edges are fetched from each shard, and then on the off-application side (or some proxy server-side), the fetched data is assembled into the final result. This is typically represented by the Neo4j 4. x Fabric.
- Application-level sharding: The application layer senses and controls which shard each vertex and edge should be located on based on the type of vertices and edges. A set of vertices of the same type is placed on one shard and another set of vertices of the same type is placed on another shard. Of course, for high reliability, each shard can also have multiple replicas. When used by the application, the desired vertices and edges are fetched from each shard, and then on the off-application side (or some proxy server-side), the fetched data is assembled into the final result. This is typically represented by the Neo4j 4.x Fabric.

![](https://docs-cdn.nebula-graph.com.cn/books/images/neo4j4x.png)
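
The routing and assembly steps of application-level sharding can be sketched as follows. This is a toy model under stated assumptions, not the Neo4j Fabric API: the routing table, shard names, and helper functions are hypothetical, and each shard is just an in-memory dictionary.

```python
# Hypothetical routing table owned by the application: vertex type -> shard.
SHARD_BY_TYPE = {"person": "shard-a", "business": "shard-b"}

# Each shard is modelled as a dict; a real shard would be a separate database.
shards = {"shard-a": {}, "shard-b": {}}

def put_vertex(vid, vtype, properties):
    """Store a vertex on the shard chosen by its type."""
    shards[SHARD_BY_TYPE[vtype]][vid] = {"type": vtype, **properties}

def fetch_vertices(vids):
    """Query every shard, then assemble the partial results in one place."""
    result = {}
    for store in shards.values():
        for vid in vids:
            if vid in store:
                result[vid] = store[vid]
    return result

put_vertex("p1", "person", {"name": "Alice"})
put_vertex("c1", "business", {"name": "Example Co."})
merged = fetch_vertices(["p1", "c1"])
print(merged)
```

The application, not the storage layer, decides placement here, so a query that touches both vertex types always pays for a fan-out to several shards plus a merge step.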

@@ -169,10 +169,10 @@ In the literature [^Ubiquity], a thorough investigation of graphs and challenges

[^Ubiquity]: https://arxiv.org/abs/1709.03188

- Scalability: Loading and upgrading big graphs, performing graph computation and graph traversal, use of triggers and supernodes.
- Visualization: Customizable layouts, rendering and display big images, and display dynamic and updated display.
- Query language and programming API: Language expressiveness, standards compatibility, compatibility with existing systems, design of subqueries, and associative queries across multiple graphs.
- Faster graph algorithms.
- Scalability: Loading and upgrading big graphs, performing graph computation and graph traversal, use of triggers and supernodes
- Visualization: Customizable layouts, rendering and display big images, and display dynamic and updated display
- Query language and programming API: Language expressiveness, standards compatibility, compatibility with existing systems, design of subqueries, and associative queries across multiple graphs
- Faster graph algorithms
- Easy to use (configuration and usage)
- Performance metrics and testing
- General graph technology software (e.g., to handle offline, online, streaming computations.)
@@ -183,7 +183,7 @@ In the literature [^Ubiquity], a thorough investigation of graphs and challenges

There is a common misconception about graph databases that any data access involving graph structure needs to be stored in a graph database.

When the amount of data is not large, single machine memory is enough to store the data. You can use some single machine open-source tools to store tens of millions of vertices and edges.
When the amount of data is not large, single machine memory is enough to store the data. You can use some single-machine open-source tools to store tens of millions of vertices and edges.

- JGraphT[^JGraphT]: A well-known open-source Java graph theory library, which implements a considerable number of efficient graph algorithms.

@@ -225,7 +225,7 @@ An SNB dataset simulates the relationship between people and posts of a social n

The standard data size ranges from 0.1 GB (scale factor 0.1) to 1000 GB (sf 1000). Larger data sets of 10 TB and 100 TB can also be generated. The number of vertices and edges is as shown below.

![](https://docs-cdn.nebula-graph.com.cn/books/images/ldbcsf.png)
![data_size](https://docs-cdn.nebula-graph.com.cn/books/images/ldbcsf.png)

## Trends

@@ -236,7 +236,7 @@ The standard data size ranges from 0.1 GB (scale factor 0.1) to 1000 GB (sf 1000
### The trends in cloud computing place higher demands on scalability.

According to Gartner's projections, cloud services have been growing at a rapid rate and penetration [^cl]. A large number of commercial software is gradually moving from a completely local and private model 10 years ago to a cloud services-based business model.
One of the major advantages of cloud services is that they offer near-infinite scalability. It requires that various cloud infrastructure-based software must have a better ability to scale up and down quickly and elastically.
One of the major advantages of cloud services is that they offer near-infinite scalability. It requires that various cloud infrastructure-based software must have a better ability to scale quickly and elastically.

![](https://docs-cdn.nebula-graph.com.cn/books/images/cloudtrends.png)
