How to optimize query performance for a large number of edges #4786

quanhengzhuang · 2022-10-26T07:34:20Z

General Question

One of our business scenarios:
Among the users that A follows, we need to find every B who also follows C.

A follows 10 to 1,000 users.
C has 10 to 10,000,000 followers.

Querying with FIND ALL PATH ... is very slow and takes a few seconds. Is there a faster way?
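To make the requested result concrete, here is a minimal in-memory sketch of the lookup in Python. It is only an illustration of the semantics, not how NebulaGraph evaluates the query; the dictionary layout, the function name `followings_who_follow`, and the toy IDs are all assumptions made for this example.

```python
from typing import Dict, Set


def followings_who_follow(follows: Dict[str, Set[str]], a: str, c: str) -> Set[str]:
    """Return every B such that A follows B and B follows C."""
    # Walk A's followings and keep those whose own followings contain C.
    return {b for b in follows.get(a, set()) if c in follows.get(b, set())}


# Toy data (hypothetical): keys are followers, values are the accounts they follow.
follows = {
    "A": {"B1", "B2", "B3"},
    "B1": {"C"},
    "B2": {"X"},
    "B3": {"C", "X"},
}

print(sorted(followings_who_follow(follows, "A", "C")))  # prints ['B1', 'B3']
```

In this sketch the loop runs once per account that A follows, so its length depends on A's followings rather than on how many followers C has.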


wey-gu · 2022-10-26T07:36:12Z

cc @forest-yuxl @MuYiYong @critical27

bazingame · 2022-10-26T09:03:55Z

Below are the details:

Nebula Version: v3.1.0

Deployment:

Three servers, each running one nebula-metad, one nebula-graphd, and one nebula-storaged.

Machine Info:

CPU: 72 Core Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Memory:   192 G DDR4
SSD: NVME 16 T

Space Statistics:

Partition Number: 240, Replica Factor: 3

vertices: 800 million
- member Tag : 250 million
edges: 9.5 billion
- follow: 2.6 billion

show hosts
+------------+------+-----------+----------+--------------+---------------------+------------------------+---------+
| Host       | Port | HTTP port | Status   | Leader count | Leader distribution | Partition distribution | Version |
+------------+------+-----------+----------+--------------+---------------------+------------------------+---------+
| "10.0.0.1" | 9779 | 19669     | "ONLINE" | 80           | "base_space:80"     | "base_space:240"       | "3.1.0" |
| "10.0.0.2" | 9779 | 19669     | "ONLINE" | 80           | "base_space:80"     | "base_space:240"       | "3.1.0" |
| "10.0.0.3" | 9779 | 19669     | "ONLINE" | 80           | "base_space:80"     | "base_space:240"       | "3.1.0" |
+------------+------+-----------+----------+--------------+---------------------+------------------------+---------+

+---------+------------+------------+
| Type    | Name       | Count      |
+---------+------------+------------+
| "Tag"   | "content"  | 532806319  |
| "Tag"   | "member"   | 261499703  |
| "Edge"  | "follow"   | 2611243656 |
| "Edge"  | "upvote"   | 6837411544 |
| "Space" | "vertices" | 794306022  |
| "Space" | "edges"    | 9448655200 |
+---------+------------+------------+

nGQL and profile result

Case detail: user m_1 follows 277 users, and m_2 has about 1 million followers.

MATCH:

First we tried a MATCH statement, which takes nearly 17 seconds to execute.

MATCH (m)-[e:follow]->(n:member) WHERE id(m)=="m_1" MATCH (n)-[f:follow]->(l) WHERE id(l)=="m_2" RETURN id(n);

Explain result :

Profile result:

In this case, we don't need any properties, so we tried GO and FIND PATH statements:

GO:

GO FROM "m_1" OVER follow YIELD dst(edge) AS member_id INTERSECT GO FROM "m_2" OVER follow REVERSELY YIELD src(edge) AS member_id

Explain result :

Profile result:

FIND PATH

FIND ALL PATH FROM "m_1" TO "m_2" OVER follow,follow UPTO 2 STEPS YIELD path AS p | YIELD nodes($-.p) AS nodes | YIELD $-.nodes AS nodes, size($-.nodes) AS len | YIELD id($-.nodes[1]) as id WHERE $-.len == 3
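The piped YIELD clauses in the statement above only post-process the paths: they keep paths with exactly three vertices and return the ID of the middle one. A rough Python sketch of that filtering step follows, assuming each path is represented as a plain list of vertex IDs; the function name `middle_vertices` and the IDs `b_1` and `b_2` are made up for illustration.

```python
from typing import List


def middle_vertices(paths: List[List[str]]) -> List[str]:
    """Keep paths with exactly three vertices and return each middle vertex ID."""
    # Mirrors `WHERE $-.len == 3` followed by `id($-.nodes[1])`.
    return [nodes[1] for nodes in paths if len(nodes) == 3]


# Hypothetical result of a path search with UPTO 2 STEPS.
paths = [
    ["m_1", "m_2"],         # 1-step path, dropped by the length filter
    ["m_1", "b_1", "m_2"],  # 2-step path, middle vertex is kept
    ["m_1", "b_2", "m_2"],
]

print(middle_vertices(paths))  # prints ['b_1', 'b_2']
```

The length filter is needed because UPTO 2 STEPS also returns the direct 1-step path when m_1 follows m_2.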

Explain result:

Profile result:

The GO statement takes 3 seconds and the FIND PATH statement takes 6 seconds.
None of the methods above meets our requirements.

After reading the docs on processing super vertices, we tried some of the suggested solutions.

Compact: it does not seem to bring any improvement.
Truncation: does not fit our scenario, because we need all of the data.

The application-side solutions are also unsuitable, because none of the following applies to us:

Delete multiple edges and merge them into one: there is already only one follow edge between any two members.
Split an edge into multiple edges of different types: follow is the only edge type we need.
Split vertices

bazingame · 2022-11-02T06:23:02Z

cc @forest-yuxl @MuYiYong @critical27

@forest-yuxl @MuYiYong @critical27 can anyone help us with this problem?

wey-gu mentioned this issue Oct 29, 2022

Weekly Report 2022-10-28 vesoft-inc/nebula-community#140

Closed

shanlai added the non-issue label Nov 11, 2022

Sophie-Xie added type/question Type: question about the product and removed non-issue labels Feb 15, 2023
