How to optimize query performance for a large number of edges #4786

Open
quanhengzhuang opened this issue Oct 26, 2022 · 3 comments
Labels
type/question Type: question about the product

Comments

quanhengzhuang commented Oct 26, 2022

General Question

One of our business scenarios:
A follows B, and B also follows C. Given A and C, we need to find every such B.

  • A follows 10 to 1,000 users
  • C has 10 to 10,000,000 followers

Querying with FIND ALL PATH ... is very slow and takes a few seconds. Is there a faster way?
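
To pin down the expected result independently of any query engine, the requirement can be modeled as a filter over A's followings. This is a minimal sketch on toy data; the function name `bridge_members` and all IDs are hypothetical and not part of Nebula Graph:

```python
from typing import Dict, Set


def bridge_members(following: Dict[str, Set[str]], a: str, c: str) -> Set[str]:
    """Return every B such that A follows B and B follows C."""
    return {b for b in following.get(a, set()) if c in following.get(b, set())}


# Toy adjacency map: key follows every member in the value set.
# The IDs are made up for illustration only.
following = {
    "m_1": {"b1", "b2", "b3"},
    "b1": {"m_2"},
    "b2": {"x"},
    "b3": {"m_2", "x"},
}

result = bridge_members(following, "m_1", "m_2")
print(sorted(result))
```

The loop only visits A's followings (at most about 1,000 in our scenario) and never enumerates C's followers, which is the property a fast query would ideally share.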

Contributor

wey-gu commented Oct 26, 2022


bazingame commented Oct 26, 2022

Below are the details:

Nebula Version: v3.1.0

Deployment:

Three servers, each running one nebula-metad, one nebula-graphd, and one nebula-storaged.

Machine Info:

CPU: 72 Core Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Memory:   192 G DDR4
SSD: NVME 16 T

Space Statistics:

Partition Number: 240, Replica Factor: 3

  • vertices: 800 million
    • member Tag : 250 million
  • edges: 9.5 billion
    • follow: 2.6 billion
show hosts
+------------+------+-----------+----------+--------------+---------------------+------------------------+---------+
| Host       | Port | HTTP port | Status   | Leader count | Leader distribution | Partition distribution | Version |
+------------+------+-----------+----------+--------------+---------------------+------------------------+---------+
| "10.0.0.1" | 9779 | 19669     | "ONLINE" | 80           | "base_space:80"     | "base_space:240"       | "3.1.0" |
| "10.0.0.2" | 9779 | 19669     | "ONLINE" | 80           | "base_space:80"     | "base_space:240"       | "3.1.0" |
| "10.0.0.3" | 9779 | 19669     | "ONLINE" | 80           | "base_space:80"     | "base_space:240"       | "3.1.0" |
+------------+------+-----------+----------+--------------+---------------------+------------------------+---------+

+---------+------------+------------+
| Type    | Name       | Count      |
+---------+------------+------------+
| "Tag"   | "content"  | 532806319  |
| "Tag"   | "member"   | 261499703  |
| "Edge"  | "follow"   | 2611243656 |
| "Edge"  | "upvote"   | 6837411544 |
| "Space" | "vertices" | 794306022  |
| "Space" | "edges"    | 9448655200 |
+---------+------------+------------+

nGQL and profile result

Case detail: user m_1 follows 277 users, and m_2 has about 1 million followers.
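
For context, a quick back-of-the-envelope calculation shows how skewed this case is compared with the space as a whole. It is only a sketch: it assumes that follow edges start from member vertices, and it treats "about 1 million" as exactly 1,000,000.

```python
# Counts copied from the space statistics above.
member_vertices = 261_499_703
follow_edges = 2_611_243_656

# Case detail from this issue.
m1_followings = 277
m2_followers = 1_000_000  # "about 1 million", so this value is approximate

# Average number of follow edges per member (assumes follow edges start at members).
avg_follow_out_degree = follow_edges / member_vertices

# How much larger the follower side of the intersection is than the following side.
side_ratio = m2_followers / m1_followings

print(f"average follow edges per member: {avg_follow_out_degree:.2f}")
print(f"m_2 followers vs. m_1 followings: {side_ratio:.0f}x")
```

Under these assumptions the average is roughly 10 follow edges per member, while m_2's follower list is several thousand times larger than m_1's following list, so any plan that expands m_2's followers does most of its work on that side.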

MATCH:

First, we tried a MATCH statement, which took nearly 17 seconds to execute.

MATCH (m)-[e:follow]->(n:member) WHERE id(m)=="m_1" MATCH (n)-[f:follow]->(l) WHERE id(l)=="m_2" RETURN id(n);

Explain result:

match_explain

Profile result:

match_profile

In this case, we don't need any properties, so we tried GO and FIND PATH statements:

GO:

GO FROM "m_1" OVER follow YIELD dst(edge) AS member_id INTERSECT GO FROM "m_2" OVER follow REVERSELY YIELD src(edge) AS member_id

Explain result:

go_explain

Profile result:

go_profile

FIND PATH:

FIND ALL PATH FROM "m_1" TO "m_2" OVER follow,follow UPTO 2 STEPS YIELD path AS p | YIELD nodes($-.p) AS nodes | YIELD $-.nodes AS nodes, size($-.nodes) AS len | YIELD id($-.nodes[1]) as id WHERE $-.len == 3 

Explain result:

find_all_path_explain

Profile result:

find_all_path_profile

The GO statement takes 3 seconds and the FIND PATH statement takes 6 seconds.
None of the methods above meets our requirements.
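
One variation not measured above is to fetch m_1's followings first (the small side) and then probe only the candidate -> m_2 edges with point lookups, instead of expanding m_2's followers. The sketch below only builds the statements and does not run them; `build_fetch_statements`, `quote_vid`, and the batch size are hypothetical helpers, and it assumes string VIDs and the default edge rank 0. Whether this is actually faster on this dataset is untested.

```python
from typing import Iterable, List


def quote_vid(vid: str) -> str:
    """Quote a string VID for nGQL, escaping backslashes and double quotes."""
    return '"' + vid.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_fetch_statements(candidates: Iterable[str], target: str,
                           edge_type: str = "follow",
                           batch_size: int = 100) -> List[str]:
    """Build FETCH PROP statements that probe candidate -> target edges in batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    vids = list(candidates)
    statements = []
    for start in range(0, len(vids), batch_size):
        batch = vids[start:start + batch_size]
        # One "src -> dst" pair per candidate; edges that do not exist return no row.
        pairs = ", ".join(f"{quote_vid(b)} -> {quote_vid(target)}" for b in batch)
        statements.append(f"FETCH PROP ON {edge_type} {pairs} YIELD src(edge) AS id;")
    return statements


# Candidates would come from: GO FROM "m_1" OVER follow YIELD dst(edge) AS member_id
stmts = build_fetch_statements(["b1", "b2", "b3"], "m_2", batch_size=2)
for s in stmts:
    print(s)
```

The number of probed edges is bounded by the size of m_1's following list (277 in this case), regardless of how many followers m_2 has.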

After reading the docs about Processing super vertices, we tried some of the suggested solutions.

  • Compact: it does not seem to bring any improvement.
  • Truncation: it does not fit our scenario, because we need all of the data.

The application-side solutions are not suitable either, because we cannot do any of the following:

  • Delete multiple edges and merge them into one: there is only one follow edge between any two members.
  • Split an edge into multiple edges of different types: we only need the follow edge type.
  • Split vertices

@bazingame

cc @forest-yuxl @MuYiYong @critical27

@forest-yuxl @MuYiYong @critical27 can anyone help us with this problem?

@Sophie-Xie Sophie-Xie added type/question Type: question about the product and removed non-issue labels Feb 15, 2023