speed for big table slower than kettle #617

Open
wonb168 opened this issue Mar 23, 2022 · 4 comments
Labels: Can't Reproduce (Needs more information to reproduce or locate the problem)

Comments

@wonb168 commented Mar 23, 2022

Exporting a big table (for example, 100,000,000+ rows) to CSV is slower in petl than in Kettle.
How can I raise the speed in petl? Thank you.

@juarezr (Member) commented Mar 23, 2022

Hi, @wonb168,

Do you have any further details?

  • What are the DBMS and driver?
  • Any transformations?
  • Do you have a reproducible test case?

@juarezr added the Can't Reproduce label Mar 23, 2022

@wonb168 (Author) commented Mar 24, 2022

A 0.17 billion (170,000,000) row table, exported from SQL Server to CSV (the CSV is then batch-loaded into Greenplum):

  • Kettle: 37 min
  • petl: 67 min

How can I raise the speed in petl? Thank you.
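
For comparison, a plain DB-API fetch loop gives a baseline that shows whether the time goes into the driver or into petl itself. A minimal sketch, assuming pyodbc as the SQL Server driver; CONN_STR and the 10000 batch size are placeholders to adjust:

```python
import csv
import pyodbc  # assumption: SQL Server accessed via pyodbc; pymssql works similarly

# placeholder connection string
CONN_STR = 'DRIVER={ODBC Driver 17 for SQL Server};SERVER=host;DATABASE=ReplenishLZ;UID=user;PWD=pass'

conn = pyodbc.connect(CONN_STR)
cursor = conn.cursor()
cursor.execute('SELECT * FROM dbo.ps_inv_materialsize')

with open('ps_inv_materialsize.csv', 'w', newline='', encoding='utf8') as f:
    writer = csv.writer(f)
    # header row from the cursor metadata
    writer.writerow(col[0] for col in cursor.description)
    while True:
        # fetch in large batches to cut per-row round-trip overhead
        rows = cursor.fetchmany(10000)
        if not rows:
            break
        writer.writerows(rows)
```

If this loop is about as slow as petl, the bottleneck is the driver or the network, not petl.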

@wonb168 (Author) commented Mar 28, 2022

Ran on my notebook, Win10 OS, 16 GB memory. Only ~8000 rows/s:
table = petl.fromdb(conn, 'SELECT * FROM ReplenishLZ.dbo.ps_inv_materialsize')

177,833,017 rows in total

table.progress(1000000).tocsv('ps_inv_materialsize.csv')

1000000 rows in 120.57s (8294 row/s); batch in 120.57s (8294 row/s)
2000000 rows in 249.68s (8010 row/s); batch in 129.11s (7745 row/s)
3000000 rows in 368.70s (8136 row/s); batch in 119.02s (8401 row/s)
4000000 rows in 492.33s (8124 row/s); batch in 123.63s (8088 row/s)
5000000 rows in 620.53s (8057 row/s); batch in 128.19s (7800 row/s)
6000000 rows in 741.91s (8087 row/s); batch in 121.38s (8238 row/s)
7000000 rows in 857.63s (8161 row/s); batch in 115.72s (8641 row/s)
8000000 rows in 983.10s (8137 row/s); batch in 125.46s (7970 row/s)
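
At ~8000 rows/s, the full 177,833,017 rows would take roughly six hours, so the per-row fetch overhead dominates. Not a confirmed fix, but one petl-level knob: fromdb also accepts a callable that returns a cursor, so the driver's fetch batch size can be raised there. A sketch assuming pyodbc; CONN_STR is a placeholder, and whether the driver actually honors arraysize is driver-dependent:

```python
import petl
import pyodbc  # assumption: same pyodbc setup as the baseline above

conn = pyodbc.connect(CONN_STR)  # CONN_STR as in the earlier sketch

def make_cursor():
    # petl.fromdb accepts a callable that returns a cursor, which
    # allows tuning the cursor before petl starts iterating it
    cursor = conn.cursor()
    cursor.arraysize = 10000  # assumption: value needs tuning per driver
    return cursor

table = petl.fromdb(make_cursor, 'SELECT * FROM ReplenishLZ.dbo.ps_inv_materialsize')
table.progress(1000000).tocsv('ps_inv_materialsize.csv', encoding='utf8')
```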

@wonb168 (Author) commented Apr 25, 2022

My 7,000,000 row table takes 40 minutes to export to CSV with petl.
With connectorx, parallelized over 10 partitions, it takes only 15 minutes.
Could connectorx be integrated into petl to speed this up in a future version?
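
For reference, a connectorx read parallelized over 10 partitions and handed back to petl would look something like the sketch below. The connection string, the partition column id, and the fromdataframe hand-off are assumptions; note that connectorx materializes the whole result in memory, so this suits tables that fit in RAM:

```python
import connectorx as cx
import petl

# assumption: the table has an integer column (here called 'id')
# that can be range-partitioned across the 10 parallel readers
df = cx.read_sql(
    'mssql://user:pass@host:1433/ReplenishLZ',  # placeholder connection string
    'SELECT * FROM dbo.ps_inv_materialsize',
    partition_on='id',
    partition_num=10,
)

# hand the DataFrame back to petl for any remaining transforms
table = petl.fromdataframe(df)
table.tocsv('ps_inv_materialsize.csv', encoding='utf8')
```

The speedup comes from connectorx splitting the query into 10 range-partitioned queries and running them concurrently, which a single DB-API cursor cannot do.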
