-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add HammerDB 4.3 support and general improvements #5
Conversation
Other files should also have formatting. Also lets trim trailing newlines.
Put newlines in shell script after `then` and `else`. Also explicitly select the first element of a bash array (instead of doing so implicitly).
If the download of the HammerDB release failed for some reason you'd have to remove the tarball to try downloading it again. Otherwise it would not redownload and only check the known incorrect hash of the file that's already there. This downloads to a temp file and only renames it to the final tarball name once the hash check completes successfully.
The latest HammerDB release has various improvements for Postgres. The main ones that impact performance are: 1. Changing NUMERIC to INT and BIGINT 2. Support postgres hash partitioning Because of number 2 by default HammerDB now enables Postgres hash partitioning. This does not make sense if we then later distribute these postgres partitions using Citus also by hash. So we explicitly turn this behaviour off in our `build.tcl` file.
Calling run.sh manually required explicitly specifying the HammerDB version and all 5 of the PGXXX environment variables. This changes `run.sh` to use the same defaults as `build-and-run.sh`.
ff809eb
to
b26d444
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rename the repo to something like hammerdb-citus? (might also help wikth search engines)
d2fb77c
to
d3c3865
Compare
`shellcheck` found various issues with our shell scripts. Mostly unquoted variable expansions of variables that could have spaces in them. This fixes all the issues it reported.
This adds a separate build.sh file in case you want to build the dataset, but don't actually want to run the benchmark after (e.g. to prepare a shared dataset that's pushed to blob storage).
Getting request latency information is pretty useful. So let's turn it on by default.
This could cause the script to exit successfully even if running hammerdbcli failed, because hammerdbcli output was piped into `tee`.
I created a couple of PRs to upstream HammerDB. It will take some time before these PRs are merged and then released in HammerDB 4.4. In the mean time I've created a branch on our fork of the HammerDB repo, that has all these improvements. The most important improvements are: 1. Official support for Citus, that can be turned on and off with a single setting. 2. Change the way the dataset is built to use COPY instead of multi-row inserts. This speeds up building the dataset significant.
This GUC was removed in Citus 10.1
f657695
to
4bd106e
Compare
134843e
to
8fbe72b
Compare
* Make sure that max_connections is high enough based on #vuuser. max_connections should be at least 150 more than #vuuser. | ||
* Make sure that you do a checkpoint before starting the test, the `build-and-run.sh` already does this. Otherwise the timing of checkpoint can affect the results for short tests. | ||
[hammerdb]: https://github.com/TPC-Council/HammerDB | ||
[ch]: https://db.in.tum.de/research/projects/CHbenCHmark/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, did not know this Markdown syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah it's the footnote style syntax, it's especially nice when URLs are long. That way they don't break the flow of the text so much.
the same time. | ||
--ch-queries-only Run only CH-benCHmark queries (so don't run HammerDB | ||
TPROC-C queries | ||
--no-citus Build the dataset and run the benchmark for standard |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: above 3 arguments all seem to follow different semantics / grammar
wondering whether we should have something like:
--with-citus / --without-citus (default with)
--with-ch / --without-ch (default without)
--with-tproc-c / --without-tproc-c (default with)
or
--with-citus=true (default true)
--with-ch=false (default false)
--with-tproc-c=true (default true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the flags are not consistent with eachother. But I'm leaving it like this for three reasons:
- It would require more parsing code to support the other options
- Being able to explicitly select defaults seems a bit unnecessary
- Right now you can select only ch queries with a single flag, instead of two.
--without-ch --without-tproc-c
results in a useless run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nice improvement could be to make Ctrl+C work somehow, not sure how hard that is.
Other than that and above comments this looks good. Did some testing on Azure.
The shell scripts to build and run HammerDB did not have very user friendly arguments. This changes the arguments they take, make usage of flags using `getopts` and allow you to run the scripts without any arguments at all.
Now that we support our custom HammerDB 4.3 branch with built in Citus support, we decided to remove our patches to add Citus support for any of the official versions. If you want to run HammerDB against Citus you'll simply have to use the 4.3-custom version.
The README suggested that we were doing a checkpoint before starting the run, but we didn't. This starts doing that again after we stopped doing that in this commit: 9d37d4b
8fbe72b
to
36c8c7d
Compare
The README suggests using a shard count that is divisible by the number of nodes in your cluster. This is good advice, but we did not allow changing the shardcount that we set by default.
After the extensive changes made in the previous commits, the README is now updated to reflect this.
Now that we're not patching HammerDB anymore it doesn't make much sense to have a script called `generate-hammerdb.sh`. So this merges all code into `download-hammerdb.sh`.
To find the NOPM after a run you would have to either scroll up to see it in the output or you'd have to manually look at the contents of the `hammerdb_nopm_xxx.log` file. This changes our script so that the NOPM is shown as the final line of output of the benchmark run.
36c8c7d
to
c1c9b7b
Compare
I created a PR to fix this upstream TPC-Council/HammerDB#303, and meanwhile I added the commit to our |
See commit messages for details. They are included below for completeness
Add more comprehensive editorconfig support
Other files should also have formatting. Also lets trim trailing newlines.
Improve readability of download-hammerdb.sh
Put newlines in shell script after
then
andelse
. Also explicitlyselect the first element of a bash array (instead of doing so implicitly).
Allow easily rerunning of download-hammerdb.sh
If the download of the HammerDB release failed for some reason you'd have
to remove the tarball to try downloading it again. Otherwise it would not
redownload and only check the known incorrect hash of the file that's
already there.
This downloads to a temp file and only renames it to the final tarball name
once the hash check completes successfully.
Support HammerDB 4.3
The latest HammerDB release has various improvements for Postgres. The main
ones that impact performance are: 1. Changing NUMERIC to INT and BIGINT 2.
Support postgres hash partitioning
Because of number 2 by default HammerDB now enables Postgres hash
partitioning. This does not make sense if we then later distribute these
postgres partitions using Citus also by hash. So we explicitly turn this
behaviour off in our
build.tcl
file.Use the same argument patterns for run.sh and build-and-run.sh
Calling run.sh manually required explicitly specifying the HammerDB version
and all 5 of the PGXXX environment variables. This changes
run.sh
to use the same defaults asbuild-and-run.sh
.fix shellcheck issues
shellcheck
found various issues with our shell scripts. Mostly unquotedvariable expansions of variables that could have spaces in them. This fixes
all the issues it reported.
Add build.sh
This adds a separate build.sh file in case you want to build the dataset,
but don't actually want to run the benchmark after (e.g. to prepare a
shared dataset that's pushed to blob storage).
Enable pg_timeprofile
Getting request latency information is pretty useful. So let's turn it on
by default.
Add -o pipefail to bash scripts
This could cause the script to exit successfully even if running
hammerdbcli failed, because hammerdbcli output was piped into
tee
.Add a way to use HammerDB with a few custom citus changes
I created a couple of PRs to upstream HammerDB. It will take some time
before these PRs are merged and then released in HammerDB 4.4. In the
mean time I've created a branch on our fork of the HammerDB repo, that
has all these improvements.
The most important improvements are:
single setting.
inserts. This speeds up building the dataset significant.
Remove replication_model change, it's not needed anymore
This GUC was removed in Citus 10.1
Use getopts for nicer argument interface
The shell scripts to build and run HammerDB did not have very user
friendly arguments. This changes the arguments they take, make usage of
flags using
getopts
and allow you to run the scripts without anyarguments at all.
Only support Citus on the custom HammerDB branch
Now that we support our custom HammerDB 4.3 branch with built in Citus
support, we decided to remove our patches to add Citus support for any
of the official versions. If you want to run HammerDB against Citus
you'll simply have to use the 4.3-custom version.
Add back doing a checkpoint before a run
The README suggested that we were doing a checkpoint before starting the
run, but we didn't. This starts doing that again after we stopped doing
that in this commit: 9d37d4b
Add a --shard-count argument to build-and-run.sh
The README suggests using a shard count that is divisible by the number
of nodes in your cluster. This is good advice, but we did not allow
changing the shardcount that we set by default.
Merge generate-hammerdb.sh and download-hammerdb.sh
Now that we're not patching HammerDB anymore it doesn't make much sense
to have a script called
generate-hammerdb.sh
. So this merges all codeinto
download-hammerdb.sh
.Show NOPM in stdout at the end of the benchmark run
To find the NOPM after a run you would have to either scroll up to see
it in the output or you'd have to manually look at the contents of the
hammerdb_nopm_xxx.log
file. This changes our script so that the NOPMis shown as the final line of output of the benchmark run.