Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HammerDB 4.3 support and general improvements #5

Merged
merged 18 commits into from
Dec 22, 2021
Merged

Conversation

JelteF
Copy link
Contributor

@JelteF JelteF commented Dec 14, 2021

See commit messages for details. They are included below for completeness

Add more comprehensive editorconfig support

Other files should also have formatting. Also lets trim trailing newlines.

Improve readability of download-hammerdb.sh

Put newlines in shell script after then and else. Also explicitly
select the first element of a bash array (instead of doing so implicitly).

Allow easily rerunning of download-hammerdb.sh

If the download of the HammerDB release failed for some reason you'd have
to remove the tarball to try downloading it again. Otherwise it would not
redownload and only check the known incorrect hash of the file that's
already there.

This downloads to a temp file and only renames it to the final tarball name
once the hash check completes successfully.

Support HammerDB 4.3

The latest HammerDB release has various improvements for Postgres. The main
ones that impact performance are: 1. Changing NUMERIC to INT and BIGINT 2.
Support postgres hash partitioning

Because of number 2 by default HammerDB now enables Postgres hash
partitioning. This does not make sense if we then later distribute these
postgres partitions using Citus also by hash. So we explicitly turn this
behaviour off in our build.tcl file.

Use the same argument patterns for run.sh and build-and-run.sh

Calling run.sh manually required explicitly specifying the HammerDB version
and all 5 of the PGXXX environment variables. This changes
run.sh to use the same defaults as build-and-run.sh.

fix shellcheck issues

shellcheck found various issues with our shell scripts. Mostly unquoted
variable expansions of variables that could have spaces in them. This fixes
all the issues it reported.

Add build.sh

This adds a separate build.sh file in case you want to build the dataset,
but don't actually want to run the benchmark after (e.g. to prepare a
shared dataset that's pushed to blob storage).

Enable pg_timeprofile

Getting request latency information is pretty useful. So let's turn it on
by default.

Add -o pipefail to bash scripts

This could cause the script to exit successfully even if running
hammerdbcli failed, because hammerdbcli output was piped into tee.

Add a way to use HammerDB with a few custom citus changes

I created a couple of PRs to upstream HammerDB. It will take some time
before these PRs are merged and then released in HammerDB 4.4. In the
mean time I've created a branch on our fork of the HammerDB repo, that
has all these improvements.

The most important improvements are:

  1. Official support for Citus, that can be turned on and off with a
    single setting.
  2. Change the way the dataset is built to use COPY instead of multi-row
    inserts. This speeds up building the dataset significant.

Remove replication_model change, it's not needed anymore

This GUC was removed in Citus 10.1

Use getopts for nicer argument interface

The shell scripts to build and run HammerDB did not have very user
friendly arguments. This changes the arguments they take, make usage of
flags using getopts and allow you to run the scripts without any
arguments at all.

Only support Citus on the custom HammerDB branch

Now that we support our custom HammerDB 4.3 branch with built in Citus
support, we decided to remove our patches to add Citus support for any
of the official versions. If you want to run HammerDB against Citus
you'll simply have to use the 4.3-custom version.

Add back doing a checkpoint before a run

The README suggested that we were doing a checkpoint before starting the
run, but we didn't. This starts doing that again after we stopped doing
that in this commit: 9d37d4b

Add a --shard-count argument to build-and-run.sh

The README suggests using a shard count that is divisible by the number
of nodes in your cluster. This is good advice, but we did not allow
changing the shardcount that we set by default.

Merge generate-hammerdb.sh and download-hammerdb.sh

Now that we're not patching HammerDB anymore it doesn't make much sense
to have a script called generate-hammerdb.sh. So this merges all code
into download-hammerdb.sh.

Show NOPM in stdout at the end of the benchmark run

To find the NOPM after a run you would have to either scroll up to see
it in the output or you'd have to manually look at the contents of the
hammerdb_nopm_xxx.log file. This changes our script so that the NOPM
is shown as the final line of output of the benchmark run.

Other files should also have formatting. Also lets trim trailing
newlines.
Put newlines in shell script after `then` and `else`. Also explicitly
select the first element of a bash array (instead of doing so
implicitly).
If the download of the HammerDB release failed for some reason you'd
have to remove the tarball to try downloading it again. Otherwise it
would not redownload and only check the known incorrect hash of the file
that's already there.

This downloads to a temp file and only renames it to the final tarball
name once the hash check completes successfully.
The latest HammerDB release has various improvements for Postgres.
The main ones that impact performance are:
1. Changing NUMERIC to INT and BIGINT
2. Support postgres hash partitioning

Because of number 2 by default HammerDB now enables Postgres hash
partitioning. This does not make sense if we then later distribute these
postgres partitions using Citus also by hash. So we explicitly turn this
behaviour off in our `build.tcl` file.
Calling run.sh manually required explicitly specifying the HammerDB
version and all 5 of the PGXXX environment variables. This changes
`run.sh` to use the same defaults as `build-and-run.sh`.
Copy link
Member

@marcocitus marcocitus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename the repo to something like hammerdb-citus? (might also help wikth search engines)

run.sh Outdated Show resolved Hide resolved
build-and-run.sh Outdated Show resolved Hide resolved
build.sh Outdated Show resolved Hide resolved
@JelteF JelteF force-pushed the improvements-jelte branch 2 times, most recently from d2fb77c to d3c3865 Compare December 17, 2021 10:27
parse-arguments.sh Outdated Show resolved Hide resolved
build.tcl Show resolved Hide resolved
`shellcheck` found various issues with our shell scripts. Mostly
unquoted variable expansions of variables that could have spaces in
them. This fixes all the issues it reported.
This adds a separate build.sh file in case you want to build the
dataset, but don't actually want to run the benchmark after (e.g. to
prepare a shared dataset that's pushed to blob storage).
Getting request latency information is pretty useful. So let's turn it
on by default.
This could cause the script to exit successfully even if running
hammerdbcli failed, because hammerdbcli output was piped into `tee`.
I created a couple of PRs to upstream HammerDB. It will take some time
before these PRs are merged and then released in HammerDB 4.4. In the
mean time I've created a branch on our fork of the HammerDB repo, that
has all these improvements.

The most important improvements are:
1. Official support for Citus, that can be turned on and off with a
   single setting.
2. Change the way the dataset is built to use COPY instead of multi-row
   inserts. This speeds up building the dataset significant.
@JelteF JelteF force-pushed the improvements-jelte branch 3 times, most recently from f657695 to 4bd106e Compare December 20, 2021 10:12
@JelteF JelteF force-pushed the improvements-jelte branch 2 times, most recently from 134843e to 8fbe72b Compare December 20, 2021 13:02
generate-hammerdb.sh Outdated Show resolved Hide resolved
parse-arguments.sh Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
* Make sure that max_connections is high enough based on #vuuser. max_connections should be at least 150 more than #vuuser.
* Make sure that you do a checkpoint before starting the test, the `build-and-run.sh` already does this. Otherwise the timing of checkpoint can affect the results for short tests.
[hammerdb]: https://github.com/TPC-Council/HammerDB
[ch]: https://db.in.tum.de/research/projects/CHbenCHmark/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, did not know this Markdown syntax

Copy link
Contributor Author

@JelteF JelteF Dec 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it's the footnote style syntax, it's especially nice when URLs are long. That way they don't break the flow of the text so much.

the same time.
--ch-queries-only Run only CH-benCHmark queries (so don't run HammerDB
TPROC-C queries
--no-citus Build the dataset and run the benchmark for standard
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: above 3 arguments all seem to follow different semantics / grammar

wondering whether we should have something like:

--with-citus / --without-citus (default with)
--with-ch / --without-ch (default without)
--with-tproc-c / --without-tproc-c (default with)

or
--with-citus=true (default true)
--with-ch=false (default false)
--with-tproc-c=true (default true)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the flags are not consistent with eachother. But I'm leaving it like this for three reasons:

  1. It would require more parsing code to support the other options
  2. Being able to explicitly select defaults seems a bit unnecessary
  3. Right now you can select only ch queries with a single flag, instead of two.
  4. --without-ch --without-tproc-c results in a useless run

parse-arguments.sh Outdated Show resolved Hide resolved
Copy link
Member

@marcocitus marcocitus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nice improvement could be to make Ctrl+C work somehow, not sure how hard that is.

Other than that and above comments this looks good. Did some testing on Azure.

The shell scripts to build and run HammerDB did not have very user
friendly arguments. This changes the arguments they take, make usage of
flags using `getopts` and allow you to run the scripts without any
arguments at all.
Now that we support our custom HammerDB 4.3 branch with built in Citus
support, we decided to remove our patches to add Citus support for any
of the official versions. If you want to run HammerDB against Citus
you'll simply have to use the 4.3-custom version.
The README suggested that we were doing a checkpoint before starting the
run, but we didn't. This starts doing that again after we stopped doing
that in this commit: 9d37d4b
The README suggests using a shard count that is divisible by the number
of nodes in your cluster. This is good advice, but we did not allow
changing the shardcount that we set by default.
After the extensive changes made in the previous commits, the README is
now updated to reflect this.
Now that we're not patching HammerDB anymore it doesn't make much sense
to have a script called `generate-hammerdb.sh`. So this merges all code
into `download-hammerdb.sh`.
To find the NOPM after a run you would have to either scroll up to see
it in the output or you'd have to manually look at the contents of the
`hammerdb_nopm_xxx.log` file. This changes our script so that the NOPM
is shown as the final line of output of the benchmark run.
@JelteF
Copy link
Contributor Author

JelteF commented Dec 22, 2021

One nice improvement could be to make Ctrl+C work somehow, not sure how hard that is.

I created a PR to fix this upstream TPC-Council/HammerDB#303, and meanwhile I added the commit to our 4.3-custom branch.

@JelteF JelteF merged commit 21b425d into master Dec 22, 2021
@JelteF JelteF deleted the improvements-jelte branch December 22, 2021 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants