Add HammerDB 4.3 support and general improvements #5

JelteF · 2021-12-14T08:49:48Z

See commit messages for details. They are included below for completeness

Add more comprehensive editorconfig support

Other files should also have formatting. Also lets trim trailing newlines.

Improve readability of download-hammerdb.sh

Put newlines in shell script after then and else. Also explicitly
select the first element of a bash array (instead of doing so implicitly).

Allow easily rerunning of download-hammerdb.sh

If the download of the HammerDB release failed for some reason you'd have
to remove the tarball to try downloading it again. Otherwise it would not
redownload and only check the known incorrect hash of the file that's
already there.

This downloads to a temp file and only renames it to the final tarball name
once the hash check completes successfully.

Support HammerDB 4.3

The latest HammerDB release has various improvements for Postgres. The main
ones that impact performance are: 1. Changing NUMERIC to INT and BIGINT 2.
Support postgres hash partitioning

Because of number 2 by default HammerDB now enables Postgres hash
partitioning. This does not make sense if we then later distribute these
postgres partitions using Citus also by hash. So we explicitly turn this
behaviour off in our build.tcl file.

Use the same argument patterns for run.sh and build-and-run.sh

Calling run.sh manually required explicitly specifying the HammerDB version
and all 5 of the PGXXX environment variables. This changes
run.sh to use the same defaults as build-and-run.sh.

fix shellcheck issues

shellcheck found various issues with our shell scripts. Mostly unquoted
variable expansions of variables that could have spaces in them. This fixes
all the issues it reported.

Add build.sh

This adds a separate build.sh file in case you want to build the dataset,
but don't actually want to run the benchmark after (e.g. to prepare a
shared dataset that's pushed to blob storage).

Enable pg_timeprofile

Getting request latency information is pretty useful. So let's turn it on
by default.

Add -o pipefail to bash scripts

This could cause the script to exit successfully even if running
hammerdbcli failed, because hammerdbcli output was piped into tee.

Add a way to use HammerDB with a few custom citus changes

I created a couple of PRs to upstream HammerDB. It will take some time
before these PRs are merged and then released in HammerDB 4.4. In the
mean time I've created a branch on our fork of the HammerDB repo, that
has all these improvements.

The most important improvements are:

Official support for Citus, that can be turned on and off with a
single setting.
Change the way the dataset is built to use COPY instead of multi-row
inserts. This speeds up building the dataset significant.

Remove replication_model change, it's not needed anymore

This GUC was removed in Citus 10.1

Use getopts for nicer argument interface

The shell scripts to build and run HammerDB did not have very user
friendly arguments. This changes the arguments they take, make usage of
flags using getopts and allow you to run the scripts without any
arguments at all.

Only support Citus on the custom HammerDB branch

Now that we support our custom HammerDB 4.3 branch with built in Citus
support, we decided to remove our patches to add Citus support for any
of the official versions. If you want to run HammerDB against Citus
you'll simply have to use the 4.3-custom version.

Add back doing a checkpoint before a run

The README suggested that we were doing a checkpoint before starting the
run, but we didn't. This starts doing that again after we stopped doing
that in this commit: 9d37d4b

Add a --shard-count argument to build-and-run.sh

The README suggests using a shard count that is divisible by the number
of nodes in your cluster. This is good advice, but we did not allow
changing the shardcount that we set by default.

Merge generate-hammerdb.sh and download-hammerdb.sh

Now that we're not patching HammerDB anymore it doesn't make much sense
to have a script called generate-hammerdb.sh. So this merges all code
into download-hammerdb.sh.

Show NOPM in stdout at the end of the benchmark run

To find the NOPM after a run you would have to either scroll up to see
it in the output or you'd have to manually look at the contents of the
hammerdb_nopm_xxx.log file. This changes our script so that the NOPM
is shown as the final line of output of the benchmark run.

Other files should also have formatting. Also lets trim trailing newlines.

Put newlines in shell script after `then` and `else`. Also explicitly select the first element of a bash array (instead of doing so implicitly).

If the download of the HammerDB release failed for some reason you'd have to remove the tarball to try downloading it again. Otherwise it would not redownload and only check the known incorrect hash of the file that's already there. This downloads to a temp file and only renames it to the final tarball name once the hash check completes successfully.

The latest HammerDB release has various improvements for Postgres. The main ones that impact performance are: 1. Changing NUMERIC to INT and BIGINT 2. Support postgres hash partitioning Because of number 2 by default HammerDB now enables Postgres hash partitioning. This does not make sense if we then later distribute these postgres partitions using Citus also by hash. So we explicitly turn this behaviour off in our `build.tcl` file.

Calling run.sh manually required explicitly specifying the HammerDB version and all 5 of the PGXXX environment variables. This changes `run.sh` to use the same defaults as `build-and-run.sh`.

marcocitus

Should we rename the repo to something like hammerdb-citus? (might also help wikth search engines)

run.sh

build-and-run.sh

build.sh

HammerDB-patch/4.3.patch

parse-arguments.sh

build.tcl

`shellcheck` found various issues with our shell scripts. Mostly unquoted variable expansions of variables that could have spaces in them. This fixes all the issues it reported.

This adds a separate build.sh file in case you want to build the dataset, but don't actually want to run the benchmark after (e.g. to prepare a shared dataset that's pushed to blob storage).

Getting request latency information is pretty useful. So let's turn it on by default.

This could cause the script to exit successfully even if running hammerdbcli failed, because hammerdbcli output was piped into `tee`.

I created a couple of PRs to upstream HammerDB. It will take some time before these PRs are merged and then released in HammerDB 4.4. In the mean time I've created a branch on our fork of the HammerDB repo, that has all these improvements. The most important improvements are: 1. Official support for Citus, that can be turned on and off with a single setting. 2. Change the way the dataset is built to use COPY instead of multi-row inserts. This speeds up building the dataset significant.

This GUC was removed in Citus 10.1

generate-hammerdb.sh

parse-arguments.sh

README.md

marcocitus · 2021-12-20T13:44:58Z

README.md

-* Make sure that max_connections is high enough based on #vuuser. max_connections should be at least 150 more than #vuuser.
-* Make sure that you do a checkpoint before starting the test, the `build-and-run.sh` already does this. Otherwise the timing of checkpoint can affect the results for short tests.
+[hammerdb]: https://github.com/TPC-Council/HammerDB
+[ch]: https://db.in.tum.de/research/projects/CHbenCHmark/


Interesting, did not know this Markdown syntax

yeah it's the footnote style syntax, it's especially nice when URLs are long. That way they don't break the flow of the text so much.

marcocitus · 2021-12-20T13:47:44Z

parse-arguments.sh

+                        the same time.
+  --ch-queries-only     Run only CH-benCHmark queries (so don't run HammerDB
+                        TPROC-C queries
+  --no-citus            Build the dataset and run the benchmark for standard


minor: above 3 arguments all seem to follow different semantics / grammar

wondering whether we should have something like:

--with-citus / --without-citus (default with)
--with-ch / --without-ch (default without)
--with-tproc-c / --without-tproc-c (default with)

or
--with-citus=true (default true)
--with-ch=false (default false)
--with-tproc-c=true (default true)

I agree that the flags are not consistent with eachother. But I'm leaving it like this for three reasons:

It would require more parsing code to support the other options

Being able to explicitly select defaults seems a bit unnecessary

Right now you can select only ch queries with a single flag, instead of two.

--without-ch --without-tproc-c results in a useless run

parse-arguments.sh

marcocitus

One nice improvement could be to make Ctrl+C work somehow, not sure how hard that is.

Other than that and above comments this looks good. Did some testing on Azure.

The shell scripts to build and run HammerDB did not have very user friendly arguments. This changes the arguments they take, make usage of flags using `getopts` and allow you to run the scripts without any arguments at all.

Now that we support our custom HammerDB 4.3 branch with built in Citus support, we decided to remove our patches to add Citus support for any of the official versions. If you want to run HammerDB against Citus you'll simply have to use the 4.3-custom version.

The README suggested that we were doing a checkpoint before starting the run, but we didn't. This starts doing that again after we stopped doing that in this commit: 9d37d4b

The README suggests using a shard count that is divisible by the number of nodes in your cluster. This is good advice, but we did not allow changing the shardcount that we set by default.

After the extensive changes made in the previous commits, the README is now updated to reflect this.

Now that we're not patching HammerDB anymore it doesn't make much sense to have a script called `generate-hammerdb.sh`. So this merges all code into `download-hammerdb.sh`.

To find the NOPM after a run you would have to either scroll up to see it in the output or you'd have to manually look at the contents of the `hammerdb_nopm_xxx.log` file. This changes our script so that the NOPM is shown as the final line of output of the benchmark run.

JelteF · 2021-12-22T17:04:52Z

One nice improvement could be to make Ctrl+C work somehow, not sure how hard that is.

I created a PR to fix this upstream TPC-Council/HammerDB#303, and meanwhile I added the commit to our 4.3-custom branch.

JelteF added 5 commits December 10, 2021 14:00

Add more comprehensive editorconfig support

2e96b3b

Other files should also have formatting. Also lets trim trailing newlines.

Improve readability of download-hammerdb.sh

4b3a24c

Put newlines in shell script after `then` and `else`. Also explicitly select the first element of a bash array (instead of doing so implicitly).

Use the same argument patterns for run.sh and build-and-run.sh

77645c5

Calling run.sh manually required explicitly specifying the HammerDB version and all 5 of the PGXXX environment variables. This changes `run.sh` to use the same defaults as `build-and-run.sh`.

JelteF requested a review from marcocitus December 14, 2021 08:50

JelteF mentioned this pull request Dec 14, 2021

Add more comprehensive editorconfig support #4

Closed

JelteF force-pushed the improvements-jelte branch from ff809eb to b26d444 Compare December 14, 2021 08:54

marcocitus reviewed Dec 14, 2021

View reviewed changes

run.sh Outdated Show resolved Hide resolved

build-and-run.sh Outdated Show resolved Hide resolved

marcocitus reviewed Dec 14, 2021

View reviewed changes

build.sh Outdated Show resolved Hide resolved

marcocitus reviewed Dec 14, 2021

View reviewed changes

HammerDB-patch/4.3.patch Outdated Show resolved Hide resolved

JelteF force-pushed the improvements-jelte branch 2 times, most recently from d2fb77c to d3c3865 Compare December 17, 2021 10:27

JelteF requested a review from marcocitus December 17, 2021 11:24

marcocitus reviewed Dec 17, 2021

View reviewed changes

parse-arguments.sh Outdated Show resolved Hide resolved

marcocitus reviewed Dec 17, 2021

View reviewed changes

build.tcl Show resolved Hide resolved

JelteF added 6 commits December 17, 2021 15:57

fix shellcheck issues

443270b

`shellcheck` found various issues with our shell scripts. Mostly unquoted variable expansions of variables that could have spaces in them. This fixes all the issues it reported.

Add build.sh

6a8f7d2

This adds a separate build.sh file in case you want to build the dataset, but don't actually want to run the benchmark after (e.g. to prepare a shared dataset that's pushed to blob storage).

Enable pg_timeprofile

0e5c15b

Getting request latency information is pretty useful. So let's turn it on by default.

Add -o pipefail to bash scripts

0cd4d14

This could cause the script to exit successfully even if running hammerdbcli failed, because hammerdbcli output was piped into `tee`.

Remove replication_model change, it's not needed anymore

c884923

This GUC was removed in Citus 10.1

JelteF force-pushed the improvements-jelte branch 3 times, most recently from f657695 to 4bd106e Compare December 20, 2021 10:12

JelteF requested a review from marcocitus December 20, 2021 10:13

JelteF force-pushed the improvements-jelte branch 2 times, most recently from 134843e to 8fbe72b Compare December 20, 2021 13:02