
Commit

Merge branch 'master' into initial-es5
* master:
  [core] Fixing squid:S1319 -  Declarations should use Java collection interfaces such as "List" rather than specific implementation classes such as "LinkedList". (manolama - updated bindings added since the PR)
  [core] Use longs instead of ints to support larger key spaces. Changed int to long in Measurements code to support large scale workloads. (manolama - fixed checkstyle errors)
  [core] Export totalHistogram for HdrHistogram measurement
  [core] Add an operation enum to the Workload class. This can eventually be used to replace the strings.
  [core] Add a Fisher-Yates array shuffle to the Utils class.
  [core] Fix an issue where the threadid and threadCount were not passed to the workload client threads. Had to use setters to get around the checkstyle complaint of having too many parameters.
  Upgrading googlebigtable to the latest version. The API used by googlebigtable has had quite a bit of churn.  This is the minimal set of changes required for the upgrade.
  [geode] Update to apache-geode 1.2.0 release
  [core] Update to use newer version of Google Cloud Spanner client and associated required change
  [core] Add a reset() method to the ByteIterator abstract and implementations for each of the children. This lets us re-use byte iterators if we need to access the values again (when applicable).
  [hbase12] Add HBase 1.2+ specific client that relies on the shaded client artifact provided by those versions. (brianfrankcooper#970)
  [distro] Refresh Apache licence text (brianfrankcooper#969)
  [memcached] support binary protocol (brianfrankcooper#965)
  [accumulo] A general "refresh" to the Accumulo binding (brianfrankcooper#947)
  [cloudspanner] Add binding for Google's Cloud Spanner. (brianfrankcooper#939)
  [aerospike] Change the write policy to REPLACE_ONLY (brianfrankcooper#937)
jasontedor committed Aug 7, 2017
2 parents c52c438 + cf5d2ca commit 4c84ffa
Showing 94 changed files with 2,011 additions and 683 deletions.
363 changes: 201 additions & 162 deletions LICENSE.txt


37 changes: 36 additions & 1 deletion accumulo/README.md
@@ -36,7 +36,42 @@ Git clone YCSB and compile:
    cd YCSB
    mvn -pl com.yahoo.ycsb:accumulo-binding -am clean package

### 3. Create the Accumulo table

By default, YCSB uses a table named "usertable", which must be created before loading data into
Accumulo. For maximum Accumulo performance, the table must be pre-split. A simple Ruby script, based
on the one in the HBase README, can generate adequate split points; tens of tablets per TabletServer
is a good starting point. Unless otherwise specified, the following commands should work on any
version of Accumulo.

    $ echo 'num_splits = 20; puts (1..num_splits).map {|i| "user#{1000+i*(9999-1000)/num_splits}"}' | ruby > /tmp/splits.txt
    $ accumulo shell -u <user> -p <password> -e "createtable usertable"
    $ accumulo shell -u <user> -p <password> -e "addsplits -t usertable -sf /tmp/splits.txt"
    $ accumulo shell -u <user> -p <password> -e "config -t usertable -s table.cache.block.enable=true"
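The split-point one-liner above can be unpacked into a more readable Ruby sketch. The method names `splits_for` and `split_points` and their keyword parameters are hypothetical. The range 1000..9999 and the `user` prefix come from the one-liner, and the default of 10 tablets per TabletServer is an assumption taken from the low end of the "tens of tablets" guideline. N split points always produce N + 1 tablets.

```ruby
# Hypothetical helper: number of split points needed for roughly
# `per_server` tablets on each TabletServer (N split points => N + 1 tablets).
def splits_for(tservers, per_server: 10)
  [tservers * per_server - 1, 0].max
end

# Same integer arithmetic as the one-liner: evenly spaced points over
# low..high, prefixed with "user" so they sort among YCSB row keys.
def split_points(num_splits, low: 1000, high: 9999, prefix: "user")
  (1..num_splits).map { |i| "#{prefix}#{low + i * (high - low) / num_splits}" }
end

if __FILE__ == $PROGRAM_NAME
  # One split point per line, matching `puts` in the one-liner.
  puts split_points(20)
end
```

For three TabletServers, `split_points(splits_for(3))` yields 29 split points and therefore 30 tablets.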

Additionally, some other configuration properties can increase performance. These can be set via
the shell after the table is created. Setting the table durability to `flush` avoids calls to
fsync, which relaxes durability guarantees during hard power outages. Accumulo defaults table
compression to `gzip`, which is not particularly fast; `snappy` is a faster and similarly effective
option. The mutation queue property controls how much write data (a memory size such as `256M`)
Accumulo buffers before performing a flush; set it relative to the amount of JVM heap given to the
TabletServers.

Please note that the `table.durability` and `tserver.total.mutation.queue.max` properties only
exist in Accumulo 1.7 and later. Earlier versions have no concise replacements for them.

    accumulo> config -s table.durability=flush
    accumulo> config -s tserver.total.mutation.queue.max=256M
    accumulo> config -t usertable -s table.file.compress.type=snappy
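The version caveat above can be captured in a small Ruby sketch that emits only the commands applicable to a given Accumulo release. `tuning_commands` and its parameters are hypothetical. It relies on RubyGems' `Gem::Version` for the comparison, and the `256M` default simply repeats the example value rather than recommending a size.

```ruby
require "rubygems" # provides Gem::Version (loaded by default in modern Ruby)

# Hypothetical helper: build `accumulo shell -e` invocations for the tuning
# properties, skipping those that only exist in Accumulo 1.7 and later.
def tuning_commands(version, table: "usertable", mutation_queue: "256M")
  cmds = []
  if Gem::Version.new(version) >= Gem::Version.new("1.7")
    cmds << "config -s table.durability=flush"
    cmds << "config -s tserver.total.mutation.queue.max=#{mutation_queue}"
  end
  # Snappy compression applies to every version.
  cmds << "config -t #{table} -s table.file.compress.type=snappy"
  cmds.map { |c| %(accumulo shell -u <user> -p <password> -e "#{c}") }
end

if __FILE__ == $PROGRAM_NAME
  puts tuning_commands("1.8.1")
end
```

The `<user>` and `<password>` placeholders are left for the reader to fill in, as in the commands above.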

On repeated data loads, the following commands can quickly reset the table to an empty state that keeps its splits and configuration.

    accumulo> createtable tmp --copy-splits usertable --copy-config usertable
    accumulo> deletetable --force usertable
    accumulo> renametable tmp usertable
    accumulo> compact --wait -t accumulo.metadata
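A minimal Ruby sketch of the reset sequence, parameterized by table name, for readers who script repeated loads. `reset_commands` is a hypothetical helper. It only assembles the shell commands listed above for pasting into an `accumulo shell` session and does not talk to Accumulo itself.

```ruby
# Hypothetical helper: the table-reset sequence, parameterized by table name.
# The temporary table name defaults to "tmp", as in the example.
def reset_commands(table = "usertable", tmp: "tmp")
  [
    # Copy splits and configuration while the original table still exists.
    "createtable #{tmp} --copy-splits #{table} --copy-config #{table}",
    "deletetable --force #{table}",
    "renametable #{tmp} #{table}",
    # Compact the metadata table once the old table's entries are gone.
    "compact --wait -t accumulo.metadata"
  ]
end

if __FILE__ == $PROGRAM_NAME
  puts reset_commands
end
```

The order matters: `--copy-splits` and `--copy-config` read from the existing table, so the copy has to be created before the original is deleted.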

### 4. Load Data and Run Tests

Load the data:
