
Benchmarking DataStax Cassandra cluster with YCSB on EC2

Tzach Livyatan edited this page Jul 17, 2014 · 8 revisions

Start a Cassandra cluster

Closely following the DataStax instructions.

I managed to run, but not benchmark, a C* cluster. Here is how I did it, what didn't work, an important note on cloud-init, and an existential question.

  1. Launch AMI ami-ada2b6c4 (HVM) or ami-f9a2b690 (PV). For large deployments, use PV (see below).
  2. Use the cassandra-cluster security group.
  3. Choose one of the recommended instance types:
  • Development and light production: m3.large
  • Moderate production: m3.xlarge
  • SSD production with light data: c3.2xlarge
  • Largest heavy production: m3.2xlarge (PV) or i2.2xlarge (HVM)

  4. Choose the number of VMs to run (3 in the following example) and set the user data:

User data

--clustername test-cluster
--totalnodes 3
--version community
--opscenter no
  5. Add storage.
  6. Launch.
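If you script the launch instead of typing the user data by hand, the string can be generated. A minimal sketch, assuming the AMI reads its options as whitespace-separated `--flag value` pairs like those above (joined on one line here, an assumption since the listing shows one option per line); `build_user_data` is a hypothetical helper, not part of any DataStax tooling. The `--totalnodes` value should match the number of instances chosen in step 4.

```python
# Hypothetical helper (not DataStax tooling): builds the AMI user data.
# Assumption: options are whitespace-separated "--flag value" pairs,
# joined here on a single line.


def build_user_data(cluster_name: str, total_nodes: int,
                    version: str = "community", opscenter: str = "no") -> str:
    """Return user data for a cluster of `total_nodes` DataStax AMI instances."""
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    if not cluster_name or any(ch.isspace() for ch in cluster_name):
        # A space would split the name into two separate tokens.
        raise ValueError("cluster_name must be a single non-empty token")
    options = [
        ("--clustername", cluster_name),
        ("--totalnodes", str(total_nodes)),
        ("--version", version),
        ("--opscenter", opscenter),
    ]
    return " ".join(f"{flag} {value}" for flag, value in options)


if __name__ == "__main__":
    # Same settings as the example above: 3 nodes, community edition, no OpsCenter.
    print(build_user_data("test-cluster", 3))
```

For the example cluster this prints `--clustername test-cluster --totalnodes 3 --version community --opscenter no`.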

You now have a running Cassandra cluster.

The nodes find each other automatically through a centralized DataStax service.

To check the cluster status, SSH into one of the cluster members and run:

nodetool status

This will present a list of the cluster members:

Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.233.78.197   40.97 KB   256     71.1%             edae6d05-4f12-4d57-b518-9b83d1bee493  1c
UN  10.79.142.68    40.94 KB   256     64.0%             aa30ea76-5ec7-41c4-b457-3e88bd6904d9  1c
UN  10.146.227.234  40.95 KB   256     65.0%             ec0fccbb-63b7-42ce-9f8b-17ea321b830c  1c

Test machine

  1. Create an EC2 Linux instance.
  2. Follow the instructions to test the cluster. Make sure to use
strategy_options = {replication_factor:3};

to take advantage of the cluster's replication. A replication factor of 3 is also used in Google's C* benchmark.
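Why a replication factor of 3 exercises the whole cluster: with three nodes, every row gets a copy on every node. The sketch below is a simplified model of SimpleStrategy-style placement (walk the token ring clockwise from the key's token and take the next `rf` distinct nodes). It assumes the keyspace uses SimpleStrategy, random vnode tokens (256 per node, as in the `nodetool status` output), and a single datacenter with no rack awareness; `build_ring` and `replicas` are hypothetical names, not Cassandra code or APIs.

```python
# Simplified model of SimpleStrategy replica placement on a vnode ring.
# Assumptions: random tokens per node, single datacenter, no rack awareness.
# This is an illustration, not Cassandra's own implementation.
from __future__ import annotations

import bisect
import random

MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1  # Murmur3Partitioner token range


def build_ring(nodes: list[str], tokens_per_node: int, seed: int = 0) -> list[tuple[int, str]]:
    """Give every node `tokens_per_node` random tokens; return them sorted."""
    rng = random.Random(seed)
    ring = [(rng.randint(MIN_TOKEN, MAX_TOKEN), node)
            for node in nodes for _ in range(tokens_per_node)]
    ring.sort()
    return ring


def replicas(ring: list[tuple[int, str]], key_token: int, rf: int) -> list[str]:
    """Walk clockwise from the first token >= key_token, collecting distinct
    nodes until `rf` are found (fewer if the cluster has fewer than `rf` nodes)."""
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, key_token) % len(ring)  # wraps past the end
    chosen: list[str] = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
            if len(chosen) == rf:
                break
    return chosen


if __name__ == "__main__":
    nodes = ["10.233.78.197", "10.79.142.68", "10.146.227.234"]
    ring = build_ring(nodes, tokens_per_node=256)
    rng = random.Random(1)
    for _ in range(3):
        key_token = rng.randint(MIN_TOKEN, MAX_TOKEN)
        print("rf=1:", replicas(ring, key_token, 1),
              "rf=3:", sorted(replicas(ring, key_token, 3)))
```

With `rf=1` each key lands on a single node, so the benchmark exercises no replication; with `rf=3` on this three-node cluster every key is written to all three nodes.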
