Splout has been designed with performance and efficiency in mind. We benchmark it continuously to make sure it delivers the performance it was designed for.

Throughput & scalability benchmark

With this benchmark we wanted to showcase several things:

  1. Splout's SQL serving throughput: That Splout delivers good performance for arbitrary real-time SQL aggregations.
  2. Splout's scalability: That Splout's performance scales linearly; doubling the number of machines should increase throughput by a factor of at least 2 (and up to 4, depending on the dataset size relative to the amount of RAM).
  3. Splout's indexing & data generation time: The time needed to generate a Splout database in Hadoop should be reasonable. This includes indexing and data generation as well as deployment time.

This benchmark was run on a cluster of Intel machines kindly provided to us by Flytech S.A. (http://www.flytech.es/). The specs:

We loaded 350GB worth of data from the "Wikipedia pagecounts dataset", corresponding to 3 months of pageviews. This dataset is interesting because it is real data with a realistic distribution, and it allows us to perform real-time "GROUP BY" queries for arbitrary page names. The pseudo-code of the benchmark can be summarized as follows:

while (!benchmark_ends) {
	// Choose a random "rowid" and obtain a page name for it:
	SELECT * FROM pagecounts WHERE rowid = ?;

	// Perform the two real-time aggregations for that page name:
	SELECT SUM(pageviews) AS totalpageviews, date FROM pagecounts WHERE pagename = ? GROUP BY date;
	SELECT SUM(pageviews) AS totalpageviews, hour FROM pagecounts WHERE pagename = ? GROUP BY hour;
}
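
For illustration, here is a minimal sketch of what one iteration of this loop could look like as a standalone Java client talking to a Splout QNode over HTTP. This is not the code we ran for the benchmark: the QNode address and port, the shape of the REST query endpoint, the tablespace name, the rowid range and the response parsing are all assumptions made for the example; check the Splout SQL documentation (or its Java client) for the exact API of your version.

// Minimal, hedged sketch of the benchmark loop (illustrative only, see note above).
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ThreadLocalRandom;

public class PagecountsBenchmark {

    // Assumed QNode address and tablespace name; adjust to your deployment.
    static final String QNODE = "http://localhost:4412";
    static final String TABLESPACE = "pagecounts";
    static final HttpClient HTTP = HttpClient.newHttpClient();

    // Sends one SQL query, routed by the given partition key, and returns the raw response body.
    // The "/api/query/<tablespace>?key=...&sql=..." path is an assumption about the REST API.
    static String query(String key, String sql) throws Exception {
        String url = QNODE + "/api/query/" + TABLESPACE
            + "?key=" + URLEncoder.encode(key, StandardCharsets.UTF_8)
            + "&sql=" + URLEncoder.encode(sql, StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return HTTP.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        long maxRowId = 1_000_000L; // assumed upper bound for the random rowid lookups
        while (true) { // run until the benchmark is stopped externally
            // 1) Pick a random rowid and fetch a page name for it.
            //    (Routing the lookup by the rowid itself is a simplification of the real benchmark.)
            long rowId = ThreadLocalRandom.current().nextLong(maxRowId);
            String response = query(String.valueOf(rowId),
                "SELECT * FROM pagecounts WHERE rowid = " + rowId + ";");
            String pageName = extractPageName(response);
            // 2) Run the two real-time aggregations for that page name.
            query(pageName, "SELECT SUM(pageviews) AS totalpageviews, date FROM pagecounts"
                + " WHERE pagename = '" + pageName + "' GROUP BY date;");
            query(pageName, "SELECT SUM(pageviews) AS totalpageviews, hour FROM pagecounts"
                + " WHERE pagename = '" + pageName + "' GROUP BY hour;");
        }
    }

    // JSON parsing is omitted; this placeholder stands in for extracting the page name
    // from the first result row of the rowid lookup.
    static String extractPageName(String jsonResponse) {
        return "Main_Page"; // placeholder value
    }
}

A real benchmark driver would run this loop from many concurrent threads (80 and 160 in the tests below) and record the latency of each query.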

For a 2-machine cluster, each machine holding approximately 175GB, we obtained a throughput of 868 queries / second, with each query returning an average of 15 result rows of about 50 bytes each, so around 750 bytes per query on average. The average response time with 80 simultaneous threads was around 80 milliseconds.

When we doubled the cluster to 4 machines, each holding approximately 87GB, the throughput increased by a factor of almost 4, delivering a peak of 3156 operations per second. Response times for 160 threads were around 46 milliseconds. Because the amount of data each node holds is halved, a bigger portion of the dataset can be cached in RAM, so average response times are almost halved. And because there are also twice as many machines serving queries, the overall throughput scales by a factor of close to 4 (twice the machines, each answering almost twice as fast).
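
As a rough consistency check (not part of the original measurements), at saturation the throughput is approximately the number of concurrent threads divided by the average response time:

	2 machines:  80 threads / 0.080 s ≈ 1000 queries / second (measured: 868)
	4 machines: 160 threads / 0.046 s ≈ 3500 queries / second (measured: 3156)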

Indexing / Generation time

To generate the data structures to be deployed to the Splout cluster, we used an 11-machine Hadoop cluster (1 master, 10 slaves) made up of the same kind of machines. We first ran an "identity job" over the whole dataset that just writes the raw binary data of each record back to HDFS. This job is useless by itself, but it is relevant to the benchmark because it gives us a baseline to compare the "Tablespace Builder" against. The identity job completed in 23 minutes.

Splout's "Tablespace Builder", on the other hand, ran for 44 minutes. So we can say that generating a Splout database in a Hadoop cluster takes roughly twice as long as a simple identity job that just writes the data back to HDFS. We used 80 partitions to parallelize the process better, as there were exactly 80 reduce slots in the cluster. (Generally speaking, the higher the number of partitions, the quicker the "Tablespace Builder" completes: the B-Trees for each partition are built in parallel, and smaller partitions are indexed faster.)
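
To put the partition count in perspective (a back-of-the-envelope figure, not something measured in the benchmark):

	350GB / 80 partitions ≈ 4.4GB of raw data per partition

which is roughly the amount of data each reduce slot ends up indexing.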

Deploy time

After data generation, deployment was performed by the benchmark machines, which resided in the same rack and were connected to the same network switch. Each machine could download from HDFS at 82 MB/sec, which made the deploy last 36 minutes for the 2-machine cluster and 18 minutes for the 4-machine cluster.
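
These deploy times are essentially what it takes each machine to download its share of the data at that rate (a back-of-the-envelope check on the figures above):

	2-machine cluster: ~175GB / 82 MB/sec ≈ 2,130 seconds ≈ 36 minutes
	4-machine cluster:  ~87GB / 82 MB/sec ≈ 1,060 seconds ≈ 18 minutes

so in this setup deployment was essentially bound by the HDFS download bandwidth.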

Considerations

This benchmark was designed to test Splout under a realistic usage pattern: one quick lookup plus two real-time "GROUP BY"s. Because the data is real and follows a roughly Zipfian distribution, some pages have a lot of rows whereas other pages have few. The maximum number of rows impacted by a single page was around 2000. Altogether, this skew explains the average of 15 result rows per query. If you want more details about this benchmark, you can check its source code and the tool that generates the data structures needed to run it.

Key / Value benchmark

We designed a benchmark to test Splout's efficiency as a key / value store. The idea that Splout is an extension of a key / value store must be validated by ensuring it delivers the expected level of performance as one. We loaded 470GB worth of key-value pairs, with 1024-byte values, into a 4-machine cluster with the same characteristics as above. We achieved a peak throughput of 2180 queries / second, with average response times of 69 ms for 160 threads.

Considerations

Note how the aggregate throughput of the key / value test is lower than that of the "Wikipedia" test, which may seem counter-intuitive. This happens for two reasons: on one hand, the dataset is bigger, so each machine can cache less data in RAM; on the other hand, each query returns more bytes in this test (1024 bytes versus around 750 bytes in the previous test). If you calculate the bytes / second served in both cases, you get similar figures.
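
Roughly, using the 4-machine figures reported above:

	"Wikipedia" test:  3156 queries / second × ~750 bytes ≈ 2.4 MB / second
	Key / value test:  2180 queries / second × 1024 bytes ≈ 2.2 MB / second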

This key / value benchmark can be reproduced by first generating the data with BenchmarkStoreTool and then running BenchmarkTool.

SQLite vs. MySQL

Splout uses SQLite, which we have measured to be quite fast with big database files.

In the early development stages we compared SQLite to MySQL on an Amazon m1.large instance using a 20GB database. We performed random GROUP BY queries over a dataset whose group distribution was roughly Zipfian. These are the results we obtained for different numbers of concurrent threads:

Splout SQL vs. Voldemort

We have also compared Splout to Voldemort in order to demonstrate that Splout can behave well as a key / value store in addition to providing full SQL richness. We did so on a physical machine with the following characteristics: 4GB of RAM, a 1.6 GHz quad-core CPU, and a hard disk delivering 75 MB/sec (73 seeks/sec). The database had 60,000,000 rows with 1024-byte values. We executed 10,000 uniformly random queries in three iterations.