Splout SQL's architecture

Splout SQL is a read-only DB. Because of this, the overall architecture is quite simple.

Splout SQL is installed on a cluster of commodity hardware machines. Each of the machines runs a DNode service and optionally a QNode service (but there must be at least one QNode service among the cluster).
QNodes implement a REST API for serving users' queries or receiving deploy / rollback requests.
Because data is partitioned, QNodes talk to the appropriate DNode for serving a query and the DNode responds back with the query’s result.
A Hadoop cluster is used for indexing and balancing the data. The resultant SQL files are fetched by the DNodes by handling a data deploy request.

Data balancing & indexing

Arbitrarily big datasets are handled by Splout by using a Hadoop batch process that partitions and indexes the data according to a users' partitioning policy.

Usually this involves partitioning by one or more columns in a columnar dataset, for example customer_id. But you can also partition by an arbitrary Java or JavaScript function, for instance.
Splout uses sampling methods for deciding how to split the dataset after applying users’ partitioning function.
A simple method (a-la-Hadoop's TeraSort) is the default one, but a more sophisticated one - althought more computationally expensive - (Reservoir Sampling) can be used in datasets where the default one is not enough.
The DNodes fetch new data on a deploy request. The data is downloaded, atomically replaced and versioned. The cluster coordinates itself for assuring version consistency and no downtime while deploying. The user can choose to rollback to previous versions anytime, and the operation will be as quick as a "mv" between folders.

Technology choices

QNodes, DNodes are implemented as Java services serving queries through a Jersey REST interface.
QNodes and DNodes communicate between them using Thrift.
Splout SQL cluster handles membership and coordinates deployments and rollbacks through Hazelcast.
Splout uses Pangool for low-level Java Hadoop development. In this way, the code for data store generation is pretty simple and performant at the same time.
Splout generates SQLite files that are used by DNodes for serving data.