What's the difference between Splout SQL and Dremel-like solutions such as BigQuery, Impala or Apache Drill?
Splout SQL is not a "fast analytics" Dremel-like engine. It is designed for serving datasets to high-throughput, low-latency web / mobile applications that perform many small lookups. In that sense Splout SQL is closer to a NoSQL database: it is built to answer queries with sub-second latencies, and to perform queries that touch a very small subset of the data, not queries that analyze the whole dataset at once.
Is it really so fast?
Splout SQL is as fast as SQLite can be. The very good thing about Splout SQL is that, because it is a read-only store whose data is replaced entirely on each deploy, the data is always optimally indexed, there is zero fragmentation, and data colocation on disk can be controlled by sorting in the Hadoop indexer process (insertionOrderBy). As an example, we have used data colocation techniques within Splout SQL to obtain < 50 ms average query time with 10 threads on dynamic GROUP BYs that hit an average of 2,000 records each, in a multi-gigabyte database that exceeded available RAM by orders of magnitude, on an m1.small EC2 machine.
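The effect of colocation can be sketched with plain SQLite, the engine underneath Splout SQL. The schema and data below are hypothetical, and insertionOrderBy is only emulated by inserting rows pre-sorted on the grouping key, so that each group's rows land on adjacent pages:

```python
import sqlite3

# Hypothetical table: 1000 events spread over 10 customers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (customer_id INTEGER, amount REAL)")

# Emulate insertionOrderBy: insert rows already sorted by customer_id,
# so each customer's rows are colocated instead of scattered.
rows = sorted((i % 10, float(i)) for i in range(1000))
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

# An index on the grouping column lets SQLite jump straight to the
# relevant (and, thanks to the sort, contiguous) rows.
conn.execute("CREATE INDEX idx_customer ON events (customer_id)")

# A "dynamic GROUP BY" that only touches one customer's rows.
row = conn.execute(
    "SELECT customer_id, SUM(amount) FROM events "
    "WHERE customer_id = ? GROUP BY customer_id", (3,)
).fetchone()
print(row)  # -> (3, 49800.0)
```

Without the pre-sort, the same query would still return the correct result, but the index lookups would touch pages spread across the whole file, which is exactly the disk-seek cost that colocation avoids.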
Can I import data directly from Hive into Splout SQL?
Yes, since release 0.2.2 it is possible to integrate Hive directly with Splout SQL. The same is possible with Cascading and Pig. Please read the "Integration with other tools" section of the user guide.
I am experiencing slow queries, why?
Splout SQL is optimized for indexing data according to custom needs. You can create arbitrary indexes and colocate data at insertion time to minimize disk seeks. Please read the Troubleshooting section of the user guide.
Can a DNode use more than one disk for storing data?
Currently a DNode's working directory is fixed to a single disk location. However, nothing prevents you from installing two DNode services on the same machine, as long as you configure them properly so that, for example, they bind to different ports.
Can I execute INSERTs / UPDATEs on Splout SQL?
Nothing will block or prevent you from executing INSERT / UPDATE statements through Splout SQL's interface, but that is not how it is meant to be used. Splout SQL has been conceived as a read-only store whose data is updated entirely through an atomic deploy mechanism, meaning that the whole dataset is replaced by another version of it. This fits batch processing well, since each run of your batch process (for example in Hadoop) usually produces a new copy of the whole dataset. It does not fit well if you want to incrementally update your dataset in real time.
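The idea behind an atomic deploy can be illustrated on a single node with a plain file swap. This is not Splout SQL's actual deploy protocol (which coordinates partitions and replicas across DNodes); it is a minimal sketch of the same "build aside, then swap" pattern, with hypothetical paths and data:

```python
import os
import sqlite3
import tempfile

def build_new_version(path):
    """Batch output: a fresh, fully indexed, read-only database file."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    conn.executemany("INSERT INTO kv VALUES (?, ?)",
                     [("a", "1"), ("b", "2")])
    conn.commit()
    conn.close()

workdir = tempfile.mkdtemp()
live = os.path.join(workdir, "live.db")
staging = os.path.join(workdir, "staging.db")

# 1. Build the new dataset version off to the side...
build_new_version(staging)

# 2. ...then swap it in atomically. Readers see either the old version
#    or the new one in full, never a half-written mix.
os.replace(staging, live)

conn = sqlite3.connect(live)
print(conn.execute("SELECT v FROM kv WHERE k = 'a'").fetchone()[0])
```

Because every deploy produces a brand-new file, there is never any in-place mutation, which is what allows the data to stay perfectly indexed and unfragmented.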
But really, what happens if I INSERT / UPDATE?
What will happen is that your statements will be executed on only one of the replicas of the partition that was hit, so you will end up with inconsistent data across replicas. This would only be fine if you didn't use replication, but in that case you wouldn't have failover, so it's not something you really want to get into.