java.lang.Object
  com.splout.db.hadoop.TablespaceGenerator
public class TablespaceGenerator
A process that generates the SQL data stores needed for deploying a tablespace in Splout, given a file set table specification as input.
The input to this process is the tablespace specification (a TablespaceSpec) and the output path, both passed to the constructor.
The output includes a PartitionMap. The format of the output is:
 outputPath + / + OUT_PARTITION_MAP for the partition map, outputPath + / + OUT_SAMPLED_INPUT for
 the list of sampled keys and outputPath + / + OUT_STORE for the folder containing the generated SQL store.
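
The layout described above can be sketched as plain string composition. This is only an illustration: the constant values below are hypothetical placeholders (this page does not list the real values of OUT_PARTITION_MAP, OUT_SAMPLED_INPUT or OUT_STORE), and the helper `child` is not part of TablespaceGenerator.

```java
/** Illustrative sketch of the output layout; not part of the Splout API. */
final class OutputLayoutSketch {

  // Hypothetical placeholder values: the real constants are defined in TablespaceGenerator.
  static final String OUT_PARTITION_MAP = "partition-map";
  static final String OUT_SAMPLED_INPUT = "sampled-input";
  static final String OUT_STORE = "store";

  /** Joins outputPath and a child name with exactly one "/" between them. */
  static String child(String outputPath, String name) {
    String base = outputPath.endsWith("/")
        ? outputPath.substring(0, outputPath.length() - 1)
        : outputPath;
    return base + "/" + name;
  }

  public static void main(String[] args) {
    String outputPath = "/out/mytablespace"; // hypothetical output path
    System.out.println(child(outputPath, OUT_PARTITION_MAP));
    System.out.println(child(outputPath, OUT_SAMPLED_INPUT));
    System.out.println(child(outputPath, OUT_STORE));
  }
}
```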
 
 For creating the store we first sample the input dataset with TupleSampler and then execute a Hadoop job that
 distributes the data accordingly. The Hadoop job will use TupleSQLite4JavaOutputFormat.
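
To make the sample-then-distribute idea concrete, here is a minimal, self-contained sketch of range partitioning driven by a sample of keys. It is not Splout's TupleSampler or PartitionMap implementation: the class name, the quantile rule for choosing split points, and the use of plain String keys are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative sketch only: derives key ranges from a sample, then routes keys to them. */
final class RangePartitionSketch {

  /** Picks nPartitions - 1 split keys so that each range holds a similar share of the sample. */
  static List<String> splitPoints(List<String> sampledKeys, int nPartitions) {
    if (sampledKeys.isEmpty() || nPartitions < 1) {
      throw new IllegalArgumentException("Need a non-empty sample and at least one partition");
    }
    List<String> sorted = new ArrayList<>(sampledKeys);
    Collections.sort(sorted);
    List<String> splits = new ArrayList<>();
    for (int i = 1; i < nPartitions; i++) {
      // Take the key sitting at the i-th quantile of the sorted sample.
      splits.add(sorted.get(i * sorted.size() / nPartitions));
    }
    return splits;
  }

  /** Returns the partition index for a key: the number of split points that are <= key. */
  static int partitionFor(String key, List<String> splits) {
    int partition = 0;
    while (partition < splits.size() && key.compareTo(splits.get(partition)) >= 0) {
      partition++;
    }
    return partition;
  }

  public static void main(String[] args) {
    List<String> sample = Arrays.asList("k07", "k01", "k05", "k03", "k08", "k02", "k06", "k04");
    List<String> splits = splitPoints(sample, 4);
    System.out.println("split points: " + splits);
    System.out.println("k05 goes to partition " + partitionFor("k05", splits));
  }
}
```

In this sketch the sample only decides where the range boundaries fall; every record of the full dataset would then be routed with `partitionFor`, which is the role the Hadoop job plays in the real process.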
| Nested Class Summary | |
|---|---|
| static class | TablespaceGenerator.TablespaceGeneratorException |
| Field Summary | |
|---|---|
| static java.lang.String | OUT_INIT_STATEMENTS |
| static java.lang.String | OUT_PARTITION_MAP |
| static java.lang.String | OUT_SAMPLED_INPUT |
| static java.lang.String | OUT_STORE |
| protected PartitionMap | partitionMap |
| protected TablespaceSpec | tablespace |
| Constructor Summary |
|---|
| TablespaceGenerator(TablespaceSpec tablespace, org.apache.hadoop.fs.Path outputPath, java.lang.Class callingClass) |
| Method Summary | | |
|---|---|---|
| protected com.datasalt.pangool.tuplemr.TupleMRBuilder | createMRBuilder(int nPartitions, org.apache.hadoop.conf.Configuration conf) | Create TupleMRBuilder for launching generation Job. |
| protected void | executeViewGeneration(com.datasalt.pangool.tuplemr.TupleMRBuilder builder) | |
| void | generateView(org.apache.hadoop.conf.Configuration conf, TupleSampler.SamplingType samplingType, TupleSampler.SamplingOptions samplingOptions) | This is the public method which has to be called when using this class as an API. |
| int | getBatchSize() | |
| protected static java.lang.String | getPartitionByKey(com.datasalt.pangool.io.ITuple tuple, TableSpec tableSpec, JavascriptEngine jsEngine) | Returns the partition key either by using partition-by-fields or partition-by-javascript as configured in the Table Spec. |
| PartitionMap | getPartitionMap() | Returns the generated PartitionMap. |
| int | getRecordsToSample() | |
| protected void | prepareOutput(org.apache.hadoop.conf.Configuration conf) | |
| protected PartitionMap | sample(int nPartitions, org.apache.hadoop.conf.Configuration conf, TupleSampler.SamplingType samplingType, TupleSampler.SamplingOptions samplingOptions) | Samples the input, if needed. |
| void | setBatchSize(int batchSize) | |
| void | setRecordsToSample(int recordsToSample) | |
| protected void | writeOutputMetadata(org.apache.hadoop.conf.Configuration conf) | Write the partition map and other metadata to the output folder. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail | 
|---|
protected final transient TablespaceSpec tablespace
protected PartitionMap partitionMap
public static final java.lang.String OUT_SAMPLED_INPUT
public static final java.lang.String OUT_PARTITION_MAP
public static final java.lang.String OUT_INIT_STATEMENTS
public static final java.lang.String OUT_STORE
| Constructor Detail | 
|---|
public TablespaceGenerator(TablespaceSpec tablespace,
                           org.apache.hadoop.fs.Path outputPath,
                           java.lang.Class callingClass)
| Method Detail | 
|---|
public void generateView(org.apache.hadoop.conf.Configuration conf,
                         TupleSampler.SamplingType samplingType,
                         TupleSampler.SamplingOptions samplingOptions)
                  throws java.lang.Exception
Throws: java.lang.Exception
protected void prepareOutput(org.apache.hadoop.conf.Configuration conf)
                      throws java.io.IOException
Throws: java.io.IOException
protected void writeOutputMetadata(org.apache.hadoop.conf.Configuration conf)
                            throws java.io.IOException,
                                   JSONSerDe.JSONSerDeException
Throws: java.io.IOException, JSONSerDe.JSONSerDeException
protected static java.lang.String getPartitionByKey(com.datasalt.pangool.io.ITuple tuple,
                                                    TableSpec tableSpec,
                                                    JavascriptEngine jsEngine)
                                             throws java.lang.Throwable
Throws: java.lang.Throwable
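
As a rough illustration of the partition-by-fields case handled by getPartitionByKey, the sketch below builds a key by concatenating the string form of the configured fields in order. The concatenation rule, the use of a Map in place of ITuple, and all names are assumptions for illustration; the actual method may build its key differently, and the partition-by-javascript case is not shown.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: a Map stands in for a tuple, a List for the partition fields. */
final class PartitionKeySketch {

  /** Concatenates the string form of each partition field, in the configured order. */
  static String keyFromFields(Map<String, Object> row, List<String> partitionFields) {
    StringBuilder key = new StringBuilder();
    for (String field : partitionFields) {
      if (!row.containsKey(field)) {
        throw new IllegalArgumentException("Missing partition field: " + field);
      }
      key.append(String.valueOf(row.get(field)));
    }
    return key.toString();
  }

  public static void main(String[] args) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("country", "ES"); // hypothetical field names and values
    row.put("userId", 42);
    row.put("amount", 9.5);
    System.out.println(keyFromFields(row, Arrays.asList("country", "userId")));
  }
}
```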
protected PartitionMap sample(int nPartitions,
                              org.apache.hadoop.conf.Configuration conf,
                              TupleSampler.SamplingType samplingType,
                              TupleSampler.SamplingOptions samplingOptions)
                       throws TupleSampler.TupleSamplerException,
                              java.io.IOException
Throws: TupleSampler.TupleSamplerException, java.io.IOException
protected com.datasalt.pangool.tuplemr.TupleMRBuilder createMRBuilder(int nPartitions,
                                                                      org.apache.hadoop.conf.Configuration conf)
                                                               throws com.datasalt.pangool.tuplemr.TupleMRException,
                                                                      TupleSQLite4JavaOutputFormat.TupleSQLiteOutputFormatException
Throws: com.datasalt.pangool.tuplemr.TupleMRException, TupleSQLite4JavaOutputFormat.TupleSQLiteOutputFormatException
protected void executeViewGeneration(com.datasalt.pangool.tuplemr.TupleMRBuilder builder)
                              throws java.io.IOException,
                                     java.lang.InterruptedException,
                                     java.lang.ClassNotFoundException,
                                     TablespaceGenerator.TablespaceGeneratorException,
                                     com.datasalt.pangool.tuplemr.TupleMRException
Throws: java.io.IOException, java.lang.InterruptedException, java.lang.ClassNotFoundException, TablespaceGenerator.TablespaceGeneratorException, com.datasalt.pangool.tuplemr.TupleMRException
public PartitionMap getPartitionMap()
Returns the generated PartitionMap. It is also written to HDFS. This is mainly used for testing.
public int getRecordsToSample()
public void setRecordsToSample(int recordsToSample)
public int getBatchSize()
public void setBatchSize(int batchSize)