java.lang.Object
  com.splout.db.hadoop.TablespaceGenerator
public class TablespaceGenerator
A process that generates the SQL data stores needed for deploying a tablespace in Splout, given a file-set table specification as input.
The input to this process is the Tablespace specification. The process generates a PartitionMap, and the output is laid out as follows:
- outputPath + / + OUT_PARTITION_MAP for the partition map;
- outputPath + / + OUT_SAMPLED_INPUT for the list of sampled keys;
- outputPath + / + OUT_STORE for the folder containing the generated SQL store.
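The output layout can be sketched as simple path resolution under outputPath. The actual values of OUT_PARTITION_MAP, OUT_SAMPLED_INPUT and OUT_STORE are not shown on this page, so the strings below are hypothetical placeholders, and java.nio.file.Path stands in for org.apache.hadoop.fs.Path.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the output layout. The constant VALUES are HYPOTHETICAL placeholders;
// the real ones are TablespaceGenerator's public static final fields.
class OutputLayoutSketch {
    static final String OUT_PARTITION_MAP = "partition-map"; // hypothetical value
    static final String OUT_SAMPLED_INPUT = "sampled-input"; // hypothetical value
    static final String OUT_STORE = "store";                 // hypothetical value

    // outputPath + / + OUT_PARTITION_MAP: the partition map
    static Path partitionMapPath(Path outputPath) {
        return outputPath.resolve(OUT_PARTITION_MAP);
    }

    // outputPath + / + OUT_SAMPLED_INPUT: the list of sampled keys
    static Path sampledInputPath(Path outputPath) {
        return outputPath.resolve(OUT_SAMPLED_INPUT);
    }

    // outputPath + / + OUT_STORE: the folder holding the generated SQL store
    static Path storePath(Path outputPath) {
        return outputPath.resolve(OUT_STORE);
    }

    public static void main(String[] args) {
        Path out = Paths.get("/tmp/tablespace-out");
        System.out.println(partitionMapPath(out));
        System.out.println(sampledInputPath(out));
        System.out.println(storePath(out));
    }
}
```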
To create the store, the input dataset is first sampled with TupleSampler, and then a Hadoop job distributes the data accordingly. The Hadoop job uses TupleSQLite4JavaOutputFormat.
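The two phases (sample, then distribute) can be illustrated in memory. This is a minimal sketch assuming range partitioning, where each partition is a contiguous key range delimited by split points taken from the sorted sample; the class and method names are hypothetical and this is not Splout's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory sketch of "sample, then distribute".
// Assumption: partitions are contiguous key ranges delimited by split
// points chosen from the sorted sample. Not Splout's actual code.
class RangePartitionSketch {

    // Choose nPartitions - 1 split points at evenly spaced positions of the sorted sample.
    static List<String> splitPoints(List<String> sample, int nPartitions) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        if (sorted.isEmpty()) {
            return splits; // no sample: every key falls into partition 0
        }
        for (int i = 1; i < nPartitions; i++) {
            int idx = (int) ((long) i * sorted.size() / nPartitions);
            splits.add(sorted.get(idx));
        }
        return splits;
    }

    // Partition i holds keys k with splits[i-1] <= k < splits[i];
    // the first and last partitions are open-ended.
    static int partitionFor(String key, List<String> splits) {
        int pos = Collections.binarySearch(splits, key);
        // A key equal to splits[pos] opens partition pos + 1; otherwise binarySearch
        // returns -(insertionPoint) - 1, and the insertion point is the partition index.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("m", "c", "x", "a", "q", "f", "t", "j");
        List<String> splits = splitPoints(sample, 4);
        System.out.println("split points: " + splits);
        for (String key : Arrays.asList("b", "g", "n", "z")) {
            System.out.println(key + " -> partition " + partitionFor(key, splits));
        }
    }
}
```

With this sample and four partitions, the split points are [f, m, t], so "b" is routed to partition 0 and "z" to partition 3.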
Nested Class Summary | |
---|---|
static class | TablespaceGenerator.TablespaceGeneratorException |
Field Summary | |
---|---|
static java.lang.String | OUT_INIT_STATEMENTS |
static java.lang.String | OUT_PARTITION_MAP |
static java.lang.String | OUT_SAMPLED_INPUT |
static java.lang.String | OUT_STORE |
protected PartitionMap | partitionMap |
protected TablespaceSpec | tablespace |
Constructor Summary | |
---|---|
TablespaceGenerator(TablespaceSpec tablespace, org.apache.hadoop.fs.Path outputPath, java.lang.Class callingClass) | |
Method Summary | | |
---|---|---|
protected com.datasalt.pangool.tuplemr.TupleMRBuilder | createMRBuilder(int nPartitions, org.apache.hadoop.conf.Configuration conf) | Creates the TupleMRBuilder for launching the generation job. |
protected void | executeViewGeneration(com.datasalt.pangool.tuplemr.TupleMRBuilder builder) | |
void | generateView(org.apache.hadoop.conf.Configuration conf, TupleSampler.SamplingType samplingType, TupleSampler.SamplingOptions samplingOptions) | The public entry point to call when using this class as an API. |
int | getBatchSize() | |
protected static java.lang.String | getPartitionByKey(com.datasalt.pangool.io.ITuple tuple, TableSpec tableSpec, JavascriptEngine jsEngine) | Returns the partition key, using either partition-by-fields or partition-by-javascript as configured in the TableSpec. |
PartitionMap | getPartitionMap() | Returns the generated PartitionMap. |
int | getRecordsToSample() | |
protected void | prepareOutput(org.apache.hadoop.conf.Configuration conf) | |
protected PartitionMap | sample(int nPartitions, org.apache.hadoop.conf.Configuration conf, TupleSampler.SamplingType samplingType, TupleSampler.SamplingOptions samplingOptions) | Samples the input, if needed. |
void | setBatchSize(int batchSize) | |
void | setRecordsToSample(int recordsToSample) | |
protected void | writeOutputMetadata(org.apache.hadoop.conf.Configuration conf) | Writes the partition map and other metadata to the output folder. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected final transient TablespaceSpec tablespace
protected PartitionMap partitionMap
public static final java.lang.String OUT_SAMPLED_INPUT
public static final java.lang.String OUT_PARTITION_MAP
public static final java.lang.String OUT_INIT_STATEMENTS
public static final java.lang.String OUT_STORE
Constructor Detail |
---|
public TablespaceGenerator(TablespaceSpec tablespace, org.apache.hadoop.fs.Path outputPath, java.lang.Class callingClass)
Method Detail |
---|
public void generateView(org.apache.hadoop.conf.Configuration conf, TupleSampler.SamplingType samplingType, TupleSampler.SamplingOptions samplingOptions) throws java.lang.Exception
Throws: java.lang.Exception
protected void prepareOutput(org.apache.hadoop.conf.Configuration conf) throws java.io.IOException
Throws: java.io.IOException
protected void writeOutputMetadata(org.apache.hadoop.conf.Configuration conf) throws java.io.IOException, JSONSerDe.JSONSerDeException
Throws: java.io.IOException, JSONSerDe.JSONSerDeException
protected static java.lang.String getPartitionByKey(com.datasalt.pangool.io.ITuple tuple, TableSpec tableSpec, JavascriptEngine jsEngine) throws java.lang.Throwable
Throws: java.lang.Throwable
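getPartitionByKey chooses between two key sources. The sketch below assumes that partition-by-fields concatenates the field values in declaration order (an assumption; the exact rule is not documented here), with a java.util.Map standing in for ITuple and a java.util.function.Function standing in for the JavaScript engine. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of deriving a partition key. A Map stands in for
// Pangool's ITuple and a Function stands in for partition-by-javascript;
// neither is Splout's real API.
class PartitionKeySketch {

    static String partitionKey(Map<String, Object> tuple,
                               List<String> partitionFields,
                               Function<Map<String, Object>, String> partitionFunction) {
        if (partitionFunction != null) {
            // Stand-in for evaluating a user-supplied JavaScript function.
            return partitionFunction.apply(tuple);
        }
        // ASSUMPTION: field values are concatenated in the configured order.
        StringBuilder key = new StringBuilder();
        for (String field : partitionFields) {
            Object value = tuple.get(field);
            if (value == null) {
                throw new IllegalArgumentException("Missing partition field: " + field);
            }
            key.append(value);
        }
        return key.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("country", "ES");
        tuple.put("year", 2013);
        System.out.println(partitionKey(tuple, Arrays.asList("country", "year"), null));
        System.out.println(partitionKey(tuple, Collections.<String>emptyList(),
                t -> ((String) t.get("country")).toLowerCase()));
    }
}
```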
protected PartitionMap sample(int nPartitions, org.apache.hadoop.conf.Configuration conf, TupleSampler.SamplingType samplingType, TupleSampler.SamplingOptions samplingOptions) throws TupleSampler.TupleSamplerException, java.io.IOException
Throws: TupleSampler.TupleSamplerException, java.io.IOException
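sample() draws the keys from which the PartitionMap is built, and recordsToSample presumably bounds how many are kept. One standard way to keep a uniform sample of at most k records in a single pass is reservoir sampling (Algorithm R), sketched below. Whether TupleSampler uses this technique depends on the SamplingType and is not stated on this page; the class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Reservoir sampling (Algorithm R): a uniform sample of at most
// recordsToSample items in one pass. Hypothetical illustration only.
class ReservoirSampleSketch {

    static <T> List<T> sample(Iterator<T> input, int recordsToSample, Random rnd) {
        List<T> reservoir = new ArrayList<>(recordsToSample);
        long seen = 0;
        while (input.hasNext()) {
            T item = input.next();
            seen++;
            if (reservoir.size() < recordsToSample) {
                reservoir.add(item); // fill the reservoir with the first k items
            } else {
                // Uniform index in [0, seen); replace a slot with probability k / seen.
                long j = (long) (rnd.nextDouble() * seen);
                if (j < recordsToSample) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            keys.add(i);
        }
        System.out.println(sample(keys.iterator(), 5, new Random(7)));
    }
}
```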
protected com.datasalt.pangool.tuplemr.TupleMRBuilder createMRBuilder(int nPartitions, org.apache.hadoop.conf.Configuration conf) throws com.datasalt.pangool.tuplemr.TupleMRException, TupleSQLite4JavaOutputFormat.TupleSQLiteOutputFormatException
Throws: com.datasalt.pangool.tuplemr.TupleMRException, TupleSQLite4JavaOutputFormat.TupleSQLiteOutputFormatException
protected void executeViewGeneration(com.datasalt.pangool.tuplemr.TupleMRBuilder builder) throws java.io.IOException, java.lang.InterruptedException, java.lang.ClassNotFoundException, TablespaceGenerator.TablespaceGeneratorException, com.datasalt.pangool.tuplemr.TupleMRException
Throws: java.io.IOException, java.lang.InterruptedException, java.lang.ClassNotFoundException, TablespaceGenerator.TablespaceGeneratorException, com.datasalt.pangool.tuplemr.TupleMRException
public PartitionMap getPartitionMap()
Returns the generated PartitionMap. It is also written to HDFS. This is mainly used for testing.
public int getRecordsToSample()
public void setRecordsToSample(int recordsToSample)
public int getBatchSize()
public void setBatchSize(int batchSize)
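The meaning of batchSize is not documented on this page. Assuming it controls how many records are grouped before each write or commit to the SQLite store (an assumption), the buffering logic could look like the hypothetical sketch below, where a callback stands in for the actual commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: group records into batches of batchSize and hand each
// full batch to a flush callback (e.g. one SQLite transaction per batch).
class BatchWriterSketch<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> flush;
    private final List<T> buffer = new ArrayList<>();

    BatchWriterSketch(int batchSize, Consumer<List<T>> flush) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.flush = flush;
    }

    void write(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flushBuffer();
        }
    }

    private void flushBuffer() {
        if (!buffer.isEmpty()) {
            flush.accept(new ArrayList<>(buffer)); // hand over a copy, then reset
            buffer.clear();
        }
    }

    @Override
    public void close() {
        flushBuffer(); // emit the final, possibly partial, batch
    }

    public static void main(String[] args) {
        try (BatchWriterSketch<Integer> w =
                 new BatchWriterSketch<>(3, b -> System.out.println("commit " + b))) {
            for (int i = 1; i <= 7; i++) {
                w.write(i);
            }
        }
    }
}
```

With a batch size of 3 and seven records, this flushes [1, 2, 3], [4, 5, 6] and finally [7] on close.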