com.splout.db.hadoop
Class TupleSampler

java.lang.Object
  extended by com.splout.db.hadoop.TupleSampler
All Implemented Interfaces:
java.io.Serializable

public class TupleSampler
extends java.lang.Object
implements java.io.Serializable

This class samples a list of TableInput files that produce a certain Table Schema. Two sampling methods are supported; the method to use is selected through the TupleSampler.SamplingType passed to the constructor.

Sampling can be used by TablespaceGenerator for determining a PartitionMap based on the approximated distribution of the keys.
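
To illustrate how a key sample can approximate a key distribution and yield partition boundaries, here is a minimal, self-contained sketch. It is not Splout's implementation: the class PartitionSketch and its methods reservoirSample, splitPoints and partitionFor are hypothetical names, keys are assumed to be plain Strings, and reservoir sampling is used only as one common way to draw a uniform sample. This page does not state which technique TupleSampler actually applies, nor how TablespaceGenerator builds its PartitionMap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; none of these names belong to the Splout API.
public class PartitionSketch {

    /** Reservoir sampling (Algorithm R): keeps a uniform sample of at most sampleSize keys. */
    static List<String> reservoirSample(Iterable<String> keys, int sampleSize, Random rnd) {
        List<String> reservoir = new ArrayList<>(sampleSize);
        long seen = 0;
        for (String key : keys) {
            seen++;
            if (reservoir.size() < sampleSize) {
                // Fill the reservoir with the first sampleSize keys.
                reservoir.add(key);
            } else {
                // Replace a random slot with probability sampleSize / seen.
                long j = (long) (rnd.nextDouble() * seen);
                if (j < sampleSize) {
                    reservoir.set((int) j, key);
                }
            }
        }
        return reservoir;
    }

    /** Picks up to nPartitions - 1 split keys at evenly spaced ranks of the sorted sample. */
    static List<String> splitPoints(List<String> sample, int nPartitions) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        if (sorted.isEmpty()) {
            return splits; // nothing sampled: a single partition takes every key
        }
        for (int i = 1; i < nPartitions; i++) {
            int idx = (int) ((long) i * sorted.size() / nPartitions);
            splits.add(sorted.get(idx));
        }
        return splits;
    }

    /** Returns the partition index of a key: the number of split keys that are <= key. */
    static int partitionFor(String key, List<String> splits) {
        int p = 0;
        while (p < splits.size() && key.compareTo(splits.get(p)) >= 0) {
            p++;
        }
        return p;
    }
}
```

In this sketch, a larger sample gives split keys that track the true key distribution more closely, at the cost of reading and sorting more keys. That is the general trade-off a sampleSize parameter would control under these assumptions.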

See Also:
Serialized Form

Nested Class Summary
static class TupleSampler.DefaultSamplingOptions
           
static class TupleSampler.SamplingOptions
           
static class TupleSampler.SamplingType
           
static class TupleSampler.TupleSamplerException
           
 
Constructor Summary
TupleSampler(TupleSampler.SamplingType samplingType, TupleSampler.SamplingOptions options)
           
 
Method Summary
 void sample(java.util.List<TableInput> inputFiles, com.datasalt.pangool.io.Schema tableSchema, org.apache.hadoop.conf.Configuration hadoopConf, long sampleSize, org.apache.hadoop.fs.Path outFile)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TupleSampler

public TupleSampler(TupleSampler.SamplingType samplingType,
                    TupleSampler.SamplingOptions options)
Method Detail

sample

public void sample(java.util.List<TableInput> inputFiles,
                   com.datasalt.pangool.io.Schema tableSchema,
                   org.apache.hadoop.conf.Configuration hadoopConf,
                   long sampleSize,
                   org.apache.hadoop.fs.Path outFile)
            throws TupleSampler.TupleSamplerException
Throws:
TupleSampler.TupleSamplerException


Copyright © 2012-2013 Datasalt Systems S.L. All Rights Reserved.