TupleSampler (Splout SQL Hadoop library 0.2.2 API)

com.splout.db.hadoop
Class TupleSampler

java.lang.Object
  com.splout.db.hadoop.TupleSampler

All Implemented Interfaces: java.io.Serializable

public class TupleSampler
extends java.lang.Object
implements java.io.Serializable

This class samples a list of TableInput files that produce a given table Schema. Two sampling methods are supported:

DEFAULT: Inspired by Hadoop's TeraInputFormat. No Hadoop Job is needed. Consecutive records are read from each InputSplit.
RESERVOIR: Uses a map-only Pangool Job to perform reservoir sampling over the dataset.
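
For readers unfamiliar with the technique behind the RESERVOIR option, the following is a minimal, single-process sketch of classic reservoir sampling (often called Algorithm R). It is not Splout's implementation, which runs as a Pangool Job; the class name `ReservoirSketch` and its `sample` helper are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Illustrative only: a hypothetical helper, not part of the Splout SQL API.
final class ReservoirSketch {

  // Returns up to sampleSize records drawn from a stream of unknown length
  // in a single pass, keeping at most sampleSize records in memory.
  static <T> List<T> sample(Iterator<T> records, int sampleSize, Random random) {
    List<T> reservoir = new ArrayList<T>(sampleSize);
    long seen = 0;
    while (records.hasNext()) {
      T record = records.next();
      seen++;
      if (reservoir.size() < sampleSize) {
        // Fill phase: the first sampleSize records are always kept.
        reservoir.add(record);
      } else {
        // Replacement phase: draw a slot in [0, seen); the record is kept
        // only if the slot falls inside the reservoir, which happens with
        // probability of roughly sampleSize / seen.
        long slot = (long) (random.nextDouble() * seen);
        if (slot < sampleSize) {
          reservoir.set((int) slot, record);
        }
      }
    }
    return reservoir;
  }
}
```

When the stream holds fewer records than `sampleSize`, the sketch simply returns all of them in their original order.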

TablespaceGenerator can use the resulting sample to determine a PartitionMap based on the approximate distribution of the keys.
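
To make the link between a key sample and a partitioning concrete, here is one possible way to derive split points from sampled keys: sort the sample and take evenly spaced keys as boundaries. This is a sketch under assumptions. This page does not describe how TablespaceGenerator actually builds its PartitionMap, and the class `PartitionBoundarySketch`, the method `splitKeys`, and the use of plain `String` keys are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: a hypothetical helper, not part of the Splout SQL API.
final class PartitionBoundarySketch {

  // Returns (partitions - 1) boundary keys taken at evenly spaced positions
  // of the sorted sample, so each range covers a similar share of the sample.
  static List<String> splitKeys(List<String> sampledKeys, int partitions) {
    if (partitions < 1) {
      throw new IllegalArgumentException("partitions must be >= 1");
    }
    // Copy before sorting so the caller's list is left untouched.
    List<String> sorted = new ArrayList<String>(sampledKeys);
    Collections.sort(sorted);
    List<String> boundaries = new ArrayList<String>();
    if (sorted.isEmpty()) {
      return boundaries; // no sample, no boundaries
    }
    for (int i = 1; i < partitions; i++) {
      // i < partitions guarantees the index stays below sorted.size().
      int index = (int) ((long) i * sorted.size() / partitions);
      boundaries.add(sorted.get(index));
    }
    return boundaries;
  }
}
```

If the sample reflects the real key distribution, ranges cut at these boundaries should hold similar numbers of records; a skewed or very small sample can yield repeated boundaries and uneven partitions.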

See Also: Serialized Form

Nested Class Summary
`static class`	`TupleSampler.DefaultSamplingOptions`
`static class`	`TupleSampler.SamplingOptions`
`static class`	`TupleSampler.SamplingType`
`static class`	`TupleSampler.TupleSamplerException`

Constructor Summary
`TupleSampler(TupleSampler.SamplingType samplingType, TupleSampler.SamplingOptions options)`

Method Summary
`void`	`sample(java.util.List<TableInput> inputFiles, com.datasalt.pangool.io.Schema tableSchema, org.apache.hadoop.conf.Configuration hadoopConf, long sampleSize, org.apache.hadoop.fs.Path outFile)`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

TupleSampler

public TupleSampler(TupleSampler.SamplingType samplingType,
                    TupleSampler.SamplingOptions options)

Method Detail

sample

public void sample(java.util.List<TableInput> inputFiles,
                   com.datasalt.pangool.io.Schema tableSchema,
                   org.apache.hadoop.conf.Configuration hadoopConf,
                   long sampleSize,
                   org.apache.hadoop.fs.Path outFile)
            throws TupleSampler.TupleSamplerException

Throws: TupleSampler.TupleSamplerException