com.splout.db.hadoop
Class SQLiteOutputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.OutputFormat<K,V>
      extended by org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<com.datasalt.pangool.io.ITuple,org.apache.hadoop.io.NullWritable>
          extended by com.splout.db.hadoop.SQLiteOutputFormat
All Implemented Interfaces:
java.io.Serializable

public class SQLiteOutputFormat
extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<com.datasalt.pangool.io.ITuple,org.apache.hadoop.io.NullWritable>
implements java.io.Serializable

Low-level Pangool OutputFormat that can be used to generate partitioned SQL views in Hadoop. It accepts Tuples containing a "sql" string field and a "partition" integer field. Each partition generates a different .db file, named after the partition.

Furthermore, the OutputFormat accepts a list of initial SQL statements that will be executed for each partition when its database is created (e.g. CREATE TABLE statements). It also accepts finalization statements that are executed when the database is closed (e.g. CREATE INDEX).

This OutputFormat can be used as a basis for creating more complex OutputFormats such as TupleSQLite4JavaOutputFormat.

Note that using this OutputFormat directly can result in poorly performing jobs, as it cannot cache PreparedStatements: it must create a new Statement for every SQL string it receives.
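The per-partition lifecycle described above — run the initial statements when a partition's database is first created, write the incoming SQL strings, then run the finalization statements on close — can be sketched with plain collections. The class below is an illustrative stand-in that records the SQL that would be executed against each partition's .db file; it is not the actual Splout implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the per-partition statement lifecycle.
// Instead of touching SQLite, it logs the SQL each partition would run.
class PartitionedSqlWriter {
    private final String[] initSql; // e.g. CREATE TABLE statements
    private final String[] endSql;  // e.g. CREATE INDEX statements
    private final Map<Integer, List<String>> partitions = new LinkedHashMap<>();

    PartitionedSqlWriter(String[] initSql, String[] endSql) {
        this.initSql = initSql;
        this.endSql = endSql;
    }

    // Mirrors writing a Tuple with "partition" and "sql" fields.
    void write(int partition, String sql) {
        // The first SQL seen for a partition "creates" its database:
        // the initialization statements run exactly once, up front.
        partitions.computeIfAbsent(partition, p -> {
            List<String> log = new ArrayList<>();
            for (String s : initSql) {
                log.add(s);
            }
            return log;
        }).add(sql);
    }

    // Mirrors close(): run the finalization statements on every partition.
    Map<Integer, List<String>> close() {
        for (List<String> log : partitions.values()) {
            for (String s : endSql) {
                log.add(s);
            }
        }
        return partitions;
    }
}
```

Each partition's log ends up with the initialization statements first, then its own SQL in arrival order, then the finalization statements — the per-.db ordering the OutputFormat provides.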

See Also:
Serialized Form

Nested Class Summary
 class SQLiteOutputFormat.SQLRecordWriter
           
 
Field Summary
static org.apache.commons.logging.Log LOG
           
static com.datasalt.pangool.io.Schema SCHEMA
           
 
Constructor Summary
SQLiteOutputFormat(java.lang.String[] initSqlStatements, java.lang.String[] endSqlStatements, int batchSize)
           
 
Method Summary
 org.apache.hadoop.mapreduce.RecordWriter<com.datasalt.pangool.io.ITuple,org.apache.hadoop.io.NullWritable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
           
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputPath
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static org.apache.commons.logging.Log LOG

SCHEMA

public static final com.datasalt.pangool.io.Schema SCHEMA
Constructor Detail

SQLiteOutputFormat

public SQLiteOutputFormat(java.lang.String[] initSqlStatements,
                          java.lang.String[] endSqlStatements,
                          int batchSize)
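As a usage sketch: initSqlStatements run once per partition at database creation, endSqlStatements run at close, and batchSize controls how many statements are committed per batch. The table and index names below are hypothetical, and the snippet assumes the Splout jars are on the classpath.

```java
String[] initSql = { "CREATE TABLE mytable (id INTEGER, name TEXT)" };
String[] endSql  = { "CREATE INDEX idx_id ON mytable(id)" };
// Commit in batches of 1000 statements.
SQLiteOutputFormat outputFormat = new SQLiteOutputFormat(initSql, endSql, 1000);
```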
Method Detail

getRecordWriter

public org.apache.hadoop.mapreduce.RecordWriter<com.datasalt.pangool.io.ITuple,org.apache.hadoop.io.NullWritable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                                                                           throws java.io.IOException,
                                                                                                                                  java.lang.InterruptedException
Specified by:
getRecordWriter in class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<com.datasalt.pangool.io.ITuple,org.apache.hadoop.io.NullWritable>
Throws:
java.io.IOException
java.lang.InterruptedException


Copyright © 2012-2013 Datasalt Systems S.L. All Rights Reserved.