Description

The ParquetPartitionFileMergeReceiver merges partitioned Parquet data from many Parquet files into a single Parquet file. Parquet is a column-oriented data storage format (similar to RCFile and ORC), and the partitioned data does not require nesting during the merge process.


Partition data generated by GenRocket Partition Receivers is expected to follow a common directory structure: /home/user/outputPath/outputSubDir/serverN/instanceN/dataFileNameN
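
To illustrate the directory structure, here is a minimal Python sketch that lists the partition files found under a base directory. It is not part of GenRocket. The function name find_partition_files is hypothetical, and the glob pattern assumes that server and instance directories start with "server" and "instance" and that each data file name starts with the configured dataFileName; the source only shows the N suffixes, not their exact format.

```python
from pathlib import Path
from typing import List


def find_partition_files(base_dir: str, data_file_name: str) -> List[Path]:
    """Return partition files under base_dir/serverN/instanceN/, sorted by path.

    Assumption: a data file is any regular file whose name starts with
    data_file_name (the "dataFileNameN" in the documented layout).
    """
    base = Path(base_dir)
    # One directory level for the server, one for the instance, then the file.
    pattern = f"server*/instance*/{data_file_name}*"
    return sorted(p for p in base.glob(pattern) if p.is_file())
```

For example, find_partition_files("/home/user/outputPath/outputSubDir", "dataFileName") would return every matching file across all server and instance directories, in a stable order.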


Parameters

The following parameters can be defined for the ParquetPartitionFileMergeReceiver. Items with an asterisk (*) are required. 

  • outputPath* - Defines the base path where data files to be merged are stored. 
  • outputSubDir - Defines an optional subdirectory, under the outputPath, where data files to be merged are stored. 
  • mergeSubDir - Defines a subdirectory, under the outputPath, where the merged file is to be stored. 
  • mergeFileName* - Defines the name of the file that will store the merged data. 
  • dataFileName* - Defines the name of the data files that were generated under the standard partition directory structure:

    /home/user/outputPath/outputSubDir/serverN/instanceN/dataFileNameN
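
The way these parameters combine into file system locations can be sketched in Python as follows. This is an illustration only, not the Receiver's implementation. The function name resolve_merge_paths is hypothetical, and the fallback to outputPath when outputSubDir or mergeSubDir is left empty is an assumption, since the parameter descriptions do not state what happens in that case.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_merge_paths(
    output_path: str,
    merge_file_name: str,
    output_sub_dir: Optional[str] = None,
    merge_sub_dir: Optional[str] = None,
) -> Tuple[Path, Path]:
    """Return (input_root, merged_file) for the given parameter values.

    Assumption: an omitted subdirectory means "use outputPath itself".
    """
    base = Path(output_path)
    # Root under which the serverN/instanceN directories are expected.
    input_root = base / output_sub_dir if output_sub_dir else base
    # Directory that will hold the merged file.
    merge_dir = base / merge_sub_dir if merge_sub_dir else base
    return input_root, merge_dir / merge_file_name
```

With outputPath "/home/user/output", outputSubDir "data", mergeSubDir "merged" and mergeFileName "all.parquet", the sketch looks for partition data under /home/user/output/data and places the merged file at /home/user/output/merged/all.parquet.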



Receiver Attribute Property Keys

There are no property keys necessary for this Receiver.