Description
The ParquetPartitionFileMergeReceiver merges Parquet partition data from many Parquet files into one Parquet file. Parquet is a column-oriented data storage format (comparable to RCFile and ORC), and partitioned Parquet data does not require nesting during the merge process.
Partition data generated by GenRocket Partition Receivers is expected to follow a common directory structure: /home/user/outputPath/outputSubDir/serverN/instanceN/dataFileNameN
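As an illustration of the directory structure above, the following Python sketch collects the partition data files that a merge would read. It is not the Receiver's implementation. The function name find_partition_files is hypothetical, and the sketch assumes that every data file sits exactly two levels down (serverN/instanceN) and that its name starts with the dataFileName value.

```python
from pathlib import Path


def find_partition_files(output_path, data_file_name, output_sub_dir=None):
    """Return the partition data files under <base>/server*/instance*/.

    Hypothetical helper: the name and signature are illustrative only.
    """
    base = Path(output_path)
    if output_sub_dir:
        # outputSubDir is optional; when set, the files live beneath it.
        base = base / output_sub_dir
    # Assumption: each file name begins with the dataFileName value.
    pattern = f"server*/instance*/{data_file_name}*"
    # Sort so the files are always visited in the same order.
    return sorted(p for p in base.glob(pattern) if p.is_file())
```

Calling find_partition_files("/home/user/outputPath", "data.parquet", "outputSubDir") would return the matching files in sorted order. Files placed directly under outputSubDir, outside a serverN/instanceN directory, are ignored by this sketch.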
Parameters
The following parameters can be defined for the ParquetPartitionFileMergeReceiver. Items with an asterisk (*) are required.
- outputPath* - Defines the base path where data files to be merged are stored.
- outputSubDir - Defines an optional subdirectory, under the outputPath, where data files to be merged are stored.
- mergeSubDir - Defines a subdirectory, under the outputPath, where the merged file will be stored.
- mergeFileName* - Defines the name of the file that will store the merged data.
- dataFileName* - Defines the name of the data files that were generated under the standard partition directory structure:
/home/user/outputPath/outputSubDir/serverN/instanceN/dataFileNameN
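To show how the parameters above relate to one another, here is a minimal Python sketch that holds them in one object and derives the source and target locations. The class MergeParams and its methods are hypothetical and are not part of GenRocket. The sketch also assumes that, when mergeSubDir is not set, the merged file is written directly under outputPath, which the parameter descriptions do not confirm.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class MergeParams:
    """Hypothetical container mirroring the Receiver parameters."""
    output_path: str                      # outputPath* (required)
    merge_file_name: str                  # mergeFileName* (required)
    data_file_name: str                   # dataFileName* (required)
    output_sub_dir: Optional[str] = None  # outputSubDir (optional)
    merge_sub_dir: Optional[str] = None   # mergeSubDir

    def validate(self) -> None:
        # Reject empty values for the parameters marked with an asterisk.
        required = ("output_path", "merge_file_name", "data_file_name")
        missing = [name for name in required if not getattr(self, name)]
        if missing:
            raise ValueError(f"missing required parameters: {missing}")

    def source_dir(self) -> PurePosixPath:
        # Directory that contains the serverN/instanceN folders.
        base = PurePosixPath(self.output_path)
        return base / self.output_sub_dir if self.output_sub_dir else base

    def merged_file(self) -> PurePosixPath:
        # Assumption: without mergeSubDir, the file goes under outputPath.
        base = PurePosixPath(self.output_path)
        if self.merge_sub_dir:
            base = base / self.merge_sub_dir
        return base / self.merge_file_name
```

With outputPath set to /home/user/outputPath, mergeSubDir set to merged, and mergeFileName set to merged.parquet, merged_file() resolves to /home/user/outputPath/merged/merged.parquet.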
Receiver Attribute Property Keys
There are no property keys necessary for this Receiver.