Description

The ParquetFileReceiver writes Domain Attribute values to a file in the Parquet format. Parquet stores data in a flat columnar layout, which typically makes it more efficient in storage and read performance than row-based formats such as CSV.
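To make the term "columnar" concrete, here is a minimal sketch that regroups row-oriented records by column. It uses plain Python lists and is not the actual Parquet encoding; the sample records and the to_columns helper are made up for illustration.

```python
# Illustrative only: plain Python structures, not the real Parquet encoding.
rows = [
    {"id": 1, "firstName": "Ana", "state": "CA"},
    {"id": 2, "firstName": "Ben", "state": "CA"},
    {"id": 3, "firstName": "Cy", "state": "NY"},
]


def to_columns(records):
    """Regroup row-oriented records so each column's values sit together."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)
print(columns["state"])
```

Keeping the values of one column together is what lets a columnar format compress repeated values well and read only the columns a query needs.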


Parameters

The following parameters can be defined for the ParquetFileReceiver. Items with an asterisk (*) are required: 

  • *path - Defines the location to store the newly generated Parquet output file.
  • subDir - Defines the sub-directory under the path to store the newly generated Parquet output file.
  • *fileName - Defines the name of the Parquet output file.
  • *blockSize - Defines the size, in bytes, of a row group buffered in memory while writing. This limits memory usage during the write; larger values consume more memory. The default size is 134217728 bytes (128 MB).
  • *pageSize - Defines the size, in bytes, of a page, which is the unit that compression is applied to. A block is composed of pages, and a page is the smallest unit that must be read in full to access a single record. If this value is too small, compression will deteriorate. The default size is 1048576 bytes (1 MB).
  • *compressionCodecName - Defines the compression algorithm used to compress pages. GenRocket supports UNCOMPRESSED, SNAPPY, and GZIP.
  • *enableValidation - Defines whether schema validation is turned on.
  • *enableDictionary - Defines whether dictionary encoding is enabled. The value must be either true or false.
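The relationships between these parameters can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ParquetReceiverParams and its problems method are hypothetical and not part of GenRocket, the default values for compressionCodecName, enableValidation, and enableDictionary are chosen here for the example, and the rule that pageSize should not exceed blockSize is inferred from the statement that a block is composed of pages.

```python
from dataclasses import dataclass

# Codecs listed in the parameter description above.
SUPPORTED_CODECS = {"UNCOMPRESSED", "SNAPPY", "GZIP"}


@dataclass
class ParquetReceiverParams:
    """Hypothetical container for the parameters; not a GenRocket class."""
    path: str
    fileName: str
    subDir: str = ""
    blockSize: int = 134217728            # documented default (128 * 1024 * 1024)
    pageSize: int = 1048576               # documented default (1024 * 1024)
    compressionCodecName: str = "SNAPPY"  # assumed for this sketch
    enableValidation: bool = True         # assumed for this sketch
    enableDictionary: bool = True         # assumed for this sketch

    def problems(self):
        """Return a list of human-readable problems; empty if none found."""
        found = []
        if not self.path:
            found.append("path is required")
        if not self.fileName:
            found.append("fileName is required")
        if self.blockSize <= 0 or self.pageSize <= 0:
            found.append("blockSize and pageSize must be positive")
        elif self.pageSize > self.blockSize:
            # Assumption: a block is composed of pages, so a page should fit in it.
            found.append("pageSize should not exceed blockSize")
        if self.compressionCodecName not in SUPPORTED_CODECS:
            found.append("unsupported compressionCodecName: " + self.compressionCodecName)
        return found


# Placeholder path and file name, using the documented size defaults.
params = ParquetReceiverParams(path="/home/user/output", fileName="users.parquet")
print(params.problems())
print(params.blockSize // params.pageSize)  # rough size ratio, for intuition only

# A configuration with two mistakes: a zero page size and an unlisted codec.
bad = ParquetReceiverParams(path="/tmp/out", fileName="users.parquet",
                            pageSize=0, compressionCodecName="LZ4")
print(bad.problems())
```

With the documented defaults, the block size is 128 times the page size. That ratio is only a rough guide to how the two settings relate, since actual page counts depend on the data and the number of columns.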


 

Property Keys

The Receiver defines three property keys that can be modified on any of its associated Domain Attributes:

  • columnName - Defines the column name as it will be output into the Parquet file.
  • include - Determines whether the Attribute is included as a column in the output.
  • columnType - Defines the column data type.
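As a rough sketch of how the three property keys could shape the output, the Python below derives a column list from per-Attribute key values. Everything in it is hypothetical: the User Domain, its Attribute names, the output_schema helper, and the columnType values, which are placeholders rather than confirmed GenRocket type names.

```python
# Hypothetical property key values for three Attributes of a made-up User Domain.
# The columnType values are placeholders, not confirmed GenRocket type names.
attribute_keys = [
    {"attribute": "User.id", "columnName": "user_id",
     "include": True, "columnType": "long"},
    {"attribute": "User.password", "columnName": "password",
     "include": False, "columnType": "string"},
    {"attribute": "User.emailAddress", "columnName": "email",
     "include": True, "columnType": "string"},
]


def output_schema(keys):
    """Keep Attributes with include set to True, as (columnName, columnType) pairs."""
    return [(k["columnName"], k["columnType"]) for k in keys if k["include"]]


schema = output_schema(attribute_keys)
print(schema)
```

In this sketch the password Attribute is left out of the output because its include key is false, and the emailAddress Attribute appears under the shorter column name email.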


Example of Setting Receiver Property Key Values

The example image below shows the property key view for the set of Attributes of a Domain using the ParquetFileReceiver.