The ParquetFileReceiver writes Domain Attribute values to a file in the Parquet format. Parquet stores data in a flat columnar layout, which generally makes it more efficient in storage and read performance than row-based formats such as CSV.
The following parameters can be defined for the ParquetFileReceiver. Items with an asterisk (*) are required:
- *path - Defines the location to store the newly generated Parquet output file.
- subDir - Defines the sub-directory under the path to store the newly generated Parquet output file.
- *fileName - Defines the name of the Parquet output file.
- *blockSize - Defines the size of a row group buffered in memory while writing. This bounds memory usage during the write; larger values consume more memory. The default size is 134217728 bytes (128 MB).
- *pageSize - Defines the size of a page, the unit at which compression is applied. A block is composed of pages, and a page is the smallest unit that must be read fully to access a single record. If this value is too small, compression becomes less effective. The default size is 1048576 bytes (1 MB).
- *compressionCodecName - Defines the compression codec used to compress pages. The values that GenRocket supports are UNCOMPRESSED, SNAPPY, and GZIP.
- *enableValidation - Specifies whether schema validation should be turned on.
- *enableDictionary - Specifies whether dictionary encoding should be turned on. The value must be either true or false.
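The required parameters and allowed values listed above can be summarized as a small pre-flight check. This is a minimal sketch in Python and not part of GenRocket: the `check_parameters` helper, the dictionary layout, and the example path and file name are all hypothetical, chosen only to illustrate which parameters are required and which values are accepted.

```python
# Hypothetical helper: GenRocket does not ship this function.
# It only mirrors the parameter rules described in the list above.
REQUIRED = (
    "path", "fileName", "blockSize", "pageSize",
    "compressionCodecName", "enableValidation", "enableDictionary",
)
SUPPORTED_CODECS = {"UNCOMPRESSED", "SNAPPY", "GZIP"}


def check_parameters(params):
    """Return a list of problems found in a parameter dict (empty if none)."""
    problems = []
    # Every starred parameter must be present and non-empty.
    for key in REQUIRED:
        if params.get(key) in ("", None):
            problems.append(f"missing required parameter: {key}")
    # The codec must be one of the three supported names.
    codec = params.get("compressionCodecName")
    if codec is not None and codec not in SUPPORTED_CODECS:
        problems.append(f"unsupported compressionCodecName: {codec}")
    # Boolean switches must be real booleans, not strings such as "yes".
    for key in ("enableValidation", "enableDictionary"):
        if key in params and not isinstance(params[key], bool):
            problems.append(f"{key} must be true or false")
    # Sizes are byte counts, so they must be positive integers.
    for key in ("blockSize", "pageSize"):
        value = params.get(key)
        if value is not None and (
            isinstance(value, bool) or not isinstance(value, int) or value <= 0
        ):
            problems.append(f"{key} must be a positive number of bytes")
    return problems


# Example values; the path and file name are placeholders.
example = {
    "path": "/tmp/output",
    "subDir": "parquet",             # optional
    "fileName": "users.parquet",
    "blockSize": 134217728,          # documented default, 128 MB
    "pageSize": 1048576,             # documented default, 1 MB
    "compressionCodecName": "SNAPPY",
    "enableValidation": True,
    "enableDictionary": True,
}

problems = check_parameters(example)
# Ratio of the default block size to the default page size.
size_ratio = example["blockSize"] // example["pageSize"]
```

With the documented defaults, the block size is 128 times the page size, which gives a rough sense of how the two settings relate.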
The Receiver defines three property keys that can be modified on any of its associated Domain Attributes:
- columnName - Defines the column name as it will be output into the Parquet file.
- include - Determines if the Attribute will be included as a column in the output.
- columnType - Defines the column data type.
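The effect of the three property keys can be pictured with a short sketch. This is an illustration only, not GenRocket's implementation: the `output_columns` function, the dictionary layout, the Attribute names, and the type names `INT64` and `STRING` are hypothetical placeholders, and the actual values accepted for columnType may differ.

```python
# Hypothetical model of how the property keys shape the output columns.
def output_columns(attributes):
    """Return (columnName, columnType) pairs for Attributes with include set."""
    columns = []
    for attr in attributes:
        # Attributes with include set to False are left out of the file.
        if not attr["include"]:
            continue
        columns.append((attr["columnName"], attr["columnType"]))
    return columns


# Placeholder Attributes; names and types are made up for the example.
attributes = [
    {"attribute": "id", "columnName": "user_id", "include": True, "columnType": "INT64"},
    {"attribute": "ssn", "columnName": "ssn", "include": False, "columnType": "STRING"},
    {"attribute": "emailAddress", "columnName": "email", "include": True, "columnType": "STRING"},
]

columns = output_columns(attributes)
```

In this sketch, the `ssn` Attribute is dropped because include is false, and the remaining Attributes appear under their columnName values rather than their Attribute names.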
Example of Setting Receiver Property Key Values
The example image below shows the property key view for the set of Attributes of a Domain using the ParquetFileReceiver.