The ParquetFileReceiver writes Domain Attribute values to a file in the Parquet format. Parquet stores data in a flat columnar layout, which is generally more efficient in storage and read performance than row-based text formats such as CSV.
The following parameters can be defined for the ParquetFileReceiver. Items with an asterisk (*) are required:
- *path - Defines the location to store the newly generated Parquet output file.
- subDir - Defines the sub-directory under the path to store the newly generated Parquet output file.
- *fileName - Defines the name of the Parquet output file.
- *blockSize - Defines the size, in bytes, of a row group buffered in memory while writing. This limits memory usage during the write; larger values consume more memory. The default size is 134217728 bytes (128 MiB).
- *pageSize - Defines the page size, in bytes. A block is composed of pages, and a page is the smallest unit that must be read in full to access a single record. Pages are also the unit of compression, so a value that is too small degrades compression. The default size is 1048576 bytes (1 MiB).
- *compressionCodecName - Defines the compression algorithm used to compress pages. The compression algorithms that GenRocket supports are UNCOMPRESSED, SNAPPY, and GZIP.
- enableValidation - Specifies whether schema validation should be turned on. The default value is "false."
- *enableDictionary - Enables or disables dictionary encoding. The value must be either true or false.
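The parameter rules above can be summarized as a small check. The sketch below is only an illustration in plain Python: the function name check_receiver_params and the dictionary layout are hypothetical and are not part of GenRocket. It assumes the documented defaults are pre-filled when a value is not supplied.

```python
# Hypothetical helper, not a GenRocket API. It mirrors the parameter list above.
REQUIRED = ("path", "fileName", "blockSize", "pageSize",
            "compressionCodecName", "enableDictionary")
# Defaults taken from the parameter descriptions (sizes are in bytes).
DEFAULTS = {"blockSize": 134217728, "pageSize": 1048576, "enableValidation": False}
SUPPORTED_CODECS = {"UNCOMPRESSED", "SNAPPY", "GZIP"}


def check_receiver_params(params):
    """Return a list of problems; an empty list means the values look consistent."""
    merged = {**DEFAULTS, **params}  # assumption: defaults apply when a value is omitted
    problems = []
    for name in REQUIRED:
        if name not in merged or merged[name] in (None, ""):
            problems.append(f"missing required parameter: {name}")
    codec = merged.get("compressionCodecName")
    if codec not in (None, "") and codec not in SUPPORTED_CODECS:
        problems.append(f"unsupported compressionCodecName: {codec}")
    for name in ("blockSize", "pageSize"):
        value = merged.get(name)
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            problems.append(f"{name} must be a positive number of bytes")
    for name in ("enableValidation", "enableDictionary"):
        if name in merged and not isinstance(merged[name], bool):
            problems.append(f"{name} must be true or false")
    return problems


example = {
    "path": "/tmp/output",            # illustrative path
    "fileName": "Address.parquet",    # illustrative file name
    "compressionCodecName": "SNAPPY",
    "enableDictionary": True,
}
print(check_receiver_params(example))  # prints []
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue with a configuration at once.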
The Receiver defines three property keys that can be modified on any of its associated Domain Attributes:
- columnName - Defines the column name as it will be output into the Parquet file.
- include - Determines if the Attribute will be included as a column in the output.
- columnType - Defines the column data type.
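As a rough illustration of how these three property keys shape the output, the sketch below turns a list of Attributes into the columns that would be written. Everything in it is hypothetical: the function build_columns, the dictionary layout, and the type names such as "string" and "int32" are placeholders and may not match the values GenRocket accepts.

```python
# Hypothetical model of the three property keys; not a GenRocket API.
def build_columns(attributes):
    """Return (columnName, columnType) pairs for every Attribute marked for inclusion."""
    columns = []
    for attr in attributes:
        if not attr["include"]:
            continue  # include = false: the Attribute is left out of the output
        columns.append((attr["columnName"], attr["columnType"]))
    return columns


# Illustrative Attributes; the type names are placeholders.
attributes = [
    {"columnName": "id", "include": True, "columnType": "int32"},
    {"columnName": "street", "include": True, "columnType": "string"},
    {"columnName": "internalNote", "include": False, "columnType": "string"},
]
print(build_columns(attributes))  # prints [('id', 'int32'), ('street', 'string')]
```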
Example of Setting Receiver Property Key Values
The example image below shows the property key view for the set of Attributes of a Domain using the ParquetFileReceiver.
File Config Tab
The File Config Tab is used to configure what event will trigger file creation and the naming configuration for generated files.
Constant has been chosen as the event in the example below, so a new file is created for every 100 records. Each file follows the defined naming convention, with a number that increments for each generated file (e.g., Address1.parquet, Address2.parquet, Address3.parquet).
Note: For more information, see the documentation on how to use the File Config Tab.
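The file rollover in this example comes down to simple arithmetic. The sketch below is an illustration only: the function name file_name_for_record is hypothetical, and it assumes records are counted from 1 and that the first 100 records go to the first file.

```python
# Hypothetical illustration of the Constant event: one file per fixed number of records.
def file_name_for_record(record_number, records_per_file=100,
                         base_name="Address", extension=".parquet"):
    """Return the output file name for a 1-based record number."""
    if record_number < 1:
        raise ValueError("record_number must be 1 or greater")
    # Records 1-100 map to file 1, records 101-200 to file 2, and so on.
    file_number = (record_number - 1) // records_per_file + 1
    return f"{base_name}{file_number}{extension}"


print(file_name_for_record(100))  # prints Address1.parquet
print(file_name_for_record(101))  # prints Address2.parquet
```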
Directory Config Tab
The Directory Config Tab is used to configure what event will trigger directory creation and the naming configuration for generated directories.
Constant has been chosen as the event in the example below. A directory will be created for every ten files that are generated. Each created directory will have the defined naming convention (e.g., AddressFiles1, AddressFiles2, AddressFiles3). The number will increment for each generated directory.
Note: For more information, see the documentation on how to use the Directory Config Tab.
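The directory rollover in this example follows the same pattern as the file rollover. The sketch below is an illustration only: the function name directory_for_file is hypothetical, and it assumes files are counted from 1 across the whole run and that the first ten files go to the first directory.

```python
# Hypothetical illustration of the Constant event: one directory per fixed number of files.
def directory_for_file(file_number, files_per_directory=10, base_name="AddressFiles"):
    """Return the directory name for a 1-based file number."""
    if file_number < 1:
        raise ValueError("file_number must be 1 or greater")
    # Files 1-10 map to directory 1, files 11-20 to directory 2, and so on.
    directory_number = (file_number - 1) // files_per_directory + 1
    return f"{base_name}{directory_number}"


print(directory_for_file(10))  # prints AddressFiles1
print(directory_for_file(11))  # prints AddressFiles2
```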