Description

The ParquetSegmentMergeReceiver merges different segments created by the SegmentDataCreatorReceiver to generate complex nested Parquet output. 


This article shows how to generate a complex, nested Parquet file using the GenRocket web platform. Complete the following steps:


Step 1 - Create a Project in the GenRocket web platform

For this example, a Project titled "ParquetDemo" has been created with the default Project Version.


Click on New Project within the Projects pane to create a new project. For step-by-step instructions, please click here.



Step 2 - Import the JSON File

Parquet has a JSON-like data model and can, therefore, be represented as JSON. Each JSON object can be represented as a GenRocket Domain, and each Domain generates a segment of data for that Domain only. All the segments are then merged into a single Parquet file by the ParquetSegmentMergeReceiver with the help of a Merge Domain.
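To make the "one JSON object, one segment" idea concrete, the sketch below walks a small nested JSON record and lists the objects it contains. The record, its field names, and the helper list_segments are hypothetical and only illustrate the concept; this is not GenRocket's import logic, and the Domain names GenRocket actually creates may differ.

```python
import json

# Hypothetical nested record; the field names are illustrative only.
sample = json.loads("""
{
  "id": 1,
  "name": "Alice",
  "address": {"city": "Austin", "zip": "78701"},
  "orders": [{"orderId": 10, "total": 25.5}]
}
""")


def list_segments(obj, name="root"):
    """Return the name of each JSON object found, parent first.

    Each name stands in for one segment (one Domain) in the sense
    described above. Arrays of objects count once, using their first item.
    """
    segments = [name]
    for key, value in obj.items():
        if isinstance(value, dict):
            segments.extend(list_segments(value, key))
        elif isinstance(value, list) and value and isinstance(value[0], dict):
            segments.extend(list_segments(value[0], key))
    return segments


print(list_segments(sample))  # ['root', 'address', 'orders']
```

In this sketch the record would map to three segments, which a merge step would then combine into one nested output.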


To set up the Project, you can import a JSON file with the required segments for generating the Nested Parquet File Format. Complete the following steps to import the JSON file: 

  • Click on the New Domain menu within the Project Dashboard.



  • Select the Import from JSON option.



  • Click on the Choose File button. Browse to the location of the file and select the file.



  • Choose Parquet for the Output File Format and click the Save button.



This will create all the Domains needed to generate the Nested Parquet File, including the Merge Domain required to merge the individual segments. 


A Scenario will be automatically created for each Domain in the Project and a Scenario Chain will be created to run all Scenarios in sequence to generate the required data. 



The import process also creates a Configuration File, which the Merge Domain requires.



Note: It may take a few minutes for the Domains, Scenarios, Scenario Chain, and Configuration File to be automatically created. 


Step 3 - Download the ParquetConfig.xml File

Select the Configuration Management tab within the Project Dashboard and then click on the Download (Cloud) icon to download the ParquetConfig.xml file to your computer. 



The ParquetConfig.xml file must be placed in a Config folder within your resource output directory. 
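If you prefer to script this step, the following is a minimal sketch that copies the downloaded file into a Config folder under a resource output directory. The helper place_config and all directory paths are assumptions made for illustration; substitute the resource output directory configured for your own Project. The demonstration runs in a temporary directory with placeholder file content.

```python
import shutil
import tempfile
from pathlib import Path


def place_config(downloaded, resource_output_dir, folder_name="Config"):
    """Copy the downloaded config file into <resource_output_dir>/<folder_name>."""
    config_dir = Path(resource_output_dir) / folder_name
    config_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    target = config_dir / Path(downloaded).name
    shutil.copy2(downloaded, target)
    return target


# Demonstration in a throwaway directory so no real paths are touched.
with tempfile.TemporaryDirectory() as tmp:
    downloads = Path(tmp) / "Downloads"
    downloads.mkdir()
    source = downloads / "ParquetConfig.xml"
    source.write_text("<config/>")  # placeholder content, not the real file
    placed = place_config(source, Path(tmp) / "output")
    relative = placed.relative_to(tmp).as_posix()
    copied_ok = placed.exists()

print(relative)  # output/Config/ParquetConfig.xml
```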



Step 4 - Download the Scenario Chain

Click on the Download (Cloud) icon within the Scenario Chains pane to download the Scenario Chain to your local computer. 


Step 5 - Run the Scenario Chain

Note: Before running the Scenario Chain, ensure that the GenRocket Runtime, the Engine Jar, and the Receiver Jar have been updated. For more information on how to update individual GenRocket Jars, click here.


Run the following command in a Command Window or Terminal Session:


genrocket -r <ScenarioChainName>.grs



<ScenarioChainName> should be replaced with the actual name of the Scenario Chain. The command for this example would appear as shown below: 


genrocket -r Parquet1ScenarioChain.grs
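
The same command can also be launched from a script. The sketch below builds the argument list from a Scenario Chain name and, when asked, runs it with Python's subprocess module. The helpers build_command and run_scenario_chain are hypothetical, and the sketch assumes that genrocket is available on your PATH as a directly executable command and that the .grs file is in the working directory.

```python
import subprocess


def build_command(scenario_chain_name):
    """Return the argument list for: genrocket -r <ScenarioChainName>.grs"""
    return ["genrocket", "-r", f"{scenario_chain_name}.grs"]


def run_scenario_chain(scenario_chain_name, working_dir="."):
    """Run the Scenario Chain and raise CalledProcessError on a non-zero exit.

    Assumes `genrocket` can be started directly from PATH; on some systems
    a launcher script may need to be invoked differently.
    """
    return subprocess.run(
        build_command(scenario_chain_name),
        cwd=working_dir,
        check=True,
    )


# Only print the command here; call run_scenario_chain(...) to execute it.
print(build_command("Parquet1ScenarioChain"))
# ['genrocket', '-r', 'Parquet1ScenarioChain.grs']
```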

Step 6 - View Generated Files

This example will generate one nested Parquet file named "PARQUET-1.parquet".



To validate the Parquet file, you can download Apache Spark by clicking here and then selecting the Download Spark link.
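
Once Spark is available, one way to check the generated file is to read it with PySpark and inspect its schema and row count. The sketch below is an illustration, not a GenRocket-provided tool: the helpers summarize_parquet and main are hypothetical, PySpark is assumed to be installed, and the file is assumed to be in the current working directory.

```python
def summarize_parquet(spark, path):
    """Read a Parquet file and return its columns, row count, and schema."""
    df = spark.read.parquet(path)
    return {
        "columns": df.columns,
        "rows": df.count(),
        "schema": df.schema.simpleString(),  # nested fields appear as struct<...>
    }


def main(path="PARQUET-1.parquet"):
    # Imported here so the helper above can be loaded without PySpark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ParquetCheck").getOrCreate()
    try:
        summary = summarize_parquet(spark, path)
        for key, value in summary.items():
            print(f"{key}: {value}")
    finally:
        spark.stop()  # release the local Spark session
```

Call main() on a machine where PySpark is installed. If the file cannot be read, or the schema does not show the expected nested structure, the merge step is the first place to look.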



Sample Output