Description

The ORCSegmentMergeReceiver merges different segments created by the SegmentDataCreatorReceiver to generate complex nested ORC output.



When should this Receiver be used? 

  • Any time you want to generate nested ORC output.


When should this Receiver not be used? 

  • Any time you want to generate another type of nested output.


Are any additional items required to use this Receiver? 

  • SegmentDataCreatorReceiver - Must be assigned to each Domain, except the Merge Domain.
    • Note: The ORCSegmentMergeReceiver should only be assigned to the Merge Domain that merges the generated segments.

  • Configuration File - Used to determine the ORC output format. This is typically named "config.xml" but can be named differently.
    • Note: When named differently, the name must also be changed for the configName parameter within the Receiver.


List of Steps to Generate Nested ORC Format

  1. Set up a Project with Domains, Parent-Child Relationships, and Scenarios.
  2. For each Domain except the Merge Domain, add the SegmentDataCreatorReceiver.
  3. Create a Merge Domain with only an id Attribute.
    • Note: This may have already been done if Domains were imported.
  4. Add the ORCSegmentMergeReceiver to the Merge Domain and configure it, as discussed in this article. 
  5. Create a Scenario Chain with all Domain Scenarios. Make sure the Merge Scenario is in the last position.
  6. Create a Configuration File as discussed in this article. 
  7. Download the Scenario Chain to your local computer.
  8. Download the Configuration file to your local computer. 
    • Note: The configuration file must be placed in the location defined by the Receiver's configPath parameter (and configSubDir, if it is used).


  9. Open a Command Prompt or Terminal window and use the genrocket -r <ScenarioChainName.grs> command.
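
The last steps can also be scripted. The sketch below is a minimal illustration, not part of GenRocket: it checks that the downloaded configuration file is present, then runs the genrocket -r command from step 9. The function names and any file names passed to them are hypothetical placeholders.

```python
import shutil
import subprocess
from pathlib import Path


def build_genrocket_command(chain_file: str) -> list[str]:
    """Return the argument list for `genrocket -r <chain_file>`."""
    return ["genrocket", "-r", chain_file]


def run_chain(chain_file: str, config_file: Path) -> int:
    """Check the prerequisites from steps 7 and 8, then run the Scenario Chain."""
    # The configuration file must already be in the configPath location.
    if not config_file.is_file():
        raise FileNotFoundError(f"Configuration file not found: {config_file}")
    # The genrocket command must be available on the PATH.
    if shutil.which("genrocket") is None:
        raise RuntimeError("The genrocket command is not on the PATH")
    completed = subprocess.run(build_genrocket_command(chain_file), check=False)
    return completed.returncode
```

A call such as run_chain("MergeScenarioChain.grs", Path("config/config.xml")) would use your own Scenario Chain and configuration file names; both names here are made up for illustration.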


Receiver Parameters

The ORCSegmentMergeReceiver uses the following parameters. Items marked with an asterisk (*) are required.

  • outputPath* - Defines the location to store the newly generated nested ORC file.
  • outputSubDirectory* - Defines the prefix name of subdirectories that are auto-created under the outputPath and then appended with a number (e.g., data1, data2, data3).
  • configPath* - Defines the location where the configuration file is stored. 
  • configSubDir - A subdirectory under the configPath directory where configuration and template files are stored. 
  • configName* - Defines the name of the configuration file. 
  • includeRootName* - Defines whether to include the root Domain name in the output.
  • filesPerOutputSubDir - Defines the number of files to be generated per output subdirectory. 
  • segmentPath* - Defines the path to the segment directory where all segment subdirectories can be found. 
  • segmentSubDirectory* - Defines the subdirectory under the segmentPath where segment files can be found. 
  • nullValue* - Defines the value for null. 
  • overrideFileName - Overrides the output file name given in the configuration file. It also allows the output file name to be modified through the Engine API.
  • deleteOutputSubDir - Defines whether to delete the output subdirectory before generating a new output file.
  • outputFormatType - Defines whether the output file format is expanded or collapsed.
  • blockSize* - Defines the size of a row group buffered in memory, which limits memory usage while writing to the file. Larger values consume more memory when writing. The default size is 134217728 bytes (128 MiB).
  • recordsPerFile - Defines the number of records in each output file. 
  • deleteSegmentDir - Defines whether to delete the segments directory or not.
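
As a quick way to review a parameter set before running a Scenario Chain, the sketch below lists the required parameters from the list above and reports any that are missing. It is a plain Python illustration and does not call any GenRocket API; all values in example_params are hypothetical placeholders.

```python
# Required parameters, taken from the asterisked items in the list above.
REQUIRED_PARAMETERS = (
    "outputPath", "outputSubDirectory", "configPath", "configName",
    "includeRootName", "segmentPath", "segmentSubDirectory",
    "nullValue", "blockSize",
)

# The documented default block size: 128 * 1024 * 1024 = 134217728 bytes.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024


def missing_required(params: dict) -> list[str]:
    """Return the required parameter names that are absent or empty."""
    return [
        name for name in REQUIRED_PARAMETERS
        if params.get(name) in (None, "")
    ]


# Hypothetical values, for illustration only.
example_params = {
    "outputPath": "/home/user/output",
    "outputSubDirectory": "data",
    "configPath": "/home/user/config",
    "configName": "config.xml",
    "includeRootName": False,
    "segmentPath": "/home/user/segments",
    "segmentSubDirectory": "segment",
    "nullValue": "NULL",
    "blockSize": DEFAULT_BLOCK_SIZE,
}
```

Calling missing_required(example_params) returns an empty list when every required parameter has a value.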


Receiver Attributes Property Keys

There are no property keys necessary for this Receiver.


Configuration File

The ORCSegmentMergeReceiver requires a configuration file to help facilitate the formatting of the data output.


Example Configuration File

The example configuration file below defines the following:

  • fileNameSegments - The fileNameSegment tag defines the file naming convention for the ORC file that is being generated. For example:
    • Output-1.orc, Output-2.orc, and so on
  • segments - The segment files listed in the segment tag are loaded and used to create the merged output.
  • segmentsHierarchy - Defines the hierarchical structure of the Domains.
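
To make the idea of a segment hierarchy concrete, the sketch below nests flat child rows under their parent rows and builds file names in the Output-1.orc style. It is a conceptual illustration only, not the Receiver's implementation; the Domain names (customers, orders), the customerId key, and both function names are assumptions made for this example.

```python
from collections import defaultdict


def nest_children(parents, children, parent_key, child_field):
    """Attach each child row to its parent row under `child_field`."""
    grouped = defaultdict(list)
    for child in children:
        row = dict(child)
        # Remove the parent reference and group the row by it.
        grouped[row.pop(parent_key)].append(row)
    return [
        {**parent, child_field: grouped.get(parent["id"], [])}
        for parent in parents
    ]


def output_file_name(index, prefix="Output"):
    """Build a name following the Output-1.orc, Output-2.orc pattern."""
    return f"{prefix}-{index}.orc"


# Hypothetical segment data for a parent Domain and a child Domain.
customers = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
orders = [{"id": 10, "customerId": 1}, {"id": 11, "customerId": 1}]
nested = nest_children(customers, orders, "customerId", "orders")
```

Here the first customer ends up with two nested orders and the second with an empty list, which is the kind of parent-child shape the hierarchy describes.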

Steps to Create a Configuration File

  • Within the Project Dashboard, select the Configuration Management Tab in the Management Pane.
  • Then click on the New Configuration button.


  • Select the ORC configuration type and click Select.


  • A form will open. Enter the details and select the Segments/Domains from the drop-down. The form includes the following fields:
    • Name - Name used to identify the configuration file within the Project. 
    • Config File Name - Name the Receiver will look for when generating test data. It should be used in the configName parameter for the Receiver. 
    • Output File Name Format - Defines the naming format used for the generated output file(s).
    • Segment Files - Defines the segments that will be used to create nested ORC output.


  • Click the Save button once finished. 


  • Select a Domain from the drop-down. 


  • Selecting a Domain will display its Attributes.
     
  • Users can select one of three options for how selected Domain objects are shown: 
    • List Always - Domain object always shows in a list. 
    • List Only When Greater Than 1 - Domain object shows as a list only when the loop count is greater than one. 
    • List of Literals - The Domain data displays as a list of literals. Only one Attribute within the Domain can be selected when this option is selected.



  • Select the Attributes to be included in the final output file for the selected Domain. If an Attribute is unchecked, it will not be included in the generated output. 


  • Optionally, users can use the Array and Null checkboxes for individual Attributes within a selected Domain. 


    Note: For Attributes generating NULL values, remember to also enter the appropriate value for the nullValue Receiver parameter. See the next section of the article for more details.

  • For each Attribute, users can select a Data Type. The default selection is "string". 


    Options include string, boolean, double, float, tinyint, smallint, int, long, date, timestamp, and decimal. 



  • If more than one Domain and its Attributes will be added, use the Save & Next button. Click the Save button once the last Domain and Attributes have been added.



  • Arrange the segments in the appropriate hierarchy by clicking and dragging them.
  • Click Done once finished.



  • The configuration will be ready to download. Click the Cloud icon within the Configuration Management Tab.
  • Download the file and place it in the directory defined for the Receiver's configPath parameter.
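
The three list options described in the steps above can be pictured with a small sketch. It shows how the same child rows might be shaped under each option; this is an interpretation of the option descriptions, not the Receiver's code. The function name shape_child is hypothetical, and the number of rows stands in for the loop count.

```python
def shape_child(rows, option, literal_attribute=None):
    """Shape child Domain rows according to the selected list option."""
    if option == "List Always":
        # Always a list, even for a single row.
        return list(rows)
    if option == "List Only When Greater Than 1":
        # A single row is shown as one object; more rows become a list.
        # The article does not describe the zero-row case; this sketch
        # returns an empty list for it.
        return rows[0] if len(rows) == 1 else list(rows)
    if option == "List of Literals":
        # Only one Attribute can be selected, so keep just its values.
        if literal_attribute is None:
            raise ValueError("List of Literals needs exactly one Attribute")
        return [row[literal_attribute] for row in rows]
    raise ValueError(f"Unknown option: {option}")


# Hypothetical child rows.
one_row = [{"id": 1, "phone": "555-0100"}]
two_rows = one_row + [{"id": 2, "phone": "555-0101"}]
```

For example, shape_child(one_row, "List Only When Greater Than 1") yields a single object, while the same call with two_rows yields a list of two objects.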


What Should I Do When Domain Attribute(s) Will Be Generating Null Values?

When Domain Attributes generate null values, a change must be made to the ORCSegmentMergeReceiver parameters and for the specific Domain Attribute(s) within the configuration file. 


Step 1 - Enter the Null Value for the nullValue parameter

For the nullValue parameter, enter the null value as it will be generated.



Step 2 - Select the Null Option for Each Domain Attribute in the Configuration File

When adding or editing elements in the Configuration File, select the Domain and then select the Null checkbox for each Attribute that will be generating null values.
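
The two steps work together: the nullValue parameter says what a generated null looks like, and the Null checkbox says which Attributes may contain it. The sketch below illustrates that relationship in plain Python, assuming a string placeholder such as "NULL"; it is not the Receiver's implementation, and the function and Attribute names are hypothetical.

```python
def apply_null_value(record, null_value, nullable_attributes):
    """Replace the null placeholder with None for Attributes flagged as Null."""
    return {
        name: None if name in nullable_attributes and value == null_value else value
        for name, value in record.items()
    }


# Hypothetical generated row: middleName is flagged Null, note is not.
row = {"id": "1", "middleName": "NULL", "note": "NULL"}
cleaned = apply_null_value(row, "NULL", {"middleName"})
```

In this example only middleName becomes a real null; note keeps the literal text because its Null checkbox was not selected.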