Description
You can use Data Column Profiling to automatically identify columns that contain Personally Identifiable Information (PII) or other sensitive data, so that they can be selected for Synthetic Data Replacement (SDR).
For a selected table in G-Migration+, GenRocket checks each column name to see whether it CONTAINS a data column name defined by GenRocket or by your organization. Users then review the matches and select any columns that require SDR.
Benefit: Data Column Profiling is well suited to identifying PII when using G-Migration+ with large schemas, where each table may have many columns. Users do not have to locate sensitive columns manually.
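The CONTAIN check described above can be illustrated with a short Python sketch. It is not GenRocket's implementation or API: the function name profile_columns and the sample column and profile names are hypothetical, and the sketch assumes that matching ignores case, which is consistent with the username/userName example later in this article.

```python
# Minimal sketch of column-name profiling by containment.
# Assumptions (not GenRocket's actual implementation or API):
#   - a column matches a profile name when the profile name appears
#     anywhere inside the column name (the CONTAIN check);
#   - the comparison ignores case.
from typing import List, Tuple


def profile_columns(columns: List[str], profile_names: List[str]) -> List[Tuple[str, str]]:
    """Return a (column, profile_name) pair for every profile name a column contains."""
    matches: List[Tuple[str, str]] = []
    for column in columns:
        lowered = column.lower()
        for name in profile_names:
            # Case-insensitive substring test.
            if name.lower() in lowered:
                matches.append((column, name))
    return matches


if __name__ == "__main__":
    # Hypothetical table columns and profile names, for illustration only.
    columns = ["customer_id", "username", "home_email", "order_total"]
    profile_names = ["username", "userName", "email", "ssn"]
    for column, name in profile_columns(columns, profile_names):
        print(f"{column} matched profile name {name}")
```

Under this sketch, a plain containment check can also flag columns that are not sensitive; for example, a column named classname contains the text ssn. This is one reason to review the matches before selecting columns for SDR.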
In This Article
- Sensitive Data Column Selection Options
- Prerequisite Steps for Data Column Profiling
- How to Initiate Profiling and Select Columns
- How to Select a New Table in the Profiling Dashboard
- Next Steps After Profiling
- Subsetting/Masking Use Case Examples
Sensitive Data Column Selection Options
In G-Migration+, you have two options for selecting sensitive data columns within each table that is part of a G-Migration+ configuration.
- Manual Identification and Selection - Manually select each sensitive column while adding a table to the G-Migration+ configuration, or use the Manage Column option to make selections after adding a table.
- Data Column Profiling (covered in this article) - Use the Profile Name option after adding a table to the G-Migration+ configuration to initiate data column profiling and select data columns for SDR.
Prerequisite Steps for Data Column Profiling
Before starting, make sure you have completed the following steps. See the G-Migration+ Overview page to learn more.
- Installed GenRocket Runtime.
- Created JDBC Config files for both the source and target databases.
- Created an XTS file for the source database.
- Created and/or selected a Project Version within a Project.
- Accessed G-Migration+.
- Imported the source database table schema.
- Created a G-Migration+ configuration.
- Added one or more tables to the G-Migration+ configuration.
NOTE: Please ensure you have completed the steps above for the selected Project Version before proceeding with this article.
How to Initiate Profiling and Select Columns
This section shows how to use Data Column Profiling. Complete the following steps:
- Select the G-Migration+ configuration.
- Select a table within a configuration.
- Select Profile Columns. Profiling runs automatically for the currently selected table.
- Use the checkboxes to select each data column that requires SDR.
- Note: Multiple results may be displayed for the same column when it matches more than one of the checked profile names. Because these results refer to the same column, selecting one of them also selects the others.
- Example: The actual column name is "username". Data Column Profiling returns two matches for the same column: username and userName.
- Select Save to finish.
- Select Migration Dashboard to return to the G-Migration+ Dashboard.
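The note in the steps above, where selecting one match for a column also selects the other matches for that column, can be modeled by storing each selection per column rather than per result row. The following Python sketch is a hypothetical illustration only; the class ColumnSelection and its methods are not part of GenRocket.

```python
# Hypothetical model of the profiling checkboxes for one table.
# Assumption: a selection belongs to the column, not to an individual match,
# so every result row for that column shows the same checkbox state.
from typing import List, Set, Tuple


class ColumnSelection:
    def __init__(self, matches: List[Tuple[str, str]]) -> None:
        self.matches = matches           # (column, profile_name) rows shown in the results
        self.selected: Set[str] = set()  # columns chosen for SDR

    def toggle(self, row: int) -> None:
        """Check or uncheck a result row; the change applies to its whole column."""
        column = self.matches[row][0]
        if column in self.selected:
            self.selected.remove(column)
        else:
            self.selected.add(column)

    def is_checked(self, row: int) -> bool:
        """A row is checked whenever its column is selected."""
        return self.matches[row][0] in self.selected

    def columns_for_sdr(self) -> List[str]:
        """Selected columns, each listed once, in the order they first appear."""
        result: List[str] = []
        for column, _ in self.matches:
            if column in self.selected and column not in result:
                result.append(column)
        return result


if __name__ == "__main__":
    sel = ColumnSelection([("username", "username"), ("username", "userName"), ("home_email", "email")])
    sel.toggle(0)  # check the first "username" row
    print([sel.is_checked(i) for i in range(3)])
    print(sel.columns_for_sdr())
```

In this sketch, checking either username row checks both rows, and columns_for_sdr() returns username only once.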
How to Select a New Table in the Profiling Dashboard
- Use the Table selection menu at the top of the dashboard to select a different table.
- If profiling has not yet been performed for the selected table, select Begin Profiling to get started.
- If profiling has already been performed, the matches are displayed automatically and you can make any needed changes. The remaining steps are the same as in How to Initiate Profiling and Select Columns.
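The table-switching behavior above amounts to profiling each table once and reusing the stored matches afterward. The Python sketch below models this under stated assumptions: ProfilingDashboard, select_table, begin_profiling, and the profile_table callback are hypothetical names, and this article does not describe how GenRocket actually stores profiling results.

```python
# Hypothetical model of switching tables in the Profiling Dashboard.
# Assumption: profiling results are stored per table, so a table is only
# profiled when Begin Profiling is selected, and later visits reuse the results.
from typing import Callable, Dict, List, Optional, Tuple

Match = Tuple[str, str]  # (column, profile_name)


class ProfilingDashboard:
    def __init__(self, profile_table: Callable[[str], List[Match]]) -> None:
        self._profile_table = profile_table       # hypothetical profiling routine
        self._results: Dict[str, List[Match]] = {}
        self.current_table: Optional[str] = None

    def select_table(self, table: str) -> Optional[List[Match]]:
        """Switch tables; return stored matches, or None if not profiled yet."""
        self.current_table = table
        return self._results.get(table)

    def begin_profiling(self) -> List[Match]:
        """Profile the current table and store its matches."""
        if self.current_table is None:
            raise ValueError("select a table before profiling")
        matches = self._profile_table(self.current_table)
        self._results[self.current_table] = matches
        return matches
```

Keeping the results keyed by table name is what lets a previously profiled table show its matches immediately, while a new table shows nothing until profiling is started.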
Next Steps After Profiling
- Add a subsetting condition for a specific table (required for Data Subsetting).
- Make any required Test Data Design changes, such as replacing Generators or creating G-Cases.
- Download the required files (e.g., Scenario, G-Migration+ Configuration, G-Cases).
- Place the downloaded Scenario files in the directory defined by resource.output.directory.
- Run the migration from the command line to move the data to the target database. During the migration, SDR is performed on the PII columns you selected during profiling.
Subsetting/Masking Use Case Examples
Refer to the following use case examples for separate, step-by-step guides to data subsetting and masking. Note that the images in these guides have not yet been updated to include Data Column Profiling.
- Data Subsetting Only Use Case for Databases
- Synthetic Data Masking (SDM) Use Case Example for Databases