Description

GenRocket uses Synthetic Data Replacement (SDR) to mask sensitive data values with 100% synthetic data values. You can use in-place masking to mask sensitive data values (in one or more columns) within an SQL database that already contains the data.  


For example, a database has already been copied to the lower environment but still contains sensitive information. Use in-place masking to replace all sensitive data values with synthetic data for one or more columns. This article only provides a basic example with one table.


In-place masking uses the SQLUpdateV2Receiver, which works for any Relational Database that has a JDBC driver available (e.g., PostgreSQL, Oracle, DB2, MSSQL, MySQL, Sybase, etc.).


PERFORMANCE GUIDELINES: 

  • This method is not supported for NoSQL databases.
  • It is designed for masking small volumes of data in a range of 100,000 rows of data.
  • The larger the data set, the longer it will take to complete this process.
  • Additional factors that impact performance include: 
    • Number of Tables
    • Number of References
    • Machine Profile (CPUs, amount of RAM, etc.)
  • To learn more about performance factors, please see this article: How to Optimize Test Data Generation

NOTE: If you want to migrate a subset of data to another database (source to target) and mask sensitive data values while migrating that data, you will need to use G-Migration+.


In This Article


Actors

  • MySQL Database
  • Customer Table
  • SSN and Phone Number Columns
  • QueryToMapGen
  • QueryFromMapGen
  • SQLUpdateV2Receiver


Prerequisites

Database Information

  • Contains a Customer table and five fields (id, first_name, last_name, ssn, and phone_number)
  • The table has 1000 customer records (snippet shown below).


Sample Story

A user wants to replace all sensitive data values for the ssn and phone_number columns within a Customer table. They will use in-place masking to do so.  

  • ssn - Use sequential numbers (001-01-0001, 001-01-0002, 001-001-0003, etc.)
  • phone_number - Use this sequential pattern ((123) 500-1000, (123) 500-1001, (123) 500-1002, etc.)


Step 1 - Create Domain(s) with Attributes

Create a Domain for each table and only include the Attributes for the columns that contain sensitive data. For this example, we have created a Customer Domain with an ssn and a phoneNumber Attribute.



Step 2 - Assign QueryToMapGen Generator to the id Attribute

The QueryToMapGen Generator will query the id in the database to get the actual value for each record. It is the best choice for in-place masking because it is more efficient and provides increased speed. 


You will need to configure the following parameters (at minimum):

  • resourcePath - Resource used to connect to the database via JDBC.
  • resourceSubDir - This is an optional parameter; however, it is required in this example because the JDBC Config file resides in a subdirectory in the defined path. See V2 Receivers for more information
  • resourceName - Name of the properties file for the database connection. 
  • setLoop - Set to "true" to ensure the record count matches the number of records in the query. 
  • query - This is the actual query to obtain each record id. You can also add a Where clause to the query. Please note that the syntax may vary depending on what database you are querying and its version. 


Step 3 - Assign QueryFromMapGen Generator to the id Attribute

Next, assign the QueryFromMapGen Generator to the id Attribute. You can do this by Linking GeneratorsThis Generator maps the id column used in the QueryToMapGen query parameter (Step 2). 



Step 4 - Modify Generators for Other Attributes

For this example, changes will be made only to the assigned phoneNumber Generator. 

  • ssn Attribute - No changes are required. The default configuration is set up to generate sequential social security numbers.



  • phoneNumber Attribute - Removed the default assigned Generator and assigned the PhoneNumberGen for sequential phone numbers.


Step 5 - Assign the SQLUpdateV2Receiver to the Domain

Assign the SQLUpdateV2Reciever to the Domain and configure the receiver's parameters in the Parameters tab. This receiver connects to a database and performs batched updates. Its parameters provide the location of the config.properties file used for the database connection. 



Step 6 - Define the SQL Update Statement

Select the Data tab and enter the SQL Update Statement. Each Attribute included in the update statement is assigned a variable value ranging from var1 to var20. Enter the Update Statement and click SaveIn this example, the assigned variable values are:

  • ssn = var1
  • phone_number = var2
  • id = var3 


Please note that the syntax of the update statement will vary based on the database type and version. This example is for MySQL 5.7.  If the syntax is wrong, you will receive an error. The error and troubleshooting recommendations are shown later in this article.



Step 7 - Assign a Variable Number to each Attribute

Next, assign the same variable number used in the Data tab (Step 6) to the Attribute it represents. In this example, "ID" is set as "var3" in Step 6, so the Variable Number will be "3".  To do so, select the Attributes Property Keys tab and then choose an Attribute to change one or more property key values. 



Complete these steps to change the variable property key value (repeat for each Attribute):  

  • Select the Variable Number from the drop-down menu.
  • Click Save and Next



The Attributes Property Keys appear as shown below for this example: 



Step 8 - Create a Scenario for the Domain

You can create a Scenario in multiple ways. We used Quick Scenario in this example. Click here to learn more.


You can create a Scenario Chain for multiple Domains rather than downloading them individually. However, we only need one Scenario for this example.


Step 9 - Complete In-Place Masking Using Scenario and Command

Complete the steps below:

  1. Download the Scenario (or Scenario Chain). Please note this is not necessary when using G-Repository
    • Use the Cloud icon in the Project Dashboard to download a Scenario or Scenario Chain.

  2. Run the Scenario at the command line. Note this may be different depending on how you have set up GenRocket (e.g., local, CI/CD pipeline, etc.).
    • The basic command is: genrocket -r <ScenarioName>.grs 
    • The command is different when using other features such as G-Repository. Please see the corresponding documentation for more information.



Sample Data

The first image is of the same ten records shown at the beginning of this article with masked data. 



The following image is of the last ten records in the Customer table. We have successfully masked all 1,000 records with synthetic data in just a few seconds.


Error Troubleshooting

The following errors are common when performing in-place masking: 


Error 1 - Error in SQL Syntax

As noted in this article, the syntax for the SQLUpdateV2Receiver Data tab and the QuerytoMapGen assigned to the id Attribute may differ from what is shown in this tutorial. It can vary per database type and version, resulting in an error when the entered syntax is incorrect. In most cases, the problem lies within the Receiver Data tab.

When there is a syntax error, you may receive an error like below when running the Scenario(s).


!!!!! An Error Has Occurred !!!!!
You have an error in your SQL syntax; check the manual that corresponds to your
MySQL server version for the right syntax to use near 'set phone_number = '(123)
500-1999' where id = 1000' at line 1


In this example, set is used twice and is only needed once. 



You can check the syntax by referring to the manual for that database and version. You can also enter the syntax into the command line when accessing it from the command line to validate that it works for a single record. 


The image below shows the incorrect syntax:



The next image shows the correct syntax: 



The first record's ssn and phone number were changed to match the values used in the correct update statement. So, the syntax in the second image above is valid and can be entered in the Data tab for the SQLUpdateV2Receiver. Replace the actual values with variables (i.e., var1, var2, etc.).



Error 2 - Column Name Not Found in Query Result

If the wrong value is entered in the QueryFromMapGen Generator assigned to the unique Attribute (id Attribute in this example, the following error may occur: 


!!!!! An Error Has Occurred !!!!!
Column name 'customer' was not found in query result on Attribute Customer.id having
namespace ''


This happens when the wrong value has been entered for the columnName parameter within the QueryFromMapGen. It is looking for a column named 'customer,' which does not exist.