Description

GenRocket can generate virtually any volume and variety of synthetic test data, but some use cases require existing production data. Production data may contain sensitive values that must be secured, and application testing typically needs only a subset of that data, not all of it.


GenRocket uses a process called Synthetic Data Masking (SDM) to replace sensitive data values with synthetic data values in files and databases. For databases, GenRocket also offers Conditional Masking and Data Subsetting capabilities. This functionality is part of G-Migration+. SDM and subsetting can be used together or independently, and each works from the specifications provided by the user.


Capability | Databases | Performance
Data Subsetting | | 2.5 Million Rows per Minute
Data Masking | | Millions of Rows in Minutes (when used with GenRocket's partition engine)
Synthetic Data Augmentation | |


Note: GenRocket's SDM and subsetting features are not an ETL tool. SDM with subsetting cannot be used for the following:

  • Data Sanitization, Correction (Cleansing), and Curation
  • Data Consolidation (consolidating data across different databases)
  • ETL, which is typically used to consolidate data into a single location for easier analytics and/or storage



What is Synthetic Data Masking (SDM)?

  • SDM is the process of dynamically masking sensitive data values with synthetic test data. It can be used for the following:
    • Databases - Masking occurs before insertion into the destination database. The G-Migration+ feature is used to perform SDM for databases.

    • Files - Masking occurs at the time of test data generation. A Masking Receiver must be added to the Domain to mask files.


  • Synthetically generated values have characteristics similar to real values but cannot be traced back to real records because they are synthetic. SDM maintains the structure of each value so that it remains usable.

  • Unlike many other TDM and synthetic TDM providers, GenRocket does not need to look at the actual data in production, which makes it more secure. Production data is never exposed or "read"; only the metadata is read.
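
The idea that a synthetic value keeps the structure of the real one without the real one ever being read can be sketched in a few lines of Python. This is an illustration only, not GenRocket's implementation: the function name `synthetic_from_pattern` and the pattern symbols (`#` for a digit, `A` and `a` for letters) are hypothetical and stand in for column metadata such as a known value format.

```python
import random
import string


def synthetic_from_pattern(pattern: str, rng: random.Random) -> str:
    """Build a synthetic value from a format pattern (metadata only).

    '#' becomes a random digit, 'A' a random uppercase letter,
    'a' a random lowercase letter; any other character is kept as-is.
    """
    out = []
    for ch in pattern:
        if ch == "#":
            out.append(rng.choice(string.digits))
        elif ch == "A":
            out.append(rng.choice(string.ascii_uppercase))
        elif ch == "a":
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # separators such as '-' or ' ' keep the structure
    return "".join(out)


rng = random.Random(7)  # fixed seed so the sketch is repeatable
synthetic_ssn = synthetic_from_pattern("###-##-####", rng)
synthetic_phone = synthetic_from_pattern("(###) ###-####", rng)
print(synthetic_ssn)
print(synthetic_phone)
```

Because the generator works only from the pattern, no production value is needed to produce a usable replacement with the same shape.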

What is Data Subsetting?

  • Data subsetting queries a subset of data within a source database and inserts it into a destination database based on defined subsetting conditions.

  • The G-Migration+ feature can be used to perform data subsetting. 


Database Data Subsetting and Masking Capabilities

The G-Migration+ feature is used to perform data subsetting and SDM for databases. Please note that the source and destination databases must be of the same database type (e.g., MySQL to MySQL); however, table schemas can vary between them.

 

Supported Databases (Subsetting and Masking)

Supported databases include: 

  • Oracle
  • MS SQL Server
  • MySQL Server
  • DB2
  • PostgreSQL
  • Sybase


What Actions Can Users Perform?

Users can perform the following actions: 

  • Data Subsetting Only
  • Synthetic Data Masking Only
  • Data Subsetting and Synthetic Data Masking


Synthetic Data Masking (SDM) Only

The user adds one or more tables from the imported schema and selects sensitive columns within the table. A Scenario is created with Attributes for each sensitive column and is used to insert synthetic test data into the destination database. 

Example - The customer table's date of birth (dob), last name, phone number, and ssn columns have been marked as sensitive data columns. Synthetic data values will be inserted into the destination database for these columns. 


Data Subsetting Only

The user adds a table from the imported table schema and then adds subsetting conditions. Subsetting conditions can only be added to one table and include the following: 


  • Where clause - A filter/condition that is applied to migrate a subset of data.
  • % of Rows - Defines a percentage of rows (e.g., 25%, 50%).
  • # of Rows - Defines a fixed number of rows.


Example - The subset will start at Customer record ID '251' and contain only 100 rows of data, along with their associated records in related tables. The last included record will have a customer ID of '350'.
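
The example above can be approximated with a minimal Python sketch. It is not how G-Migration+ is configured or implemented: it uses in-memory SQLite databases purely as a stand-in for a source and destination of the same type, and the `customer` and `orders` tables, their columns, and the 500 contiguous customer IDs are all hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, last_name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customer(id));
"""

source = sqlite3.connect(":memory:")
dest = sqlite3.connect(":memory:")
source.executescript(SCHEMA)
dest.executescript(SCHEMA)

# Hypothetical source data: 500 customers with contiguous IDs, one order each.
source.executemany("INSERT INTO customer VALUES (?, ?)",
                   [(i, f"Name{i}") for i in range(1, 501)])
source.executemany("INSERT INTO orders VALUES (?, ?)",
                   [(i, i) for i in range(1, 501)])

# Subsetting conditions on the one driving table:
# a where clause (id >= 251) combined with a row count (100).
customers = source.execute(
    "SELECT id, last_name FROM customer WHERE id >= ? ORDER BY id LIMIT ?",
    (251, 100),
).fetchall()
dest.executemany("INSERT INTO customer VALUES (?, ?)", customers)

# Bring along the associated records from the related table.
ids = [row[0] for row in customers]
placeholders = ",".join("?" * len(ids))
orders = source.execute(
    f"SELECT id, customer_id FROM orders WHERE customer_id IN ({placeholders})",
    ids,
).fetchall()
dest.executemany("INSERT INTO orders VALUES (?, ?)", orders)
dest.commit()

first_id, last_id, row_count = dest.execute(
    "SELECT MIN(id), MAX(id), COUNT(*) FROM customer"
).fetchone()
order_count = dest.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(first_id, last_id, row_count, order_count)  # prints: 251 350 100 100
```

The last customer ID is 350 only because the hypothetical IDs are contiguous; with gaps in the source IDs, 100 rows starting at 251 would end at a higher ID.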


Data Subsetting and SDM

As discussed above, the user adds subsetting conditions for one table and selects sensitive table columns in one or more tables.


Example - The subset will start at '51' and contain 100 rows of data. Customer information (e.g., Last Name, Date of Birth, SSN, Phone Number) will be masked during subsetting.
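
A rough sketch of subsetting and masking in one pass is shown below, again using in-memory SQLite as a stand-in and not representing GenRocket's actual implementation. The table layout, the sample surnames, the helper names, and the reading of '51' as a customer ID are assumptions made for illustration. Only the non-sensitive columns are queried from the source; the sensitive columns are filled with synthetic values before insertion into the destination.

```python
import random
import sqlite3

SCHEMA = ("CREATE TABLE customer (id INTEGER PRIMARY KEY, first_name TEXT, "
          "last_name TEXT, dob TEXT, ssn TEXT, phone TEXT)")
SYNTHETIC_LAST_NAMES = ["Avery", "Brooks", "Chen", "Diaz"]  # hypothetical pool

source = sqlite3.connect(":memory:")
dest = sqlite3.connect(":memory:")
source.execute(SCHEMA)
dest.execute(SCHEMA)

# Hypothetical "production" rows; the sensitive values are marked REAL.
source.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?, ?)",
    [(i, f"First{i}", f"Real{i}", "REAL-DOB", f"REAL-SSN-{i}", "REAL-PHONE")
     for i in range(1, 301)],
)

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def digits(n: int) -> str:
    """Return n random digits as a string."""
    return "".join(rng.choice("0123456789") for _ in range(n))


def synthetic_row(cust_id: int, first_name: str) -> tuple:
    """Keep non-sensitive columns; generate the sensitive ones synthetically."""
    dob = f"{rng.randint(1950, 2005)}-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}"
    ssn = f"{digits(3)}-{digits(2)}-{digits(4)}"
    phone = f"{digits(3)}-{digits(3)}-{digits(4)}"
    return (cust_id, first_name, rng.choice(SYNTHETIC_LAST_NAMES), dob, ssn, phone)


# Subset: start at ID 51, take 100 rows; sensitive columns are never selected.
subset = source.execute(
    "SELECT id, first_name FROM customer WHERE id >= ? ORDER BY id LIMIT ?",
    (51, 100),
).fetchall()
dest.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?, ?)",
    [synthetic_row(cust_id, first_name) for cust_id, first_name in subset],
)
dest.commit()

masked_rows = dest.execute("SELECT * FROM customer ORDER BY id").fetchall()
print(masked_rows[0][0], masked_rows[-1][0], len(masked_rows))  # prints: 51 150 100
```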