Description
The G-Migration+ feature can be used to query a subset of data within a source database and insert that subset of data into an identical destination database. Data subsetting conditions define what subset of data will be inserted into the destination database.
For data subsetting, a user must perform the following tasks:
- Add a table to the G-Migration+ Configuration.
- Add subsetting conditions for the table (Where Clause or Limit Rows)
- To add multiple conditions use an AND or OR in the Where Clause.
Important: Only one added table can have data subsetting conditions (Where Clause and/or Row Limit).
In This Article
- When Should Tables be Added?
- Prerequisite Steps
- How to Select Tables for Data Subsetting
- How to add a WHERE Clause Condition to a Table
- How to add a Limit Row Condition to a Table
- Traverse Full Hierarchy Option
- Next Steps After Setup
When Should Tables be Added?
- Any time Data Subsetting or Synthetic Data Masking (SDM) will be performed between two identical databases.
Prerequisite Steps
- A table schema (XTS file) must be imported before this step can be completed. If this has not been done, please do so first. Click here for more information.
How to Select Tables for Data Subsetting
- Select a G-Migraton+ Configuration.
- Click the Manage Tables button.
- Use the checkboxes to select one or more Tables.
- Click the Save button.
- Sensitive table columns can be selected at the same time for Synthetic Data Masking (SDM).
- Click Cancel if none should be selected for the table.
- To select sensitive columns for an added table, use the checkboxes.
- Then click Save or Save and Next (when multiple tables are added at a time).
How to add a WHERE Clause Condition to a Table
The Where Clause applies a specific condition to a table. Only rows satisfying the Where Clause will be included in the subset. All records in related tables associated with those meeting the Where Clause condition will be inserted as well. Only one table can have a Where Clause.
Example 1 (Single Condition - String)
Where Clause: address.state = 'CA'
Result: Only records with "CA' as state and associated records in related tables within the source database will be inserted into the destination database.
Example 2 (Single Condition - Number)
Where Clause: customer.id >100
Result: Customer records starting at 101 and associated records in related tables within the source database will be inserted into the destination database.
Example 3 (Multiple Conditions - AND)
The database has three related tables:
- Organization - 2 Organizations
- Department - 10 Departments (5 per Organization)
- User - 1000 Users (100 per Department)
Where Clause: user.id >50 AND user.deptid = 2
- deptid is the foreign key reference column in the User Table.
Result: User records starting at 51, belonging to Department 2, and their associated records will be inserted into the destination database.
Example 4 (Multiple Conditions - OR)
The database has three tables:
- Organization - 2 Organizations
- Department - 10 Departments (5 per Organization)
- User - 1000 Users (100 per Department)
Where Clause: user.id < 200 OR user.deptid = 6
- deptid is the foreign key reference column in the User Table.
Result: User records 1 through 199, the 100 user records that belong to Department 6, and any associated records will be inserted into the destination database.
Steps for Adding Where Clause to a Table
- Click the Edit (Pencil) icon.
- Enter the information into the Where Clause field.
- Click the Save button.
Note: It's a good practice to provide the table name and column name in the Where Clause to avoid any ambiguous column issue during the subsetting process. For example, when the where Clause is for a specific state in the address table, it would look like this address.state = 'CA'.
Similarly, if the table belongs to a schema, it's good to provide both schema and table name in the where Clause. For example, if the address table belongs to the employee schema, the Where Clause becomes employee.address.state = 'CA'. - The completed configuration will appear as shown below:
How to add a Limit Row Condition to a Table
The Limit Rows options can be used to limit how many rows from the subset are inserted into the destination database.
If a Where Clause is present for the table, then the row limit applies only to those rows that satisfy the Where Clause. All associated records in related tables will be inserted based on the Where clause and row limit conditions.
Otherwise, the limit will apply to the entire table. For example, if a table is added and a number limit of '100' is entered, then only 100 rows of data will be inserted into the destination database.
Example 1 (Number Limit)
Where Clause: address.state = 'CA'
Limit Row: Number with '100' as entered value
Result: Only 100 records of those satisfying the Where Clause and all associated records in related tables within the source database will be inserted into the destination database.
Example 2 (Percentage Limit)
Where Clause: address.state = 'CA'
Limit Row: Percentage (%) with '50' as entered value
Result: Only 50% of records satisfying the Where Clause and all associated records in related tables within the source database will be inserted into the destination database.
Steps for Adding a Limit Row Condition to a Table
- Click the Edit (Pencil) icon.
- Select a Limit Type (Number or Percentage).
- Enter a Limit Value and click Save.
Traverse Full Hierarchy Option
When enabled, G-Migration+ will also migrate the other children (and their parents) of the ancestor tables during the migration to the target database, thus covering the full hierarchy of the tables that are related to each other. Ancestor tables are the parent tables of the table with the subsetting condition. This option is disabled (unchecked by default). To learn more and see a full example, click here.
Next Steps After Setup
Note: If Data Subsetting and Synthetic Data Masking will be performed together, additional files will be necessary to complete the migration.
- Download the Default Scenario and G-Migratoin+ Configuration.
- Place the Default Scenario in the file path defined by the resource.output.directory.
- Place the G-Migration+ configuration in the file path defined by the resource.home.directory.
- Run the G-Migration+ command.
Please look at Data Subsetting Only Use Case for Databases for a step-by-step example.