What is Data Column Profiling?
Data Column Profiling is a GenRocket feature designed to automatically identify Personally Identifiable Information (PII) within your datasets by scanning column headers, not the actual data. GenRocket checks each data column name to see if it CONTAINS keyword values that are defined at the GenRocket and Organization levels.
The user is provided with all matches and can then choose which sensitive columns to replace with synthetic test data using GenRocket’s Synthetic Data Replacement (SDR). This capability allows teams to maintain high-quality, compliant, and secure test data without manually combing through large and complex datasets.
In This Article
- What Features Use Data Column Profiling?
- Feature Key Benefits
- How Does It Work?
- GenRocket Level Data Profile Names
- Organization Level Data Profile Names
- Examples of Detected Columns
- Example Use Case Scenarios
- G-Migration+ Data Column Profiling Steps
What Features Use Data Column Profiling?
- G-Migration+ - Migrate a subset of records to an identical database and mask/replace any specified PII columns. The G-Migration+ configuration defines which data columns are PII that need to be replaced with synthetic test data. Users can perform this task manually or utilize the Data Column Profiling feature for automatic detection of PII.
- In-Place Masking (IPM) Engine (In Development/Beta Testing) - Used to perform in-place masking for all data values within a specified database table. The IPM Engine relies on G-Migration+ to identify which data columns in the defined table require in-place masking.
Feature Key Benefits
- Automated PII Detection - GenRocket intelligently scans column names for built-in and organization-defined profile names (e.g., user_id, email). Data Column Profiling accelerates and standardizes the identification of sensitive data columns. It also saves time in large-scale, repetitive workflows, such as CI/CD pipelines.
- Enhanced Data Privacy Compliance - Identifying PII early helps organizations meet regulatory requirements, such as HIPAA, and reduces the risk of non-compliance and data exposure.
- Accommodates Unique Organizational Naming Conventions - Define organization-specific PII identifiers based on your data environment. GenRocket adapts to your schema.
- Reduced Risk of Data Breaches or Exposure - Replacing sensitive data early in the data life cycle decreases risk.
How Does It Work?
- Profile Column Headers - GenRocket analyzes table schemas and column headers for default system values or custom-defined profile terms.
- Identify PII Candidates - Any column whose name contains one of these terms is considered a match and is presented to the user.
- User Selection - The user then chooses which flagged columns require Synthetic Data Replacement (SDR).
- Synthetic Data Replacement (SDR) - GenRocket replaces sensitive column values with realistic, non-identifiable data.
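The four steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not GenRocket's implementation: the profile names, the function names find_candidates and replace_columns, the sample rows, and the generator are all hypothetical, and the case-insensitive comparison is an assumption.

```python
import random

# Hypothetical profile names; the real lists are managed in GenRocket.
PROFILE_NAMES = ["email", "ssn", "address"]

def find_candidates(headers, profile_names):
    """Steps 1-2: flag headers that contain any profile name.
    Case-insensitive comparison is an assumption of this sketch."""
    return [h for h in headers
            if any(p.lower() in h.lower() for p in profile_names)]

def replace_columns(rows, selected, generators):
    """Step 4: overwrite the selected columns with generated values."""
    return [
        {col: generators[col]() if col in selected else val
         for col, val in row.items()}
        for row in rows
    ]

rows = [
    {"id": 1, "client_ssn": "123-45-6789", "plan": "gold"},
    {"id": 2, "client_ssn": "987-65-4321", "plan": "free"},
]

candidates = find_candidates(list(rows[0]), PROFILE_NAMES)
selected = set(candidates)  # Step 3: this "user" accepts every candidate

rng = random.Random(0)  # seeded so the sketch is repeatable
generators = {"client_ssn": lambda: f"900-00-{rng.randrange(10000):04d}"}
masked = replace_columns(rows, selected, generators)
print(candidates)
```

Only the column names are inspected when finding candidates; the row values are touched only in the replacement step, and only for the columns the user selected.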
GenRocket Level Data Profile Names
GenRocket includes 102 built-in Data Profile Names for columns that commonly hold sensitive or PII data values. During data profiling, GenRocket automatically checks whether any of these names appear in the data column headers.
Users (Org Admins and Non-Admins) can view the default GenRocket Data Profile Names by completing the steps below. Changes are not allowed; this is for viewing purposes only.
- Go to My Organization > Data Profile Names.
- Select GenRocket Data Profile Names.
- In the MFA prompt, click Confirm.
- Enter the code received by email and click Verify.
- If the code is correct and you have the appropriate permissions, you will be redirected to the GenRocket Data Profile Names.
Organization Level Data Profile Names
Additionally, an Organization can add custom Data Profile Names in the GenRocket web platform. You can add a single Data Profile Name or bulk import them with a CSV file.
Note: A user must be an Org Admin to add, edit, or delete Organization Data Profile Names. Users with any other role will not see these options.
Adding a Single Data Profile Name
To add a single Data Profile Name, complete the following steps:
- Go to My Organization > Data Profile Names.
- Select Add Profile Name.
- Enter the Name and click Save.
Add Multiple with CSV Bulk Import
The CSV file must contain exactly one column and no more than 1,000 rows (records). To import Data Profile Names in bulk, complete the following steps:
- Go to My Organization > Data Profile Names.
- Select CSV Bulk Import.
- Browse to and select the CSV file.
- Then click Upload. If there is a problem with the selected file, an error will be displayed.
- GenRocket checks if the CSV values already exist as GenRocket or Organization Data Profile Names.
- You will see a pop-up confirmation message displaying the number of Data Profile Names added.
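Before uploading, it can help to check a file locally against the limits stated above. The sketch below is a hypothetical pre-check, not part of GenRocket: the function name check_profile_csv is invented for illustration, and it assumes the file has no header row and that duplicates are compared case-insensitively, neither of which is confirmed by this article.

```python
import csv
import io

MAX_ROWS = 1000  # row limit stated in this article

def check_profile_csv(text, existing=()):
    """Return (new_names, errors) for a one-column CSV of profile names.

    `existing` holds names already defined; matching ones are skipped.
    Case-insensitive comparison is an assumption of this sketch.
    """
    rows = [r for r in csv.reader(io.StringIO(text)) if r]  # skip blank lines
    errors = []
    if len(rows) > MAX_ROWS:
        errors.append(f"{len(rows)} rows exceeds the limit of {MAX_ROWS}")
    for n, row in enumerate(rows, start=1):
        if len(row) != 1:
            errors.append(f"row {n} has {len(row)} columns, expected 1")

    known = {name.lower() for name in existing}
    new_names = []
    for row in rows:
        if len(row) == 1:
            name = row[0].strip()
            if name and name.lower() not in known:
                known.add(name.lower())
                new_names.append(name)
    return new_names, errors

names, problems = check_profile_csv("member_id\npolicy_no\nemail\n",
                                    existing=["email"])
print(names, problems)
```

In this example, email is skipped because it is passed in as an existing name, so only member_id and policy_no would be new.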
Examples of Detected Columns
Actual Column Header | Profile Match |
user_email | email |
client_ssn | ssn |
home_city | city |
street_address | address |
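The CONTAINS check behind these matches can be approximated as a substring test. The sketch below is a minimal illustration, assuming a case-insensitive comparison; match_profiles is a hypothetical helper, the profile names are the illustrative ones from this section, and the last two headers are invented to show a false positive and a non-match.

```python
def match_profiles(header, profile_names):
    """Return every profile name that occurs as a substring of the header."""
    h = header.lower()
    return [p for p in profile_names if p.lower() in h]

profile_names = ["email", "ssn", "city", "address"]
headers = ["user_email", "client_ssn", "home_city",
           "street_address", "capacity", "order_total"]

results = {h: match_profiles(h, profile_names) for h in headers}
for header, matches in results.items():
    print(header, "->", matches)
```

Note that capacity is flagged because it ends in "city", even though it is unlikely to hold PII. A substring check can over-match in this way, which is why the user reviews the flagged columns before choosing which ones to replace.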
Example Use Case Scenarios
These use case scenarios illustrate how GenRocket’s data column profiling, combined with Synthetic Data Replacement (SDR), supports scalability, consistency, compliance, and automation in modern data workflows.
- Development and Testing Environments - Detect and replace PII before provisioning test environments with real data. This avoids privacy risks while maintaining data integrity for QA and UAT.
- Data Migration and Onboarding - Identify sensitive information when ingesting new data or migrating legacy systems.
- CI/CD Pipelines - Integrate Data Column Profiling into automated pipelines to ensure that every dataset delivered to staging or testing environments is free of PII.
- Data Analysis and Reporting - Enable analysts to examine production-like datasets without accessing real PII, thus preserving privacy while allowing insights.
- Third-Party Data Sharing - Before sharing datasets with partners, vendors, or contractors, profile and mask all sensitive fields to enforce secure, contract-compliant sharing.
- Standardizing PII Detection Across Departments - GenRocket supports organization-defined profile names, enabling a centralized standard for identifying PII across various team datasets.
- Synthetic Data Simulation for AI Training - Ensure that the data used to train AI models is realistic synthetic data that contains no PII.
G-Migration+ Data Column Profiling Steps
The steps for configuring G-Migration+ with Data Column Profiling are the same as for a standard G-Migration+ configuration, except for how PII columns are identified and selected.
Users will see an additional "Profile Columns" option for the selected table in the G-Migration+ configuration. To see step-by-step instructions for using Data Column Profiling in G-Migration+, click here.