Description
Many types of testing do not require large volumes of data. The GenRocket Engine generates approximately 10,000 rows per second and hands off the data to a respective Receiver.
For databases specifically, the Receiver communicates via JDBC in batches of records (typically 1,000 records per batch). 10,000 rows of data are sufficient for most use cases. However, generating large volumes of patterned or realistic, unique data is necessary for some use cases.
Factors such as the volume of data, type of data, and where it is being generated can also impact test data generation speed and performance. This article provides information about the following:
- Available Features for Generating Large Volumes of Data
- Recommendations for When to Use the Available Features
- Speed and Performance Factors for Optimizing Test Data Generation
Features for Generating Large Volumes of Data
Additional GenRocket features can and should be used when generating millions or billions of rows of data. These features speed up test data generation for one or multiple Domains (depending on the feature).
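To see why these features matter at scale, the following minimal sketch estimates single-Engine generation time from the approximate rate of 10,000 rows per second quoted in the Description. The function name `estimated_hours` is illustrative only, and the estimate assumes a constant rate with no Receiver, network, or database overhead.

```python
# Rough estimate only: assumes the approximate single-Engine rate quoted in
# this article and ignores Receiver, network, and database overhead.
ROWS_PER_SECOND = 10_000  # approximate figure from the Description


def estimated_hours(total_rows: int, rows_per_second: int = ROWS_PER_SECOND) -> float:
    """Return the estimated generation time in hours for a single Engine."""
    return total_rows / rows_per_second / 3600


for rows in (10_000, 1_000_000, 1_000_000_000):
    print(f"{rows:>13,} rows -> {estimated_hours(rows):.2f} hours")
```

Under these assumptions, one billion rows would take roughly 28 hours on a single Engine, which is the kind of workload the features below are meant to shorten.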
Partition Engine
The Partition Engine partitions the load across multiple GenRocket instances running within a given server. When generating enormous amounts of test data, the load can be partitioned across multiple servers, each running multiple GenRocket instances.
Question | Answer |
When should the Partition Engine be used? | Any time a user needs to generate hundreds of millions, billions, or even trillions of rows of test data for a Domain Scenario. |
Recommended For | |
| |
Not Recommended For | |
| |
Can it be used with Scenarios? | Yes |
Can it be used with Scenario Chains? | Not at this time |
Can it be used with Scenario Chain Sets? | Not at this time |
Can it be used when dependencies (parent/child relationships) exist between Domains? | Yes |
Can it be used with small, medium, and large amounts of data? | Large amounts of data for a Scenario |
How does the Partition Engine work? | Data generation is split up for one Domain across multiple threads. The data is generated on each thread at the same time. The volume of the test data generation must be evenly distributed across all instances.
|
What is the recommended approach for generating Domain data when dependencies are present? | When dependencies are present, it is recommended to generate data in the following order:
Please remember that test data generation must be performed separately for each Domain Scenario. It can become more complicated when many dependencies exist between Domains. |
What Receivers can be used with the Partition Engine? | Bulk Load Receivers are used with the Partition Engine. Please have a look at the Bulk Load Receivers section of this article for more information. |
What is the Attribute Optimizer? | This is a flag built into the Partition Engine. When the flag is set to "true," the Partition Engine checks the Parent/Child relationships to find parent Attributes that are not referenced by any child. It then turns off the Generators for those unreferenced Attributes, because there is no need for them to generate data. This makes test data generation slightly faster. |
Important Note about System Configuration | The user needs to have the appropriate system configuration to support the number of threads being executed at the same time. |
Where can I learn more about the Partition Engine? | Please look at this knowledge base article: What is the GenRocket Partition Engine? |
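The even split described in the "How does the Partition Engine work?" row can be pictured with a small sketch. This is not the Partition Engine's API; `partition_ranges` is a hypothetical helper, and it assumes that "evenly distributed" means the total row count divides exactly by the number of instances.

```python
def partition_ranges(total_rows: int, instances: int) -> list[tuple[int, int]]:
    """Split rows 1..total_rows into contiguous (start, end) ranges, one per instance."""
    if instances <= 0:
        raise ValueError("instances must be positive")
    # Assumption: an even split means every instance gets the same row count.
    if total_rows % instances != 0:
        raise ValueError("total_rows must divide evenly across instances")
    size = total_rows // instances
    return [(i * size + 1, (i + 1) * size) for i in range(instances)]


for start, end in partition_ranges(1_000_000, 4):
    print(f"instance generates rows {start:,} to {end:,}")
```

Each instance works on its own range at the same time, so the elapsed time is governed by one range rather than by the full row count, provided the system has enough CPUs and memory for all instances.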
Scenario Thread Engine
The Scenario Thread Engine provides another way to increase test data generation speed: it executes multiple Scenarios within a Scenario Chain or Scenario Chain Set simultaneously across multiple threads.
Question | Answer |
When should the Scenario Thread Engine be used? | Any time a user needs to generate large volumes of data faster for multiple Domains where the order of execution does not matter. Note: Data from one Scenario cannot be dependent on data from another Scenario. For example, if Domain B requires data generated from Domain A to generate its data, then this feature cannot be used. |
Recommended For | |
| |
Not Recommended For | |
| |
Can it be used with Scenarios? | No |
Can it be used with Scenario Chains? | Yes |
Can it be used with Scenario Chain Sets? | Yes |
Can it be used when Parent/Child Relationships (dependencies) have been set between Domains? | Yes, when the order of data generation does not matter. Note: It should not be used when the order of data generation matters. |
Can it be used with small, medium, and large amounts of data? | Small, medium, or large amounts of data. |
Can the Scenario Thread Engine be used with the Partition Engine? | No |
How does the Scenario Thread Engine work? | It simultaneously executes multiple Scenarios within a Scenario Chain or Scenario Chain Set across multiple threads. Example: If a user specifies 10 threads, it will run 10 Scenarios simultaneously. When each one is finished, it will grab another Scenario. This process will continue until all test data has been generated. Instead of running 500 Scenarios one at a time in sequence, the user is now running them 10 at a time. This increases the speed of test data generation. |
Important Note about System Configuration | The user needs to have the appropriate system configuration to support the number of threads being executed at the same time (e.g., the number of Scenarios in a Chain or Chain Set). Example: The user is executing 10 Scenarios at the same time within a Scenario Chain. The system must have enough memory to keep 10 Scenarios in memory at the same time and enough CPUs to support that execution. |
Where can I learn more about the Scenario Thread Engine? | Please look at this knowledge base article: What is the Scenario Thread Engine? |
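The "10 threads, 500 Scenarios" example above follows the familiar worker-pool pattern. The sketch below illustrates that pattern with Python's standard thread pool; it is not how the Scenario Thread Engine is implemented, and `run_scenario` and `run_chain` are hypothetical stand-ins. It assumes the Scenarios are independent of one another, as the table requires.

```python
from concurrent.futures import ThreadPoolExecutor


def run_scenario(name: str) -> str:
    """Stand-in for executing one Scenario; real work would generate data here."""
    return f"{name}: done"


def run_chain(scenarios: list[str], threads: int) -> list[str]:
    """Run independent Scenarios with at most `threads` executing at once."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Each worker picks up the next pending Scenario as soon as it is free;
        # map() returns the results in the original submission order.
        return list(pool.map(run_scenario, scenarios))


results = run_chain([f"Scenario{i}" for i in range(1, 501)], threads=10)
print(len(results), "Scenarios completed")
```

Because the pool never runs more than `threads` Scenarios at once, memory and CPU needs scale with the thread count rather than with the total number of Scenarios.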
Bulk Load Receivers
Bulk Load Receivers can populate a large amount of data into data warehouses (Teradata, MongoDB, Cassandra, etc.) or unstructured databases faster than through JDBC. These Receivers allow users to generate the data to a given database's native bulk load format.
Question | Answer |
When should Bulk Load Receivers be used? | Use for the following:
|
What is defined in these Receivers? | The selected Receiver defines the following:
|
How do Bulk Load Receivers work? | These Receivers write the data generated for each instance to a subdirectory underneath that specific instance. Based on the defined number of files per directory and number of records per file in the Receiver's parameters, the Receiver will do the following:
Each database takes data in what is called delimited Bulk Loading format. Two files are usually created.
The database receives the files and recognizes which file to read for the data, the columns, and so on. It then loads that large amount of data into the database very quickly. |
What Bulk Load Receivers are available? | Bulk Load Data with Partition Engine |
| |
Bulk Load Data into a Database (with or without Partition Engine) | |
These two Receivers are considered Bulk Load Receivers as well.
Both should only be used for smaller loop counts and can load the data while maintaining referential integrity. |
Speed and Performance Factors
Several factors can affect the speed of test data generation, even when one of the above features is being used.
System-Related
These non-GenRocket performance factors can slow down test data generation:
Factor | Considerations and Recommendations |
Operating System | Some operating systems are faster than others:
|
Recommendations for Better Performance | |
| |
System Memory and CPUs | The amount of system memory must be sufficient to support the number of CPUs. |
Minimum Recommended Number of CPUs and RAM | |
| |
Recommended in Modern Testing Environments | |
| |
Recommendations for Better Performance | |
|
Network-Related
These network factors affect the speed of test data generation. The most significant impacts are typically for remote locations.
Factor | Considerations and Recommendations |
Network Speed | This is how fast data is transferred from one system to another over the network. The following impacts network speed:
|
Recommended Actions for Better Performance | |
| |
Network Bandwidth | This is the maximum amount of data that can be transferred over a network in a given time (typically 1 second). Bandwidth can vary significantly over a network path, and each network segment can provide a different level of bandwidth. Users, systems, and devices often share the same connection and thus share bandwidth. Some take more bandwidth than others. If many users, systems, and devices share the same connection, this will decrease overall bandwidth and reduce speed. |
Network Utilization | This is how much (in percentage) network bandwidth is being used or consumed by network traffic. Higher traffic decreases speed. This is especially true when you have low bandwidth. |
Network Latency | As the number of hops along the path increases, so does the amount of latency or delay in end-to-end data delivery. This includes any security checks and communications between remote systems or users. |
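As a rough illustration of how bandwidth and utilization interact, the sketch below estimates the time to move a payload over a shared link. It is a deliberately simplified model, not a measurement tool: `transfer_seconds` is a hypothetical helper, the figures are made up, and latency, protocol overhead, and varying bandwidth along the path are ignored.

```python
def transfer_seconds(payload_mb: float, bandwidth_mbps: float, utilization: float) -> float:
    """Estimate transfer time when `utilization` (0 to 1) of the link is already in use."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be at least 0 and below 1")
    # Only the unused share of the link is available for the new transfer.
    available_mbps = bandwidth_mbps * (1 - utilization)
    return payload_mb * 8 / available_mbps  # megabytes -> megabits


# Illustrative numbers: 1,000 MB of generated data over a 100 Mbit/s link.
for used in (0.0, 0.5, 0.75):
    print(f"utilization {used:.0%}: {transfer_seconds(1000, 100, used):.0f} s")
```

In this model, the same payload takes twice as long at 50% utilization as on an idle link, which is why low-bandwidth, heavily used connections hurt the most.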
Database-Related
These performance factors can also impact the speed of test data generation for any amount of data.
Factor | Considerations and Recommendations |
Database Location | The speed will depend on where the database is located:
|
Recommendation 1 - Add GenRocket Runtime to the Same Side as the Database | |
Adding GenRocket Runtime to the same side as the database will increase performance. Example: Firewall -> GenRocket Runtime Data <-> Database (Remote Location) | |
Recommendation 2 - GenRocket Multi-User Server (GMUS) | |
Install a GenRocket Multi-User Server (GMUS) on a machine within the same environment as the test database. The GMUS does not have to be on the same machine, just within the same location where the distance to the database is much shorter, and the connection to the database is secure because it's connecting within the same environment. Testers can send commands to the GMUS via REST to specify which Scenario to run (this can include a G-Case). The GMUS will use the GenRocket Engine to load and execute the instructions of the Scenario within the local environment. The GenericSQLInsertReceiver will securely connect to the database via JDBC within the local environment and should be able to send batches of data to the database optimally. This still depends on how well your database has been configured to receive the data optimally (Primary Key, Indexes, Foreign Keys, etc.). Example: User (API of GMUS /rest/scenario) -> Firewalls (Optional) -> GenRocket Runtime (GMUS) <-> Remote Database | |
Recommendation 3 - Generate an SQL file and Upload it to the Database | |
Generate an SQL file and then upload it to the database. Use the SQLFileInsertReceiver to write ANSI SQL insert statements (single or batch) to a file. A second Receiver (FTPReceiver, SFTPReceiver, S3Receiver, etc.) can be added to send and deposit the resulting file on a machine within the same secure location as the test database. This alternative requires a solution within your testing environment to read the SQL insert statements from the file. Most databases have this capability built in; it's just a matter of calling the database with the proper command. Example: GenRocket Runtime generates file Locally -> Upload into Remote Database | |
Recommendation 4 - Implement a Better-Designed JDBC Driver | |
Implement a type of JDBC driver better designed to work securely over long distances with large batches of data. | |
JDBC Connections | When generating data, multiple calls are being made back and forth between GenRocket Runtime and the database based on the defined batch count. This determines how many records are being sent in each batch.
For example, if 10,000 records are being generated and the batch count is set to 1000, then 10 batches will be sent. |
Database Indexing | Indexes are data structures that allow rapid access to data tables instead of sequentially examining each record to find a given row of data.
|
Network Speed | A slower network will decrease test data generation speed for querying or inserting data into a remote database. |
Server Hardware | Hardware-related performance problems are unlikely for a database maintained on a production system. However, production data that is moved to an under-resourced test server will result in a performance hit. |
Database Size | A large database multiplies the impact of every other performance issue (slow hardware, slow network, poorly indexed database, etc.). These issues are amplified when large volumes of data are involved. |
Database Queries | The number of queries and type of queries impact test data generation speed.
|
Additional Reading | Generating Data for a Remote Database; How does GenRocket work with Databases? |
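The batching arithmetic in the JDBC Connections row (10,000 records with a batch count of 1,000 gives 10 batches) can be sketched as follows. This is plain Python rather than JDBC or a GenRocket Receiver; `batches` and `round_trip_overhead_seconds` are hypothetical helpers, and the 50 ms round-trip time is an assumed figure used only to show why fewer, larger batches help over long distances.

```python
from itertools import islice
from typing import Iterable, Iterator


def batches(records: Iterable, batch_size: int) -> Iterator[list]:
    """Yield successive lists of at most `batch_size` records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(records)
    while chunk := list(islice(iterator, batch_size)):
        yield chunk


def round_trip_overhead_seconds(total_records: int, batch_size: int, round_trip_ms: float) -> float:
    """Time spent on network round trips alone, one round trip per batch."""
    batch_total = -(-total_records // batch_size)  # ceiling division
    return batch_total * round_trip_ms / 1000


sent = sum(1 for _ in batches(range(10_000), 1_000))
print(sent, "batches sent")
print(round_trip_overhead_seconds(10_000, 1_000, 50), "s of round-trip overhead")
print(round_trip_overhead_seconds(10_000, 100, 50), "s with a batch count of 100")
```

The model counts only the round trips, not the time the database spends inserting rows, so it understates total load time; it is meant to show how the batch count drives the number of calls across the network.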
Query-Related
These factors apply regardless of what is being queried (database, CSV file, Excel file).
Factor | Considerations and Recommendations |
Number of G-Queries | A larger number of G-Queries will slow down test data generation. |
Recommendations | |
Evaluate the G-Queries to determine if any can be eliminated or other changes can be made to improve test data generation. | |
Number of Query Generators | A larger number of Query Generators will also slow down test data generation. This is because more information is being stored in memory while test data generation is occurring. |
Recommendations | |
Evaluate what Query Generators are being used to determine if any can be eliminated or if other changes can be made to improve test data generation. | |
Query Each vs. Query Before | Query Each - The query will occur for each iteration. This will slow down test data generation. Query Before - The query will occur at the beginning and only once (typically faster). |
Recommendations | |
If Query Each is being used for Generators or G-Queries, try using Query Before to increase speed and performance. |
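The difference between Query Each and Query Before comes down to how many times the source is queried. The sketch below counts those calls using an in-memory stand-in for the data source; `CountingSource` and both generator functions are hypothetical and do not represent GenRocket's Query Generators. It assumes the queried data does not change during the run, which is the case where Query Before is a safe substitute.

```python
class CountingSource:
    """In-memory stand-in for a database, CSV file, or Excel file."""

    def __init__(self, rows: list[str]) -> None:
        self.rows = rows
        self.calls = 0

    def query(self) -> list[str]:
        self.calls += 1  # each call represents one trip to the source
        return list(self.rows)


def generate_query_each(source: CountingSource, iterations: int) -> list[str]:
    """Query the source again on every iteration."""
    output = []
    for i in range(iterations):
        rows = source.query()
        output.append(rows[i % len(rows)])
    return output


def generate_query_before(source: CountingSource, iterations: int) -> list[str]:
    """Query the source once up front, then reuse the result."""
    rows = source.query()
    return [rows[i % len(rows)] for i in range(iterations)]


each_source = CountingSource(["A", "B", "C"])
before_source = CountingSource(["A", "B", "C"])
each_rows = generate_query_each(each_source, 1_000)
before_rows = generate_query_before(before_source, 1_000)
print("Query Each calls:", each_source.calls)
print("Query Before calls:", before_source.calls)
```

Both approaches yield the same rows here, but the per-iteration version makes one call for every generated row, and that cost grows with slower sources such as remote databases.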
File Format-Related
Specific file formats are faster than others. This section provides details on any file format factors to consider for better speed and performance:
Factor | Considerations and Recommendations |
Excel vs. Delimited File Format | Excel Files are generally slower than the Delimited File format. |
Generating Data | |
Generating data in an Excel sheet will be slower. Recommended steps:
| |
Reading Data | |
Reading data from an Excel File (using ListExcelGen or another Excel Generator) is slower than reading data from a plain delimited file (ListCSV or CSVToMap, etc.). |
Other Factors
This section contains any additional factors that can affect speed and performance:
Factor | Considerations and Recommendations |
Memory Generators | Memory Generators or Generators generating unique data save some information in memory. Performance can be slowed down based on the following:
|
Number of Attributes | Domains having a large number of Attributes can slow down test data generation. |
G-Map and CSV Files/Databases | When using G-Map to store temporary data for mapping data generated by one Project that another Project needs, using CSVs and databases to store the intermediate data can be slow. |