Description
GenRocket can generate an average of 4,000 records per second over a JDBC Driver connection. This works well for generating small batches of data on any system; however, generating millions of test data records at that rate takes much longer.
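To put that rate in perspective, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate that assumes a steady 4,000 records per second (the average quoted above); the function name `estimated_seconds` is hypothetical and not part of GenRocket.

```python
def estimated_seconds(record_count: int, records_per_second: float = 4000.0) -> float:
    """Return the estimated generation time in seconds at a steady rate.

    The default rate is the 4,000 records/second average quoted for JDBC;
    real throughput will vary with the database, network, and hardware.
    """
    if records_per_second <= 0:
        raise ValueError("records_per_second must be positive")
    return record_count / records_per_second


if __name__ == "__main__":
    # Compare a small batch with progressively larger data sets.
    for count in (10_000, 1_000_000, 100_000_000):
        seconds = estimated_seconds(count)
        print(f"{count:>11,} records: {seconds:>9,.1f} s ({seconds / 3600:.2f} h)")
```

Under this assumption, one million records take about 250 seconds, while 100 million records take about 25,000 seconds, or roughly seven hours.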
Bulk Load Receivers can be used to populate large amounts of data into data warehouses (Teradata, MongoDB, Cassandra, etc.) or unstructured databases faster than is possible through JDBC. These Receivers generate the data in a given database's native bulk load format.
For example, let's say your team needs to populate a Teradata database. To do this, the GenRocket team would implement a new Receiver that writes the data to files in the native format for fast bulk loading into the Teradata database. The same is true for MongoDB, Cassandra, Hadoop, or other NoSQL databases. New Receivers usually take our engineering team one Sprint cycle to implement.
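The general idea of writing data to files for a later bulk load, rather than inserting row by row, can be illustrated with a minimal sketch. This is not how GenRocket's Receivers are implemented: the function `write_partitions`, the `part_00000.txt` file naming, and the pipe delimiter are all assumptions made for illustration, and each database's bulk loader defines its own expected file format.

```python
import csv
import tempfile
from pathlib import Path
from typing import Iterable, List, Sequence


def write_partitions(rows: Iterable[Sequence[object]], out_dir: Path,
                     rows_per_file: int, delimiter: str = "|") -> List[Path]:
    """Write rows into numbered delimited files of at most rows_per_file rows.

    Hypothetical helper: the file naming and delimiter are illustrative only.
    """
    if rows_per_file <= 0:
        raise ValueError("rows_per_file must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    handle = None
    writer = None
    try:
        for index, row in enumerate(rows):
            # Start a new partition file every rows_per_file rows.
            if index % rows_per_file == 0:
                if handle is not None:
                    handle.close()
                path = out_dir / f"part_{len(paths):05d}.txt"
                handle = path.open("w", newline="", encoding="utf-8")
                writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
                paths.append(path)
            writer.writerow(row)
    finally:
        if handle is not None:
            handle.close()
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = ((i, f"user{i}") for i in range(1, 11))
        for part in write_partitions(sample, Path(tmp), rows_per_file=4):
            print(part.name, len(part.read_text(encoding="utf-8").splitlines()))
```

Files produced this way would then be handed to the target database's own bulk load utility, which is where the speed advantage over individual JDBC inserts comes from.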
The Database Environment
The Database Environment is an important performance factor because data access takes many forms. A software test may query an external database as part of an application workflow. Data migration testing may involve large amounts of data transferred between systems. GenRocket’s Test Data Queries may be used to blend stored production data with real-time synthetic test data. A GenRocket Test Data Scenario may also generate an extremely large test file with hundreds of thousands of records.
Here are performance issues related to the database environment:
- The use of indexing for database access
- The size of the database or test files in use
- The server used for database queries or data insertion
Indexes are data structures that allow rapid access to rows in a data table without sequentially examining each record to find a given row. However, poor indexing can be a source of poor performance for data-intensive applications. Adding indexes without proper analysis can cause insert, update, and delete operations to take longer, because every index on the table must be updated as well. Conversely, if the data is not indexed, each database operation must sequentially scan the entire data file to find the record it needs.
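The difference between a sequential scan and an indexed lookup can be observed with a small, self-contained sketch. It uses SQLite (from Python's standard library) purely as a convenient stand-in; the `orders` table, its columns, and the index name `idx_orders_customer` are made up for illustration, and other databases report query plans differently.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable plan text is the fourth column of each row.
    return " | ".join(row[3] for row in rows)


def demo() -> tuple:
    """Return the query plan before and after adding an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        ((i % 1000, i * 0.5) for i in range(10_000)),
    )
    sql = "SELECT id, total FROM orders WHERE customer_id = ?"

    # Without an index on customer_id, SQLite must scan the whole table.
    before = query_plan(conn, sql, (42,))

    # With an index, SQLite can search directly for the matching rows.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(conn, sql, (42,))

    conn.close()
    return before, after


if __name__ == "__main__":
    plan_before, plan_after = demo()
    print("before index:", plan_before)
    print("after index: ", plan_after)
```

The first plan reports a scan of the table, and the second reports a search using the new index. The trade-off described above still applies: each additional index is one more structure to maintain on every insert, update, and delete.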
Database size is a factor that multiplies the impact of every other performance issue. The performance issues caused by slow hardware, a slow network or a poorly indexed database will be amplified when large volumes of data are involved.
Finally, the server hardware itself can be a source of performance issues. Hardware-related performance problems are unlikely for a database maintained on a production system. However, moving production data to an under-resourced test server will result in a performance hit.
Note: For more information about additional test data automation performance factors, please see this page on the GenRocket website: Test Data Automation Performance.
Available Bulk Load Receivers
- DelimitedPartitionReceiver
- MySQLPartitionReceiver
- OraclePartitionReceiver
- PostgreSQLPartitionReceiver
- MSSQLPartitionReceiver
- ParquetPartitionReceiver
- DB2PartitionReceiver
- SybasePartitionReceiver