Description
The GenRocket Partition Engine is used to generate hundreds of millions, billions, or even trillions of rows of test data in a short period of time. This is accomplished by partitioning the load to generate huge amounts of test data across multiple GenRocket instances running within a given server. When generating enormous amounts of test data, the load can be partitioned across multiple servers, each running multiple GenRocket instances.
Special Note
The G-Partition Engine is a file-intensive processing engine that can create hundreds of files per second, depending on the number of instances running simultaneously. It is therefore highly recommended that the G-Partition Engine be run only on Linux machines. When running ten or more G-Partition instances, the minimum machine hardware requirements are 4 CPUs, 16 GB RAM, and a 1 TB SSD.
In This Article
- Test Data Generation Speed Challenge
- Example 1 - Partitioning to Generate 100 Million Rows of Test Data
- Example 2 - Partitioning to Generate 1 Billion Rows of Test Data
- Additional Information
- Partition Engine Receivers
Test Data Generation Speed Challenge
Depending on the number of CPU cores, the available memory, and the operating system (OS), GenRocket may generate between 10,000 and 15,000 rows of test data per second on a given computer. Assuming GenRocket is running on one very slow computer at 10,000 rows per second, test data generation times can be approximated as follows:
- 10,000 rows every second
- 600,000 rows per minute
- 1,000,000 rows every 1 minute and 40 seconds
- 10,000,000 rows every 16 minutes 40 seconds
- 100,000,000 rows every 2 hours 46 minutes
- 1,000,000,000 rows every 27 hours 46 minutes
As the approximations above show, generating more than 10 million rows of test data on a single GenRocket instance takes far too much time. Partitioning the load across multiple instances, and across multiple servers, can drastically reduce the generation time, depending on the number of instances used.
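The single-instance estimates can be recomputed from the 10,000 rows-per-second figure with a few lines of Python. This is only an arithmetic sketch; the function name `generation_time` is made up for illustration and is not part of GenRocket.

```python
from datetime import timedelta

# Assumed rate for one very slow computer, taken from the text above.
ROWS_PER_SECOND = 10_000


def generation_time(rows: int, rows_per_second: int = ROWS_PER_SECOND) -> timedelta:
    """Approximate time for a single instance to generate `rows` rows."""
    seconds = -(-rows // rows_per_second)  # ceiling division
    return timedelta(seconds=seconds)


if __name__ == "__main__":
    for rows in (1_000_000, 10_000_000, 100_000_000, 1_000_000_000):
        print(f"{rows:>13,} rows: {generation_time(rows)}")
```

At this rate, one billion rows works out to 100,000 seconds, which is a little over a full day on a single instance.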
Example 1 - Partitioning to Generate 100 Million Rows of Test Data
If 100,000,000 rows of test data were generated on one server running ten partitioned GenRocket instances, it would take approximately 18 to 20 minutes to generate all 100,000,000 rows. Some speed is lost because the operating system is running ten GenRocket instances simultaneously. Each GenRocket instance runs within its own Java runtime environment and its own block of memory, and shares the CPU cycles allocated to it by the OS.
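To put the lost speed in numbers, the sketch below compares the 18 to 20 minute estimate with an ideal run that has no overhead. It assumes each instance would generate 10,000 rows per second if it ran alone (the slow-computer figure used earlier). The function name is illustrative only.

```python
def ideal_seconds(total_rows: int, instances: int, rows_per_second: int = 10_000) -> float:
    """Time needed if every instance ran at full speed with no OS contention."""
    return total_rows / (instances * rows_per_second)


ideal = ideal_seconds(100_000_000, instances=10)  # 1000 seconds, about 16 min 40 s

for observed_minutes in (18, 20):
    # Fraction of the ideal throughput actually achieved.
    efficiency = ideal / (observed_minutes * 60)
    print(f"{observed_minutes} min observed -> {efficiency:.0%} of ideal throughput")
```

Under that assumption, running ten instances side by side costs roughly 7% to 17% of the ideal throughput.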
Single Server Deployment Diagram
The deployment diagram below shows one local computer running multiple partitioned GenRocket instances, writing to the local file server under the root directory, server1. Each partitioned GenRocket instance has its own root directory under server1 (instance1, instance2, and instanceN). Under each instance's root directory, numbered data directories (e.g., data1, data2, etc.) are created dynamically during test data generation to store numbered files (e.g., file1.txt, file2.txt, etc.), which are also created dynamically. Each file stores a finite number of rows (e.g., 10,000) before the next file is created. Each data directory stores a finite number of files (e.g., 500) before the next data directory is created.
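The directory layout described above can be illustrated with a small path calculation. This is a hypothetical sketch, not GenRocket's actual implementation. It assumes rows are counted from 0 within each instance, that file numbering restarts at file1.txt in every data directory, and it uses the example limits of 10,000 rows per file and 500 files per directory.

```python
from pathlib import PurePosixPath


def output_path(server: str, instance: str, row_index: int,
                rows_per_file: int = 10_000, files_per_dir: int = 500) -> PurePosixPath:
    """Return the file that would hold the given 0-based row of one instance."""
    file_index = row_index // rows_per_file          # files written so far
    dir_number = file_index // files_per_dir + 1     # data1, data2, ...
    file_number = file_index % files_per_dir + 1     # assumed to restart per directory
    return PurePosixPath(server, instance, f"data{dir_number}", f"file{file_number}.txt")


print(output_path("server1", "instance1", 0))          # server1/instance1/data1/file1.txt
print(output_path("server1", "instance1", 10_000))     # server1/instance1/data1/file2.txt
print(output_path("server1", "instance1", 5_000_000))  # server1/instance1/data2/file1.txt
```

With these example limits, one data directory holds 5,000,000 rows, so an instance generating 10,000,000 rows would fill two data directories.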
Example 2 - Partitioning to Generate 1 Billion Rows of Test Data
If 1,000,000,000 rows of test data were generated on ten servers, each running ten partitioned GenRocket instances, it would still only take approximately 18 to 20 minutes to generate all 1,000,000,000 rows of test data.
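The reason the time stays the same is that the row count and the number of instances both grow tenfold, so each instance still generates 10,000,000 rows. The sketch below makes this explicit. The 10,000 rows-per-second rate and the 0.85 efficiency factor are assumptions, chosen only so the result falls inside the 18 to 20 minute range quoted above. The function is not part of GenRocket.

```python
def estimated_minutes(total_rows: int, servers: int, instances_per_server: int,
                      rows_per_second: int = 10_000, efficiency: float = 0.85) -> float:
    """Rough generation time when the load is split evenly across all instances."""
    # Assumed: every instance achieves the same fraction of its standalone rate.
    throughput = servers * instances_per_server * rows_per_second * efficiency
    return total_rows / throughput / 60


example_1 = estimated_minutes(100_000_000, servers=1, instances_per_server=10)
example_2 = estimated_minutes(1_000_000_000, servers=10, instances_per_server=10)

print(f"Example 1: {example_1:.1f} minutes")
print(f"Example 2: {example_2:.1f} minutes")
```

Both calls give the same estimate. The sketch ignores any cost of writing to shared storage across servers, which real runs may need to account for.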
Multi-Server Component Diagram
The diagram below shows multiple servers running multiple partitioned GenRocket instances, writing to a shared Amazon S3 file server, each with a root folder from server1 through serverN.
Additional Information
| Topic | Description |
| --- | --- |
| How to Optimize Test Data Generation | Discover ways to optimize test data generation, including the Partition Engine, the Scenario Thread Engine, Bulk Load Receivers, or a different file type. Additional information is provided for checking speed, performance, network-related, and database-related factors. |
| What is GenRocket Partitioning? | Learn about GenRocket partitioning and how it is implemented. |
| How do I configure GenRocket to run the Partition Engine? | Learn how to configure the REST request payload to run multiple partitioned GenRocket instances. |
| What does a GenRocket Receiver require to work with the GenRocket Partition Engine? | Learn what parameters and functionality a GenRocket Receiver must implement to be used with the GenRocket Partition Engine. |
| How do I run multiple GenRocket Partitions? | Learn how to run multiple partitioned GenRocket instances. This article also provides additional commands for the Partition Engine Queue and Queue History. |
| Partition Engine Benchmarks | View Partition Engine benchmark values for different scenarios, with system information included. |
Partition Engine Receivers
| Receiver | Description |
| --- | --- |
| How do I use the MySQLPartitionReceiver? | Used to load data from the Partition Engine into a MySQL database. |
| How do I use the DelimitedPartitionReceiver? | Outputs data in a delimited format to one or more files parsed over multiple instances via the Partition Engine. |
| How do I use the PartitionFileMergeReceiver? | Used to merge non-nested partitioned data from many files into one file. |
| How do I use the OraclePartitionReceiver? | Used to load data from the Partition Engine into an Oracle database. |
| How do I use the SegmentPartitionReceiver? | Morphs Domain data into a set-based XML output format, written to one or more files created over multiple instances via the GenRocket Partition Engine. |
| How do I use the PostgreSQLPartitionReceiver? | Used to load data from the Partition Engine into a PostgreSQL database. |
| How do I use the MSSQLPartitionReceiver? | Used to load data from the Partition Engine into an MSSQL database. |
| How do I use the ParquetPartitionReceiver? | Outputs data to one or more Parquet files parsed over multiple instances via the Partition Engine. |
| How do I use the ParquetPartitionFileMergeReceiver? | Merges Parquet partition data from many Parquet files into one Parquet file. |