Description

The GenRocket Partition Engine (G-Partition Engine) is used to generate hundreds of millions, billions, or even trillions of rows of test data in a short period of time. It does this by partitioning the load across multiple GenRocket instances running on a given server. When generating enormous amounts of test data, the load can also be partitioned across multiple servers, each running multiple GenRocket instances.


Special Note

The G-Partition Engine is a file-intensive processing engine, possibly creating hundreds of files per second depending on the number of instances run simultaneously. Thus, it is highly recommended that the G-Partition Engine be run only on Linux machines. When running ten or more G-Partition instances, the minimum machine hardware requirements are 4 CPUs, 16 GB RAM, and a 1 TB SSD.


Test Data Generation Speed Challenge

On a given computer, depending on the number of CPU cores, the amount of memory, and the operating system (OS), GenRocket may generate between 10,000 and 15,000 rows of test data per second. Assuming GenRocket is running on one very slow computer at 10,000 rows per second, test data generation times can be approximated as follows:

  • 10,000 rows every second
  • 600,000 rows per minute
  • 1,000,000 rows every 1 minute and 40 seconds
  • 10,000,000 rows every 16 minutes 40 seconds
  • 100,000,000 rows every 2 hours 46 minutes
  • 1,000,000,000 rows every 27 hours 46 minutes
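
The approximations above follow from dividing the row count by the assumed rate of 10,000 rows per second. A minimal sketch of that arithmetic is shown here; the function names are illustrative only and are not part of GenRocket.

```python
def generation_seconds(rows: int, rows_per_second: int = 10_000) -> int:
    """Seconds needed to generate `rows` at a constant rate, rounded up."""
    return -(-rows // rows_per_second)  # ceiling division


def format_duration(total_seconds: int) -> str:
    """Render a whole number of seconds as 'Hh Mm Ss'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes}m {seconds}s"


# Single instance at the slow-machine baseline of 10,000 rows per second.
for rows in (1_000_000, 10_000_000, 100_000_000, 1_000_000_000):
    print(f"{rows:>13,} rows: {format_duration(generation_seconds(rows))}")
```

At this rate, one billion rows need 100,000 seconds, which is a little under 28 hours on a single instance.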


As seen from the approximations above, generating more than 10 million rows of test data on a single instance takes far too much time. By partitioning the load across multiple instances, and across multiple servers when needed, it is possible to drastically reduce the time required to generate that test data.


Example 1 - Partitioning to Generate 100 Million Rows of Test Data

If 100,000,000 rows of test data were generated on one server running ten partitioned GenRocket instances, it would take approximately 18 to 20 minutes to generate all 100,000,000 rows. Some speed is lost because the operating system is running ten GenRocket instances simultaneously. Each GenRocket instance runs within its own Java runtime environment, within its own block of memory, and shares the CPU cycles allotted to it by the OS.
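
To put the lost speed in numbers, the 18 to 20 minute estimate can be compared with a perfectly linear speed-up. The sketch below assumes each instance would run at the 10,000 rows per second baseline if it had the machine to itself; that rate and the function names are assumptions made for illustration, not measured GenRocket figures.

```python
RATE_PER_INSTANCE = 10_000  # rows/second; assumed slow-machine baseline


def ideal_seconds(total_rows: int, instances: int,
                  rate: int = RATE_PER_INSTANCE) -> float:
    """Time if every instance ran at full speed with no contention."""
    return total_rows / (instances * rate)


def overhead_ratio(observed_seconds: float, ideal: float) -> float:
    """Fraction of extra time spent relative to the ideal case."""
    return observed_seconds / ideal - 1.0


ideal = ideal_seconds(100_000_000, instances=10)  # 1000 seconds
low = overhead_ratio(18 * 60, ideal)
high = overhead_ratio(20 * 60, ideal)
print(f"ideal: {ideal:.0f} s, overhead: {low:.0%} to {high:.0%}")
# prints "ideal: 1000 s, overhead: 8% to 20%"
```

Under these assumptions, running ten instances side by side costs roughly 8% to 20% over the ideal 16 minutes 40 seconds.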


Single Server Deployment Diagram

The deployment diagram below shows one local computer running multiple partitioned GenRocket instances, writing to the local file server under the root directory, server1. Each GenRocket Partitioned Instance has a root directory under the directory server1 (instance1, instance2, and instanceN). Under each instance's root directory, numbered data directories (e.g., data1, data2, etc.) are created dynamically during test data generation, and each stores numbered files (e.g., file1.txt, file2.txt, etc.) that are also created dynamically. Each file will store a finite set of rows (e.g., 10,000) before the next file is created. Each data directory will store a finite number of files (e.g., 500) before the next data directory is created.
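
The directory layout described above can be sketched as a mapping from a row's position within one instance to the file that would hold it. This is a hypothetical illustration, not GenRocket's implementation: the function name is invented, the limits are the example values from the text, and the sketch assumes file numbering restarts at file1.txt in each data directory, which the text does not specify.

```python
from pathlib import PurePosixPath

ROWS_PER_FILE = 10_000  # example value from the text
FILES_PER_DIR = 500     # example value from the text


def output_path(server: str, instance: int, row_index: int) -> PurePosixPath:
    """Path of the file holding the zero-based `row_index` of one instance."""
    file_index = row_index // ROWS_PER_FILE       # zero-based file counter
    dir_number = file_index // FILES_PER_DIR + 1  # data1, data2, ...
    # Assumption: file numbers restart at 1 inside each data directory.
    file_number = file_index % FILES_PER_DIR + 1
    return PurePosixPath(server, f"instance{instance}",
                         f"data{dir_number}", f"file{file_number}.txt")


print(output_path("server1", 1, 0))          # server1/instance1/data1/file1.txt
print(output_path("server1", 1, 10_000))     # server1/instance1/data1/file2.txt
print(output_path("server1", 2, 5_000_000))  # server1/instance2/data2/file1.txt
```

With these example limits, one data directory holds 500 × 10,000 = 5,000,000 rows, so an instance that generates 10,000,000 rows would fill two data directories with 1,000 files in total.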



Example 2 - Partitioning to Generate 1 Billion Rows of Test Data

If 1,000,000,000 rows of test data were generated on ten servers, each running ten partitioned GenRocket instances, it would still only take approximately 18 to 20 minutes to generate all 1,000,000,000 rows of test data.
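
The reason the elapsed time stays the same is that each instance is handed the same share of the work in both examples. A small sketch of that arithmetic follows, assuming the rows are divided evenly across all instances; the function name is illustrative, and how GenRocket actually assigns rows to instances is not described here.

```python
def rows_per_instance(total_rows: int, servers: int,
                      instances_per_server: int) -> int:
    """Rows each instance must generate under an even split, rounded up."""
    total_instances = servers * instances_per_server
    return -(-total_rows // total_instances)  # ceiling division


example_1 = rows_per_instance(100_000_000, servers=1, instances_per_server=10)
example_2 = rows_per_instance(1_000_000_000, servers=10, instances_per_server=10)
print(f"{example_1:,}")  # 10,000,000
print(f"{example_2:,}")  # 10,000,000
```

In both cases each instance generates 10,000,000 rows, so ten times the data on ten times the servers finishes in about the same time, provided the servers do not contend with one another.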


Multi-Server Component Diagram

The diagram below shows multiple servers running multiple partitioned GenRocket instances, writing to a shared Amazon S3 file server, where each server has its own root folder (server1 through serverN).