Description

The GenRocket Partition Engine is used to generate hundreds of millions, billions, or even trillions of rows of test data in a short period of time. It does this by partitioning the generation load across multiple GenRocket instances running on a single server. For enormous volumes of test data, the load can also be partitioned across multiple servers, each running multiple GenRocket instances.


Test Data Generation Speed Calculations

On a given computer, depending on the number of CPU cores, the amount of memory, and the operating system (OS), GenRocket may generate between 10,000 and 15,000 rows of test data per second. If we base our calculations on the low end of that range (10,000 rows per second, as on one very slow computer), the following test data generation times can be approximated:

  • 10,000 rows every second
  • 600,000 rows per minute
  • 1,000,000 rows every 1 minute 40 seconds
  • 10,000,000 rows every 16 minutes 40 seconds
  • 100,000,000 rows every 2 hours 46 minutes
  • 1,000,000,000 rows every 27 hours 46 minutes
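The approximations above follow from dividing the row count by a constant rate. The small Python sketch below reproduces them, assuming a steady 10,000 rows per second on a single GenRocket instance; the function name generation_time is hypothetical and not part of GenRocket.

```python
def generation_time(rows: int, rows_per_second: int = 10_000) -> tuple[int, int, int]:
    """Return (hours, minutes, seconds) to generate `rows` at a constant rate.

    Assumes a single instance sustaining `rows_per_second` for the whole run.
    """
    # Ceiling division so a partial second still counts as a full second.
    total_seconds = (rows + rows_per_second - 1) // rows_per_second
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds


for rows in (1_000_000, 10_000_000, 100_000_000, 1_000_000_000):
    h, m, s = generation_time(rows)
    print(f"{rows:>13,} rows: {h}h {m}m {s}s")
```

At 15,000 rows per second (the high end of the quoted range), passing rows_per_second=15_000 shortens every figure by one third.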


As the approximations above show, generating more than 10 million rows of test data on a single instance takes far too much time. Partitioning the generation across multiple GenRocket instances, and across multiple servers, can drastically reduce that time; how much it is reduced depends on the number of partitioned instances.


Example One - Partitioning to Generate 100 Million Rows of Test Data

If 100,000,000 rows of test data were generated on 1 server running 10 partitioned GenRocket instances, it would take approximately 18 to 20 minutes to generate all 100,000,000 rows. Some speed is lost because the operating system must run 10 GenRocket instances simultaneously: each instance runs in its own Java runtime environment with its own block of memory, and shares the CPU cycles the OS allocates to it.
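A useful sanity check is the ideal wall-clock time when the rows are split evenly across instances and each instance keeps its full single-instance rate. The Python sketch below computes that lower bound, assuming 10,000 rows per second per instance; ideal_partitioned_seconds is a hypothetical helper, not a GenRocket API. For 100,000,000 rows over 10 instances it gives 1,000 seconds (about 16 minutes 40 seconds), so the 18 to 20 minutes quoted above reflects the OS overhead of running the instances side by side.

```python
def ideal_partitioned_seconds(total_rows: int, instances: int,
                              rows_per_second: int = 10_000) -> float:
    """Lower bound on wall-clock time when rows are split evenly across instances.

    Assumes every instance independently sustains `rows_per_second`; real runs
    are slower because the OS shares CPU and memory among the JVMs.
    """
    # The slowest instance determines the finish time, so use the largest share.
    rows_per_instance = -(-total_rows // instances)  # ceiling division
    return rows_per_instance / rows_per_second


seconds = ideal_partitioned_seconds(100_000_000, 10)
print(f"ideal: {seconds / 60:.1f} minutes")  # 1000 s -> 16.7 minutes
```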


Single Server Component Diagram

The diagram below shows one local computer running multiple partitioned GenRocket instances, each writing to the local file system under the root directory server1. Each partitioned GenRocket instance has its own root directory under server1 (instance1, instance2, through instanceN). Under each instance's root directory, numbered data directories (e.g., data1, data2) are created dynamically to store numbered files (e.g., file1.txt, file2.txt), which are also created dynamically during test data generation. Each file stores a fixed number of rows (e.g., 10,000) before the next file is created, and each data directory stores a fixed number of files (e.g., 500) before the next data directory is created.
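The rollover rule above determines which file a given row lands in. The Python sketch below maps a 0-based row index within one instance to its path, using the example values from the text (10,000 rows per file, 500 files per data directory). It assumes file numbering restarts at file1.txt inside each new data directory, which the description does not state explicitly; row_location is a hypothetical helper for illustration only.

```python
from pathlib import PurePosixPath

ROWS_PER_FILE = 10_000  # example value from the text
FILES_PER_DIR = 500     # example value from the text


def row_location(instance: int, row: int, root: str = "server1") -> PurePosixPath:
    """Map a 0-based row index within one instance to its (hypothetical) file path.

    Assumes data directories and files are numbered from 1, and that file
    numbering restarts at 1 inside each data directory.
    """
    file_index = row // ROWS_PER_FILE               # 0-based file count across the instance
    dir_number = file_index // FILES_PER_DIR + 1    # data1, data2, ...
    file_number = file_index % FILES_PER_DIR + 1    # file1.txt ... file500.txt
    return PurePosixPath(root, f"instance{instance}",
                         f"data{dir_number}", f"file{file_number}.txt")


print(row_location(1, 0))          # first row of instance1
print(row_location(2, 5_000_000))  # first row after data1 fills up
```

With these values each data directory holds 500 × 10,000 = 5,000,000 rows, so an instance generating 10,000,000 rows would fill data1 and data2.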



Example Two - Partitioning to Generate 1 Billion Rows of Test Data

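If 1,000,000,000 rows of test data were generated on 10 servers, each running 10 partitioned GenRocket instances, it would still take only approximately 18 to 20 minutes to generate all 1,000,000,000 rows, because each of the 100 instances generates 10,000,000 rows, the same share each instance generates in the 100-million-row example.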
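One way to picture the partitioning is as a plan that assigns each instance on each server a contiguous, non-overlapping range of rows. The Python sketch below builds such a plan; it is an illustration under that assumption, not GenRocket's own partitioning configuration, and the names Partition and plan_partitions are hypothetical. For 1,000,000,000 rows on 10 servers with 10 instances each, every instance receives 10,000,000 rows.

```python
from typing import NamedTuple


class Partition(NamedTuple):
    server: int     # 1-based server number
    instance: int   # 1-based instance number on that server
    first_row: int  # inclusive, 0-based global row index
    last_row: int   # exclusive


def plan_partitions(total_rows: int, servers: int,
                    instances_per_server: int) -> list[Partition]:
    """Split total_rows into contiguous, near-equal ranges, one per instance."""
    count = servers * instances_per_server
    base, extra = divmod(total_rows, count)
    partitions: list[Partition] = []
    start = 0
    for i in range(count):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first ranges
        server, instance = divmod(i, instances_per_server)
        partitions.append(Partition(server + 1, instance + 1, start, start + size))
        start += size
    return partitions


plan = plan_partitions(1_000_000_000, servers=10, instances_per_server=10)
print(len(plan), plan[0], plan[-1])
```

Because the ranges never overlap, each instance can generate its share independently, which is what lets the total time stay roughly constant as servers are added.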


Multi-Server Component Diagram

The diagram below shows multiple servers, each running multiple partitioned GenRocket instances, writing to shared Amazon S3 storage with root buckets numbered 1 through N.