GenRocket supports test data generation for large data warehouses

GenRocket can generate an average of 15,000 rows of test data per second to a flat file. That's 900,000 rows per minute, or 54,000,000 rows per hour. If you shard GenRocket across 10 instances, you can generate 540,000,000 rows of data in one hour, or more than 1 billion rows of test data in two hours. Spin up a set of Amazon servers and the sky's the limit for how much data you can generate and how fast you can generate it.
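
The arithmetic above can be checked with a short Python sketch. It assumes throughput scales linearly with the number of instances, which a real deployment may not fully achieve. The function name rows_generated is ours, for illustration only, and is not part of GenRocket.

```python
ROWS_PER_SECOND = 15_000  # average flat-file rate quoted above


def rows_generated(instances: int, hours: int,
                   rows_per_second: int = ROWS_PER_SECOND) -> int:
    """Rows produced by `instances` parallel instances over `hours` hours.

    Assumption: every instance sustains the same average rate (linear scaling).
    """
    return instances * rows_per_second * 3600 * hours


print(rows_generated(1, 1))    # 54000000   (one instance, one hour)
print(rows_generated(10, 1))   # 540000000  (ten instances, one hour)
print(rows_generated(10, 2))   # 1080000000 (ten instances, two hours)
```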

43 Billion Rows in 19 Hours

The diagram below shows how GenRocket can generate 43 billion rows of data across 43 Amazon servers, each running 10 instances of GenRocket. Each instance knows which segment of the data to generate, and the whole run completes in just under 19 hours.
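
One way to picture each instance knowing which segment of data to generate is to split the total row count into contiguous, non-overlapping ranges. The Python sketch below is a hypothetical illustration, not GenRocket's actual partitioning logic. The function segment_for and its parameters are assumed names.

```python
def segment_for(instance: int, instances: int, total_rows: int) -> range:
    """Return the half-open range of row numbers assigned to one instance.

    Rows are split as evenly as possible; when the total does not divide
    evenly, the first `total_rows % instances` instances get one extra row.
    """
    base, extra = divmod(total_rows, instances)
    start = instance * base + min(instance, extra)
    size = base + (1 if instance < extra else 0)
    return range(start, start + size)


# 43 servers x 10 instances each, sharing 43 billion rows.
INSTANCES = 43 * 10
TOTAL_ROWS = 43_000_000_000

first = segment_for(0, INSTANCES, TOTAL_ROWS)
last = segment_for(INSTANCES - 1, INSTANCES, TOTAL_ROWS)
print(len(first))   # rows handled by the first instance
print(last.stop)    # end of the last segment, equal to TOTAL_ROWS
```

Because each range depends only on the instance's index, no coordination between instances is needed while generating.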

GenRocket Receivers for Bulk Loading

Populating a data warehouse such as Teradata, or a NoSQL database such as MongoDB or Cassandra, is the task of a GenRocket Receiver. When generating such large volumes of data, you don't want to insert rows over a JDBC connection; you want to generate the data in the target database's native bulk-load format.

For example, let's say your team needs to populate a Teradata database. To do this, the GenRocket team would implement a new Receiver that writes the generated data to files in the native format for fast bulk loading into the Teradata database. The same is true for MongoDB, Cassandra, Hadoop, or other NoSQL databases. A new Receiver takes our team about one to two weeks to create.
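
To illustrate the idea of writing a bulk-load file instead of inserting row by row, here is a minimal Python sketch. It is not a GenRocket Receiver and does not reproduce any particular database's native format. The pipe-delimited layout and the name write_bulk_file are assumptions chosen for the example.

```python
import csv
import io
from typing import Iterable, Sequence, TextIO


def write_bulk_file(rows: Iterable[Sequence[object]], out: TextIO,
                    delimiter: str = "|") -> int:
    """Write rows as delimited text, one record per line; return the row count.

    Assumption: the target's bulk loader accepts delimited text. A real
    Receiver would emit whatever layout that loader expects.
    """
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Usage: an in-memory buffer stands in for the flat file.
buffer = io.StringIO()
written = write_bulk_file([(1, "Ada"), (2, "Grace")], buffer)
print(written)
print(buffer.getvalue())
```

The resulting file would then be handed to the database's own bulk-load utility, which is typically much faster than issuing individual inserts.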

New Receivers are easily added because of GenRocket's component-based architecture 

With GenRocket and its component-based architecture, there's virtually no limit to the type and amount of data that can be generated for any target database. The architecture is built from five kinds of component:

  • A GenRocket Domain defines a noun (person, place or thing) that represents a set of generated data,
    • GenRocket Attributes defined within each Domain determine the characteristics of the Domain,
    • One or more GenRocket Generators, linked together, can generate any type of data for an Attribute,
  • A GenRocket Receiver defines the format of the data output,
  • A GenRocket Scenario defines how much data will be generated and guarantees the referential integrity of the generated test data across parent and sibling relationships and across multiple Scenarios.
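
The relationships between these components can be sketched in Python. This is a simplified, hypothetical model whose class names mirror the concepts above. It is not GenRocket's API, it uses a single generator function per Attribute rather than linked Generators, and it does not attempt to model referential integrity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: a generator maps a row number to a value.
Generator = Callable[[int], object]
# Assumption: a receiver consumes one generated row at a time.
Receiver = Callable[[Dict[str, object]], None]


@dataclass
class Attribute:
    name: str
    generator: Generator


@dataclass
class Domain:
    name: str
    attributes: List[Attribute]

    def row(self, n: int) -> Dict[str, object]:
        """Build row number n by asking each Attribute's generator for a value."""
        return {a.name: a.generator(n) for a in self.attributes}


@dataclass
class Scenario:
    domain: Domain
    row_count: int

    def run(self, receiver: Receiver) -> None:
        """Generate row_count rows and hand each one to the receiver."""
        for n in range(1, self.row_count + 1):
            receiver(self.domain.row(n))


# Usage: a "user" Domain with two Attributes, collected into a list.
user = Domain("user", [
    Attribute("id", lambda n: n),
    Attribute("email", lambda n: f"user{n}@example.com"),
])
rows: List[Dict[str, object]] = []
Scenario(user, 3).run(rows.append)
print(rows)
```

Swapping rows.append for a function that writes a bulk-load file changes the output format without touching the Domain or the Scenario, which is the point of keeping the Receiver separate.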

Tell us about your big test data challenge and we can show you how to solve it with GenRocket.

Note: To see a list of our currently supported output formats, click here.