Overview
This article answers common questions about the Unstructured Data Accelerator (UDA) feature, how it works for PDF generation, and related topics. Use the anchor links for quick access.
- What problem does UDA solve?
- Why use synthetic data instead of real data?
- How realistic are the generated documents?
- How does synthetic data improve testing accuracy and coverage?
- What industries or workflows benefit most from this feature?
- Does synthetic data include any real or masked customer information?
- Can this feature be integrated into automated workflows or CI/CD pipelines?
- How is data variety handled (different layouts, formats, or fields)?
- Is it possible to share unstructured data files containing only synthetic data safely with external teams or vendors?
- Can I customize the PDF format after importing it?
- Should the imported PDF contain data or have no data?
- Are there any requirements or limitations for PDF files?
- Are there any limits to the number of pages a source PDF file can have?
- Are there any limitations on the number of PDFs that can be generated from a created template file?
- Can synthetic PDFs be used to train or validate AI models?
- How can I ensure Referential Integrity between two PDFs?
Template and Project Setup FAQs
- Are any preliminary setup steps required to begin using UDA?
- Are there any special setup jars or dependencies when using UDA?
- What are the minimum system requirements for UDA?
- Can multiple files be imported at one time?
- Can I have multiple files tied to a single Domain?
- How does a template file translate to a Project in GenRocket?
- Can I reuse variables in a template (i.e., name appears more than once in the template)?
- How do I ensure the variables have the same value for each occurrence in the template?
- Can I create a PDF project on top of another PDF Project?
General FAQs
What problem does UDA solve?
- UDA solves the challenge of testing and validating systems that process unstructured data, such as PDFs, without using sensitive or production data.
- It does this by generating synthetic data, enabling safer, more effective testing without exposing real data.
- UDA helps test different document workflows across industries, including finance, insurance, and healthcare. Examples include loan processing in finance and patient record workflows in healthcare.
- Click here to learn more.
Why use synthetic data instead of real data?
- Synthetic data protects privacy, reduces compliance risk, and allows teams to safely test at scale.
How realistic are the generated documents?
- Synthetic PDFs replicate the structure, fields, and layouts of real documents while ensuring all content is fictitious.
How does synthetic data improve testing accuracy and coverage?
- It provides diverse, realistic input that helps uncover edge cases and improve validation of data extraction and processing systems.
What industries or workflows benefit most from this feature?
- UDA is beneficial for any industry or workflow that uses sensitive documents for testing or automation.
Does synthetic data include any real or masked customer information?
- By default, all data is generated synthetically and does not originate from real records, so no customer information is included in synthetic datasets.
- Options exist to include real data values in the generated documents when needed.
- Real data can be queried from external sources, such as databases or files (e.g., CSV or Excel).
- This can be done using G-Queries, Query Generators, and Import Generators.
Can this feature be integrated into automated workflows or CI/CD pipelines?
- Yes, document generation can be triggered automatically from test automation.
- Supported integration methods include REST API, SOAP, and CLI. Configure the method that suits your workflow by following the integration guide specific to your environment.
- Ensure that your automation setup invokes the document generation process and that the output files are written to the expected directories or endpoints.
- See the Integrations and API article sections for details.
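The check that output files land in the expected directory can be sketched as a small post-generation step in a pipeline. This is a minimal illustration, not part of UDA: the function name `check_generated_pdfs`, the directory layout, and the expected count are assumptions. It relies only on the fact that PDF files begin with the bytes `%PDF-`.

```python
from pathlib import Path


def check_generated_pdfs(output_dir: str, expected_count: int) -> list:
    """Return a list of problems found in output_dir (an empty list means OK)."""
    problems = []
    pdf_paths = sorted(Path(output_dir).glob("*.pdf"))

    # Hypothetical expectation: the pipeline knows how many PDFs it requested.
    if len(pdf_paths) != expected_count:
        problems.append(
            f"expected {expected_count} PDF files, found {len(pdf_paths)}"
        )

    for path in pdf_paths:
        with path.open("rb") as handle:
            header = handle.read(5)
        # A PDF file starts with the marker "%PDF-".
        if header != b"%PDF-":
            problems.append(f"{path.name} does not start with a PDF header")

    return problems
```

A CI job could call this after the generation step and fail the build when the returned list is not empty.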
How is data variety handled (different layouts, formats, or fields)?
- Templates and configuration options control document structure, field content, and variability.
Is it possible to share unstructured data files containing only synthetic data safely with external teams or vendors?
- Yes. Files that contain only synthetic data can be shared safely with external teams or vendors.
- Before sharing, verify that no real data has been included in the documents.
- If real data is included during test data generation, explicitly state it in the generated output file names, or use a designated output folder so that it is easy to identify which files contain real data values.
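The naming and folder convention for files that contain real data can be sketched as follows. This is only one possible convention, not a UDA feature: the marker text, the folder names `synthetic` and `real_data`, and both function names are assumptions chosen for illustration. The sketch applies both suggestions at once (a marker in the file name and a designated folder).

```python
from pathlib import Path

# Hypothetical marker added to the names of files that contain real data.
REAL_DATA_MARKER = "CONTAINS_REAL_DATA"


def output_path(base_dir: str, file_name: str, contains_real_data: bool) -> Path:
    """Build an output path that makes real-data files easy to identify."""
    base = Path(base_dir)
    if not contains_real_data:
        return base / "synthetic" / file_name
    name = Path(file_name)
    marked_name = f"{name.stem}_{REAL_DATA_MARKER}{name.suffix}"
    return base / "real_data" / marked_name


def safe_to_share(paths):
    """Keep only paths that carry neither the marker nor the real-data folder."""
    return [
        path
        for path in paths
        if REAL_DATA_MARKER not in path.name and "real_data" not in path.parts
    ]
```

With a convention like this, a sharing script can filter on the file name or folder alone instead of inspecting document contents.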
PDF Specific FAQs
Can I customize the PDF format after importing it?
- Yes. The template created from PDF import is customizable before export.
- Users are directed to the template editor for customization and export.
- Click here to learn more.
Should the imported PDF contain data or have no data?
- The imported PDF should contain data. Importing a PDF with data gives more accurate modeling.
Are there any requirements or limitations for PDF files?
- Only digital PDFs are supported currently.
- Find PDF requirements and best practices here.
Are there any limits to the number of pages a source PDF file can have?
- No specific page limits exist for a PDF source.
- Limit a single PDF file to 20 pages for best results.
Are there any limitations on the number of PDFs that can be generated from a created template file?
- There are no limitations on the number of PDF documents that can be generated.
- Generating large numbers of PDFs may take additional time.
- Many factors affect speed and performance. Learn more here.
Can synthetic PDFs be used to train or validate AI models?
- Yes, they're ideal for AI model training without privacy risk.
How can I ensure Referential Integrity between two PDFs?
- Referential integrity can be maintained using one of the following:
Template and Project Setup FAQs
Are any preliminary setup steps required to begin using UDA?
- Yes, some initial setup steps are required to begin using UDA.
- Enable UDA, install dependencies, and download required files.
- More information is available here.
Are there any special setup jars or dependencies when using UDA?
- The UDA Jar must be downloaded from the GenRocket web platform.
- Install Node.js version 20.18.0.
- Find step-by-step instructions here.
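A quick way to confirm the Node.js dependency before running UDA is to compare the output of `node --version` with the version named above. The sketch below is an assumption-laden helper, not part of the GenRocket setup: the function names are hypothetical, and it treats the requirement as an exact match on 20.18.0, which may be stricter than what the official instructions require.

```python
import shutil
import subprocess
from typing import Optional

REQUIRED_NODE_VERSION = "20.18.0"  # version named in this FAQ


def parse_node_version(raw: str) -> str:
    """Turn the output of `node --version` (for example "v20.18.0") into "20.18.0"."""
    text = raw.strip()
    return text[1:] if text.startswith("v") else text


def installed_node_version() -> Optional[str]:
    """Return the installed Node.js version, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run(
        [node, "--version"], capture_output=True, text=True, check=True
    )
    return parse_node_version(result.stdout)


if __name__ == "__main__":
    version = installed_node_version()
    if version == REQUIRED_NODE_VERSION:
        print(f"Node.js {version} found")
    else:
        print(f"Expected Node.js {REQUIRED_NODE_VERSION}, found {version}")
```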
What are the minimum system requirements for UDA?
- System requirements are the same as GenRocket Runtime.
- Click here to learn more.
Can multiple files be imported at one time?
- No, only one file can be imported at a time.
Can I have multiple files tied to a single Domain?
- Each page of a source file has one Domain with its own Attributes.
- UDA does not currently support importing multiple documents into the same Project Version.
How does a template file translate to a Project in GenRocket?
- Domain = Each individual template page.
- Label = Attribute, used for static text or images that do not change (e.g., bank statement header or column header).
- Variable = Attribute, fields that require synthetic test data to be generated or will be used as an image, QR, or barcode placeholder. Text examples include a person's first name or date of birth.
Can I reuse variables in a template (i.e., name appears more than once in the template)?
- Yes, an option is provided to duplicate a variable such as a name.
- Each variable is a separate Attribute and must have a unique name when customizing the template.
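The unique-name rule can be checked before customizing a template with a few lines of code. This is a generic sketch, not a UDA utility: the function name and the sample variable names are hypothetical, and names are compared exactly as written (case-sensitive), which is an assumption about how names are treated.

```python
from collections import Counter


def duplicate_names(variable_names):
    """Return the variable names that appear more than once, sorted."""
    counts = Counter(variable_names)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical variable names taken from a template under review.
names = ["firstName", "firstName2", "firstName", "dateOfBirth"]
print(duplicate_names(names))
```

Any name reported by the check would need to be renamed, for example by adding a suffix for each occurrence.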
How do I ensure the variables have the same value for each occurrence in the template?
- To use the same value for repeated fields, a feature such as G-Map Server is needed to maintain data consistency, because each occurrence is a separate variable in the Project.
- For example, in a 5-page document where the name occurs three times, each occurrence is a separate variable, and the user wants the name to be the same at every occurrence.
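The idea of sharing one value across separately named variables can be sketched in a few lines. This is not the G-Map Server API: a plain dictionary stands in for the shared store, and the variable names, the mapping, and the function name are all hypothetical. It only illustrates the concept of generating a value once and reusing it at each occurrence.

```python
# Hypothetical mapping: three separately named template variables that
# should all display the same logical value.
VARIABLE_TO_SHARED_KEY = {
    "customerName_page1": "customerName",
    "customerName_page3": "customerName",
    "customerName_page5": "customerName",
}


def fill_document(variable_names, generate_value, shared_store):
    """Fill one document's variables, generating each shared value only once."""
    values = {}
    for name in variable_names:
        # Variables without a mapping entry use their own name as the key.
        key = VARIABLE_TO_SHARED_KEY.get(name, name)
        if key not in shared_store:
            shared_store[key] = generate_value(key)
        values[name] = shared_store[key]
    return values
```

Passing a fresh, empty `shared_store` for each document keeps values consistent within a document while still allowing them to differ between documents.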
Can I create a PDF project on top of another PDF Project?
- No, but additional Domains, known as Helper Domains, can be added to the Project without altering the existing PDF structure.
- A Helper Domain is a Domain that does not generate data explicitly but helps the user implement certain logic in other Domains.
Need More Help?
More information can be found here: Unstructured Data Accelerator (UDA) - PDF Document Generation.
Please reach out to our support team at support@genrocket.com for additional help and information.