What is Test Data Management?

August 12, 2020

Definition

Test data management is the process of planning, designing, storing, and managing the creation of test data for software quality-testing processes and methodologies.

Test data management is also known as software test data management. (source)

Typically, medium to large businesses have a formal process defined because they need to integrate data across the whole company.

Software Quality Assurance and Test Data Management Link

Test Data Management in the Software Quality Assurance domain has matured over the last decade, driven by DevOps and by data breaches. The Quality Assurance practice must keep validating new technology features, and that validation typically requires test data. Secured and readily available test data is therefore needed. Test data used for testing must be de-identified or synthetically created, so that no personal data is exposed if a breach occurs or an employee shares test data by mistake.

Test Data Management needs to solve both a significant security risk and an availability problem. Traditional test data management processes take weeks to months because of the large amounts of data that software engineers and testers typically request. Just as the DevOps methodology requires a large amount of automation, test data creation must be automated to keep the same rapid pace.

The Process

Most Test Data Management techniques follow either an Extract, Transform, Load (ETL) method or a data virtualization method to make the data readily available. The typical process is to take a snapshot of production data, de-sensitize it, and load it into QA or developer environments.
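The snapshot, de-sensitize, and load steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `customers` table, its columns, the sample rows, and the `desensitize` and `refresh_test_data` helpers are all hypothetical, and in-memory SQLite databases stand in for the production and QA environments. The truncated hash is a placeholder for whatever masking rule a company formally approves, not a vetted anonymization scheme.

```python
import hashlib
import sqlite3

CREATE_SQL = (
    "CREATE TABLE customers "
    "(id INTEGER PRIMARY KEY, name TEXT, email TEXT, plan TEXT)"
)


def desensitize(row):
    """Transform step: replace the sensitive fields of one row."""
    cust_id, _name, email, plan = row
    # Placeholder rule (assumption): a truncated SHA-256 digest of the email.
    token = hashlib.sha256(email.encode("utf-8")).hexdigest()[:10]
    return (cust_id, f"Customer {cust_id}", f"user_{token}@example.test", plan)


def refresh_test_data(prod, qa):
    """Extract a snapshot from prod, de-sensitize it, and load it into qa."""
    # Extract: read the snapshot from the production stand-in.
    rows = prod.execute("SELECT id, name, email, plan FROM customers").fetchall()
    # Transform: de-sensitize every row before it leaves this function.
    masked = [desensitize(row) for row in rows]
    # Load: rebuild the table in the QA stand-in.
    qa.execute("DROP TABLE IF EXISTS customers")
    qa.execute(CREATE_SQL)
    qa.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", masked)
    qa.commit()
    return len(masked)


# Stand-in "production" database holding two made-up customers.
prod = sqlite3.connect(":memory:")
prod.execute(CREATE_SQL)
prod.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Ada Jones", "ada@corp.example", "gold"),
        (2, "Bo Smith", "bo@corp.example", "basic"),
    ],
)

qa = sqlite3.connect(":memory:")
loaded = refresh_test_data(prod, qa)
```

Non-sensitive columns such as `plan` pass through unchanged, so tests that depend on them still behave as they would against production data.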

To address data protection or data privacy of PHI (personal health information) or PII (personally identifiable information) from production, companies must follow a formal masking or synthetic data creation process. Companies in the financial and health-related industries must apply the most transformation, which typically delays the availability of test data. Their test data is usually more stale too.

What is Data Masking?

Data masking is a data security technique in which a dataset is copied with its sensitive data obfuscated by a transformation method. The core data structure is preserved; the masking process either applies a predefined masking formula or empties data elements during the transformation.
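Both masking styles mentioned above, a predefined formula and emptying a data element, can be shown on a single record. The sketch below is a hypothetical example: the field names, the sample values, and the rule of keeping the last four digits are assumptions chosen for illustration, not a recommended masking policy.

```python
import re


def mask_ssn(ssn):
    """Predefined formula: keep the layout and last four digits, X out the rest."""
    head, tail = ssn[:-4], ssn[-4:]
    return re.sub(r"\d", "X", head) + tail


def mask_record(record):
    """Return a masked copy; the keys (the data structure) stay the same."""
    masked = dict(record)  # copy, so the original record is left untouched
    masked["ssn"] = mask_ssn(record["ssn"])
    masked["notes"] = ""  # emptying a free-text data element
    return masked


# Made-up record for illustration only.
original = {"id": 7, "ssn": "123-45-6789", "notes": "Called about claim", "state": "MN"}
masked = mask_record(original)
```

Here `masked["ssn"]` becomes `XXX-XX-6789` and `masked["notes"]` becomes an empty string, while `id` and `state` are carried over unchanged.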

What is Synthetic Test Data?

Synthetic test data does not use any actual data from the production database. It is artificial data generated from the data model for that database. This method inherently meets masking requirements because you control every data element, which removes the risk of including PHI or PII.
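As a rough illustration of generating data from a data model rather than from production, the sketch below describes each column by a small generator function and builds rows from that description alone. The `SCHEMA` columns, value ranges, and the `generate_rows` helper are hypothetical; a real data model would also carry constraints such as keys and relationships between tables.

```python
import random
import string

# Hypothetical data model: column name -> function that produces a value.
SCHEMA = {
    "patient_id": lambda rng: rng.randint(100000, 999999),
    "last_name": lambda rng: "".join(rng.choices(string.ascii_uppercase, k=8)),
    "birth_year": lambda rng: rng.randint(1940, 2010),
    "plan": lambda rng: rng.choice(["basic", "silver", "gold"]),
}


def generate_rows(schema, count, seed=0):
    """Build `count` artificial rows from the schema; no production data is read."""
    rng = random.Random(seed)  # seeded so a test run can be reproduced
    return [{col: make(rng) for col, make in schema.items()} for _ in range(count)]


rows = generate_rows(SCHEMA, 5, seed=42)
```

Seeding the generator is a design choice: the same seed yields the same rows, which makes a failing test easier to reproduce.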

According to ISTQB (International Software Testing Qualifications Board) the definition of data privacy is the following:
The protection of personally identifiable information or otherwise sensitive information from undesired disclosure.

Learn more about Data Protection for home

Apply QA has partnered with Michael Pasono, a leader in data protection and systems quality, to publish a book on this very topic. The book is called “Identity and Data Protection for the Average Person” and can be found on Amazon.

Learn about the 3 main attacks used to collect your personal data. This book reveals many leading practices for securing your identity and data from being harvested and for limiting your exposure if a data breach occurs.