Data Masking: What Do You Need to Know as a QA Professional?
In a digital age where data is an open book to anyone who can read it, protecting that data is crucial. Data masking, sometimes also called data obfuscation or anonymization, protects real data while keeping it useful for development and testing. Understanding its principles will help you ensure data security and design test cases that mimic real-world scenarios. Read on to get a better understanding of this concept.
What is data masking, and how does it help?
The purpose of data masking is to create a replica of the original data without divulging sensitive information. Masking is usually done by hiding certain pieces of data, shuffling them, replacing them, or encrypting them. The end result must retain the same structure as the original data, though. This pseudo data is typically used in environments that are less secure than production: as test data, for user training, by sales teams, or even by third parties outside the company.
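To make "retain the same structure" concrete, here is a minimal sketch in Python. The function name `mask_preserving_structure` and the sample value are hypothetical, chosen only for illustration. The sketch swaps each letter or digit for a random one of the same kind and leaves every other character where it was. It is not a production-grade masking routine.

```python
import random
import string


def mask_preserving_structure(value: str, rng: random.Random) -> str:
    """Replace letters and digits with random ASCII ones of the same class.

    Uppercase letters become random uppercase letters, all other letters
    become random lowercase letters, and digits become random digits.
    Punctuation, whitespace, and other characters are left untouched,
    so the length and layout of the value stay the same.
    """
    masked = []
    for ch in value:
        if ch.isdigit():
            masked.append(rng.choice(string.digits))
        elif ch.isalpha() and ch.isupper():
            masked.append(rng.choice(string.ascii_uppercase))
        elif ch.isalpha():
            masked.append(rng.choice(string.ascii_lowercase))
        else:
            masked.append(ch)
    return "".join(masked)


rng = random.Random(42)  # fixed seed only so the demo is repeatable
original = "Jane.Doe-4821@example.com"  # made-up sample value
masked = mask_preserving_structure(original, rng)
print(original)
print(masked)
```

The masked value keeps the dots, the hyphen, and the `@` in the same positions, so a test that depends on the layout of the field still exercises the same code paths.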
Data masking is also needed in production environments, where it helps businesses comply with regulations such as the GDPR. Protecting user and organization data in this way supports overall cybersecurity: since data can be misused to commit fraud or hack into systems, masking it limits what an attacker can do with it. The banking and health care industries are among those that rely heavily on data masking.
What kind of data should be masked?
- Personally Identifiable Information (PII): This includes user details such as name, phone number, passport number, social security number, etc.
- Protected Health Information (PHI): This includes health care-related information of a patient like insurance number, medical history, laboratory results, etc.
- Payment Card Information (covered by PCI DSS): This includes credit and debit card information.
- Intellectual Property (IP): This includes valuable intellectual data such as business plans, designs, strategies, inventions, research, etc.
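As a rough illustration of how such data might be spotted before masking, the sketch below scans free text for a few common patterns. Everything here is an assumption made for the example: the pattern set, the names `PATTERNS`, `luhn_valid`, and `find_sensitive`, and the sample text. The regular expressions are simple heuristics (the SSN pattern covers only the US dashed format) and will both miss data and raise false positives on real datasets. The Luhn checksum is the standard check used on payment card numbers.

```python
import re

# Heuristic patterns only; real PII detection needs far more care.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "card": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    checksum = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return checksum % 10 == 0


def find_sensitive(text: str) -> list:
    """Return (kind, matched_text) pairs for values that look sensitive."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group()
            # Drop digit runs that look like cards but fail the checksum.
            if kind == "card" and not luhn_valid(value):
                continue
            findings.append((kind, value))
    return findings


sample = (
    "Contact jane@example.com, SSN 000-12-3456, "
    "card 4111 1111 1111 1111, order 1234 5678 9012 3456."
)
findings = find_sensitive(sample)
for kind, value in findings:
    print(kind, value)
```

A scan like this can help a QA team notice sensitive values that slipped into free-text fields, logs, or report exports, which column-level masking rules tend to overlook.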
Different techniques used to mask data
- Encryption: The data is encrypted using a secure key, which also allows it to be transmitted securely. It is one of the most secure forms of data masking, since the data is essentially unreadable to unwanted parties without the decryption key.
- Scrambling: The characters in the data are reordered randomly to confuse the viewer. This is a good technique as long as the viewer is unable to rearrange the characters and guess the original data value.
- Substitution: Fake values are used to substitute the original values in the dataset. This is a great way to have life-like data without compromising security.
- Shuffling: This technique is similar to substitution, but the replacement values come from the same column, whose values are shuffled randomly among the rows. The dataset appears authentic, yet no sensitive value stays attached to its original record.
- Nulling Out (Deleting): Sensitive values are replaced with blanks or returned as null.
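The techniques above, apart from encryption, can be sketched in a few lines of Python using only the standard library. Encryption is left out because a proper implementation needs a vetted cryptography library. All function names, the `FAKE_NAMES` list, and the customer records below are made up for this example. Treat it as a sketch of the ideas, not as how any particular masking tool works.

```python
import random

FAKE_NAMES = ["Alex Carter", "Sam Rivera", "Kim Novak", "Pat Morgan"]


def scramble(value: str, rng: random.Random) -> str:
    """Scrambling: reorder the characters of a single value."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


def substitute(fake_values: list, rng: random.Random) -> str:
    """Substitution: pick a replacement from a list of fake values."""
    return rng.choice(fake_values)


def shuffle_column(rows: list, column: str, rng: random.Random) -> list:
    """Shuffling: redistribute one column's real values across the rows."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def null_out(rows: list, column: str) -> list:
    """Nulling out: replace every value in the column with None."""
    return [{**row, column: None} for row in rows]


rng = random.Random(7)  # fixed seed only so the demo is repeatable
customers = [
    {"name": "Alice Smith", "phone": "555-0101", "city": "Leeds", "ssn": "000-11-1111"},
    {"name": "Bikram Rao", "phone": "555-0102", "city": "Pune", "ssn": "000-22-2222"},
    {"name": "Chen Wei", "phone": "555-0103", "city": "Xi'an", "ssn": "000-33-3333"},
]

masked = shuffle_column(customers, "city", rng)
masked = null_out(masked, "ssn")
masked = [
    {**row, "name": substitute(FAKE_NAMES, rng), "phone": scramble(row["phone"], rng)}
    for row in masked
]
for row in masked:
    print(row)
```

Each helper returns new rows instead of modifying the input, so the original records remain available for comparison while the masking rules are being tuned. Note that shuffling keeps the real values in the dataset, which is why it suits columns that are only sensitive in combination with the rest of the record.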
Things to know if you are testing with masked data
- There is always a chance of a security breach, with internal actors (unethical users) or external ones (hackers) gaining access to the company’s data and misusing it.
- Many data protection laws, such as the GDPR, require organizations to safeguard personal data, and masking is one way to stay compliant while doing business in their jurisdictions.
- At times, testing on live production environments with the original dataset may cause undesirable changes to it, thus causing customer dissatisfaction.
- When generating different test reports, screenshots may divulge sensitive information that the client may not appreciate the testers being privy to.
- Using live production data during sales demos or pitches to other clients may expose one customer’s information to another.
- If the organization outsources some of its work to third parties, sharing sensitive information with them may lead to security breaches.
For these reasons, working with masked data helps testers get a better picture of the production data without having to worry about data sensitivity and security during testing.
Moreover, it helps with building comprehensive and inclusive test cases so that real-world scenarios can be tested prior to releasing the feature to the customer.
What to keep in mind when testing using masked data
As a QA professional, your main goal when working with masked data is to ensure that its format is identical to that of the original data, with no detail omitted. For example, if customers’ usernames include special characters but these were stripped out during masking, you might miss testing for them. That’s especially true today, when project documentation is commonly incomplete or missing altogether. Moreover, if the application then started rejecting special characters in that field as invalid or malicious input, a critical bug would leak into the production environment and reach end users.
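One way to check for this kind of loss is to compare the "shape" of each value before and after masking. The sketch below is a hypothetical helper, not part of any masking tool: `shape` reduces a value to its character classes, and `format_mismatches` lists the positions where the masked value no longer has the same shape as the original. The usernames are invented for the example.

```python
def shape(value: str) -> str:
    """Reduce a value to its character classes.

    Uppercase letters become "A", other letters "a", digits "9".
    Every other character (punctuation, spaces) is kept as-is, so
    special characters remain visible in the result.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def format_mismatches(originals: list, masked: list) -> list:
    """Return the indexes where the masked value changed shape."""
    return [
        index
        for index, (before, after) in enumerate(zip(originals, masked))
        if shape(before) != shape(after)
    ]


original_usernames = ["anna_k!92", "Bob.Lee"]
masked_usernames = ["qwer_t!07", "XyzAbc"]  # the second one lost its dot

mismatches = format_mismatches(original_usernames, masked_usernames)
print(mismatches)
```

An exact shape match is a deliberately strict rule. Techniques such as substitution can legitimately change the length of a value, so in practice you might relax the check to something like "the same set of special characters is still present".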
The more types of data a company has, the less realistic it is to apply the same masking techniques across all of its datasets. It is not uncommon for the QA team to discover that the masking techniques that were used do not produce the desired outcome. If that happens, the database should be restored to its previous unmasked state and a new masking method applied.
Popular data masking tools
Broadcom Data Masking
Broadcom provides data masking as a feature for enterprises that use Test Data Manager. With it, developers can produce convincing test data or use existing data without compromising sensitive information.
Delphix Data Platform
Delphix conceals confidential information and substitutes realistic data, allowing it to be used for development, testing, quality assurance, and analytics. Even though the masking algorithms are pre-configured, you may modify them to match the security needs of your business. Furthermore, the platform includes an easy-to-use interface and a no-code/low-code setup, which simplifies getting started.
IBM® InfoSphere® Optim Data Privacy
For non-production scenarios like development, testing, or QA, IBM® InfoSphere® Optim Data Privacy helps disguise and regulate sensitive information (PII and other confidential data).
Its ability to encrypt data in real time can help prevent or mitigate a cyber attack. It can also hide sensitive information on screen, so that only authorized individuals can see it.
Informatica Cloud Data Masking
Informatica’s Persistent Data Masking software combines anonymization and encryption to protect sensitive data. The masked copies provide realistic data for testing, development, and analytics. The program uses general masking methods (substitution, randomization, or nullification) as well as techniques specific to particular kinds of sensitive data, such as credit card or other financial information. Masking may be reversed using a NIST-standard format-preserving encryption (FPE) transformation.
Microsoft Azure data masking
For the sake of analytics and testing, Microsoft Azure provides both static and dynamic data masking on all of its servers, databases, and environments. All types of cloud installations are supported, and many configurable features are included to meet the needs of a wide range of customers.
Conclusion
Data safety plays an integral role in any application and needs constant attention, so using data masking techniques to safeguard data from prying eyes is essential. However, care must be taken not to distort the data too much, so that it can still act as a good reference point for internal teams like testers and developers to build robust solutions and test them thoroughly before a release.