You are under attack. Not your person, that is, but the information that uniquely identifies you. I’d hazard a guess that your privacy is important to you and so you make some effort to protect yourself. You expect those who handle your data to protect it as well. I’ll assume your site is already doing the common-sense stuff, such as data encryption, strong passwords, and minimal privileges. Unfortunately, the exposure of personally identifiable information (PII) is still an all-too-frequent occurrence. In response, Oracle has come up with a new feature in version 12c that may significantly reduce your risk of a data spill. Simply put, the database can hide your sensitive data in plain sight through data redaction. The stored data itself is not changed, but it appears altered when viewed.
Whether you are developing with a traditional waterfall model or exploring an agile approach, developers and testers often prefer to work with production data. Whether or not these people are allowed access to your production environment is another subject. Let’s assume that you are following all the right steps with your production environment. Source code may be obscured (obfuscated). Data should be encrypted both at rest and in transit, when content is moving between the database server and the browser-based application. In theory, only the database administrators may see the real data that’s stored in the database. The customers see the real data in online forms or reports. My experience on different sites, both public and private, has been that production data is periodically cloned to non-production environments. Perhaps the non-production systems are less robust due to older equipment, less frequent security patching, or a less secure facility. Auditing might not be enforced, as the staff has a legitimate need for full access. Admittedly this scenario is a generalization, but my point is that we cannot hold these one-off environments to the same standards that are demanded in production.
The main advantage of data redaction is that it dynamically provides realistic-looking data, rather than real data. I also like how easy it is to administer different results for different types of users by way of policies. Let’s use a credit card account query for this example. The consumer is expected to log onto the institution’s database with their user ID and password. The database is queried, and returns the 16-digit account number as part of the result set. Any consumer-facing display commonly shows the first three groups of digits masked by asterisks, and the fourth in clear text. A customer service agent, on the other hand, may need to view the full string to authenticate the caller. The tester may wish to confirm that the right digits display as expected but doesn’t necessarily need the “live” account number from the result set. With data redaction, and the appropriate policies in place, all three needs can be satisfied with one SQL statement. The displayed results do not require special formatting in the application. Because redaction is applied to the query results as they are returned and the stored data is left untouched, no additional I/O is required, and the performance overhead should be small. There are a variety of redaction styles, including:
- Full: the entire value is replaced with a fixed default, so a birthdate of 01/01/1990 would be returned as a placeholder date (by default, 01/01/2001)
- Partial: a telephone number of 937-555-1212 could display as XXX-XXX-1212
- Regular expressions: convert an email address to become “d[hidden]@xyz.org”
- Random: a numeric string is replaced by a random number of the same length
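To make the styles above concrete, here is a minimal Python sketch of what each one produces, plus a toy version of the credit card example where the same stored value is shown differently depending on who asks. This is only an illustration of the idea, not Oracle's implementation: in the database, redaction is configured through policies rather than written by hand. Every function name, role name, and sample value here is hypothetical.

```python
import random
import re


def redact_full(value: str, placeholder: str) -> str:
    """Full style: ignore the input and return a fixed placeholder."""
    return placeholder


def redact_partial(value: str, keep_last: int = 4, mask: str = "X") -> str:
    """Partial style: mask every digit except the last `keep_last` digits."""
    to_mask = max(sum(ch.isdigit() for ch in value) - keep_last, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append(mask)
            to_mask -= 1
        else:
            out.append(ch)  # separators and trailing digits pass through
    return "".join(out)


def redact_regex(email: str) -> str:
    """Regex style: keep the first character and the domain of an email."""
    return re.sub(r"^(.)[^@]*(@.+)$", r"\1[hidden]\2", email)


def redact_random(value: str, rng: random.Random) -> str:
    """Random style: swap each digit for a random digit, keeping the length."""
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)


def view_card_number(card: str, role: str) -> str:
    """Toy 'policy': one stored value, a different view per (hypothetical) role."""
    if role == "agent":      # needs the full number to authenticate a caller
        return card
    if role == "customer":   # first three groups masked, last group in clear
        return redact_partial(card, keep_last=4, mask="*")
    if role == "tester":     # realistic-looking, but not the live number
        return redact_random(card, random.Random())
    return redact_full(card, "*" * len(card))  # anyone else sees nothing useful


if __name__ == "__main__":
    print(redact_full("01/01/1990", "01/01/2001"))  # 01/01/2001
    print(redact_partial("937-555-1212"))           # XXX-XXX-1212
    print(redact_regex("dsmith@xyz.org"))           # d[hidden]@xyz.org
    for role in ("agent", "customer", "tester"):
        print(role, view_card_number("4111-1111-1111-1111", role))
```

The point of the last function is the one made earlier: the caller issues the same request each time, and the rule attached to the data decides what comes back.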
“Data reduction” is the second component available, though Oracle refers to it as database subsetting. When a production database is cloned for testing, it has commonly been the “full Monty”: all years, all rows, all good data and bad. A big data warehouse can take hours and hours to clone. With the database subset feature, however, Oracle can now generate a representative sample of the data, along with its metadata, metrics, and tuning statistics.
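As a rough illustration of what a “representative” subset means, the Python sketch below samples a fraction of a parent table and then keeps only the child rows that belong to the sampled parents, so the smaller copy stays internally consistent. It says nothing about how Oracle builds its subsets; the table names, column names, and the `subset` function are all hypothetical.

```python
import random


def subset(customers, orders, fraction, seed=0):
    """Keep a random fraction of customers and only the orders that reference them."""
    rng = random.Random(seed)  # fixed seed so the same subset can be rebuilt
    keep_count = max(1, round(len(customers) * fraction))
    kept_customers = rng.sample(customers, keep_count)
    kept_ids = {c["id"] for c in kept_customers}
    # Drop orders whose customer was not sampled, so no row is left orphaned.
    kept_orders = [o for o in orders if o["customer_id"] in kept_ids]
    return kept_customers, kept_orders


if __name__ == "__main__":
    # Hypothetical data: 10 customers with 2 orders each.
    customers = [{"id": i} for i in range(10)]
    orders = [{"order_id": n, "customer_id": n % 10} for n in range(20)]
    small_customers, small_orders = subset(customers, orders, fraction=0.3)
    print(len(small_customers), "customers,", len(small_orders), "orders")
```

Sampling the parent first and letting the children follow is a design choice of this sketch: it keeps every foreign key in the copy valid, which is what makes a small clone usable for testing.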
These two changes offer great potential in how I manage a customer’s data assets. A few cautions are in order, though. I reiterate that data encryption is your best tool against unauthorized access, and that poor security practices still expose you to insider abuse. Redaction only changes what is displayed, so any user who is allowed to query the database tables directly, outside the redaction policies, remains a single point of failure in your security plans. The product is also relatively new, so it may well have weaknesses that have yet to be found and fixed. Also, you should be aware that these features are part of the Advanced Security license pack, for a small additional cost. Finally, reusing your existing SQL code may prove difficult. Let us know if we can help.