How to Partially Encrypt (Mask) Data While Preserving Format

Using structured data encryption, which preserves format, has a ton of different use-cases. One common use is to replace or implement an alternative to traditional masking. This brief paper will walk through why you might want to do that, and once you’ve decided that it fits your use-case, it’ll guide you in the technical implementation of partial (or full) masking using structured data encryption with Ubiq.

You can skip ahead and catch up on your own, but these other introductions will provide context:

  • For deciding if structured data encryption is right for your use case click here.
  • For a general walkthrough of how to use structured data encryption click here.

Data Masking Use-Case Examples

Data masking (full or partial) as a concept can be applied to service a lot of different use-cases; if you’re here, you might already know what you’re trying to solve with it. If you’re just exploring, though, these examples may help inspire some ideas where data masking (structured data encryption!) might be helpful to you:

Data Masking:

Consider a scenario where a healthcare organization needs to share a medical research dataset with external partners while protecting patient privacy. The dataset includes sensitive information such as patient names, social security numbers (SSNs), and medical diagnoses. To ensure privacy, the organization applies data masking to the dataset.

Original data:
Patient Name: John Doe
SSN: 123-45-6789
Diagnosis: Hypertension

Masked data:
Patient Name: Jane Smith
SSN: XXX-XX-XXXX
Diagnosis: Hypertension

In this example, the patient's name has been replaced with a fictional value and the SSN has been replaced with mask characters to protect the privacy of the individuals involved. The diagnosis field remains unaltered since it does not contain sensitive information. By applying data masking, the organization can share the dataset with external parties for research or analysis purposes while safeguarding patient confidentiality.

Partial Data Masking:

Let's consider an organization that operates a customer relationship management (CRM) system. They need to provide a test environment for software development and quality assurance (QA) teams while ensuring the protection of sensitive customer data. In this case, partial data masking is implemented.

Original data:
Customer Name: John Doe
Email: [email protected]
Phone Number: 555-123-4567
Credit Card Number: 1234-5678-9012-3456

Partially masked data:
Customer Name: John Doe
Email: [email protected]
Phone Number: 555-XXX-XXXX
Credit Card Number: XXXX-XXXX-XXXX-3456

In this example, the organization masks sensitive data such as the phone number and credit card number. The customer name and email address remain unaltered as they do not contain sensitive information. By applying partial data masking, the organization can provide realistic test data for development and QA purposes without exposing sensitive customer details that could potentially lead to data breaches or misuse.

Both data masking and partial data masking techniques help organizations protect sensitive information while ensuring data availability for various purposes. The specific approach and fields selected for masking depend on the privacy requirements, data sensitivity, and the intended use of the masked data.

Encryption vs. Data Masking

Data masking is a technique used to protect data by changing the characters in a piece of data so that it is no longer sensitive, but is recognizable in structure. The characters used in masking are arbitrary, but it is common to use a single character like an “x”, a dot, or a zero. It is important to also note that masking in its traditional form is non-reversible, meaning that the original text cannot be retrieved from the masked text.

Data masking can be full, where all of the characters are changed (like when you type a password into a form on a website and it is replaced by dots), or it can be partial where only a subset of characters are changed (like when you see your bank account number on a statement and it only shows the last 4 digits).
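
As a minimal sketch of the difference between full and partial masking, the Python below masks every letter or digit while leaving separators alone. The function names and the choice of "X" as the mask character are illustrative assumptions, not part of any Ubiq SDK.

```python
def mask_full(value: str, mask_char: str = "X") -> str:
    """Replace every letter or digit; keep separators such as '-' as they are."""
    return "".join(mask_char if ch.isalnum() else ch for ch in value)


def mask_partial(value: str, keep_last: int = 4, mask_char: str = "X") -> str:
    """Mask every letter or digit except the last `keep_last` of them."""
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - keep_last, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


print(mask_full("123-45-6789"))             # XXX-XX-XXXX
print(mask_partial("1234-5678-9012-3456"))  # XXXX-XXXX-XXXX-3456
# Two different inputs collapse to the same output, so the mask cannot be undone.
print(mask_full("123-45-6789") == mask_full("987-65-4321"))  # True
```

The last line is the non-reversibility point in miniature: once two different values produce the same masked text, nothing in the masked text can tell you which one you started with.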

The advantage of masking is that it can be used “in-place,” meaning that the masked data structurally looks and feels like the original data, so an application or a human that’s seeing it doesn’t need to change when data goes from plaintext to masked text. This is advantageous when that data is stored in a database (with length or character restrictions), used for other logic in business processing, or needs to be readable.

Data masking has the disadvantage of being non-reversible, which means that either:

  • Masking is applied only at a visual layer, but the data itself is still stored in its full form, which leaves the data largely unprotected and doesn’t help close any of the storage-layer security gaps.
  • Masked data needs to be stored AND the original data needs to be stored, which adds significant complexity (like storing your own tokenized version) and also leaves the original data in need of a second protection solution.

Why Choose Encryption

Encryption doesn’t suffer from either of these disadvantages - it is, by nature, reversible (via decryption), so the encrypted value can be stored in place of the original.

Data masking preserves the format of the original data so that it is recognizable, either by humans or software. With format-preserving encryption (like the NIST FFx algorithms), encryption can do the same. While format-preserving encryption won’t use a single repeated character as the “mask,” it will preserve length and a character set that you define. “Partial” masking (i.e. masking all but the last 4 of a credit card number) can also be achieved with format-preserving encryption. Once you’ve made the choice to use structured data (format-preserving) encryption, you might get curious about how it actually works.
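
To make “preserves length and a character set that you define” concrete, here is a rough Python check you could run against any protected value. The helper name, the default digits-only alphabet, the dash as the only passthrough character, and the sample ciphertext strings are all assumptions made up for illustration; they are not output from, or part of, the Ubiq SDK.

```python
import string


def preserves_format(plain: str, cipher: str,
                     alphabet: str = string.digits,
                     passthrough: str = "-") -> bool:
    """Return True if `cipher` keeps the shape of `plain`.

    Same length, passthrough characters left in place, and every other
    position filled from the chosen output alphabet.
    """
    if len(plain) != len(cipher):
        return False
    for p, c in zip(plain, cipher):
        if p in passthrough:
            if c != p:
                return False
        elif c not in alphabet:
            return False
    return True


# Hypothetical ciphertext: same length, digits stay digits, dashes untouched.
print(preserves_format("123-45-6789", "847-20-3915"))  # True
# A traditional mask uses a character outside the digit alphabet.
print(preserves_format("123-45-6789", "XXX-XX-XXXX"))  # False
# A value that lost a character no longer fits the original column.
print(preserves_format("123-45-6789", "84720-3915"))   # False
```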

Encryption supports far more use-cases because the original data is retrievable. It provides cryptographically secure data protection and human- or logic-readable output (just like masking), without forcing you to lose or duplicate the full content just to achieve your masking goals.

With its default behavior, however, encryption encrypts the entire content of whatever you’re protecting. So if you’re encrypting a credit card number, there’s no encryption “flag” to set to leave the last 5 or the first digit unencrypted. With some very simple (and dare we say creative?) implementation, however, you can make that magic happen...

Partial Structured Data Encryption

First, a brief soapbox moment about our SDK roadmaps here at Ubiq, so that I can avoid plugs for “things that are coming” or “stuff we promise we’ll build some day” in the actually useful implementation content below. We do have a bunch of JIRA stories, and even an epic, that will productize both the configuration and the implementation of partial encryption within our SDK and our SaaS UI. We believe this is a great quality-of-life improvement for those who are implementing our SDK and having to write their own code snippets. We have a bunch of capabilities planned for the feature, including a flexible way of defining a “mask” in your structured dataset; then, when the SDK is used to encrypt/decrypt using that dataset, the masking/parsing/etc. all happens automagically. In our defense, we beta’d a version of this only to find that (gasp) we had created a feature that was engineering-friendly but that the security folks using the UI didn’t find easy to work with (insert here a shot about engineering-led companies). Too much regex and not enough buttons. So the whole thing is back on our to-do list, but the value and demand haven’t changed.

</end of rant>

Masking Concepts

The concept of partial masking applies equally well to structured data encryption: the basic idea is that you want a piece of your content to remain unchanged while protecting the rest. Typically, that piece is the beginning or the end of the value, but it could be the middle or some combination. When data is masked, this is almost always implemented manually with some software logic.

The steps are pretty straightforward, though in defense of the quality software teams out there, many may have abstracted this and/or implemented a general-purpose set of masking methods that make it easier:

  1. Separate the string into the part you want to protect (replace with “x”s) and the part you don’t.
  2. Do your replacement on the part you want to protect. This can get annoying in masking logic because there is oftentimes a character or set of characters you want to keep (like the dash). Using structured datasets with Ubiq makes this happen automatically with structured data encryption, so you don’t have to worry about any of that.
  3. Combine the replaced and the pass-through strings together to get the whole thing again.
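
The three masking steps above can be sketched in a few lines of Python. This is only an illustration of the hand-written logic being described: the function name, the lowercase “x” mask character, and treating the dash as the only character to keep are assumptions.

```python
def mask_prefix(value: str, protect_len: int, mask_char: str = "x") -> str:
    # Step 1: separate the part to protect from the part to pass through.
    to_protect, to_keep = value[:protect_len], value[protect_len:]
    # Step 2: replace characters in the protected part, keeping the dashes.
    replaced = "".join(ch if ch == "-" else mask_char for ch in to_protect)
    # Step 3: combine the replaced and pass-through strings.
    return replaced + to_keep


print(mask_prefix("123-45-6789", 7))  # xxx-xx-6789
```

Step 2 is where the special-casing lives; every additional separator or kept character means another condition in that line.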

Structured Data Encryption

Doing the same thing with structured data encryption (see rant above on the lack of helper methods) follows the same pattern as the masking concepts, but is a little easier. Because passthrough characters are already handled automatically, you don’t have to worry about replacing those, and the 3 steps for masking turn into 3 (simpler) steps with encryption:

  1. Separate the string into the part you want to protect and the part you don’t
  2. Run encryption on the part you want to protect (that’s it!)
  3. Combine the results of the encryption and the part of the string you wanted to keep

And you’re done: that’s partial structured data encryption. See below for real code examples in addition to the pseudocode from the diagrams above:

JavaScript (Node.js):

const ubiq = require('ubiq-security');

const ffsName = "SSN";
const plainText = "123-45-6789";

// await needs an async context in a CommonJS script
async function main() {
  const ubiqCredentials = new ubiq.ConfigCredentials('./credentials', 'default');
  const ubiqEncryptDecrypt = new ubiq.fpeEncryptDecrypt.FpeEncryptDecrypt({ ubiqCredentials });

  // Split into the part to protect ("123-45-") and the part to keep ("6789")
  const stringToEncrypt = plainText.substring(0, 7);
  const stringToKeep = plainText.substring(7);

  // Encrypt only the part to protect
  const encrypted_data = await ubiqEncryptDecrypt.EncryptAsync(
    ffsName,
    stringToEncrypt
  );

  // Recombine the encrypted part with the untouched part
  const partialStructuredMask = encrypted_data + stringToKeep;

  console.log('PARTIAL STRUCTURED ENCRYPTED => ' + partialStructuredMask + '\n');
}

main().catch(console.error);

Python:

import ubiq_security as ubiq
import ubiq_security.fpe as ubiqfpe

ffs_name = "SSN"
plain_text = "123-45-6789"

credentials = ubiq.ConfigCredentials('./credentials', 'default')

# Split into the part to protect ("123-45-") and the part to keep ("6789")
string_to_encrypt = plain_text[:7]
string_to_keep = plain_text[-4:]

# Encrypt only the part to protect
encrypted_data = ubiqfpe.Encrypt(
        credentials,
        ffs_name,
        string_to_encrypt)

# Recombine the encrypted part with the untouched part
partial_structured_mask = encrypted_data + string_to_keep

print('PARTIAL STRUCTURED ENCRYPTED => ' + partial_structured_mask + '\n')
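
C++:

#include <iostream>
#include <string>
#include <ubiq/platform.h>

int main()
{
    std::string ffs_name("ALPHANUM_SSN");
    std::string pt("123-45-6789");
    std::string ct;

    // Initialize the library before using any of its objects
    ubiq::platform::init();
    ubiq::platform::credentials creds;

    // Split into the part to protect ("123-45-") and the part to keep ("6789")
    std::string msk = pt.substr(0, 7);
    std::string pln = pt.substr(7, 4);

    // Encrypt only the part to protect
    ct = ubiq::platform::fpe::encrypt(creds, ffs_name, msk);

    // Recombine the encrypted part with the untouched part
    std::string ptmsk = ct + pln;

    std::cout << "PARTIAL STRUCTURED ENCRYPTED => " << ptmsk << std::endl;

    ubiq::platform::exit();
    return 0;
}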

C#:

using System;
using System.Threading.Tasks;
using UbiqSecurity;

async Task EncryptionAsync(String FfsName, String plainText, IUbiqCredentials ubiqCredentials)
{
    // default tweak in case the FFS model allows for external tweak insertion
    byte[] tweakFF1 = {};

    using (var ubiqEncryptDecrypt = new UbiqFPEEncryptDecrypt(ubiqCredentials))
    {
        // Split into the first 7 characters (to protect) and the last 4 (to keep)
        var stringToEncrypt = plainText.Substring(0, 7);
        var stringToKeep = plainText.Substring(7, 4);

        // Encrypt only the part to protect
        var cipherText = await ubiqEncryptDecrypt.EncryptAsync(FfsName, stringToEncrypt, tweakFF1);

        // Recombine the encrypted part with the untouched part
        var partialStructuredMask = cipherText + stringToKeep;
        Console.WriteLine($"PARTIAL STRUCTURED ENCRYPTED => {partialStructuredMask}\n");
    }
}

Java:

import com.ubiqsecurity.UbiqCredentials;
import com.ubiqsecurity.UbiqFPEEncryptDecrypt;
import com.ubiqsecurity.UbiqFactory;

// The class name is only a wrapper for this example
public class PartialEncryptExample {
  public static void main(String[] args) throws Exception {
    String FfsName = "SSN";
    String plainText = "123-45-6789";

    UbiqCredentials ubiqCredentials = UbiqFactory.readCredentialsFromFile("path/to/file", "default");
    // Create single object but use many times
    try (UbiqFPEEncryptDecrypt ubiqEncryptDecrypt = new UbiqFPEEncryptDecrypt(ubiqCredentials)) {
      // Can call encryptFPE / decryptFPE many times without creating new UbiqFPEEncryptDecrypt object.

      // Split into the part to protect ("123-45-") and the part to keep ("6789")
      String stringToEncrypt = plainText.substring(0, 7);
      String stringToKeep = plainText.substring(7);

      // Encrypt only the part to protect, then recombine
      String cipherText = ubiqEncryptDecrypt.encryptFPE(FfsName, stringToEncrypt, null);
      String partialStructuredMask = cipherText + stringToKeep;

      System.out.println("PARTIAL STRUCTURED ENCRYPTED => " + partialStructuredMask + "\n");
    }
  }
}

SQL:

call ubiq_begin_fpe_session('SSN', access_key, secret_signing_key, secret_crypto_access_key);

-- update column in table to partial mask:
-- encrypt the first 7 characters and keep the last 4 as plaintext
update sample_ssns set ssn_encrypted =
   CONCAT(
      ubiq_fpe_encrypt_cache(SUBSTRING(ssn_plaintext, 1, 7), 'SSN'),
      SUBSTRING(ssn_plaintext, 8, 4)
   );
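
The examples above only show the encrypt direction. Since reversibility is the whole reason to pick encryption over masking, here is a rough Python sketch of the round trip. It assumes the encrypted part keeps its length (which is what format-preserving encryption gives you), so the same split point works in both directions. `partial_encrypt`, `partial_decrypt`, and the toy digit-shift functions are hypothetical stand-ins so the snippet runs without credentials; the toy cipher is NOT secure and is NOT the Ubiq SDK. In real use you would pass in wrappers around your SDK's encrypt and decrypt calls.

```python
import string
from typing import Callable


def partial_encrypt(value: str, protect_len: int,
                    encrypt: Callable[[str], str]) -> str:
    """Encrypt the first `protect_len` characters and keep the rest as-is."""
    return encrypt(value[:protect_len]) + value[protect_len:]


def partial_decrypt(value: str, protect_len: int,
                    decrypt: Callable[[str], str]) -> str:
    """Reverse partial_encrypt; relies on the encrypted part keeping its length."""
    return decrypt(value[:protect_len]) + value[protect_len:]


# Toy stand-in for a format-preserving cipher. NOT secure, illustration only:
# it shifts each digit by 3 and passes every other character through.
def toy_encrypt(text: str) -> str:
    return "".join(str((int(ch) + 3) % 10) if ch in string.digits else ch
                   for ch in text)


def toy_decrypt(text: str) -> str:
    return "".join(str((int(ch) - 3) % 10) if ch in string.digits else ch
                   for ch in text)


protected = partial_encrypt("123-45-6789", 7, toy_encrypt)
print(protected)                                   # 456-78-6789
print(partial_decrypt(protected, 7, toy_decrypt))  # 123-45-6789
```

Passing the cipher in as a function keeps the split-and-recombine logic in one place, so the same helper can sit in front of whichever SDK call you use.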

Summary

As a follow-on to the comparison of masking and structured data encryption in your decision-making process here, partial structured data encryption has a bunch of benefits when compared to traditional masking. Although some creative lines of code could make this easier natively within our SDKs, we’ve covered here how simple it is to do on top of the existing Ubiq SDKs:

  • Use structured data encryption the same way you do today.
  • Split up your string into the part you want to protect/mask/encrypt vs. the part that you don’t.
  • Encrypt the part you want to encrypt (no need to worry about pesky string manipulations to keep those pass-through characters because that happens for you!).
  • Put the encrypted and the plaintext strings back together… and you’re done!

With that at your fingertips, partial masking via partial-structured-data-encryption can be a tool at your disposal to improve the options you have when protecting your data.