How to Configure Structured Datasets and Calculate Key Rotations
Structured datasets provide flexible and powerful capabilities to protect data while preserving the format attributes you need for data storage (such as database column types) and use. With those capabilities comes a more complex set of configuration options for defining your structured datasets. Much of this is described in the Structured Datasets Guide. This brief how-to provides a more technical explanation of how to choose your character sets to best suit your data protection needs.
The configuration of your structured dataset directly affects the number of data keys that will be available for use. The guide below also covers the considerations for increasing the number of available key rotations for a structured dataset, and the math behind them. Note that unstructured datasets are not bound by these restrictions.
Definitions
- Input Character Set - a list of characters that are valid in the input string that will be encrypted
- Output Character Set - a list of characters that will appear in the ciphertext
- Pass-Through Character Set - a list of characters that will be left untouched and in their original position during the encryption process
- Minimum Input Length - the minimum number of valid characters that will be allowed in the string being encrypted; this only counts valid characters in the Input Character Set and ignores pass-through characters. Validation is enforced at time of encryption.
- Maximum Input Length - the maximum length of the string to be encrypted; also enforced at time of encryption
- Max Data Key Rotations - the number of possible data encryption keys that can be used for the dataset, calculated from the Input Character Set, Output Character Set, and Minimum Input Length (more on this below); good data security practice requires that the data encryption key can be rotated at least four times.
Basic Rules & Relationships
There are a few fundamental relationships that are useful to understand between the configuration options before getting into the math of calculating key rotations. With the above definitions in mind, here they are:
- Your Output Character Set must have more characters than your Input Character Set. In practice, you should generally allow every output character you can for your use case: characters that are supported by your encoding type, that make sense for how the ciphertext will be used and viewed, and that still allow the data to be used by your app. The bigger the difference between the number of input and output characters, the more key rotations you'll get.
- Your Pass-Through Characters cannot also be in your Input or Output Character Sets. Pass-through characters are ignored in the encryption process, so they can't also be valid as input or ciphertext.
- NIST dictates a minimum length to protect against rainbow-table type attacks for small-character datasets.
- Ubiq dictates a minimum number of key rotations (4), which affects the balance of the other configurations.
In general, more output characters (more unique characters in your output that are not in your input) and longer minimum length will get you more key rotations.
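The first two rules above can be checked mechanically before a dataset is configured. The following is a minimal sketch, not part of any Ubiq library: the function name `check_character_sets` and its string-based parameters are assumptions made for illustration.

```python
from typing import List


def check_character_sets(input_chars: str, output_chars: str,
                         passthrough_chars: str) -> List[str]:
    """Return a list of rule violations (empty if none). Hypothetical helper."""
    inp = set(input_chars)
    out = set(output_chars)
    passthrough = set(passthrough_chars)
    problems = []

    # Rule: the Output Character Set must be larger than the Input Character Set.
    if len(out) <= len(inp):
        problems.append(
            f"output set ({len(out)} chars) must be larger than "
            f"input set ({len(inp)} chars)"
        )

    # Rule: pass-through characters may not appear in the input or output sets.
    for name, chars in (("input", inp), ("output", out)):
        overlap = sorted(passthrough & chars)
        if overlap:
            problems.append(
                f"pass-through characters {overlap} also appear in the {name} set"
            )
    return problems


# Numeric input, a-n output, "-" passed through: no violations expected.
print(check_character_sets("0123456789", "abcdefghijklmn", "-"))
# Same-size sets with an overlapping pass-through character: several violations.
print(check_character_sets("0123456789", "0123456789", "5"))
```

The first call returns an empty list. The second reports that the output set is not larger than the input set and that the pass-through character overlaps both sets.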
NIST Requirement of Input Character Set and Minimum Length Relationship
NIST requires a minimum relationship of the minimum length and input character set to ensure effective strength of the encryption process. This is their calculation, which is enforced when configuring your Structured Dataset:
For convenience, we will use M to represent the Minimum Input Length and R for the number of characters in the Input Character Set. For example, if the Input Character Set is numeric (0-9), then R is 10. If the Input Character Set is mixed-case alphabetic (a-zA-Z), then R is 52.
The NIST requirement is that R^M >= 1,000,000.
For a numeric Input Character Set (R=10), M must be 6 or greater, since 10^6 = 1,000,000 and 10^5 = 100,000 falls short. For a mixed-case alphabetic Input Character Set (R=52), M must be 4 or greater, since 52^4 = 7,311,616 and 52^3 = 140,608 falls short.
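The smallest M that satisfies R^M >= 1,000,000 can be found with a short loop. This is an illustrative sketch only: the name `min_input_length` and the guard against sets with fewer than two characters are assumptions, not something the guide specifies.

```python
NIST_MIN_DOMAIN = 1_000_000  # threshold from the R^M >= 1,000,000 requirement


def min_input_length(r: int) -> int:
    """Smallest M such that r**M >= 1,000,000 (hypothetical helper)."""
    if r < 2:
        # Assumption: a one-character set can never reach the threshold.
        raise ValueError("the input character set needs at least 2 characters")
    m, domain = 1, r
    while domain < NIST_MIN_DOMAIN:
        domain *= r  # integer arithmetic avoids floating-point rounding
        m += 1
    return m


for label, r in (("0-9", 10), ("a-z", 26), ("a-zA-Z", 52)):
    print(f"{label}: R={r}, minimum M={min_input_length(r)}")
```

For the three character sets shown, this gives minimum lengths of 6, 5, and 4 respectively.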
Maximum Data Key Rotations Calculation
Max Data Key Rotations is calculated from the same inputs. Using M (the Minimum Input Length) and R (the number of characters in the Input Character Set) from above, we add Z for the number of characters in the Output Character Set. The Ubiq Structured Data Encryption algorithms include data compression, which allows metadata to be stored with the ciphertext. This is how we save you from having to keep track of which key was used to encrypt each piece of data. To achieve the necessary data compression and encoding of the key rotation number, the following formula is used, where truncate drops any fractional part and >> is a bitwise right shift. The result, Y, is the Max Data Key Rotations value. We require Y to be greater than or equal to 4 to ensure a minimum level of security assurance with key material.
n = (R^M) - 1
d = Z^(M-1)
x = truncate(n / d)
s = 0
if (x > 0) {
    s = truncate(log2(x)) + 1
}
Y = ((Z - 1 - x) >> s) + 1
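The formula above translates directly into integer arithmetic. The sketch below is one possible transcription, not Ubiq's implementation; the function name `max_key_rotations` is an assumption. It uses Python's `int.bit_length()`, which equals truncate(log2(x)) + 1 for positive x and 0 for x = 0, so it covers both branches of the `if` without floating-point error.

```python
def max_key_rotations(r: int, z: int, m: int) -> int:
    """Y from the formula above, for R=r, Z=z, M=m (hypothetical helper)."""
    n = r ** m - 1
    d = z ** (m - 1)
    x = n // d          # truncate(n / d); n and d are non-negative integers
    s = x.bit_length()  # 0 when x == 0, otherwise truncate(log2(x)) + 1
    return ((z - 1 - x) >> s) + 1


# Numeric input (R=10), 14 output characters, minimum length 6.
print(max_key_rotations(10, 14, 6))
# Lowercase input (R=26), mixed-case output (Z=52), minimum length 5.
print(max_key_rotations(26, 52, 5))
```

The guide does not say what happens when Z - 1 - x is negative; in that case this sketch simply follows Python's right-shift semantics, and the result should not be treated as meaningful.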
Examples
Input Character Set: 0-9 (R = 10)
Output Character Set: a-n (Z = 14)
Minimum Input Length: 6 (M = 6, set by the NIST requirement)
n = 10^6 - 1 => 999,999
d = 14^(6-1) => 537,824
x = truncate(999,999 / 537,824) => 1
s = truncate(log2(1)) + 1 => 1
Y = ((14 - 1 - 1) >> 1) + 1 => 7
Input Character Set: a-z (R = 26)
Output Character Set: a-zA-Z (Z = 52)
Minimum Input Length: 5 (M = 5, set by the NIST requirement)
n = 26^5 - 1 => 11,881,375
d = 52^(5-1) => 7,311,616
x = truncate(11,881,375 / 7,311,616) => 1
s = truncate(log2(1)) + 1 => 1
Y = ((52 - 1 - 1) >> 1) + 1 => 26
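When sizing an Output Character Set, it can help to search for the smallest Z that still reaches the minimum of 4 key rotations. The sketch below does this by brute force under stated assumptions: the names `max_key_rotations` and `smallest_output_size` are hypothetical, and the search limit of 256 characters is an arbitrary choice for illustration.

```python
from typing import Optional

MIN_ROTATIONS = 4  # minimum number of key rotations required by this guide


def max_key_rotations(r: int, z: int, m: int) -> int:
    """Y from the formula in this guide (hypothetical helper)."""
    x = (r ** m - 1) // (z ** (m - 1))
    return ((z - 1 - x) >> x.bit_length()) + 1


def smallest_output_size(r: int, m: int, limit: int = 256) -> Optional[int]:
    """Smallest Z > r with at least MIN_ROTATIONS rotations, or None."""
    for z in range(r + 1, limit + 1):
        if max_key_rotations(r, z, m) >= MIN_ROTATIONS:
            return z
    return None  # nothing within the (assumed) search limit qualifies


# Numeric input with the NIST-minimum length of 6.
print(smallest_output_size(10, 6))
```

For numeric input with a minimum length of 6, the search returns 14, which matches the a-n Output Character Set used in the first example: output sets of 11, 12, or 13 characters yield fewer than 4 rotations.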
Summary
Configuring structured datasets correctly is crucial for maintaining both security and usability. By carefully selecting your Input Character Set, Output Character Set, and Minimum Input Length, you can optimize the number of key rotations available for encryption.
To ensure strong encryption:
- Follow NIST guidelines to meet minimum security requirements.
- Maximize your Output Character Set to allow more key rotations.
- Ensure at least four key rotations to maintain strong encryption practices.
If you need help navigating complex dataset configurations or optimizing your encryption settings, our team is here to assist. Feel free to reach out for expert guidance!