Concepts

Introduction to Concepts

As you embark on your journey with Ubiq, you’ll encounter various terms, methodologies, and concepts that form the backbone of our solution and data encryption in general. To help you better understand and navigate through these, we’ve created this “Concepts” section. Here, we’ve explained core ideas like structured and unstructured data, encryption keys, key rotation, and data re-keying, among others. These concepts will be referenced throughout our documentation and your user experience. Familiarizing yourself with them will enable you to leverage Ubiq to its full potential and ensure your data is encrypted, stored, and managed securely and efficiently. Happy exploring!

Structured Data

Structured data refers to any data that is organized and formatted in such a way that it's easily searchable by simple, straightforward search engine algorithms or other search operations.

At its core, structured data is data that is arranged according to a defined model or schema. It is typically tabular with columns and rows where each column represents a certain variable (like an ID, name, or timestamp), and each row corresponds to a certain record.

Common examples of structured data include relational (SQL) databases, where data is organized into tables, and CSV files, where data is presented in clear, delimited text format. Structured data is great for queries that need precise, complex conditionals because the schema is consistent across all records, allowing for accurate and speedy data retrieval.

While structured data is highly organized and easily searchable, it does come with limitations. One of the most common constraints developers face is the character-set and length restrictions placed on structured fields, which stem largely from how and where the data is stored.

For instance, if a database column is configured to only support 12 characters, any data that exceeds this limit cannot be stored in that column. This restriction necessitates careful planning and structuring of data, especially when dealing with larger text fields or unique identifiers.
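
To make this concrete, here is a minimal sketch using SQLite (chosen only because it ships with Python; the table and column names are hypothetical). The CHECK constraint plays the role of the 12-character limit described above:

```python
import sqlite3

# Hypothetical schema: customer_id is limited to 12 characters, so longer
# values are rejected at insert time.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customer_id TEXT CHECK (length(customer_id) <= 12),
        full_name   TEXT
    )
""")

conn.execute("INSERT INTO customers VALUES (?, ?)", ("CUST00000001", "Alice"))      # 12 chars: accepted
try:
    conn.execute("INSERT INTO customers VALUES (?, ?)", ("CUST0000000001", "Bob"))  # 14 chars: rejected
except sqlite3.IntegrityError as exc:
    print("insert rejected:", exc)
```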

Let's delve into a few real-world examples of structured data, focusing on sensitive information that requires careful handling and robust security measures.

  1. Credit Card Information: This includes data like card number, cardholder name, expiry date, and CVV. Given the highly sensitive nature of this information, it is often stored in highly structured, encrypted formats. For example, a card number field in a database might have a specific character limit to accommodate a standard 16-digit credit card number.
  2. Customer Information: This is another form of structured, sensitive data which includes fields like name, address, contact number, and email. Each of these fields will have their own unique data and character limitations, depending on the system's specifications.
  3. Health Records: Medical records often contain structured data such as patient IDs, visit timestamps, diagnosis codes, treatment codes, and more. Each of these fields requires stringent structuring rules, including character limitations, to ensure accurate, secure, and efficient data management.

In each of these cases, structured data allows for easy data querying and manipulation while also facilitating strict control over data format and size, which is especially important when handling sensitive data.

Due to the nature of structured data, we recommend using structured data encryption, which leverages deterministic encryption techniques to protect sensitive data while keeping it searchable.
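
To illustrate what "deterministic" means here, the sketch below uses AES-SIV from the Python cryptography package purely as a stand-in: under the same key, the same plaintext always produces the same ciphertext, which is what makes exact-match lookups on encrypted values possible. (Ubiq's structured data encryption uses the format-preserving FF1 algorithm described later; this is not its implementation.)

```python
from cryptography.hazmat.primitives.ciphers.aead import AESSIV

# AES-SIV used without a nonce is deterministic: equal plaintexts under the same
# key yield equal ciphertexts, so an equality search against stored ciphertexts works.
key = AESSIV.generate_key(bit_length=512)
cipher = AESSIV(key)

ct1 = cipher.encrypt(b"123-45-6789", None)
ct2 = cipher.encrypt(b"123-45-6789", None)
assert ct1 == ct2                                    # same input, same output -> searchable

assert cipher.decrypt(ct1, None) == b"123-45-6789"   # and still reversible with the key
```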

Unstructured Data

Unstructured data refers to data that doesn't conform to a specific data model or isn't organized in a pre-defined way. This data can be either textual or non-textual and is generally more challenging to process, analyze, and interpret than structured data due to its irregular and complex formats.

Examples of unstructured data commonly found in a business setting include:

  1. Business Documents: Files like Word documents, PDFs, and Excel spreadsheets (particularly when used for non-tabular data or mixed content) represent unstructured data. For instance, a company policy document in a PDF format is unstructured data - while the content is valuable, it isn't readily searchable or analyzable without additional processing.
  2. Images and Multimedia: This category includes graphics used for business branding, images within reports, or promotional videos. While these files contain valuable information, they don't adhere to a conventional, structured data model.
  3. IoT Sensor Data: Internet of Things (IoT) devices generate a large amount of unstructured data. For example, a weather station might produce data about temperature, humidity, wind speed, and more. This data, while incredibly valuable for trend analysis and prediction, is considered unstructured because it doesn't adhere to a pre-defined structure. Typically, this data must undergo processing to transform it into a structured form for easier analysis.
  4. Medical Imaging Data: In the healthcare sector, medical images like X-rays, MRIs, and CT scans are non-textual unstructured data. These images are vital for diagnosis and treatment but aren't easily categorized or analyzed without specific tools and software.
  5. Audio/Video Files: Customer service call recordings or security camera footage are other examples of unstructured data. They can provide valuable insights but require specialized processing to transcribe or analyze the content.

While the processing and analysis of unstructured data might be challenging, with the right approach and tools, it can offer valuable insights that might not be captured with structured data alone.

Due to the nature of unstructured data, we recommend using unstructured data encryption, which leverages randomized encryption techniques to protect sensitive data.
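
As a contrast with the deterministic example above, here is a minimal sketch of randomized encryption using AES-256-GCM from the Python cryptography package (an illustration of the technique, not Ubiq's implementation). Because a fresh random nonce is used for every call, encrypting the same content twice produces two different ciphertexts:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
cipher = AESGCM(key)
document = b"contents of a PDF, image, or other file ..."

# A fresh random 96-bit nonce per encryption makes the output non-deterministic.
nonce1, nonce2 = os.urandom(12), os.urandom(12)
ct1 = cipher.encrypt(nonce1, document, None)
ct2 = cipher.encrypt(nonce2, document, None)

assert ct1 != ct2                                     # same file, different ciphertexts
assert cipher.decrypt(nonce1, ct1, None) == document  # still decrypts with the key and nonce
```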

Dataset and Dataset Group

A Dataset in the Ubiq UI is a logical structure representing data, classified into two categories:

  1. Structured: This refers to data stored in a database column of fixed length and type, such as names, addresses, or social security numbers.
  2. Unstructured: This encompasses files like audio, video, PDFs, text, etc., stored in unstructured data storage, such as AWS S3, Google Cloud Storage, or a Data Lake.

Datasets allow a flexible and logical representation of various data elements and types for encryption within an application.

A Dataset Group in the Ubiq UI visually clusters different Datasets sharing specific attributes, enabling efficient management and tracking. A Dataset can belong to multiple Dataset Groups. However, Datasets with the same name cannot coexist within a single Dataset Group.

Symmetric Encryption Algorithm

An encryption algorithm, specifically discussing symmetric encryption in this context, is a set of mathematical procedures that converts plaintext data into a scrambled ciphertext, thereby ensuring the data's confidentiality. The same key is used to both encrypt and decrypt data, ensuring that only those possessing the correct key can decipher the ciphertext back into its original plaintext. Symmetric encryption algorithms are a fundamental pillar of data security, particularly when transmitting data over insecure networks or storing sensitive information.

There are numerous symmetric encryption algorithms available, each with different strengths and considerations:

  1. AES-256-GCM*: The Advanced Encryption Standard (AES) with a 256-bit key size in Galois/Counter Mode (GCM) is a widely-used symmetric encryption algorithm. It offers strong encryption and includes built-in authentication, ensuring the integrity of both the encrypted data and any associated data (illustrated in the sketch after this list).

  2. FF1*: FF1 is a Format-Preserving Encryption (FPE) algorithm in which the output (the ciphertext) has the same format as the input (the plaintext). This is particularly valuable when the validity of data formats needs to be maintained, such as encrypting credit card numbers into other valid credit card numbers.

  3. Blowfish: Blowfish is a symmetric block cipher that can be used as a drop-in replacement for DES or IDEA. It takes a variable-length key, from 32 bits to 448 bits, making it ideal for both domestic and exportable use. Blowfish is known for its incredible speed and effectiveness.

  4. Twofish: Twofish is a symmetric key block cipher with a block size of 128 bits and key sizes up to 256 bits. It's related to the earlier Blowfish algorithm and was one of the five finalists of the Advanced Encryption Standard contest, but it was not selected for standardization. Twofish is considered to be among the fastest of encryption algorithms and is free for any use.

    *Currently supported by Ubiq
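
The sketch below illustrates the two properties called out for AES-256-GCM above: the same key performs both encryption and decryption, and the authentication tag protects the ciphertext plus any associated data. It uses the Python cryptography package and is an illustration only:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.exceptions import InvalidTag

key = AESGCM.generate_key(bit_length=256)   # one symmetric key for both directions
cipher = AESGCM(key)
nonce = os.urandom(12)

aad = b"record-id=42"                       # associated data: authenticated, not encrypted
ciphertext = cipher.encrypt(nonce, b"sensitive payload", aad)

# Decryption requires the same key, nonce, and untampered ciphertext/associated data.
assert cipher.decrypt(nonce, ciphertext, aad) == b"sensitive payload"

try:
    cipher.decrypt(nonce, ciphertext, b"record-id=43")   # altered associated data
except InvalidTag:
    print("integrity check failed, as expected")
```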

When using Ubiq, users can easily switch between different encryption algorithms as per their specific needs, just by updating a setting in our UI. This functionality means that no changes are required to their applications or previously encrypted data.

This ease of use is particularly valuable in the context of quantum readiness. As the world prepares for the advent of quantum computing, there is an urgent need to develop quantum-resistant algorithms, which are algorithms that remain secure against potential quantum computing threats. Once NIST approves these quantum-resistant algorithms, users of our encryption solution will be able to painlessly update their encryption settings, further safeguarding their data without disrupting their existing systems or workflows. This flexibility offers users peace of mind knowing they can readily adapt to evolving security standards and requirements.

Some additional thoughts:

Advanced Encryption Standard (AES) is currently the industry standard encryption algorithm used worldwide. It has key sizes of 128, 192, and 256 bits, with AES-256 providing the highest level of security. It's used in many protocols such as HTTPS, SSH, IPsec, and is even approved by the National Security Agency (NSA) for encrypting top-secret information.

ChaCha20 is a stream cipher that, along with its associated authenticated encryption construction (ChaCha20-Poly1305), is getting increased attention in applications due to its high speed and security. It's used in some versions of TLS and in secure protocols like WireGuard.

3DES (Triple DES) is an older encryption standard that applies the older DES algorithm three times to each data block. While still used in some systems, it's generally being phased out due to its lower security relative to newer algorithms and its relatively slow speed.

Blowfish and Twofish are both recognized symmetric key block cipher algorithms, known for their speed and effectiveness. However, the suitability and safety of these algorithms largely depend on the context and specific requirements of their usage. Let's delve into each one a bit more:

  1. Blowfish: Developed in 1993, Blowfish has been considered secure for many applications. However, it has a block size of 64 bits, which can present security concerns for applications requiring encryption of large amounts of data. Furthermore, better alternatives such as Twofish and AES are now available.
  2. Twofish: An evolution of Blowfish, Twofish has a block size of 128 bits, making it suitable for encryption of larger data sets. It was a finalist in the NIST's Advanced Encryption Standard (AES) contest, where it was well-regarded for its security and speed. It didn't win the contest, but it's still considered a robust and secure algorithm for many applications.

While these algorithms are secure in many cases, it's always important to consider the specific needs and requirements of your application. The Advanced Encryption Standard (AES), particularly with a 256-bit key size, has generally superseded these older algorithms and is currently the most widely accepted and secure standard for data encryption.

Primary Encryption Key

A primary encryption key, which is also commonly referred to as a master encryption key, root key, or key derivation key, is a critical component in a data encryption scheme. It is a cryptographic key that is used to encrypt other keys, usually referred to as data encryption keys or DEKs.

The primary encryption key sits at the top of the key hierarchy and is stored securely within our key management infrastructure inside of tamper-proof and FIPS 140-2 compliant hardware security modules (HSMs), in a completely separate location from the data and the data encryption keys it protects. Its purpose is to add an extra layer of security to the encryption infrastructure.

In a typical encryption process, the actual data is encrypted with a data encryption key. This key is then further encrypted with the primary encryption key. This layered approach provides two primary benefits:

  1. Enhanced Security: Since the data encryption key, which is directly used for encrypting and decrypting data, is itself encrypted with the primary encryption key, an extra layer of security is added. Even if a malicious actor gains access to the encrypted data and the data encryption key, they cannot decrypt the data without the primary encryption key.
  2. Key Management: Primary encryption keys simplify the process of key management. Instead of having to securely store and manage every data encryption key, only the primary encryption key needs to be stringently protected. Data encryption keys can be created as needed, and when they are no longer needed, they can be safely discarded without affecting other data encryption processes, since the primary encryption key remains secure and unchanged.

It’s important to note that a primary encryption key does NOT directly create other data encryption keys, but it plays a crucial role in their lifecycle.

Here's a simplified summary of how it works:

  1. Generation: Data encryption keys are generated using a cryptographically secure random process. The primary encryption key doesn't directly "create" these keys, but it provides the basis for their secure usage.
  2. Encryption: Once a data encryption key is generated, it's used to encrypt the actual data. The resulting encrypted data can only be decrypted using the same data encryption key.
  3. Key Encryption: After the data encryption key has done its job of encrypting the data, a primary encryption key or a public key is used to encrypt the data encryption key itself. This adds an extra layer of security. Now, even if someone were to gain unauthorized access to the encrypted data and the encrypted data encryption key, they would still need the primary encryption key or the private key to decrypt the data encryption key, and subsequently, the data.
  4. Storage: The encrypted data and the encrypted data encryption key are then stored, typically in a database or data store (depending on your use case). The primary encryption key is stored separately within our key management infrastructure and is not exportable or accessible, to prevent unauthorized access.
  5. Decryption: When the data needs to be decrypted, the primary encryption key is used to decrypt the data encryption key, which in turn is used to decrypt the actual data.

So while the primary encryption key doesn't directly create data encryption keys, it is critical in their secure usage. It's also essential in key management, as the system only needs to protect the primary encryption key stringently. The data encryption keys, once they are encrypted with the primary key, can be safely stored, even in less secure environments.
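
The five steps above can be sketched with two layers of AES-256-GCM using the Python cryptography package. This is purely illustrative: in Ubiq's design the primary key is held inside an HSM and is never exportable, whereas here a locally generated key stands in for it so the layering is visible:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

primary_key = AESGCM.generate_key(bit_length=256)   # stand-in for the HSM-held primary key

# 1. Generation: a fresh data encryption key (DEK)
dek = AESGCM.generate_key(bit_length=256)

# 2. Encryption: the DEK encrypts the actual data
data_nonce = os.urandom(12)
ciphertext = AESGCM(dek).encrypt(data_nonce, b"patient record #1001", None)

# 3. Key encryption: the primary key encrypts ("wraps") the DEK
dek_nonce = os.urandom(12)
wrapped_dek = AESGCM(primary_key).encrypt(dek_nonce, dek, None)

# 4. Storage: ciphertext and wrapped DEK travel together; the primary key lives elsewhere
stored = {"ciphertext": ciphertext, "data_nonce": data_nonce,
          "wrapped_dek": wrapped_dek, "dek_nonce": dek_nonce}

# 5. Decryption: unwrap the DEK with the primary key, then decrypt the data with the DEK
recovered_dek = AESGCM(primary_key).decrypt(stored["dek_nonce"], stored["wrapped_dek"], None)
plaintext = AESGCM(recovered_dek).decrypt(stored["data_nonce"], stored["ciphertext"], None)
assert plaintext == b"patient record #1001"
```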

Data Encryption Key

A data encryption key (DEK) is a randomly generated key used to encrypt and decrypt data in the process of securing sensitive information. The primary role of the DEK is to convert plaintext data into ciphertext, rendering it unreadable without the correct key.

Data encryption keys are typically generated using a secure random process to ensure that they are as unpredictable as possible, increasing the security of the encryption. Here's a high-level view of how they're used:

  1. Generation: A data encryption key is generated. This process must be cryptographically secure to prevent the key from being predictable.
  2. Encryption: The DEK is used to encrypt the plaintext data. The resulting ciphertext is virtually impossible to convert back into plaintext without the correct DEK.
  3. Primary Key Encryption or Public Key: To further secure the DEK, a primary encryption key, also known as a master key or a root key, or a public key that is part of a public-private key pair may be used to encrypt the DEK. This creates an extra layer of security and ensures that even if someone gains access to the encrypted data and the encrypted DEK, they cannot decrypt the data without also having the primary encryption key or the private key.
  4. Storage: The encrypted DEK is stored alongside the encrypted data, often in a database or data store (depending on your use case).
  5. Decryption: When the encrypted data needs to be accessed, the process is reversed. The primary encryption key decrypts the DEK, which in turn is used to decrypt the data back into plaintext.

DEKs are central to the encryption process, transforming sensitive data into unreadable ciphertext and helping to ensure that even if unauthorized individuals gain access to the encrypted data, they cannot decipher it without the correct keys. Their lifecycle is closely tied to the primary encryption key, which provides an additional layer of security and simplifies key management in the encryption infrastructure.
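
As noted in step 1 above, the strength of a DEK rests entirely on the quality of its randomness. A brief sketch of acceptable sources in Python (illustration only):

```python
import os
import secrets
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

dek = AESGCM.generate_key(bit_length=256)   # library helper for a 256-bit key
dek_alt = os.urandom(32)                    # 32 bytes straight from the OS CSPRNG
dek_alt2 = secrets.token_bytes(32)          # stdlib wrapper around the same source

# A non-cryptographic PRNG (e.g. Python's `random` module) must never be used here,
# because its output is predictable once its internal state is known.
```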

Key Encryption Key

A Key Encryption Key (KEK) plays a pivotal role in encryption key management. In essence, a KEK’s primary purpose is to protect other encryption keys, primarily Data Encryption Keys (DEKs). The use of KEKs adds an extra layer of security to the encryption process.

There are two generally accepted approaches to “creating” KEKs:

  1. Asymmetric approach: use the public key in a public/private key pair as a KEK
  2. Symmetric approach: leverage a Hardware Security Module (HSM) to create a KEK

For illustrative purposes, consider a scenario where you have sensitive data encrypted with a DEK. The DEK, needed for future data decryption, must be stored or transmitted securely. Storing this DEK in plaintext exposes it to potential unauthorized access. To mitigate this risk, you could use an HSM to create a KEK that is then used to encrypt the DEK. In this setup, only systems authorized to access the HSM can use the KEK to decrypt the DEK and, subsequently, gain access to the data. Hence, even if an attacker were to get unauthorized access to the stored keys, they would still need access to the KEK in the HSM to utilize them. Please note, this is a generic illustration and not indicative of Ubiq's specific approach.
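
The asymmetric approach (item 1 above) can be sketched in software with RSA-OAEP from the Python cryptography package. The names and key sizes are illustrative, and this is a generic example rather than Ubiq's specific implementation:

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# The public half of a key pair acts as the KEK: anyone may wrap a DEK with it,
# but only the holder of the private key can unwrap it.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=3072)
public_key = private_key.public_key()

dek = AESGCM.generate_key(bit_length=256)

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

wrapped_dek = public_key.encrypt(dek, oaep)            # safe to store or transmit
assert private_key.decrypt(wrapped_dek, oaep) == dek   # recoverable only with the private key
```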

At Ubiq, we adapt to the customer's use case and use both symmetric and asymmetric encryption methods to protect DEKs both at rest within the customer's environment and during transit. Furthermore, we secure the transmission channel with Transport Layer Security (TLS) when transmitting between our backend infrastructure and the customer's environment.

Our approach employs symmetric and asymmetric encryption for DEK protection, coupled with TLS for secure transmission. This robust, dual-layered security approach safeguards the integrity and confidentiality of the DEKs throughout the entire process.

Key Rotation

Encryption key rotation is a vital security practice that involves periodically changing encryption keys. By frequently updating keys, you reduce the amount of data protected under any single key and limit the potential damage if a key is compromised.

Key rotation comes in two primary forms: primary encryption key rotation and data key rotation.

Primary Encryption Key Rotation

The primary encryption key, sometimes referred to as a root key, is the central key used to protect data keys. Rotating the primary key means generating a new primary key and re-encrypting existing data keys with the new primary key.

Primary key rotation doesn't involve re-encrypting the data itself but only the data keys, making the process relatively quick and less resource-intensive. The primary advantage is that even if a primary key is compromised, once it's rotated, the compromised key can no longer decrypt the data keys, effectively safeguarding data encrypted under those data keys.
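
A minimal sketch of that re-wrapping step, with AES-256-GCM standing in for the primary keys (in practice these would be HSM-held; the names are illustrative):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def rotate_primary_key(wrapped_dek: bytes, dek_nonce: bytes,
                       old_primary: bytes, new_primary: bytes) -> tuple[bytes, bytes]:
    """Re-encrypt a stored DEK under a new primary key; the data ciphertext is untouched."""
    dek = AESGCM(old_primary).decrypt(dek_nonce, wrapped_dek, None)  # unwrap with the old key
    new_nonce = os.urandom(12)
    return AESGCM(new_primary).encrypt(new_nonce, dek, None), new_nonce
```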

Data Key Rotation

Data key rotation, on the other hand, involves re-encrypting the actual data. Each piece of data or a set of data is encrypted with a unique data key, which is then encrypted with the primary key. When a data key is rotated, a new key is generated, and the data is re-encrypted with this new key.

Data key rotation is more resource-intensive than primary key rotation, as it involves re-encrypting potentially large volumes of data. However, it can enhance security by limiting the amount of data accessible with a single key.
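
By contrast, data key rotation touches the data itself. A sketch of re-encrypting one record under a fresh DEK (illustrative names, AES-256-GCM as the cipher):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def rotate_data_key(ciphertext: bytes, nonce: bytes,
                    old_dek: bytes, new_dek: bytes) -> tuple[bytes, bytes]:
    """Decrypt a record with its old DEK and re-encrypt it with a new DEK."""
    plaintext = AESGCM(old_dek).decrypt(nonce, ciphertext, None)
    new_nonce = os.urandom(12)
    return AESGCM(new_dek).encrypt(new_nonce, plaintext, None), new_nonce
```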

Key Rotation and PCI-DSS

The Payment Card Industry Data Security Standard (PCI-DSS), which sets the security standards for businesses that handle card payments, includes key rotation as part of its requirements. Specifically, Requirement 3.6.4 of PCI-DSS states that cryptographic keys must be changed at the end of their defined cryptoperiod, which is a timespan during which a specific key is authorized for use.

The defined cryptoperiod for a key is influenced by the sensitivity of the data, the potential risks, and the security controls in place. The PCI-DSS does not specify an exact timeline for key rotation, but it's generally recommended to perform key rotation at least annually.

Data Re-Keying

Data re-keying is another important aspect of managing encryption keys. It refers to the process of decrypting data that was encrypted with an old key and then re-encrypting that same data with a new key. This process is also commonly referred to as key rotation.

The primary motivation behind data re-keying is to limit the potential damage if an encryption key is compromised. By periodically changing the keys used to encrypt data, you limit the amount of data that could be decrypted with a compromised key.

Data re-keying can also be used as part of an update process. For instance, if a newer, more secure encryption algorithm becomes available (e.g. quantum-resistant algorithms), data that was encrypted with an old algorithm can be re-keyed using the new algorithm. This is an essential aspect of maintaining data security as cryptographic techniques and standards evolve.

Use Case: Re-keying as part of a Data Breach

In the event of a confirmed or suspected data breach, or if there's evidence suggesting that an attacker has accessed or compromised your data or encryption keys, it's highly recommended to perform data re-keying. This process involves decrypting the affected data that was encrypted with the compromised key and then re-encrypting it with a new, secure key.

Re-keying in this context helps to mitigate potential damage by ensuring that the compromised keys can no longer be used to access more data than has already been exposed. This proactive approach enhances your overall data security, restricts the access of unauthorized users, and aids in the recovery process following a data breach.

EncryptForSearch

EncryptForSearch is a technical process that allows searching within a database for data that has been encrypted, such as an employee's credit card number. To perform these searches, the method EncryptForSearchAsync() is employed. This method takes an original value (e.g., a credit card number) and generates a set of all possible encrypted values for that original value, considering the various encryption keys that might have been used over different time periods.

Example:

Consider a credit card number “1234 5678 9012 3456”. Initially, this credit card number was encrypted with Key A, resulting in an encrypted value “EncA 9876 5432 1098”. After some time, for security reasons, the encryption key was rotated to Key B, which would theoretically encrypt the credit card number as “EncB 9876 5432 1098”.

Now, if someone needs to search for this particular credit card number in the encrypted database, simply searching for the value “EncA 9876 5432 1098” won’t yield results, as the database now holds the credit card number in the “EncB 9876 5432 1098” form due to the key rotation.

In this situation, the EncryptForSearchAsync() method is used. It takes the original credit card number “1234 5678 9012 3456” and creates a collection of potential encrypted representations based on different key rotations—in this example, it generates “EncA 9876 5432 1098” from Key A and “EncB 9876 5432 1098” from Key B.

Once this set of potential encrypted values is generated by EncryptForSearchAsync(), the database can be queried with each of these values to find matches. Thus, the database would be searched for both “EncA 9876 5432 1098” and “EncB 9876 5432 1098”, ensuring that the credit card number can be effectively found, irrespective of which encryption key was used at any point in time.

📘

In a real-world implementation, the encrypted values would likely appear as a seemingly random sequence of characters and numbers, rather than a prefixed and clearly recognizable form as presented in this illustrative example. The purpose of the example is to illustrate how the method might function rather than to depict actual encryption results.
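
The overall flow can also be sketched in code. The sketch below is conceptual: historical_keys and encrypt_for_search are hypothetical names (not the Ubiq SDK), and AES-SIV stands in for a deterministic, searchable encryption scheme:

```python
from cryptography.hazmat.primitives.ciphers.aead import AESSIV

# Keys that have protected this dataset over time (Key A before rotation, Key B after).
historical_keys = {
    "key_a": AESSIV.generate_key(bit_length=512),
    "key_b": AESSIV.generate_key(bit_length=512),
}

def encrypt_for_search(plaintext: bytes) -> list[bytes]:
    """Return the deterministic ciphertext of `plaintext` under every key ever used."""
    return [AESSIV(key).encrypt(plaintext, None) for key in historical_keys.values()]

candidates = encrypt_for_search(b"1234 5678 9012 3456")

# The database is then queried for rows whose encrypted column matches ANY candidate, e.g.
#   SELECT * FROM payments WHERE card_number_enc IN (:candidate_a, :candidate_b)
```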