# ACT 400: AES Data Encryption & Decryption with Data Distiller

## Prerequisites

{% content-ref url="act-100-dataset-activation-with-data-distiller" %}
[act-100-dataset-activation-with-data-distiller](https://data-distilller.gitbook.io/adobe-data-distiller-guide/unit-9-data-distiller-activation-and-data-export/act-100-dataset-activation-with-data-distiller)
{% endcontent-ref %}

Download the file:

{% file src="<https://1899859430-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEhcgqFIfGdE0GXJzi5yR%2Fuploads%2FXHEWt8bFSzYNoLPBBX58%2Fhealthcare_customers.csv?alt=media&token=4d5bf0a1-7a1a-4c08-85c9-12728f1ab6e5>" %}

Ingest the data as the **`healthcare_customers`** dataset by following this tutorial:

{% content-ref url="../prep-500-ingesting-csv-data-into-adobe-experience-platform" %}
[prep-500-ingesting-csv-data-into-adobe-experience-platform](https://data-distilller.gitbook.io/adobe-data-distiller-guide/prep-500-ingesting-csv-data-into-adobe-experience-platform)
{% endcontent-ref %}

Also recommended:

{% content-ref url="act-300-functions-and-techniques-for-handling-sensitive-data-with-data-distiller" %}
[act-300-functions-and-techniques-for-handling-sensitive-data-with-data-distiller](https://data-distilller.gitbook.io/adobe-data-distiller-guide/unit-9-data-distiller-activation-and-data-export/act-300-functions-and-techniques-for-handling-sensitive-data-with-data-distiller)
{% endcontent-ref %}

## **Why Support AES (Advanced Encryption Standard)?**

**AES (Advanced Encryption Standard)** support in **Data Distiller** enhances data security and aligns with industry standards. AES is the most popular symmetric encryption algorithm, widely trusted for its speed, efficiency, and strong security across industries like finance, healthcare, and cloud services. Its ability to encrypt large volumes of data efficiently makes it a superior choice over asymmetric algorithms like **RSA**, which, while highly secure, is slower and typically used for specific tasks like key exchanges and digital signatures rather than large-scale encryption.

Data Distiller includes support for encryption modes like **GCM (Galois/Counter Mode)**, which is the most favored mode due to its dual ability to provide both encryption and data integrity. This makes it ideal for protecting sensitive data in secure communications, cloud storage, and large-scale enterprise operations.

In comparison to asymmetric encryption like RSA, which requires different keys for encryption and decryption, AES uses a single key, making it not only faster but also easier to manage in environments where large amounts of data need to be securely processed and stored. While RSA is excellent for securing small, highly sensitive pieces of data and key exchanges, AES is the gold standard for encrypting bulk data efficiently and securely.

AES support in Data Distiller provides the fast, scalable, and robust data protection needed to meet regulatory standards like **GDPR** and **HIPAA**, while also offering high performance for enterprise use cases.

## **AES and Its Encryption Modes in Data Distiller**

**AES (Advanced Encryption Standard)** is one of the most widely used and trusted methods for encrypting data. It’s employed globally to secure sensitive information, from financial transactions to personal communications. AES works by converting plain text data into an unreadable format, known as ciphertext, using a secret key. Only someone with the correct key can decrypt the data back into its original form.

AES in Data Distiller supports two key sizes: **128-bit** and **256-bit**. **AES-256** is the more widely used of the two. Its **256-bit key** offers the higher level of security, making it ideal for safeguarding sensitive data in industries like finance, healthcare, and government. AES-256 strikes a balance between security and performance, making it the preferred choice for robust encryption needs, especially where long-term data protection is critical.

However, AES doesn’t work alone: it uses different **modes** to encrypt and process data. These modes define how data is broken down and transformed, offering varying levels of security and performance. The two modes available in Data Distiller are **GCM (Galois/Counter Mode)** and **ECB (Electronic Codebook Mode)**, each serving different purposes.

**GCM (Galois/Counter Mode)** is highly regarded for its speed and security. It not only encrypts data but also ensures that it hasn’t been tampered with, making it ideal for secure communications. GCM is especially useful in scenarios where both confidentiality and data integrity are important.

**ECB (Electronic Codebook Mode)** is the simplest and fastest mode, but also the least secure. In ECB, each block of data is encrypted independently, meaning identical pieces of input will result in identical encrypted output. While this makes ECB efficient, it can expose patterns in the data, making it less suitable for sensitive information.

Along with these modes, AES often relies on **padding** to ensure that data fits perfectly into the blocks required for encryption. For example, **PKCS** padding is commonly used to fill gaps when data doesn't perfectly match the block size. In some modes, like **GCM**, padding isn't required, making the encryption process more efficient.

The most popular mode of operation for AES encryption is **GCM (Galois/Counter Mode)**. GCM is widely favored because it provides both **data confidentiality** (encryption) and **data integrity** (authentication) in a highly efficient manner. Its ability to ensure that data hasn't been tampered with while being transmitted, combined with its speed and performance, makes it ideal for modern applications, including secure communications, cloud services, and network encryption. GCM’s versatility and security features have made it the go-to mode in many industry-standard implementations.

Together, AES and its modes offer a versatile set of tools for protecting data in a wide range of scenarios, from high-security communications to everyday data protection. Whether you need speed, security, or flexibility, AES provides the foundation for keeping sensitive information safe.

{% hint style="warning" %}
**CBC (Cipher Block Chaining)** offers strong security by linking each block of data with the previous one. This chaining makes it difficult for an attacker to spot patterns in the encrypted data, even if the input has repeated elements. CBC is slower than GCM due to its sequential nature but is still widely used for its robustness. CBC mode has not yet been released in Data Distiller.
{% endhint %}

{% hint style="warning" %}
**Data Distiller** does not currently support **asymmetric encryption** natively. Asymmetric encryption (which uses a pair of keys: a public key for encryption and a private key for decryption) is not provided as part of the built-in functions in Data Distiller.

Data Distiller primarily supports symmetric encryption functions with **AES (Advanced Encryption Standard)** for data encryption and decryption.

If you need asymmetric encryption (e.g., RSA), you would typically need to implement this outside of Data Distiller using external libraries in **Python** or **Java**, or through integration with a third-party encryption service.
{% endhint %}

{% hint style="warning" %}
Since **Data Distiller** supports AES for symmetric encryption, a single secret key is used for both encrypting and decrypting data. This means that the same key must be securely shared between the parties involved in exchanging information. The key is the critical element: anyone who has access to it can decrypt the encrypted data. Therefore, protecting the key itself is essential to maintaining the security of the data.

Symmetric encryption, like AES, is typically faster than asymmetric encryption, making it ideal for efficiently securing large volumes of data. However, this approach requires careful key management to ensure that unauthorized individuals cannot access or compromise the key, as this would undermine the entire encryption process.
{% endhint %}

## AES Encryption Syntax

The generalized syntax is:

```sql
aes_encrypt(expr, key [, mode [, padding]])
```

* **`expr`**: The data to be encrypted.
* **`key`**: The binary key (use `UNHEX()` for hexadecimal key).
  * **16 bytes** for **AES-128**.
  * **32 bytes** for **AES-256**.
* **`mode`** (optional): Encryption mode (case-insensitive).
  * `'ECB'`: Electronic CodeBook mode.
  * `'GCM'`: Galois/Counter Mode (default mode).
* **`padding`** (optional): Padding scheme (case-insensitive).
  * `'NONE'`: No padding (for `'GCM'` mode only).
  * `'PKCS'`: Public Key Cryptography Standards padding (for `'ECB'` mode).
  * `'DEFAULT'`: Uses `'NONE'` for `'GCM'` and `'PKCS'` for `'ECB'`.
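
As a minimal sketch of how the optional arguments combine, the query below encrypts one literal string three ways, reusing the sample key that appears later in this tutorial. The input string and column aliases are illustrative assumptions, and `HEX` is used only so that the binary result can be displayed.

{% code overflow="wrap" %}

```sql
-- Sketch: the same literal encrypted with defaults, explicit GCM, and explicit ECB.
-- The input string and the column aliases are illustrative, not part of the dataset.
SELECT
    -- mode and padding omitted: falls back to GCM with DEFAULT padding
    HEX(AES_ENCRYPT('patient@example.com', UNHEX('6BB8E32DB365D1953C95377C547330B52FAF9C35C9350A2BA1FC5CB4651D28E9'))) AS gcm_by_default,
    -- the same choice spelled out explicitly
    HEX(AES_ENCRYPT('patient@example.com', UNHEX('6BB8E32DB365D1953C95377C547330B52FAF9C35C9350A2BA1FC5CB4651D28E9'), 'GCM', 'DEFAULT')) AS gcm_explicit,
    -- ECB mode with PKCS padding
    HEX(AES_ENCRYPT('patient@example.com', UNHEX('6BB8E32DB365D1953C95377C547330B52FAF9C35C9350A2BA1FC5CB4651D28E9'), 'ECB', 'PKCS')) AS ecb_pkcs;
```

{% endcode %}

The two GCM columns use equivalent arguments, but they are not expected to hold identical hex strings if, as in Apache Spark, each GCM call draws a fresh random initialization vector. That behavior is an assumption here rather than something this tutorial states about Data Distiller.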

## AES Decryption Syntax

The generalized syntax is:

```sql
aes_decrypt(expr, key [, mode [, padding]])
```

* **`expr`**: The binary data to be decrypted (typically stored as hex, so use `UNHEX()`).
* **`key`**: The binary key (use `UNHEX()` for hexadecimal key).
  * **16 bytes** for **AES-128**.
  * **32 bytes** for **AES-256**.
* **`mode`** (optional): Decryption mode (must match the encryption mode).
  * `'ECB'`: Electronic CodeBook mode.
  * `'GCM'`: Galois/Counter Mode (default mode).
* **`padding`** (optional): Padding scheme (must match the encryption padding).
  * `'NONE'`: No padding (for `'GCM'` mode only).
  * `'PKCS'`: Public Key Cryptography Standards padding (for `'ECB'` mode).
  * `'DEFAULT'`: Uses `'NONE'` for `'GCM'` and `'PKCS'` for `'ECB'`.
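
The following sketch round-trips a single literal to show the matching rule in practice: the decryption call repeats the key, mode, and padding used for encryption. The literal and the `Sample` CTE name are hypothetical. Because the ciphertext is passed along as binary here, `UNHEX()` is needed only for the key.

{% code overflow="wrap" %}

```sql
-- Sketch: encrypt a literal, then decrypt it with the same key, mode, and padding.
WITH Sample AS (
    SELECT
        AES_ENCRYPT('patient@example.com', UNHEX('6BB8E32DB365D1953C95377C547330B52FAF9C35C9350A2BA1FC5CB4651D28E9'), 'ECB', 'PKCS') AS encrypted_value
)
SELECT
    HEX(encrypted_value) AS encrypted_hex,  -- hex is for display only
    CAST(AES_DECRYPT(encrypted_value, UNHEX('6BB8E32DB365D1953C95377C547330B52FAF9C35C9350A2BA1FC5CB4651D28E9'), 'ECB', 'PKCS') AS STRING) AS decrypted_value
FROM
    Sample;
```

{% endcode %}

The `decrypted_value` column should return the original literal.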

## **Understanding GCM and ECB Modes**

**GCM** and **ECB** are different methods (or modes) of encrypting data. **GCM (Galois/Counter Mode)** is like locking your data with a secure padlock, but with an additional layer of protection to ensure that no one has tampered with it. This mode not only encrypts the data but also verifies its integrity, making it highly secure and fast. It is often used for secure communication, where speed and data integrity are critical.

**ECB (Electronic Codebook Mode)** treats each chunk of data the same way, without any chaining. It’s like putting each letter of a message in the same type of envelope, without considering the surrounding letters. This makes ECB fast but predictable, as identical chunks of data will produce identical encrypted output. Because of this, ECB is considered less secure than GCM since it can reveal patterns in the data.
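
To make the pattern leak concrete, here is a small sketch that encrypts three inline rows, two of them identical, in both modes. The inline `VALUES` table, the example addresses, and the aliases are assumptions made for illustration, and inline tables may not be available in every environment.

{% code overflow="wrap" %}

```sql
-- Sketch: identical inputs encrypted under ECB and under GCM.
SELECT
    email,
    HEX(AES_ENCRYPT(email, UNHEX('6BB8E32DB365D1953C95377C547330B52FAF9C35C9350A2BA1FC5CB4651D28E9'), 'ECB', 'PKCS')) AS ecb_hex,
    HEX(AES_ENCRYPT(email, UNHEX('6BB8E32DB365D1953C95377C547330B52FAF9C35C9350A2BA1FC5CB4651D28E9'), 'GCM')) AS gcm_hex
FROM
    VALUES
        ('alice@example.com'),
        ('alice@example.com'),
        ('bob@example.com') AS sample_rows(email);
```

{% endcode %}

The two `alice@example.com` rows share the same `ecb_hex` value, which is exactly the pattern an observer could exploit. Assuming GCM uses a fresh random initialization vector per call, as it does in Apache Spark, their `gcm_hex` values would be expected to differ.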

### **What is Padding**

In encryption, **padding** refers to filling in extra spaces when the data doesn’t perfectly fit the required block size (usually 16 bytes). Imagine you have a box that fits exactly 16 letters, but your message is only 13 letters long. Padding is like adding extra filler to make the message fit perfectly.

**PKCS (Public Key Cryptography Standards)** is a widely used method for padding. It adds extra bytes to fill the gaps, making sure the data fits the block size. When the data is decrypted, the system knows how to remove the padding. In contrast, **NONE** means no padding is added. In a block-oriented mode, this only works if the data already fits the block size perfectly. **GCM mode** can process data of any length, so it does not require padding and uses **NONE**.
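
One way to see padding at work is to compare ciphertext sizes. The sketch below, which uses illustrative literals of 13 and 16 bytes, relies on `LENGTH` returning the byte count of a binary value, as it does in Spark SQL.

{% code overflow="wrap" %}

```sql
-- Sketch: ciphertext sizes under ECB with PKCS padding.
SELECT
    -- 'thirteen char' is 13 bytes, so padding fills it up to one 16-byte block
    LENGTH(AES_ENCRYPT('thirteen char', UNHEX('6BB8E32DB365D1953C95377C547330B52FAF9C35C9350A2BA1FC5CB4651D28E9'), 'ECB', 'PKCS')) AS bytes_for_13_byte_input,
    -- 'sixteen chars ok' is exactly 16 bytes; PKCS still appends a full padding block
    LENGTH(AES_ENCRYPT('sixteen chars ok', UNHEX('6BB8E32DB365D1953C95377C547330B52FAF9C35C9350A2BA1FC5CB4651D28E9'), 'ECB', 'PKCS')) AS bytes_for_16_byte_input;
```

{% endcode %}

Under standard PKCS padding, the first column would be 16 and the second 32. An input that already fills a block gains a whole extra block, so that the padding can always be removed unambiguously on decryption.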

{% hint style="warning" %}
**AAD (Additional Authenticated Data)** is a feature in **GCM mode** that allows you to include extra information (such as metadata) alongside your encrypted data. This extra information isn’t encrypted, but it is part of the secure process and helps ensure that the message hasn't been tampered with. Think of it as adding an extra label on a package, indicating who sent it or when it was sent. While the label itself isn’t hidden, it’s essential to verify that the information hasn’t been altered. AAD is useful in situations where the integrity of this additional information is important for verifying the authenticity of the message.

This feature is yet to be released in Data Distiller.
{% endhint %}

## Key Generation

**AES** is a type of **symmetric encryption**. In symmetric encryption, the same key is used for both **encrypting** and **decrypting** data. This means that the person or system encrypting the data and the one decrypting it must both have access to the same secret key. Since AES is symmetric, the security of the system depends on keeping the key confidential. If someone gains access to the key, they can both encrypt and decrypt the data. Before using these functions, you will need to generate a key, securely track it, and store it in a secure vault.

{% hint style="warning" %}
The key should be kept in a **secure key management system (KMS)** or a **hardware security module (HSM)**. These systems are designed to securely store, manage, and control access to encryption keys, preventing unauthorized access. Popular cloud providers like AWS, Google Cloud, and Azure offer managed KMS services, which automate the secure storage and handling of keys. By using a KMS or HSM, you can ensure that the key is protected, access is tightly controlled, and audit logs are maintained for compliance with security standards.
{% endhint %}

### Generate a 16-Byte Key

{% code overflow="wrap" %}

```sql
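-- Generate a random 16-byte key (32 hexadecimal characters)
-- Note: RAND() is not a cryptographically secure source; prefer a key issued by your KMS for production use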
SELECT 
  UPPER(SUBSTRING(SHA2(CAST(RAND() AS STRING), 256), 1, 32)) AS generated_16_byte_key;

```

{% endcode %}

{% hint style="info" %}
The query above generates hexadecimal characters, but the **`aes_encrypt`** and **`aes_decrypt`** functions require binary values. Therefore, you need to use the **`unhex(generated_16_byte_key)`** function in Data Distiller to convert the hexadecimal key into the required binary format.
{% endhint %}

### Generate a 32-Byte Key

{% code overflow="wrap" %}

```sql
-- Generate a random 32-byte key (64 hexadecimal characters)
SELECT 
  UPPER(SHA2(CAST(RAND() AS STRING), 256)) AS generated_32_byte_key;
```

{% endcode %}

{% hint style="info" %}
The query above generates hexadecimal characters, but the **`aes_encrypt`** and **`aes_decrypt`** functions require binary values. Therefore, you need to use the **`unhex(generated_32_byte_key)`** function in Data Distiller to convert the hexadecimal key into the required binary format.
{% endhint %}
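
Before using a generated key, it can help to confirm that it has the size AES expects once converted to binary. The sketch below regenerates both kinds of key and checks their byte lengths. The `GeneratedKeys` CTE and the column aliases are illustrative names.

{% code overflow="wrap" %}

```sql
-- Sketch: confirm that UNHEX() turns the hex strings into 16-byte and 32-byte keys.
WITH GeneratedKeys AS (
    SELECT
        UPPER(SUBSTRING(SHA2(CAST(RAND() AS STRING), 256), 1, 32)) AS key_16_hex,
        UPPER(SHA2(CAST(RAND() AS STRING), 256)) AS key_32_hex
)
SELECT
    key_16_hex,
    LENGTH(UNHEX(key_16_hex)) AS key_16_bytes,  -- 32 hex characters become 16 bytes
    key_32_hex,
    LENGTH(UNHEX(key_32_hex)) AS key_32_bytes   -- 64 hex characters become 32 bytes
FROM
    GeneratedKeys;
```

{% endcode %}

Two hexadecimal characters encode one byte, which is why a 32-character string yields an AES-128 key and a 64-character string yields an AES-256 key.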

## AES-256 Encryption & Decryption with GCM (Default Mode, No Padding)

Let us demonstrate how encryption and decryption work. Note that the `HEX` and `CAST` functions are used here only to display the results, because binary values cannot be displayed in the **Data Distiller Query Pro Mode Editor**. You should remove them when you use these two functions in practice:

{% code overflow="wrap" %}

```sql
WITH EncryptedData AS (
    -- Step 1: Encrypt the email and convert the encrypted binary data into a readable hex string
    SELECT
        customer_id,
        HEX(AES_ENCRYPT(email, UNHEX('6BB8E32DB365D1953C95377C547330B52FAF9C35C9350A2BA1FC5CB4651D28E9'))) AS encrypted_email_hex
    FROM
        healthcare_customers
)
-- Step 2: Decrypt the encrypted email and cast it back to STRING
SELECT
    customer_id,
    encrypted_email_hex,  -- Display encrypted email as hex string
    CAST(AES_DECRYPT(UNHEX(encrypted_email_hex), UNHEX('6BB8E32DB365D1953C95377C547330B52FAF9C35C9350A2BA1FC5CB4651D28E9')) AS STRING) AS decrypted_email
FROM
    EncryptedData;

```

{% endcode %}

The result should be:

<figure><img src="https://1899859430-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEhcgqFIfGdE0GXJzi5yR%2Fuploads%2FoG7lDAaa7A2B69XmRoqP%2FScreen%20Shot%202024-10-06%20at%2012.52.33%20PM.png?alt=media&#x26;token=8fc9afbd-66d3-4fc6-835b-2446b0d1fd24" alt=""><figcaption><p>Demonstration of AES encryption and decryption in Data Distiller</p></figcaption></figure>

## AES-256 Encryption & Decryption with ECB Mode and PKCS Padding

{% code overflow="wrap" %}

```sql
WITH EncryptedData AS (
    -- Step 1: Encrypt email using AES-256 with ECB mode and PKCS padding
    SELECT
        customer_id,
        HEX(AES_ENCRYPT(email, UNHEX('6BB8E32DB365D1953C95377C547330B52FAF9C35C9350A2BA1FC5CB4651D28E9'), 'ECB', 'PKCS')) AS encrypted_email_hex
    FROM
        healthcare_customers
)
-- Step 2: Decrypt the encrypted email using the same key, mode, and padding
SELECT
    customer_id,
    encrypted_email_hex,
    CAST(AES_DECRYPT(UNHEX(encrypted_email_hex), UNHEX('6BB8E32DB365D1953C95377C547330B52FAF9C35C9350A2BA1FC5CB4651D28E9'), 'ECB', 'PKCS') AS STRING) AS decrypted_email
FROM
    EncryptedData;
```

{% endcode %}

<figure><img src="https://1899859430-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEhcgqFIfGdE0GXJzi5yR%2Fuploads%2F21d1ZxI6KLEqVUDKLC8m%2FScreen%20Shot%202024-10-06%20at%201.51.13%20PM.png?alt=media&#x26;token=5f39fded-9a52-41f1-8678-8a9f07b9704a" alt=""><figcaption><p>Demonstration of AES encryption and decryption in Data Distiller</p></figcaption></figure>
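
As a quick sanity check across the whole dataset rather than by eye, a sketch like the following counts how many rows survive an encrypt-then-decrypt round trip unchanged. The aliases `total_rows` and `round_trip_matches` are illustrative.

{% code overflow="wrap" %}

```sql
-- Sketch: verify that every email decrypts back to its original value.
SELECT
    COUNT(*) AS total_rows,
    SUM(
        CASE
            WHEN CAST(AES_DECRYPT(
                     AES_ENCRYPT(email, UNHEX('6BB8E32DB365D1953C95377C547330B52FAF9C35C9350A2BA1FC5CB4651D28E9'), 'ECB', 'PKCS'),
                     UNHEX('6BB8E32DB365D1953C95377C547330B52FAF9C35C9350A2BA1FC5CB4651D28E9'), 'ECB', 'PKCS') AS STRING) = email
            THEN 1
            ELSE 0
        END
    ) AS round_trip_matches
FROM
    healthcare_customers;
```

{% endcode %}

If the two counts differ, the gap most likely comes from rows with a null `email`, since a null comparison falls into the `ELSE` branch.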

## The Genius of Galois: His Math Powers Modern Encryption

<figure><img src="https://1899859430-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEhcgqFIfGdE0GXJzi5yR%2Fuploads%2FiSjdYuonN0cklNgOU7vj%2Fimage.png?alt=media&#x26;token=d82fdb40-c118-4c22-9c34-a2cd3541558b" alt="" width="267"><figcaption><p>Galois</p></figcaption></figure>

**GCM (Galois/Counter Mode)** is a mode of operation for encryption that ties back to the innovative work of mathematician **Évariste Galois**, whose contributions to abstract algebra, specifically **Galois fields**, play a pivotal role in how GCM operates.

What makes GCM special—and really **cool**—is that it combines both encryption and **authentication** in a highly efficient way, ensuring not only that data is protected, but also that it hasn’t been tampered with during transmission. This dual capability is crucial for modern data security.

At the heart of GCM's strength is its use of **Galois fields**, a concept developed by Galois in the 19th century, *which involves operations on finite sets of numbers*. In GCM, these fields enable fast and secure mathematical operations that verify data integrity while keeping the encryption itself highly efficient.

What’s particularly cool about this is that Galois, who tragically died young, couldn’t have foreseen how his abstract work in algebra would one day become foundational in securing digital communications in the 21st century. By leveraging the power of Galois fields, **GCM mode** manages to be both faster and more secure than many other encryption modes, making it a go-to solution for protecting sensitive data, especially in high-performance environments like cloud computing and secure messaging.

So, when using **AES with GCM mode**, you’re benefiting from the mathematical genius of Galois—applying 19th-century mathematics to cutting-edge digital encryption!
