Why You Must Protect Sensitive Data in Cloud and Data Platform Environments

Learn how to protect sensitive data in cloud environments (AWS, Azure, GCP) and managed platforms, including why traditional encryption and BYOK fall short, how BYOE works, and how to achieve true data security through separation of control.

Executive Summary

Enterprises have moved critical systems to cloud infrastructure (AWS, Azure, GCP) and consolidated sensitive data into centralized platforms such as Snowflake, Databricks, BigQuery, and Redshift. These environments now power applications, analytics, and AI. They also introduce a structural security problem that most organizations underestimate.

Sensitive data is no longer just stored in the cloud. It is continuously processed inside execution environments that organizations do not fully control.

Encryption is widely assumed to solve this. It does not.

In most architectures, data is encrypted at rest, keys are managed through cloud KMS or BYOK models, and the platform invokes those keys during execution. The moment data is queried, joined, or processed, it is transformed into usable form inside the same environment. The system behaves exactly as designed, and in doing so, becomes capable of revealing the data it is supposed to protect.

This is not a failure of cryptography. It is a failure of architecture.

The issue is simple. If the same environment can store data, access keys, and execute decryption, then that environment is a point of compromise.

This problem is most acute in cloud and managed environments where organizations still control data architecture but not execution, including CSP-based applications, integrator-operated systems such as Infosys-managed platforms, and modern data platforms. It is not primarily a SaaS problem. It is an execution control problem.

Regulators are beginning to reflect this shift. The distinction is no longer just key ownership (BYOK). It is whether the environment processing the data can independently reveal it. That distinction separates compliance posture from actual security.

This paper explains why this condition exists, how it manifests in real systems, and what it takes to remove it.

The Real Scope of the Problem

This paper is intentionally focused on environments where the organization still has meaningful control over architecture and data handling. That includes applications deployed on AWS, Azure, or GCP, managed or integrator-operated platforms such as Infosys-managed systems, and modern analytics or AI data platforms such as Snowflake and Databricks.

These are the environments that matter because they sit in an uncomfortable middle ground. They are not pure infrastructure in the old sense, because the organization does not directly own or operate every layer. But they are not fully closed SaaS either, because the organization still makes meaningful design decisions about how data moves, how it is queried, how users interact with it, and where controls are applied.

That middle ground creates false confidence. Teams think, correctly, that they control schemas, queries, pipelines, and access policies. They then infer, incorrectly, that this means they control the exposure of sensitive data during execution. They do not.

In a cloud-native application or data platform, logical control and execution control are different things. The customer defines intent. The platform executes that intent. During execution, the platform must gain access to usable data. The system, not the customer, becomes the point at which sensitive data is transformed from protected form into plaintext.

That is why this problem is so dangerous. It hides inside normal system behavior.

Why Traditional Encryption Does Not Remove the Risk

Most organizations start from a familiar mental model. Encrypt the data, control the keys, enforce role-based access, and the problem is solved. That model works well for protecting data at rest or in transit. It does not fully work when the same environment that stores the data is also responsible for using it.

The reason is straightforward. Computation requires data to be usable. Queries cannot run on ciphertext unless the system is specifically designed for that mode of operation. Dashboards, joins, machine learning pipelines, ETL workflows, customer-facing applications, and reporting engines all expect the platform to access usable values at some point during processing.

This means that the execution environment needs some combination of the following:

  1. access to the protected data,
  2. access to a key or key usage path,
  3. the ability to decrypt or transform the data into usable form.

Once those three conditions exist inside the same environment, encryption stops being a control against that environment. It remains a control against outsiders or unauthorized storage access, but it no longer protects data from the system that is legitimately designed to use it.
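The convergence of the three conditions can be made concrete with a minimal sketch. Everything here is illustrative: the class names are hypothetical, and the HMAC-based XOR "cipher" is a toy stand-in for real encryption, used only to show control flow. The point is that once one environment holds stored ciphertext, a key usage path, and a decrypt-on-query code path, any actor who can execute through it obtains plaintext without attacking the cryptography at all.

```python
import hashlib
import hmac


def keystream(key: bytes, n: int) -> bytes:
    # Toy keystream for illustration only -- NOT production cryptography.
    out, counter = b"", 0
    while len(out) < n:
        out += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:n]


def xor(data: bytes, key: bytes) -> bytes:
    # XOR against the keystream; applying it twice round-trips the data.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


class ExecutionEnvironment:
    """One environment holding all three capabilities at once."""

    def __init__(self, key: bytes):
        self._key = key        # condition 2: key (or key usage path)
        self._storage = {}     # condition 1: access to the protected data

    def store(self, record_id: str, plaintext: bytes) -> None:
        self._storage[record_id] = xor(plaintext, self._key)

    def query(self, record_id: str) -> bytes:
        # condition 3: the normal query path decrypts, exactly as designed
        return xor(self._storage[record_id], self._key)


env = ExecutionEnvironment(key=b"platform-held-key")
env.store("acct-1", b"PAN=4111111111111111")

# Any actor able to run code through the environment -- a privileged user,
# a compromised notebook, a support tool -- gets plaintext via the front door:
print(env.query("acct-1"))  # b'PAN=4111111111111111'
```

Note that the data at rest is genuinely encrypted; the exposure comes entirely from the co-location of storage, key access, and decryption in one trust zone.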

That is why so many cloud security conversations get stuck in the wrong place. They focus on whether data is encrypted, when the more important question is whether the execution environment can independently reveal it.

BYOK, BYOE, and the Difference Between Ownership and Separation

This distinction is why regulatory language around cloud security has become more nuanced. A useful way to frame the issue is the difference between key ownership and control separation.

BYOK improves ownership. The enterprise originates or controls the keys, which is helpful for governance, audit posture, lifecycle management, and in some cases revocation. But in most implementations, BYOK does not move decryption outside the cloud or data platform execution path. The platform can still use those keys, directly or indirectly, to process data. In practical terms, the execution environment still has access to both the data and the decryption mechanism.

BYOE points in a stronger direction. Encrypting data before it enters the cloud or platform begins to create actual separation. The cloud provider or platform no longer receives plaintext by default. But even BYOE can stop short if decryption is later reintroduced inside the same environment through application workflows, API mediation, or loosely governed service paths.

The lesson is simple. Owning the keys is not the same thing as preventing the platform from using them. And moving encryption earlier in the data flow is not the same thing as guaranteeing that plaintext never becomes available inside the environment.

The only model that fully changes the risk equation is one in which the environment processing the data cannot independently decrypt it.
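What that separation looks like structurally can be sketched as follows. The names (`KeyAuthority`, `Platform`) and the toy HMAC-XOR cipher are hypothetical illustrations, not any vendor's API: the essential property is that the reveal decision lives in a customer-operated service outside the platform, so the platform holds ciphertext but has no independent path to plaintext.

```python
import hashlib
import hmac


def keystream(key: bytes, n: int) -> bytes:
    # Toy keystream for illustration only -- NOT production cryptography.
    out, counter = b"", 0
    while len(out) < n:
        out += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:n]


def xor(data: bytes, key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


class KeyAuthority:
    """Customer-operated reveal service. It runs OUTSIDE the platform;
    the platform never holds the key, only ciphertext."""

    def __init__(self, key: bytes, policy: dict):
        self._key = key
        self._policy = policy  # identity -> set of permitted purposes

    def encrypt(self, plaintext: bytes) -> bytes:
        return xor(plaintext, self._key)

    def reveal(self, ciphertext: bytes, identity: str, purpose: str) -> bytes:
        if purpose not in self._policy.get(identity, set()):
            raise PermissionError(f"{identity} may not reveal for {purpose!r}")
        return xor(ciphertext, self._key)


class Platform:
    """The cloud or data platform: stores and moves ciphertext,
    but cannot independently decrypt it."""

    def __init__(self):
        self.storage = {}


authority = KeyAuthority(b"customer-held-key",
                         policy={"payments-svc": {"settlement"}})
platform = Platform()
platform.storage["acct-1"] = authority.encrypt(b"PAN=4111111111111111")

# A reveal now requires an explicit, externally checked identity and purpose;
# nothing inside the platform can produce plaintext on its own:
pt = authority.reveal(platform.storage["acct-1"], "payments-svc", "settlement")
```

Under this structure, a compromised workload inside the platform no longer inherits decryption power; it can only make reveal requests that the external authority evaluates and logs.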

Threat Modeling the System Properly

A useful threat model does not begin with a laundry list of generic attacker ideas. It begins with an explicit model of the system, its trust zones, and the boundary crossings that matter. Once the structure of the system is clear, the threat paths become much easier to reason about.

System Model

The system in scope is a modern cloud and data platform environment in which sensitive structured data is stored and used for operational or analytical purposes. The relevant components are the storage layer, the execution layer, the key control layer, and the user and service interfaces that trigger workloads.

The storage layer contains the protected records. The execution layer includes query engines, compute clusters, application runtimes, and processing jobs. The key control layer includes KMS services, external key providers, encryption policy services, or other mechanisms used to authorize transformation of data into usable form. Above all of that sit the interfaces through which users, applications, services, and administrators interact with the platform.

This relationship can be represented at a high level as follows:

| Layer | Function | Typical Examples | Security Relevance |
| --- | --- | --- | --- |
| Data Storage Layer | Stores protected records | S3, cloud databases, Snowflake storage, Delta Lake | Holds high-value data at scale |
| Execution Layer | Runs queries, jobs, joins, transforms, model pipelines | Databricks compute, Snowflake query engine, application runtime | Where plaintext is exposed during use |
| Key Control Layer | Authorizes or enables transformation into usable form | KMS, BYOK integration, external key service, policy engine | Determines whether decryption can occur |
| Access and Control Layer | Triggers workloads, administers systems, configures policies | SQL clients, APIs, notebooks, IAM roles, support tools | Defines who can act through the system |

The most important observation is that these layers are logically separate but operationally linked. During execution, they converge.

Assets

The threat model is about protecting specific things of value, not abstract “data.”

| Asset | Description | Why It Matters |
| --- | --- | --- |
| Sensitive Fields | PII, PAN, account numbers, balances, regulated financial records | Direct regulatory and business impact if exposed |
| Derived Sensitive Results | Query outputs, aggregates, joined datasets, model features | Often more revealing than source data alone |
| Key Usage Authority | Ability to invoke decryption or transformation operations | Equivalent to decryption power in many systems |
| Access Policies | Rules that determine which identities may access usable data | Weak policy turns control separation into fiction |
| Processing Context | Runtime memory, temporary files, intermediate tables, caches, logs | Common leakage points during legitimate execution |

These assets matter because attackers do not always need the original table. In many cases, intermediate outputs, cached results, enriched datasets, or repeated query access are just as damaging.

Trust Zones

The system must be divided into trust zones to show where responsibility and control actually differ.

| Trust Zone | Controlled By | What Lives There | Why the Zone Matters |
| --- | --- | --- | --- |
| Customer Logic Zone | Enterprise | Schemas, query intent, application code, business rules | Defines what should happen |
| Platform Execution Zone | CSP, platform provider, shared runtime | Query engine, compute nodes, application execution, temporary state | Defines what actually happens during processing |
| Key Authority Zone | Customer, cloud provider, or external control service | Key material, key invocation path, policy service | Determines whether data can become usable |
| Administrative / Operational Zone | Internal ops, vendor ops, support teams, automation | Debug access, notebooks, consoles, support tooling, orchestration systems | Common source of privileged exposure paths |

The trust model begins to break down when the Platform Execution Zone and the Key Authority Zone are effectively fused during runtime. That fusion can happen even if the customer “owns” the keys on paper.

Trust Boundaries

A threat model becomes useful when it shows the exact boundaries where risk changes.

| Boundary ID | Boundary Crossing | What Changes at This Point | Why It Is Critical |
| --- | --- | --- | --- |
| B1 | Protected data enters execution workflow | Stored data becomes available to processing engine | Processing context is created |
| B2 | Execution environment invokes key usage | System gains ability to transform data into usable form | Decryption authority is activated |
| B3 | Plaintext or usable values exist in runtime | Sensitive values become observable in memory, temp state, output buffers, logs, caches | Exposure becomes possible without “breaking” encryption |
| B4 | Results are returned or materialized | Sensitive or derived sensitive data exits core processing path | Amplifies blast radius through output channels |
| B5 | Administrative tools interact with runtime or output | Operators can inspect state, replay jobs, or observe results | Privileged access path becomes high impact |

These boundaries matter more than static architecture diagrams because they show where control is lost. Data is not compromised merely because it is stored in the cloud. It becomes vulnerable when the system crosses from protected storage into usable execution.

Threat Actors and Their Real Capabilities

A useful model also needs realistic attacker types. Not every actor needs to steal keys or compromise the entire stack. Many only need to exploit the fact that the system is already designed to reveal plaintext during normal operation.

| Threat Actor | Realistic Capabilities | What They Usually Need | Why They Are Dangerous |
| --- | --- | --- | --- |
| Privileged Internal User | Can run broad queries, access notebooks, inspect outputs | Legitimate platform access | Can extract large amounts of sensitive data without breaking controls |
| Platform Operator or Integrator | Can troubleshoot workloads, inspect execution state, access support tooling | Operational privileges | May observe or access data during processing |
| Cloud Administrator Equivalent | Can influence infrastructure, storage, snapshots, runtime state, or service behavior | Infra-level privilege or compromise | Can potentially access system behavior below normal app controls |
| Compromised Workload or Service Principal | Can execute code inside trusted environment | Application compromise, notebook compromise, stolen token | Can use the system as designed to obtain plaintext |
| External Attacker with Partial Access | Can pivot from app/API, abuse permissions, exfiltrate outputs | Access to workload, credentials, or misconfigured role | Often needs far less than full system takeover |

This is why “but our admins are trusted” is not a complete answer. The issue is not just intent. It is concentration of capability.

Threat Scenario Matrix

The following matrix is the core of the threat model. It ties together the system, the actors, the trust boundaries, and the actual failure modes.

| Scenario ID | Threat Scenario | Primary Actor | Boundary Crossed | What the Actor Exploits | Result | Why Encryption Alone Fails |
| --- | --- | --- | --- | --- | --- | --- |
| T1 | Legitimate query returns sensitive plaintext or sensitive derived results | Privileged internal user | B1, B2, B3, B4 | Normal query path and authorized execution | Exposure of raw or derived sensitive data | The platform decrypts during normal processing |
| T2 | Operator or integrator inspects runtime, temp state, or job outputs during support activity | Platform operator / integrator | B2, B3, B5 | Operational tooling and privileged access | Plaintext exposure during debugging or support | Data is already usable inside execution layer |
| T3 | Compromised notebook, job, or service principal runs inside trusted platform context | Compromised workload | B1, B2, B3, B4 | Existing execution permissions and platform trust | Bulk data extraction or silent exfiltration | Encryption does not protect against trusted runtime |
| T4 | Mis-scoped role or policy allows broad access to decrypted outputs | Internal user or compromised identity | B2, B4 | Over-permissioned access policy | Large-scale exposure through legitimate interfaces | Key ownership is irrelevant if policy allows broad use |
| T5 | Platform or cloud-level access observes memory, snapshots, caches, or intermediate state | Cloud or platform privileged actor | B3, B5 | Infra or operational visibility below app layer | Exposure of plaintext during processing | Plaintext must exist somewhere to enable computation |
| T6 | Sensitive output is materialized into downstream tables, features, dashboards, or logs | Internal user, service, or downstream system | B4 | Normal output channels and secondary storage | Persistent spread of sensitive information | Encryption at source does not constrain derived outputs |
| T7 | Attacker compromises application path that can request and return sensitive data | External attacker with partial foothold | B1, B2, B4 | Existing app logic and decryption workflow | Data theft without key theft | The app acts as the decryption proxy |

The pattern is consistent. The attacker does not usually need to “steal the keys” in the traditional sense. The attacker only needs access to the system or workflow that can already use them.

Threat Analysis and Architectural Findings

The value of the matrix is not the number of scenarios. It is the pattern it reveals.

First, the execution environment is the decisive control point. That is where protected data is transformed into usable data. If that environment can independently perform that transformation, then it becomes the point of compromise for nearly every realistic scenario.

Second, the system’s normal features are the attacker’s best tools. Queries, notebooks, dashboards, processing jobs, support consoles, and runtime diagnostics all exist for legitimate business reasons. That is precisely why they are so hard to defend once the platform has decryption authority.

Third, output paths are often more dangerous than source tables. Sensitive information rarely remains in one place. Once decrypted data is queried, joined, aggregated, or materialized, it proliferates into downstream results, temporary tables, feature stores, logs, dashboards, extracts, and caches. This means the blast radius is often much larger than teams expect.

Fourth, policy alone is not enough. Identity and access management are necessary, but they do not fix the core problem if the system itself can reveal plaintext broadly whenever policy allows it. A single over-broad entitlement, a compromised token, or an abused runtime becomes sufficient to expose data at scale.

The threat model therefore points to one architectural conclusion: as long as the same environment both accesses the data and can independently decrypt or transform it into usable form, that environment remains a systemic point of compromise.

What Security Properties Are Actually Required

The answer is not “more encryption” in the abstract. The answer is a system design that changes the threat model itself.

A secure architecture for cloud and data platforms needs to satisfy several specific properties.

First, data should enter shared environments in protected form. This reduces dependence on platform-native storage controls as the primary line of defense.

Second, the authority to reveal usable values should not be embedded in the same environment that stores and processes the data. That authority must be separated, tightly scoped, and externally governed.

Third, access to usable values should depend on explicit identity and policy decisions, not merely on the fact that a workload is running inside a trusted platform context.

Fourth, the architecture should minimize downstream spread by limiting when, where, and for whom sensitive values become available.

These principles can be expressed more concretely as follows:

| Required Security Property | Why It Matters | What It Prevents |
| --- | --- | --- |
| Separation of data from decryption authority | Removes unilateral exposure capability from platform | T1, T2, T3, T5, T7 |
| Identity- and policy-bound reveal control | Prevents generic platform context from granting plaintext access | T1, T3, T4, T7 |
| Scoped reveal with minimal blast radius | Limits downstream spread and output misuse | T4, T6 |
| Externalized control over usable value access | Ensures the platform cannot act as universal decryption engine | T2, T3, T5 |
| Reduced processing-time plaintext exposure | Shrinks runtime observation opportunities | T2, T5 |

This is where the conversation moves from compliance language to real architecture. The question is not merely whether the key is customer-owned. The question is whether the platform can still use that key path in ways that make plaintext broadly available.
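The "scoped reveal" property in particular can be sketched in a few lines. This is a hypothetical service, not a real product API; the vault dictionary stands in for decrypt-on-demand storage. The idea it demonstrates is that the purpose of a request, not just the requester's identity, determines how much of a value becomes usable, which directly shrinks the blast radius of any single entitlement.

```python
def mask_pan(pan: str) -> str:
    """Show only the last four digits of a card number."""
    return "*" * (len(pan) - 4) + pan[-4:]


class ScopedRevealService:
    """Hypothetical customer-side reveal service: each permitted purpose
    maps to a transformation that limits what is actually disclosed."""

    SCOPES = {
        "support": mask_pan,        # support staff see last four only
        "settlement": lambda v: v,  # payment processing sees the full value
    }

    def __init__(self, vault: dict, policy: dict):
        self._vault = vault      # stand-in for decrypt-on-demand storage
        self._policy = policy    # identity -> set of permitted purposes

    def reveal(self, identity: str, purpose: str, record_id: str) -> str:
        if purpose not in self._policy.get(identity, set()):
            raise PermissionError(f"{identity} may not reveal for {purpose!r}")
        return self.SCOPES[purpose](self._vault[record_id])


svc = ScopedRevealService(
    vault={"acct-1": "4111111111111111"},
    policy={"support-desk": {"support"}, "payments-svc": {"settlement"}},
)

print(svc.reveal("support-desk", "support", "acct-1"))  # ************1111
```

A mis-scoped support identity in this model exposes masked values, not the full field, which is exactly the containment of scenarios T4 and T6 described above.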

Why This Matters Most in Snowflake, Databricks, and Similar Platforms

The reason this issue is especially important in data platforms is that they centralize both value and access. These systems do not hold isolated application rows. They hold the institution’s aggregated analytical truth. Customer profiles, account history, transactions, model features, service interactions, and operational telemetry all come together in one place.

That concentration is what makes these platforms powerful. It is also what makes their execution environments so dangerous if they are allowed to function as universal decryption engines.

A compromise in a cloud-hosted application may expose one workflow or one table. A compromise in a central data platform can expose years of historical records, sensitive joins across domains, high-value derived insights, and AI or analytics artifacts that are even more revealing than the original data.

That is why treating Snowflake or Databricks as “just another encrypted system” is a mistake. They are not just systems of storage. They are systems of reveal.

Regulatory Perspective: MAS Cloud Advisory and the Limits of Traditional Encryption Models

Regulators are already moving in the direction this paper outlines. The Monetary Authority of Singapore (MAS), in its Cloud Advisory on Migration, Risk Management, and Data Security, makes it clear that financial institutions cannot assume that native cloud controls are sufficient simply because encryption is in place.

The guidance repeatedly emphasizes that sensitive data in cloud environments must be protected through a combination of controls, including encryption, tokenization, and strong key management. More importantly, it implicitly recognizes that how encryption is implemented matters just as much as whether it exists.

This becomes clear in MAS’s treatment of different key management and encryption models.

BYOK: Improved Governance, Not Separation

MAS identifies Bring Your Own Key (BYOK) as a model where the financial institution retains ownership and lifecycle control of cryptographic keys, while allowing those keys to be used within the cloud environment.

This improves governance in meaningful ways. It gives institutions control over:

  • key generation and rotation
  • revocation and lifecycle management
  • audit and compliance posture

However, BYOK does not change where decryption occurs.

In a typical BYOK implementation:

  • data remains stored in the cloud platform
  • the platform can request or invoke key usage
  • decryption occurs inside the execution environment

From an architectural standpoint, this means the same condition still exists:

the environment that processes the data can also decrypt it

Ownership of keys has changed. Control separation has not.

BYOE: A Step Toward Separation

MAS introduces Bring Your Own Encryption (BYOE) as a stronger model. In this approach, data is encrypted before it enters the cloud, and the encryption mechanism is not delegated to the cloud provider.

This begins to change the structure of the system.

Instead of relying on the cloud to protect data, the organization ensures that:

  • the cloud receives data in protected form
  • encryption is applied outside the execution environment
  • key material is not inherently exposed to the platform
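At the field level, this pattern can be sketched as follows. The field names, the key, and the HMAC-XOR "cipher" are all illustrative toys, not production cryptography; the structural point is that sensitive fields are encrypted before a record ever leaves the enterprise, while non-sensitive fields remain usable for partitioning and filtering inside the platform.

```python
import hashlib
import hmac


def keystream(key: bytes, n: int) -> bytes:
    # Toy keystream for illustration only -- NOT production cryptography.
    out, counter = b"", 0
    while len(out) < n:
        out += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:n]


def xor(data: bytes, key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


SENSITIVE = {"pan", "national_id"}


def protect_record(record: dict, key: bytes) -> dict:
    """Encrypt sensitive fields client-side, before upload to the cloud.
    Non-sensitive fields stay in the clear for partitioning and filtering."""
    return {
        field: xor(value.encode(), key).hex() if field in SENSITIVE else value
        for field, value in record.items()
    }


row = {"customer_id": "c-17", "region": "SG", "pan": "4111111111111111"}
protected = protect_record(row, key=b"enterprise-held-key")

# What the warehouse receives: usable operational fields, opaque PAN.
print(protected["region"])             # SG
print(protected["pan"] == row["pan"])  # False
```

The gap MAS flags still applies here: if application workflows later call back into the same environment to decrypt these fields, the separation achieved at ingest is quietly undone.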

At a high level, the progression MAS is pointing to can be summarized as follows:

| Model | What Changes | What Remains |
| --- | --- | --- |
| Native Cloud Encryption | Basic protection at rest | Cloud controls keys and decryption |
| BYOK | Customer owns keys | Cloud still performs decryption |
| BYOE | Encryption moves outside cloud | Decryption may still occur inside execution |

Where These Models Still Fall Short

Even with BYOE, most real-world implementations do not fully eliminate risk.

In practice:

  • applications and data platforms still need to process usable data
  • decryption is often reintroduced inside the runtime
  • key usage may still be accessible through system-integrated APIs

This means the execution environment can still function as a de facto decryption engine, even if encryption was applied earlier in the data flow.

The structural condition remains unchanged:

the system that processes the data can still reveal it

What MAS Is Actually Driving Toward

MAS does not explicitly prescribe a single architecture, but the direction is clear.

The evolution is not just:

  • from provider-managed keys to customer-managed keys
  • or from in-cloud encryption to pre-cloud encryption

It is toward true separation of control.

The real question MAS is pushing institutions to answer is:

Can any single environment, including the cloud or data platform, independently access and decrypt sensitive data?

If the answer is yes, then that environment remains a point of compromise, regardless of how strong the encryption appears on paper.

Why This Matters in Practice

This distinction becomes critical in environments such as AWS-based applications, Snowflake, and Databricks, where:

  • data is continuously processed, not just stored
  • decryption is part of normal execution
  • multiple actors can act through the same system

In these environments, encryption without separation does not prevent exposure. It simply defines how the system reveals data.

Summary

MAS guidance reinforces a key principle: Encryption is necessary, but it is not sufficient.

BYOK improves ownership.
BYOE improves placement.

But neither guarantees that the execution environment cannot act as a universal decryption point.

The only model that fully addresses the risk is one in which:

  • data can be processed by the platform
  • but the platform cannot independently reveal it

This is the difference between managing encryption and actually controlling data exposure.
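One established way to get "processable but not revealable" data is deterministic tokenization, sketched below with stdlib HMAC. The key name and data are illustrative. Because equal inputs produce equal tokens, the platform can still join, group, and aggregate, yet inverting a token requires a key held outside the platform.

```python
import hashlib
import hmac
from collections import defaultdict


def tokenize(value: str, key: bytes) -> str:
    # Deterministic keyed token: equal inputs yield equal tokens, so joins
    # and group-bys still work -- but inversion requires the external key.
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]


KEY = b"held-outside-the-platform"  # illustrative; never stored in the platform

payments = [
    {"cust": tokenize("alice", KEY), "amt": 40},
    {"cust": tokenize("bob", KEY), "amt": 10},
    {"cust": tokenize("alice", KEY), "amt": 5},
]

# The platform aggregates per customer without ever seeing a customer name.
totals = defaultdict(int)
for p in payments:
    totals[p["cust"]] += p["amt"]

print(totals[tokenize("alice", KEY)])  # 45
```

Deterministic tokens do leak equality patterns, so in practice they are reserved for fields where join and group semantics justify that trade-off; the broader principle stands, though: the analytics ran, and the platform never held a reveal capability.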

Conclusion

Cloud and modern data platforms have changed the risk model for sensitive data. The main problem is no longer whether stored data is encrypted. The main problem is whether the environment processing that data can independently reveal it.

Threat modeling makes the issue unambiguous. Across realistic actors and realistic scenarios, the same pattern appears again and again. Data enters an execution environment. That environment gains or invokes decryption authority. Plaintext or usable values become available during runtime. An actor who can act through that environment can access the data without breaking cryptography.

That is the structural weakness.

The path forward is not to abandon cloud or analytics platforms. It is to adopt an architecture in which those environments are no longer trusted to function as universal reveal points for sensitive data. Once that condition is removed, the threat model changes fundamentally. Until it is removed, encryption alone will remain an incomplete answer.

The difference between strong cloud security and weak cloud security is not the presence of encryption. It is whether the environment processing the data can expose it on its own.


© 2026 Ubiq Security, Inc. All rights reserved.