Threat Modeling Azure Native Controls for Structured Data in Analytics Workloads
A detailed threat model of Azure native security controls in analytics environments, examining encryption at rest, RBAC, and Always Encrypted, and identifying residual data exposure risks within authorized access.
Executive Summary
Azure native controls provide strong infrastructure protection: they encrypt storage, manage keys, enforce resource-level access, and classify sensitive data.
However, analytics environments introduce a different question:
Once an identity is authorized to query a dataset, what prevents that identity from seeing sensitive fields by default?
This paper threat-models a common Azure analytics architecture and evaluates where native controls apply, and where residual exposure risk remains.
This distinction matters because analytics overexposure typically occurs without misconfiguration and without bypassing native controls.
1. Scope
Modern cloud platforms provide a wide range of security controls. To avoid ambiguity, this threat model explicitly defines what is being evaluated and what is not.
The focus here is structured sensitive data used in analytics workflows, where data is copied, transformed, queried, and shared across large internal audiences. The goal is to analyze exposure risk within authorized analytics sessions, not to evaluate general cloud security posture.
In Scope
Structured sensitive data in:
- Azure Data Lake Storage (ADLS Gen2)
- Azure SQL / Synapse
- Databricks / Spark
- Power BI / Fabric
Access to that data by:
- Internal analysts
- External auditors
Out of Scope
- Network segmentation
- Endpoint security
- Detection and monitoring controls
This is a data exposure model, not a perimeter defense model.
2. System Overview
Threat modeling begins with a clear understanding of the system being analyzed.
The architecture below represents a common Azure analytics pattern in regulated enterprises. Data originates in operational systems, is replicated into storage and processing layers, and is ultimately consumed by analysts, auditors, and reporting tools.
Each layer introduces different trust assumptions and different exposure risks.
A typical regulated enterprise architecture includes:
| Layer | Description | Primary Risk |
|---|---|---|
| Source Systems | Claims, billing, CRM, financial, clinical systems | Raw sensitive data |
| Storage | ADLS Gen2, Blob, SQL | Concentrated data volume |
| Processing | Synapse, Databricks, SQL | Data reshaping & joins |
| Consumption | Power BI, notebooks, reporting tools | Broad user visibility |
| Identity | Entra ID, RBAC, SQL roles | Authorization boundary |
As data moves from operational systems into analytics environments, the exposure surface expands significantly.
3. Assets
Before evaluating controls, it is important to define what is being protected.
In analytics environments, risk does not stem solely from raw data storage. It also includes derived datasets, aggregated outputs, and the organization’s ability to demonstrate effective access control to regulators and auditors.
The assets listed below reflect both technical exposure and business impact.
| Asset ID | Asset | Why It Matters |
|---|---|---|
| A1 | Sensitive structured fields | Direct regulatory exposure |
| A2 | Derived sensitive data | Re-identification risk |
| A3 | Audit defensibility | Regulatory credibility |
| A4 | Blast radius control | Limits breach severity |
Examples of sensitive structured fields include:
- Member IDs
- Claim IDs
- SSNs
- Financial account numbers
- Diagnosis and procedure codes
- Addresses, emails, phone numbers
4. Security Goals
Security goals define what “success” looks like.
Rather than focusing narrowly on encryption or access control features, these goals describe the desired outcomes for structured data in analytics environments: minimizing unnecessary visibility, reducing breach impact, and preserving legitimate analytical functionality.
Controls should be evaluated against these goals, not in isolation.
| Goal ID | Goal | Description |
|---|---|---|
| G1 | Field-level minimization | Prevent unnecessary exposure of sensitive columns |
| G2 | Identity-aware enforcement | Tie field visibility to user and context |
| G3 | Blast radius reduction | Limit impact of credential compromise |
| G4 | Preserve analytics utility | Avoid breaking legitimate workflows |
| G5 | Audit evidence | Demonstrate enforceable least privilege |
5. Native Azure Controls in Scope
Azure provides multiple native security controls that are frequently cited as sufficient for protecting sensitive data.
This section evaluates those controls specifically through the lens of analytics exposure. The question is not whether these controls are valuable, but whether they enforce field-level minimization and identity-aware visibility inside authorized query sessions.
5.1 Encryption at Rest
| Control | What It Protects | What It Does Not Protect |
|---|---|---|
| Storage Service Encryption | Disk and storage media | Query-time field visibility |
| Transparent Data Encryption | Database files and backups | Authorized session exposure |
Encryption at rest protects against offline media theft and storage compromise.
It does not restrict what an authenticated user can query.
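The point can be illustrated with a toy transparent-encryption sketch. XOR stands in for AES, key handling is omitted, and all names are illustrative; the shape of the problem is what matters: ciphertext at rest, automatic decryption on any authenticated read.

```python
import hashlib
from itertools import cycle

# Toy "transparent" storage encryption (hypothetical key and record).
KEY = hashlib.sha256(b"storage-key").digest()

def xor_layer(data: bytes) -> bytes:
    """Symmetric XOR stands in for AES in storage service encryption."""
    return bytes(a ^ b for a, b in zip(data, cycle(KEY)))

record = b'{"ssn": "123-45-6789"}'
disk = xor_layer(record)   # at rest: only ciphertext on the media
assert disk != record      # offline media theft yields no plaintext

def read(blob: bytes, authenticated: bool) -> bytes:
    """Any authenticated read path decrypts transparently."""
    if not authenticated:
        raise PermissionError("unauthenticated access blocked")
    return xor_layer(blob)

# An authorized query session receives full plaintext, every field included.
print(read(disk, authenticated=True) == record)  # True
```

The control does its job at the storage boundary; it simply has no opinion about what an authorized session may see.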
5.2 Always Encrypted (Azure SQL)
| Capability | Strength | Constraint in Analytics Context |
|---|---|---|
| Column encryption | Keeps plaintext out of reach of certain roles (e.g., DBAs) | Limited query functionality in some modes |
| Client-side key model | Keeps DB engine blind in some configurations | Operational complexity |
| Column-level protection | Strong for tightly coupled app workloads | Hard to extend across lake-style analytics ecosystems |
| Static encryption policy | Effective for specific columns | Not identity-aware reveal at runtime |
Always Encrypted can be powerful for application-bound database workloads.
It is not inherently a dynamic, cross-tool, identity-aware analytics enforcement engine.
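The query-functionality constraint can be sketched with a toy deterministic column transform. HMAC stands in for Always Encrypted's deterministic mode here; this is an illustration of the property, not the real protocol, and all names are hypothetical.

```python
import hashlib
import hmac

KEY = b"column-master-key"  # illustrative; real keys are client-held (e.g., in Key Vault)

def det_encrypt(value: str) -> str:
    """Deterministic: equal plaintexts map to equal ciphertexts."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()

# The database stores only ciphertext for the protected column.
table = [{"member_id": det_encrypt(m), "amount": a}
         for m, a in [("M-1001", 120.0), ("M-1002", 75.5), ("M-1003", 310.0)]]

# Equality predicate: encrypt the parameter client-side, compare ciphertexts.
param = det_encrypt("M-1002")
hits = [row for row in table if row["member_id"] == param]
print(len(hits))  # 1

# Range, LIKE, and aggregation over the protected column cannot operate on
# ciphertext, which is why lake-style analytics often forces plaintext replication.
```

Equality search survives encryption; most analytics-style predicates do not, which drives the partial-deployment pattern examined in Scenario 5.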
5.3 Identity and Access Controls
| Control | Enforces | Limitation |
|---|---|---|
| Microsoft Entra ID | Authentication | Does not control field-level visibility |
| Azure RBAC | Resource access | Grants dataset-level access |
| SQL / Synapse roles | Object-level access | No default field minimization |
RBAC determines who can access a dataset.
It does not determine which sensitive fields are visible inside that dataset by default.
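The resource-versus-field gap can be sketched as a toy access check (grants, users, and columns are all hypothetical):

```python
# Dataset-level grants, mirroring RBAC's resource boundary.
DATASET_GRANTS = {"claims": {"analyst_alice", "auditor_bob"}}

CLAIMS = [{"claim_id": "C-1", "ssn": "123-45-6789",
           "diagnosis": "E11.9", "amount": 420.0}]

def query(user: str, dataset: str, rows: list) -> list:
    """Authorization stops at the resource boundary; fields are never filtered."""
    if user not in DATASET_GRANTS.get(dataset, set()):
        raise PermissionError(f"{user} has no access to {dataset}")
    return rows  # SELECT * semantics: every column comes back

visible = query("auditor_bob", "claims", CLAIMS)
print(sorted(visible[0]))  # ['amount', 'claim_id', 'diagnosis', 'ssn']
```

The check is binary: once it passes, sensitive and non-sensitive columns are indistinguishable to the query path.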
5.4 Data Discovery and Classification
| Control | Strength | Limitation |
|---|---|---|
| Microsoft Purview | Identifies and labels sensitive data | Does not shape runtime query results |
| Sensitivity labels | Improves governance visibility | Enforcement depends on downstream integrations |
Classification improves awareness.
It does not automatically enforce field-level exposure control inside analytics sessions.
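The awareness-versus-enforcement gap can be shown with a toy label store that the query path never consults (all names are hypothetical):

```python
# Labels recorded by a scanner (as Purview does) live beside the data as metadata.
LABELS = {"claims.ssn": "Highly Confidential", "claims.diagnosis": "Confidential"}

ROWS = [{"claim_id": "C-1", "ssn": "123-45-6789", "diagnosis": "E11.9"}]

def run_query(rows: list) -> list:
    """The engine returns rows untouched; LABELS never enters this code path."""
    return rows

out = run_query(ROWS)
labeled_but_returned = [c for c in out[0] if f"claims.{c}" in LABELS]
print(labeled_but_returned)  # ['ssn', 'diagnosis'] — labeled fields still in plaintext
```

Turning labels into enforcement requires a downstream integration that consults them on the query path; the metadata alone changes nothing at runtime.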
6. If We Rely Solely on Azure Native Controls
Assume the following posture:
- Encryption at rest is enabled
- RBAC is configured correctly
- Entra ID enforces MFA
- Sensitive fields are classified in Purview
- No misconfigurations exist
Under this model:
- Any user granted dataset access can see all fields in that dataset
- Compromise of that user enables full field-level exposure
- Auditors granted read access can view entire records by default
- Exports and derived datasets may replicate sensitive plaintext
This is not a failure scenario.
It is the expected behavior of a system where protection operates at the resource boundary rather than the data element boundary.
This exposure occurs even when native controls are configured correctly and operating as designed.
The dominant residual risk becomes:
Overexposure within authorized access.
7. Trust Boundaries
Trust boundaries define where assumptions change.
In cloud analytics architectures, some controls operate at infrastructure boundaries, while exposure risk often materializes within authorized sessions. Mapping these boundaries clarifies where native protections apply and where additional controls may be necessary.
| Boundary | Description | Exposure Risk |
|---|---|---|
| TB1 | Storage boundary | Disk-level theft |
| TB2 | Identity boundary | Unauthorized login |
| TB3 | Analytics session boundary | Overexposure in authorized queries |
| TB4 | Output boundary | Export and redistribution of query results |
Most Azure native controls operate at TB1 and TB2.
Most analytics exposure risk occurs at TB3 and TB4.
8. Threat Actors
Threat models consider not only external attackers, but also realistic internal scenarios.
In analytics environments, authorized users often represent the dominant exposure vector. Over-privilege, credential compromise, and broad third-party access can all result in sensitive data disclosure without violating infrastructure security controls.
| Actor ID | Actor | Primary Risk |
|---|---|---|
| TA1 | Over-privileged analyst | Broad internal exposure |
| TA2 | Compromised identity | Mass data exfiltration |
| TA3 | Careless analyst | Accidental leakage |
| TA4 | External auditor | Excessive third-party visibility |
| TA5 | Privileged platform operator | Elevated access exposure |
9. Threat Scenarios
Each threat scenario below evaluates a realistic exposure path.
For each scenario, the analysis identifies:
- The system conditions required
- Which native controls apply
- Why those controls may not prevent exposure
- The resulting impact
The intent is not to highlight misconfiguration, but to evaluate structural limitations.
Scenario 1: Authorized Analyst, Excessive Visibility
| Element | Description |
|---|---|
| Setup | Analyst has dataset reader access |
| Action | Executes SELECT * or builds BI report including sensitive columns |
| Native Controls Applied | Encryption at rest, RBAC |
| Why Insufficient | RBAC grants dataset access, not field-level minimization |
| Impact | Broad internal exposure of sensitive fields |
Encryption at rest is irrelevant once the analytics engine decrypts data during normal operation.
Scenario 2: Credential Compromise
| Element | Description |
|---|---|
| Setup | Attacker obtains valid Entra ID credentials |
| Action | Queries sensitive datasets and exports results |
| Native Controls Applied | Entra ID, RBAC, encryption at rest |
| Why Insufficient | From the system’s perspective, access is authorized |
| Impact | High blast radius breach |
If a single identity has broad dataset access, compromise of that identity enables mass disclosure.
Scenario 3: Auditor Overexposure
| Element | Description |
|---|---|
| Setup | Auditor granted dataset-level access |
| Action | Pulls records including unnecessary sensitive fields |
| Native Controls Applied | RBAC |
| Why Insufficient | No field-level default minimization |
| Impact | Violation of minimum necessary principles |
Scenario 4: Classification Without Enforcement
| Element | Description |
|---|---|
| Setup | Sensitive fields labeled in Purview |
| Action | Analyst queries labeled fields in BI |
| Native Controls Applied | Classification |
| Why Insufficient | Labels do not block query output |
| Impact | False sense of protection |
Scenario 5: Always Encrypted Partial Deployment
| Element | Description |
|---|---|
| Setup | Always Encrypted deployed in Azure SQL |
| Action | Data replicated to analytics lake for broader consumption |
| Native Controls Applied | Column encryption at DB layer |
| Why Insufficient | Downstream analytics often requires plaintext or separate replication |
| Impact | Protection stops at database boundary |
10. Residual Risk Profile
Residual risk represents the exposure that remains after native controls are applied correctly.
If infrastructure encryption, identity management, and classification are functioning as designed, what risk conditions still exist? This section summarizes the systemic exposure patterns that persist in analytics environments.
If protection relies primarily on:
- Encryption at rest
- RBAC at dataset boundaries
- Classification for awareness
The dominant residual condition becomes:
Authorized access equals broad visibility.
This leads to:
| Risk | Outcome |
|---|---|
| Broad dataset reader groups | Sensitive fields widely visible |
| Credential compromise | Large-volume exfiltration possible |
| Third-party sharing | Difficult to enforce minimum necessary |
| Process reliance | Human discipline replaces technical enforcement |
Control Outcome Comparison
| Question | Azure Native Only | Data-Layer Enforcement Present |
|---|---|---|
| Is data protected if disks are stolen? | Yes | Yes |
| Can unauthorized users access datasets? | No (blocked) | No (blocked) |
| Can authorized users see all fields by default? | Yes | No |
| Does credential compromise expose entire datasets? | Often Yes | Reduced blast radius |
| Is field-level minimization enforced at query time? | No | Yes |
| Is "minimum necessary" technically enforced? | Not by default | Yes |
11. Design Implications
A threat model is most useful when it informs system design.
The requirements below are derived directly from the scenarios and residual risks identified earlier. They describe characteristics a data-layer exposure control model must possess in order to materially reduce analytics overexposure.
A data-layer exposure control model must:
| Requirement | Purpose |
|---|---|
| Default protection | Prevent automatic field exposure |
| Identity-aware reveal | Tie visibility to user and context |
| Cross-tool consistency | Avoid weakest-path bypass |
| Plaintext minimization | Reduce replication and export risk |
| Auditable reveal events | Strengthen compliance defensibility |
Closing Observation
Azure native controls secure infrastructure layers exceptionally well.
They answer questions such as:
- Can someone steal the disk?
- Can an unauthorized identity access the resource?
They do not inherently answer:
- Which sensitive fields should this authorized analyst see right now?
Threat modeling clarifies the distinction.
In analytics environments, the dominant exposure risk is not unauthorized intrusion.
It is excessive visibility granted to legitimate identities operating as designed.
Infrastructure security reduces the probability of intrusion. Data-layer enforcement reduces the impact of exposure.
Both are necessary.
They operate at different layers of the system.