Threat Modeling Azure Native Controls for Structured Data in Analytics Workloads

A detailed threat model of Azure native security controls in analytics environments, examining encryption at rest, RBAC, and Always Encrypted, and identifying residual data exposure risks within authorized access.

Executive Summary

Azure native controls provide strong infrastructure protection.

They encrypt storage.
They manage keys.
They enforce resource-level access.
They classify sensitive data.

However, analytics environments introduce a different question:

Once an identity is authorized to query a dataset, what prevents that identity from seeing sensitive fields by default?

This paper threat-models a common Azure analytics architecture and evaluates where native controls apply, and where residual exposure risk remains.

This distinction matters because analytics overexposure typically occurs without misconfiguration and without bypassing native controls.

1. Scope

Modern cloud platforms provide a wide range of security controls. To avoid ambiguity, this threat model explicitly defines what is being evaluated and what is not.

The focus here is structured sensitive data used in analytics workflows, where data is copied, transformed, queried, and shared across large internal audiences. The goal is to analyze exposure risk within authorized analytics sessions, not to evaluate general cloud security posture.

In Scope

Structured sensitive data in:

  • Azure Data Lake Storage (ADLS Gen2)
  • Azure SQL / Synapse
  • Databricks / Spark
  • Power BI / Fabric

Consumers of that data:

  • Internal analysts
  • External auditors

Out of Scope

  • Network segmentation
  • Endpoint security
  • Detection and monitoring controls

This is a data exposure model, not a perimeter defense model.

2. System Overview

Threat modeling begins with a clear understanding of the system being analyzed.

The architecture below represents a common Azure analytics pattern in regulated enterprises. Data originates in operational systems, is replicated into storage and processing layers, and is ultimately consumed by analysts, auditors, and reporting tools.

Each layer introduces different trust assumptions and different exposure risks.

A typical regulated enterprise architecture includes:

Layer | Description | Primary Risk
Source Systems | Claims, billing, CRM, financial, clinical systems | Raw sensitive data
Storage | ADLS Gen2, Blob, SQL | Concentrated data volume
Processing | Synapse, Databricks, SQL | Data reshaping & joins
Consumption | Power BI, notebooks, reporting tools | Broad user visibility
Identity | Entra ID, RBAC, SQL roles | Authorization boundary

As data moves from operational systems into analytics environments, the exposure surface expands significantly.

3. Assets

Before evaluating controls, it is important to define what is being protected.

In analytics environments, the assets at risk are not limited to raw stored data. They also include derived datasets, aggregated outputs, and the organization’s ability to demonstrate effective access control to regulators and auditors.

The assets listed below reflect both technical exposure and business impact.

Asset ID | Asset | Why It Matters
A1 | Sensitive structured fields | Direct regulatory exposure
A2 | Derived sensitive data | Re-identification risk
A3 | Audit defensibility | Regulatory credibility
A4 | Blast radius control | Limits breach severity

Examples of sensitive structured fields include:

  • Member IDs
  • Claim IDs
  • SSNs
  • Financial account numbers
  • Diagnosis and procedure codes
  • Addresses, emails, phone numbers

4. Security Goals

Security goals define what “success” looks like.

Rather than focusing narrowly on encryption or access control features, these goals describe the desired outcomes for structured data in analytics environments: minimizing unnecessary visibility, reducing breach impact, and preserving legitimate analytical functionality.

Controls should be evaluated against these goals, not in isolation.

Goal ID | Goal | Description
G1 | Field-level minimization | Prevent unnecessary exposure of sensitive columns
G2 | Identity-aware enforcement | Tie field visibility to user and context
G3 | Blast radius reduction | Limit impact of credential compromise
G4 | Preserve analytics utility | Avoid breaking legitimate workflows
G5 | Audit evidence | Demonstrate enforceable least privilege

5. Native Azure Controls in Scope

Azure provides multiple native security controls that are frequently cited as sufficient for protecting sensitive data.

This section evaluates those controls specifically through the lens of analytics exposure. The question is not whether these controls are valuable, but whether they enforce field-level minimization and identity-aware visibility inside authorized query sessions.

5.1 Encryption at Rest

Control | What It Protects | What It Does Not Protect
Storage Service Encryption | Disk and storage media | Query-time field visibility
Transparent Data Encryption | Database files and backups | Authorized session exposure

Encryption at rest protects against offline media theft and storage compromise.
It does not restrict what an authenticated user can query.

5.2 Always Encrypted (Azure SQL)

Capability | Strength | Constraint in Analytics Context
Column encryption | Keeps plaintext hidden from certain roles | Limited query functionality in some modes
Client-side key model | Keeps DB engine blind in some configurations | Operational complexity
Column-level protection | Strong for tightly coupled app workloads | Hard to extend across lake-style analytics ecosystems
Static encryption policy | Effective for specific columns | No identity-aware reveal at runtime

Always Encrypted can be powerful for application-bound database workloads.
It is not inherently a dynamic, cross-tool, identity-aware analytics enforcement engine.

5.3 Identity and Access Controls

Control | Enforces | Limitation
Microsoft Entra ID | Authentication | Does not control field-level visibility
Azure RBAC | Resource access | Grants dataset-level access
SQL / Synapse roles | Object-level access | No default field minimization

RBAC determines who can access a dataset.
It does not determine which sensitive fields are visible inside that dataset by default.
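The gap can be illustrated with a minimal Python sketch. It is not Azure code: `Dataset`, `read_dataset`, the identities, and the sample row are hypothetical stand-ins for a dataset-level authorization check of the kind RBAC performs.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for a table or lake path protected at the dataset level."""
    name: str
    rows: list
    readers: set = field(default_factory=set)


def read_dataset(dataset: Dataset, identity: str) -> list:
    """Authorize at the dataset boundary, then return every column of every row."""
    if identity not in dataset.readers:
        raise PermissionError(f"{identity} cannot read {dataset.name}")
    # Past this point nothing distinguishes sensitive columns from the rest.
    return [dict(row) for row in dataset.rows]


claims = Dataset(
    name="claims",
    rows=[{"claim_id": "C-1001", "ssn": "000-00-0000", "diagnosis_code": "E11.9"}],
    readers={"analyst@example.com"},
)

visible = read_dataset(claims, "analyst@example.com")
print(sorted(visible[0]))  # column names the authorized reader receives
```

The only decision the check makes is whether the identity may read the dataset. Once that passes, the sensitive column is returned alongside everything else.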

5.4 Data Discovery and Classification

Control | Strength | Limitation
Microsoft Purview | Identifies and labels sensitive data | Does not shape runtime query results
Sensitivity labels | Improves governance visibility | Enforcement depends on downstream integrations

Classification improves awareness.
It does not automatically enforce field-level exposure control inside analytics sessions.

6. If We Rely Solely on Azure Native Controls

Assume the following posture:

  • Encryption at rest is enabled
  • RBAC is configured correctly
  • Entra ID enforces MFA
  • Sensitive fields are classified in Purview
  • No misconfigurations exist

Under this model:

  • Any user granted dataset access can see all fields in that dataset
  • Compromise of that user enables full field-level exposure
  • Auditors granted read access can view entire records by default
  • Exports and derived datasets may replicate sensitive plaintext

This is not a failure scenario.

It is the expected behavior of a system where protection operates at the resource boundary rather than the data element boundary.

This exposure occurs even when native controls are configured correctly and operating as designed.

The dominant residual risk becomes:

Overexposure within authorized access.

7. Trust Boundaries

Trust boundaries define where assumptions change.

In cloud analytics architectures, some controls operate at infrastructure boundaries, while exposure risk often materializes within authorized sessions. Mapping these boundaries clarifies where native protections apply and where additional controls may be necessary.

Boundary ID | Description | Exposure Risk
TB1 | Storage boundary | Disk-level theft
TB2 | Identity boundary | Unauthorized login
TB3 | Analytics session boundary | Overexposure in authorized queries
TB4 | Output boundary | Export and redistribution of query results

Most Azure native controls operate at TB1 and TB2.
Most analytics exposure risk occurs at TB3 and TB4.
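One way to make this mapping checkable is to record, for each control, the boundary at which it enforces, and then list the boundaries left without an enforcing control. The assignments below are an assumption read off the tables in Section 5, not an official Azure taxonomy. Always Encrypted is left out because where it applies depends on the deployment.

```python
# Assumed mapping of controls to the boundaries where they enforce.
# An empty set means the control informs or labels but does not enforce.
CONTROL_BOUNDARIES = {
    "Storage Service Encryption": {"TB1"},
    "Transparent Data Encryption": {"TB1"},
    "Microsoft Entra ID": {"TB2"},
    "Azure RBAC": {"TB2"},
    "SQL / Synapse roles": {"TB2"},
    "Purview classification": set(),
}

ALL_BOUNDARIES = ["TB1", "TB2", "TB3", "TB4"]


def uncovered_boundaries(control_map: dict) -> list:
    """Return the boundaries at which no listed control enforces anything."""
    covered = set().union(*control_map.values())
    return [b for b in ALL_BOUNDARIES if b not in covered]


gaps = uncovered_boundaries(CONTROL_BOUNDARIES)
print(gaps)
```

Under these assumptions the uncovered boundaries are the analytics session (TB3) and the output path (TB4).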

8. Threat Actors

Threat models consider not only external attackers, but also realistic internal scenarios.

In analytics environments, authorized users often represent the dominant exposure vector. Over-privilege, credential compromise, and broad third-party access can all result in sensitive data disclosure without violating infrastructure security controls.

Actor ID | Actor | Primary Risk
TA1 | Over-privileged analyst | Broad internal exposure
TA2 | Compromised identity | Mass data exfiltration
TA3 | Careless analyst | Accidental leakage
TA4 | External auditor | Excessive third-party visibility
TA5 | Privileged platform operator | Elevated access exposure

9. Threat Scenarios

Each threat scenario below evaluates a realistic exposure path.

For each scenario, the analysis identifies:

  • The system conditions required
  • Which native controls apply
  • Why those controls may not prevent exposure
  • The resulting impact

The intent is not to highlight misconfiguration, but to evaluate structural limitations.

Scenario 1: Authorized Analyst, Excessive Visibility

Element | Description
Setup | Analyst has dataset reader access
Action | Executes SELECT * or builds BI report including sensitive columns
Native Controls Applied | Encryption at rest, RBAC
Why Insufficient | RBAC grants dataset access, not field-level minimization
Impact | Broad internal exposure of sensitive fields

Encryption at rest offers no protection in this scenario, because the analytics engine transparently decrypts data for any authorized query during normal operation.

Scenario 2: Credential Compromise

Element | Description
Setup | Attacker obtains valid Entra ID credentials
Action | Queries sensitive datasets and exports results
Native Controls Applied | Entra ID, RBAC, encryption at rest
Why Insufficient | From the system’s perspective, access is authorized
Impact | High blast radius breach

If a single identity has broad dataset access, compromise of that identity enables mass disclosure.
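The blast radius of a single compromised identity can be approximated as the number of sensitive field values that identity can read in plaintext. The sketch below uses entirely hypothetical dataset sizes and a hypothetical helper, `exposed_values`. It compares dataset-level access with a policy that reveals only some sensitive columns.

```python
# Hypothetical inventory: row counts and number of sensitive columns per dataset.
DATASETS = {
    "claims": {"rows": 1_000_000, "sensitive_columns": 4},
    "members": {"rows": 250_000, "sensitive_columns": 6},
}


def exposed_values(datasets: dict, visible_sensitive_columns: dict) -> int:
    """Count sensitive field values one identity can read in plaintext.

    visible_sensitive_columns maps dataset name -> number of sensitive columns
    revealed to the identity. A dataset missing from the map is treated as
    dataset-level access, where every sensitive column is visible.
    """
    total = 0
    for name, info in datasets.items():
        visible = visible_sensitive_columns.get(name, info["sensitive_columns"])
        total += info["rows"] * visible
    return total


# Dataset-level access: all sensitive columns in both datasets are visible.
dataset_level = exposed_values(DATASETS, {})

# Assumed field-level policy: one sensitive column in claims, none in members.
field_level = exposed_values(DATASETS, {"claims": 1, "members": 0})

print(dataset_level, field_level)
```

With these illustrative numbers, the same compromised credential exposes 5,500,000 sensitive values under dataset-level access and 1,000,000 under the assumed field-level policy.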

Scenario 3: Auditor Overexposure

Element | Description
Setup | Auditor granted dataset-level access
Action | Pulls records including unnecessary sensitive fields
Native Controls Applied | RBAC
Why Insufficient | No field-level default minimization
Impact | Violation of minimum necessary principles

Scenario 4: Classification Without Enforcement

Element | Description
Setup | Sensitive fields labeled in Purview
Action | Analyst queries labeled fields in BI
Native Controls Applied | Classification
Why Insufficient | Labels do not block query output
Impact | False sense of protection
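A minimal sketch of this scenario follows. The `LABELS` catalog, the sample row, and both functions are hypothetical. They stand in for classification metadata stored alongside a query path that never consults it.

```python
# Hypothetical label catalog, standing in for classification metadata.
LABELS = {"ssn": "Highly Confidential", "email": "Confidential"}

ROWS = [
    {"member_id": "M-1", "ssn": "000-00-0000", "email": "a@example.com", "plan": "gold"},
]


def run_query(rows: list, columns: list) -> list:
    """Project the requested columns. LABELS is never consulted here."""
    return [{c: row[c] for c in columns} for row in rows]


def labeled_columns(columns: list) -> dict:
    """Report which requested columns carry a label (awareness, not enforcement)."""
    return {c: LABELS[c] for c in columns if c in LABELS}


requested = ["member_id", "ssn", "plan"]
result = run_query(ROWS, requested)
flagged = labeled_columns(requested)

print(result[0]["ssn"], flagged)
```

The label makes it possible to report that a sensitive column was requested, but the query result is identical with or without it.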

Scenario 5: Always Encrypted Partial Deployment

Element | Description
Setup | Always Encrypted deployed in Azure SQL
Action | Data replicated to analytics lake for broader consumption
Native Controls Applied | Column encryption at DB layer
Why Insufficient | Downstream analytics often requires plaintext or separate replication
Impact | Protection stops at database boundary

10. Residual Risk Profile

Residual risk represents the exposure that remains after native controls are applied correctly.

If infrastructure encryption, identity management, and classification are functioning as designed, what risk conditions still exist? This section summarizes the systemic exposure patterns that persist in analytics environments.

If protection relies primarily on:

  • Encryption at rest
  • RBAC at dataset boundaries
  • Classification for awareness

then the dominant residual condition becomes:

Authorized access equals broad visibility.

This leads to:

Risk | Outcome
Broad dataset reader groups | Sensitive fields widely visible
Credential compromise | Large-volume exfiltration possible
Third-party sharing | Difficult to enforce minimum necessary
Process reliance | Human discipline replaces technical enforcement

Control Outcome Comparison

Question | Azure Native Only | Data-Layer Enforcement Present
Is data protected if storage media is stolen? | Yes | Yes
Is unauthorized dataset access prevented? | Yes | Yes
Can authorized users see all fields by default? | Yes | No
Does credential compromise expose entire datasets? | Often yes | Reduced blast radius
Is field-level minimization enforced at query time? | No | Yes
Is "minimum necessary" technically enforced? | Not by default | Yes

11. Design Implications

A threat model is most useful when it informs system design.

The requirements below are derived directly from the scenarios and residual risks identified earlier. They describe characteristics a data-layer exposure control model must possess in order to materially reduce analytics overexposure.

A data-layer exposure control model must provide:

Requirement | Purpose
Default protection | Prevent automatic field exposure
Identity-aware reveal | Tie visibility to user and context
Cross-tool consistency | Avoid weakest-path bypass
Plaintext minimization | Reduce replication and export risk
Auditable reveal events | Strengthen compliance defensibility
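Three of these requirements (default protection, identity-aware reveal, and auditable reveal events) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular product. `RevealPolicy`, `mask`, the column names, and the identities are all hypothetical, and a real control would also need to hold consistently across every tool that reads the data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

SENSITIVE_COLUMNS = {"ssn", "account_number"}  # hypothetical classification


def mask(value: str) -> str:
    """Hide all but the last four characters; fully hide very short values."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


@dataclass
class RevealPolicy:
    grants: dict  # identity -> set of sensitive columns it may see in plaintext
    audit_log: list = field(default_factory=list)

    def apply(self, identity: str, row: dict) -> dict:
        """Return the row as this identity should see it."""
        allowed = self.grants.get(identity, set())
        shaped = {}
        for column, value in row.items():
            if column not in SENSITIVE_COLUMNS:
                shaped[column] = value
            elif column in allowed:
                shaped[column] = value
                # Every plaintext reveal of a sensitive field is recorded.
                self.audit_log.append({
                    "identity": identity,
                    "column": column,
                    "at": datetime.now(timezone.utc).isoformat(),
                })
            else:
                shaped[column] = mask(str(value))  # protected by default
        return shaped


policy = RevealPolicy(grants={"fraud-analyst@example.com": {"account_number"}})
row = {"claim_id": "C-1001", "ssn": "000-00-1234", "account_number": "9876543210"}

analyst_view = policy.apply("analyst@example.com", row)
fraud_view = policy.apply("fraud-analyst@example.com", row)
```

Both identities hold the same dataset access, yet the general analyst receives masked values while the fraud analyst sees the account number in plaintext and leaves one audit record behind.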

Closing Observation

Azure native controls secure infrastructure layers exceptionally well.

They answer questions such as:

  • Can someone steal the disk?
  • Can an unauthorized identity access the resource?

They do not inherently answer:

  • Which sensitive fields should this authorized analyst see right now?

Threat modeling clarifies the distinction.

In analytics environments, the dominant exposure risk is not unauthorized intrusion.

It is excessive visibility granted to legitimate identities operating as designed.

Infrastructure security reduces the probability of intrusion. Data-layer enforcement reduces the impact of exposure.

Both are necessary.
They operate at different layers of the system.



© 2026 Ubiq Security, Inc. All rights reserved.