Hand-Picked Top-Read Stories

AI agent runtime protection Defender for Endpoint

AI agent runtime protection in Microsoft Defender for Endpoint

Microsoft Defender for Endpoint custom data collection: get the telemetry you need

Identity attack graph sentinel datalake

Identity attack graph in Microsoft Sentinel: easy find lateral movement paths

19 minute read

Microsoft Sentinel data lake: implementation guide

Modern Security

20 September 2025

Table of Contents Hide

Introduction
Prerequisites and requirements
1. Required permissions
2. Technical prerequisites
Pricing model
1. Current pricing structure
2. Critical pricing notes
Data lake onboarding
Data Connectors
Retention configuration and management
Understanding data mirroring behavior
The Data lake retention model
Special case: XDR tables and data lake
1. Understanding XDR table tiers:
2. Two methods for extending XDR table retention:
How to check current retention configuration
How to change retention configuration
1. For Sentinel and custom tables:
2. For XDR tables (enabling data lake – standard method):
Recommended retention configurations
Optimizing XDR costs with DCR transformations
Data analysis and operations
Performance monitoring and optimization
Troubleshooting common issues
Best practices and governance
Additional information
Conclusion

Introduction

What is Microsoft Sentinel data lake

Microsoft Sentinel data lake is a purpose-built, cloud-native security data platform that addresses the fundamental challenge organizations face between comprehensive security coverage and cost sustainability. The platform transforms how organizations manage and analyze security data through:

Unifying security data across Microsoft Defender XDR, third-party sources, assets, activity logs, and threat intelligence
Optimizing costs with tiered storage, on-demand data promotion, and single copy architecture
Enabling deep security insights with up to 12 years of queryable security data
Powering AI and automation for faster detection and response

The business challenge solved

Traditional SIEM solutions struggle with the cost and complexity of storing and querying long-term security data. Organizations often face impossible choices between comprehensive security coverage and budget constraints. Microsoft Sentinel data lake solves these challenges through centralized data management, cost-effective storage with significantly lower per-GB costs for historical data, maintained query capabilities ensuring no loss of analytical functionality, and automated data lifecycle management reducing operational overhead.

Two-tier architecture benefits

Data tiers architecture

Microsoft Sentinel data lake implements a sophisticated two-tier architecture optimized for both cost and performance:

Feature	Analytics tier	Data lake tier
Key characteristics	High-performance querying and indexing (hot/interactive retention)	Cost-effective long-term retention (cold storage)
Best use cases	Real-time analytics rules, alerting, hunting, workbooks, all Sentinel SIEM features	Compliance and regulatory logging, historical trend analysis and forensics, low-touch data not needed for real-time alerts
Query price included	Yes – queries included in ingestion cost	No – queries charged separately at $0.005/GB scanned
Query performance	Optimized – sub-second response for recent data	Slower – optimized for large-scale scans, good for auditing
Query capabilities	Full KQL in Defender and Azure portals via APIs	Full KQL on single table with lookup enrichment, scheduled KQL/Spark jobs, Jupyter notebooks
Real-time analytics features	Yes – all detection rules, alerts, workbooks, hunting queries	No – limitations on analytics rules, hunting queries, parsers, watchlists, workbooks, playbooks
Search jobs	Supported	Supported
Summary rules	Supported	Supported – Full KQL on single table, extendable with analytics table lookups
Restore capability	Supported	Not supported
Data export	Supported	Not supported
Retention period	90 days (Sentinel) / 30 days (Defender XDR), extendable to 2 years with prorated monthly charge	Same as analytics by default, extendable to 12 years

Critical Note: Data processing charge ($0.10/GB) applies ONLY to data ingested directly to data lake tier only. Data mirrored from analytics tier incurs NO processing charge.

Key advantages over Azure Data Explorer (ADX)

While Azure Data Explorer remains a powerful platform, it demands significant administrative investment. Organizations using ADX must manually create and maintain table structures, custom functions, monitoring systems, and data mappings. The Sentinel data lake eliminates this operational burden by having Microsoft handle the underlying infrastructure management.

Impact on Auxiliary logging

When organizations enable the Sentinel data lake, auxiliary log tables automatically disappear from both Defender Advanced Hunting and the Microsoft Sentinel portal interface. However, this data isn’t lost—it’s seamlessly migrated to the data lake environment where analysts can access it through KQL queries or Jupyter notebooks. This transition represents an automatic upgrade from auxiliary table storage to a more scalable data lake architecture.

Prerequisites and requirements

Required permissions

You need specific roles across different scopes:

Permission Scope	Required Role	Purpose
Azure Subscription	Subscription Owner	Billing setup and resource creation
Microsoft Entra ID	Global Administrator OR Security Administrator	Cross-tenant security platform configuration
Workspace Level	Microsoft Sentinel Reader	Access to all participating workspaces

Technical prerequisites

Infrastructure requirements:

Existing Azure subscription with billing access
Resource group (existing or new) – cannot be changed after setup
Microsoft Sentinel workspace(s) in correct region
Microsoft Sentinel primary workspace is connected to the Microsoft Defender XDR portal

Pricing model

Understanding the cost structure of Microsoft Sentinel data lake is essential for planning your implementation and maximizing cost savings. The platform uses a consumption-based pricing model with distinct meters for ingestion, storage, processing, and compute.

The pricing model underwent changes as the platform transitioned from public preview to general availability. Most notably, data processing and asset data ingestion now incur charges, though these costs are offset by the substantial savings on storage and the elimination of analytics tier ingestion costs for appropriately routed data.

Current pricing structure

Cost component	Rate	What it covers
Data lake ingestion	$0.05/GB	Data entering data lake tier directly
Data processing	$0.10/GB	All transformations (filtering, normalization, enrichment)
Data lake storage	$0.026/GB/month	Storage beyond 30-day included period (6:1 compression applied)
Query execution	$0.005/GB	Data scanned during KQL queries and jobs
Jupyter notebook compute	$0.15/compute hour	Notebook sessions (cores × session time)
Asset data ingestion	$0.05/GB	Entra ID, Microsoft 365, and Azure asset connector data

For comparison – Analytics tier: $2.30-$4.60/GB ingestion (region and commitment dependent)

Critical pricing notes

Included with ingestion:

30 days of storage included – No separate storage charges for first 30 days
No double billing – Data mirrored from analytics tier to data lake incurs no additional ingestion charges

Sentinel-specific advantage:

Filtering exemption – Sentinel workspaces have NO penalty charges for DCR filtering, regardless of volume reduction percentage (Standard Log Analytics charges penalties for filtering >50%)

Uniform compression:

6:1 compression ratio – Applied consistently across all data sources for storage billing

October 2025 changes:

Data processing ($0.10/GB) – Now charged (was free during preview)
Asset data ingestion ($0.05/GB) – Now charged (was free during preview)

Data lake onboarding

Step 1: Initiate onboarding process

Method 1: Via onboarding banner

Sign in to Microsoft Defender Portal
Look for data lake onboarding banner at top of page
Click Get started

Method 2: Via settings navigation

Navigate to System → Settings → Microsoft Sentinel → Data lake
Click Set up data lake

Step 2: Permission validation screen

If you encounter the permissions screen, refer to the requirements above for required permissions.

Step 3: Billing configuration

Subscription: Choose billing subscription carefully
Resource group: Select or create resource group for data lake resources

Click Set up data lake to begin provisioning process.

Step 4: Monitor setup progress

Setup timeline: Refer to the Critical Requirements Reference box above for detailed timeline expectations.

Step 5: Validate successful completion

Success indicators:

Completion banner Your data lake is ready with feature access cards
New navigation option: Data lake exploration under Sentinel
Enhanced Tables management capabilities
Jobs section becomes available

Data Connectors

Microsoft Sentinel data lake works with all existing Sentinel data connectors, including:

DNS, proxy, and email telemetry
All Microsoft Defender XDR and Microsoft Sentinel data sources
Microsoft 365
Microsoft Entra ID
Microsoft Resource Graph
Endpoint Detection and Response (EDR) platforms
Firewall and network logs
Cloud infrastructure and workload telemetry
Identity and access logs (Microsoft Entra, Okta, etc.)

Below the steps to activate for example the Defender XDR data connector (in case you haven’t got it enabled yet).

Access XDR data connector

Navigate to Microsoft Defender Portal → Microsoft Sentinel → Configuration → Data connectors
Search for “Microsoft Defender XDR” in the connector gallery
Click Open connector page

Configure integration components The XDR connector configuration includes three main sections:

Connect incidents and alerts: Enables basic integration for incident synchronization between platforms
Connect entities: Integrates on-premises Active Directory user identities through Microsoft Defender for Identity
Connect events: Enables collection of raw advanced hunting events from Defender components

Understanding XDR data streams

Free data streams (included with XDR integration):

SecurityAlert: Alerts from all Defender products
SecurityIncident: Unified incident management data
Bi-directional synchronization: Incidents sync between Sentinel and Defender portals

Paid data streams (require analytics tier ingestion):

DeviceProcessEvents: Process execution and command-line data
DeviceNetworkEvents: Network connection and communication data
DeviceFileEvents: File creation, modification, and access events
DeviceLogonEvents: Authentication and session data
EmailEvents: Email security and communication data
CloudAppEvents: Cloud application activity and security events
IdentityLogonEvents: Identity provider authentication events

If you’ve Sentinel setup before and transitioned to the Unified Experience (Defender portal), existing connectors continue working without interruption. However, Microsoft Defender connectors (XDR, Endpoint, Identity, Office 365, Cloud Apps, and Defender for Cloud) won’t appear in the Data connectors tab under Sentinel—they’re automatically managed as part of the unified experience. To view these connectors, access Microsoft Sentinel through the Azure portal or from Exposure Management (only the status):

Retention configuration and management

Understanding data mirroring behavior

When you enable data lake, automatic mirroring begins for all tables from the onboarding point forward with these characteristics:

Forward-looking only: Historical data existing before onboarding is not mirrored
Automatic mirroring: All existing connectors automatically mirror data to data lake
Consistent retention initially: Data lake tier matches analytics tier retention settings
Single copy architecture: Same data serves both tiers without duplication

The Data lake retention model

As covered earlier, Data Lake uses a two-tier model:

Tier	Default retention	Maximum retention	Purpose
Analytics tier (hot)	30 days	2 years	Real-time detection, hunting, analytics
Data lake tier (cold)	Mirrors analytics	12 years	Long-term compliance, historical analysis

Key cost insight: For Sentinel solution tables, 90 days of analytics tier storage is free (you only pay ingestion). For XDR tables, only 30 days is included in the XDR license. Data lake storage beyond analytics retention incurs additional costs.

Special case: XDR tables and data lake

XDR tables behave fundamentally differently than regular Sentinel solution tables, particularly around costs and retention:

Understanding XDR table tiers:

Before they’re ingested into your Sentinel workspace, XDR tables exist in what’s called the “XDR default tier” – essentially a 30-day buffer managed by the Defender XDR service. This data is:

Included in your Defender XDR license (already paid for)
Available via Advanced Hunting in Defender portal
View-only and doesn’t participate in the data lake architecture
Not ingested into Sentinel (so no Sentinel ingestion costs)

Table type	Minimum analytics	Can use data lake?	Note
Sentinel/Custom	30 days	✅ Yes	Standard data lake behavior, 90 days free storage
XDR (not ingested)	N/A (30-day XDR default)	❌ No	Must ingest first by setting retention > 30 days
XDR (ingested)	30 days minimum	✅ Yes	Now follows standard data lake model with costs

Two methods for extending XDR table retention:

Method 1: Standard ingestion (covered in this section)

Set analytics retention > 30 days in Defender portal
XDR data ingests into Sentinel Analytics tier (expensive ingestion costs)
Then automatically mirrors to data lake

Method 2: DCR transformation (cost-optimized)

Bypass expensive Analytics tier ingestion entirely
Route XDR data directly to data lake using custom tables
Massive cost savings for compliance/archival scenarios
Covered in next section: “Optimizing XDR costs with DCR transformations”

How to check current retention configuration

Step 1: Navigate to table management

Open Microsoft Defender Portal → Microsoft Sentinel → Configuration → Tables

You’ll see a list of all tables with their current retention settings

Step 2: Review retention overview

The table list displays key information:

Column	Description
Table name	Name of the table
Workspace	Which Log Analytics workspace
Tier	Analytics, Data Lake, or XDR default
Analytics retention	Hot storage duration
Total retention	Analytics + data lake combined

Step 3: Check detailed retention for a specific table

Click on any table name
A details panel opens on the right showing:
- Current tier (Analytics, data lake, or XDR default)
- Analytics retention period
- Total retention period

How to change retention configuration

For Sentinel and custom tables:

Step 1: Navigate to Microsoft Defender Portal → Microsoft Sentinel → Configuration → Tables

Step 2: Click on your table → Click Manage table

Step 3: Configure retention:

Analytics retention: 30 days to 2 years (hot storage)
Total retention: Equal to or greater than analytics, max 12 years (hot + cold)

Step 4: Click Save

Example configuration:

Scenario: Need 1 year total retention for a Sentinel solution table

Configuration:

Analytics retention: 90 days (free for Sentinel tables)
Total retention: 1 year (365 days)

Result: 90 days in analytics + 275 days in data lake

Cost: Only data lake storage for days 91-365 (analytics storage is free)

Cost tip: You can also fully skip the analytics tier for some tables – this results in lower cost, but loses real-time analytics features like alerting, hunting queries, and analytics rules. Make sure you understand what functionality breaks before doing this:

For XDR tables (enabling data lake – standard method):

IMPORTANT COST NOTICE: This method incurs significant Analytics tier ingestion costs. XDR tables only have 30 days included in your license (not 90 like Sentinel tables). Every day beyond 30 requires paying full Sentinel ingestion rates before data reaches the data lake. See the next section for a cost-optimized alternative via DCR.

Step 1: Navigate to Tables → Locate XDR table showing “XDR default tier”

Examples: DeviceEvents, EmailEvents, IdentityInfo, AlertEvidence, DeviceProcessEvents

Step 2: Click on the XDR table → Click Manage table

Step 3: Set analytics retention > 30 days

This triggers ingestion into Sentinel and incurs expensive ingestion costs
Common values: 60, 90, 180, 365 days (preferable 90 days if using standard method)

Step 4: Set total retention (equal to or greater than analytics)

Example: 365 days for 1-year compliance

Step 5: Review cost warnings → Click Save

What happens:

Ingestion begins immediately (costs start here)
Data flows: XDR → Sentinel Analytics tier (expensive) → Data lake mirror
Table type changes from “XDR default” to “Microsoft Sentinel”
You’re now paying for both Analytics ingestion + any data lake storage beyond analytics retention

To stop ingestion: Set both Analytics and Total retention back to 30 days

Cost breakdown example for XDR tables:

Scenario	Analytics	Total	Cost structure
Stay in XDR default	30 days	30 days	$0 – Included in license
Standard method	90 days	90 days	Analytics ingestion for 90 days (expensive)
Standard + compliance	90 days	1 year	Analytics ingestion for 90 days + Data lake storage (days 91-365)

Next section: Learn how to bypass these expensive ingestion costs using DCR transformations to send XDR data directly to data lake tier (especially handy for noisy tables).

Recommended retention configurations

Scenario	Analytics	Total	Use case
Standard operations	90 days	90 days	Active security operations, no extended compliance
Balanced approach	90 days	365 days	Operations with 1-year compliance
Extended compliance	90 days	2-7 years	Regulatory requirements (GDPR, HIPAA)
Cost-optimized	30 days	365 days – 7 years	Minimal hot storage, extended cold storage

Optimizing XDR costs with DCR transformations

The cost problem

As we explored in the previous section, extending XDR table retention beyond 30 days using the standard method triggers expensive Analytics tier ingestion. The warning dialog makes this clear: increasing retention moves your XDR data to Sentinel and incurs ingestion costs.

The impact: 1TB/day of XDR data = ~$96,000/month in ingestion costs alone. For many organizations, this makes long-term retention financially prohibitive, even when required for compliance.

Understanding the optimization approach

The key insight is that XDR data already exists for free in your environment – it’s included in your Defender XDR license and available via Advanced Hunting for 30 days. The problem is the expensive journey it takes when you need to keep it longer.

Standard flow (expensive):

XDR Advanced Hunting (30d free) → Analytics Tier Ingestion ($$$) → Data Lake Mirror

Optimized flow (cost-effective):

XDR Advanced Hunting (30d free) → Workspace DCR Transformation → Custom Data Lake Tables ($)

By using Data Collection Rules (DCR) with workspace transformations, we can route XDR data directly to custom tables in the data lake tier, completely bypassing expensive Analytics tier ingestion. This is a legitimate Microsoft-supported feature that’s just not widely documented for this use case, but got inspired by Jeffrey Appel his approach for this.

Workspace transformation DCR

Key benefits and trade-offs

Benefits:

Eliminate Analytics tier ingestion costs entirely
Keep 30 days free in XDR Advanced Hunting for real-time detection
Store years of data in low-cost data lake tier
Perfect for compliance and historical analysis
Single DCR handles multiple tables

Trade-offs:

No real-time alerting or analytics rules on custom tables
Slower queries compared to Analytics tier
Requires initial setup and ongoing management
Custom tables have limited feature support

Critical decision: Only use this method for XDR data needed for compliance/historical purposes. If you need real-time detection beyond 30 days, keep those tables in Analytics tier. Many organizations use a hybrid approach.

Choosing which tables to optimize

Not all XDR tables are equal candidates. Base your decision on volume, analytical needs, and compliance requirements.

XDR Table	Volume	Recommendation	Reasoning
DeviceProcessEvents	Very High	Optimize	Highest volume, mostly needed for forensics
DeviceNetworkEvents	Very High	Optimize	High noise-to-signal ratio
DeviceFileEvents	High	Optimize	Primarily used for investigations
CloudAppEvents	Medium	Consider	Depends on detection requirements
EmailEvents	Medium	Consider	May need for phishing detection
AlertEvidence	Low	Keep in Analytics	Critical for investigations
IdentityInfo	Very Low	Keep in Analytics	Required for UEBA

Optimize when:

High-volume, low-value data (verbose logs)
Compliance retention only
Historical analysis and forensics needs

Keep in Analytics when:

Real-time alerting required
UEBA or behavioral analytics needed
Active detection rules depend on the data

Prerequisites and tools

Before implementing this optimization, you’ll need a few things in place.

1. Marko Lauren’s Table Creator Tool

The tool is essential because Azure Data Lake tables (Auxiliary tier) don’t natively support dynamic data types, which many XDR tables use. Marko’s tool solves this by automating the table creation process and handling the schema complexity.

GitHub link

What it does:

Reads source table schema from your workspace
Creates matching custom table with correct structure
Handles dynamic fields via Analytics tier workaround
Enables seamless switching to Data Lake tier

The workaround: Create tables in Analytics tier first (which supports dynamic types), then switch to Data Lake tier via the portal. This preserves the dynamic field support.

2. Azure access requirements

Contributor or Owner permissions on Log Analytics workspace
Ability to deploy ARM templates
Access to Azure Portal and Defender portal

3. Optimization scope decision

Identify which XDR tables to optimize based on current ingestion volumes, real-time analytics requirements, compliance needs, and detection rule dependencies. Start with your highest-volume tables for maximum impact.

Implementation guide

The implementation follows three main steps: creating custom tables, configuring the DCR, and verifying data flow. Let’s walk through each one.

Step 1: Create custom data lake tables

We’ll use Marko Lauren’s tableCreator.ps1 script to automate the table creation process and avoid schema issues.

Download the tool

Download tableCreator.ps1 from the GitHub repository:

GitHub link

Run the table creator

The tool uses command-line parameters. Execute it via Azure CLI with your workspace details:

# Login to Azure
az login

# Run the table creator script with parameters
.\tableCreator.ps1 -tableName DeviceProcessEvents -newTableName DeviceProcessEventsLake_CL -type analytics -retention 30 -totalRetention 365

Parameter explanation:

Parameter	Value	Description
`-tableName`	`DeviceProcessEvents`	Existing XDR table to replicate
`-newTableName`	`DeviceProcessEventsLake_CL`	New custom table name (must end with `_CL`)
`-type`	`analytics`	Start with analytics, not datalake/auxiliary
`-retention`	`365`	Total retention in days
`-ConvertToString`	(optional)	Use if you get dynamic type errors
`-TenantId`	(optional)	Specify if multi-tenant environment

Why -type analytics first? Even though our final destination is Data Lake, we must create the table as analytics type initially to preserve dynamic field support. We’ll switch it to Data Lake tier in the next step.

Switch table to data lake tier

Now that the table exists with the correct schema including dynamic fields, switch it to data Lake tier:

Navigate to Microsoft Defender Portal → Microsoft Sentinel → Configuration → Tables

Locate your custom table (DeviceProcessEventsLake_CL)
Click on the table → Click Manage table
Under table tier options, select data lake tier only
Configure retention (e.g., 1 year for compliance)
Click Save

Your table is now in the Auxiliary (data lake) tier with full dynamic field support, ready to receive data.

Create additional tables

Repeat this process for each XDR table you want to optimize:

Source Table	Custom Table Name	Use Case
DeviceProcessEvents	DeviceProcessEventsLake_CL	Process execution history
DeviceNetworkEvents	DeviceNetworkEventsLake_CL	Network connection logs
DeviceFileEvents	DeviceFileEventsLake_CL	File activity forensics
CloudAppEvents	CloudAppEventsLake_CL	Cloud application activity
EmailEvents	EmailEventsLake_CL	Email flow history

Step 2: Create workspace transformation DCR

Now we’ll create the Data Collection Rule that routes XDR data from original tables to your custom Data Lake tables. This is where the magic happens.

Initial DCR creation via portal

Start by creating the basic DCR structure through the Azure Portal:

Navigate to Azure Portal → Log Analytics workspaces → Select your workspace → Tables

Locate any source XDR table (e.g., DeviceProcessEvents)
Click on the table → Click Create transformation
Configure the DCR:

Field	Value	Note
DCR name	`dcr-sentinel-workspace-xdr-lake-routing`	Use descriptive name
Resource group	Same as workspace	Keep together
Region	Same as workspace	Performance
Description	Routes XDR data to Data Lake custom tables	Clear purpose

Click Create

This creates a Workspace Transformation DCR (Kind: WorkspaceTransforms) that’s automatically associated with your workspace.

Important limitation: You can only have one Workspace Transformation DCR per workspace, but this single DCR can handle all your table routing.

Configure DCR via ARM template

The portal UI only allows configuring one table at a time. To efficiently route multiple XDR tables, we’ll edit the DCR’s ARM template directly.

Navigate to Azure Portal → Data Collection Rules → Locate your DCR

Verify Kind shows WorkspaceTransforms in the Overview
Click Export template → Deploy → Edit template

Write down the name from the workspaceresourceId (under LogAnalytics):

        {
            "type": "Microsoft.Insights/dataCollectionRules",
            "apiVersion": "2023-03-11",
            "name": "[parameters('dataCollectionRules_dcr_sentinel_workspace_xdr_lake_routing_name')]",
            "location": "westeurope",
            "kind": "WorkspaceTransforms",
            "properties": {
                "dataSources": {},
                "destinations": {
                    "logAnalytics": [
                        {
                            "workspaceResourceId": "[parameters('workspaces_sec_management_siemlog_prd_001_externalid')]",
                            "name": "5511c2c9b1764fed860fa62716b60686"
                        }
                    ]
                },

Replace the template content (dataFlows-part) with the configuration below. The template configures data flows for multiple XDR tables simultaneously:

"dataFlows": [
  {
    "streams": [
      "Microsoft-Table-DeviceProcessEvents"
    ],
    "destinations": [
      "5511c2c9b1764fed860fa62716b60686"
    ],
    "outputStream": "Custom-DeviceProcessEventsLake_CL"
  },
  {
    "streams": [
      "Microsoft-Table-DeviceNetworkEvents"
    ],
    "destinations": [
      "5511c2c9b1764fed860fa62716b60686"
    ],
    "outputStream": "Custom-DeviceNetworkEventsLake_CL"
  },
  {
    "streams": [
      "Microsoft-Table-DeviceFileEvents"
    ],
    "destinations": [
      "5511c2c9b1764fed860fa62716b60686"
    ],
    "outputStream": "Custom-DeviceFileEventsLake_CL"
  },
  {
    "streams": [
      "Microsoft-Table-CloudAppEvents"
    ],
    "destinations": [
      "5511c2c9b1764fed860fa62716b60686"
    ],
    "outputStream": "Custom-CloudAppEventsLake_CL"
  },
  {
    "streams": [
      "Microsoft-Table-EmailEvents"
    ],
    "destinations": [
      "5511c2c9b1764fed860fa62716b60686"
    ],
    "outputStream": "Custom-EmailEventsLake_CL"
  }
]

You will end up with a template like this:

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "dataCollectionRules_dcr_sentinel_workspace_xdr_lake_routing_name": {
            "defaultValue": "dcr-sentinel-workspace-xdr-lake-routing",
            "type": "String"
        },
        "workspaces_sec_management_siemlog_prd_001_externalid": {
            "defaultValue": "/subscriptions/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/resourceGroups/sec-management-siemlog-prd-001/providers/microsoft.operationalinsights/workspaces/sec-management-siemlog-prd-001",
            "type": "String"
        }
    },
    "variables": {},
    "resources": [
        {
            "type": "Microsoft.Insights/dataCollectionRules",
            "apiVersion": "2023-03-11",
            "name": "[parameters('dataCollectionRules_dcr_sentinel_workspace_xdr_lake_routing_name')]",
            "location": "westeurope",
            "kind": "WorkspaceTransforms",
            "properties": {
                "description": "Routes XDR tables directly to Data Lake custom tables, bypassing Analytics tier ingestion costs",
                "dataSources": {},
                "destinations": {
                    "logAnalytics": [
                        {
                            "workspaceResourceId": "[parameters('workspaces_sec_management_siemlog_prd_001_externalid')]",
                            "name": "5511c2c9b1764fed860fa62716b60686"
                        }
                    ]
                },
                "dataFlows": [
                    {
                        "streams": [
                            "Microsoft-Table-DeviceProcessEvents"
                        ],
                        "destinations": [
                            "5511c2c9b1764fed860fa62716b60686"
                        ],
                        "outputStream": "Custom-DeviceProcessEventsLake_CL"
                    },
                    {
                        "streams": [
                            "Microsoft-Table-DeviceNetworkEvents"
                        ],
                        "destinations": [
                            "5511c2c9b1764fed860fa62716b60686"
                        ],
                        "outputStream": "Custom-DeviceNetworkEventsLake_CL"
                    },
                    {
                        "streams": [
                            "Microsoft-Table-DeviceFileEvents"
                        ],
                        "destinations": [
                            "5511c2c9b1764fed860fa62716b60686"
                        ],
                        "outputStream": "Custom-DeviceFileEventsLake_CL"
                    },
                    {
                        "streams": [
                            "Microsoft-Table-CloudAppEvents"
                        ],
                        "destinations": [
                            "5511c2c9b1764fed860fa62716b60686"
                        ],
                        "outputStream": "Custom-CloudAppEventsLake_CL"
                    },
                    {
                        "streams": [
                            "Microsoft-Table-EmailEvents"
                        ],
                        "destinations": [
                            "5511c2c9b1764fed860fa62716b60686"
                        ],
                        "outputStream": "Custom-EmailEventsLake_CL"
                    }
                ]
            }
        }
    ]
}

Understanding the template:

Each dataFlow entry routes data from a source stream to a destination table. The streams value must match Microsoft’s exact naming (e.g., Microsoft-Table-DeviceProcessEvents). The outputStream must include the Custom- prefix for custom tables. The transformKql value of "source" means no filtering – all data passes through. You can add KQL filtering here if needed (covered later).

Click Save → Review + create → Create to deploy the DCR

Step 3: Verify data flow

After deploying the DCR, data should start flowing to your custom tables within 15-30 minutes. Let’s verify everything is working correctly.

Check 1: Original table status

DeviceProcessEvents
| take 10

Expected: Low count or zero (data is being routed to the custom table)

Check 2: Custom table receiving data

DeviceProcessEventsLake_CL
| take 10

Datalake
Microsoft Sentinel
XDR

Expected: Data flowing with proper schema

Monitor cost savings

Track the effectiveness of your optimization with these queries:

Volume comparison:

let OriginalVolume = toscalar(DeviceProcessEvents | where TimeGenerated > ago(24h) | count);
let OptimizedVolume = toscalar(DeviceProcessEventsLake_CL | where TimeGenerated > ago(24h) | count);
print 
    OriginalTableEvents = OriginalVolume,
    DataLakeTableEvents = OptimizedVolume,
    RoutingSuccess = iff(OptimizedVolume > OriginalVolume, "Working", "Check DCR")

Cost savings estimate KQL example:

let DailyVolumeGB = toscalar(
    DeviceProcessEventsLake_CL
    | where TimeGenerated > ago(24h)
    | extend DataSizeBytes = estimate_data_size(*)
    | summarize TotalBytes = sum(DataSizeBytes)
    | extend GB = TotalBytes / 1024 / 1024 / 1024
    | project GB
);
// Updated costs for 2025 GA pricing (East US region)
let AnalyticsIngestionCost = 4.30;  // Per GB (updated from $3.207)
let DataLakeIngestionCost = 0.05;    // Per GB
let DataLakeProcessingCost = 0.10;   // Per GB
let DataLakeStorageCost = 0.02;      // Per GB per month (compressed)
// Calculate monthly costs
let StandardMethodCost = DailyVolumeGB * 30 * AnalyticsIngestionCost;
let OptimizedMethodCost = (DailyVolumeGB * 30 * (DataLakeIngestionCost + DataLakeProcessingCost)) + (DailyVolumeGB * 30 * DataLakeStorageCost);
let MonthlySavings = StandardMethodCost - OptimizedMethodCost;
print 
    DailyVolumeGB = round(DailyVolumeGB, 2),
    MonthlyVolumeTB = round((DailyVolumeGB * 30) / 1024, 2),
    StandardMethodCost = strcat("$", round(StandardMethodCost, 2)),
    OptimizedMethodCost = strcat("$", round(OptimizedMethodCost, 2)),
    EstimatedMonthlySavings = strcat("$", round(MonthlySavings, 2)),
    SavingsPercentage = strcat(round((MonthlySavings / StandardMethodCost) * 100, 1), "%")

Optional: Advanced filtering with transformKql

The examples (template) above don’t use "transformKql": "source" which routes all data without filtering. You can add KQL transformations to filter data before ingestion, providing even more cost savings.

Filtering at the DCR level provides maximum savings because filtered data never reaches your workspace. However, ensure you’re not filtering out data needed for compliance or future investigations. When in doubt, keep everything and filter during queries instead.

Data analysis and operations

Data lake exploration with KQL

Accessing data lake exploration interface

Navigation steps:

Navigate to Microsoft Defender Portal → Microsoft Sentinel → Data lake exploration
Select target workspace from dropdown menu
Configure appropriate time range for analysis
Begin KQL query development and execution

Query capabilities and limitations

Supported KQL control commands: Microsoft’s data lake documentation shows the following control commands are currently supported:

.show version
.show databases
.show databases entities
.show database

Performance and functional limitations:

Limitation category	Specification	Recommended workaround
Query timeout	8 minutes maximum	Use time-based filtering and sampling techniques
Result size	500,000 rows or 64MB	Implement summarization and aggregation
Concurrent queries	45 per minute per workspace	Plan and stagger query execution schedules
Query scope	Single workspace per query	Design workspace-specific analysis approaches

Sample queries and optimization

Suspicious travel activity analysis:

SigninLogs
| where TimeGenerated >= ago(180d)
| where ResultType == 0
| summarize CountriesAccessed = make_set(Location) by UserPrincipalName
| where array_length(CountriesAccessed) > 3  // Adjust threshold as needed

Time-first filtering approach:

// Efficient query pattern leveraging time partitioning
SecurityEvent
| where TimeGenerated > ago(7d)     // Partition elimination first
| where Computer startswith "DC"    // Apply specific filters second
| where EventID in (4624, 4625)     // Multiple conditions last
| summarize count() by Computer

Jupyter notebooks for security analytics

Step 1: Install and configure VS Code extension

Required components:

Visual Studio Code (latest version)
Microsoft Sentinel extension for VS Code
Python environment with appropriate packages

Installation process:

Open Visual Studio Code
Navigate to Extensions (Ctrl+Shift+X)
Search for “Microsoft Sentinel”
Click Install on the Microsoft Sentinel extension
Microsoft Sentinel shield icon appears in left toolbar

Step 2: Configure compute resources

Select runtime

Runtime pool options:

Pool size	Compute resources	Memory	Recommended use cases	Cost considerations
Small	4 cores	8GB	Basic analytics, data exploration, development	Lower cost, suitable for small datasets
Medium	8 cores	16GB	Standard security analytics, reporting	Moderate cost, production workloads
Large	16 cores	32GB	ML models, complex analysis, large datasets	Higher cost, resource-intensive operations

Microsoft’s billing documentation shows compute hours are calculated by multiplying the number of cores with session time, affecting Advanced Data Insights billing.

Step 3: Develop security analytics workflows

Basic data exploration example:

from sentinel_lake.providers import MicrosoftSentinelProvider
import pandas as pd
import matplotlib.pyplot as plt

# Initialize data provider
data_provider = MicrosoftSentinelProvider(spark)

# Read data from custom data lake table
df = data_provider.read_table("DeviceNetworkEventsLake_CL")
df.select("TimeGenerated", "DeviceName", "ProcessName", "ProcessCommandLine").show(10)

Automated jobs and scheduling

Understanding data lake jobs

Microsoft’s job documentation shows that jobs enable automated analysis and data promotion between tiers. Two primary job types are supported:

KQL jobs: Query-based analytics using KQL syntax
Notebook jobs: Python/Spark analytics using Jupyter notebooks

Creating and managing KQL jobs

Navigate to Microsoft Defender Portal → Microsoft Sentinel → Data lake exploration
Develop and test your KQL query
Click Create job button in upper right corner

Job limitations during preview:

Limitation category	Specification	Impact on implementation
Concurrent executions	3 per tenant maximum	Requires queue management and scheduling coordination
Job timeout	1 hour maximum execution time	Necessitates query optimization for complex operations
Enabled jobs	100 per tenant maximum	Requires lifecycle planning and job prioritization

Performance monitoring and optimization

Microsoft’s data lake exploration documentation shows search functionality within results and comprehensive query management capabilities.

Performance optimization guidelines:

Performance area	Limitation	Optimization strategy
Query execution time	8 minutes maximum	Implement time-based filtering and data sampling
Result set size	500,000 rows or 64MB	Use summarization and aggregation techniques
Concurrent operations	45 queries per minute	Plan and schedule query execution appropriately
Data processing	Single workspace scope	Design workspace-specific analysis workflows

Troubleshooting common issues

Onboarding error codes

Microsoft’s troubleshooting documentation provides specific error codes for common onboarding issues:

DL101: Regional mismatch error

Error: Can't complete setup
Description: Primary workspace region differs from tenant home region
Resolution: Ensure workspace is in same region as tenant home region

DL102: Resource availability error

Error: Lack of Azure resources in region
Description: Insufficient Azure resources available during provisioning
Resolution: Retry setup or attempt during off-peak hours

DL103: Azure policy restrictions

Error: Azure policies prevent resource creation
Description: Organizational policies blocking required resource creation
Resolution: Create policy exemption for Microsoft.SentinelPlatformServices

Data integration troubleshooting

Data not appearing in data lake:

Wait period: Refer to the sections above for estimated timeline expectations
Connector verification: Verify data connector configuration and status
Retention settings: Check tier configuration and retention policies
Permissions: Validate managed identity permissions and role assignments

Best practices and governance

Security and access management:

Implement principle of least privilege access
Conduct regular access reviews and role assignments
Use Unified RBAC for granular workspace-specific permissions
Monitor data access through comprehensive audit logs

Cost management best practices:

Use time-based filtering for optimal partition elimination
Implement appropriate sampling techniques for large datasets
Monitor query patterns and associated costs regularly
Balance operational needs with cost optimization objectives

Additional information

Official Microsoft documentation:

Community resources:

Conclusion

When I first started testing with Microsoft Sentinel data lake, I’ll be honest – the pricing models and retention tiers felt overwhelming. But after implementing it across multiple environments and seeing the cost savings firsthand, I’m convinced this is one of the most significant improvements Microsoft has made to Sentinel.

Why this matters to me (and probably to you too):

We’ve all been in those budget meetings where leadership questions why security data retention costs so much. The data lake finally gives us a real answer to that problem. Being able to keep years of security logs without burning through budget means we can actually do our jobs properly – hunt for threats, investigate incidents thoroughly, and meet compliance requirements without constantly worrying about costs.

My honest take on implementation:

The standard retention configuration through the Defender portal is straightforward – you can set it up in minutes. For most Sentinel tables, it just works and the 90 days of free analytics storage is genuinely helpful. Where it gets tricky is with XDR tables. That 30-day limit catches a lot of people off guard, and suddenly you’re looking at massive ingestion bills if you’re not careful.

What I’ve learned along the way:

Start simple. Don’t try to optimize everything at once. Get comfortable with basic retention configuration first, understand your data volumes, and then decide if the advanced DCR approach makes sense for your environment. Monitor your costs closely in the first few months – you’ll quickly see which tables are driving expenses and where optimization efforts will have the biggest impact.

Also, don’t underestimate the value of having historical data readily available. I’ve been in too many investigations where we needed logs from 6+ months ago and they were either gone or stuck in some archived format that took hours to access. The data lake solves that problem elegantly.

Looking ahead:

Microsoft Sentinel data lake is now GA, which means it’s stable and production-ready. The pricing has settled down, the features are solid, and the performance is good. I’m excited to see where Microsoft takes this – the foundation is there for some really powerful capabilities around AI-driven analysis and long-term behavioral detection. I will try to cover MCP and Graph in my next posts, also digging more in Jupyter and Notebooks soon :-).

If you’re still on the fence about enabling data lake, my advice is simple: just do it. The automatic mirroring from analytics to data lake costs nothing, and you’ll immediately have a safety net for your data. You can always optimize later, but at least you won’t lose historical data while you’re figuring out your strategy.

Final thoughts:

Security operations is hard enough without having to choose between doing the right thing and staying within budget. The data lake removes that impossible choice. Yes, there’s complexity to manage. Yes, you need to understand the cost model. But the alternative – deleting security logs because you can’t afford to keep them – is far worse.

Take the time to understand your options, configure things properly, and you’ll have a data retention strategy that actually works for both security and finance. And honestly? That’s a win we don’t get very often in this field.

Leave a Reply Cancel reply

Previous Post

Automatic Attack Disruption XDR M365 Defender

Automatic Attack Disruption in Microsoft Defender XDR

13 August 2025

Next Post

Proactively Block Cloud Apps (like AI) with Microsoft Defender for Cloud Apps and Defender for Endpoint

24 October 2025