Approach

Phase 1: Discovery and Data Collection
- Objective: Understand the current state of TMF's IT environment.
- Activities:
  - Inventory all Azure resources.
  - Identify application and resource dependencies with the customer's help.
  - Conduct stakeholder interviews to gather insights into critical applications.
  - Collect data from Azure Monitor, Log Analytics, and Application Insights.
  - Document existing policies, governance practices, and tagging strategies.
Phase 2: Pilot Mapping and Baseline Establishment
- Objective: Develop a reference model using a pilot application.
- Activities:
  - Choose a representative application with TMF's input.
  - Map resource dependencies and utilization metrics.
  - Establish benchmarks for CPU, memory, disk I/O, and network usage.
  - Analyze application performance metrics using Application Insights.
  - Identify initial optimization opportunities based on the pilot.
Phase 3: Scalable Assessment
- Objective: Scale the methodology to assess all applications and resources.
- Activities:
  - Extend the pilot methodology to other applications and environments.
  - Combine automated tools with manual analysis, drawing on extensive Azure experience.
  - Categorize applications by criticality, cost, and complexity.
  - Document findings and prioritize actions based on ROI.
Phase 4: Governance and Policy Implementation
- Objective: Ensure resource control and enforce best practices.
- Activities:
  - Design and implement Azure Policies to restrict resource sprawl.
  - Develop a comprehensive tagging framework for cost management and visibility.
  - Implement role-based access control (RBAC) for secure resource management.
  - Automate policy enforcement using Azure Blueprints.
Phase 5: Optimization Planning
- Objective: Develop actionable recommendations for optimization.
- Activities:
  - Propose rightsizing for underutilized or oversized resources.
  - Identify workloads for autoscaling configurations.
  - Recommend PaaS or SaaS options for refactoring applications.
  - Highlight opportunities for cost savings through reserved instances and savings plans.
Phase 6: Reporting and Final Presentation
- Objective: Deliver insights and align with stakeholders.
- Activities:
  - Compile a detailed final report summarizing findings and recommendations.
  - Provide a roadmap for implementing recommendations, including estimated ROI and timelines.
Steps for Benchmarking Applications Running on Azure
Benchmarking involves establishing performance, utilization, and cost baselines for applications to identify optimization opportunities. Here's how to systematically benchmark applications running on Azure:
1. Inventory and Dependency Mapping
- Objective: Identify all resources associated with the application and its dependencies.
  - Use Azure Resource Graph Explorer, PowerShell, or the Azure CLI to list resources tagged with the application's name:
    az resource list --tag ApplicationName=YourApp
  - Map resource dependencies using Azure Monitor Service Map or Network Watcher.
  - Meet with stakeholders.
- Output: Complete inventory of the application's resources and a dependency graph. Mapping dependencies ensures you don't break critical integrations or services during optimization or deprovisioning.
How It Helps:
- Prevents accidental deletion of dependent resources.
- Enables accurate scope definition for optimization and migration planning.
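To illustrate how the inventory output can feed this step, here is a minimal Python sketch that groups the JSON emitted by `az resource list` into a per-application inventory. The `ApplicationName` tag matches the CLI example above; the sample records and the `UNTAGGED` bucket are assumptions for illustration, not engagement tooling:

```python
import json
from collections import defaultdict

# Sample of the JSON shape returned by `az resource list` (fields trimmed).
az_output = json.loads("""
[
  {"name": "vm-web-01", "type": "Microsoft.Compute/virtualMachines",
   "tags": {"ApplicationName": "YourApp", "Environment": "prod"}},
  {"name": "sql-core", "type": "Microsoft.Sql/servers",
   "tags": {"ApplicationName": "YourApp", "Environment": "prod"}},
  {"name": "vm-build", "type": "Microsoft.Compute/virtualMachines",
   "tags": null}
]
""")

def build_inventory(resources):
    """Group resources by ApplicationName tag; untagged go to 'UNTAGGED'."""
    inventory = defaultdict(list)
    for res in resources:
        tags = res.get("tags") or {}
        app = tags.get("ApplicationName", "UNTAGGED")
        inventory[app].append((res["name"], res["type"]))
    return dict(inventory)

inv = build_inventory(az_output)
print(inv["UNTAGGED"])  # untagged resources surface immediately
```

Surfacing an explicit `UNTAGGED` bucket makes gaps in the tagging strategy visible before any optimization work starts.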
2. Monitor Resource Utilization
- Objective: Gather real-time and historical performance data for resources. This helps us understand how resources are actually used and prevents over- or under-provisioning.
  - Enable Azure Monitor for each resource (App Service, VMs, databases, etc.).
  - Key metrics to monitor:
    - App Service: CPU, memory, request latency, throughput.
    - Azure SQL Database: DTU/vCore utilization, query performance.
    - VMs: CPU, memory, disk I/O, network bandwidth.
    - Azure Storage: transaction metrics, data ingress/egress.
  - Configure data collection in a Log Analytics workspace for centralized monitoring.
- Output: Performance dashboards and utilization reports.
How It Helps:
- Informs right-sizing decisions.
- Helps trigger auto-scaling rules.
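The average-versus-peak comparison behind right-sizing can be sketched in a few lines of Python. The 20%/75% thresholds and the sample data are illustrative assumptions, not fixed guidance:

```python
# Hypothetical CPU-percent samples, e.g. exported from Azure Monitor.
cpu_samples = {
    "vm-web-01": [12, 15, 11, 14, 18, 13, 12],  # consistently idle
    "vm-batch":  [78, 82, 85, 80, 79, 88, 84],  # consistently hot
}

def classify(samples, low=20, high=75):
    """Rough right-sizing signal from average vs. peak utilization."""
    avg = sum(samples) / len(samples)
    peak = max(samples)
    if avg < low and peak < high:
        return "downsize-candidate"
    if avg > high:
        return "upsize-or-scale-out"
    return "keep"
```

In practice the same comparison would run over 30+ days of metric data, per the monitoring window described above.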
3. Collect Application Insights
- Objective: Analyze application-level metrics and performance.
- Steps:
  - Enable Application Insights for the application.
  - Track key metrics:
    - Response times.
    - Error rates.
    - Dependency failures (e.g., database calls).
    - User load patterns.
  - Review the Performance and Failures tabs for actionable insights.
- Output: Application performance reports highlighting bottlenecks and usage trends.
How It Helps:
- Supports application tuning and faster root-cause analysis.
- Improves resiliency and responsiveness of the app.
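As a sketch of the analysis these metrics support, the snippet below computes an error rate and a nearest-rank p95 latency from request records shaped loosely like App Insights `requests` telemetry. The field names and sample values are assumptions:

```python
import math

# Hypothetical request telemetry rows (duration in ms, success flag).
requests = [
    {"name": "GET /orders", "duration_ms": 120, "success": True},
    {"name": "GET /orders", "duration_ms": 450, "success": True},
    {"name": "GET /orders", "duration_ms": 2300, "success": False},
    {"name": "GET /orders", "duration_ms": 180, "success": True},
]

def summarize(rows):
    """Error rate plus nearest-rank p95 latency over a set of requests."""
    durations = sorted(r["duration_ms"] for r in rows)
    failures = sum(1 for r in rows if not r["success"])
    # Nearest-rank percentile: index ceil(0.95 * n) - 1
    p95 = durations[math.ceil(0.95 * len(durations)) - 1]
    return {"error_rate": failures / len(rows), "p95_ms": p95}
```

The equivalent in production would be a Kusto query over the `requests` table rather than client-side Python.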
4. Establish Utilization Patterns
- Objective: Identify normal, peak, and underutilized usage patterns to support planning for auto-scale, shutdown, or resizing.
  - Analyze usage trends over a representative period (e.g., 30 days).
  - Use Azure Metrics to view hourly, daily, and monthly usage patterns.
  - Correlate usage data with business activities (e.g., peak hours, campaigns).
- Output: Detailed utilization patterns and baseline thresholds.
How It Helps:
- Enables time-based optimizations like auto-shutdown for dev/test or scheduled scaling.
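One way to turn an hourly usage profile into a shutdown window is sketched below. The hourly CPU averages and the 10% idle threshold are hypothetical:

```python
# Hypothetical average CPU percent per hour of day (0-23), e.g. derived
# from 30 days of Azure Metrics data for a dev/test VM.
hourly_avg_cpu = [3, 2, 2, 2, 3, 4, 10, 35, 60, 70, 72, 68,
                  65, 70, 66, 62, 55, 40, 15, 5, 4, 3, 3, 2]

def idle_hours(profile, threshold=10):
    """Hours whose average utilization sits below `threshold` percent;
    candidates for a scheduled auto-shutdown window."""
    return [hour for hour, cpu in enumerate(profile) if cpu < threshold]
```

A contiguous run of idle hours (here roughly 7PM to 6AM) would become the input to an Azure Automation shutdown schedule.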
5. Conduct Load Testing
- Objective: Simulate user activity to evaluate application performance under varying loads, and identify performance breaking points before users do.
  - Use Azure Load Testing or third-party tools like JMeter or Locust.
  - Define test scenarios that mimic real-world workloads (e.g., peak user traffic).
  - Monitor resource performance during tests using Azure Monitor and Application Insights.
- Output: Load test results with recommendations for scaling and optimization.
How It Helps:
- Informs capacity planning and auto-scaling thresholds.
- Validates infrastructure readiness for high demand (e.g., campaigns, product launches).
6. Cost Benchmarking
- Objective: Understand cost allocation and identify areas for savings. A cost breakdown shows what is driving spend and why, and surfaces wasteful or inefficient spending.
  - Use Azure Cost Management to analyze cost by resource, service, and application.
  - Identify underutilized or idle resources contributing to unnecessary costs.
  - Review recommendations in Azure Advisor for cost-saving opportunities.
- Output: Cost breakdown reports and optimization suggestions.
How It Helps:
- Enables cost governance policies.
- Helps forecast and set budgets per app or department.
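Tag-based cost aggregation over exported cost rows can be sketched as follows. The row shape and tag names are assumptions; the point is that untagged spend is made explicit rather than hidden:

```python
# Hypothetical cost export rows (daily cost in USD plus tags).
cost_rows = [
    {"resource": "vm-web-01", "cost": 42.0, "tags": {"CostCenter": "IT"}},
    {"resource": "sql-core",  "cost": 95.5, "tags": {"CostCenter": "IT"}},
    {"resource": "vm-lab",    "cost": 18.0, "tags": {}},
]

def cost_by_cost_center(rows):
    """Aggregate spend per CostCenter tag; untagged spend stays visible."""
    totals = {}
    for r in rows:
        center = r["tags"].get("CostCenter", "UNTAGGED")
        totals[center] = totals.get(center, 0.0) + r["cost"]
    return totals
```

Azure Cost Management's "Cost by Tag" view performs this grouping natively; the sketch only shows the shape of the report.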
7. Compare Against Industry Standards
- Objective: Ensure the application meets performance and security benchmarks, producing a report and prioritized improvement list to act upon.
  - Use the Azure Well-Architected Framework Review tool to evaluate the application.
  - Compare findings with industry standards for performance, security, and reliability.
  - Identify gaps and prioritize improvements.
- Output: Compliance report and prioritized improvement list.
How It Helps:
- Guides the roadmap for architectural and governance improvements.
- Justifies investments in refactoring or modernization.
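Turning control results into per-pillar scores for prioritization might look like the sketch below. The pillar names follow the Well-Architected Framework, but the control records and scoring scheme are hypothetical:

```python
from collections import defaultdict

# Hypothetical control results from a Well-Architected / Defender review.
controls = [
    {"pillar": "Security",    "passed": True},
    {"pillar": "Security",    "passed": False},
    {"pillar": "Reliability", "passed": True},
    {"pillar": "Cost",        "passed": False},
]

def pillar_scores(results):
    """Percent of passed controls per pillar; the lowest-scoring pillar
    becomes the first priority on the improvement list."""
    totals, passed = defaultdict(int), defaultdict(int)
    for c in results:
        totals[c["pillar"]] += 1
        passed[c["pillar"]] += c["passed"]
    return {p: round(100 * passed[p] / totals[p]) for p in totals}
```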
8. Document and Validate Benchmarks
- Objective: Compile a report of established benchmarks and validate it with stakeholders, so everyone agrees on what is "normal" for each app and the business, and what needs improvement. This captures everything in one place for stakeholder validation and future reference.
  - Document baseline metrics, including:
    - Resource utilization thresholds.
    - Performance benchmarks.
    - Cost metrics.
  - Validate benchmarks with TMF stakeholders to ensure alignment with business goals.
- Output: Final benchmarking report ready for optimization planning.
How It Helps:
- Becomes the foundation for governance policy design.
- Drives data-backed decision-making.

A benchmarking report provides data-driven insights into resource utilization, cost efficiency, and performance. This information is essential for implementing Azure governance and baseline policies to prevent wastage and optimize spending.
Technical Execution of Baseline Benchmarking on Azure
1. Inventory and Dependency Mapping

What to Check | Tools Used | Action / How It's Done
---|---|---
Full Inventory of Resources | Azure Resource Graph, Azure CLI, PowerShell, Azure Portal, Azure REST API, monitoring tool | Run queries to list all VMs, storage, databases, App Services, etc., grouped by workload
Application to Infra Mapping | Stakeholder interviews, tags, naming conventions, resource groups, management groups | Interview app owners and correlate apps to the infra stack; validate by environment and tier
Network & App Dependencies | Azure Monitor dependency map, Network Watcher, Log Analytics | Visualize inter-service calls and port dependencies
Identify Shared Services | Network topology (via Network Watcher), peering maps, application diagrams | Identify common services like AD, DNS, SQL, Bastion shared across apps
Service Connectivity Patterns | Azure Activity Logs, NSG flow logs, Application Gateway diagnostics, firewalls, VPN | Analyze east-west and north-south traffic for planning segmentation or isolation
2. Monitor Resource Utilization

What to Check | Tools Used | Action / How It's Done
---|---|---
VM Performance (CPU, Mem, Disk) | Azure Monitor, Log Analytics, Azure Metrics, Insights, monitoring tools in place | Enable diagnostics; review average vs. peak CPU, memory, disk I/O over 30+ days
App Service Metrics | Azure Monitor, Application Insights, APM tool in place | Monitor requests/sec, response time, memory %, and throughput; check for autoscale triggers
Azure SQL DB/MI Usage | Query Performance Insight, Azure SQL Analytics, Metrics blade | Track DTU/vCore usage, long-running queries, deadlocks, and connection spikes
Storage Usage | Azure Monitor, storage metrics, capacity alerts | Monitor egress/ingress, request latency, IOPS, and blob capacity over time
Network Bandwidth | Network Watcher, NSG flow logs, Azure Metrics | Evaluate outbound/inbound traffic trends for sizing and security optimization
3. Collect Application Insights

What to Check | Tools Used | Action / How It's Done
---|---|---
Application Performance | Application Insights (Azure-native) or APM in use, monitoring tool in place | Enable App Insights via Azure Portal, SDK, or ARM/Bicep templates for supported platforms
Response Time & Latency Trends | Application Insights – Performance blade, APM in use | Track response times per API or operation; identify slow endpoints
Error Rates & Exceptions | Application Insights – Failures blade | Analyze failed requests, unhandled exceptions, and HTTP status code distribution
Dependency Failures | Application Insights – Dependencies view | Detect failures in backend services like SQL DB, Redis, or APIs; correlate with spike timing
User Load Patterns | Application Insights – Users/Sessions view | Identify peak usage hours, geography, session duration, and entry/exit pages
Transaction Tracing | Application Map & Transaction Search | Visualize request flow; use distributed tracing to pinpoint performance bottlenecks
4. Establish Utilization Patterns

What to Check | Tools Used | Action / How It's Done
---|---|---
Daily/Weekly Usage Patterns | Azure Monitor, Log Analytics, Azure Metrics | Collect and chart resource utilization over 30+ days (CPU, memory, DTUs, IOPS)
Peak vs. Idle Hours | Workbooks, Application Insights – Users & Sessions | Analyze traffic and usage during business hours vs. off-hours
Seasonal or Campaign-Based Spikes | Azure Metrics correlation, App Insights + business event mapping | Match resource spikes to business events (e.g., month-end jobs, promotions)
Service Uptime & Scaling History | Autoscale logs, Activity Logs | Review when autoscaling was triggered or a service restarted due to load
Time-Based Shutdown Opportunities | Azure Automation, scheduled events | Identify consistent idle periods (e.g., 7PM–7AM) to schedule auto-shutdown for savings
5. Conduct Load Testing

What to Check | Tools Used | Action / How It's Done
---|---|---
Load Testing Capabilities | Azure Load Testing (native), JMeter, Locust (customer) | Simulate user traffic to test application behavior under stress
Real-Time Performance Under Load | Azure Monitor, Application Insights, in-use tools | Monitor CPU, memory, response time, and failures during the load test
Test Result Analysis | Azure Load Testing reports, logs, dashboards | Identify bottlenecks, configure scaling rules, plan for pre-scaling during heavy usage
Integration into CI/CD | Azure DevOps, GitHub Actions | Automate load tests as part of release pipelines to validate each build under load
6. Cost Benchmarking

What to Check | Tools Used | Action / How It's Done
---|---|---
Cost by Resource, Service, Subscription | Azure Cost Management + Billing, Azure Resource Graph, stakeholders | View cost breakdown by service/resource group/app using "Cost by Resource" and "Cost by Tag" views
Unused or Idle Resources | Azure Advisor, Cost Analysis, Azure Monitor | Identify VMs, disks, NICs, IPs, or PaaS services with little or no usage
High-Cost Services with Low Utilization | Azure Monitor + Cost Management + manual effort | Correlate top spenders with underutilized metrics (e.g., a high-cost VM using 5% CPU)
Opportunities for Reservations or Savings Plans | Azure Advisor, Reservations blade, Savings Plans (preview) | Identify workloads with steady usage to move to Reserved Instances or Savings Plans
Department or Project Spend Allocation | Tags (CostCenter, AppName, Owner), cost allocation reports | Apply and enforce tagging, then use tag-based reports to show department- or team-level costs
Budget & Forecasting | Budgets in Azure Cost Management | Set monthly budgets per app, department, or subscription with email alerts and forecasting
7. Compare Against Industry Standards (WAF)

What to Check | Tools Used | Action / How It's Done
---|---|---
Alignment with WAF Best Practices | Azure Well-Architected Review tool (portal or PDF), Microsoft Learn | Perform a structured WAF assessment across the 5 pillars (Cost, Security, Operations, Performance, Reliability)
Secure Score & Compliance Posture | Microsoft Defender for Cloud – Secure Score, Compliance blade | Review the current Secure Score and drill into controls not met (e.g., NSG, encryption, patching)
Gaps in Architecture Design | WAF review + architectural diagrams | Identify missing design patterns (e.g., HA, autoscale, tagging, DR strategy)
Security Benchmark CIS (Azure default) | Defender for Cloud, Azure Policies | Check mapped standards and the list of passed/failed controls
Comparison to Reference Architectures | Microsoft Azure Architecture Center, Landing Zone Accelerator docs | Use reference architectures to benchmark the current setup for each workload type
Prioritized Improvement List | WAF review summary, Secure Score action plan | Document prioritized and quick-win areas for improvement based on criticality and impact
8. Document and Validate Benchmarks

What to Check | Tools Used | Action / How It's Done
---|---|---
Consolidated Performance & Utilization Metrics | Azure Monitor, Log Analytics, Application Insights, Cost Management | Export dashboards, charts, or workbook summaries showing baseline CPU, memory, cost, availability
Cost Benchmarks with Trends | Azure Cost Management, Cost Analysis reports, Advisor snapshots | Summarize top spenders, idle resources, and forecasted trends over the past 30–90 days
Security & Governance Baseline | Defender for Cloud reports, policy compliance reports, Secure Score | Export Secure Score, policy compliance, and RBAC audit reports
Business-Accepted "Normal" Thresholds | Stakeholder interviews, workbook snapshots, shared dashboards | Review utilization and cost baselines with app owners to validate what's "normal" or expected
Stakeholder Sign-Off on Benchmark Report | Excel, Power BI, SharePoint, or PowerPoint | Compile all benchmarking data into a unified report; review in a workshop; finalize scope
Optimization Readiness Snapshot | All of the above | Use the validated benchmark report as the foundation for optimization, governance, or modernization planning
Implementation of Governance and Optimization

1. Defining & Enforcing Cost Governance Policies
- Use the benchmarking report to identify idle, underutilized, or overprovisioned resources.
- Implement Azure Policies to automatically detect and deallocate unused VMs, storage, and databases.
- Enforce budget limits using Azure Cost Management + Billing.
Example Policy:
- Auto-shutdown non-production VMs after business hours.
- Restrict VM sizes to cost-effective SKUs for dev/test environments.
- Set budgets with cost alerts in Azure Cost Management.
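The auto-shutdown rule above could be prototyped as a simple eligibility check before wiring it into Azure Automation. The `Environment` tag name and the 7AM–7PM business-hours window are assumptions for illustration:

```python
from datetime import time

BUSINESS_HOURS = (time(7, 0), time(19, 0))  # assumed 7AM-7PM window

def should_shutdown(vm_tags, now):
    """Flag non-production VMs for shutdown outside business hours.
    Tag names are illustrative, not a fixed convention."""
    env = (vm_tags or {}).get("Environment", "").lower()
    if env in ("prod", "production"):
        return False  # production is never auto-stopped
    start, end = BUSINESS_HOURS
    return not (start <= now <= end)
```

In Azure the same rule would typically live in an Automation runbook or a DevTest Labs auto-shutdown schedule rather than custom code.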
2. Automating Performance & Resource Optimization
- Benchmarking identifies CPU, memory, and network bottlenecks.
- Helps identify when and where resources are maxing out or sitting idle.
- Use Azure autoscaling to dynamically scale resources based on real usage patterns.
Example Policy:
- Set up Azure Monitor alerts to detect VMs exceeding 80% CPU for 10+ minutes.
- Auto-scale or move to a larger SKU only if sustained high usage is observed.
- Scale down databases when DTU/vCore utilization is below 10% for 7 consecutive days.
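The "80% CPU for 10+ minutes" rule is essentially a consecutive-run check over per-minute samples, sketched here with invented data:

```python
def sustained_breach(samples, threshold=80, min_consecutive=10):
    """True if `samples` (one reading per minute) stay above `threshold`
    for at least `min_consecutive` consecutive readings."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False

spiky = [85, 90, 30] * 8        # spikes, but never sustained
hot = [70] * 5 + [86] * 12      # 12 straight minutes above 80%
```

Azure Monitor metric alerts express the same idea declaratively via an aggregation window and threshold; this sketch just shows why brief spikes should not trigger a resize.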
3. Implementing Security & Compliance Baselines
- The benchmarking report highlights security gaps like unencrypted storage, open ports, and missing access controls.
- Use Azure Security Center (Microsoft Defender for Cloud) to apply governance policies.
Example Policy:
- Enforce RBAC (role-based access control) for least-privilege access.
- Block resources that are not encrypted at rest.
- Monitor for publicly exposed endpoints and enforce Private Link where required.
🔹 Action: Apply Azure Policy definitions for security compliance.
4. Enforcing Tagging & Naming Conventions
- Use the benchmarking report to track untagged resources.
- Apply Azure Policy to enforce consistent naming and tagging standards.
Example Policy:
- Enforce tagging on resources (e.g., Environment=Production, CostCenter=IT).
- Block deployment of untagged resources to prevent untraceable costs.
- Use Azure DevOps with gates so deployments proceed only when conditions are met.
- Use IaC with preconfigured naming conventions and tagging.
🔹 Action: Implement Azure Policy to enforce tagging and naming conventions.
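A deny-style tagging check can be prototyped client-side (for example, in a DevOps gate) before encoding it as an Azure Policy definition. The required tag set here is an assumption:

```python
REQUIRED_TAGS = {"Environment", "CostCenter", "ApplicationName"}  # assumed policy

def validate_tags(resource_tags):
    """Return the set of missing required tags (empty set = compliant).
    Mirrors a deny-effect Azure Policy check, evaluated client-side."""
    present = set((resource_tags or {}).keys())
    return REQUIRED_TAGS - present
```

A non-empty result would fail the pipeline gate, blocking deployment of untraceable resources, which is exactly the behavior the policy above describes.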
5. Aligning with Azure Well-Architected Framework
- Benchmark results can be evaluated against the Azure Well-Architected Framework, which shows gaps in the architecture compared to WAF principles.
- This helps prioritize improvements in performance, security, cost, and operational efficiency.
🔹 Action: Perform an Azure Well-Architected Review and implement governance recommendations.
**************************************
How Azure Application Insights contributes to:
- 🔍 Baseline Benchmarking
- 💰 Governance and Cost Optimization
- 🏛 Azure Well-Architected Framework (WAF) Assessment
🔹 1. Baseline Benchmarking
Objective: Establish a performance and usage baseline for each application.
How App Insights Helps:
- Captures historical performance data (e.g., average response time, CPU load, number of requests)
- Identifies normal vs. peak usage patterns
- Highlights areas with frequent errors or long latencies
📊 Example: You observe that CPU usage spikes at 2 PM daily and response time increases by 40%; this becomes part of your baseline for scaling or optimization.
Value for Benchmarking:
- Helps determine minimum required resources
- Sets alert thresholds for future deviations
- Informs auto-scaling policies and VM/app sizing
- Forms the "before" picture for comparison after optimization
🔹 2. Governance and Cost Optimization
Objective: Reduce unnecessary spending by identifying inefficient or idle application behavior.
How App Insights Helps:
- Detects unused features (low-engagement endpoints)
- Identifies overprovisioned services (low throughput, low CPU/memory usage)
- Finds dependency failures causing retries or timeouts (wasting compute resources)
💡 Example: A backend API makes redundant calls to a DB, causing timeouts and re-executions. Fixing this saves compute time and reduces load, directly reducing cost.
Governance Use Cases:
- Tag apps and resources by function, environment, or owner for traceability
- Monitor non-compliant response times or security exceptions
- Align apps to resource usage governance policies
🔹 3. Azure Well-Architected Framework (WAF) Assessment
App Insights contributes to multiple WAF pillars, including:

WAF Pillar | App Insights Contribution
---|---
✅ Performance Efficiency | Measures app responsiveness, identifies scaling bottlenecks
✅ Reliability (Resiliency) | Detects failed dependencies and service interruptions
✅ Operational Excellence | Monitors live telemetry, supports alerting and proactive issue fixing
✅ Cost Optimization | Flags inefficient code paths or redundant calls wasting compute
✅ Security (indirect) | Can surface exceptions related to authentication/authorization failures or insecure endpoints

🛠 Example (WAF use case): During a resiliency assessment, App Insights reveals a 7% failure rate in outbound API calls to a third-party payment gateway, a candidate for retry-logic improvement and failover design.
🔁 Summary Table

Goal | How App Insights Helps
---|---
Baseline Benchmarking | Establishes app usage and performance trends over time
Governance | Monitors policy violations, resource inefficiencies, tag-based visibility
Cost Optimization | Flags unnecessary compute usage, dependency issues, and idle resources
WAF Assessment | Provides evidence for performance, resiliency, and operational reviews