When a breach hits, the hardest part isn’t stopping the attack. It’s answering, clearly and fast, “Why did this happen?” and “What will we change so it can’t happen again?” I’ve helped teams write incident timelines where one missing log delayed the whole story by weeks. A good breach postmortem template prevents that pain by forcing you to collect the right data, analyze root causes the right way, and ship fixes you can measure.
A breach postmortem is a document and a meeting that turn an incident into learning you can act on. It’s not blame. It’s not a report for a law firm. It’s the shortest path from “we got hit” to “we’re safer next quarter.”
Below is a practical template you can copy for your next incident response and security review. It includes what to collect, how to analyze root causes (with examples), and how to turn findings into real work with owners, dates, and proof.
What a breach postmortem should produce (and what most teams get wrong)
The output should be a clear timeline, proven root causes, and a prioritized fix plan with measurable results. If your postmortem doesn’t end with tasks that can be verified, it’s just a story.
Here’s what teams often get wrong:
- They collect too much data. Then nobody reads it. Keep evidence focused on the questions you need to answer.
- They stop at “user clicked a phishing link.” That’s not a root cause. It’s a trigger. The real issue is usually training gaps, risky email rules, weak MFA, or missing endpoint controls.
- They jump to fixes before analysis. You end up buying tools you don’t need, or patching the wrong thing.
- They write action items without owners. If there’s no person and no due date, nothing moves.
My rule of thumb: if a postmortem can’t answer “What exact change will reduce risk?” it’s not complete.
Breach postmortem template: sections you should fill in every time
This template is set up like a checklist so you can move from evidence to findings to fixes without gaps.
1) Incident basics (fast context for everyone)
Start with enough detail that a reader can understand the incident without reading the entire evidence pile.
- Incident ID
- Date/time first detected (include time zone)
- Date/time confirmed
- Date/time contained
- Systems/teams involved
- Primary concern (data theft, ransomware, account takeover, service outage)
- What was affected (users, assets, apps, regions)
- Severity rating (your internal scale is fine)
Tip: Write the “who, what, when” in plain language first. Then you can add technical details.
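If you track incidents in a ticketing system or in code, it helps to capture these basics as one structured record so every report starts from the same fields. Here's a minimal sketch in Python; the field names and values are illustrative, not a standard.

```python
from datetime import datetime, timezone

# Illustrative incident-basics record; field names and values are examples only.
incident = {
    "incident_id": "IR-2026-014",
    "detected_at": datetime(2026, 3, 2, 14, 5, tzinfo=timezone.utc),    # keep the time zone explicit
    "confirmed_at": datetime(2026, 3, 2, 15, 40, tzinfo=timezone.utc),
    "contained_at": datetime(2026, 3, 3, 9, 10, tzinfo=timezone.utc),
    "systems_involved": ["sso", "vendor-portal"],
    "primary_concern": "account takeover",
    "affected": {"users": 12, "hosts": 3, "regions": ["eu-west"]},
    "severity": "SEV-2",  # use your internal scale
}
```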
2) Executive summary (the version leaders actually read)
Keep this short: 10 to 15 lines. Leaders should see the impact, the actual root causes (not just symptoms), and the top 3 fixes.
- What happened in 2–3 sentences
- What data or access was at risk
- How long it lasted (dwell time)
- How you contained it
- Top root causes (usually 1–4)
- Top fixes with owners
3) Timeline (every key moment, with proof)
This is the backbone of any breach postmortem template. Build the timeline like a chain: detection → investigation → escalation → containment → eradication → recovery.
For each timeline line, include:
- Timestamp and time zone
- Event type (alert, login, file access, config change, block, patch)
- Source (SIEM rule, EDR alert, firewall log, cloud audit log)
- Evidence link (case ID, log query, screenshot, hash)
- Confidence level (High/Med/Low) based on log quality
In 2026, most teams use cloud audit logs plus EDR plus SIEM. If any one of those is missing, say so. Don’t pretend it’s there.
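A minimal sketch of one timeline entry as structured data (the field names are illustrative): keeping entries structured lets you sort by timestamp and check that no line is missing evidence or a confidence level.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class TimelineEvent:
    timestamp: datetime    # always timezone-aware
    event_type: str        # alert, login, file access, config change, block, patch
    source: str            # SIEM rule, EDR alert, firewall log, cloud audit log
    evidence_link: str     # case ID, log query, screenshot, hash
    confidence: str        # "high", "med", or "low", based on log quality

# Two illustrative entries, sorted into chain order by timestamp.
timeline = sorted(
    [
        TimelineEvent(datetime(2026, 3, 2, 14, 5, tzinfo=timezone.utc),
                      "alert", "SIEM rule v12", "case-114/alert-88", "med"),
        TimelineEvent(datetime(2026, 3, 2, 13, 58, tzinfo=timezone.utc),
                      "login", "SSO sign-in log", "case-114/query-01", "high"),
    ],
    key=lambda event: event.timestamp,
)
```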
4) Scope and impact (what you’re sure about vs. what you’re still checking)
Separate confirmed impact from assumptions. This matters for both trust and legal/compliance steps.
- Confirmed affected accounts
- Confirmed affected hosts
- Confirmed data accessed (file paths, DB tables, bucket names)
- Potentially affected systems (with why)
- Estimated data volume (rows/files/GB) if known
- Customer impact (if any) and notification decision
Opinion: treat “confirmed” the way a detective would. If you can’t prove it, put it in “possible” and assign a follow-up query.
5) Detection and response quality
Don’t just ask “were we alerted?” Ask “did the alert lead us to the right place quickly?”
- Detection method (SIEM alert, EDR, user report)
- Time to first meaningful action
- Mean time to contain (MTTC)
- Was the alert accurate? (true positive/false positive)
- Gaps in tooling (missing logs, wrong retention, no asset inventory)
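These numbers are easy to fudge from memory, so compute them from the recorded timestamps. A minimal sketch, assuming you have the detection, first-action, and containment times from the incident basics:

```python
from datetime import datetime, timezone

# Illustrative timestamps; in practice pull these from the incident record.
first_attacker_activity = datetime(2026, 3, 1, 22, 17, tzinfo=timezone.utc)
detected_at = datetime(2026, 3, 2, 14, 5, tzinfo=timezone.utc)
first_action_at = datetime(2026, 3, 2, 14, 32, tzinfo=timezone.utc)
contained_at = datetime(2026, 3, 3, 9, 10, tzinfo=timezone.utc)

dwell_time = detected_at - first_attacker_activity    # how long the attacker had free access
time_to_first_action = first_action_at - detected_at  # time to first meaningful action
time_to_contain = contained_at - detected_at          # average this across incidents for MTTC

print(dwell_time, time_to_first_action, time_to_contain)
```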
6) Root cause analysis (the “why” section)
This is where your breach postmortem template proves its value. Root cause means the underlying cause that allowed the incident to happen—not the last person who touched a keyboard.
In the next sections, I’ll show a clear method you can use.
7) Corrective and preventive actions (CAPA)
CAPA is a common term from quality and safety work. In security, it means fixes now (corrective) and changes that stop repeats (preventive).
For each action item, record:
- Action title
- Category (identity, endpoint, network, app code, monitoring, process)
- Root cause it addresses (map to a specific finding)
- Owner (name or team)
- Due date
- Effort (S/M/L or person-weeks)
- Verification method (test case, log query, control check)
- Status
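A minimal sketch of one CAPA record with those fields (the names and dates are illustrative); the point is that an item without an owner, a due date, and a verification method can’t be marked done.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CapaItem:
    title: str
    category: str        # identity, endpoint, network, app code, monitoring, process
    root_cause_ref: str  # map to a specific finding, e.g. "RC-1"
    owner: str           # name or team
    due_date: date
    effort: str          # S/M/L or person-weeks
    verification: str    # test case, log query, or control check that proves it worked
    status: str = "open"

fix = CapaItem(
    title="Remove the vendor portal exclusion from conditional access",
    category="identity",
    root_cause_ref="RC-1",
    owner="identity-team",
    due_date=date(2026, 4, 15),
    effort="S",
    verification="Test account sign-in from a non-allowlisted country is blocked and logged",
)
```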
8) Lessons learned (what you’ll do differently next time)
This is the human part. Capture lessons like “we need an incident runbook for X” or “our log retention wasn’t enough.” Keep it practical.
- What worked
- What wasted time
- What to change in runbooks
- What training is needed
- Any decisions you’d make sooner
What evidence to collect for a breach postmortem (so root causes are provable)
Evidence collection is the difference between “we think” and “we know.” The goal is to prove the timeline and the decision points with logs and system records you can re-check later.
Below is a collection list I’ve used across cloud and on-prem incidents. Adjust to your stack.
Core log and system sources
- Identity and access logs: SSO provider (Okta/Azure AD/Auth0) sign-in logs, MFA events, role changes, password resets
- Cloud audit logs: AWS CloudTrail, GCP Cloud Audit Logs, Azure Activity Logs
- EDR telemetry: process start/stop, command lines, script execution, file changes, network connections (examples: CrowdStrike, Microsoft Defender for Endpoint, SentinelOne)
- SIEM event data: alert metadata, rule version, related events, enrichment fields
- Server and endpoint logs: Windows Event Logs (Security/System), Linux auth.log, application logs
- Network logs: firewall, proxy, DNS logs, load balancer logs
- Database logs when relevant: query logs, auth logs, audit tables
- Configuration history: IAM policy changes, security group changes, infrastructure changes (Terraform plan/apply logs)
Evidence that matters for “how it happened”
For root cause analysis, you don’t need everything. You need the proof for each stage of the attack lifecycle.
Collect:
- Initial access proof: first suspicious login or first malicious process, earliest related IPs/hosts, first edited config
- Privilege changes: role assignments, token grants, group membership changes, new API keys, new service accounts
- Persistence: new scheduled tasks, cron jobs, services, startup scripts, new keys stored in CI/CD secrets
- Discovery and lateral movement: scans, enumeration commands, unusual connections to internal services
- Exfiltration or impact: large egress, downloads from sensitive stores, data staging directories, bulk reads
One thing I insist on: save the exact queries you used in SIEM. That way the next incident uses the same logic (and you can improve it).
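A lightweight way to keep those queries is a small registry that ties each one to the attack stage and the finding it supports. This is just a sketch; the query strings are placeholders for whatever your SIEM’s query language actually looks like.

```python
# Registry of the exact queries used during the investigation (placeholders shown).
saved_queries = {
    "initial_access": {
        "query": "<query for the first suspicious sign-in>",
        "source": "SSO sign-in logs",
        "supports": "timeline events 1-3",
    },
    "privilege_changes": {
        "query": "<query for new role assignments and API keys>",
        "source": "cloud audit logs",
        "supports": "finding RC-2",
    },
    "exfiltration": {
        "query": "<query for large egress from the affected data store>",
        "source": "network and storage access logs",
        "supports": "scope and impact section",
    },
}
```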
Evidence for “why we missed it”
If you only collect evidence about the attacker, you won’t get strong root cause findings.
- Detection coverage gaps: what alerts should have fired but didn’t
- Log gaps: retention too short, wrong log category enabled, sampling issues
- Alert tuning: overly broad allowlists, missing enrichment (like host owner)
- Runbook gaps: no step-by-step for triage, unclear escalation paths
- Asset inventory gaps: unknown endpoints or cloud resources without monitoring
Preserve chain-of-custody (when you need it)
If you work with legal, regulators, or customers, you may need evidence handling rules. I’m not a lawyer, but I’ve seen teams lose credibility by exporting logs without preserving timestamps or source system details.
Keep at least:
- Raw log exports with time range
- System snapshots (where possible) with hashes
- Case IDs and who approved each containment step
If your organization has a formal incident evidence policy, follow it even if it slows you down a bit.
How to analyze root causes: a practical method that avoids “blame the user”
Root cause analysis is not guesswork. It’s a structured way to connect evidence to failure points in people, process, and controls.
I use a method that combines a timeline, “attack path stage mapping,” and then a control gap check. Most teams skip the mapping step and jump straight into generic categories like “training” or “patching.”
Step 1: Build a stage map of the incident
Define the stages you’ll use (common ones are initial access, execution, privilege escalation, persistence, discovery, exfiltration/impact). Then place each major event in the timeline into one of those stages.
Example (account takeover):
- Initial access: MFA fatigue or stolen password sign-in from a new country
- Execution: OAuth token used to access email and drive files
- Privilege escalation: new admin role assignment
- Persistence: new recovery phone set or new API token created
- Discovery: searches for “contract” and “client” files
This stage map keeps your root cause analysis tied to real behavior, not vibes.
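If you keep the timeline as structured data, the stage map is just a mapping from each stage to the timeline events that prove it. A minimal sketch, with illustrative event IDs:

```python
# Each stage points at the timeline entries that prove it happened.
stage_map = {
    "initial_access":       ["evt-01"],            # sign-in from a new country after MFA fatigue
    "execution":            ["evt-02", "evt-03"],  # OAuth token used against email and drive files
    "privilege_escalation": ["evt-04"],            # new admin role assignment
    "persistence":          ["evt-05"],            # new recovery phone and API token
    "discovery":            ["evt-06"],            # searches for "contract" and "client" files
}

# A stage with no supporting events is either not part of this incident
# or an evidence gap -- say which in the postmortem.
unsupported_stages = [stage for stage, events in stage_map.items() if not events]
```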
Step 2: For each stage, list the specific failure
Use plain wording. Instead of “weak security,” write “MFA did not stop sign-in from risky location because conditional access policy excluded that app.”
Here are examples that go beyond the usual:
- Symptom: attacker got in via phishing. Failure: the mail gateway allowed a look-alike domain, and the user’s mailbox had no safe-link rewrite for external URLs.
- Symptom: attacker used stolen creds. Failure: a privileged role used long-lived tokens with no rotation, and the token lifecycle wasn’t audited.
- Symptom: ransomware spread. Failure: a lateral movement path existed via admin shares, and endpoint segmentation rules didn’t block it.
Step 3: Use the “5 Whys” but stop at controls
The “5 Whys” technique is simple: ask why five times. But in security, you should stop when you reach a control or decision point you can change.
Example (audit log missing):
- Why didn’t we detect the privilege change? Because audit logs weren’t searchable.
- Why weren’t they searchable? Because log retention was 7 days in production.
- Why only 7 days? Because the SIEM ingest license limited volume.
- Why didn’t we budget for security retention? Because the last review focused only on cost, not detection needs.
- What can we change? Increase retention for IAM events and reduce noise by sampling low-value sources.
Notice how the last “why” ends with a control change you can fund and test.
Step 4: Separate root causes from contributing factors
One incident often has many contributing problems. Root causes are the small set of issues that most directly allowed the breach to succeed.
Use this rule:
- Root cause: if fixed, the same attack path would not work (or would be stopped sooner).
- Contributing factor: makes detection harder, but the attack would still likely succeed without it.
This is where teams get it wrong. They list “training” as a root cause even when MFA policy clearly failed. Training can be a contributing fix, not the main root cause.
Step 5: Map findings to controls (so fixes are grounded)
For each root cause, map it to the control type: identity, endpoint, network, application, monitoring, or process. You can even reference NIST CSF categories (like ID.AM, PR.AC, DE.CM) if your org uses them.
This also helps your next audit.
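A minimal sketch of that mapping; the findings are illustrative, and the NIST CSF references are just the categories mentioned above.

```python
# Map each root cause finding to a control type (and a NIST CSF category if your org uses them).
findings_to_controls = [
    {"finding": "RC-1: vendor portal excluded from conditional access",
     "control_type": "identity", "csf": "PR.AC"},
    {"finding": "RC-2: IAM audit events retained for only 7 days",
     "control_type": "monitoring", "csf": "DE.CM"},
    {"finding": "RC-3: no session revocation step in the runbook",
     "control_type": "process", "csf": None},
]
```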
Action the fixes: how to write CAPA tasks that actually prevent repeats
Action items fail when they’re vague. Your breach postmortem template should force tasks to be testable.
Use a fix pattern: Prevent → Detect → Respond → Learn
For each root cause, decide which loop you’re strengthening.
- Prevent: change configurations, patch code, lock down permissions, add MFA/conditional access, remove risky defaults
- Detect: improve alerts, add detections in SIEM/EDR, tighten rules, add missing logs
- Respond: update runbooks, automate containment steps, add escalation and ownership
- Learn: training updates, tabletop exercises, improved documentation
Make tasks measurable with verification steps
Every action item needs a “how we prove it worked” line.
Examples of good verification:
- “After rollout, SIEM query X returns zero matches for last 30 days.”
- “Conditional access policy blocks sign-in for app Y from countries not in allowlist; test account confirms.”
- “EDR policy blocks macro execution and alerts when scripts spawn PowerShell with encoded commands.”
- “New IAM role requires approval workflow; verify in staging that requests are logged and require ticket ID.”
My extra rule: add one verification task that checks coverage, not just the fix setting. People often set the policy correctly but forget the alert, the logging, or the affected systems.
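If your SIEM is reachable from code, the “SIEM query X returns zero matches” style of verification can even be scripted. A minimal sketch, where `run_siem_query` is a hypothetical helper you’d replace with your SIEM’s real client or export workflow:

```python
from datetime import datetime, timedelta, timezone

def run_siem_query(query: str, start: datetime, end: datetime) -> list:
    """Hypothetical helper: run a saved SIEM query and return matching events."""
    raise NotImplementedError("Replace with your SIEM's client or export workflow.")

def verify_fix(query: str, days: int = 30) -> bool:
    """Verification passes only if the query returns zero matches over the window."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    return len(run_siem_query(query, start, end)) == 0
```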
Prioritize like a triage nurse (not like a wish list)
Use a simple priority score. You can do it without fancy tools.
| Factor | Question | Score (1-5) |
|---|---|---|
| Risk reduction | Does this stop the same attack path? | 1=low, 5=high |
| Reach | How many systems/accounts does it protect? | 1=small, 5=large |
| Confidence | Do we have evidence that this is the root cause? | 1=weak, 5=strong |
| Effort | How much time/money? | 1=easy, 5=hard |
Then prioritize by highest (risk reduction + reach + confidence) and lowest effort. You’ll still need judgment, but this keeps debates grounded.
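Here’s a minimal sketch of that scoring: the priority is just the sum described above, with effort breaking ties so the easier item wins.

```python
def priority_key(item: dict) -> tuple:
    """Sort key: higher (risk_reduction + reach + confidence) first, lower effort breaks ties."""
    score = item["risk_reduction"] + item["reach"] + item["confidence"]
    return (-score, item["effort"])

# Illustrative action items scored 1-5 on each factor.
actions = [
    {"title": "Remove conditional access exclusion", "risk_reduction": 5, "reach": 4, "confidence": 5, "effort": 2},
    {"title": "Buy another detection tool", "risk_reduction": 2, "reach": 3, "confidence": 2, "effort": 5},
]
for action in sorted(actions, key=priority_key):
    print(action["title"])
```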
Examples of CAPA items you can copy
Here are realistic CAPA examples based on common incident types.
Example A: Stolen credentials / account takeover
- Prevent: enforce phishing-resistant MFA for admins and risky apps; remove legacy auth methods
- Detect: alert on impossible travel + new admin role + OAuth token creation
- Respond: runbook for rapid session revocation and key rotation
- Learn: targeted training for the impacted department with the exact lure used
Example B: Vulnerability exploited in web app
- Prevent: patch and add code-level fix; add input validation tests to CI
- Detect: add WAF rules and SIEM alert on exploit pattern URIs
- Respond: verify logs for data access and block suspicious sessions
- Learn: review patch SLA for that component and ensure owners are named
Example C: Ransomware through endpoint + weak segmentation
- Prevent: block admin shares between segments; disable unused SMB services
- Detect: alert on mass file rename patterns and unusual backup deletions
- Respond: tabletop exercise for containment isolation and restore steps
- Learn: update endpoint hardening checklist and confirm it’s enforced
People Also Ask: common questions about breach postmortems
How long should a breach postmortem take?
The first postmortem draft should usually be ready in 1–2 weeks after the incident is fully contained and recovery steps are stable. If you’re still doing deep data forensics, you can publish a “Phase 1” postmortem with root cause hypotheses and open questions.
I’ve seen teams wait 6–8 weeks and then lose the details. Your timeline memory fades fast, and evidence access sometimes gets harder as systems settle back.
Who should write the breach postmortem?
It should be owned by the incident lead (or security lead) with input from engineering, IT, and the system owners. The best postmortems include at least one person who can verify the fix (like an infrastructure engineer) and one who can interpret the logs.
A common mistake is making it only the SOC team. They may know detection well but not understand the app or identity configuration decisions that caused the failure.
Should the postmortem include blame?
No. It can include accountability in the action item list, but the analysis should focus on control gaps and decision points. If you run a blame culture, you’ll get defensive stories and weak findings.
What you can do instead: document decisions, assumptions, and where approval happened. That keeps it honest without turning it into a witch hunt.
What is “dwell time” and why does it matter in the postmortem?
Dwell time is the time between the attacker’s first activity and the time you detected or contained it. It matters because it measures how long the attacker had free access.
Two incidents can look similar, but the one with shorter dwell time usually needs different improvements. If detection is fast but containment is slow, focus on response runbooks and automation.
Real-world mini case study (what the template catches)
Here’s a scenario I’ve seen in slightly different forms in 2026: an attacker gains access to a vendor portal account. The breach postmortem shows that the credentials came from an old password reused by a staff member.
Many teams stop at “password hygiene.” That’s not enough. The template forces deeper answers using the stage map and control gap checks.
In one real audit-style review, the root cause wasn’t only password reuse. It was a mix:
- Identity controls required MFA, but the vendor portal was excluded from conditional access for a legacy reason.
- SIEM alerts were tuned to flag “new country logins,” but enrichment for the vendor portal app was missing, so the alert couldn’t tag it correctly.
- Session revocation wasn’t in the runbook, so it took too long to kick the attacker out after confirmation.
That set produced CAPA tasks that were verifiable: remove the exclusion, fix enrichment, add session revocation steps to the playbook, and add a new detection rule tied to the app.
The big win: the team could prove reduced risk by testing the conditional access policy and confirming the new alert fired on staged attempts.
Internal linking: related security topics on our blog
If you’re building out your incident response process, these posts pair well with this breach postmortem template:
- Incident response playbook checklist — a practical guide for organizing response roles and steps.
- Vulnerability management SLAs that actually work — what “time to patch” should look like in real life.
- Phishing detections and threat intel signals for 2026 — how to turn intel into alerts, not bookmarks.
Image SEO note (what to use as your featured image)
Use a simple graphic showing the postmortem flow: Evidence → Timeline → Root Cause → CAPA Actions. Keep it readable. If you’re adding the keyword in the alt text, keep it descriptive.
Featured image alt text example: “Breach postmortem template evidence checklist for root cause analysis and action items”
Conclusion: your breach postmortem template is only “good” if fixes are testable
A breach postmortem template should not be a wall of text. It should be a proof tool: it connects evidence to root causes, then root causes to measurable fixes.
Use the sections above every time, especially the timeline with evidence links and the CAPA list with verification steps. If you do that, you’ll stop repeating the same failures and you’ll be able to show progress in the next incident review—fast.
If you want a quick starting move, pick one root cause from your last incident and rewrite it as a control failure you can test. When you can test it, you can fix it. That’s the whole job.
