A painful truth I’ve seen in real incident calls: most small teams don’t fail because they lack “cool tools.” They fail because they don’t have a fast, practiced plan for what to do in the first 60 minutes. That’s where this Incident Response Playbook (No Fluff) earns its keep.
Here’s the direct answer you’re looking for: for a real incident, you contain first, prove the scope next, remove the cause, then recover in a controlled way. The playbook below is built for small teams with limited staff, and it focuses on actions you can take today in 2026.
If you want more background on how threats move and how to spot them, pair this with our Threat Intelligence updates and practical detection tips. For hands-on log work, our Splunk incident detection tutorial is also a good companion.
Incident Response Playbook (No Fluff): the rules that stop panic
The goal of an incident response playbook is simple: make sure you do the right things in the right order under stress. When you’re tired, distracted, and the business is calling, you need a checklist that doesn’t rely on memory.
Before we touch containment, I want to set four rules I follow on every response I run. These prevent the most common mistakes I’ve seen with small teams.
- One incident owner. Pick one person to make final calls. If everyone is “helping,” decisions stall and evidence gets missed.
- No undocumented “temporary” exceptions. If you make an exception (like leaving a risky account unlocked “just for now”), write it down with a time and a reason.
- Time-stamp everything. If you can’t say when something happened, you can’t prove what fixed it.
- Separate business calls from forensics work. Don’t let customer updates drown out log review and evidence handling.
Incident response refers to the steps a team takes to handle a security event end-to-end: identifying it, stopping the damage, removing the threat, and restoring systems safely. In practice, you’ll repeat these steps in cycles as you learn more.
Containment: stop the bleeding without breaking your evidence

Containment is about halting harm fast while keeping enough evidence to understand how the attacker got in. The best containment move is the one you can do quickly and that won’t destroy your proof.
When I teach teams, I use this mindset: you’re not “fixing” the breach yet. You’re buying time to learn the scope.
First 15 minutes: triage and safe isolation
In the first 15 minutes, you decide which systems to treat as “hot.” A hot system is one that’s confirmed or strongly suspected of compromise.
Do these in order:
- Freeze the timeline. Record alert IDs, timestamps, hostnames, user accounts, IPs, and any hashes you already have.
- Identify the blast radius. Look for shared logins, shared admin tools, shared service accounts, and any lateral movement signs.
- Isolate the clear-cut cases first. For endpoint compromise, use your EDR to pull the device off the network (examples: CrowdStrike “containment” actions, Microsoft Defender for Endpoint device actions, or Sophos Intercept X isolation). For server compromise, disable inbound access at the firewall first.
- Preserve data. If your team can do it fast, export relevant logs from SIEM/EDR (Windows Event Logs, process creation logs, auth logs). Keep them in a read-only folder.
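If you want a no-tooling way to handle the “freeze the timeline” and “preserve data” steps above, a short script is enough. This is a minimal sketch: the incident folder, file names, and fields are placeholders I made up, so adapt them to however you store evidence.

```python
# Minimal sketch: timestamped triage notes plus a read-only evidence copy.
# The incident folder, file names, and fields are placeholder assumptions.
import json
import shutil
import stat
from datetime import datetime, timezone
from pathlib import Path

EVIDENCE_DIR = Path("evidence/INC-2026-001")      # hypothetical incident folder
TIMELINE = EVIDENCE_DIR / "timeline.jsonl"

def log_event(summary: str, **details) -> None:
    """Append one timestamped entry so every action has a provable time."""
    EVIDENCE_DIR.mkdir(parents=True, exist_ok=True)
    entry = {"ts": datetime.now(timezone.utc).isoformat(), "summary": summary, **details}
    with TIMELINE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

def preserve_file(source: str) -> Path:
    """Copy an exported log into the evidence folder and drop write permissions."""
    dest = EVIDENCE_DIR / Path(source).name
    shutil.copy2(source, dest)
    dest.chmod(stat.S_IRUSR | stat.S_IRGRP)        # read-only copy
    log_event("preserved evidence file", file=str(dest))
    return dest

log_event("alert triaged", alert_id="SIEM-4821", host="laptop-17",
          user="j.doe", src_ip="203.0.113.45")
# preserve_file("exports/auth_logs_last_24h.csv")  # run after you export from your SIEM
```

The point isn’t the script; it’s that times, hosts, and accounts are written down the moment you see them instead of reconstructed from memory later.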
What most people get wrong here: they isolate too late and they isolate too hard. “Too late” means the attacker keeps moving. “Too hard” means you shut down systems so aggressively that you lose the artifacts you need (like volatile processes, in-memory scripts, or still-running malware behavior).
Containment options for small teams (pick one lane)
You generally have three containment lanes. Pick one lane per asset based on what you have in place and how fast you can act.
| Lane | Best for | Typical actions | Tradeoffs |
|---|---|---|---|
| Network containment | Servers and services | Block inbound ports, restrict egress, move to quarantine VLAN | May leave malware running internally |
| Endpoint containment (EDR) | Laptops, workstations, desktops | Isolate host, block suspicious processes, revoke tokens | Needs EDR coverage |
| Account containment | Compromised identities | Force password reset, disable sessions, revoke refresh tokens | May not stop an active host breach |
My rule of thumb: if the incident started with phishing or a suspicious login, containment should include account containment within the first hour. If the incident started with malware execution on a host, use endpoint containment immediately.
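If it helps to make that rule of thumb explicit for whoever is on call, here’s a tiny sketch of it as a triage helper. The vector labels and lane names are just this playbook’s vocabulary, not a standard taxonomy.

```python
# Minimal sketch of the rule of thumb above. Vector labels and lane names are
# this playbook's vocabulary, not a standard taxonomy.
def pick_containment_lanes(initial_vector: str) -> list[str]:
    lanes = []
    if initial_vector in {"phishing", "suspicious_login", "stolen_token"}:
        lanes.append("account")    # revoke sessions/tokens within the first hour
    if initial_vector in {"malware_execution", "ransomware"}:
        lanes.append("endpoint")   # isolate the host via EDR immediately
    if initial_vector in {"exposed_service", "web_app_exploit"}:
        lanes.append("network")    # block inbound, restrict egress at the firewall
    return lanes or ["network"]    # fall back to network containment if the vector is unclear

print(pick_containment_lanes("phishing"))   # ['account']
```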
Eradication: remove the cause, not just the symptoms
Eradication is the step where you remove the attacker’s foothold for real. If you only delete one file or reset one password, you’ll almost always get a second incident.
Eradication means finding what was changed (accounts, scheduled tasks, persistence points, backdoors, web shells, new admin groups) and then cleaning it with proof.
Prove scope before you wipe everything
Before you wipe, you need a scope statement. A scope statement is a short list of what’s affected, what’s not, and what evidence supports it.
Here’s a scope checklist I’ve used with small teams during 2026 incidents:
- Accounts: Which user accounts logged in from where, and when?
- Hosts: Which endpoints ran the suspicious processes or contacted the suspicious IPs?
- Paths: What was the likely initial access path (phishing link, exposed RDP, stolen cookie, vulnerable app)?
- Persistence: Did you find scheduled tasks, new services, startup items, registry run keys, or unusual cron jobs?
- Data access: Did sensitive files get read, copied, encrypted (ransomware), or exfiltrated?
Tip: write the scope as bullet points with timestamps. It keeps you honest and helps with decision-making when the business asks, “Are we clean yet?”
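One way to keep that honest is to store the scope statement as structured data next to your evidence, so every claim carries a timestamp and a source. A minimal sketch follows; every hostname, account, and time in it is an invented example.

```python
# Minimal sketch of a scope statement kept as structured data. All hostnames,
# accounts, and times below are made-up examples.
import json
from datetime import datetime, timezone
from pathlib import Path

scope = {
    "updated": datetime.now(timezone.utc).isoformat(),
    "accounts": [{"user": "j.doe", "evidence": "Entra ID sign-in log", "first_seen": "2026-03-02T09:14Z"}],
    "hosts": [{"host": "laptop-17", "evidence": "EDR process tree", "first_seen": "2026-03-02T09:21Z"}],
    "initial_access": "phishing link -> credential theft (working theory)",
    "persistence_found": ["new inbox forwarding rule"],
    "data_access": "no exfiltration evidence yet; file-share audit logs still in review",
    "not_in_scope": ["finance servers (no matching logins in auth logs)"],
}

out = Path("evidence/INC-2026-001/scope.json")   # hypothetical evidence folder
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(json.dumps(scope, indent=2), encoding="utf-8")
```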
Eradication steps for the three most common incidents
Most small team incidents fall into a few patterns. Below are practical eradication actions you can run even with limited staff.
1) Credential theft / suspicious logins
If the attacker stole credentials, remove the access path and look for persistence in accounts.
- Revoke sessions and refresh tokens. In Microsoft 365, force sign-out and revoke tokens in Entra ID. In Google Workspace, revoke sessions for the user.
- Reset passwords and rotate secrets. Reset user passwords and rotate API keys, access tokens, and service account secrets.
- Check for rules and automation. Attackers often set up mail forwarding rules or new inbox rules in Microsoft 365/Gmail.
- Hunt for related logins. Look for other accounts that logged in from the same IP ranges or used the same user agent.
My experience: if you only reset the password and ignore refresh tokens, you’ll see the attacker come back.
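If your identities live in Microsoft 365, session revocation can be scripted through Microsoft Graph. The sketch below assumes you already have an access token with the right permissions; confirm the current endpoint and required scopes against Microsoft’s Graph documentation before relying on it.

```python
# Minimal sketch: revoke a user's sign-in sessions/refresh tokens via Microsoft
# Graph. Assumes you already obtained an access token with the right
# permissions; verify the endpoint and scopes against Microsoft's current docs.
import requests

GRAPH = "https://graph.microsoft.com/v1.0"

def revoke_sessions(user_principal_name: str, access_token: str) -> None:
    """Invalidate the user's refresh tokens so stolen sessions stop working."""
    resp = requests.post(
        f"{GRAPH}/users/{user_principal_name}/revokeSignInSessions",
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=30,
    )
    resp.raise_for_status()    # any 2xx means the revocation was accepted

# revoke_sessions("j.doe@example.com", token)   # then reset the password too
```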
2) Malware / endpoint compromise
If the attacker dropped malware, you remove both the malware and its persistence.
- Block the initial payload and indicators. Use EDR indicators (hashes, file paths, domains) to block future execution.
- Remove persistence points. Check scheduled tasks, startup entries, services, WMI event subscriptions, and browser extensions (a quick enumeration sketch follows this list).
- Scan the full endpoint. Run a full malware scan with your EDR. If you don’t have one, at least run a known-good offline scan on the host.
- Validate admin group membership. Verify local administrators and domain admin groups. Malware often adds itself or adds a user to elevate access.
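When you’re doing those persistence checks by hand on a Windows host, a quick dump you can diff against a known-good baseline helps. This is a minimal, Windows-only sketch covering registry run keys and scheduled tasks; it’s a triage aid, not a full persistence hunt or a replacement for EDR telemetry.

```python
# Minimal sketch (Windows-only): dump autorun registry keys and scheduled tasks
# so you can diff them against a known-good baseline.
import subprocess
import winreg

RUN_KEYS = [
    (winreg.HKEY_LOCAL_MACHINE, r"Software\Microsoft\Windows\CurrentVersion\Run"),
    (winreg.HKEY_CURRENT_USER, r"Software\Microsoft\Windows\CurrentVersion\Run"),
]

def dump_run_keys() -> list[tuple[str, str]]:
    """Return (registry path + value name, command) pairs from common autorun keys."""
    entries = []
    for hive, path in RUN_KEYS:
        try:
            with winreg.OpenKey(hive, path) as key:
                index = 0
                while True:
                    try:
                        name, value, _ = winreg.EnumValue(key, index)
                    except OSError:
                        break
                    entries.append((f"{path} :: {name}", str(value)))
                    index += 1
        except FileNotFoundError:
            continue
    return entries

def dump_scheduled_tasks() -> str:
    """schtasks ships with Windows; CSV output is easy to diff later."""
    return subprocess.run(
        ["schtasks", "/query", "/fo", "CSV", "/v"],
        capture_output=True, text=True, check=True,
    ).stdout

for entry, command in dump_run_keys():
    print(entry, "->", command)
```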
Time estimate for small teams: if you catch persistence fast, eradication often takes 2–6 hours per endpoint. If you need to rebuild the host from a clean image, plan a full day per “hard” device.
3) Web app compromise (web shells, changed configs)
For web app incidents, you don’t “repair” files by guessing. You confirm integrity and then restore.
- Freeze the deployment. Stop CI/CD for that app or lock it to a known-good pipeline revision.
- Look for web shell signatures. Search for suspicious server-side scripts in upload folders and temp directories (see the scan sketch after this list).
- Check config drift. Look for changes in environment variables, reverse proxy rules, and authentication settings.
- Rebuild from known-good artifacts. Restore the app from a tagged release or a verified backup, then redeploy.
- Patch the entry point. If the root cause was a vulnerability, patch it before recovery, or the attacker will return.
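For the web shell search, a simple scan that flags recently modified server-side scripts in upload and temp folders gives you a review list fast. The paths, extensions, and patterns below are assumptions for a generic stack; tune them to yours, and treat hits as leads, not verdicts.

```python
# Minimal sketch: flag recently modified server-side scripts in upload and temp
# directories for manual review. Paths, extensions, and patterns are
# assumptions for a generic stack.
import re
import time
from pathlib import Path

SEARCH_DIRS = [Path("/var/www/app/uploads"), Path("/tmp")]   # adjust to your deployment
SUSPICIOUS_EXT = {".php", ".phtml", ".jsp", ".aspx"}
SUSPICIOUS_PATTERNS = re.compile(r"eval\s*\(|base64_decode|shell_exec|passthru", re.I)
RECENT_DAYS = 14

def scan() -> list[Path]:
    cutoff = time.time() - RECENT_DAYS * 86400
    hits = []
    for root in SEARCH_DIRS:
        if not root.exists():
            continue
        for path in root.rglob("*"):
            if path.suffix.lower() not in SUSPICIOUS_EXT:
                continue
            try:
                if path.stat().st_mtime < cutoff:
                    continue
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            if SUSPICIOUS_PATTERNS.search(text):
                hits.append(path)
    return hits

for hit in scan():
    print("review:", hit)
```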
What many teams miss: they find a web shell and delete it, but the deployment pipeline is still compromised. That means the shell comes back on the next deploy.
Recovery: bring systems back in a safe order (and verify)

Recovery is where incidents often get worse. Teams rush to “get back to normal” and accidentally re-enable the same weak spot.
Recovery refers to restoring operations while proving that the threat is gone. It’s not a restart. It’s a controlled return.
Recovery plan order: identity → endpoints → servers → apps
Use this ordering because it reduces the chance of re-compromise.
- Identity first. Confirm tokens are revoked, MFA is enforced where needed, and admin roles are correct.
- Endpoints next. Patch the initial entry point, confirm malware removal, and rejoin devices after they pass checks.
- Servers after that. Restore from clean backups, rotate secrets, and check persistence again.
- Apps last. Redeploy only from verified builds, then watch logs closely for repeated indicators.
If you’re using a SIEM like Splunk or Microsoft Sentinel, keep a “watch period” dashboard ready for at least 24–72 hours after recovery. For small teams, this is one of the cheapest ways to avoid surprise repeats.
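If your SIEM dashboards are thin, you can still run a scheduled indicator check against fresh log exports during the watch period. Minimal sketch; the CSV layout, column names, and IP addresses below are assumptions and placeholders drawn from the kind of scope statement described earlier.

```python
# Minimal sketch of a scheduled watch-period check: grep fresh auth log exports
# for the attacker indicators from this incident. CSV layout, column names, and
# the IPs are assumptions/placeholders.
import csv
from pathlib import Path

ATTACKER_IPS = {"203.0.113.45", "198.51.100.23"}   # indicators from the scope statement

def check_auth_export(csv_path: Path) -> list[dict]:
    """Return any login rows whose source IP matches a known attacker IP."""
    hits = []
    with csv_path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get("src_ip") in ATTACKER_IPS:
                hits.append(row)
    return hits

# for hit in check_auth_export(Path("exports/auth_logs_today.csv")):
#     print("repeat indicator:", hit)
```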
Verification checks you can run in the first 4 hours
These checks are built for the “no extra staff” reality. You run them right after you restore.
- Auth sanity: No new suspicious logins from the same attacker IPs; no impossible-travel patterns.
- Process sanity: No repeated suspicious process names or command lines on endpoints.
- Persistence sanity: No new scheduled tasks, services, run keys, or web shells.
- Network sanity: No unexpected egress to known bad domains or new outbound connections to random IPs.
- Integrity sanity: Key files match known-good hashes (or you redeployed from signed artifacts).
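For the integrity check, comparing restored files against a known-good hash manifest is the cheapest proof you can produce. Minimal sketch, assuming a simple “path,sha256” manifest captured from a clean build or a verified backup:

```python
# Minimal sketch for the integrity check: compare restored files against a
# known-good hash manifest ("path,sha256" per line). The manifest format is an
# assumption for this example.
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(manifest: Path) -> list[str]:
    """Return paths whose current hash is missing or different from the manifest."""
    mismatches = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        path_str, expected = line.rsplit(",", 1)
        target = Path(path_str)
        if not target.exists() or sha256(target) != expected.strip():
            mismatches.append(path_str)
    return mismatches

# print(verify(Path("evidence/INC-2026-001/known_good_hashes.csv")))
```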
In 2026, a lot of teams are also turning on “advanced hunting” views inside EDR products. If you have CrowdStrike or Microsoft Defender, use those hunts as your verification checklist, not just as an investigation tool.
People also ask: Incident Response Playbook basics (quick, direct answers)
Searchers ask these questions because they’re trying to act fast. Here are clear answers you can use directly.
What is the first step in incident response?
The first step is triage: confirm you have a real incident, then identify what systems and accounts are affected so you can start containment. In a real call, I don’t wait for perfect certainty—I start isolation when the indicators are strong.
How long should containment last?
Containment lasts until you understand scope and can confidently remove the threat. For small teams, plan for a few hours for basic endpoint incidents and 1–3 days for web app or identity-heavy incidents.
Should we shut down everything during an incident?
No. Shutting down everything is usually a last resort. It destroys evidence, harms availability, and can trigger extra chaos. Instead, isolate the suspected systems first and contain identity access early.
Do we need a forensics image (disk) for every incident?
Not every incident needs full disk imaging. If the incident is fast-moving malware on a laptop, you can often preserve key logs and use EDR telemetry. If you’re dealing with high-impact data theft or an advanced intrusion, disk imaging and deeper log capture become worth it.
When can we declare the incident over?
You can declare it over when you’ve removed persistence, patched the root cause, restored systems from known-good sources, and verified no repeat indicators for a defined watch window. Many teams use 24–72 hours, depending on the threat type.
Post-incident work: the part that prevents the next one
Post-incident is where small teams win long-term. You don’t need a 50-page report. You need a few clear fixes and proof that they actually got done.
The 5-question postmortem that actually changes behavior
After every incident, I run a short meeting around these five questions. This keeps it practical.
- What happened? Write the timeline in plain language.
- How did we detect it? Which alert or signal started the chain?
- What slowed us down? Missing logs, unclear ownership, unclear access?
- What fixed it? Which containment and eradication actions removed the cause?
- What will we change in 30 days? One or two measurable actions, not ten vague ones.
If you also track vulnerabilities, tie this back to our Vulnerabilities & Exploits playbook. A lot of incidents are just “known weakness” plus a bad day.
Make your playbook better with “incident receipts”
Here’s an original angle I push: build “incident receipts.” A receipt is a tiny piece of proof that one action worked. Examples: a screenshot of EDR containment status, a log export showing a revoked token, a hash check of restored files, or a firewall rule change plus evidence that outbound traffic stopped.
When you collect receipts, your next incident response gets faster because you don’t argue about facts. You also have evidence if leadership asks why you isolated something for 18 hours.
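Receipts don’t need a tool either. Here’s a minimal sketch: an append-only log where every receipt records what was done, when, and a hash of the supporting artifact. The file names and paths are placeholders.

```python
# Minimal sketch of an "incident receipts" log: each receipt records what was
# done, when, and a hash of the supporting artifact (screenshot, log export,
# config diff). File names and paths are placeholders.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

RECEIPTS = Path("evidence/INC-2026-001/receipts.jsonl")

def add_receipt(action: str, artifact: str) -> None:
    """Append one receipt tying an action to a hashed piece of evidence."""
    artifact_path = Path(artifact)
    receipt = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "artifact": artifact_path.name,
        "sha256": hashlib.sha256(artifact_path.read_bytes()).hexdigest(),
    }
    RECEIPTS.parent.mkdir(parents=True, exist_ok=True)
    with RECEIPTS.open("a", encoding="utf-8") as f:
        f.write(json.dumps(receipt) + "\n")

# add_receipt("Isolated laptop-17 via EDR", "evidence/INC-2026-001/edr_containment.png")
```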
Tooling and setup for small teams (what to use in 2026)
You don’t need an enterprise stack to run an effective Incident Response Playbook. You need coverage and speed.
Minimum practical stack for response
If you’re building from scratch, aim for this baseline. It’s enough to run containment and eradication without guesswork.
- EDR for endpoints: CrowdStrike, Microsoft Defender for Endpoint, SentinelOne, Sophos—any serious EDR is fine as long as isolation and telemetry work.
- Central logs: A SIEM or log platform (Splunk, Sentinel, Elastic). Even basic auth and process logs matter.
- Identity controls: Entra ID (Azure AD), Google Workspace, Okta—so you can revoke sessions and enforce MFA.
- Ticketing and incident notes: A place where you can write timestamps and decisions (Jira, ServiceNow, even a shared doc if needed).
- Backups you can restore: Verify restore speed. Backups that can’t be restored are just “storage,” not protection.
Limitation: if you don’t have EDR coverage, you can still respond, but your time to scope will jump. In that case, lean harder on identity logs, network logs, and manual host checks.
Practice drill that takes one afternoon
Here’s a drill you can run without hurting production: pick a fake alert and walk through the playbook. Assign an incident owner, isolate a test device (or a lab host), and do a mock eradication plan.
Timebox it:
- 30 minutes: triage and containment decisions
- 45 minutes: scope and evidence preservation
- 45 minutes: eradication plan and recovery order
- 15 minutes: postmortem action items
After the drill, adjust your checklists based on what confused people. Your goal isn’t a perfect plan. Your goal is a plan your team can follow while stressed.
Incident Response Playbook template you can copy today
If you want something you can put into a doc and run the next time, use this template. It’s short enough for a small team, but it covers the real steps.
Incident header (fill this in first)
- Incident owner:
- Start time (with timezone):
- Detection source (alert name / SIEM rule):
- Impacted systems / users:
- Containment actions taken (with timestamps):
- Evidence saved (links or folder path):
Containment checklist
- [ ] Isolated hot endpoints / blocked risky servers
- [ ] Revoked sessions/tokens for impacted users
- [ ] Logged firewall and identity changes
- [ ] Preserved EDR telemetry and key logs
Eradication checklist
- [ ] Confirmed persistence points and removed them
- [ ] Rotated secrets (keys, tokens, passwords)
- [ ] Patched root cause (or locked down compensating control)
- [ ] Validated that malicious indicators stop triggering
Recovery checklist
- [ ] Restored from known-good sources
- [ ] Re-enabled services in the right order
- [ ] Monitored for repeated indicators for 24–72 hours
- [ ] Updated the watch dashboard and alerts
Conclusion: your small team doesn’t need more tools—you need faster decisions
This Incident Response Playbook (No Fluff) is built around one idea: containment, eradication, and recovery work only when you keep the order straight and you prove what you changed. If you do nothing else this week, write down who owns incidents, practice isolation once, and create receipts for your fixes.
When the next alert hits in 2026, you shouldn’t be Googling what to do. You should be running your checklist, saving evidence, and getting systems back safely—without guessing.
