Emergency Rollback Playbook

Last updated: May 2026

Who this is for

Users handling an incident in which a recent deploy, config change, or restore has broken a production site or app, and who need to recover service quickly.

Rollback Priority Order (Fastest Recovery First)

When production is down, use this order:

Roll back the deployment (fastest, least destructive)
Restart service (if issue is runtime-only)
Restore from backup (slower, more destructive)
Cloud snapshot restore (slowest, disaster recovery path)
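
The order above can be written down as a small escalation ladder, which is handy if you script your incident runbook. This is only a sketch: the step names and the next_recovery_step helper are hypothetical and are not part of any CloudAIPilot API. It also ignores the "runtime-only" condition on restarting the service, which still needs a human decision.

```python
from typing import Optional

# Recovery paths in the order listed above: fastest and least destructive first.
# The identifiers are illustrative labels, not product commands.
RECOVERY_LADDER = [
    "rollback_deployment",
    "restart_service",
    "restore_from_backup",
    "cloud_snapshot_restore",
]


def next_recovery_step(already_failed: set) -> Optional[str]:
    """Return the first path in the ladder that has not already failed."""
    for step in RECOVERY_LADDER:
        if step not in already_failed:
            return step
    return None  # every path in the ladder has been tried
```

For example, next_recovery_step({"rollback_deployment"}) returns "restart_service", and an empty set returns "rollback_deployment".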

Step 0 — Stabilize Incident Communication

Before technical steps:

Notify your team that rollback is in progress.
Pause new deployments temporarily.
Assign one person to execute rollback and one person to monitor user impact.

Step 1 — Roll Back the Latest Deployment

For Sites

Go to Site detail → Deployments.
Find the last known good deployment (green/success).
Click Rollback.
Wait for completion in Activity Center.

For Apps

Go to App detail → Deployments.
Select the last known good image/commit.
Click Rollback.
Confirm the app status returns to running.

Step 2 — Validate Recovery

After rollback completes:

Open the production URL in a private/incognito window.
Check core user actions (homepage, login, checkout/API endpoint).
Check server CPU/memory and app logs for fresh errors.

If service is restored, keep the rollback state and begin root cause analysis before any re-deploy.
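
The manual checks above can be backed by a quick scripted smoke check. The sketch below uses only the Python standard library and treats any 2xx response as healthy. The URLs are placeholders (assumptions, not real endpoints), and a passing check does not replace walking through login or checkout by hand.

```python
import urllib.error
import urllib.request

# Placeholder endpoints; replace with your own production URLs.
CHECKS = [
    "https://example.com/",
    "https://example.com/login",
    "https://example.com/api/health",
]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, or TLS error


def smoke_check(urls, fetch=fetch_status):
    """Map each URL to True when it answers with a 2xx status."""
    return {url: 200 <= fetch(url) < 300 for url in urls}
```

Calling smoke_check(CHECKS) returns a dictionary of URL to pass/fail. The fetch parameter exists so the check can be exercised with a stub instead of live requests.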

Step 3 — If Deployment Rollback Fails

If the rollback itself fails:

Check SSH connectivity (rollback requires server access). See KB-12-05: SSH Unreachable Playbook.
Check disk space (rollback can fail if the disk is full). Delete old backups or logs, or resize the disk.
Retry the rollback.

If it still fails, proceed to backup restore.
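
The SSH and disk checks can be approximated with two short helpers. This is a sketch under stated assumptions: ssh_port_reachable only confirms that the TCP port accepts connections (it does not authenticate), disk_has_headroom must be run on the server itself, and the 10% free-space threshold is an illustrative value, not a documented CloudAIPilot requirement.

```python
import shutil
import socket


def ssh_port_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the SSH port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, unroutable, or DNS failure


def disk_free_percent(path: str = "/") -> float:
    """Return the percentage of free space on the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def disk_has_headroom(path: str = "/", min_free_percent: float = 10.0) -> bool:
    """Return True if free space is at or above the (assumed) threshold."""
    return disk_free_percent(path) >= min_free_percent
```

Run ssh_port_reachable from your workstation against the server's address. If it returns False, follow KB-12-05 before retrying the rollback.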

Step 4 — Restore from Backup (Last Resort in App Layer)

Go to Server detail → Backups.
Choose the most recent known-good full backup.
Click Restore with Safe Mode enabled.
Monitor in Activity Center.

Warning: Backup restore overwrites current files/database state. Changes made after the backup point are lost.

See KB-05-05: Restore from Backup.
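
To make the trade-off in the warning concrete, the sketch below picks a restore point from a list of backup records and computes how much recent data would be lost. The Backup record and its fields are hypothetical and do not reflect CloudAIPilot's data model. "Known-good" is interpreted here as a full backup that completed successfully before the incident began.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Backup:
    """Hypothetical backup record, used for illustration only."""
    backup_id: str
    taken_at: datetime
    is_full: bool
    succeeded: bool


def pick_restore_point(backups: List[Backup], incident_start: datetime) -> Optional[Backup]:
    """Return the newest successful full backup taken before the incident began."""
    candidates = [
        b for b in backups
        if b.is_full and b.succeeded and b.taken_at < incident_start
    ]
    return max(candidates, key=lambda b: b.taken_at, default=None)


def data_loss_window(backup: Backup, now: datetime) -> timedelta:
    """Return how much time's worth of changes a restore would discard."""
    return now - backup.taken_at
```

If the incident started at 12:00 and the newest qualifying backup was taken at 06:00, restoring at 13:00 discards seven hours of changes. Share that figure with stakeholders before clicking Restore.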

Step 5 — Provider Snapshot Restore (Disaster Path)

If server-level corruption exists (disk failure, severe misconfiguration):

Restore from a cloud snapshot using your provider's console (AWS, GCP, Azure, or DigitalOcean).
Re-import the server in CloudAIPilot if needed.

This path is slower but can recover from deep infrastructure damage.

Post-Rollback Checklist

After service is stable:

[ ] Keep incident timeline notes (what failed, when, what fixed it)
[ ] Preserve failed deploy logs for analysis
[ ] Create a root cause issue/ticket
[ ] Add a pre-deploy backup policy if missing
[ ] Add a staging verification gate before production deploys