Create Alert Rules Effectively

Who this is for

Users setting up monitoring alert rules for the first time, or anyone who wants to improve their existing alert configuration to reduce noise and catch real problems.

What you will complete

Create alert rules for CPU, RAM, disk, and load average, learn how to set effective thresholds, and verify that a rule fires as expected.

Before you begin

  • At least one server must be connected and in a Running state.
  • Admin role required to create alert rules.
  • Navigate to Alerts in the left sidebar.

How alert rules work

An alert rule defines: "If metric X on server Y crosses threshold Z for duration D, fire an alert at severity S."

When the condition is met, an alert event is created in the Alert Events tab, a notification is sent to your configured channels (email, Slack, etc.), and the alert stays in "Firing" state until the metric recovers past the hysteresis (resolve) threshold, which defaults to a 10% margin and can be overridden per rule.
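The lifecycle above can be sketched as a small state machine. This is an illustrative model only, not the product's implementation: the `Rule` fields, the one-sample-per-minute cadence, the "ok" and "pending" state names, and the explicit resolve value of 75 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    threshold: float          # Z: fire when the metric goes above this value
    duration_min: int         # D: minutes the condition must hold continuously
    resolve_threshold: float  # metric must fall to or below this to clear


def evaluate(rule: Rule, samples: List[float]) -> List[str]:
    """Return the alert state after each one-minute sample (hypothetical model)."""
    states = []
    firing = False
    breached = 0  # consecutive samples above the threshold
    for value in samples:
        if firing:
            # Hysteresis: stay firing until the metric recovers far enough.
            if value <= rule.resolve_threshold:
                firing = False
                breached = 0
        elif value > rule.threshold:
            breached += 1
            # The first breaching sample counts as minute 0 of the breach.
            if breached > rule.duration_min:
                firing = True
        else:
            breached = 0  # condition interrupted, restart the clock

        if firing:
            states.append("firing")
        elif breached:
            states.append("pending")
        else:
            states.append("ok")
    return states


rule = Rule(threshold=85, duration_min=2, resolve_threshold=75)
states = evaluate(rule, [80, 90, 91, 92, 88, 80, 70])
print(states)
# ['ok', 'pending', 'pending', 'firing', 'firing', 'firing', 'ok']
```

Note that the alert keeps firing at 80 even though that is below the 85 threshold. It clears only once the metric reaches the resolve value.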


Step-by-step: create an alert rule

  1. Go to Alerts in the left sidebar.
  2. Click the Rules tab (it is the default view).
  3. Click Add Rule or the + button.
  4. Fill in the rule form:

a. Select a metric:

  • CPU Usage — monitors CPU percentage
  • RAM Usage — monitors memory percentage
  • Disk Usage — monitors disk percentage
  • Load Average — monitors process queue depth

b. Set the condition:

  • Greater than (gt): fires when the metric exceeds the threshold (most common for CPU, RAM, and disk)
  • Less than (lt): fires when the metric drops below the threshold (useful for detecting a server that has gone offline)
  • Equal to (eq): fires when the metric matches an exact value (rarely used)

c. Set the threshold value: Enter a number. For percentage metrics (CPU, RAM, disk), this is a percentage from 0–100. For load average, this is the raw load value.

d. Set the duration: How many minutes the condition must be continuously true before the alert fires. Recommended values:

  • CPU: 5 minutes (avoids false positives from short deployment spikes)
  • RAM: 10 minutes (RAM changes slowly)
  • Disk: 0 minutes (disk usage is not transient — fire immediately)
  • Load: 5 minutes

e. Set the severity:

  • Warning — elevated condition, no immediate action required
  • Critical — requires immediate attention

f. Select the server (optional): If left empty, the rule applies to all servers in your organization. To scope it to a specific server, select it from the dropdown.

g. Set a resolve threshold (optional): Override the default 10% hysteresis. Useful if you want to fine-tune when the alert clears. Leave empty to use the default.

h. Set escalation minutes (optional): If the alert stays firing for this many minutes without being acknowledged, it escalates. Leave empty for no escalation.

  5. Click Create Rule or Save.
  6. The rule appears in the Rules list with status "Active".
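As a way to sanity-check values before entering them in the form, the fields from step 4 can be modeled as a record with a few validation rules. This is a hedged sketch: the class name `AlertRule`, its field names, and the lowercase metric and severity keys are hypothetical and do not represent the product's API. The checks only restate what the form description says (percentage metrics take 0–100, duration is a number of minutes).

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical identifiers for the form's options.
PERCENT_METRICS = {"cpu", "ram", "disk"}
METRICS = PERCENT_METRICS | {"load"}
CONDITIONS = {"gt", "lt", "eq"}
SEVERITIES = {"warning", "critical"}


@dataclass
class AlertRule:
    metric: str
    condition: str
    threshold: float
    duration_min: int
    severity: str
    server: Optional[str] = None               # None = all servers
    resolve_threshold: Optional[float] = None  # None = default hysteresis
    escalation_min: Optional[int] = None       # None = no escalation

    def problems(self) -> List[str]:
        """Return a list of validation messages; empty means the rule looks sane."""
        found = []
        if self.metric not in METRICS:
            found.append(f"unknown metric: {self.metric}")
        if self.condition not in CONDITIONS:
            found.append(f"unknown condition: {self.condition}")
        if self.metric in PERCENT_METRICS and not 0 <= self.threshold <= 100:
            found.append("percentage threshold must be between 0 and 100")
        if self.duration_min < 0:
            found.append("duration cannot be negative")
        if self.severity not in SEVERITIES:
            found.append(f"unknown severity: {self.severity}")
        return found


high_cpu = AlertRule("cpu", "gt", 85, 5, "warning")
print(high_cpu.problems())  # []

bad_disk = AlertRule("disk", "gt", 150, 0, "critical")
print(bad_disk.problems())  # ['percentage threshold must be between 0 and 100']
```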

Recommended starting rules for all servers

Rule            Metric         Condition   Threshold           Duration   Severity
High CPU        CPU Usage      gt          85%                 5 min      Warning
Critical CPU    CPU Usage      gt          95%                 5 min      Critical
High RAM        RAM Usage      gt          85%                 10 min     Warning
Critical RAM    RAM Usage      gt          95%                 5 min      Critical
High Disk       Disk Usage     gt          80%                 0 min      Warning
Critical Disk   Disk Usage     gt          90%                 0 min      Critical
High Load       Load Average   gt          [num cores × 1.5]   5 min      Warning

Replace [num cores × 1.5] with 1.5× the number of CPU cores on your server (e.g., for a 2-core server, set load threshold to 3.0).
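The load threshold arithmetic is simple enough to script when you manage servers of different sizes. The helper name `load_threshold` below is made up for illustration. The 1.5 factor comes from the recommendation above.

```python
import os


def load_threshold(cores: int, factor: float = 1.5) -> float:
    """Load-average threshold for a server with the given core count."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * factor


for cores in (1, 2, 4, 8):
    print(cores, load_threshold(cores))
# 1 1.5
# 2 3.0
# 4 6.0
# 8 12.0

# os.cpu_count() reports the logical CPUs of the machine running this script,
# so run it on the monitored server itself to get that server's value.
local_cores = os.cpu_count() or 1
print(load_threshold(local_cores))
```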


How to pause or delete a rule

To temporarily stop a rule from firing:

  1. Find the rule in the Rules list.
  2. Click the Pause button on the rule row. The rule enters Paused status and will not fire alerts.
  3. Click Resume to re-activate.

To permanently delete a rule:

  1. Click the Delete (trash) button on the rule row.
  2. Confirm deletion. This cannot be undone.

What success looks like

  • The rule appears in the Rules list with status "Active".
  • To verify the rule fires, you can temporarily set an artificially low threshold (e.g., CPU > 5%), wait for it to trigger, confirm the alert event appears and notification arrives, then delete the test rule and create the correct one.

Common errors and fixes

"My alert fires immediately even though the condition was just met"

  • Cause: Duration is set to 0 minutes, so the alert fires the moment the threshold is crossed.
  • Fix: Set the duration to 5 minutes or more for transient metrics like CPU.

"My alert fires constantly and clears within seconds"

  • Cause: The metric is oscillating around the threshold (flapping). The hysteresis default may not be enough.
  • Fix: Raise the threshold slightly, or set a custom resolve threshold under "Resolve Threshold" that requires the metric to recover further before clearing.

"I created a rule but it never fires even when the metric is high"

  • Cause: The rule may be scoped to a specific server while you are checking a different server.
  • Fix: Check the server selector on the rule. If empty, the rule applies to all servers. If set, it applies only to that specific server.

"Alert fired but I did not receive a notification"

  • Cause: No notification channel is configured, or the channel has an error.
  • Fix: Go to Settings → Notification Channels and verify at least one channel is active and tested. See KB-10-07.

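To see why a wider resolve threshold helps with the flapping case described above, here is a minimal simulation. It is a sketch under stated assumptions: the function `count_alerts`, the sample values, and the resolve values of 80 and 70 are invented for illustration, and duration is treated as 0 to keep the example short.

```python
from typing import List


def count_alerts(samples: List[float], threshold: float, resolve_threshold: float) -> int:
    """Count how many separate alert events an oscillating metric produces."""
    firing = False
    alerts = 0
    for value in samples:
        if not firing and value > threshold:
            firing = True
            alerts += 1           # a new alert event (and notification)
        elif firing and value <= resolve_threshold:
            firing = False        # metric recovered far enough to clear
    return alerts


# A metric bouncing around an 85% threshold, then genuinely recovering.
samples = [87, 75, 88, 76, 86, 74, 87, 60]

print(count_alerts(samples, threshold=85, resolve_threshold=80))  # 4
print(count_alerts(samples, threshold=85, resolve_threshold=70))  # 1
```

With the narrower gap, every dip clears the alert and the next spike fires it again. With the wider gap, the same data produces one alert that clears only on the real recovery.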

Related articles