feat(uptime): wire alarm notifications on uptime transitions#407
feat(uptime): wire alarm notifications on uptime transitions#407capacitron wants to merge 3 commits intodatabuddy-analytics:stagingfrom
Conversation
Query enabled uptime alarms for the monitored website on DOWN/RECOVERED transitions and dispatch notifications via NotificationClient to all configured destinations (Slack, Discord, Teams, Google Chat, Telegram, Webhook). No schema changes required. Closes databuddy-analytics#268
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
The latest updates on your projects. Learn more about Vercel for GitHub. 2 Skipped Deployments
|
izadoesdev
left a comment
There was a problem hiding this comment.
👋 Welcome to Databuddy, @capacitron! Thanks for picking up #268 — really appreciate the contribution and the thorough PR description.
Summary
This PR wires the existing alarms system to uptime monitoring transitions (DOWN/RECOVERED). When a monitored site changes state, it queries enabled uptime alarms for that websiteId, builds a NotificationClient per alarm from its configured destinations, and dispatches notifications via Slack, Discord, Teams, Google Chat, Telegram, or Webhook. No schema changes needed.
Review
What's good:
- Clean implementation that follows the existing
NotificationClientdispatch pattern frompackages/rpc/src/routers/alarms.ts - Proper early returns and null checks throughout
- Per-alarm error isolation with
captureErrorcontext - No breaking changes, no schema changes, no high-impact pipeline modifications
- PR description is excellent — clear approach, linked issue, AI disclosure
Issues found:
🟡 Unsafe type assertions for Telegram config — cfg.botToken as string and cfg.chatId as string skip runtime validation and could pass undefined into the Telegram client if config is malformed. See inline comment for a fix.
🔵 Sequential alarm dispatch — notifications are sent one-by-one which could slow the uptime pipeline if destinations are slow. Promise.allSettled would parallelize safely. Not blocking since it matches the existing email pattern.
🔵 Webhook headers cast — same as pattern concern for cfg.headers.
Assessment: Looks good for merge after addressing the Telegram type safety issue (🟡). The suggestions (🔵) are nice-to-haves.
| channels.push("google-chat"); | ||
| } else if (dest.type === "telegram") { |
There was a problem hiding this comment.
🟡 Warning: Unsafe type assertions
cfg.botToken and cfg.chatId could be undefined at runtime if a Telegram destination has malformed or incomplete config. Casting straight to string skips any runtime validation and would pass undefined into the Telegram client silently.
Consider guarding before pushing the channel:
const botToken = typeof cfg.botToken === "string" ? cfg.botToken : undefined;
const chatId = dest.identifier || (typeof cfg.chatId === "string" ? cfg.chatId : undefined);
if (botToken && chatId) {
clientConfig.telegram = { botToken, chatId };
channels.push("telegram");
}This way a misconfigured Telegram destination gets skipped instead of causing a runtime error or a silent API failure.
| if (channels.length === 0) { | ||
| continue; | ||
| } | ||
|
|
There was a problem hiding this comment.
🔵 Suggestion: Parallelize alarm dispatch
Alarms are dispatched sequentially here — each client.send() is awaited before moving to the next alarm. If a webhook destination is slow or times out, it blocks notifications for all subsequent alarms and holds up the uptime check pipeline.
Since each alarm is independent, Promise.allSettled would let them run in parallel with per-alarm error isolation:
await Promise.allSettled(
matchingAlarms
.filter((a) => a.destinations?.length)
.map(async (alarm) => {
// ... build clientConfig + channels ...
const client = new NotificationClient(clientConfig);
await client.send(payload, { channels });
})
);Not a blocker since it matches the sequential pattern in the existing email function, but worth considering since alarm count per site could grow.
| chatId: dest.identifier || (cfg.chatId as string), | ||
| }; | ||
| channels.push("telegram"); | ||
| } else if (dest.type === "webhook") { |
There was a problem hiding this comment.
🔵 Suggestion: Same as cast pattern applies to webhook headers
cfg.headers as Record<string, string> | undefined has the same issue as the Telegram casts above. If cfg.headers is unexpectedly a non-object type, this could cause subtle issues downstream. A quick typeof guard or a validation utility would make this more robust.
Greptile SummaryThis PR wires the existing
Confidence Score: 4/5Not safe to merge as-is: the same-type destination collision will silently drop all but the last webhook and send duplicates to the survivor. One P1 logic bug (last-one-wins on same-type destinations) means some alarm destinations will never receive notifications when multiple of the same channel type are configured, while the surviving destination gets duplicate notifications. This is a correctness defect on the primary changed code path and should be addressed before merging. apps/uptime/src/uptime-transition-emails.ts — the destination-loop logic in sendUptimeAlarmNotificationsIfNeeded needs a per-destination client approach. Important Files Changed
Sequence DiagramsequenceDiagram
participant QS as QStash
participant UP as Uptime Service
participant CH as ClickHouse
participant KF as Kafka (Redpanda)
participant PG as PostgreSQL
participant EM as Resend (Email)
participant NC as NotificationClient
QS->>UP: POST / (uptime check trigger)
UP->>UP: lookupSchedule(scheduleId)
UP->>UP: checkUptime(url)
UP->>CH: getPreviousMonitorStatus(monitorId)
CH-->>UP: previousStatus
UP->>KF: sendUptimeEvent(result)
UP->>UP: resolveTransitionKind(previous, current)
alt Transition detected (down / recovered)
UP->>EM: sendUptimeTransitionEmailsIfNeeded
UP->>PG: query alarms WHERE websiteId + enabled + triggerType=uptime
PG-->>UP: matchingAlarms[]
loop for each alarm
UP->>NC: new NotificationClient(clientConfig)
NC->>NC: send(payload, channels)
Note over NC: Same-type destinations collapse into one config key
end
end
UP-->>QS: 200 OK
Reviews (1): Last reviewed commit: "feat(uptime): wire alarm notifications o..." | Re-trigger Greptile |
| const clientConfig: Record<string, Record<string, unknown>> = {}; | ||
| const channels: NotificationChannel[] = []; | ||
|
|
||
| for (const dest of alarm.destinations) { | ||
| const cfg = (dest.config ?? {}) as Record<string, unknown>; | ||
|
|
||
| if (dest.type === "slack") { | ||
| clientConfig.slack = { webhookUrl: dest.identifier }; | ||
| channels.push("slack"); | ||
| } else if (dest.type === "discord") { | ||
| clientConfig.discord = { webhookUrl: dest.identifier }; | ||
| channels.push("discord"); | ||
| } else if (dest.type === "teams") { | ||
| clientConfig.teams = { webhookUrl: dest.identifier }; | ||
| channels.push("teams"); | ||
| } else if (dest.type === "google_chat") { | ||
| clientConfig.googleChat = { webhookUrl: dest.identifier }; | ||
| channels.push("google-chat"); | ||
| } else if (dest.type === "telegram") { | ||
| clientConfig.telegram = { | ||
| botToken: cfg.botToken as string, | ||
| chatId: dest.identifier || (cfg.chatId as string), | ||
| }; | ||
| channels.push("telegram"); | ||
| } else if (dest.type === "webhook") { | ||
| clientConfig.webhook = { | ||
| url: dest.identifier, | ||
| headers: cfg.headers as Record<string, string> | undefined, | ||
| }; | ||
| channels.push("webhook"); | ||
| } | ||
| } |
There was a problem hiding this comment.
Multiple same-type destinations silently dropped
The loop overwrites clientConfig[type] on each iteration, so when an alarm has two destinations of the same channel type — permitted by the alarm_destinations_alarm_type_identifier_unique index which is keyed on (alarmId, type, identifier) — only the last destination's config survives. channels still contains one entry per destination, so NotificationClient.send ends up calling the surviving provider twice, sending duplicate notifications to the last URL while all earlier ones receive nothing. The same pattern exists in packages/rpc/src/routers/anomalies.ts buildClientConfig.
The fix is to instantiate one NotificationClient per destination rather than collapsing everything into a single config map. Process each dest individually, build a singleConfig with only that one channel key, and call client.send(payload, { channels: [channel] }) — one client per destination. This ensures every configured URL receives exactly one notification regardless of how many destinations share the same channel type.
| try { | ||
| const client = new NotificationClient(clientConfig); | ||
| await client.send(payload, { channels }); | ||
| } catch (error) { | ||
| captureError(error, { | ||
| error_step: "uptime_alarm_notification", | ||
| alarm_id: alarm.id, | ||
| }); |
There was a problem hiding this comment.
Notification send failures silently ignored
client.send() returns NotificationResult[] and resolves successfully even when individual channel sends fail (it uses Promise.allSettled internally, reporting failures in the result objects rather than throwing). Errors are only captured for exceptions. Failed deliveries — e.g. a bad webhook URL or an expired token — will leave the results array with { success: false, error: "..." } entries that are never logged or captured.
…dispatch - Replace unsafe `as string` casts on botToken/chatId with runtime typeof checks, skipping the destination if config is malformed - Add typeof guard for webhook headers cast - Parallelize alarm dispatch with Promise.allSettled for better performance when multiple alarms have slow destinations
|
Thanks for the thorough review @izadoesdev! Addressed both items: 🟡 Telegram type safety — replaced 🔵 Sequential dispatch — switched to 🔵 Webhook headers — applied the same |
Fixes #268
Approach
Wire the existing alarms system to uptime monitoring DOWN/RECOVERED transitions. No schema changes — the
alarmstable already haswebsiteIdandtriggerType="uptime".Changes:
apps/uptime/src/uptime-transition-emails.ts— addssendUptimeAlarmNotificationsIfNeeded()that queries enabled uptime alarms for the monitoredwebsiteIdon transition, builds aNotificationClientper alarm from its destinations, and dispatches viabuildUptimeNotificationPayloadapps/uptime/src/index.ts— calls the new function aftersendUptimeTransitionEmailsIfNeededSupports all existing destination types: Slack, Discord, Teams, Google Chat, Telegram, and Webhook.