mirror of
https://github.com/dcarrillo/atalaya.git
synced 2026-04-18 02:24:05 +00:00
# Atalaya Uptime Monitor

Atalaya (Spanish for watchtower) is an uptime & status page monitoring service running on Cloudflare Workers and Durable Objects.

Thanks to the generous Cloudflare free tier, Atalaya provides a simple, customizable, self-hosted solution to monitor the status of public network services,
aimed at hobbyists and users who want more control for free and are comfortable with Cloudflare's ecosystem.

Live [example](https://uptime.ifconfig.es/).

:warning: 99% of the code has been generated by an AI agent under human supervision, bearing in mind that I haven't used TypeScript before. You have been warned!

- [Features](#features)
- [Architecture](#architecture)
- [Prerequisites](#prerequisites)
- [Setup](#setup)
- [Configuration](#configuration)
  - [Settings](#settings)
  - [Monitor Types](#monitor-types)
  - [Regional Monitoring](#regional-monitoring)
  - [Alerts](#alerts)
  - [Status Page](#status-page)
- [Secret Management](#secret-management)
  - [Security Notes](#security-notes)
- [Data Retention](#data-retention)
- [Development](#development)
  - [Testing](#testing)
- [TODO](#todo)

```ASCII
🏴‍☠️
|
_ _|_ _
|;|_|;|_|;|
\\. . /
\\: . /
||: |
||:. |
||: .|
||: , |
||: |
||: . |
_||_ |
__ ----~ ~`---,
__ ,--~' ~~----____
```

## Features

- HTTP, TCP, and DNS monitoring.
- Regional monitoring from specific Cloudflare locations.
- Configurable retries with immediate retry on failure.
- Configurable failure thresholds before alerting.
- Custom alert templates for notifications (currently only webhooks are supported).
- Historical data stored in Cloudflare D1.
- Status page built with Astro 6 SSR, served by the same Worker via Static Assets.
- 90-day uptime history with daily bars.
- Response time charts (uPlot) with downtime bands.
- Basic auth or public access modes.
- Dark/light mode.
- Custom banner image support with optional clickable link.

## Architecture

The project is an npm workspace with a single Cloudflare Worker that handles everything:

```text
atalaya/
  src/            Worker source (monitoring engine, JSON API, auth, Astro SSR delegation)
  status-page/    Astro 6 SSR source (status page UI, built and served as static assets)
```

- **Worker** runs cron-triggered health checks, stores results in D1, enforces basic auth on all other routes, serves static assets (CSS, JS) via the `ASSETS` binding, and delegates to the Astro SSR handler for page rendering.
- **Regional checks** run on Durable Objects.
- **Pages** is an Astro 6 SSR site built into `status-page/dist/`. It accesses D1 directly via `import { env } from 'cloudflare:workers'`; no service binding is needed because everything runs in the same Worker.

## Prerequisites

- Node.js 22+
- [Wrangler](https://developers.cloudflare.com/workers/wrangler/install-and-update/) CLI

## Setup

1. Install dependencies:

   ```bash
   npm install
   ```

2. Create the configuration file (`wrangler.toml`):

   ```bash
   cp wrangler.example.toml wrangler.toml
   ```

3. Create the D1 database:

   ```bash
   wrangler d1 create atalaya
   ```

   **Update `database_id`** in `wrangler.toml`.

4. Run migrations:

   ```bash
   wrangler d1 migrations apply atalaya --remote
   ```

5. Configure alerts and monitors in `wrangler.toml`.

   **For regional monitoring:** Ensure Durable Objects are configured in `wrangler.toml`. The example configuration (`wrangler.example.toml`) includes the necessary bindings and migrations.

   **The status page is disabled by default**. To enable it, see the "Status Page" section in the configuration below.

6. Deploy:

   ```bash
   npm run deploy
   ```

   This builds the Astro site and deploys the Worker with static assets in a single step.

## Configuration

### Settings

Default values:

```yaml
settings:
  title: 'Atalaya Uptime Monitor' # Status page title
  default_retries: 2 # Retry attempts on failure
  default_retry_delay_ms: 1000 # Delay between retries
  default_timeout_ms: 5000 # Request timeout
  default_failure_threshold: 2 # Failures before alerting
```

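To make the retry and threshold settings concrete, here is a minimal TypeScript sketch of how they could interact. It is not taken from the Atalaya source: `CheckSettings`, `runWithRetries`, and `shouldAlert` are hypothetical names, and the sketch assumes that `default_retries` counts extra attempts after the first one and that the threshold counts consecutive failed runs.

```typescript
// Hypothetical shape; the fields mirror the settings above in camelCase.
interface CheckSettings {
  retries: number; // extra attempts after the first one (assumption)
  retryDelayMs: number; // pause between attempts
  failureThreshold: number; // consecutive failed runs before alerting
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs the check once, then retries up to `retries` more times on failure.
async function runWithRetries(
  check: () => Promise<boolean>,
  settings: CheckSettings,
): Promise<{ ok: boolean; attempts: number }> {
  const maxAttempts = settings.retries + 1;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // A rejected promise counts as a failed attempt.
    const ok = await check().catch(() => false);
    if (ok) return { ok: true, attempts: attempt };
    if (attempt < maxAttempts) await sleep(settings.retryDelayMs);
  }
  return { ok: false, attempts: maxAttempts };
}

// True once the run of consecutive failures has reached the threshold.
function shouldAlert(consecutiveFailures: number, settings: CheckSettings): boolean {
  return consecutiveFailures >= settings.failureThreshold;
}
```

With the defaults above, a monitor would be tried up to three times per run, and an alert would fire only after two runs in a row end in failure.
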
### Per-Monitor Overrides

Each monitor can override the global `default_*` settings:

```yaml
- name: 'critical-api'
  type: http
  target: 'https://api.example.com/health'
  timeout_ms: 10000 # Override global default_timeout_ms
  retries: 3 # Override global default_retries
  retry_delay_ms: 500 # Override global default_retry_delay_ms
  failure_threshold: 1 # Override global default_failure_threshold
  alerts: ['alert']
```

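The override rule can be sketched as a simple merge in which a monitor-level value wins whenever it is present. This is an illustrative TypeScript sketch, not the project's code; the interfaces and `resolveMonitorSettings` are hypothetical names that reuse the keys shown in the YAML.

```typescript
// Global defaults, keyed as in the `settings` block.
interface GlobalSettings {
  default_retries: number;
  default_retry_delay_ms: number;
  default_timeout_ms: number;
  default_failure_threshold: number;
}

// Optional per-monitor overrides, keyed as in the monitor entry.
interface MonitorOverrides {
  retries?: number;
  retry_delay_ms?: number;
  timeout_ms?: number;
  failure_threshold?: number;
}

// `??` keeps an explicit 0 from the monitor instead of falling back.
function resolveMonitorSettings(
  settings: GlobalSettings,
  monitor: MonitorOverrides,
): Required<MonitorOverrides> {
  return {
    retries: monitor.retries ?? settings.default_retries,
    retry_delay_ms: monitor.retry_delay_ms ?? settings.default_retry_delay_ms,
    timeout_ms: monitor.timeout_ms ?? settings.default_timeout_ms,
    failure_threshold: monitor.failure_threshold ?? settings.default_failure_threshold,
  };
}
```

Using `??` rather than `||` matters here: a monitor that sets `retries: 0` to disable retries would otherwise silently inherit the global default.
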
### Monitor Types

**HTTP**

```yaml
- name: 'api-health'
  type: http
  target: 'https://api.example.com/health'
  method: GET
  expected_status: 200
  headers: # optional, merged with default User-Agent: atalaya-uptime
    Authorization: 'Bearer ${API_TOKEN}'
    Accept: 'application/json'
  alerts: ['alert']
```

All HTTP checks send `User-Agent: atalaya-uptime` by default. Monitor-level `headers` are merged with this default; if a monitor sets its own `User-Agent`, it overrides the default.

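The merge described above could look roughly like the following TypeScript sketch. `DEFAULT_HEADERS` and `mergeHeaders` are hypothetical names, and the case-insensitive matching of header names is an assumption based on how HTTP treats them, not something the README states.

```typescript
// Default header sent with every HTTP check (value taken from the text above).
const DEFAULT_HEADERS: Record<string, string> = { 'User-Agent': 'atalaya-uptime' };

// Merges monitor headers over the defaults. Header names are compared
// case-insensitively, so `user-agent` replaces the default `User-Agent`.
function mergeHeaders(monitorHeaders: Record<string, string> = {}): Record<string, string> {
  const merged: Record<string, string> = {};
  const canonical: Record<string, string> = {}; // lowercased name -> name as written
  for (const source of [DEFAULT_HEADERS, monitorHeaders]) {
    for (const name of Object.keys(source)) {
      const key = name.toLowerCase();
      const previous = canonical[key];
      if (previous !== undefined) delete merged[previous];
      canonical[key] = name;
      merged[name] = source[name];
    }
  }
  return merged;
}
```

For the example monitor, the result would carry three headers: the default `User-Agent` plus the monitor's `Authorization` and `Accept`.
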
**TCP**

```yaml
- name: 'database'
  type: tcp
  target: 'db.example.com:5432'
  alerts: ['alert']
```

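A TCP target is written as `host:port`. As an illustration only, the TypeScript sketch below shows one way such a string could be validated before a connection attempt; `parseTcpTarget` is a hypothetical helper, and bracketed IPv6 literals are deliberately left out of scope.

```typescript
interface TcpTarget {
  host: string;
  port: number;
}

// Splits "host:port" on the last colon and validates the port range.
function parseTcpTarget(target: string): TcpTarget {
  const separator = target.lastIndexOf(':');
  if (separator <= 0) {
    throw new Error(`Invalid TCP target (expected host:port): ${target}`);
  }
  const host = target.slice(0, separator);
  const port = Number(target.slice(separator + 1));
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port in TCP target: ${target}`);
  }
  return { host, port };
}
```
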
**DNS**

```yaml
- name: 'dns-check'
  type: dns
  target: 'example.com'
  record_type: A
  expected_values: ['93.184.216.34']
  alerts: ['alert']
```

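The README does not say how `expected_values` is compared with the resolved records. One plausible reading, sketched below in TypeScript, is an order-insensitive exact match after light normalization. Both the `dnsValuesMatch` helper and the matching rule itself are assumptions.

```typescript
// Normalizes a record value: trims, lowercases, drops a trailing dot.
function normalizeRecord(value: string): string {
  return value.trim().toLowerCase().replace(/\.$/, '');
}

// True when both lists hold the same values, regardless of order (assumed rule).
function dnsValuesMatch(resolved: string[], expected: string[]): boolean {
  const got = resolved.map(normalizeRecord).sort();
  const want = expected.map(normalizeRecord).sort();
  return got.length === want.length && got.every((value, i) => value === want[i]);
}
```
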
### Regional Monitoring

Atalaya supports running checks from specific Cloudflare regions using Durable Objects. This allows you to test your services from different geographic locations, useful for:

- Testing CDN performance from edge locations
- Verifying geo-blocking configurations
- Measuring regional latency differences
- Validating multi-region deployments

**Valid Region Codes:**

- `weur`: Western Europe
- `enam`: Eastern North America
- `wnam`: Western North America
- `apac`: Asia Pacific
- `eeur`: Eastern Europe
- `oc`: Oceania
- `safr`: South Africa
- `me`: Middle East
- `sam`: South America

**Example:**

```yaml
- name: 'api-eu'
  type: http
  target: 'https://api.example.com/health'
  region: 'weur' # Run from Western Europe
  method: GET
  expected_status: 200
  alerts: ['alert']

- name: 'api-us'
  type: http
  target: 'https://api.example.com/health'
  region: 'enam' # Run from Eastern North America
  method: GET
  expected_status: 200
  alerts: ['alert']
```

**How it works:**

When a monitor specifies a `region`, Atalaya creates a Cloudflare Durable Object in that region, runs the check from there, and returns the result. Durable Objects are terminated after use to conserve resources. If the regional check fails, it falls back to running the check from the worker's default region.

**Note:** Regional monitoring requires Durable Objects to be configured in your `wrangler.toml`. See the example configuration for setup details.

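The fallback behaviour described under "How it works" can be sketched without any Cloudflare APIs by passing the two execution paths in as functions. The TypeScript below is a hedged illustration: `runCheckWithRegionalFallback`, `runInRegion`, and `runLocally` are hypothetical names, and it assumes "the regional check fails" means the regional execution itself throws (for example, the Durable Object cannot be reached), not that the target is reported down.

```typescript
interface CheckResult {
  ok: boolean;
  ranFrom: string; // region code, or 'default' for the Worker's own location
}

// Tries the regional path first; any thrown error triggers the local fallback.
async function runCheckWithRegionalFallback(
  region: string | undefined,
  runInRegion: (region: string) => Promise<CheckResult>,
  runLocally: () => Promise<CheckResult>,
): Promise<CheckResult> {
  if (!region) return runLocally();
  try {
    return await runInRegion(region);
  } catch {
    // Assumed fallback trigger: the regional execution failed, so run
    // the same check from the Worker's default region instead.
    return runLocally();
  }
}
```

Under this reading, a target that is genuinely down is still reported as down from the requested region; only infrastructure errors on the regional path cause the check to be rerun locally.
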
### Alerts

Alerts are configured as a top-level array. Currently only webhook alerts are supported.

```yaml
alerts:
  - name: 'slack'
    type: webhook
    url: 'https://hooks.slack.com/services/xxx'
    method: POST
    headers:
      Content-Type: 'application/json'
    body_template: |
      {"text": "Monitor {{monitor.name}} is {{status.current}}"}
```

Template variables: `event`, `monitor.name`, `monitor.type`, `monitor.target`, `status.current`, `status.previous`, `status.consecutive_failures`, `status.last_status_change`, `status.downtime_duration_seconds`, `check.error`, `check.timestamp`, `check.response_time_ms`, `check.attempts`

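To illustrate how dotted template variables such as `monitor.name` map onto alert data, here is a small TypeScript sketch of a `{{...}}` substitution. `renderTemplate` and `TemplateContext` are hypothetical names, and leaving unknown variables untouched is an assumption, not documented behaviour.

```typescript
// Nested alert data, e.g. { monitor: { name: '...' }, status: { current: '...' } }.
type TemplateContext = { [key: string]: string | number | TemplateContext };

// Replaces each {{dotted.path}} with the matching value from the context.
function renderTemplate(template: string, context: TemplateContext): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match: string, path: string) => {
    let value: unknown = context;
    for (const key of path.split('.')) {
      if (typeof value !== 'object' || value === null || !(key in value)) {
        return match; // unknown variable: leave the placeholder as written
      }
      value = (value as Record<string, unknown>)[key];
    }
    // Only leaf values are substituted; a path that stops at an object is left alone.
    return typeof value === 'object' ? match : String(value);
  });
}
```

Note that this sketch inserts values verbatim. When the body is JSON, a value containing a double quote (a monitor name or an error message, for instance) would need escaping to keep the payload valid.
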
### Status Page

The status page is an Astro 6 SSR site (under `status-page/`) served by the same Worker. It accesses D1 directly and renders monitor status, uptime history, and response time charts.

**Configuration (via Wrangler secrets on the Worker):**

```bash
# Set credentials for basic auth
wrangler secret put STATUS_USERNAME
wrangler secret put STATUS_PASSWORD

# Or make it public
wrangler secret put STATUS_PUBLIC # Set value to "true"
```

**Access rules:**

- If `STATUS_PUBLIC` is `"true"`: public access allowed
- If credentials are set: basic auth required
- Otherwise: 403 Forbidden

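The three access rules can be read as a small decision function evaluated in the order listed. The TypeScript sketch below is illustrative only: `decideAccess` and `StatusEnv` are hypothetical names, and it assumes both `STATUS_USERNAME` and `STATUS_PASSWORD` must be set for basic auth to apply.

```typescript
type AccessMode = 'public' | 'basic-auth' | 'forbidden';

// Secret names as used in the wrangler commands above.
interface StatusEnv {
  STATUS_PUBLIC?: string;
  STATUS_USERNAME?: string;
  STATUS_PASSWORD?: string;
}

// Applies the access rules top to bottom; the first match wins.
function decideAccess(env: StatusEnv): AccessMode {
  if (env.STATUS_PUBLIC === 'true') return 'public';
  if (env.STATUS_USERNAME && env.STATUS_PASSWORD) return 'basic-auth';
  return 'forbidden'; // nothing configured: the status page stays disabled (403)
}
```

This also matches the note in Setup that the status page is disabled by default: with no secrets set, every request falls through to the last rule.
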
**Custom Banner:**

You can add a custom banner image to replace the title text on the status page:

```toml
# In wrangler.toml
[vars]
STATUS_BANNER_URL = "https://example.com/banner.png"
STATUS_BANNER_LINK = "https://example.com" # optional
```

## Secret Management

Secrets are managed via Cloudflare's secret system. To add a new secret:

1. Set the secret value:

   ```bash
   wrangler secret put SECRET_NAME
   ```

2. Use it in config with `${SECRET_NAME}` syntax.

   ```yaml
   alerts:
     - name: 'slack'
       type: webhook
       url: 'https://hooks.slack.com/services/${SLACK_PATH}'
       method: POST
       headers:
         Authorization: 'Bearer ${WEBHOOK_TOKEN}'
       body_template: |
         {"text": "{{monitor.name}} is {{status.current}}"}

   monitors:
     - name: 'private-api'
       type: http
       target: 'https://api.example.com/health'
       method: GET
       expected_status: 200
       headers:
         Authorization: 'Bearer ${API_KEY}'
       alerts: ['slack']
   ```

**Note:** `${VAR}` is for secrets (resolved at startup). `{{var}}` is for alert body templates (resolved per-alert).

### Security Notes

- Secrets are never logged or exposed in check results
- Unresolved `${VAR}` placeholders remain as-is (useful for debugging missing secrets)
- Worker secrets are encrypted at rest by Cloudflare

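The `${VAR}` substitution, including the rule that unresolved placeholders remain as-is, can be sketched in a few lines of TypeScript. `resolveSecrets` is a hypothetical helper, and the accepted name pattern (letters, digits, underscores) is an assumption.

```typescript
// Replaces ${NAME} with the matching secret; unknown names are left untouched.
function resolveSecrets(
  config: string,
  secrets: Record<string, string | undefined>,
): string {
  return config.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
    (placeholder: string, name: string) => {
      const value = secrets[name];
      return value === undefined ? placeholder : value;
    },
  );
}
```

Because the pattern only matches `${...}`, alert template variables such as `{{monitor.name}}` pass through this step unchanged and are filled in later, per alert.
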
## Data Retention

- Raw check results: 7 days
- Hourly aggregates: 90 days

An hourly cron job aggregates raw data and cleans up old records automatically.

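As a rough picture of what the hourly job does, the TypeScript sketch below groups raw results into hourly buckets and computes the two retention cutoffs. The real job works against D1 and its schema is not shown here, so `RawCheck`, `HourlyAggregate`, `aggregateHourly`, and `retentionCutoffs` are hypothetical names with assumed fields (timestamps in Unix seconds).

```typescript
interface RawCheck {
  timestamp: number; // Unix seconds (assumed unit)
  ok: boolean;
  responseTimeMs: number;
}

interface HourlyAggregate {
  hourStart: number; // Unix seconds, aligned to the start of the hour
  total: number;
  successes: number;
  avgResponseTimeMs: number;
}

const HOUR_S = 3600;
const DAY_S = 24 * HOUR_S;

// Groups raw checks by hour and summarizes each bucket.
function aggregateHourly(checks: RawCheck[]): HourlyAggregate[] {
  const buckets = new Map<number, { total: number; successes: number; timeSum: number }>();
  for (const check of checks) {
    const hourStart = Math.floor(check.timestamp / HOUR_S) * HOUR_S;
    const bucket = buckets.get(hourStart) ?? { total: 0, successes: 0, timeSum: 0 };
    bucket.total += 1;
    if (check.ok) bucket.successes += 1;
    bucket.timeSum += check.responseTimeMs;
    buckets.set(hourStart, bucket);
  }
  const result: HourlyAggregate[] = [];
  buckets.forEach((b, hourStart) => {
    result.push({
      hourStart,
      total: b.total,
      successes: b.successes,
      avgResponseTimeMs: b.timeSum / b.total,
    });
  });
  return result.sort((a, b) => a.hourStart - b.hourStart);
}

// Records older than these timestamps fall outside the retention windows.
function retentionCutoffs(now: number): { rawBefore: number; hourlyBefore: number } {
  return { rawBefore: now - 7 * DAY_S, hourlyBefore: now - 90 * DAY_S };
}
```
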
## Development

```bash
# Run worker locally
wrangler dev --test-scheduled

# Run status page locally
npm run dev:pages

# Trigger cron manually
curl "http://localhost:8787/__scheduled?cron=*+*+*+*+*"
```

### Testing

```bash
# First build the status page
npm run build:pages

# Worker tests
npm run test

# Status page tests
npm run test:pages

# Type checking and linting
npm run check # worker
npm run check:pages # pages (astro check + tsc)
```

## TODO

- [ ] Add support for TLS checks (certificate validity, expiration). Apparently, the Workers API does not support certificate data access, even at the socket level. An external service may be required.
- [ ] Refine the status page to look... well... less AI-generated.
- [ ] Initial support for incident management (manual status overrides, incident timeline).
- [ ] Branded status page.
- [ ] Add support for notifications other than webhooks.