Canary Checker

Kubernetes operator for executing synthetic tests

Kubernetes Native Health Check Platform


Canary checker is a Kubernetes-native platform for monitoring the health of applications and infrastructure using both passive and active (synthetic) mechanisms.

Features

  • Batteries Included - 35+ built-in check types
  • Kubernetes Native - Health checks (or canaries) are CRDs that reflect health via the status field, making them compatible with GitOps, Flux Health Checks, Argo, Helm, etc.
  • Secret Management - Leverage K8S secrets and configmaps for authentication and connection details
  • Prometheus - Prometheus compatible metrics are exposed at /metrics. A Grafana Dashboard is also available.
  • Dependency Free - Runs an embedded Postgres instance by default and can also be configured to use an external database.
  • JUnit Export (CI/CD) - Export health check results to JUnit format for integration into CI/CD pipelines
  • JUnit Import (k6/newman/puppeteer/etc) - Use any container that creates JUnit test results
  • Scriptable - Go templates, JavaScript and CEL can be used to:
    • Evaluate whether a check is passing and which severity to use when failing
    • Extract a user-friendly error message
    • Transform and filter check responses into individual check results
    • Extract custom metrics
  • Multi-Modal - While designed as a Kubernetes Operator, canary checker can also run as a CLI and a server without K8s
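
As a rough sketch of the scripting hooks listed above, the HTTP check below uses a CEL expression to decide whether the check passes. The `test.expr` field and the `code` variable are assumptions about the HTTP check schema, not something shown in this README, and the canary name is hypothetical. Confirm the field names against the docs before relying on them.

```yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: http-scripted # hypothetical name
spec:
  interval: 30
  http:
    - name: status-is-200
      url: https://httpbin.demo.aws.flanksource.com/status/200
      test:
        # Assumed CEL hook: pass only when the response code is 200
        expr: "code == 200"
```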

Getting Started

  1. Install canary checker with Helm
helm repo add flanksource https://flanksource.github.io/charts
helm repo update

helm install \
  canary-checker \
  flanksource/canary-checker \
  -n canary-checker \
  --create-namespace \
  --wait
  2. Create a new check (saved here as canary.yaml)
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: http-check
spec:
  interval: 30
  http:
    - name: basic-check
      url: https://httpbin.demo.aws.flanksource.com/status/200
    - name: failing-check
      url: https://httpbin.demo.aws.flanksource.com/status/500

2a. Run the check locally (Optional)

wget  https://github.com/flanksource/canary-checker/releases/latest/download/canary-checker_linux_amd64 \
-O canary-checker &&  chmod +x canary-checker
./canary-checker run canary.yaml

  3. Apply the check
kubectl apply -f canary.yaml
  4. Check the health status
kubectl get canary
NAME         INTERVAL   STATUS   LAST CHECK   UPTIME 1H        LATENCY 1H   LAST TRANSITIONED
http-check   30         Passed   13s          18/18 (100.0%)   480ms        13s

See fixtures for more examples and docs for more comprehensive documentation.

Use Cases

Synthetic Testing

Run simple HTTP/DNS/ICMP probes or more advanced full test suites using JMeter, K6, Playwright or Postman.

# Run a container that executes a playwright test, and then collect the
# JUnit formatted test results from the /tmp folder
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: playwright-junit
spec:
  interval: 120
  junit:
    - testResults: "/tmp/"
      name: playwright-junit
      spec:
        containers:
          - name: playwright
            image: ghcr.io/flanksource/canary-playwright:latest
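
For the simpler probes mentioned at the start of this section, a canary might look roughly like the sketch below. The `dns` and `icmp` field names (`server`, `query`, `querytype`, `minrecords`, `endpoint`, `thresholdMillis`, `packetLossThreshold`, `packetCount`) are assumptions that do not appear elsewhere in this README, and the canary name and target host are hypothetical. Check the fixtures and docs for the exact schema.

```yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: basic-probes # hypothetical name
spec:
  interval: 60
  dns:
    # Assumed fields: resolve an A record and require at least one answer
    - name: a-record
      server: 8.8.8.8
      port: 53
      query: example.com
      querytype: A
      minrecords: 1
  icmp:
    # Assumed fields: fail on slow replies or excessive packet loss
    - name: ping
      endpoint: example.com
      thresholdMillis: 600
      packetLossThreshold: 10
      packetCount: 2
```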

Infrastructure Testing

Verify that infrastructure is fully operational by deploying new pods, spinning up new EC2 instances and pushing/pulling from docker and helm repositories.

# Schedule a new pod with an ingress and then time how long it takes to schedule, be ready, respond to an http request and finally be cleaned up.
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: pod-check
spec:
  interval: 30
  pod:
    - name: golang
      spec: |
        apiVersion: v1
        kind: Pod
        metadata:
          name: hello-world-golang
          namespace: default
          labels:
            app: hello-world-golang
        spec:
          containers:
            - name: hello
              image: quay.io/toni0/hello-webserver-golang:latest
      port: 8080
      path: /foo/bar
      scheduleTimeout: 20000
      readyTimeout: 10000
      httpTimeout: 7000
      deleteTimeout: 12000
      ingressTimeout: 10000
      deadline: 60000
      httpRetryInterval: 200
      expectedContent: bar
      expectedHttpStatuses: [200, 201, 202]

Backup Checks / Batch File Monitoring

Check that batch file processes are functioning correctly by checking the age and size of files in local file systems, SFTP, SMB, S3 and GCS.

# Checks that a recent DB backup has been uploaded
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: folder-check
spec:
  schedule: "0 22 * * *"
  folder:
    - path: s3://database-backups/prod
      name: prod-backup
      maxAge: 1d
      minSize: 10gb

Alert Aggregation

Aggregate alerts and recommendations from Prometheus, AWS CloudWatch, Dynatrace, etc.

apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: alertmanager-check
spec:
  schedule: "*/5 * * * *"
  alertmanager:
    - url: alertmanager.monitoring.svc
      alerts:
        - .*
      ignore:
        - KubeScheduler.*
        - Watchdog
      transform:
        # for each alert, transform it into a new check
        javascript: |
          var out = _.map(results, function(r) {
            return {
              name: r.name,
              labels: r.labels,
              icon: 'alert',
              message: r.message,
              description: r.message,
            }
          })
          JSON.stringify(out);

Prometheus Exporter Replacement

Export custom metrics from the result of any check, making it possible to replace various other Prometheus exporters that collect metrics via HTTP, SQL, etc.

apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: exchange-rates
spec:
  schedule: "@every 1h"
  http:
    - name: exchange-rates
      url: https://api.frankfurter.app/latest?from=USD&to=GBP,EUR,ILS
      metrics:
        - name: exchange_rate
          type: gauge
          value: result.json.rates.GBP
          labels:
            - name: "from"
              value: "USD"
            - name: to
              value: GBP

Platform Ready

Canary checker is ideal for building platforms: developers can include health checks for their applications in whatever tooling they prefer, with secret management that uses native Kubernetes constructs.

apiVersion: v1
kind: Secret
metadata:
  name: basic-auth
stringData:
  user: john
  pass: doe
---
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: http-basic-auth-configmap
spec:
  http:
    - url: https://httpbin.demo.aws.flanksource.com/basic-auth/john/doe
      username:
        valueFrom:
          secretKeyRef:
            name: basic-auth
            key: user
      password:
        valueFrom:
          secretKeyRef:
            name: basic-auth
            key: pass
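
Non-sensitive connection details can presumably come from a ConfigMap in the same way. The sketch below assumes that `valueFrom` accepts a `configMapKeyRef` mirroring the `secretKeyRef` shown above; the ConfigMap and canary names are hypothetical, and the password still comes from the `basic-auth` Secret.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: basic-auth-config # hypothetical name
data:
  user: john
---
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: http-basic-auth-from-configmap # hypothetical name
spec:
  http:
    - url: https://httpbin.demo.aws.flanksource.com/basic-auth/john/doe
      username:
        valueFrom:
          # Assumed to mirror secretKeyRef: read the username from a ConfigMap key
          configMapKeyRef:
            name: basic-auth-config
            key: user
      password:
        valueFrom:
          secretKeyRef:
            name: basic-auth
            key: pass
```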

Dashboard

Canary checker comes with a built-in dashboard by default.

A Grafana dashboard is also available, or you can build your own using the exposed metrics.

Getting Help

If you have any questions about canary checker:

Your feedback is always welcome!

Check Types

Protocol | Status | Checks
HTTP(s) | GA | Response body, headers and duration
DNS | GA | Response and duration
Ping/ICMP | GA | Duration and packet loss
TCP | GA | Port is open and connectable

Data Sources
SQL (MySQL, Postgres, SQL Server) | GA | Ability to log in, results, duration, health exposed via stored procedures
LDAP | GA | Ability to log in, response time
ElasticSearch / OpenSearch | GA | Ability to log in, response time, size of search results
Mongo | Beta | Ability to log in, results, duration
Redis | GA | Ability to log in, results, duration
Prometheus | GA | Ability to log in, results, duration

Alerts |  | Prometheus
Prometheus Alert Manager | GA | Pending and firing alerts
AWS CloudWatch Alarms | GA | Pending and firing alarms
Dynatrace Problems | Beta | Problems detected

DevOps
Git | GA | Query Git and GitHub repositories via SQL
Azure DevOps | Beta |

Integration Testing
JMeter | Beta | Runs and checks the result of a JMeter test
JUnit / BYO | Beta | Run a pod that saves JUnit test results
K6 | Beta | Runs K6 tests that export JUnit via a container
Newman | Beta | Runs Newman / Postman tests that export JUnit via a container
Playwright | Beta | Runs Playwright tests that export JUnit via a container

File Systems / Batch
Local Disk / NFS | GA | Check folders for files that are: too few/many, too old/new, too small/large
S3 | GA | Check contents of AWS S3 Buckets
GCS | GA | Check contents of Google Cloud Storage Buckets
SFTP | GA | Check contents of folders over SFTP
SMB / CIFS | GA | Check contents of folders over SMB/CIFS

Config
AWS Config | GA | Query AWS config using SQL
AWS Config Rule | GA | AWS Config Rules that are firing, Custom AWS Config queries
Config DB | GA | Custom config queries for Mission Control Config D
Kubernetes Resources | GA | Kubernetes resources that are missing or are in a non-ready state

Backups
GCP Databases | GA | Backup freshness
Restic | Beta | Backup freshness and integrity

Infrastructure
EC2 | GA | Ability to launch new EC2 instances
Kubernetes Ingress | GA | Ability to schedule and then route traffic via an ingress to a pod
Docker/Containerd | Deprecated | Ability to push and pull containers via docker/containerd
Helm | Deprecated | Ability to push and pull helm charts
S3 Protocol | GA | Ability to read/write/list objects on an S3 compatible object store

Contributing

See CONTRIBUTING.md

Thank you to all our contributors!

License

Canary Checker core (the code in this repository) is licensed under Apache 2.0 and accepts contributions via GitHub pull requests after signing a CLA.

The UI (Dashboard) is free to use with canary checker under a license exception of Flanksource UI.