Supported connections: BigQuery, Postgres, Presto, Redshift, Snowflake
Whale supports automatic, barebones scheduled metrics calculation. Metrics are defined by creating a ```metrics
block, as explained below. Any metric defined in this way will automatically be scheduled alongside the metadata scraping job. Metric definitions support Jinja2 templating -- for more information on how to set this up, see Jinja2 templating.
A metric is simply a named SQL statement that returns a single value, defined in plain yaml in a table stub, as shown below:
```metricsmetric-name:sql: |select statement
For example, below two metrics, null-registrations
and distinct-registrations
are defined:
```metricsnull-registrations:sql: |selectcount(distinct user_id)from mart.user_signupswhere user_id is nulldistinct-registrations:sql: |selectcount(distinct user_id)from mart.user_signupswhere user_id is not null
The same block is shown within the context of a full table stub, below:
schema.table## Column details## Partition info------------------------------------------------------*Do not make edits above this line.*```metricsnull-registrations:sql: |selectcount(distinct user_id)from mart.user_signupswhere user_id is nulldistinct-registrations:sql: |selectcount(distinct user_id)from mart.user_signupswhere user_id is not null
These metrics will be scheduled, with the latest calculations injected into the programmatic portion of the table stub. An example is shown below:
schema.table## Column details## Partition info## Metricsnull-registrations: 103 @ 2020-04-01 05:12:15distinct-registrations: 30104 @ 2020-04-01 05:12:18------------------------------------------------------*Do not make edits above this line.*```metricsnull-registrations:sql: |selectcount(*)from mart.user_signupswhere user_id is nulldistinct-registrations:sql: |selectcount(distinct user_id)from mart.user_signupswhere user_id is not null
A full list of all historical values are saved in ~/.whale/metrics
.
Metrics can be enhanced with Slack alerts. These will send a message to you or your channel if a certain condition is met.
We are building out our dedicated 🐳 Slack app, but in the meantime, feel free to join our community and set up alerts there (ask us how to set this up in #general).
To enable Slack alerts for your Slack workspace first add the 🐳 Slack app by clicking this link. This will provide you with a code you'll need to convert into a token through Slack's API (TODO: instructions), which should be stored as an environment variable WHALE_SLACK_TOKEN
. That's all!
The syntax is as follows:
```metricsmetric-name:sql: |select statementalerts:- condition: "condition"message: "message"slack:- "channel"
Using the earlier example we could set an alert every time we find a null in column user_id
like this:
```metricsnull-registrations:sql: |selectcount(*)from mart.user_signupswhere user_id is nullalerts:- condition: "> 0"message: "Nulls found in column 'user_id' of mart.user_signups."slack:- "#data-monitoring"- "@bob"
As you can see, you can send a message on Slack to individuals as well as Slack channels. In case you are interested, it's also possible to attach several conditions and messages to one metric.
All in all your table.md
file with metrics and corresponding alerts could look like this:
schema.table## Column details## Partition info## Metricsnull-registrations: 103 @ 2020-04-01 05:12:15distinct-registrations: 30104 @ 2020-04-01 05:12:18------------------------------------------------------*Do not make edits above this line.*```metricsnull-registrations:sql: |selectcount(*)from mart.user_signupswhere user_id is nullalerts:- condition: ">0"message: "Nulls found in column 'id' of mart.user_signups."slack:- "#data-monitoring"- "@bob"- condition: "> 100"message: "More than 100 nulls found in column 'id' of mart.user_signups."slack:- "#incident-room"- "@joseph"distinct-registrations:sql: |selectcount(distinct user_id)from mart.user_signupswhere user_id is not nullalerts:- condition: "<10"message: "Less than 10 users in mart.user_signups."slack:- "#data-monitoring"- "@bob"