Query automation and alerting
Supported connections: BigQuery, Postgres, Presto, Redshift, Snowflake
Whale supports automatic, barebones scheduled metrics calculation. Metrics are defined by creating a ```metrics block, as explained below. Any metric defined in this way will automatically be scheduled alongside the metadata scraping job. Metric definitions support Jinja2 templating -- for more information on how to set this up, see Jinja2 templating.

Basic usage

A metric is simply a named SQL statement that returns a single value, defined in plain yaml in a table stub, as shown below:
1
```metrics
2
metric-name:
3
sql: |
4
select statement
Copied!
For example, below two metrics, null-registrations and distinct-registrations are defined:
1
```metrics
2
null-registrations:
3
sql: |
4
select
5
count(distinct user_id)
6
from mart.user_signups
7
where user_id is null
8
distinct-registrations:
9
sql: |
10
select
11
count(distinct user_id)
12
from mart.user_signups
13
where user_id is not null
Copied!
The same block is shown within the context of a full table stub, below:
1
schema.table
2
​
3
## Column details
4
​
5
## Partition info
6
​
7
------------------------------------------------------
8
*Do not make edits above this line.*
9
​
10
```metrics
11
null-registrations:
12
sql: |
13
select
14
count(distinct user_id)
15
from mart.user_signups
16
where user_id is null
17
distinct-registrations:
18
sql: |
19
select
20
count(distinct user_id)
21
from mart.user_signups
22
where user_id is not null
Copied!
These metrics will be scheduled, with the latest calculations injected into the programmatic portion of the table stub. An example is shown below:
1
schema.table
2
​
3
## Column details
4
​
5
## Partition info
6
​
7
## Metrics
8
null-registrations: 103 @ 2020-04-01 05:12:15
9
distinct-registrations: 30104 @ 2020-04-01 05:12:18
10
​
11
------------------------------------------------------
12
*Do not make edits above this line.*
13
​
14
```metrics
15
null-registrations:
16
sql: |
17
select
18
count(*)
19
from mart.user_signups
20
where user_id is null
21
distinct-registrations:
22
sql: |
23
select
24
count(distinct user_id)
25
from mart.user_signups
26
where user_id is not null
Copied!
A full list of all historical values are saved in ~/.whale/metrics.

Slack alerts

Metrics can be enhanced with Slack alerts. These will send a message to you or your channel if a certain condition is met.
We are building out our dedicated 🐳 Slack app, but in the meantime, feel free to join our community and set up alerts there (ask us how to set this up in #general).

Setup [WIP]

To enable Slack alerts for your Slack workspace first add the 🐳 Slack app by clicking this link. This will provide you with a code you'll need to convert into a token through Slack's API (TODO: instructions), which should be stored as an environment variable WHALE_SLACK_TOKEN. That's all!

Syntax

The syntax is as follows:
1
```metrics
2
metric-name:
3
sql: |
4
select statement
5
alerts:
6
- condition: "condition"
7
message: "message"
8
slack:
9
- "channel"
Copied!
Using the earlier example we could set an alert every time we find a null in column user_id like this:
1
```metrics
2
null-registrations:
3
sql: |
4
select
5
count(*)
6
from mart.user_signups
7
where user_id is null
8
alerts:
9
- condition: "> 0"
10
message: "Nulls found in column 'user_id' of mart.user_signups."
11
slack:
12
- "#data-monitoring"
13
- "@bob"
Copied!
As you can see, you can send a message on Slack to individuals as well as Slack channels. In case you are interested, it's also possible to attach several conditions and messages to one metric.
All in all your table.md file with metrics and corresponding alerts could look like this:
1
schema.table
2
​
3
## Column details
4
​
5
## Partition info
6
​
7
## Metrics
8
null-registrations: 103 @ 2020-04-01 05:12:15
9
distinct-registrations: 30104 @ 2020-04-01 05:12:18
10
​
11
------------------------------------------------------
12
*Do not make edits above this line.*
13
​
14
```metrics
15
null-registrations:
16
sql: |
17
select
18
count(*)
19
from mart.user_signups
20
where user_id is null
21
alerts:
22
- condition: ">0"
23
message: "Nulls found in column 'id' of mart.user_signups."
24
slack:
25
- "#data-monitoring"
26
- "@bob"
27
- condition: "> 100"
28
message: "More than 100 nulls found in column 'id' of mart.user_signups."
29
slack:
30
- "#incident-room"
31
- "@joseph"
32
​
33
distinct-registrations:
34
sql: |
35
select
36
count(distinct user_id)
37
from mart.user_signups
38
where user_id is not null
39
alerts:
40
- condition: "<10"
41
message: "Less than 10 users in mart.user_signups."
42
slack:
43
- "#data-monitoring"
44
- "@bob"
Copied!
Last modified 9mo ago