Connection configuration
Warehouse connections are configured through the ~/.whale/config/connections.yaml file. The accepted key/value pairs are warehouse-specific and, as such, are most easily added through the `wh init` workflow. If you need to edit this file manually, refer to the warehouse-specific documentation below.
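
A single connections.yaml file can hold several connections at once, each as its own YAML document separated by `---` (as the `---` header in every example below suggests). A minimal sketch with two hypothetical warehouses (all names, hosts, and ports here are illustrative, not defaults):

```yaml
---
name: postgres-dev          # Hypothetical warehouse name
metadata_source: Postgres
uri: postgres.example.com   # Hypothetical host
port: 5432
---
name: presto-prod           # Hypothetical warehouse name
metadata_source: Presto
uri: presto.example.com     # Hypothetical host
port: 8080
```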

Universal connection parameters

```yaml
---
name: ~
metadata_source: ~
database: ~ # For all but bigquery
```
- `name` Unique warehouse name. This is used to name the subdirectory within ~/.whale/metadata that stores metadata and UGC for each table.
- `metadata_source` The type of connection that this YAML section describes. Values are case-sensitive and can be one of the following:
  - Bigquery
  - Neo4j
  - Presto
  - Snowflake
- `database` Specify a string here to restrict scraping to a particular database under your connection. Specifying this modifies the SQLAlchemy connection string used for the connection, using this string as the "database" field (in ANSI SQL, this is known as the "catalog"). See the SQLAlchemy docs for more details.
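
As a concrete illustration of the `database` field, the sketch below restricts a hypothetical Presto connection to a single catalog (name, host, and catalog are illustrative):

```yaml
---
name: presto-prod
metadata_source: Presto
uri: presto.example.com
port: 8080
database: analytics   # Scrape only the "analytics" catalog
```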

Bigquery

```yaml
---
name:
metadata_source: Bigquery
key_path: /Users/robert/gcp-credentials.json
project_credentials: # Only one of key_path or project_credentials needed
project_id:
```
Only one of key_path and project_credentials is required.
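
For instance, a filled-in Bigquery connection using the key_path variant might look like the sketch below (the warehouse name, key path, and project id are all hypothetical):

```yaml
---
name: bigquery-prod                       # Hypothetical warehouse name
metadata_source: Bigquery
key_path: /path/to/service-account.json   # Service-account key file
project_id: my-gcp-project                # Hypothetical GCP project
```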

Cloud Spanner

```yaml
---
name:
metadata_source: spanner
instance:
database:
project_id:
```
To do: unlike Bigquery, Cloud Spanner connections don't currently allow you to specify key_path or project_credentials explicitly.

Glue

```yaml
---
name: whatever-you-want # Optional
metadata_source: Glue
```
A name parameter will place all of your Glue documentation within a separate folder, as is done with the other extractors. Because Glue is already a metadata aggregator, however, this may not be optimal, particularly if you also connect whale directly to other warehouses. In that case, the name parameter can be omitted, and the table stubs will reside within subdirectories named after the underlying warehouse/instance.
For example, with name, your files will be organized like this:

```
your-name/my-instance/postgres_public_table
```

Without name, your files will be stored like this:

```
my-instance/postgres_public_table
```

Hive metastore

```yaml
---
name:
metadata_source: HiveMetastore
uri:
port:
username: # Optional
password: # Optional
dialect: postgres # postgres/mysql/etc. This is the dialect used in the SQLAlchemy conn string.
database: hive # The database within this connection where the metastore lives. This is usually "hive".
```
For more information on the dialect field, see the SQLAlchemy documentation.
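
For illustration, a metastore backed by MySQL instead of postgres might be configured like the sketch below (host, port, and credentials are hypothetical):

```yaml
---
name: hive-metastore
metadata_source: HiveMetastore
uri: metastore.example.com   # Hypothetical host
port: 3306
username: reader             # Hypothetical
password: hunter2            # Hypothetical
dialect: mysql               # MySQL-backed metastore
database: hive
```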

Neo4j

We support scraping metadata from Amundsen's neo4j backend. However, the neo4j drivers are not installed in whale's virtual environment by default. To use this connection, install whale with make && make install, then run pip install neo4j-driver within the virtual environment located at ~/.whale/libexec/env.
```yaml
---
name:
metadata_source: Neo4j
uri:
port:
username: # Optional
password: # Optional
```

Postgres

```yaml
---
name:
metadata_source: Postgres
uri:
port:
username: # Optional
password: # Optional
```

Presto

```yaml
---
name:
metadata_source: Presto
uri:
port:
username: # Optional
password: # Optional
```

Redshift

```yaml
---
name:
metadata_source: Redshift
uri:
port:
username: # Optional
password: # Optional
```

Snowflake

```yaml
---
name:
metadata_source: Snowflake
uri:
port:
username: # Optional
password: # Optional
role: # Optional
```
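
As an example, a Snowflake connection that scrapes under a dedicated read-only role might look like the sketch below (account host, user, and role are hypothetical):

```yaml
---
name: snowflake-analytics
metadata_source: Snowflake
uri: example.snowflakecomputing.com   # Hypothetical account host
port: 443
username: wh_reader                   # Hypothetical user
password:
role: REPORTING_READER                # Hypothetical role assumed while scraping
```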

Splice Machine

```yaml
---
name:
metadata_source: splicemachine
uri: jdbc-cluster114-splice-prod.splicemachine.io # an example
username:
password:
```

Build script

We also support the use of custom scripts that handle metadata scraping and dump this data into local files (in the metadata subdirectory) and manifests (in the manifests subdirectory). For more information, see Custom extraction.
```yaml
---
build_script_path: /path/to/build_script.py
venv_path: /path/to/venv
python_binary_path: /path/to/binary # Optional
```