~/.whale file structure

Unlike most command-line tools, whale needs persistent data storage: warehouse credentials, scraped metadata, metadata caches for faster access, user-generated content, logs, and so on. We therefore store all of this information in the ~/.whale directory, which is organized into the files and subdirectories described below.

If whale is built from source, the whale binary is also stored under ~/.whale (Homebrew installations keep all binaries within Homebrew's own file structure instead). This binary can be executed directly, but we recommend setting an alias in your shell's rc file so that wh can be run without having to remember this path.

The whale
binary is not portable without the rest of the directory: it relies on the other subdirectories for metadata scraping, caching, and storage.

The connections.yaml file, stored under ~/.whale, contains a list of all registered warehouse connections. It can be edited with wh connections edit, and new connections are configured through the wh init workflow.

Each connection is stored as a separate YAML document within connections.yaml, so entries are separated by ---. Unlike most warehouse interfaces, whale uses configuration keys that differ for each warehouse, which makes them easier for end users to understand. We recommend using the wh init workflow to add new warehouses; you can run it multiple times without worrying about clearing your existing connections.

The libexec directory
houses the virtual environment containing the whale-pipelines installation, as well as the scripts used by the Rust side of the program (build_script.py and run_script.py). The name libexec comes from the Unix /usr/libexec directory, which holds binaries that are not intended to be run directly by end users. However, because these files are written entirely in Python and stored in a virtual environment, it is perfectly acceptable to modify them directly.

The cron.logs file contains logs from every registered cron job, which can be useful for debugging broken ETL processes.

There is also a manifest.txt
file that lists only the tables from the most recent metadata scrape. During a scraping job, tables are appended to a temporary manifest named tmp_manifest_<NUMBER>.txt, where the NUMBER suffix prevents simultaneous scraping jobs from appending to the same temporary manifest. When the ETL job completes, this temporary manifest is copied to manifest.txt.

When you press enter
on a file selected in a wh search, the corresponding metadata document from ~/.whale is opened.

Metrics are stored as Markdown files at paths of the form warehouse_name/catalog.schema.table/metric-name.md.

SQL templates are named warehouse-connection-name.sql. Each template is prepended to every query run against the warehouse whose connection name is warehouse-connection-name. See the Jinja2 templating section for more details. To find a connection's name, run wh connections and look at the name field of each YAML block.
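The following Python sketch illustrates how connection names relate to template files. It is a hypothetical illustration, not whale's implementation.

- It assumes connections.yaml has already been read into a string.
- It assumes each connection's template sits in a directory passed in as templates_dir. This parameter is hypothetical, since the template directory is not named here.
- The helpers connection_names and prepend_template are invented for this example.
- The name lookup is a naive line scan, not a real YAML parser.
- whale additionally renders templates with Jinja2, which this sketch skips.

```python
import os
import tempfile
from typing import List


def connection_names(yaml_text: str) -> List[str]:
    """Return the top-level `name` value of each YAML document, in order.

    Hypothetical helper: a naive line scan that assumes documents are
    separated by lines containing only `---` and that each document has an
    unindented `name: <value>` line. It is NOT a YAML parser.
    """
    names: List[str] = []
    found_in_doc = False
    for line in yaml_text.splitlines():
        if line.strip() == "---":
            found_in_doc = False  # a new YAML document (connection) begins
            continue
        if not found_in_doc and line.startswith("name:"):
            # Drop the key and any quotes surrounding the value.
            names.append(line[len("name:"):].strip().strip("'\""))
            found_in_doc = True
    return names


def prepend_template(query: str, connection_name: str, templates_dir: str) -> str:
    """Prepend <templates_dir>/<connection_name>.sql to `query`, if it exists.

    Hypothetical helper: whale also renders templates with Jinja2; this
    sketch only concatenates the raw template text.
    """
    template_path = os.path.join(templates_dir, f"{connection_name}.sql")
    if not os.path.isfile(template_path):
        return query  # no template registered for this connection
    with open(template_path, encoding="utf-8") as f:
        template = f.read()
    return template.rstrip("\n") + "\n" + query


if __name__ == "__main__":
    # Placeholder config: real connections use warehouse-specific keys.
    config = "name: my-postgres\n# other keys omitted\n---\nname: my-snowflake\n"
    print(connection_names(config))  # ['my-postgres', 'my-snowflake']

    with tempfile.TemporaryDirectory() as templates_dir:
        path = os.path.join(templates_dir, "my-postgres.sql")
        with open(path, "w", encoding="utf-8") as f:
            f.write("SET search_path TO analytics;\n")
        # Template found: its text is placed before the query.
        print(prepend_template("SELECT 1;", "my-postgres", templates_dir))
        # No template for this connection: the query is returned unchanged.
        print(prepend_template("SELECT 1;", "my-snowflake", templates_dir))
```

The sketch keys each template file on the connection name, which mirrors the naming rule above: one template file per connection. A connection without a template leaves its queries untouched.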