~/.whale file structure. Unlike most command-line tools, whale requires data storage: warehouse credentials, scraped metadata, metadata caches for speedier access, user-generated content, logs, etc. We therefore use the
~/.whale directory to store all of this information.
~/.whale path. This path contains the following directory structure:
whale, if whale is built from source (Homebrew manages all binaries within its own file structure). The
whale binary within can be executed directly, though we recommend setting an alias in your shell's
wh can be executed without having to remember this path.
whale binary within is not portable without the rest of the directory, as it relies on the other directories for metadata scraping, caching, and storage.
connections.yaml is stored here, which contains a list of all registered warehouse connections.
wh connections edit, and new connections are configured through the
connections.yaml file, meaning they are separated by
---. Unlike most warehouse interfaces, we chose to use configuration keys that differ for each warehouse for easier end-user comprehension.
wh init workflow to add new warehouses (you can repeat this command multiple times without worrying about clearing your existing connections).
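
To illustrate the multi-document layout described above, here is a minimal Python sketch that splits connections.yaml-style text on its --- separators and reads the name field from each block. Apart from name, the keys and all sample values (metadata_source, bq-prod, pg-analytics) are hypothetical placeholders, not whale's documented schema. The parser only handles flat key: value lines, so a real YAML library would be the better choice for actual files.

```python
from typing import Dict, List

# Hypothetical sample content; only the `name` key is taken from the text above.
SAMPLE = """\
---
name: bq-prod
metadata_source: Bigquery
---
name: pg-analytics
metadata_source: Postgres
"""


def split_documents(text: str) -> List[Dict[str, str]]:
    """Split text on `---` lines and parse flat `key: value` pairs per block."""
    documents: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "---":
            # A separator closes the current block, if it has any content.
            if current:
                documents.append(current)
            current = {}
            continue
        # Skip blank lines, comments, and anything that is not `key: value`.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        documents.append(current)
    return documents


def connection_names(text: str) -> List[str]:
    """Return the `name` field of every block that defines one."""
    return [doc["name"] for doc in split_documents(text) if "name" in doc]


if __name__ == "__main__":
    print(connection_names(SAMPLE))
```

Because each connection is its own block, adding a warehouse never requires touching the blocks that are already there.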
libexec houses the virtual environment containing the
whale-pipelines installation, as well as some scripts that are accessed by the Rust side of the program (
libexec comes from the Unix
/usr/libexec directory, which houses binaries that are not intended to be accessed by the end user. However, because these are written entirely in Python and stored as a virtual environment, it is completely acceptable to modify these directly.
cron.logs within contains logs from any registered cron job, which can be useful for debugging broken ETL processes.
manifest.txt within this directory that only contains tables from the most recent metadata scrape. During a scraping job, these tables are appended to a temporary manifest named
NUMBER is appended to prevent simultaneous scraping jobs from appending to the same temporary manifest). Upon completion of the ETL job, this manifest is copied to
enter on a selected
wh-searched file, documents within this directory are opened.
warehouse-connection-name.sql, these templates are prepended to any queries run against the warehouse with connection name
warehouse-connection-name. See the Jinja2 templating section for more details. Connection names can be found by running
wh connections, in the
name field of each YAML block.
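
The prepending behavior described above can be sketched in Python as follows. This is only an illustration of the naming convention, not whale's actual implementation: the function name prepend_template, the temporary directory standing in for the templates directory, and the sample connection name bq-prod are all assumptions, and no Jinja2 rendering is performed here.

```python
import tempfile
from pathlib import Path


def prepend_template(templates_dir: Path, connection_name: str, query: str) -> str:
    """Prepend <connection_name>.sql to the query if such a template exists."""
    template_path = templates_dir / f"{connection_name}.sql"
    if not template_path.is_file():
        # No template registered for this connection: leave the query as-is.
        return query
    template = template_path.read_text()
    # Put the template first, separated from the query by a single newline.
    return template.rstrip("\n") + "\n" + query


if __name__ == "__main__":
    # A temporary directory stands in for the real templates directory.
    with tempfile.TemporaryDirectory() as tmp:
        templates_dir = Path(tmp)
        (templates_dir / "bq-prod.sql").write_text("{% set main = 'schema.table' %}\n")
        print(prepend_template(templates_dir, "bq-prod", "SELECT * FROM {{ main }}"))
        print(prepend_template(templates_dir, "other", "SELECT 1"))
```

Keying the template file on the connection name means a query against a warehouse without a matching file passes through unchanged.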