Security & Data
Information on security and data collected.
Security
We take security very seriously and use industry-standard security practices to keep your data safe.
If you see an issue or have questions, email us at security@dagworks.io.
Advised practices
We recommend the following practices to keep your data secure:
- Secure your API Keys by storing them in an appropriate secrets management solution.
- API Keys are bound to individual users, and allow writing to projects that a user has been granted access to. Each user should have their own API key(s). A user can have many API Keys. Currently there is no programmatic API read access to your data.
- If an API Key is exposed, the organization admin and/or the individual user can revoke the key (and create a new one) by clicking the key icon in the left-hand navigation bar. Email us at security@dagworks.io if you need help.
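As a minimal sketch of the first practice, the key can be read from the process environment (populated by whatever secrets manager you use) instead of being hard-coded. The helper name load_api_key and the variable name DAGWORKS_API_KEY are assumptions made for illustration; they are not names defined by DAGWorks.

```python
import os


def load_api_key(env_var: str = "DAGWORKS_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is missing.

    Both this helper and the default variable name are hypothetical.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; inject it from your secrets manager "
            "rather than committing the key to source control."
        )
    return key
```

Failing at startup when the variable is missing keeps a misconfigured deployment from silently running with an empty key.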
Sensitive Data Scrubbing
If you sent data to DAGWorks that you did not intend to send, email security@dagworks.io with the project ID(s) and the run ID(s) to delete.
Data Collected
DAGWorks CLI
We currently do not capture usage data from the DAGWorks CLI.
DAGWorksTracker
Via the DAGWorks Tracking Adapter (DAGWorksTracker), we capture the following information:
- The Hamilton code defining the DAG. This is required.
- DAG Execution telemetry. This is required.
- Summary statistics of any tabular/vector data observed, in addition to any Python primitive return values. This is required.
- Coming soon: Python environment dependencies used by a DAG run. This is required.
- Telemetry on DAGWorksTracker usage. Optional. You can opt out of telemetry by setting the environment variable DW_DISABLE_TRACKING=true.
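If you prefer to opt out from within Python rather than from the shell, the variable can be set in the process environment. This sketch assumes the variable is read from the process environment when the tracker is set up, so it is set before any DAGWorks code is imported or constructed.

```python
import os

# Opt out of the optional usage telemetry described above.
# Assumption: this must run before the tracker is imported or created.
os.environ["DW_DISABLE_TRACKING"] = "true"
```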
We plan to add custom filtering/processing in the future, but for now we store only the items listed above.
If you have any questions about what we do and don’t store, please contact us at support@dagworks.io.
Function return types we introspect
We introspect the following function return types automatically:
Type | What we capture |
---|---|
Pandas DataFrame | Summary statistics |
Pandas Series | Summary statistics |
Python primitive | The actual value for int, float, string, and boolean. |
Polars DataFrame | Summary statistics |
Polars Series | Summary statistics |
Numpy Arrays | Summary statistics |
Python dicts | Actual values or summary statistics based on values in the dictionary. |
Python list | Actual values or summary statistics based on values in the list. |
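To make the table concrete, here are a few hypothetical Hamilton-style functions (the function names and values are invented for illustration) annotated with the row of the table each return type falls under. The practical point is that primitives are recorded as their actual values, so a function should not return sensitive information as a plain string or number.

```python
def row_count() -> int:
    # Python primitive: per the table, the actual value (3) is captured.
    return 3


def region_name() -> str:
    # Python primitive: the actual string is captured, so avoid returning
    # anything sensitive as a bare str.
    return "emea"


def thresholds() -> dict:
    # Python dict: actual values or summary statistics, depending on
    # the values in the dictionary.
    return {"low": 0.1, "high": 0.9}
```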
Masking sensitive data
To mask sensitive data, make sure it is not serializable. One way is to wrap it in an object. For example, in the case of api_keys, you can create a small class that holds the value, and then pass instances of that class between functions instead of the raw value.
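A minimal sketch of this wrapping approach follows. The class name SecretValue, its get() method, the environment variable MY_SERVICE_API_KEY, and the two example functions are all hypothetical names chosen for illustration, and the sketch assumes that an object which cannot be serialized and has a redacted repr is not captured in readable form.

```python
import os


class SecretValue:
    """Hypothetical wrapper that keeps a sensitive string out of serialized output."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def get(self) -> str:
        # Explicit accessor: the raw value is only exposed on request.
        return self._value

    def __repr__(self) -> str:
        # Never include the raw value in logs or tracebacks.
        return "SecretValue(****)"

    __str__ = __repr__

    def __reduce__(self):
        # Makes pickle.dumps() raise instead of writing the value out.
        raise TypeError("SecretValue is intentionally not serializable")


# Hamilton-style usage: the function name is the node name and the
# parameter names are its dependencies.
def api_key() -> SecretValue:
    return SecretValue(os.environ.get("MY_SERVICE_API_KEY", "placeholder-key"))


def api_key_is_set(api_key: SecretValue) -> bool:
    # Unwrap only where the raw value is needed, and return something
    # non-sensitive from the function.
    return len(api_key.get()) > 0
```

Downstream functions should unwrap the value only at the point of use and avoid returning it as a primitive, dict, or list, since those return types are captured as described above.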