Security

We take security seriously and follow industry-standard security practices to keep your data safe.

If you see an issue or have questions, email us at security@dagworks.io.

Advised practices

We recommend the following practices to keep your data secure:

  1. Secure your API Keys by storing them in an appropriate secrets management solution.
  2. API Keys are bound to individual users, and allow writing to projects that a user has been granted access to. Each user should have their own API key(s). A user can have many API Keys. Currently there is no programmatic API read access to your data.
  3. If an API Key is exposed, the organization admin and/or the individual user can revoke the key (and create a new one) by clicking the key icon in the left-hand navigation bar. Email us at security@dagworks.io if you need help.
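As a minimal sketch of the first practice, the snippet below reads an API key from an environment variable that your secrets manager populates at runtime, rather than hardcoding the key in source. The variable name DAGWORKS_API_KEY and the helper load_api_key are hypothetical names chosen for illustration, not official DAGWorks settings.

```python
import os


def load_api_key(env_var: str = "DAGWORKS_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical placeholder, not an official setting.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early instead of running with a missing or empty key.
        raise RuntimeError(f"Set {env_var} from your secrets manager before running.")
    return api_key
```

Failing early on a missing key makes misconfiguration visible at startup instead of partway through a run.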

Sensitive Data Scrubbing

If you sent data to DAGWorks that you did not intend to send, email security@dagworks.io with the project ID(s) and run ID(s) to delete.

Data Collected

DAGWorks CLI

We currently do not capture usage data from the DAGWorks CLI.

DAGWorksTracker

Via the DAGWorks Tracking Adapter (DAGWorksTracker), we capture the following information:

  1. The Hamilton code defining the DAG. This is required.
  2. DAG Execution telemetry. This is required.
  3. Summary statistics of any tabular/vector data observed, in addition to any Python primitive return values. This is required.
  4. Coming soon: Python environment dependencies used by a DAG run. This will be required.
  5. Telemetry on DAGWorksTracker usage. Optional. You can opt out of telemetry by setting the environment variable DW_DISABLE_TRACKING=true.
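The opt-out in item 5 applies only to the optional usage telemetry, not to the required items above it. One way to set the variable from Python is sketched below. It assumes the tracker reads the variable when it starts up, so the assignment should run before DAGWorksTracker is imported or constructed; setting it in your shell or deployment configuration works equally well.

```python
import os

# Assumption: the tracker reads this variable when it starts up,
# so set it before importing or constructing DAGWorksTracker.
os.environ["DW_DISABLE_TRACKING"] = "true"
```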

We plan to support custom filtering/processing in the future; for now, we store only the items above.

If you have any questions about what we do and don’t store, please contact us at support@dagworks.io.

Function return types we introspect

We introspect the following function return types automatically:

| Type | What we capture |
| --- | --- |
| Pandas DataFrame | Summary statistics |
| Pandas Series | Summary statistics |
| Python primitive | The actual value for int, float, string, and boolean. |
| Polars DataFrame | Summary statistics |
| Polars Series | Summary statistics |
| Numpy Arrays | Summary statistics |
| Python dicts | Actual values or summary statistics based on values in the dictionary. |
| Python list | Actual values or summary statistics based on values in the list. |
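To illustrate the difference between capturing actual values and capturing summary statistics, here is a rough sketch. The summarize function and the statistics it returns are illustrative assumptions; the exact statistics DAGWorks computes are not specified here.

```python
import statistics


def summarize(values: list[float]) -> dict[str, float]:
    """Illustrative summary statistics; not DAGWorks' actual implementation."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }


latencies = [12.0, 15.0, 11.0, 18.0]
# Only the aggregates are reported, not the individual values.
print(summarize(latencies))
```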

Masking sensitive data

To mask sensitive data, make sure it is not serializable. One way is to wrap it in an object whose representation hides the underlying value. For example, for API keys you can create a class like the following:

class Secret:
    def __init__(self, value):
        self._value = value

    def __repr__(self):
        return "********"

    @property
    def value(self):
        return self._value

And then use it like this:

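import pandas as pd

def my_function(api_key: Secret) -> pd.DataFrame:
    # Unwrap the secret only where it is needed; Client stands in for your own API client.
    client = Client(api_key=api_key.value)
    ...  # do stuff that produces `result`
    return result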
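The following self-contained sketch repeats the Secret class from above and shows what its representation looks like when the object is printed or formatted. The key string is a made-up placeholder.

```python
class Secret:
    def __init__(self, value):
        self._value = value

    def __repr__(self):
        return "********"

    @property
    def value(self):
        return self._value


api_key = Secret("not-a-real-key")
print(api_key)         # str() falls back to __repr__, so this prints ********
print(f"{api_key!r}")  # prints ********
print(api_key.value)   # prints not-a-real-key
```

Note that the wrapper only masks the object's representation. The raw value remains reachable through .value, so avoid returning the unwrapped value from a function whose output is captured.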