DAGWorks aims to be your one-stop-shop for tracking, visualizing, and debugging Hamilton dataflows. By seamlessly integrating with Hamilton, DAGWorks allows you to:

Collaborate on Projects

Share a project with your team or anyone else who needs access, so you can quickly onboard new developers and collaborate on Hamilton dataflows. To share a project, navigate to the projects page, click Edit on your project, and modify who can see and/or write to it.

Notes:

  • (i) Currently, users must already have a DAGWorks Platform account before they can be added to a project. The ability to invite users to the platform is in the works, but until then, if a user does not have an account, you’ll need to ask them to sign up. Alternatively, reach out to us at support@dagworks.io and we can help provision their accounts.
  • (ii) To create organizations and have people automatically join them upon sign-up, please reach out to us at support@dagworks.io for help.

Track Executions

Every Hamilton dataflow executed with the DAGWorks platform is tracked and stored, along with a basic set of data quality/results. Results are streamed in, and this allows you to:

  • See results for every node in your dataflow as it executes
  • See all past executions of your Hamilton dataflows
  • Separate out by tags to track different environments, etc…
  • Find/manage errors
  • Understand and debug performance issues on a per-function basis

You can do all of this (and more) by navigating to the “runs” page for your project.
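To make the tag-based separation above concrete, here is a minimal sketch of filtering run records by tags. The record layout (`run_id`, `status`, `tags`) and the `filter_by_tags` helper are assumptions made for illustration; they are not the platform's API or storage format.

```python
from typing import Dict, List

# Hypothetical run records; the field names are illustrative only.
RUNS: List[dict] = [
    {"run_id": 1, "status": "SUCCESS", "tags": {"env": "dev", "team": "growth"}},
    {"run_id": 2, "status": "FAILURE", "tags": {"env": "prod", "team": "growth"}},
    {"run_id": 3, "status": "SUCCESS", "tags": {"env": "prod", "team": "risk"}},
]


def filter_by_tags(runs: List[dict], required: Dict[str, str]) -> List[dict]:
    """Keep only runs whose tags contain every required key/value pair."""
    return [
        run
        for run in runs
        if all(run["tags"].get(key) == value for key, value in required.items())
    ]


# Separate out production runs, then look for errors among them.
prod_runs = filter_by_tags(RUNS, {"env": "prod"})
failed_prod_ids = [run["run_id"] for run in prod_runs if run["status"] == "FAILURE"]
```

Requiring every key/value pair to match lets you combine tags, for example an environment and a team, to narrow the list further.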

Lineage: Understand your artifacts and the code that produced them

The DAGWorks platform allows you to keep track of code and artifacts generated. This enables you to:

  • Visualize the structure of your dataflow.
  • Determine the lineage of any generated artifact.
  • Track lineage of, for example, features for machine learning, or inputs to a dataset.
  • Search (coming soon) through all previously generated artifacts and link to the code that generated them.

You can explore this under the “Structure” tab in your project.
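As a rough illustration of what "lineage" means for a dataflow, the sketch below walks a small dependency graph to find everything an artifact depends on. The node names and the `upstream_lineage` helper are hypothetical and only model the idea; the platform computes and displays lineage for you.

```python
from typing import Dict, List, Set

# Hypothetical dataflow: each node maps to the nodes it depends on.
DEPENDENCIES: Dict[str, List[str]] = {
    "raw_clicks": [],
    "raw_users": [],
    "clicks_per_user": ["raw_clicks", "raw_users"],
    "avg_clicks": ["clicks_per_user"],
    "user_age_bucket": ["raw_users"],
}


def upstream_lineage(node: str, dependencies: Dict[str, List[str]]) -> Set[str]:
    """Return every node that `node` depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = list(dependencies[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dependencies[current])
    return seen


# Everything that feeds into the "avg_clicks" artifact.
lineage = upstream_lineage("avg_clicks", DEPENDENCIES)
```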

See how your Hamilton dataflows change over time

DAGWorks allows for diffing of DAG versions and runs, enabling you to answer questions like:

  • How did the structure of my DAG change over time?
  • How does the structure of my DAG change between configurations?
  • How does the performance of my DAG change over time?
  • How does the data my DAG generates change over time?

You can answer these questions in one of the following ways:

  1. Navigate to the project versions page, select two versions of your DAG, and then click compare.
  2. Navigate to the run history page, select two runs, and then click compare.
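To give a feel for what a structural comparison between two DAG versions involves, here is a small sketch that diffs nodes and edges. The `Dag` representation and the `diff_structure` helper are assumptions for illustration and do not reflect how the platform implements its compare view.

```python
from typing import Dict, List, Set, Tuple

# A DAG is modelled as: node name -> list of the nodes it depends on.
Dag = Dict[str, List[str]]


def edges(dag: Dag) -> Set[Tuple[str, str]]:
    """Return the set of (dependency, dependent) edges in the DAG."""
    return {(dep, node) for node, deps in dag.items() for dep in deps}


def diff_structure(old: Dag, new: Dag) -> Dict[str, list]:
    """Summarize the nodes and edges added or removed between two versions."""
    return {
        "added_nodes": sorted(set(new) - set(old)),
        "removed_nodes": sorted(set(old) - set(new)),
        "added_edges": sorted(edges(new) - edges(old)),
        "removed_edges": sorted(edges(old) - edges(new)),
    }


# Two hypothetical versions: v2 adds node "d" and wires it into "c".
v1: Dag = {"a": [], "b": ["a"], "c": ["b"]}
v2: Dag = {"a": [], "b": ["a"], "d": ["a"], "c": ["b", "d"]}
result = diff_structure(v1, v2)
```

The same set-difference approach could be applied to two configurations of the same code, which addresses the second question in the list above.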

DAGWorks Data Model

Projects

DAGWorks is centered around “Projects”. Projects are a collection of DAGs that share a common business goal. All DAGs in a project share the same codebase.

Project versions

Every time you create a DAG by using the Hamilton Driver + DAGWorksTracker, it has the potential to create a new “project version”. Project versions are uniquely determined by the following attributes:

  1. DAG Name (assigned by you when instantiating the DAGWorksTracker).
  2. Code version (derived from the code of the passed-in Python modules and the final DAG structure).

Passing a different configuration to the Hamilton Driver does not, by itself, result in a different project version. Use tags to differentiate runs when the configuration changes. You can then filter projects and runs by those tags.
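One way to picture how a project version is identified is as a pair of the DAG name and a hash of the code plus structure. The sketch below is a conceptual model only: the hashing scheme and the `code_version` and `version_key` helpers are assumptions, not the platform's actual algorithm. Note that configuration is deliberately not an input.

```python
import hashlib
from typing import Dict, List, Tuple


def code_version(module_sources: Dict[str, str], dag_structure: Dict[str, List[str]]) -> str:
    """Hash module source text and DAG structure into one identifier (illustrative)."""
    digest = hashlib.sha256()
    for name in sorted(module_sources):
        digest.update(f"{name}\0{module_sources[name]}\0".encode("utf-8"))
    for node in sorted(dag_structure):
        deps = ",".join(sorted(dag_structure[node]))
        digest.update(f"{node}<-{deps}\0".encode("utf-8"))
    return digest.hexdigest()


def version_key(
    dag_name: str,
    module_sources: Dict[str, str],
    dag_structure: Dict[str, List[str]],
) -> Tuple[str, str]:
    """A project version is modelled as the pair (DAG name, code version)."""
    return (dag_name, code_version(module_sources, dag_structure))


# Hypothetical module source and structure.
sources = {"features.py": "def a() -> int:\n    return 1\n"}
structure = {"a": []}

same_1 = version_key("training", sources, structure)
same_2 = version_key("training", sources, structure)
changed_code = version_key("training", {"features.py": "def a() -> int:\n    return 2\n"}, structure)
renamed = version_key("inference", sources, structure)
```

In this model, `same_1` and `same_2` are identical, while changing either the code or the DAG name yields a different key.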

You can always reset the DAG Name for a certain driver instance by calling set_name(new_name) on the driver. This will force a new version to be logged. This is useful when calling DAG-manipulation functions, such as materialize.

Note that if you archive a version (on the browse/versions page of the DAGWorks UI), it will no longer be visible in the UI. That said, if you then save to the same version (e.g., with the same DAG Name and code version), that version will be un-archived and you will be able to see it again.

Runs

Every time you execute a DAG with the DAGWorksTracker attached to Hamilton, a run is uploaded. Each run is associated with a project and a project version, i.e. an instantiation of a DAG. A run consists of metadata about the execution, task-level summaries of data, and information about any errors you encountered.

Tracking runs can have a marginal impact on performance for very short executions, because the tracker connects to the server. For longer executions, this impact becomes negligible.

Behind the Scenes

When you run a DAG with the DAGWorksTracker, the following happens:

  1. You instantiate a DAGWorksTracker with a project ID (created via the Platform UI) and an API key.
  2. On Hamilton Driver instantiation, a new “version” is created if the code has changed or you provide a new DAG name. The DAGWorksTracker then saves the structure of your DAG to the Platform. You can view this before you execute your DAG.
  3. When the run is complete, the DAGWorksTracker logs relevant metadata to the Platform, and a run appears.
  4. You can update, delete, and modify your projects (as well as archive your project’s DAG versions) in the Platform UI.
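The steps above can be modelled with a small in-memory simulation. Everything in this sketch is hypothetical: `FakePlatform`, `SketchTracker`, and their methods are stand-ins invented for illustration and are not the DAGWorksTracker API. It shows version creation on driver instantiation, run logging on completion, and the archive/un-archive behavior described earlier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

VersionKey = Tuple[str, str]  # (dag_name, code_version)


@dataclass
class FakePlatform:
    """In-memory stand-in for the Platform; not the real service."""

    versions: Dict[VersionKey, bool] = field(default_factory=dict)  # key -> archived?
    runs: List[dict] = field(default_factory=list)

    def save_version(self, key: VersionKey) -> bool:
        """Record a version; return True if it did not exist before."""
        is_new = key not in self.versions
        self.versions[key] = False  # saving also un-archives an existing version
        return is_new

    def archive(self, key: VersionKey) -> None:
        self.versions[key] = True

    def visible_versions(self) -> List[VersionKey]:
        return sorted(key for key, archived in self.versions.items() if not archived)


@dataclass
class SketchTracker:
    """Hypothetical tracker; only mirrors the lifecycle described above."""

    platform: FakePlatform
    project_id: int
    api_key: str
    dag_name: str
    tags: Dict[str, str] = field(default_factory=dict)

    def on_driver_built(self, code_version: str) -> bool:
        # Step 2: save the DAG version when the driver is instantiated.
        return self.platform.save_version((self.dag_name, code_version))

    def on_run_complete(self, code_version: str, status: str) -> None:
        # Step 3: log run metadata once execution finishes.
        self.platform.runs.append(
            {
                "project_id": self.project_id,
                "version": (self.dag_name, code_version),
                "status": status,
                "tags": dict(self.tags),
            }
        )


platform = FakePlatform()
tracker = SketchTracker(platform, project_id=1, api_key="not-a-real-key", dag_name="training", tags={"env": "dev"})

created_first = tracker.on_driver_built("abc123")  # new version
created_again = tracker.on_driver_built("abc123")  # same name and code: no new version
tracker.on_run_complete("abc123", "SUCCESS")

platform.archive(("training", "abc123"))
visible_after_archive = platform.visible_versions()
tracker.on_driver_built("abc123")  # saving to the same version un-archives it
visible_after_resave = platform.visible_versions()
```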

Supported Browsers

Chrome is the only supported browser at this time. We are working on supporting other browsers, but for now, please use Chrome. Reach out to support@dagworks.io for questions/help.