Hosted Hamilton UI on DAGWorks Overview
DAGWorks aims to be your one-stop shop for tracking, visualizing, and debugging Hamilton dataflows. By seamlessly integrating with Hamilton, DAGWorks allows you to:
Collaborate on Projects
Share a project with your team, or with anyone else who needs access, to quickly onboard new developers and collaborate on Hamilton dataflows. To share a project, navigate to the projects page, click Edit on your project, and modify who can view and/or write to it.
Notes:
- (i) Currently, users must already have a Hosted Hamilton UI on DAGWorks account to be added to a project. The ability to invite users to the platform is in the works; until then, if a user does not have an account, you’ll need to ask them to sign up. Alternatively, reach out to us at support@dagworks.io and we can help provision their accounts.
- (ii) To create organizations that people automatically join when they sign up, reach out to us at support@dagworks.io.
Track Executions
Every Hamilton dataflow executed with the Hosted Hamilton UI on DAGWorks is tracked and stored, along with a basic set of data-quality summaries and results. Results are streamed in as the dataflow executes, which allows you to:
- See results for every node in your dataflow as it executes
- See all past executions of your Hamilton dataflows
- Separate runs by tag, e.g. to track different environments
- Find and manage errors
- Understand and debug performance issues on a per-function basis
You can do all this (and more) by navigating to the “runs” page for your project.
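As a rough illustration of what per-node tracking involves, the sketch below executes a toy dataflow and records each node's result, wall-clock time, and error as it finishes, then filters runs by tag. It uses neither Hamilton nor the DAGWorks SDK: `execute_tracked`, `NodeRecord`, `RunRecord`, `runs_with_tag`, and the dict-based DAG representation are all hypothetical, and the real tracker's behavior may differ.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class NodeRecord:
    """Hypothetical per-node record: result, wall-clock duration, and error (if any)."""
    name: str
    result: Any = None
    duration_s: float = 0.0
    error: Optional[str] = None


@dataclass
class RunRecord:
    """Hypothetical run record: user-supplied tags plus one NodeRecord per executed node."""
    tags: Dict[str, str]
    nodes: List[NodeRecord] = field(default_factory=list)


def execute_tracked(
    funcs: Dict[str, Callable[..., Any]],
    deps: Dict[str, List[str]],
    order: List[str],
    tags: Dict[str, str],
) -> RunRecord:
    """Execute nodes in the given (already topologically sorted) order,
    appending a NodeRecord as each node finishes."""
    run = RunRecord(tags=tags)
    values: Dict[str, Any] = {}
    for name in order:
        record = NodeRecord(name=name)
        start = time.perf_counter()
        try:
            args = [values[dep] for dep in deps.get(name, [])]
            values[name] = funcs[name](*args)
            record.result = values[name]
        except Exception as exc:
            record.error = f"{type(exc).__name__}: {exc}"
        record.duration_s = time.perf_counter() - start
        run.nodes.append(record)  # a real tracker might stream this record out here
        if record.error is not None:
            break  # downstream nodes cannot run without this value
    return run


def runs_with_tag(runs: List[RunRecord], key: str, value: str) -> List[RunRecord]:
    """Select runs by tag, e.g. only those from a given environment."""
    return [run for run in runs if run.tags.get(key) == value]


# Usage: a three-node dataflow, run once per environment.
funcs = {
    "raw": lambda: [1, 2, 3],
    "total": lambda raw: sum(raw),
    "mean": lambda raw, total: total / len(raw),
}
deps = {"total": ["raw"], "mean": ["raw", "total"]}
order = ["raw", "total", "mean"]
runs = [
    execute_tracked(funcs, deps, order, tags={"env": "dev"}),
    execute_tracked(funcs, deps, order, tags={"env": "prod"}),
]
# The node that took longest in the first run (a per-function performance view).
slowest = max(runs[0].nodes, key=lambda n: n.duration_s)
```

Stopping at the first failing node keeps the error attached to the function that raised it, which is the kind of per-function view the runs page is described as offering.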
Lineage: Understand your artifacts and the code that produced them
The Hosted Hamilton UI on DAGWorks keeps track of your code and the artifacts it generates. This enables you to:
- Visualize the structure of your dataflow
- Determine the lineage of any generated artifact
- Track the lineage of, for example, machine-learning features or the inputs to a dataset
- Search (coming soon) through all previously generated artifacts and link to the code that generated them
You can explore this under the “Structure” tab in your project.
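Determining an artifact's lineage amounts to walking the DAG upstream from that artifact. The minimal sketch below assumes a hypothetical representation in which the DAG is a plain dict mapping each node to its direct dependencies; `upstream_lineage` and the example node names are illustrative, not part of the Hosted Hamilton UI.

```python
from collections import deque
from typing import Dict, List, Set


def upstream_lineage(deps: Dict[str, List[str]], artifact: str) -> Set[str]:
    """Return every node that `artifact` depends on, directly or transitively (breadth-first)."""
    seen: Set[str] = set()
    queue = deque(deps.get(artifact, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited via another path
        seen.add(node)
        queue.extend(deps.get(node, []))  # source nodes have no entry, so nothing is added
    return seen


# Hypothetical feature-engineering DAG: node -> direct dependencies.
deps = {
    "clean_events": ["raw_events"],
    "user_counts": ["clean_events"],
    "session_length": ["clean_events"],
    "training_set": ["user_counts", "session_length", "labels"],
}
lineage = upstream_lineage(deps, "training_set")
```

Here `lineage` contains every input that fed the training set, including the shared `clean_events` node, which is visited only once.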
See how your Hamilton dataflows change over time
DAGWorks lets you diff DAG versions and runs, so you can answer questions like:
- How did the structure of my DAG change over time?
- How does the structure of my DAG change between configurations?
- How does the performance of my DAG change over time?
- How does the data my DAG generates change over time?
You can answer these questions in one of the following ways:
- Navigate to the project versions page, select two versions of your DAG, and then click compare.
- Navigate to the run history page, select two runs, and then click compare.
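A structural comparison of two DAG versions, and a per-node timing comparison of two runs, can be sketched as simple set and dict operations. This is an assumption about what such a diff could compute, not a description of how DAGWorks implements compare; `diff_dags`, `duration_deltas`, and the example DAGs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class DagDiff:
    added: Set[str]    # nodes only in the new DAG
    removed: Set[str]  # nodes only in the old DAG
    rewired: Set[str]  # nodes in both whose direct dependencies changed


def diff_dags(old: Dict[str, List[str]], new: Dict[str, List[str]]) -> DagDiff:
    """Structural diff of two DAGs given as {node: [direct dependencies]}.
    Every node, including source nodes (mapped to []), must appear as a key."""
    old_nodes, new_nodes = set(old), set(new)
    rewired = {n for n in old_nodes & new_nodes if set(old[n]) != set(new[n])}
    return DagDiff(added=new_nodes - old_nodes, removed=old_nodes - new_nodes, rewired=rewired)


def duration_deltas(old: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """Per-node change in execution time (seconds) between two runs; positive means slower."""
    return {n: new[n] - old[n] for n in old.keys() & new.keys()}


v1 = {"raw": [], "clean": ["raw"], "model": ["clean"]}
v2 = {"raw": [], "clean": ["raw"], "features": ["clean"], "model": ["features"]}
structure_diff = diff_dags(v1, v2)
perf_diff = duration_deltas({"clean": 1.5, "model": 2.0}, {"clean": 1.0, "model": 3.5})
```

Inserting `features` shows up both as an added node and as a rewiring of `model`, which is why the sketch reports the two separately.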
DAGWorks Data Model
Projects
DAGWorks is centered around “Projects”. A project is a collection of DAGs that share a common business goal, and all DAGs in a project share the same codebase.
Project versions
Every time you create a DAG using the Hamilton Driver with the DAGWorksTracker, a new “project version” may be created. A project version is uniquely determined by the following attributes:
- DAG Name (assigned by you when instantiating the DAGWorksTracker).
- Code version (derived from the code of the passed-in Python modules and the final DAG structure).
You can reset the DAG Name for a given driver instance at any time by calling set_name(new_name) on the driver. This forces a new version to be logged, which is useful when calling DAG-manipulation functions such as materialize.
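To illustrate the rule that a project version is keyed by (DAG Name, code version), here is a minimal hashing sketch. The document does not specify how the code version is derived, so fingerprinting module source text together with a normalized DAG structure via SHA-256 is an assumption, as are the names `code_version` and `version_key`.

```python
import hashlib
import json
from typing import Dict, List, Tuple


def code_version(module_sources: Dict[str, str], dag_structure: Dict[str, List[str]]) -> str:
    """Illustrative fingerprint of module source text plus the final DAG structure."""
    h = hashlib.sha256()
    for module_name in sorted(module_sources):  # sort so dict insertion order does not matter
        h.update(module_name.encode() + b"\0")
        h.update(module_sources[module_name].encode() + b"\0")
    # Normalize dependency order so equivalent structures hash identically.
    normalized = {node: sorted(dep_list) for node, dep_list in dag_structure.items()}
    h.update(json.dumps(normalized, sort_keys=True).encode())
    return h.hexdigest()


def version_key(
    dag_name: str, module_sources: Dict[str, str], dag_structure: Dict[str, List[str]]
) -> Tuple[str, str]:
    """Hypothetical key: a project version is identified by (DAG Name, code version)."""
    return (dag_name, code_version(module_sources, dag_structure))


sources = {"features": "def total(raw: list) -> int:\n    return sum(raw)\n"}
structure = {"raw": [], "total": ["raw"]}

base = version_key("my_dag", sources, structure)
same = version_key("my_dag", dict(sources), dict(structure))       # identical inputs
renamed = version_key("my_dag_materialized", sources, structure)   # only the name changes
edited = version_key("my_dag", {"features": sources["features"].replace("sum", "len")}, structure)
```

`base` and `renamed` share the same code version but differ in DAG Name, so they map to different keys; this mirrors why calling set_name(new_name) forces a new version to be logged.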
Note that if you archive a version (on the browse/versions page of the DAGWorks UI), it will no longer be visible in the DAGWorks UI. However, if you later save to the same version (e.g. with the same DAG Name and code version), that version is un-archived and becomes visible again.
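The archive/un-archive behavior can be modeled as a tiny in-memory registry keyed by (DAG Name, code version). `VersionRegistry` and its methods are hypothetical; the sketch only encodes the two rules stated above: archiving hides a version, and saving to the same key makes it visible again.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

VersionKey = Tuple[str, str]  # (DAG Name, code version)


@dataclass
class VersionRegistry:
    """Hypothetical in-memory model of the archive/un-archive behavior."""
    archived: Dict[VersionKey, bool] = field(default_factory=dict)

    def save(self, key: VersionKey) -> None:
        # Saving creates the version, or un-archives it if it already exists.
        self.archived[key] = False

    def archive(self, key: VersionKey) -> None:
        if key not in self.archived:
            raise KeyError(f"unknown version: {key}")
        self.archived[key] = True

    def visible(self) -> List[VersionKey]:
        return [key for key, is_archived in self.archived.items() if not is_archived]


registry = VersionRegistry()
v1 = ("my_dag", "3f2a")  # hypothetical code-version string
registry.save(v1)
registry.archive(v1)
hidden = registry.visible()  # empty while v1 is archived
registry.save(v1)            # same DAG Name + code version
restored = registry.visible()
```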
Runs
Every time you execute a Hamilton dataflow with the DAGWorksTracker attached, a run is uploaded. Each run is associated with a project and a project version (i.e. an instantiation of a DAG), and consists of metadata about the run, task-level summaries of the data, and information about any errors you encountered.
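The relationships in this data model (a project contains versions, and each run belongs to one project version and carries metadata, task-level summaries, and error information) can be sketched with a few dataclasses. The class and field names are hypothetical and are not the DAGWorks schema or SDK.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Run:
    run_id: int
    metadata: Dict[str, str]  # e.g. tags such as the environment
    task_summaries: Dict[str, Dict[str, float]] = field(default_factory=dict)  # node -> summary stats
    error: Optional[str] = None  # set if the run failed


@dataclass
class ProjectVersion:
    dag_name: str
    code_version: str
    runs: List[Run] = field(default_factory=list)  # every run belongs to exactly one version


@dataclass
class Project:
    name: str
    versions: List[ProjectVersion] = field(default_factory=list)

    def all_runs(self) -> List[Run]:
        return [run for version in self.versions for run in version.runs]

    def failed_runs(self) -> List[Run]:
        return [run for run in self.all_runs() if run.error is not None]


v1 = ProjectVersion("churn_features", "a1b2")
v1.runs.append(Run(1, {"env": "dev"}, {"user_counts": {"rows": 1000.0}}))
v2 = ProjectVersion("churn_features", "c3d4")
v2.runs.append(Run(2, {"env": "prod"}, error="KeyError: 'user_id'"))
project = Project("churn", [v1, v2])
```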
Behind the Scenes
When you run a DAG with the DAGWorksTracker, the following happens:
- You instantiate a DAGWorksTracker with a project ID (created via the Platform UI) and an API key.
- When the Hamilton Driver is instantiated, a new “version” is created if the code has changed or you provide a new DAG name. The DAGWorksTracker then saves the structure of your DAG to the Platform, where you can view it before you execute your DAG.
- When the run is complete, the DAGWorksTracker logs relevant metadata to the Platform, and a run appears.
- You can update, delete, and modify your projects (as well as archive your project’s DAG versions) in the Platform UI.
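The ordering in the steps above (structure uploaded when the driver is created, run uploaded after execution) can be mimicked with an in-memory fake. `FakePlatform` and `ToyTracker` are hypothetical stand-ins that make no network calls; they do not reproduce the DAGWorksTracker's actual constructor or methods.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class FakePlatform:
    """In-memory stand-in for the Platform (hypothetical; no network calls)."""
    versions: Dict[Tuple[str, str], Dict[str, List[str]]] = field(default_factory=dict)
    runs: List[Dict[str, Any]] = field(default_factory=list)


class ToyTracker:
    """Hypothetical tracker with one hook at driver creation and one after execution."""

    def __init__(self, platform: FakePlatform, project_id: int, api_key: str, dag_name: str):
        self.platform = platform
        self.project_id = project_id
        self.api_key = api_key  # would authenticate uploads; unused by this fake
        self.dag_name = dag_name
        self.version: Optional[Tuple[str, str]] = None

    def on_driver_init(self, code_version: str, structure: Dict[str, List[str]]) -> bool:
        """Store the DAG structure if this (DAG Name, code version) is new; return True if so."""
        self.version = (self.dag_name, code_version)
        is_new = self.version not in self.platform.versions
        if is_new:
            self.platform.versions[self.version] = structure
        return is_new

    def on_run_complete(self, node_names: List[str], error: Optional[str] = None) -> None:
        """Log run metadata against the current version once execution has finished."""
        if self.version is None:
            raise RuntimeError("on_driver_init must be called before logging a run")
        self.platform.runs.append(
            {"project_id": self.project_id, "version": self.version,
             "nodes": sorted(node_names), "error": error}
        )


platform = FakePlatform()
tracker = ToyTracker(platform, project_id=1, api_key="not-a-real-key", dag_name="my_dag")
created = tracker.on_driver_init("a1b2", {"raw": [], "total": ["raw"]})
runs_before_execution = len(platform.runs)  # structure is stored, but no run yet
tracker.on_run_complete(["total", "raw"])
created_again = tracker.on_driver_init("a1b2", {"raw": [], "total": ["raw"]})  # same code: no new version
```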
Supported Browsers
Chrome is the only supported browser at this time. We are working on supporting other browsers, but for now, please use Chrome. Reach out to support@dagworks.io with questions or for help.