Collaborate on Projects
Share a project with your team or anyone else who needs access, so you can quickly onboard new developers and collaborate on Hamilton dataflows. To share a project, navigate to the projects page, click Edit on your project, and modify who can view and/or write to it.
Notes:
- (i) Currently, users must already have a Hosted Hamilton UI on DAGWorks account before they can be added to a project. The ability to invite users to the platform is in the works; until then, ask anyone without an account to sign up, or reach out to us at support@dagworks.io and we can help provision their accounts.
- (ii) To create organizations and have people automatically join them upon sign-up, please reach out to us at support@dagworks.io.
Track Executions
Every Hamilton dataflow executed with the Hosted Hamilton UI on DAGWorks is tracked and stored, along with a basic set of data-quality summaries and results. Results are streamed in as the dataflow runs, which allows you to:
- See results for every node in your dataflow as it executes
- See all past executions of your Hamilton dataflows
- Separate runs by tags, e.g. to track different environments
- Find and manage errors
- Understand and debug performance issues on a per-function basis
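Filtering runs by tag, or pulling out the failed ones, can be pictured with a minimal sketch. The Run class, its fields, and the "SUCCESS"/"FAILURE" status values are hypothetical illustrations, not the DAGWorks data model or API; in practice the UI does this filtering for you.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical, simplified stand-in for a tracked run (not the DAGWorks schema)."""
    run_id: int
    tags: dict[str, str] = field(default_factory=dict)
    status: str = "SUCCESS"


def filter_runs(runs: list[Run], **required_tags: str) -> list[Run]:
    """Return runs whose tags contain every key/value pair in required_tags."""
    return [r for r in runs if all(r.tags.get(k) == v for k, v in required_tags.items())]


def failed_runs(runs: list[Run]) -> list[Run]:
    """Return runs that ended in an error."""
    return [r for r in runs if r.status == "FAILURE"]


runs = [
    Run(1, {"environment": "DEV"}),
    Run(2, {"environment": "PROD"}, status="FAILURE"),
    Run(3, {"environment": "PROD"}),
]
print([r.run_id for r in filter_runs(runs, environment="PROD")])  # [2, 3]
print([r.run_id for r in failed_runs(runs)])  # [2]
```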
Lineage: Understand your artifacts and the code that produced them
The Hosted Hamilton UI on DAGWorks keeps track of your code and the artifacts it generates. This enables you to:
- Visualize the structure of your dataflow.
- Determine the lineage of any generated artifact.
- Track the lineage of, for example, machine learning features or the inputs to a dataset.
- Search (coming soon) through all previously generated artifacts and link to the code that generated them.
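Determining the lineage of an artifact amounts to walking the DAG upstream from it. The following is a minimal breadth-first sketch over a hypothetical dictionary-based DAG with made-up node names; it illustrates the idea and is not how DAGWorks itself computes lineage.

```python
from collections import deque

# Hypothetical DAG: each node maps to the nodes it depends on (its inputs).
dag = {
    "raw_data": [],
    "lookup_table": [],
    "cleaned": ["raw_data"],
    "feature_a": ["cleaned"],
    "feature_b": ["cleaned", "lookup_table"],
    "training_set": ["feature_a", "feature_b"],
}


def upstream_lineage(dag: dict[str, list[str]], artifact: str) -> set[str]:
    """Return every node that `artifact` transitively depends on (breadth-first)."""
    seen: set[str] = set()
    queue = deque(dag[artifact])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(dag[node])
    return seen


print(sorted(upstream_lineage(dag, "training_set")))
# ['cleaned', 'feature_a', 'feature_b', 'lookup_table', 'raw_data']
```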
See how your Hamilton dataflows change over time
DAGWorks allows for diffing of DAG versions and runs, enabling you to answer questions like:
- How did the structure of my DAG change over time?
- How does the structure of my DAG change between configurations?
- How does the performance of my DAG change over time?
- How does the data my DAG generates change over time?
To compare two versions or runs, do one of the following:
- Navigate to the project versions page, select two versions of your DAG, and click compare.
- Navigate to the run history page, select two runs, and click compare.
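A structural comparison of two DAG versions can be sketched as set differences over nodes and edges. The dictionary representation and the diff_dags helper are hypothetical; the DAGWorks comparison views also cover performance and data, which this sketch ignores.

```python
def edges(dag: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Return (dependency, node) pairs for every edge in the DAG."""
    return {(dep, node) for node, deps in dag.items() for dep in deps}


def diff_dags(old: dict[str, list[str]], new: dict[str, list[str]]) -> dict[str, set]:
    """Compare two DAG versions by their nodes and edges."""
    return {
        "added_nodes": set(new) - set(old),
        "removed_nodes": set(old) - set(new),
        "added_edges": edges(new) - edges(old),
        "removed_edges": edges(old) - edges(new),
    }


# Two hypothetical versions: v2 adds a lookup input to "feature".
v1 = {"raw": [], "clean": ["raw"], "feature": ["clean"]}
v2 = {"raw": [], "clean": ["raw"], "lookup": [], "feature": ["clean", "lookup"]}
print(diff_dags(v1, v2))
# {'added_nodes': {'lookup'}, 'removed_nodes': set(), 'added_edges': {('lookup', 'feature')}, 'removed_edges': set()}
```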
DAGWorks Data Model
Projects
DAGWorks is centered around “Projects”. A project is a collection of DAGs that share a common business goal, and all DAGs in a project share the same codebase.
Project versions
Every time you create a DAG using the Hamilton Driver with the DAGWorksTracker, it may create a new “project version”. Project versions are uniquely determined by the following attributes:
- DAG name (assigned by you when instantiating the DAGWorksTracker).
- Code version (derived from the code of the passed-in Python modules and the final DAG structure).
Passing a different configuration to the Hamilton Driver does not result in a new project version. Instead, use tags to differentiate runs when the configuration changes; you will then be able to filter projects/runs by those tags.
To force a new version to be logged, call set_name(new_name) on the driver. This is useful when calling DAG-manipulation functions, such as materialize.
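The versioning rule can be modeled as a key built from the DAG name plus a code version. Hashing the module source and DAG structure with SHA-256 is an assumption made for this sketch (the exact derivation is not specified here), and project_version_key is a hypothetical helper. Configuration is deliberately not an input.

```python
import hashlib
import json


def code_version(module_sources: list[str], dag_structure: dict[str, list[str]]) -> str:
    """Hypothetical code version: SHA-256 over module source text and the DAG structure."""
    digest = hashlib.sha256()
    for source in module_sources:
        digest.update(source.encode("utf-8"))
    # sort_keys makes the hash independent of the order of the dict's keys.
    digest.update(json.dumps(dag_structure, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


def project_version_key(dag_name: str, module_sources: list[str],
                        dag_structure: dict[str, list[str]]) -> tuple[str, str]:
    """Identify a project version by (DAG name, code version); config is not an input."""
    return (dag_name, code_version(module_sources, dag_structure))


sources = ["def cleaned(raw_data: int) -> int:\n    return raw_data\n"]
structure = {"raw_data": [], "cleaned": ["raw_data"]}

# Two runs that differ only in configuration map to the same version;
# the configuration difference would be recorded with tags instead.
dev_key = project_version_key("my_dag", sources, structure)
prod_key = project_version_key("my_dag", sources, structure)
renamed_key = project_version_key("my_dag_v2", sources, structure)

print(dev_key == prod_key)     # True
print(dev_key == renamed_key)  # False: a new DAG name yields a new version
```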
Note that if you archive a version (on the browse/versions page of the DAGWorks UI), it will no longer be visible in the DAGWorks UI. However, if you later save to the same version (e.g. with the same DAG name and code version), that version will be un-archived and you will be able to see it again.
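The archive and un-archive behavior can be modeled with a small in-memory registry keyed by (DAG name, code version). VersionRegistry is hypothetical and only mirrors the rule described above; it is not the DAGWorks implementation.

```python
class VersionRegistry:
    """Hypothetical in-memory model of version archiving (not the DAGWorks implementation)."""

    def __init__(self) -> None:
        # Maps (dag_name, code_version) -> True if the version is archived.
        self._archived: dict[tuple[str, str], bool] = {}

    def save(self, key: tuple[str, str]) -> None:
        # Saving a new or existing version leaves it un-archived.
        self._archived[key] = False

    def archive(self, key: tuple[str, str]) -> None:
        self._archived[key] = True

    def visible(self) -> list[tuple[str, str]]:
        return [key for key, archived in self._archived.items() if not archived]


registry = VersionRegistry()
key = ("my_dag", "abc123")
registry.save(key)
registry.archive(key)
print(registry.visible())  # []
registry.save(key)         # same DAG name and code version
print(registry.visible())  # [('my_dag', 'abc123')]
```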
Runs
Every time you execute a Hamilton dataflow with the DAGWorksTracker attached, a run is uploaded. Each run is associated with a project and a project version (i.e. an instantiation of a DAG), and consists of metadata about the run, task-level summaries of data, and information about any errors you encountered. Because the tracker connects to the server, tracking can have a marginal performance impact on very short executions; for longer executions, this impact becomes negligible.
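To see why the impact matters mainly for short runs, assume (hypothetically) a roughly fixed per-run tracking cost: its share of total runtime shrinks as the runtime grows. The 0.5 s figure below is invented for illustration and is not a measured DAGWorks number.

```python
def relative_overhead(runtime_s: float, overhead_s: float = 0.5) -> float:
    """Extra time as a fraction of the untracked runtime, assuming a fixed per-run cost.

    The 0.5 s default is a hypothetical figure, not a measured DAGWorks value.
    """
    return overhead_s / runtime_s


for runtime in (1.0, 60.0, 600.0):
    print(f"{runtime:>6.0f} s run -> {relative_overhead(runtime):.2%} overhead")
```

Under this assumption, a 1-second run doubles in wall-clock time, while a 10-minute run is slowed by well under a tenth of a percent.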
Behind the Scenes
When you run a DAG with the DAGWorksTracker, the following happens:
- You instantiate a DAGWorksTracker with a project ID (created via the Platform UI) and an API key.
- On Hamilton Driver instantiation, a new “version” is created if the code has changed or you provide a new DAG name. The DAGWorksTracker then saves the structure of your DAG to the Platform, so you can view it before you execute your DAG.
- When the run is complete, the DAGWorksTracker logs relevant metadata to the Platform, and a run appears.
- You can update, delete, and modify your projects (as well as archive your project’s DAG versions) in the Platform UI.
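The sequence above can be mimicked with a toy tracker that records events instead of talking to a server. ToyTracker and its on_driver_built/on_run_complete methods are hypothetical names, not the DAGWorksTracker API. Recording the structure only when a version is new is also an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTracker:
    """Hypothetical stand-in mimicking the steps above; it makes no network calls."""
    project_id: int
    api_key: str
    versions: set[tuple[str, str]] = field(default_factory=set)
    events: list[str] = field(default_factory=list)

    def on_driver_built(self, dag_name: str, code_version: str,
                        structure: dict[str, list[str]]) -> None:
        """Step 2: create a version if (name, code) is new, then record the DAG structure."""
        key = (dag_name, code_version)
        if key not in self.versions:
            self.versions.add(key)
            self.events.append(f"created version {key}")
            # Assumption: this sketch records the structure only for new versions.
            self.events.append(f"saved structure with {len(structure)} nodes")

    def on_run_complete(self, dag_name: str, code_version: str, metadata: dict) -> None:
        """Step 3: log run metadata against its version once execution finishes."""
        self.events.append(f"logged run for {(dag_name, code_version)}: {metadata}")


tracker = ToyTracker(project_id=1, api_key="not-a-real-key")  # step 1
structure = {"raw": [], "clean": ["raw"]}
tracker.on_driver_built("my_dag", "abc123", structure)
tracker.on_run_complete("my_dag", "abc123", {"status": "SUCCESS"})
tracker.on_driver_built("my_dag", "abc123", structure)  # unchanged: no new version
tracker.on_driver_built("my_dag", "def456", structure)  # code changed: new version
print(len(tracker.versions))  # 2
for event in tracker.events:
    print(event)
```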