
A brief introduction to directed acyclic graphs (DAGs)
A directed acyclic graph (DAG), as its name suggests, is a directed graph that does not contain any cycles. To break this description down further:
- In graph theory, a directed graph is a set of nodes (also called vertices) connected by edges.
- Each connection between nodes goes in a specific direction, from one vertex to another. For this reason, these connections are often called “directed edges.” For example, the connection between Node A and Node B might point from A to B, but not from B to A.
- If a directed graph is cyclic, it contains closed loops or cycles, allowing you to make your way back to your initial node after visiting a series of other nodes.
- Conversely, if a directed graph is acyclic, it contains no closed loops or cycles. You can move from node to node along the directed edges, but you can never return to the node where you started.
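To make the cyclic/acyclic distinction concrete, here is a minimal Python sketch that checks whether a directed graph contains a cycle, using a standard depth-first search. The graph is assumed to be stored as an adjacency list (a dict mapping each node to the nodes its edges point to); the function name `has_cycle` and the example graphs are illustrative, not part of any particular tool.

```python
from typing import Dict, List

# A directed graph as an adjacency list: each node maps to the nodes its edges point to.
Graph = Dict[str, List[str]]


def has_cycle(graph: Graph) -> bool:
    """Return True if following directed edges can lead back to a node on the current path."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {node: WHITE for node in graph}
    # Also track nodes that only ever appear as edge targets.
    for targets in graph.values():
        for target in targets:
            color.setdefault(target, WHITE)

    def visit(node: str) -> bool:
        color[node] = GRAY
        for nxt in graph.get(node, []):
            if color[nxt] == GRAY:  # back edge: we returned to a node on the current path
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(color))


dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(has_cycle(dag))     # False
print(has_cycle(cyclic))  # True
```

The `GRAY` state marks nodes on the current search path; reaching one of them again means the edges lead back to an earlier node, which is exactly the closed loop described above.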
DAGs provide a method for assessing the relationship(s) between certain data points – whether those points represent events, dependencies, tasks, or the like.
For a more in-depth look at DAGs through the lens of clinical epidemiology, refer to Digitale, Martin, and Glymour’s Tutorial on Directed Acyclic Graphs.
Putting DAGs to use
DAGs have a wide variety of practical applications. For example, Sangmin Byeon and Woojoo Lee sketch how DAGs can be used in clinical research. The coding knowledge base GeeksForGeeks reports that DAGs are also helpful for data flow analysis and task scheduling. Questions or problems that involve analyzing causal or hierarchical relationships tend to be situations where DAGs shine.
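Task scheduling is a good illustration of why the acyclic property matters: tasks can only be ordered if no task depends, directly or indirectly, on itself. A minimal sketch using Python's standard-library `graphlib` module (Python 3.9+) follows; the pipeline tasks and the helper name `schedule_in_batches` are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def schedule_in_batches(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into batches; each batch can start once all earlier batches finish."""
    sorter = TopologicalSorter(deps)  # deps maps each task to the tasks it waits on
    sorter.prepare()  # raises CycleError if the dependencies do not form a DAG
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for a stable, readable order
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical build pipeline.
pipeline = {
    "compile": {"fetch_sources"},
    "test": {"compile"},
    "package": {"compile"},
    "deploy": {"test", "package"},
}
print(schedule_in_batches(pipeline))
# [['fetch_sources'], ['compile'], ['package', 'test'], ['deploy']]

try:
    schedule_in_batches({"a": {"b"}, "b": {"a"}})
except CycleError as err:
    print("not a DAG, cycle:", err.args[1])
```

Tasks in the same batch have no dependencies on each other, so a scheduler could run them in parallel. If the dependencies contain a cycle, `prepare()` raises `CycleError` and no valid order exists.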
In terms of the actual logistics of working with DAGs, there are plenty of tools available depending on the intended use case. DAGitty is a helpful browser-based application for working directly with DAGs. The distributed event streaming platform Apache Kafka is helpful for incorporating DAGs into other applications, especially when data streaming is a necessary component.
Because DAGs can be used to study relationships, and because the right tools let us harness the power of streaming data, DAGs can be leveraged to help deliver decision intelligence.
Using DAGs in decision support
How does one take advantage of DAGs to support decision-making? The short answer is that DAGs can provide the bedrock for machine-guided analysis.
To provide a slightly simplified picture, DAGs let you model the thought process that a subject matter expert would follow as they evaluate a set of premises and draw a conclusion. By letting a machine follow that process and repeatedly apply it at a much larger scale than an individual person could handle, you can take a vast amount of noisy information and make it usable. In other words, DAGs are one tool for turning data into intelligence. (As a side note, dealing with data on a massive scale is where distributed event stream processing infrastructure like Apache Kafka comes in handy – one of many reasons why Cogynt adopts it.)
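As a rough illustration of modeling an expert's reasoning, the sketch below treats raw observations as source nodes and each inference as a node computed from its inputs, evaluating everything in topological order. The rule names, thresholds, and record fields are hypothetical and are not how Cogynt represents its models.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical derived nodes: each maps to (names of its inputs, rule combining those inputs).
RULES: Dict[str, Tuple[List[str], Callable[..., Any]]] = {
    "many_failures": (["failed_logins"], lambda failed: failed >= 3),
    "new_device": (["device", "known_devices"], lambda dev, known: dev not in known),
    "suspicious_login": (["many_failures", "new_device"], lambda many, new: many and new),
}


def evaluate(record: Dict[str, Any]) -> Dict[str, Any]:
    """Evaluate every derived node for one record, inputs before the nodes that use them."""
    values = dict(record)  # raw observations act as the DAG's source nodes
    graph = {node: set(inputs) for node, (inputs, _) in RULES.items()}
    for node in TopologicalSorter(graph).static_order():
        if node in RULES:  # source nodes already hold their raw values
            inputs, rule = RULES[node]
            values[node] = rule(*(values[name] for name in inputs))
    return values


records = [
    {"failed_logins": 4, "device": "phone-2", "known_devices": {"laptop-1"}},
    {"failed_logins": 1, "device": "laptop-1", "known_devices": {"laptop-1"}},
]
# Apply the same reasoning to every record, as a machine could at scale.
print([evaluate(r)["suspicious_login"] for r in records])  # [True, False]
```

Encoding the reasoning once and re-running it per record is what lets the same expert logic scale to a stream of inputs.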
This principle represents one of Cogynt’s core operating concepts. With Cogynt, you can construct DAGs to detect key indicators in a data stream and alert you to what they could mean. Cogynt lets your organization increase its analytical capabilities, empowering you to make decisions more quickly and with greater confidence.
Cogynt’s use of directed acyclic graphs can provide the decision maker with insight traceability and explainability. Within our Analyst Workstation, the outcome of Cogynt findings can be presented within a directed acyclic graph to trace how specific prior events contributed to the result. This not only lets the decision maker understand how an insight was reached (explainability) but can also facilitate the correction and optimization of logic within the Cogynt model.
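Traceability amounts to walking a DAG backwards from an outcome to everything that fed into it. The following is a generic sketch of that idea, assuming a hypothetical lineage map from each finding to its direct contributors; it is not the Analyst Workstation's implementation.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage: each node maps to the nodes that directly contributed to it.
CONTRIBUTORS: Dict[str, List[str]] = {
    "alert:account_takeover": ["indicator:many_failures", "indicator:new_device"],
    "indicator:many_failures": ["event:login_fail_1", "event:login_fail_2", "event:login_fail_3"],
    "indicator:new_device": ["event:login_from_phone-2"],
}


def trace(finding: str) -> Set[str]:
    """Return every upstream node that contributed, directly or indirectly, to `finding`."""
    seen: Set[str] = set()
    queue = deque([finding])
    while queue:
        node = queue.popleft()
        for parent in CONTRIBUTORS.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def explain(node: str, depth: int = 0) -> None:
    """Print the contribution tree under `node`, one indentation level per step back."""
    print("  " * depth + node)
    for parent in CONTRIBUTORS.get(node, []):
        explain(parent, depth + 1)


explain("alert:account_takeover")
```

Because a DAG can share ancestors between branches, `explain` may print a shared contributor more than once; `trace` returns each contributor only once.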
The question to ask, however, is at what point in the analytical process DAGs should be included. There are certain decisions that we simply would not want to entrust to a model or machine, even if we had years of reliable machine learning informing its decision.
For example, most of us would accept having a machine decide to temporarily disable logging into a bank account after several failed attempts, because that might indicate (and deter) an attempt to compromise the account. Yet we probably wouldn’t want a machine to file criminal charges against an account it suspects is laundering money. In such situations, it makes much more sense for a human to review the machine’s findings and suggestions, then judge what to do for themselves – aka human-in-the-loop.
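The human-in-the-loop split described above can be sketched as a simple routing policy: findings the machine is trusted to act on trigger an automated action, and everything else goes to a review queue along with its supporting evidence. The finding kinds, the `AUTOMATED_ACTIONS` policy, and the `Router` class are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Finding:
    account: str
    kind: str            # e.g. "repeated_failed_logins" or "possible_money_laundering"
    evidence: List[str]  # upstream events that produced this finding


# Hypothetical policy: the only kinds of finding a machine may act on by itself.
AUTOMATED_ACTIONS = {"repeated_failed_logins": "temporarily_lock_login"}


@dataclass
class Router:
    actions_taken: List[str] = field(default_factory=list)
    review_queue: List[Finding] = field(default_factory=list)

    def route(self, finding: Finding) -> None:
        action = AUTOMATED_ACTIONS.get(finding.kind)
        if action is not None:
            # Low-stakes and reversible: the machine acts immediately.
            self.actions_taken.append(f"{action}:{finding.account}")
        else:
            # High-stakes: a human reviews the finding and its evidence first.
            self.review_queue.append(finding)


router = Router()
router.route(Finding("acct-17", "repeated_failed_logins", ["fail_1", "fail_2", "fail_3"]))
router.route(Finding("acct-42", "possible_money_laundering", ["transfer_9", "transfer_12"]))
print(router.actions_taken)                      # ['temporarily_lock_login:acct-17']
print([f.account for f in router.review_queue])  # ['acct-42']
```

Keeping the evidence attached to each queued finding is what gives the human reviewer something to trace and judge.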
For human-in-the-loop to be effective in machine-assisted (AI/ML) decision support (and for the purpose of model refinement), it is imperative for the AI or relevant analytic system to offer traceability and explainability. As one author puts it, “Humans need to learn internalize the lessons for themselves in order to actualize the meaning of the learning, which is very different from its meaning as in the phrase ‘machine learning.’” Explore more about this topic in an article in Harvard Data Science Review.
In this respect, DAGs are at their most useful when doing the work of an analyst – i.e., reviewing and sorting through information in search of indicators or patterns. Putting DAGs at lower levels of the decision tree, where their role is to handle “raw” informational materials and look for meaning in them, provides two key advantages. First, it uses machine speed and processing power to filter out the noise. Second, it leaves the important decisions to the right people. This is the true benefit of decision intelligence: taking care of the basics so that you can focus on what matters most. DAGs are among the core features of a modern decision intelligence platform.