Getting a Graph Representation of a Pipeline in Apache Beam
Getting a pipeline representation in Apache Beam explained step-by-step.
Constructing advanced pipelines in Apache Beam, or trying to wrap your head around existing ones, can sometimes be challenging. We have seen some nice visual representations of pipelines in the managed Cloud versions of this software, but figuring out how to get a graph representation of a pipeline ourselves required a little bit of research. Here is how it is done in a few steps using Beam’s Java SDK.
TL;DR: Getting Graph Representation
If you just want to see the few lines that let you generate the DOT representation of the graph, here they are:
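A minimal sketch of those lines, assuming the PipelineDotRenderer class lives in the org.apache.beam.runners.core.construction.renderer package (that is where we know it from; other Beam releases may place it elsewhere, so check your version). The DotSnippet class and the toDot method are hypothetical names used only for illustration.

```java
import org.apache.beam.runners.core.construction.renderer.PipelineDotRenderer;
import org.apache.beam.sdk.Pipeline;

public class DotSnippet {
    // Hypothetical helper: returns the DOT text for an already-constructed pipeline.
    // Assumption: PipelineDotRenderer is on the classpath under this package name.
    static String toDot(Pipeline pipeline) {
        return PipelineDotRenderer.toDotString(pipeline);
    }
}
```

The pipeline only needs to be constructed, not run, since the renderer walks the graph of applied transforms.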
Now, if you want a slightly more comprehensive example, keep on reading.
A Full Example
Adding Maven Dependency
First, we need to add a dependency to the Maven file under
Now, we will need to add a few imports (assuming you have already added the Maven dependency mentioned earlier):
To get the DOT representation of the pipeline graph, we pass the pipeline object to the PipelineDotRenderer class. In this example, we only log the output to the console (hence the log4j imports).
That’s it. To see the code in action, run it from the command line:
This code will produce a DOT representation of the pipeline and log it to the console.
A Complete Example
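Putting the pieces together, a complete program could look roughly like the sketch below. Several things here are our assumptions rather than the original listing: the class name PipelineGraphExample, the tiny word-count pipeline, the transform names, and the package of PipelineDotRenderer (which, as far as we know, ships in the beam-runners-core-construction-java artifact in the releases that use this package name). To keep the sketch free of logging configuration, it prints with System.out instead of log4j.

```java
import org.apache.beam.runners.core.construction.renderer.PipelineDotRenderer;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;

public class PipelineGraphExample {

    // Builds a small illustrative pipeline; the transforms are placeholders
    // for whatever your real pipeline does.
    static Pipeline buildPipeline() {
        Pipeline pipeline = Pipeline.create();
        pipeline
            .apply("CreateWords", Create.of("apple", "banana", "apple"))
            .apply("CountWords", Count.perElement());
        return pipeline;
    }

    // Renders the pipeline graph as DOT text without running the pipeline.
    static String renderDot() {
        return PipelineDotRenderer.toDotString(buildPipeline());
    }

    public static void main(String[] args) {
        // Print the DOT text; redirect it to a file to use it with Graphviz.
        System.out.println(renderDot());
    }
}
```

Note that pipeline.run() is never called: the graph is derived from the pipeline's structure alone, so no runner is involved at this point.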
In the next section, we will have a brief look at what can be done with the DOT representations.
Now that we have a DOT representation of the pipeline graph, we can use it to get a better understanding of the pipeline. For instance, you can generate an SVG or a PNG image from the data. Note that the generated graph might be a bit verbose, but it gives a good overview of the pipeline.
Assuming that you have Graphviz tools installed, you can convert a DOT file to a PNG image using this command:
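On the command line, the usual Graphviz invocation has the form dot -Tpng pipeline.dot -o pipeline.png (the file names are placeholders). If you would rather drive the conversion from Java, right after generating the DOT text, the same call can be sketched with ProcessBuilder. The DotToPng class, its method names, and the file names are hypothetical, and the sketch assumes the dot executable is on your PATH.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class DotToPng {

    // Builds the Graphviz command: dot -Tpng <dotFile> -o <pngFile>
    static List<String> buildCommand(Path dotFile, Path pngFile) {
        return Arrays.asList("dot", "-Tpng", dotFile.toString(), "-o", pngFile.toString());
    }

    // Writes the DOT text to disk, runs Graphviz on it, and returns dot's exit code.
    static int render(String dot, Path dotFile, Path pngFile)
            throws IOException, InterruptedException {
        Files.write(dotFile, dot.getBytes(StandardCharsets.UTF_8));
        Process process = new ProcessBuilder(buildCommand(dotFile, pngFile))
                .inheritIO() // show Graphviz warnings and errors on our console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in graph; in practice this would be the pipeline's DOT string.
        String dot = "digraph { read -> count -> write }";
        try {
            int exitCode = render(dot, Paths.get("pipeline.dot"), Paths.get("pipeline.png"));
            System.out.println("dot exited with code " + exitCode);
        } catch (IOException e) {
            // Typically means Graphviz is not installed or not on the PATH.
            System.err.println("Could not run dot: " + e.getMessage());
        }
    }
}
```

Swapping -Tpng for -Tsvg (and the output file name accordingly) produces an SVG instead, which tends to stay readable for larger pipelines.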