Building an Org Chart with Mermaid

Building an Org Chart with Mermaid

The other day someone asked me about creating an Org Chart in HTML instead of using Visio. Since visualizing a fast changing organizational structure for investors, potential customers, and curious journalists is a common problem for people in the startup space, what I found seems to have broad applicability.

Unfortunately, I found far more questions than answers. Rather than simple working solutions, I came across complicated solutions that either required a lot of manual data entry or an IT department to set up and use.

So, I dug in and came up with a working solution which focuses on ease of deployment by a team without deep technical knowledge. This article details my prototype solution using Python code and a csv file to produce a Mermaid flowchart.

Requirements

While researching, I found that there are some excellent Javascript frameworks, but many unexpectedly require a lot of manual data entry. The ones which were more automated required setting up a server, installing the Javascript package locally. So I want back to the requestor and received the following requirements

Want MoSCoW Ranking
Multiple people maintain the list Must
Simple workflow for non-tech end user Must
Straight lines between nodes Must
Employee IDs non-contiguous integers Must
Update the employee list constantly Should
Easy to embed in a department web page Should
Easy to export to an image file Should
Use of identifying icons for rank Should
Use different shapes, colours for visual identity Could

Research

Mermaid.js has long been a go-to for me. Its Markdown-like syntax makes it easy to write and understand.

Mermaid's growing popularity means that it's embedded by default in many popular products, like Hack.io, Notion, Obsidian, and many, many others. Its popularity also means that it is in active development. Every time I check back to the project's website, it seems there are new features or types of graphs added. So, it's unlikely to be a project which will die from neglect soon.

Being easy to understand and embedded in software I use has an incredible advantage: I actually use it! And I use it multiple times a week, often at the drop of a hat, to outline things I'm brainstorming, draw up a quick schedule, profile relationships in a database, and so on. That means it is a thinking tool while things are still in draft and will change frequently. At the same time, its professional looking output is a pleasure to share with others.

Compared to other JavaScript frameworks, Mermaid's output was not as pretty, but its usefulness significantly higher. It definitely won on customizability and easy use by anyone who does not consider themselves "technical".

Mermaid Overview

Mermaid has several chart types. For this project, the flowchart (previously called graph) was most appropriate. It is easy to lay out nodes in a hierarchy from top-down or left-right.

flowchart TD
    A --> B

produces…

flowchart TD A --> B

and

flowchart LR
    A --> B

produces…

flowchart LR A --> B

Another useful feature of Mermaid is that node definitions can be separate from the graph definition. I find this much easier to read when returning to a complicated graph, months later, to figure out the logic. For example, demonstrating the full range of shapes and lines available in a flowchart in this code snippet, the nodes are defined at the beginning, followed by the graph definition.

flowchart LR

a(Node with round corners)
a1[Node with square corners]
b([Stadium])
c[[subroutine]]
d[(Database)]
e((Circle))
f>Asymetric]
g{Rhombus}
h{{hexagon}}
i[/Parallelogram to the right/]
j[\Parallelogram to the left\]
k[/Trapazoid\]
l[\Trapazoid updside down/]
m(((Double circle)))

a --- a1
b --> c
d -.- e
f ==> g
h --> i & j
j --> k & l --> m
flowchart LR a(Node with round corners) a1[Node with square corners] b([Stadium]) c[[subroutine]] d[(Database)] e((Circle)) f>Asymetric] g{Rhombus} h{{hexagon}} i[/Parallelogram to the right/] j[\Parallelogram to the left\] k[/Trapazoid\] l[\Trapazoid updside down/] m(((Double circle))) a --- a1 b --> c d -.- e f ==> g h --> i & j j --> k & l --> m

Mermaid supports multiple curve styles include basis, bumpX, bumpY, cardinal, catmullRom, linear, monotoneX, monotoneY, natural, step, stepAfter, and stepBefore. After testing each option, stepAfter was the closest to what I was looking for. In Obsidian, for example, just start the code block with an init which includes the type of curve to use.

%%{ init: { 'flowchart': { 'curve': 'stepAfter' } } }%%
graph LR

My preference in a project like this is to use Obsidian, which makes it easy for others to see and play with the code. While Obsidian's version of Mermaid lags behind the latest whiz-bang version, Mermaid maintains a handy live editor with the latest version. For my uses in building an organization chart from a spreadsheet, I planned to copy and paste output from the script.

Planning

My initial plan was to receive a CSV (comma separate value) file and process it with Python. All modern spreadsheets import and export CSV virtually seamlessly, making it easy for employees to work together to keep the org list updated. Python easily reads and writes CSV.

With the need for privacy high in the project, I did not have an actual spreadsheet to work with for prototyping, so I used the Chinook sample SQLite database. Peter Stark generously maintains the Chinook database as part of the SQLite Tutorial site. If you’d like to learn more about SQLite and SQL, the site is terrific.

I exported the Employees table to CSV and changed one ID to be non-contiguous. I also added a department name and three fictitious departments: Administration, Sales and IT. When exporting, I chose tab (\t) as the delimiter instead of , --- this is my preference, not deeply meaningful.

I wanted visual representations of the departments. Mermaid provides subgraphs which create background fills and titles for the subgraphs. Knowing the area organizes as a matrix organization, some people have complicated reporting structures, often represented by dotted lines.

%%{ init: { 'flowchart': { 'curve': 'stepAfter' } } }%%
flowchart TD
subgraph Sales
    A --- B
end
subgraph IT
    C --- D & E
    A -.- E
end
%%{ init: { 'flowchart': { 'curve': 'stepAfter' } } }%% flowchart TD subgraph Sales A --- B end subgraph IT C --- D & E A -.- E end

Mermaid recently added support for Markdown inside nodes. The node string must be wrapped in double-quotes and a backtic like this: "` Markdown string `". The usual <br> and \n characters do not force a newline within such a Markdown string, though. Instead, format the string to be multi-lined (as shown in the final code sample below).

The latest versions of Mermaid Flowchart allow for basic fontawesome icons (the list of supported icons is here). I chose three icons for my prototype:

fa:fa-address-book
fa:fa-user-circle
fa:fa-user

Writing the Code

The code for my solution, including a csv testing file and sample output, is available from my GitHub repository. For insight into how to understand and modify the code, read on!

Getting the Data

The first step in any project like this is to get the data. We wanted to use Pandas for its ability to read CSV files. Personally, I find Pandas cumbersome for working with string data, so dumped everything into an SQLite database. The database doesn't have to persist, so we just use an in-memory database that disappears when the program stops running. SQLAlchemy is a terrific way to set this up.

The following code snippet shows the necessary imports, creating the database, reading the CSV and putting it into the database. For those interested in peak efficiency, yes, the two lines of Pandas could be written with Python's CSV module without resorting to the Pandas module, but not (as far as I'm aware) in two lines.

import sqlalchemy as db
from sqlalchemy import create_engine
from sqlalchemy import Column, Integer
from sqlalchemy.ext.declarative import declarative_base
import pandas as pd

# Set up SOLite db
engine = create_engine('sqlite:///:memory:', echo=False)

Session = sessionmaker(bind=engine)
session = Session()

Base = declarative_base()

# Read the CSV and put into database
emps = pd.read_csv("chinook.tsv", sep='\t')
emps.to_sql('employee', engine)

Defining the SQLAlchemy interface comes next and creating the connection to the database. In the full code, we define an interface to all the columns, but a more stripped down definition would only include the required columns.

#   SQLAlchemy interface
class Employee(Base):
    __tablename__ = 'employee'

    EmployeeId = Column(Integer, primary_key=True)
    LastName = Column(String)
    FirstName = Column(String)
    Title = Column(String)
    ReportsTo = Column(Integer)
    Department = Column(String)
    City = Column(String)

    def __repr__(self):
        return f"<Employee(FirstName={self.FirstName}, LastName={self.LastName})"

Base.metadata.create_all(engine)

This implementation really shines right here, though it might not be obvious. Instead of being forced into a single way of representing users and a few data items, you are free to use much more data in the node representations. The only limitation here is one's imagination.

In our prototype, we're using several features, each of which can be re-assigned. Name, employee type and location are each on their own line. Fontawesome icons and node shapes show employee type. Subgraphs to capture department information.

Starting the Action

Thinking that this might become part of a larger project, we organized the code as a class, making it easy to import elsewhere. The downside is that running this a standalone program requires creating the class to start the code working.

So, main() simply creates an instance of the class. __name__ == '__main__' runs only when the code runs as a script, not when imported into other files.

def main():
    oc = OrgChart()

if __name__ == '__main__':
    main()

The class definition is super simple, only calling self.display() which sets off the chain of processing which ends in the printout a Mermaid flowchart.

class OrgChart(object):
    """Initializes OrgChart class"""
    def __init__(self):
        super(OrgChart, self).__init__()
        self.display()

Display Set Up

The display method sets up the beginning and ending of flowchart for inputting into an HTML page, Obsidian or other text editor where Mermaid is available.

    def display(self):
        print("""
```mermaid
%%{init: {"flowchart": {"curve": "stepAfter"}} }%%
flowchart TD
classDef Lethbridge fill:#FFCE54
classDef Calgary fill:#A0D468
classDef Edmonton fill:#8CC152
""")
        self.processOrg()
        print("```")

The classDef calls change the fill colour of nodes what have a style class associated with them. Mermaid applies the styles when laying out the graph with the node name, three colons (:::) and the class name. For example, Michael6:::Calgary

You should definitely get a designer, or at least someone with a bit of colour sense, to pick your colours. We deliberately chose something garish to prove the point, not as an appropriate solution to anyone's branding needs.

Besides fill, node classes can define stroke (colour), and stroke-width (px) to provide a border around the node. The elements are comma separated.

Processing Users

The processOrg method fetches the list of all user IDs and iterates through them. The code breaks into two parts: a node definition for each employee and the graph definition showing the relationships between employees.

    def processOrg(self):
        """docstring for fname"""
        users = self.getUserIDs()
        managers = {}
        # Node definitions
        for x in users:
            x= x[0]
            uid = self.getUIDString(x) 
            user = self.getUser(x)
            print(self.getEmployeeCard(x))
            if user.ReportsTo and user.ReportsTo >= 1:
                ruid = int(user.ReportsTo)
                ##  Add EmployeeId to Manager's list
                if ruid in managers:
                    managers[ruid].append(int(x))
                else:
                    managers[ruid] = [int(x)]
        print("\n")
        # Graph definition
        for m in managers:
            ruid = self.getUIDString(m, withClass=True) 
            manager = self.getUser(m)
            print(f"subgraph {manager.Department}")
            for e in managers[m]:
                uid = self.getUIDString(e, withClass=True)
                print(f"\t{ruid} --> {uid}")
            print("end")

The user IDs return as a list of single tuples (i.e. [(1,), (2,)]) [see section Getting All User IDs below]. So we change x to x[0] for improved readability throughout the rest of the code. We are also using x here instead of something more descriptive as a reminder that this is just an id and does not represent an Employee object.

The code then iterates through each id, creating a UIDString which will serve the Mermaid node label (see section Getting UIDString as Mermaid Label below). We differentiate between a uid (user id) and reports to uid (ruid) to build the connections between employees.

Every user who reports to someone, gets added to the manager dictionary, where the manager's id is the key to a list of employee ids. In the second part of the code, the dictionary is parsed, and each manager links to the employees reporting to them.

Getting All User IDs

Processing Users calls getUserIDs, which returns the list of all user ids.

    def getUserIDs(self):
        """Returns list of user ids for all users"""
        results = session.query(Employee.EmployeeId).order_by(Employee.EmployeeId).all()
        return results

Getting UIDString as Mermaid Label

To define Mermaid Flowchart nodes requires a label, a type of shape, and the text for display inside the shape. For example, to display a stadium shape with the text "Hello World", looks like this: label[(Hello World)].

flowchart TD label([Hello World])

The label should be unique, so Technically, the ID is sufficient, but we chose this label for two reasons: Early versions of Mermaid didn't like numbers at the start of labels. The more important reason is for readability. If non-technical people late want to change the code, tweaking the automated output, just an id number is hard to understand.

With that background, the code is:

    def getUIDString(self, uid, withClass=False):
        """Returns a user string for definition of a node or in the graph.
        Returns FirstName+ID for definitions.
        Returns FirstName+ID+:::City for nodes where :::City is a defined Mermaid style class"""
        user = self.getUser(uid)
        if not user:
            print("Found no user for UID", uid, user)
        fname = user.FirstName
        styleclass = user.City
        if withClass:
            uidstring = f"{fname}{uid}:::{styleclass}"
        else:
            uidstring = f"{fname}{uid}"
        return uidstring

We check for whether the string should return with or without a styling class. For demonstration, City is the hard-coded class. When costuming this code, instead of the city name, the variable could change to anything appropriate.

The Employee Card

Defining how each employee's card looks is important. When defining the node, there is no styling definition, so we get the UID string without it (see the section Getting UIDString as Mermaid Label above).

    def getEmployeeCard(self, id):
        """Returns a string representation of a user card for user id"""
        user = self.getUser(id)
        uid = getUIDString(id)
        title = user.Title.lower()
        if 'manager' in title:
            icon = "fa:fa-address-book"
            shapestart = "["
            shapeend = "]"
        elif 'staff' in title:
            icon = "fa:fa-user-circle"
            shapestart = "["
            shapeend = "]"
        else:
            icon = "fa:fa-user"
            shapestart = "{{"
            shapeend = "}}"
        response = f"""{uid}{shapestart}"`{icon} **{user.FirstName} {user.LastName}**
    {user.Title}
    *{user.City}*`"{shapeend} """
        return response

This code looks in the employee's title for a hint about their position , differentiating between managers, staff, and everyone else. It could easily change to look at a different database column. Adding more elif statements would increase the number of user types.

We wanted each card to have an icon as a visual representation of the employee's title. Each of the employee types has a different icon from the available fontawesome set.

Each employee also has a different shape. The Research section above had the full list of available shapes and their associated starting and ending indicators.

As noted in the Research section above, the Mermaid Markdown code doesn't use <br> or \n for line breaks. The three lines of the card are instead literally on three separate lines. Python's triple quote """ allows for formatting this seamlessly and produces code that looks like this.

Andrew1[/"`fa:fa-address-book **Andrew Adams**
    General Manager
    *Edmonton*`"\] 

Things to Improve

The code works as provided, but like anything in software, could improve!

The current code assumes that the csv file is in the same folder. The script could receive the filename of the CSV file when called from the command line, making it much more flexible.

I'm passing around a lot of user IDs between methods. This is memory efficient in terms of data passing, but means a lot more database accessing than necessary. When processing a large spreadsheet, this could significantly slow down processing and, ultimately, memory usage. I could pass around Employee instances instead.

If a manager has no people reporting to them, or an employee doesn't report to anyone, their nodes appear at the top level of the graph as independent nodes. No styling applies to these nodes. There is probably a better way of dealing with these situations.

I probably should not be assuming that each manager is the head of their own department and needs their own subgraph. Departments should be able to have multiple managers, team-leads, and so on. Because subgraphs can nest, though, it might work out okay, but currently the code does not produce nesting. A better solution would go through the departments and add nested manager graphs belonging to the department.

Finally, instead of copy/pasting, a call to the Mermaid CLI could produce the image. This requires installing the Mermaid CLI, though. In the meantime, the copy-paste method, though manual, reliably works.

Conclusion

The final output looks like this.

%%{init: {"flowchart": {"curve": "stepAfter"}} }%%
flowchart TD
classDef Lethbridge fill:#FFCE54
classDef Calgary fill:#A0D468
classDef Edmonton fill:#8CC152

Andrew1["`fa:fa-address-book **Andrew Adams**
    General Manager
    *Edmonton*`"] 
Nancy2["`fa:fa-address-book **Nancy Edwards**
    Sales Manager
    *Calgary*`"] 
Jane3{{"`fa:fa-user **Jane Peacock**
    Sales Support Agent
    *Calgary*`"}} 
Margaret4{{"`fa:fa-user **Margaret Park**
    Sales Support Agent
    *Calgary*`"}} 
Michael6["`fa:fa-address-book **Michael Mitchell**
    IT Manager
    *Calgary*`"] 
Robert7["`fa:fa-user-circle **Robert King**
    IT Staff
    *Lethbridge*`"] 
Laura8["`fa:fa-user-circle **Laura Callahan**
    IT Staff
    *Lethbridge*`"] 
Steve27{{"`fa:fa-user **Steve Johnson**
    Sales Support Agent
    *Calgary*`"}} 

subgraph Administration
    Andrew1:::Edmonton --> Nancy2:::Calgary
    Andrew1:::Edmonton --> Michael6:::Calgary
end
subgraph Sales
    Nancy2:::Calgary --> Jane3:::Calgary
    Nancy2:::Calgary --> Margaret4:::Calgary
    Nancy2:::Calgary --> Steve27:::Calgary
end
subgraph IT
    Michael6:::Calgary --> Robert7:::Lethbridge
    Michael6:::Calgary --> Laura8:::Lethbridge
end

Which creates the following chart.

Sample Org Chart Output.

You can grab the code, CSV file and sample output from my GitHub repository.

Space