Cannot Afford Tableau? How About the Free Apache Superset?

Data Visualization Tools

I've been looking to visualize some data for a few projects lately. My data visualization experience up to this point has been limited to Microsoft Excel pivot tables and pie/bar charts. The cool kids use more sophisticated tools to create a mix of amazing live graphs in a dashboard like this one. Tableau tends to be the de facto standard for Business Intelligence visualization, but what if you cannot afford $70/month for Tableau? What else is out there? A few Tableau alternatives:
Microsoft Power BI: Possibly free if you already have an Office365 account. Very nice, easy to use, and matches Tableau pretty closely.

Google Data Studio: Free. As in Google free. You know the old adage "If the product is free then you are the product." Nice mix of maps and dashboards and formatting options. Definitely worth spending some time with.

Apache Superset: Free. As in Open Source free. You definitely need to RTFM. It has a nice selection of charts but is more difficult to customize than the other tools. If you have simple needs though, Apache Superset may be a fit. Plus, as an Open Source option, features will be added over time by the maintainers and the community.
 

 

Installing Apache Superset

There are two methods of installing Superset: 1) into a Docker container (easy), or 2) in a Python virtual environment (as delightful as any Python solution install 🤕🐍). I suggest the Docker route.

The Docker Route

The secret to getting this working the first time is RAM. My first attempts at the Docker route failed, both in an Ubuntu Linux VM and in Docker on macOS, because I had not assigned enough RAM for Docker to use. In the case of the Ubuntu VM, I needed to assign 8 GB of RAM for success. In the case of Docker on macOS, I needed to increase the default Docker RAM from 2 GB to 4 GB for success. H/T to @willbarrett for solving both issues for me on the Superset GitHub page. Here are the simple install instructions:
  • Increase your Docker RAM accordingly
  • Open a terminal session
  • $ git clone https://github.com/apache/incubator-superset/
  • $ cd incubator-superset
  • $ docker-compose up
  • Wait
  • You will eventually see a message that looks something like this:
  • Running on http://0.0.0.0:8088/ (Press CTRL+C to quit)
  • Congratulations, Apache Superset is now running and can be accessed at localhost:8088
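The "Wait" step can take several minutes, so it can be handy to let a script do the waiting. What follows is only a minimal sketch and not part of Superset: it polls the address from the message above until something answers. The function names, the number of attempts, and the delay between them are hypothetical choices made for illustration.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_up(url, timeout=5.0):
    """Return True if anything answers at the URL with an HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status still means the web server is answering.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, reset, or timed out: not ready yet.
        return False


def wait_for_superset(url="http://localhost:8088/", attempts=60, delay=10.0,
                      probe=is_up, sleep=time.sleep):
    """Poll the URL until the probe succeeds or the attempts run out.

    The defaults (60 attempts, 10 seconds apart) are assumptions, not
    values taken from the Superset documentation.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return True
        if attempt < attempts:
            sleep(delay)
    return False


# Example call, to be run while docker-compose is starting in another terminal:
#     wait_for_superset()
```

The probe and sleep functions are passed in as parameters so the loop can be tried out without a running container.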

Using Apache Superset



To be honest, I never read the instructions. That is always a good test of how intuitive something is. As a test, I wanted to see if I could create a map graph based on the latest Johns Hopkins COVID-19 dataset. These are the steps I took:

Adding New Data

I downloaded the latest COVID-19 dataset from the Johns Hopkins site. I used the latest daily report CSV from this directory. By default, the "examples" database in Superset does not allow uploading CSV files, so I had to flip that switch:

Choose Sources, then Databases
Choose "Edit Record" next to the examples database
Check the box next to "Allow Csv Upload", scroll down and click Save
 


Now choose Sources, Upload a CSV



Come up with a Table Name to create in the Examples database, and choose the CSV file that you downloaded. Scroll down to the bottom and click Save.

Creating a Chart

This is the fun part. Making something beautiful and easy to understand.

Click New and then Chart



Choose the table you just created, then click the word "Table" to change the visualization type to something more fun like Country Map, and click Create New Chart.




OK, let's fix the default values for the Country Map to map the states of the US.



Time Range: Click "Last week", click "No filter" and then "OK"

Under the ISO 3166-2 codes error, choose the column in your table that holds these country/county/state codes. I added them to my CSV before uploading, using the format US-NN, where NN is the two-letter state abbreviation. You can find more about ISO 3166 here.
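Adding the US-NN codes by hand gets tedious, so here is a minimal sketch of how the column could be added with a script before uploading. It assumes the CSV has a column of full state names called Province_State (an assumption about the daily report layout, so check your file's header), and the new column name ISO_3166_2 is a hypothetical choice. Rows whose state name is not in the lookup table are dropped.

```python
import csv

# Full state name -> two-letter abbreviation (50 states plus DC).
STATE_ABBREVIATIONS = {
    "Alabama": "AL", "Alaska": "AK", "Arizona": "AZ", "Arkansas": "AR",
    "California": "CA", "Colorado": "CO", "Connecticut": "CT",
    "Delaware": "DE", "District of Columbia": "DC", "Florida": "FL",
    "Georgia": "GA", "Hawaii": "HI", "Idaho": "ID", "Illinois": "IL",
    "Indiana": "IN", "Iowa": "IA", "Kansas": "KS", "Kentucky": "KY",
    "Louisiana": "LA", "Maine": "ME", "Maryland": "MD",
    "Massachusetts": "MA", "Michigan": "MI", "Minnesota": "MN",
    "Mississippi": "MS", "Missouri": "MO", "Montana": "MT",
    "Nebraska": "NE", "Nevada": "NV", "New Hampshire": "NH",
    "New Jersey": "NJ", "New Mexico": "NM", "New York": "NY",
    "North Carolina": "NC", "North Dakota": "ND", "Ohio": "OH",
    "Oklahoma": "OK", "Oregon": "OR", "Pennsylvania": "PA",
    "Rhode Island": "RI", "South Carolina": "SC", "South Dakota": "SD",
    "Tennessee": "TN", "Texas": "TX", "Utah": "UT", "Vermont": "VT",
    "Virginia": "VA", "Washington": "WA", "West Virginia": "WV",
    "Wisconsin": "WI", "Wyoming": "WY",
}


def add_iso_codes(rows, state_column="Province_State",
                  code_column="ISO_3166_2"):
    """Yield copies of the rows with a US-NN code added; skip unknown states."""
    for row in rows:
        name = (row.get(state_column) or "").strip()
        abbreviation = STATE_ABBREVIATIONS.get(name)
        if abbreviation is None:
            continue  # not a US state (or spelled differently): leave it out
        yield {**row, code_column: f"US-{abbreviation}"}


def convert(source, target, state_column="Province_State",
            code_column="ISO_3166_2"):
    """Copy CSV data from source to target with the code column appended.

    Returns the number of data rows written.
    """
    reader = csv.DictReader(source)
    fieldnames = list(reader.fieldnames or []) + [code_column]
    writer = csv.DictWriter(target, fieldnames=fieldnames,
                            extrasaction="ignore")
    writer.writeheader()
    written = 0
    for row in add_iso_codes(reader, state_column, code_column):
        writer.writerow(row)
        written += 1
    return written
```

To use it on files, open the downloaded CSV for reading and a new file for writing (both with newline="") and pass the two file objects to convert. The new file is the one to upload.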

Change Metric to be the number that you want to map to each state. I chose the MAX of the latest COVID-19 daily value for that state.
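To get a feel for what a MAX metric produces before running the query, the same aggregation can be sketched outside Superset. This is only an illustration of the idea and not how Superset computes it internally. The column names ISO_3166_2 and Confirmed are hypothetical stand-ins for whatever your table uses.

```python
def max_per_code(rows, code_column="ISO_3166_2", value_column="Confirmed"):
    """Return the largest value seen for each code, as {code: max_value}."""
    largest = {}
    for row in rows:
        code = row[code_column]
        value = int(row[value_column])
        # Keep the value only if it beats what we have seen for this code.
        if code not in largest or value > largest[code]:
            largest[code] = value
    return largest


# Rows shaped like csv.DictReader output (values arrive as strings).
sample_rows = [
    {"ISO_3166_2": "US-TX", "Confirmed": "10"},
    {"ISO_3166_2": "US-TX", "Confirmed": "25"},
    {"ISO_3166_2": "US-NY", "Confirmed": "7"},
]
print(max_per_code(sample_rows))  # {'US-TX': 25, 'US-NY': 7}
```

Note that if a table holds several rows per state, MAX keeps the single largest row value for that state and does not add the rows together.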

Change the Country Name to the country you want to map. I chose "Usa"

Change the Linear Color Scheme to suit your liking. I chose a yellow to red scheme.

Here are my final selections:



Click "Run Query"

If you are lucky, you will get a map that looks something like this, and when you hover over different sections, your chosen Metric will display.

Name your chart by clicking "untitled", typing a name, and hitting Enter.

Creating a Dashboard

Dashboards are where you collect charts and, optionally, share your visualizations with the world.

Choose New and then Dashboard





Give your dashboard a name. I'll call mine "Blog Dashboard"

Drag a new tab over until a blue line shows up right under the dashboard title and then drop the tab. 


Rename the tab

Drag a header over, then a row, maybe a column, a divider, another lower header, a lower row and maybe a lower column.







After resizing your column and renaming your headers, you should have a blank canvas like this:
Click on "Your charts & filters" and start dragging charts to your dashboard.



When you are done arranging, click Save Changes and you will have your first dashboard. Congratulations!

If you want to make your dashboard available to others, just choose Share Dashboard and make note of the URL.

Closing Thoughts

Apache Superset is a nice tool if you do not have access to a Tableau or Power BI license and you think Google is icky. Note: The Superset built-in web server is only accessible locally to the machine it is running on. There are nice instructions on the Superset site on using the gunicorn web server instead.
As always, I hope you enjoyed this post and I welcome your feedback.

