We are going to take a look at Altair, which is a data visualization library for Python. What sets Altair apart from the other packages covered on this blog is that it allows for interactions.

The interactions can take place inside Jupyter, or they can be exported and loaded onto websites, as we shall see. In the past, making interactions for a website was often taught using a JavaScript library such as D3.js. D3.js works but is cumbersome for the average non-coder. Altair addresses this problem, as Python is often seen as easier to work with than JavaScript.

Installing Altair

If Altair is not already installed on your computer, you can install it with one of the following commands.

pip install altair vega_datasets
OR
conda install -c conda-forge altair vega_datasets

Which one of the lines above you use will depend on the type of Python installation you have.
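Before moving on, it can help to confirm which packages Python can actually find. The snippet below is a minimal sketch that uses only the standard library. The helper name is_installed is made up for this illustration and is not part of Altair.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return find_spec(module_name) is not None


# Report the status of each package used in this post.
for name in ("altair", "vega_datasets", "pydataset"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Any package reported as missing can be installed with one of the commands above.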

Goal

We are going to make some simple visualizations with Altair using the "Duncan" dataset from the pydataset library. If you do not have pydataset installed on your computer, you can use the commands listed above to install it. Simply replace "altair vega_datasets" with "pydataset". Below is the initial code followed by the output.

import pandas as pd
from pydataset import data
df = data("Duncan")
df.head()

In the code above, we load pandas and import "data" from the "pydataset" library. Next, we load the "Duncan" dataset as the object "df". Lastly, we use the .head() method to take a look at the first few rows of the dataset. You can see in the output above which variables are available.

Our first visualization is a simple bar graph. The code is below followed by the visualization.

import altair as alt
alt.Chart(df).mark_bar().encode(
    x="type",
    y="prestige"
)

In the code above we did the following:

  1. Line 1 loads the altair library.
  2. Line 2 chains several methods together to make the bar graph. .Chart(df) loads the data for the plot. .mark_bar() assigns the geometric shape for the plot, which in this case is bars. Lastly, the .encode() method contains the information for the variables that will be assigned to the x and y axes. In this case we are looking at job type and prestige.

The three dots in the upper right provide options for saving or editing the plot. We will learn more about saving plots later. In addition, Altair follows the grammar of graphics for creating plots. This has been discussed in another post, but a summary of the components is below.

  • Data
  • Aesthetics
  • Scale
  • Statistical transformation
  • Geometric object
  • Facets
  • Coordinate system

We will not deal with all of these but we have dealt with the following

  • Data as .Chart()
  • Geometric object as .mark_bar()
  • Aesthetics as .encode(), which maps variables to the x and y axes
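Under the hood, Altair turns a chart into a Vega-Lite specification, which is a JSON document. The hand-written dictionary below is a rough sketch of what such a specification looks like for the bar graph, so you can see where each of the three methods ends up. The data values are made up, and the dictionary is simplified compared to what Altair actually generates.

```python
import json

# Hand-written, simplified Vega-Lite-style specification (made-up values).
spec = {
    "data": {"values": [
        {"type": "prof", "prestige": 82},
        {"type": "wc", "prestige": 39},
        {"type": "bc", "prestige": 24},
    ]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "type", "type": "nominal"},
        "y": {"field": "prestige", "type": "quantitative"},
    },
}

# Which Altair method fills in which top-level key.
method_for_key = {
    "data": ".Chart()",
    "mark": ".mark_bar()",
    "encoding": ".encode()",
}
for key, method in method_for_key.items():
    print(f"{key:9s} <- {method}")

# The specification is plain JSON, which is why it can be embedded in a web page.
print(json.dumps(spec["encoding"], indent=2))
```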

In our second example, we will make a scatterplot. The code and output are below.

alt.Chart(df).mark_circle().encode(
    x="education",
    y="prestige"
)

The code is mostly the same. We simply use .mark_circle() to indicate the type of geometric object. For .encode() we made sure to use two continuous variables.

In the next plot, we add a categorical variable to the scatterplot by manipulating the color.

alt.Chart(df).mark_circle().encode(
    x="education",
    y="prestige",
    color="type"
)

The only change is the addition of the "color" argument, which is set to the categorical variable "type".

It is also possible to use bubbles to indicate size. In the plot below we add the income variable to the plot using the size of the bubbles.

alt.Chart(df).mark_circle().encode(
    x="education",
    y="prestige",
    color="type",
    size="income"
)

The newest argument is "size", which maps income to the size of each bubble.

You can also place plots side by side using the pipe symbol (|), which Altair treats as horizontal concatenation. The code below makes two plots and saves them as objects. You then display both by typing the names of the objects separated by the pipe symbol, which on many keyboards is found above the Enter key. Below you will find two different plots created through this process.

educationPlot = alt.Chart(df).mark_circle().encode(
    x="education",
    y="prestige",
    color="type",
)
incomePlot = alt.Chart(df).mark_circle().encode(
    x="income",
    y="prestige",
    color="type",
)
educationPlot | incomePlot

With this code you can make multiple plots. Simply keep adding pipes to make more plots.

Interaction and Saving Plots

It is also possible to make plots interactive. In the code below we add the "tooltip" argument. This allows us to add an additional variable, "income", to the chart. When the mouse hovers over a data point, the income will display.

However, since we are in a browser right now, this will not work unless we save the chart as an HTML file. The last line of code saves the plot as an HTML file and renders it using SVG. We also remove the three dots in the upper right corner by adding 'actions':False. Below is the code and the plot once the HTML file was loaded to this blog.

interact_plot = alt.Chart(df).mark_circle().encode(
    x="education",
    y="prestige",
    color="type",
    tooltip=["income"]
)
interact_plot.save('interact_plot.html', embed_options={'renderer': 'svg', 'actions': False})

I've made a lot of visuals in the past, and never has it been this simple.

Conclusion

Altair is another tool for visualizations. This may be the easiest way to make complex and interactive charts that I have seen. As such, it is a great option whenever you need to visualize data.
