This post extends the earlier post on boxplots in matplotlib and seaborn. A boxplot, also called a box-and-whisker plot, is a type of visualization used to summarize the characteristics of groups of numerical data points. It shows important statistics such as the minimum, the maximum, the first quartile (25%), the median (second quartile, 50%) and the third quartile (75%), along with the dispersion, the distribution of the dataset and any outliers. The boxplot is one of the most important visualizations in exploratory data analysis. In this post we will look at what a boxplot is, its components, when to use it and how to create one in plotly. Download the data for this post here.


Components of a Boxplot

A boxplot summarizes the following important statistics:

  1. Minimum Value. The lowest value in the dataset, excluding outliers.
  2. Maximum Value. The largest value in the dataset, excluding outliers.
  3. Lower Quartile (Q1). This is the 25th-percentile measure, also referred to as the first quartile.
  4. Median. Also referred to as the second quartile, this is the mid-point of the dataset. It is represented by a line that cuts the box into two parts.
  5. Upper Quartile (Q3). Also referred to as the third quartile, this is the 75th-percentile measure of the dataset.
  6. Whiskers. These are the straight lines extending from each end of the box. They represent the values outside the box: the lower whisker covers roughly the lowest 25% of the data and the upper whisker roughly the highest 25%, excluding outliers.
  7. Interquartile Range (IQR). This is represented by the box. It spans the middle 50% of the data points, from the 25th to the 75th percentile.
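
The components listed above can be computed by hand, which may help when reading a boxplot. The following is a minimal sketch and not part of the original notebook: the sample values are made up for illustration, and the 1.5 x IQR rule for the whisker limits is a common convention assumed here. The exact quartile values can differ slightly between libraries, because each may use a different interpolation method.

```python
import numpy as np

# Hypothetical sample values for illustration only (not from the exchange-rate data).
rates = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])

# Quartiles: Q1 (25%), median (50%) and Q3 (75%).
q1, median, q3 = np.percentile(rates, [25, 50, 75])

# The box spans Q1 to Q3; its height is the interquartile range.
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme values that are still inside the fences.
inside = rates[(rates >= lower_fence) & (rates <= upper_fence)]
whisker_low = inside.min()
whisker_high = inside.max()

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"whiskers: {whisker_low} to {whisker_high}")
```

In this sample the value 30.0 falls above the upper fence, so it would be drawn as an individual outlier point rather than as the end of the upper whisker.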

When to Use Boxplot

  1. Measure of the median. The straight line across the box denotes the median. The median is not sensitive to outliers.
  2. Detect outliers. Boxplots are useful for showing outliers in the dataset. Outliers are data points that lie beyond the whiskers.
  3. Measure of dispersion. A boxplot shows the variability of the dataset. This includes the range between the lowest and highest values and the interquartile range.
  4. Show the distribution of the dataset. A boxplot shows how the data is distributed. Roughly symmetric data, such as normally distributed data, is indicated when the median line divides the box into two equal halves. When the median is closer to the lower quartile the data is positively skewed, and when the median is closer to the upper quartile the data is negatively skewed.
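
The outlier and skew readings described above can also be checked numerically. The sketch below is an illustration rather than the post's own method: `describe_box` is a hypothetical helper, the 1.5 x IQR outlier rule is an assumed convention, the `tol` threshold for calling the box symmetric is arbitrary, and the sample values are made up.

```python
import numpy as np

def describe_box(values, tol=0.05):
    """Return the outliers and a rough skew reading for a set of values.

    Hypothetical helper: uses the 1.5 * IQR rule (an assumed convention)
    and the position of the median inside the box.
    """
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1

    # Points beyond the fences are treated as outliers.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = values[(values < low) | (values > high)]

    # Median position inside the box: 0 means at Q1, 1 means at Q3.
    position = 0.5 if iqr == 0 else (median - q1) / iqr
    if position < 0.5 - tol:
        skew = "positive"   # median closer to the lower quartile
    elif position > 0.5 + tol:
        skew = "negative"   # median closer to the upper quartile
    else:
        skew = "symmetric"

    return {"outliers": outliers.tolist(), "skew": skew}

# Hypothetical sample with one large value.
summary = describe_box([1, 2, 2, 3, 3, 3, 4, 5, 8, 40])
print(summary)
```

The median position is only a rough indicator of skew; the lengths of the two whiskers are worth comparing as well before drawing a conclusion.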

Boxplot in Plotly

Import Required Libraries

                    

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

Load Data

                    

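# Load the exchange-rate data downloaded above
ex_rate_df = pd.read_csv('Exchange_Rates.csv')
# Copy the raw columns into the names used in the plots
ex_rate_df['Country'] = ex_rate_df['LOCATION']
ex_rate_df['Year'] = ex_rate_df['TIME']
# Round the rate to two decimal places
ex_rate_df['Rate'] = ex_rate_df['Value'].round(2)
ex_rate_df.head()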

plotly-load-exchange-rate

Simple Boxplot

                    

fig = px.box(ex_rate_df[ex_rate_df['Country'].isin(['FRA'])], y="Rate")

fig.update_layout(title={'text': 'Exchange Rate for FRA', 'y': 0.95, 'x': 0.5, 'xanchor': 'center', 'yanchor': 'top'},
                  legend=dict(yanchor="top", y=0.95, xanchor="right", x=0.95),
                  autosize=True, margin=dict(t=70, b=0, l=0, r=0), xaxis_title='Currency', yaxis_title='Rate',
                  font=dict(size=20, family='Times New Roman', color='brown'))

fig.update_xaxes(showline=True, linewidth=1, linecolor='white', gridwidth=3, gridcolor='white', mirror=True)
fig.update_yaxes(showline=True, linewidth=1, linecolor='white', gridwidth=3, gridcolor='white', mirror=True)

fig.show()

plotly-simple-boxplot

Multi-Variable Boxplot

                    

fig = px.box(ex_rate_df[ex_rate_df['Country'].isin(['FRA','CAN','AUS'])], y="Rate", x='Country', color='Country')

fig.update_layout(title={'text': 'Exchange Rate for FRA, CAN and AUS', 'y': 0.95, 'x': 0.5, 'xanchor': 'center', 'yanchor': 'top'},
                  legend=dict(yanchor="top", y=0.95, xanchor="right", x=0.95),
                  autosize=True, margin=dict(t=70, b=0, l=0, r=0), xaxis_title='Country', yaxis_title='Rate',
                  font=dict(size=20, family='Times New Roman', color='brown'))

fig.update_xaxes(showline=True, linewidth=1, linecolor='white', gridwidth=3, gridcolor='white', mirror=True)
fig.update_yaxes(showline=True, linewidth=1, linecolor='white', gridwidth=3, gridcolor='white', mirror=True)
fig.show()

plotly-multi-variable-boxplot

Multi-Variable Boxplot with Actual Data Points

                    

fig = px.box(ex_rate_df[ex_rate_df['Country'].isin(['FRA','CAN','AUS'])], y="Rate", x='Country', color='Country',points='all')

fig.update_layout(title={'text': 'Exchange Rate for FRA, CAN and AUS', 'y': 0.95, 'x': 0.5, 'xanchor': 'center', 'yanchor': 'top'},
                  legend=dict(yanchor="top", y=0.95, xanchor="right", x=0.95),
                  autosize=True, margin=dict(t=70, b=0, l=0, r=0), xaxis_title='Country', yaxis_title='Rate',
                  font=dict(size=20, family='Times New Roman', color='brown'))

fig.update_xaxes(showline=True, linewidth=1, linecolor='white', gridwidth=3, gridcolor='white', mirror=True)
fig.update_yaxes(showline=True, linewidth=1, linecolor='white', gridwidth=3, gridcolor='white', mirror=True)
fig.show()

plotly-multi-variable-boxplot-with-actual-data-points

For the complete code, check the Jupyter notebook here.

Conclusion

In this post we have looked at the boxplot. Boxplots are data visualizations that provide a high-level summary of the numerical data points in a dataset. They show the minimum and maximum values, the quartiles, the range, the interquartile range, the median and the distribution of the data. An important use of boxplots in data science and machine learning is detecting outliers in a dataset. In the next post we will look at distribution plots and how to create them in plotly. To learn about area charts and how to create them in plotly, check our previous post here. To learn about boxplots in seaborn, check our post here.
