Pandas gives us powerful, easy-to-use functions for statistical analysis, and with them we can perform a wide range of operations and draw useful insights from our data. Statistics is at the centre of any analytics and data science solution: it helps us understand the characteristics of both samples and populations. Pandas functions make it easy to carry out descriptive and inferential analyses of our data. In this post we will look at Pandas Statistical Functions and how to apply them. This post is an extension of the previous post on Pandas Mathematical Functions. You can download the dataset for this post from **here**.

**Pandas Statistical Functions**

**Load Data**

```python
import pandas as pd
import numpy as np

titanic_df = pd.read_csv('titanic.csv')
titanic_df.head()  # first five rows
```

**Summary statistics**

```python
titanic_df.describe(include='all')  # also include a summary of categorical columns
titanic_df.describe()               # numeric columns only (the default)
```
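To see what `include='all'` changes, here is a minimal sketch on a tiny hand-made frame (the values are invented for illustration, not taken from the Titanic file). By default `describe()` summarises only numeric columns; with `include='all'` the text column also gets `unique`, `top` and `freq` rows.

```python
import pandas as pd

# Tiny made-up frame with one numeric and one text column (not the real dataset)
df = pd.DataFrame({
    'Age': [22.0, 38.0, 26.0, 35.0],
    'Sex': ['male', 'female', 'female', 'male'],
})

numeric_summary = df.describe()            # count, mean, std, min, quartiles, max for Age only
full_summary = df.describe(include='all')  # adds unique/top/freq rows for the Sex column

print(numeric_summary)
print(full_summary)
```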

**Show Min Value**

```python
titanic_df['Age'].min()            # min of the Age column only
titanic_df.min(numeric_only=True)  # min of every numeric column
```

**Show max value**

```python
titanic_df['Age'].max()            # max of the Age column only
titanic_df.max(numeric_only=True)  # max of every numeric column
```

**Show mode of values**

```python
titanic_df['Age'].mode()  # most frequent Age value(s), returned as a Series
```
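`mode()` returns a Series rather than a single number because a column can have several equally common values. A small sketch with made-up ages that contain a tie:

```python
import pandas as pd

# Made-up ages where 24 and 30 both appear twice (a tie for most frequent)
ages = pd.Series([24, 30, 24, 30, 41])

modes = ages.mode()         # Series holding every most-frequent value, sorted
first_mode = modes.iloc[0]  # pick one explicitly if a single number is needed
print(modes.tolist())       # [24, 30]
```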

**Show Median Value**

```python
titanic_df['Fare'].median()  # median of Fare
```
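The median is the middle value of the sorted data, so unlike the mean it barely moves when a few extreme values are present. That matters for columns with a long tail of large values. A sketch with invented fares:

```python
import pandas as pd

# Invented fares: four ordinary tickets and one very expensive one
fares = pd.Series([7.0, 8.0, 9.0, 10.0, 500.0])

print(fares.median())  # 9.0   -> middle of the sorted values
print(fares.mean())    # 106.8 -> pulled up by the 500.0 outlier
```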

**Show sum of values**

```python
titanic_df['Fare'].sum()  # sum of Fare
```
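`sum()` skips missing values by default. One consequence worth knowing: a column that is entirely missing sums to `0.0`, not `NaN`, unless you pass `min_count`. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

fares = pd.Series([7.25, np.nan, 71.28])
all_missing = pd.Series([np.nan, np.nan])

print(fares.sum())                   # NaN is skipped: 7.25 + 71.28
print(all_missing.sum())             # 0.0 by default
print(all_missing.sum(min_count=1))  # NaN: fewer than 1 non-missing value
```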

**Show frequency of each category**

```python
titanic_df['Pclass'].value_counts()  # frequency of passengers in each class
```
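`value_counts()` sorts the categories from most to least frequent. Passing `normalize=True` returns proportions instead of raw counts. A sketch with made-up class labels:

```python
import pandas as pd

# Made-up passenger classes (1 = first, 2 = second, 3 = third)
pclass = pd.Series([3, 1, 3, 2, 3, 1])

counts = pclass.value_counts()                # 3 -> 3, 1 -> 2, 2 -> 1
shares = pclass.value_counts(normalize=True)  # same order, as fractions of 6
print(counts)
print(shares)
```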

**Calculate mean**

```python
titanic_df['Age'].mean()  # mean of Age (missing values are skipped)
```
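`mean()` is the sum of the non-missing values divided by how many there are, so missing ages are skipped unless `skipna=False` is passed. A minimal sketch with invented ages:

```python
import numpy as np
import pandas as pd

ages = pd.Series([22.0, np.nan, 38.0, 26.0])

manual = ages.sum() / ages.count()  # count() also ignores NaN
print(ages.mean())                  # 28.666...
print(manual)                       # same value
print(ages.mean(skipna=False))      # NaN because one value is missing
```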

**Calculate standard deviation**

```python
titanic_df['Age'].std()             # std for Age only
titanic_df.std(numeric_only=True)   # std for all numeric columns (pandas >= 2.0 raises on text columns otherwise)
```
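Pandas' `std()` computes the *sample* standard deviation (`ddof=1`, dividing by n - 1), whereas NumPy's `np.std` defaults to the *population* version (`ddof=0`, dividing by n). This sketch on made-up values makes the difference visible:

```python
import numpy as np
import pandas as pd

values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])  # mean is 5, squared deviations sum to 32

sample_std = values.std()              # sqrt(32 / 7), about 2.138
population_std = values.std(ddof=0)    # sqrt(32 / 8) = 2.0
numpy_std = np.std(values.to_numpy())  # NumPy default ddof=0 -> 2.0

print(sample_std, population_std, numpy_std)
```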

**Show Variance**

```python
titanic_df['Age'].var()            # variance for the Age column only
titanic_df.var(numeric_only=True)  # variance for all numeric columns
```
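Variance is the square of the standard deviation. By default pandas computes the sample variance: the sum of squared deviations from the mean divided by n - 1. The sketch checks that formula against `var()` on invented ages:

```python
import pandas as pd

ages = pd.Series([22.0, 38.0, 26.0, 35.0])

n = len(ages)
# Sample variance by hand: sum((x - mean)^2) / (n - 1)
manual_var = ((ages - ages.mean()) ** 2).sum() / (n - 1)

print(ages.var())       # 56.25 (pandas default, ddof=1)
print(manual_var)       # matches ages.var()
print(ages.std() ** 2)  # also matches: variance = std squared
```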

**Show Covariance**

```python
titanic_df[['Age', 'Fare']].cov()  # covariance of Age and Fare
titanic_df.cov(numeric_only=True)  # covariance for all numeric columns
```
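Covariance measures how two columns vary together: positive when they tend to rise together, negative when one rises as the other falls. In the matrix that `cov()` returns, the diagonal holds each column's own variance. A sketch with made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [2, 4, 6, 8]})  # y = 2 * x

cov_matrix = df.cov()  # 2x2 sample covariance matrix (divides by n - 1)
print(cov_matrix)

# Diagonal entries equal each column's variance
print(cov_matrix.loc['x', 'x'], df['x'].var())  # both 5/3
# Covariance of two Series gives the off-diagonal entry
print(df['x'].cov(df['y']))                      # 10/3
```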

**Correlation**

Correlation measures the strength and direction of the relationship between two variables.

**1. Pearson Correlation**

Measures the linear relationship between two variables. The Pearson correlation coefficient is the default correlation method for a Pandas DataFrame.

**NOTE**: Pearson correlation only captures linear relationships and is sensitive to outliers. Significance tests based on it also assume the data is approximately normally distributed.

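```python
# numeric_only=True skips text columns (required in pandas >= 2.0 when they are present)
titanic_df.corr(method='pearson', numeric_only=True)
titanic_df.corr(numeric_only=True)  # same result: 'pearson' is the default method
```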
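The sensitivity to outliers is easy to demonstrate. In this sketch (invented data) `y` tracks `x` exactly except for one extreme value; Pearson drops noticeably, while Spearman, which only looks at ranks, stays at 1:

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [1, 2, 3, 4, 50],  # last value is an outlier, but the order is unchanged
})

pearson = df.corr(method='pearson').loc['x', 'y']    # about 0.74
spearman = df.corr(method='spearman').loc['x', 'y']  # 1.0: ranks agree perfectly
print(pearson, spearman)
```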

**2. Spearman Rank Correlation**

Measures the monotonic relationship between two variables. It does not assume the data is normally distributed. Because it works on ranks, computing it takes *O(n log n)* time (dominated by sorting).

```python
titanic_df.corr(method='spearman', numeric_only=True)
```
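Spearman's coefficient is the Pearson correlation of the ranked data, which is why it picks up any monotonic trend. A quick check of that identity on made-up values with no ties:

```python
import pandas as pd

df = pd.DataFrame({
    'x': [10, 20, 30, 40, 50],
    'y': [3, 1, 4, 15, 9],
})

spearman = df.corr(method='spearman')
pearson_on_ranks = df.rank().corr(method='pearson')  # rank each column, then Pearson

print(spearman.loc['x', 'y'])          # 0.8
print(pearson_on_ranks.loc['x', 'y'])  # same value
```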

**3. Kendall Rank Correlation**

It measures the monotonic relationship between two variables and does not assume the data is normally distributed. Computed naively by comparing every pair of observations, it takes *O(n^2)* time, so it tends to be slower on large datasets.

```python
titanic_df.corr(method='kendall', numeric_only=True)
```
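To make the *O(n^2)* remark concrete, here is a minimal sketch of the textbook Kendall tau-a, which compares every pair of observations and counts concordant versus discordant pairs. `kendall_tau_a` is a hypothetical helper written for illustration; it does no tie correction and is not how pandas computes `method='kendall'`.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Naive Kendall tau-a: loops over all n*(n-1)/2 pairs, hence O(n^2).

    Hypothetical illustration only; assumes no tied values in x or y.
    """
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Same direction of change in x and y -> concordant; opposite -> discordant
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


x = [10, 20, 30, 40, 50]
y = [3, 1, 4, 15, 9]
print(kendall_tau_a(x, y))  # 0.6: 8 concordant and 2 discordant pairs out of 10
```

On tie-free data like this, `pd.DataFrame({'x': x, 'y': y}).corr(method='kendall')` should give the same 0.6 (pandas needs SciPy installed for this method).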

**Calculate Kurtosis**

```python
titanic_df.kurtosis(numeric_only=True)  # kurtosis of every numeric column
```
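Pandas reports *excess* kurtosis (Fisher's definition), so a normal distribution scores about 0; flat, light-tailed data comes out negative and heavy-tailed data positive. The value is also bias-corrected for sample size. A sketch on made-up values:

```python
import pandas as pd

evenly_spaced = pd.Series([1, 2, 3, 4, 5])              # flat, no tails
heavy_tailed = pd.Series([0, 0, 0, 0, 0, 0, 10, -10])  # mostly zeros, two extremes

print(evenly_spaced.kurtosis())  # -1.2 (lighter tails than a normal distribution)
print(heavy_tailed.kurtosis())   # positive (heavier tails)
```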

**Calculate Skew**

```python
titanic_df.skew(numeric_only=True)  # skewness of every numeric column
```
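Skewness measures asymmetry: about 0 for symmetric data, positive when a long tail stretches towards large values, negative when it stretches towards small values. Pandas' `skew()` is bias-corrected for sample size. A sketch on made-up values:

```python
import pandas as pd

symmetric = pd.Series([1, 2, 3, 4, 5])
right_tailed = pd.Series([1, 1, 1, 2, 10])  # one large value stretches the right tail
left_tailed = -right_tailed                 # mirror image

print(symmetric.skew())     # 0.0
print(right_tailed.skew())  # positive
print(left_tailed.skew())   # same magnitude, negative
```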

**Compute Percent change**

Calculates the percentage change between each value and the value a given number of periods (rows) earlier. Handle missing values (nulls) before computing the percent change.

```python
titanic_df['Fare'].pct_change(periods=3)  # change relative to the value 3 rows earlier
```
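`pct_change(periods=3)` compares each value with the one three rows earlier, computing (current - earlier) / earlier. The first three rows have nothing to compare against, so they come out as NaN. A sketch on made-up values that also shows the equivalent `shift` expression:

```python
import numpy as np
import pandas as pd

prices = pd.Series([10.0, 20.0, 30.0, 15.0, 40.0, 60.0])

change = prices.pct_change(periods=3)
manual = prices / prices.shift(3) - 1  # same formula written out

print(change.tolist())  # [nan, nan, nan, 0.5, 1.0, 1.0]
```

If the column contains NaN, decide how to fill or drop it first; recent pandas versions deprecate the implicit forward-fill that `pct_change` used to apply.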

**Rank**

Ranks the data. By default, tied values receive the average of the ranks they span.

```python
titanic_df.rank().head()  # ranks of the first five rows, column by column
```
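How ties are ranked is controlled by the `method` parameter. The default, `'average'`, gives tied values the mean of the positions they occupy; `'min'` and `'dense'` are common alternatives. A sketch with made-up fares:

```python
import pandas as pd

fares = pd.Series([50, 20, 20, 80])  # the two 20s are tied

print(fares.rank().tolist())                # [3.0, 1.5, 1.5, 4.0] -> ties share positions 1 and 2
print(fares.rank(method='min').tolist())    # [3.0, 1.0, 1.0, 4.0]
print(fares.rank(method='dense').tolist())  # [2.0, 1.0, 1.0, 3.0] -> no gap after the tie
```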

For the complete code, check the Jupyter notebook **here**.

**Conclusion**

In this post we have looked at various commonly used Pandas Statistical Functions. Statistics is important in analysing and interpreting analytical results and insights. Pandas provides us with tons of statistical functions to work with data. In the next post we will look at Window Functions in Pandas and how to apply them. To learn about Pandas Mathematical Functions check our previous post **here**.