Pandas gives us powerful, easy-to-use functions for statistical analysis, and with them we can perform a wide range of operations and derive useful insights from our data. Statistics is at the centre of any analytics or data science solution: it helps us understand the characteristics of both a sample and the population it is drawn from. Pandas functions make it easy to carry out descriptive and inferential analyses of our data. In this post we will look at Pandas Statistical Functions and how to apply them. This post is an extension of the previous post on Pandas Mathematical Functions. You can download the dataset for this post from here.

## Pandas Statistical Functions

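```python
import pandas as pd
import numpy as np

# Load the Titanic dataset downloaded above (the file name here is an assumption)
titanic_df = pd.read_csv('titanic.csv')
```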

Summary statistics

```python
titanic_df.describe() # summary of numeric columns
titanic_df.describe(include='all') # also include categorical (text) columns
```
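To see which rows `describe()` produces without the dataset, here is a minimal sketch on a tiny hand-made DataFrame. `toy_df` and its values are made up for illustration and only borrow two Titanic column names.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for titanic_df (values are made up)
toy_df = pd.DataFrame({
    'Age': [22.0, 38.0, 26.0, 35.0, np.nan],
    'Sex': ['male', 'female', 'female', 'female', 'male'],
})

numeric_summary = toy_df.describe()            # numeric columns only
full_summary = toy_df.describe(include='all')  # adds unique/top/freq rows for text columns

print(numeric_summary)
print(full_summary)
```

Note that `count` excludes missing values, so it reports 4 for `Age` here rather than 5.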

Show Min Value

```python
titanic_df['Age'].min() # min of Age column only
titanic_df.min() # min of every column
```

Show max value

```python
titanic_df['Age'].max() # max of Age column only
titanic_df.max() # max of every column
```

Show mode of values

```python
titanic_df['Age'].mode() # mode(s) of Age; returns a Series because several values can tie
```
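Because `mode()` returns a Series rather than a single number, a quick sketch with made-up values shows what happens when two values are equally frequent:

```python
import pandas as pd

# Hypothetical ages: 2 and 3 both appear twice, so there are two modes
ages = pd.Series([1, 2, 2, 3, 3])

modes = ages.mode()          # every most-frequent value, in sorted order
first_mode = modes.iloc[0]   # pick one value when a single number is needed
print(modes.tolist())
```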

Show Median Value

```python
titanic_df['Fare'].median() # median of Fare
```

Show sum of values

```python
titanic_df['Fare'].sum() # sum of Fare
```

Show frequency of each category

```python
titanic_df['Pclass'].value_counts() # frequency of passengers in each class
```
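`value_counts()` can also report proportions instead of raw counts through its `normalize` parameter. A minimal sketch with made-up class labels:

```python
import pandas as pd

# Hypothetical passenger classes
pclass = pd.Series([3, 1, 3, 2, 3, 1])

counts = pclass.value_counts()                # absolute frequencies, most common first
shares = pclass.value_counts(normalize=True)  # relative frequencies that sum to 1
print(counts)
print(shares)
```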

Calculate mean

```python
titanic_df['Age'].mean() # mean of Age (missing values are skipped by default)
```
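If a column such as Age contains missing values, `mean()` ignores them unless told otherwise. A small sketch with made-up ages shows the `skipna` parameter:

```python
import numpy as np
import pandas as pd

ages = pd.Series([22.0, 38.0, np.nan, 26.0])  # hypothetical ages with one missing value

mean_skipping_nan = ages.mean()          # NaN is ignored: (22 + 38 + 26) / 3
mean_strict = ages.mean(skipna=False)    # any NaN makes the result NaN
print(mean_skipping_nan, mean_strict)
```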

Calculate standard deviation

```python
titanic_df['Age'].std() # std for Age only
titanic_df.std(numeric_only=True) # std for all numeric columns (text columns raise an error in pandas 2.0+)
```
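One detail worth knowing: pandas computes the *sample* standard deviation (`ddof=1`, dividing by n - 1) by default, while NumPy's `np.std` defaults to the population version (`ddof=0`). A sketch with made-up values:

```python
import numpy as np
import pandas as pd

values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data, mean = 5

sample_std = values.std()              # pandas default ddof=1 (divides by n - 1)
population_std = values.std(ddof=0)    # divides by n instead
numpy_std = np.std(values.to_numpy())  # NumPy's default is ddof=0
print(sample_std, population_std, numpy_std)
```

The squared deviations sum to 32 here, so the population value is sqrt(32 / 8) = 2 and the sample value is sqrt(32 / 7).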

Show Variance

```python
titanic_df['Age'].var() # variance for Age column only
titanic_df.var(numeric_only=True) # variance for all numeric columns
```
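The variance is the square of the standard deviation. The sketch below, using made-up values, computes the sample variance by hand and checks it against `var()`:

```python
import pandas as pd

values = pd.Series([1.0, 3.0, 5.0, 7.0])  # hypothetical data, mean = 4

# Sample variance by hand: sum of squared deviations divided by n - 1
deviations = values - values.mean()
manual_var = (deviations ** 2).sum() / (len(values) - 1)

pandas_var = values.var()        # also uses ddof=1 by default
std_squared = values.std() ** 2  # variance equals the squared standard deviation
print(manual_var, pandas_var, std_squared)
```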

Show Covariance

```python
titanic_df[['Age','Fare']].cov() # covariance of Age and Fare
titanic_df.cov(numeric_only=True) # covariance matrix for all numeric columns
```
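To make the covariance figure less of a black box, here is a sketch on made-up paired values that computes the sample covariance by hand and compares it with the matrix returned by `cov()`:

```python
import pandas as pd

# Hypothetical paired observations
df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [2, 4, 5, 9]})

# Sample covariance by hand: sum of products of deviations, divided by n - 1
manual_cov = ((df['x'] - df['x'].mean()) * (df['y'] - df['y'].mean())).sum() / (len(df) - 1)

cov_matrix = df.cov()            # symmetric matrix; the diagonal holds each column's variance
pair_cov = df['x'].cov(df['y'])  # covariance of just this pair
print(cov_matrix)
```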

Correlation

Correlation measures the relationship between two variables. Pandas supports three methods through the method parameter of corr().

1. Pearson Correlation

Measures the linear relationship between two variables. The Pearson correlation coefficient is the default method in Pandas DataFrame.corr().

NOTE: Pearson correlation works best when the data is approximately normally distributed, and it is sensitive to outliers.

```python
titanic_df.corr(method='pearson', numeric_only=True) # numeric_only=True skips text columns
titanic_df.corr(numeric_only=True) # or omit method, since Pearson is the default
```
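The outlier sensitivity mentioned above is easy to demonstrate. In this sketch with made-up data, a single extreme value pulls the Pearson coefficient down sharply, while the rank-based Spearman coefficient (covered next) is unaffected because the ordering stays the same:

```python
import pandas as pd

x = list(range(1, 11))
y_clean = list(range(1, 11))             # perfectly linear in x
y_outlier = list(range(1, 10)) + [1000]  # same data with one extreme last value

df = pd.DataFrame({'x': x, 'y_clean': y_clean, 'y_outlier': y_outlier})

pearson = df.corr(method='pearson')
spearman = df.corr(method='spearman')

print(pearson.loc['x', 'y_clean'])     # ≈ 1.0: perfect linear relationship
print(pearson.loc['x', 'y_outlier'])   # ≈ 0.53: dragged down by one outlier
print(spearman.loc['x', 'y_outlier'])  # ≈ 1.0: the ordering is still perfectly monotonic
```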

2. Spearman Rank Correlation

Measures the monotonic relationship between two variables. It does not assume a normal distribution of the data. Because the data is ranked (sorted) first, it runs in O(n log n) time.

```python
titanic_df.corr(method='spearman', numeric_only=True)
```
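Spearman correlation is the Pearson correlation of the ranks. The sketch below uses a made-up monotonic but non-linear relationship (y = x cubed) to show that ranking first and then applying Pearson gives the same number as method='spearman':

```python
import pandas as pd

# Hypothetical data with a monotonic but non-linear relationship (y = x ** 3)
df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], 'y': [1, 8, 27, 64, 125, 216]})

spearman = df.corr(method='spearman')
pearson_on_ranks = df.rank().corr(method='pearson')  # rank each column, then Pearson
pearson_raw = df.corr(method='pearson')              # below 1: the points are not on a straight line

print(spearman.loc['x', 'y'], pearson_on_ranks.loc['x', 'y'], pearson_raw.loc['x', 'y'])
```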

3. Kendall Rank Correlation

It measures the monotonic relationship between two variables and does not assume a normal distribution of the data. A straightforward implementation compares every pair of observations, which takes O(n^2) time, so it tends to be slower on large datasets.

```python
titanic_df.corr(method='kendall', numeric_only=True)
```
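The pairwise comparison behind Kendall's coefficient, and why it grows as O(n^2), can be sketched directly. This hypothetical helper computes the tau-a variant and assumes no ties; pandas' method='kendall' reports tau-b, which equals tau-a when there are no ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Hypothetical helper: naive O(n^2) Kendall tau-a, assuming no ties.

    Every pair (i, j) is compared once; a pair is concordant when x and y
    move in the same direction and discordant when they move in opposite ones.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(kendall_tau_a(x, y))  # 8 concordant, 2 discordant pairs -> 0.6
```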

Calculate Kurtosis

```python
titanic_df.kurtosis(numeric_only=True) # excess kurtosis of each numeric column
```
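pandas reports *excess* kurtosis (Fisher's definition), so a normal distribution scores about 0 and heavier-tailed data scores higher. A sketch on simulated samples (the distributions and seed are arbitrary choices for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
normal = pd.Series(rng.normal(size=100_000))    # excess kurtosis of a normal distribution is 0
laplace = pd.Series(rng.laplace(size=100_000))  # heavier tails: excess kurtosis is 3

print(normal.kurtosis())   # close to 0
print(laplace.kurtosis())  # close to 3
```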

Calculate Skew

```python
titanic_df.skew(numeric_only=True) # skewness of each numeric column
```
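A positive skew means a long right tail, which typically pulls the mean above the median. A sketch with made-up fares containing one large value:

```python
import pandas as pd

fares = pd.Series([1, 1, 1, 2, 2, 3, 10])  # hypothetical fares with one large value

print(fares.skew())                  # positive: right-skewed
print(fares.mean(), fares.median())  # the long right tail pulls the mean above the median
```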

Compute Percent change

Calculates the percentage change over a given number of periods. Handle missing values (nulls) before computing the percentage change.

```python
titanic_df['Fare'].pct_change(periods=3) # change relative to the value 3 rows earlier
```
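Here is a sketch of what `pct_change(periods=k)` computes, using made-up values with one gap: the missing value is forward-filled explicitly first (one possible choice; dropping or imputing are others), and the result is checked against the formula (x_t - x_{t-k}) / x_{t-k} written with `shift`:

```python
import numpy as np
import pandas as pd

prices = pd.Series([10.0, np.nan, 30.0, 40.0, 50.0])  # hypothetical values with one gap

filled = prices.ffill()                # handle the missing value explicitly first
change = filled.pct_change(periods=2)  # (x_t - x_{t-2}) / x_{t-2}

# The same calculation written out with shift
manual = filled / filled.shift(2) - 1
print(change)
```

The first two entries are NaN because there is no value two periods earlier.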

Rank

Ranks the data. Tied values share a rank: by default each tied value receives the average of the ranks they would occupy.

```python
titanic_df['Fare'].rank() # rank of each fare; tied fares get the average of their ranks
```
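How ties are handled is controlled by the `method` parameter of `rank()`. A sketch with made-up fares containing one tie compares four of the options:

```python
import pandas as pd

fares = pd.Series([10, 20, 20, 30])  # hypothetical fares with a tie at 20

ranks = pd.DataFrame({
    'average': fares.rank(),              # default: ties share the mean of their ranks (2 and 3 -> 2.5)
    'min': fares.rank(method='min'),      # ties take the lowest rank in the group
    'dense': fares.rank(method='dense'),  # like 'min', but the next rank follows without a gap
    'first': fares.rank(method='first'),  # ties broken by order of appearance
})
print(ranks)
```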

For the complete code, check the Jupyter notebook here.

## Conclusion

In this post we have looked at various commonly used Pandas Statistical Functions. Statistics is important for analysing and interpreting analytical results and insights, and Pandas provides a wide range of statistical functions to work with data. In the next post we will look at Window Functions in Pandas and how to apply them. To learn about Pandas Mathematical Functions, check our previous post here.
