“Math is the language of the universe,” so they say. When it comes to data, mathematics is the language through which facts are expressed and understood. Mathematics is at the heart of data science, analytics and machine learning. Pandas provides us with powerful mathematical tools for manipulating and analysing data. In this post we will look at Pandas mathematical functions and how to use them. In the next post we will dive into Pandas statistical functions and see how to use them to interpret our data.
Pandas Mathematical Functions
Create the DataFrame
import pandas as pd
import numpy as np
students_score_df = pd.DataFrame(
{
"Students": ["Tom", "Peter", "Mary", "Smith"],
"Reg_No": [1790, 1731, 1780, 1755],
"Reg_Date": ["15/01/2021", "16/01/2021", "19/01/2021", "27/01/2021"],
"Math": ["79.00", "67.00", "84.00", "70.00"],
"Physics": ["60", "70", "50", "90"],
"Computer": ["65.80", "80", "70", "75"],
}
)
students_score_df
Check Data Types
# Check whether each column has an appropriate data type; if not, convert it
students_score_df.dtypes
Convert Data Types to Integer/Float
# Convert data types to proper representation
students_score_df[['Math','Physics','Computer']]=students_score_df[['Math','Physics','Computer']].astype(float) # Math, Physics and Computer need to be float
students_score_df['Reg_No']=students_score_df['Reg_No'].astype(str) # Reg_No needs to be a string (object dtype)
students_score_df['Reg_Date']=pd.to_datetime(students_score_df['Reg_Date'], format='%d/%m/%Y') # Reg_Date needs to be a valid pandas datetime; the strings are day/month/year
students_score_df.dtypes
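The astype(float) call above raises an error if any value cannot be parsed as a number. As a minimal sketch of one alternative, pd.to_numeric with errors="coerce" turns unparseable entries into NaN instead. The raw_scores series below is hypothetical and is not part of the dataset used in this post.

```python
import pandas as pd

# Hypothetical score column with one malformed entry (illustration only)
raw_scores = pd.Series(["79.00", "67.00", "absent", "70.00"])

# errors="coerce" replaces values that cannot be parsed with NaN instead of raising
clean_scores = pd.to_numeric(raw_scores, errors="coerce")

print(clean_scores)
print(clean_scores.dtype)
```

Whether silently coercing bad values is acceptable depends on the data; counting the resulting NaN values with clean_scores.isna().sum() is a quick way to see how many entries were affected.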
Scalar Addition
Add a scalar value to every numeric element in the dataframe
students_score_df[['Math','Physics','Computer']]=students_score_df[['Math','Physics','Computer']].add(5)
students_score_df.head()
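As a small illustration of scalar addition, the sketch below shows that the add() method and the + operator give the same result, and that neither changes the original dataframe unless we assign the result back. The scores dataframe here is a hypothetical, cut-down example.

```python
import pandas as pd

# Hypothetical two-row dataframe used only for this illustration
scores = pd.DataFrame({"Math": [79.0, 67.0], "Physics": [60.0, 70.0]})

bonus_via_method = scores.add(5)   # method form
bonus_via_operator = scores + 5    # operator form

# Both forms produce identical dataframes
print(bonus_via_method.equals(bonus_via_operator))

# The original is unchanged because the result was not assigned back to it
print(scores)
```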
Element-wise addition
First create two dataframes to add element-wise
score_1_df=students_score_df[['Math','Physics','Computer']]-90
score_2_df=students_score_df[['Math','Physics','Computer']]-85
score_1_df
score_2_df
Add two dataframes element-wise
# add the two dataframes
score_3_df=score_1_df.add(score_2_df)
score_3_df
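Element-wise addition aligns the two dataframes on their row and column labels, so labels present in only one of them produce NaN. The sketch below, using two hypothetical dataframes with partly overlapping indexes, shows how the fill_value argument of add() can substitute a default for the missing side.

```python
import pandas as pd

# Hypothetical dataframes whose row labels only partly overlap
left = pd.DataFrame({"Math": [1.0, 2.0], "Physics": [3.0, 4.0]}, index=[0, 1])
right = pd.DataFrame({"Math": [10.0, 20.0], "Physics": [30.0, 40.0]}, index=[1, 2])

# Plain addition: rows 0 and 2 exist in only one dataframe, so they become NaN
plain_sum = left + right

# fill_value=0 treats the missing side as 0 before adding
filled_sum = left.add(right, fill_value=0)

print(plain_sum)
print(filled_sum)
```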
Subtraction with Scalar value
students_score_df
students_score_df[['Math','Physics','Computer']]=students_score_df[['Math','Physics','Computer']]-50
students_score_df
Element-wise Subtraction
Subtract two dataframes element-wise
# score_2_df-score_1_df # Option 1
score_4_df=score_2_df.sub(score_1_df) # Option 2
score_4_df
Multiplication with Scalar value
students_score_df[['Math','Physics','Computer']]=students_score_df[['Math','Physics','Computer']]*-3
students_score_df
Element-wise Multiplication
Multiply two dataframes element-wise
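A minimal sketch of element-wise multiplication, following the same two options used for subtraction and division in this post. The dataframes a_df and b_df are hypothetical stand-ins so that the example runs on its own.

```python
import pandas as pd

# Hypothetical dataframes with matching row and column labels
a_df = pd.DataFrame({"Math": [2.0, 3.0], "Physics": [4.0, 5.0]})
b_df = pd.DataFrame({"Math": [10.0, 10.0], "Physics": [0.5, 2.0]})

# a_df * b_df               # Option 1
product_df = a_df.mul(b_df)  # Option 2

print(product_df)
```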
Division with Scalar value
students_score_df[['Math','Physics','Computer']]=students_score_df[['Math','Physics','Computer']]/3
students_score_df
Element-wise Division
Divide two dataframes element-wise
# score_1_df / score_2_df # Option 1
score_1_df.div(score_2_df) # Option 2
Pandas power function
Using ** to raise each element to a specified power
score_1_df**2 # Raise each element to power 2
Using power pow() function
The pow() function computes the element-wise exponential power of a dataframe and another object (binary operator pow). It is equivalent to the ** operator, but it also accepts a fill_value argument that substitutes a value for missing entries in either input.
score_1_df=score_1_df.pow(2)
score_1_df
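To illustrate the difference from the ** operator, here is a sketch of pow() with fill_value on two hypothetical dataframes that contain missing values. With fill_value=1, a missing base or exponent is treated as 1 before the power is computed.

```python
import numpy as np
import pandas as pd

# Hypothetical base and exponent dataframes, each with one missing value
base_df = pd.DataFrame({"Math": [2.0, np.nan], "Physics": [3.0, 4.0]})
exponent_df = pd.DataFrame({"Math": [3.0, 2.0], "Physics": [np.nan, 2.0]})

# The ** operator propagates NaN wherever either input is missing
plain = base_df ** exponent_df

# pow() with fill_value=1 replaces a missing value on one side with 1 first
filled = base_df.pow(exponent_df, fill_value=1)

print(plain)
print(filled)
```

If a value is missing in both dataframes at the same position, the result at that position is still missing.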
Element-wise power along specified axis
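We can specify the axis when raising a dataframe to the power of another dataframe or a series. For a series, the axis determines whether its index is matched against the dataframe's rows or its columns.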
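The sketch below shows both directions with a hypothetical dataframe: one series supplies an exponent per column, the other an exponent per row. All names are illustrative.

```python
import pandas as pd

# Hypothetical dataframe for the illustration
df = pd.DataFrame({"Math": [2.0, 3.0], "Physics": [4.0, 5.0]})

# One exponent per column: the series index is matched against the column labels
col_exponents = pd.Series({"Math": 2, "Physics": 3})
by_column = df.pow(col_exponents, axis="columns")

# One exponent per row: the series index is matched against the row labels
row_exponents = pd.Series([1, 2], index=df.index)
by_row = df.pow(row_exponents, axis="index")

print(by_column)  # Math squared, Physics cubed
print(by_row)     # row 0 unchanged, row 1 squared
```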
We can use NumPy's log functions to perform logarithmic operations on a dataframe.
Logarithm on base 2
score_1_df['Log2_Computer']=np.log2(score_1_df['Computer'])
score_1_df
Logarithm on base 10
score_1_df['Log10_Computer']=np.log10(score_1_df['Computer'])
score_1_df
Natural logarithm
score_1_df['Natural_Log_Computer']=np.log(score_1_df['Computer'])
score_1_df
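For a base other than 2, 10 or e, one option is the change-of-base identity, log_b(x) = ln(x) / ln(b). The sketch below applies it with base 5 to a hypothetical series of positive values.

```python
import numpy as np
import pandas as pd

# Hypothetical positive values (powers of 5) for the illustration
values = pd.Series([1.0, 5.0, 25.0, 125.0])

# Change of base: log_5(x) = ln(x) / ln(5)
log_base_5 = np.log(values) / np.log(5)

print(log_base_5)  # approximately 0, 1, 2, 3
```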
Pandas aggregate agg function
score_1_df
Aggregate by mean function
score_1_df.agg('mean')
Aggregate values along a specified axis
score_1_df.agg('mean',axis=1) # axis=1 aggregates across the columns, giving one value per row
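The axis argument is easy to mix up, so here is a small sketch on a hypothetical two-student dataframe. The default (axis=0) returns one mean per column, while axis=1 returns one mean per row.

```python
import pandas as pd

# Hypothetical scores indexed by student name
df = pd.DataFrame(
    {"Math": [80.0, 60.0], "Physics": [70.0, 90.0]},
    index=["Tom", "Mary"],
)

per_column = df.agg("mean")        # one value per column (subject average)
per_row = df.agg("mean", axis=1)   # one value per row (student average)

print(per_column)  # Math 70.0, Physics 80.0
print(per_row)     # Tom 75.0, Mary 75.0
```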
Applying multiple aggregations at once
score_1_df.agg(['min','max','sum','mean','std','var'])
Using different agg() functions on each column
score_1_df.agg({'Math':['sum','min','max'],'Log2_Computer':['mean'],
'Log10_Computer':['std'],'Natural_Log_Computer':['var']})
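When different functions are applied to different columns, the result has one row per function name, and cells for combinations that were not requested are NaN. A minimal sketch with a hypothetical dataframe:

```python
import pandas as pd

# Hypothetical scores for the illustration
df = pd.DataFrame({"Math": [80.0, 60.0], "Physics": [70.0, 90.0]})

# sum and min are computed only for Math, mean only for Physics
summary = df.agg({"Math": ["sum", "min"], "Physics": ["mean"]})

print(summary)
print(summary.loc["sum", "Math"])      # 140.0
print(summary.loc["mean", "Physics"])  # 80.0
```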
Subtract one year from the date
students_score_df['Reg_Date_Less_1_Yr']=students_score_df['Reg_Date']-pd.DateOffset(years=1)
students_score_df
Subtract one month from the date
students_score_df['Reg_Date_Less_1_Mn']=students_score_df['Reg_Date']-pd.DateOffset(months=1)
students_score_df
Subtract one day from the date
students_score_df['Reg_Date_Less_1_Day']=students_score_df['Reg_Date']-pd.DateOffset(days=1)
students_score_df
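pd.DateOffset works in calendar units, so subtracting a month or a year from a date near the end of a month lands on the last valid day of the target month. The sketch below uses hypothetical dates (not the registration dates above) and contrasts this with subtracting a fixed 30 days using pd.Timedelta.

```python
import pandas as pd

# Hypothetical dates chosen to show the month-end and leap-day behaviour
dates = pd.Series(pd.to_datetime(["2021-03-31", "2020-02-29"]))

minus_month = dates - pd.DateOffset(months=1)   # calendar month
minus_year = dates - pd.DateOffset(years=1)     # calendar year
minus_30_days = dates - pd.Timedelta(days=30)   # fixed duration

print(minus_month)    # 2021-02-28, 2020-01-29
print(minus_year)     # 2020-03-31, 2019-02-28
print(minus_30_days)  # 2021-03-01, 2020-01-30
```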
For the complete code, check the Jupyter notebook here.
Conclusion
In this post we have looked at common mathematical functions and operations in Pandas. Every analysis and insight has a mathematical underpinning, so having the maths skills to manipulate and analyse data is key. In the next post we will learn about Pandas statistical functions and how they are applied to data. To learn how to work with dates and times in Pandas, check our previous post here.