It’s with no doubt that SQL and Python are the powerful and mostly used languages for working with data. These tools are nowadays used in highbred fashion. Window functions are data manipulation constructs that enables us to operate on a set of rows and return a single aggregated value for each row. Window functions are mostly used for advanced analytical use-cases in SQL. We have different types of Window functions in SQL context such as aggregate, rank and value based window functions. Pandas library has come a long way and is mature enough to offer window functions similar to what we have in SQL. This allows us to perform complex analytical tasks in Pandas with ease. In this post we will look at Window Functions in Pandas and how to use them.

pandas-logo

Window Functions in Pandas

Create DataFrame

                    

windows_data_df = pd.DataFrame(
    {
        "Students": ["Tom", "Peter", "Mary", "Smith"],
        "Reg_No": ["1790","1731","1780", "1755"],
        "Reg_Date": ["15/01/2021", "13/01/2021", "14/01/2021", "27/01/2021"],
        "Math": [79.00, 67.00, 84.00, 70.00],
        "Physics": [60, 70, 50, 90],
        "Computer": [65.80, 80, 70, 75],
    }
)

windows_data_df

pandas-window-df

Lag in Pandas

The Lag Window function gets the previous value based on a defined key such as timestamp. Note: It’s best practice to sort values first based on a key column such as date.

                    

windows_data_df=windows_data_df.sort_values(by='Reg_Date', ascending=False)
windows_data_df['Previous_Computer'] = windows_data_df['Computer'].shift(1)
windows_data_df

pandas-window-lag

Lead in Pandas

The Lead Window function gets the next value based on a defined key such as timestamp. Note: It’s best practice to sort values first based on a key column such as date.

                    

windows_data_df=windows_data_df.sort_values(by='Reg_Date', ascending=False)
windows_data_df['Next_Computer'] = windows_data_df['Computer'].shift(-1)
windows_data_df

pandas-window-lead

Rolling window

Generic fixed or variable sliding window over the values.

                    

'''Look at how we are calculating the Rolling_Window_Physics column. We're going back and adding previous
    number to the current. In this case our window=2'''
windows_data_df['Rolling_Window_Physics']=windows_data_df['Physics'].rolling(window=2).sum()
windows_data_df

pandas-window-rolling-window

Specifying Weights in the Windows

Passing win_type to .rolling generates a generic rolling window computation, that is weighted according the win_type. Refer to documentation for more details.

                    

windows_data_df['Rolling_Window_Physics']=windows_data_df['Physics'].rolling(window=2, win_type='triang').sum()
windows_data_df

pandas-window-specifying-weight-window

Expanding window

Accumulating window over the values. An expanding window yields the value of an aggregation statistic with all the data available up to that point in time.

                    

'''Expanding window with min_periods=1 with sum function resembes cumulative sum. 
    Trying using other aggregation functions such as mean etc.'''
windows_data_df['Expanding_Window_Physics']=windows_data_df['Physics'].expanding(min_periods=1).mean()
windows_data_df

pandas-window-expanding-window

Exponentially Weighted window

Exponentially Weighted window: Accumulating and exponentially weighted window over the values. An exponentially weighted window is similar to an expanding window but with each prior point being exponentially weighted down relative to the current point. Available EW functions are mean(), var(), std(), corr(), cov()

Let’s first get the data ready

                    

# Let's load our sales data
clv_df=pd.read_csv('clv.csv')

clv_df.head()

pandas-window-ewm-df

Let’s created a sales column by multiplying Quantity by UnitPrice

                    

clv_df['sales']=clv_df['Quantity']*clv_df['UnitPrice']
# Let's order our data in ascending order based on InvoiceDate
clv_df=clv_df.sort_values(by='InvoiceDate',ascending=True)
clv_df.head()

pandas-window-ewm-df-with-sales-column-1

exponentially weighted moving average

Exponentially weighted moving average gives more weight to recent observations, which makes it powerful to capture recent trends more quickly.

                    

### Exponential Moving Average on sales
clv_df['5_day_Sales_EWM'] = clv_df['sales'].ewm(span=5).mean()
clv_df.head(10)

pandas-window-ewm-average

For complete code check the jupyter notebook here.

Conclusion

Mastering SQL and Python as a Data practitioner is a blessing as you can solve almost all the data problems with these two languages. Window functions are powerful techniques for slicing data to present complex analytical insights in simple ways. Pandas offers SQL-lite capabilities of Window functions at our disposal for data manipulation and analysis. In this post we have looked at various common yet powerful Window functions. In the next post we will switch to Data Visualization and its significance in data analytics. To learn about Pandas Statistical Functions check our previous post here.

Window Functions in Pandas

Post navigation


0 0 votes
Article Rating
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x