It’s with no doubt that SQL and Python are the powerful and mostly used languages for working with data. These tools are nowadays used in highbred fashion. Window functions are data manipulation constructs that enables us to operate on a set of rows and return a single aggregated value for each row. Window functions are mostly used for advanced analytical use-cases in SQL. We have different types of Window functions in SQL context such as aggregate, rank and value based window functions. Pandas library has come a long way and is mature enough to offer window functions similar to what we have in SQL. This allows us to perform complex analytical tasks in Pandas with ease. In this post we will look at Window Functions in Pandas and how to use them.

## Window Functions in Pandas

Create DataFrame

```                    ```
windows_data_df = pd.DataFrame(
{
"Students": ["Tom", "Peter", "Mary", "Smith"],
"Reg_No": ["1790","1731","1780", "1755"],
"Reg_Date": ["15/01/2021", "13/01/2021", "14/01/2021", "27/01/2021"],
"Math": [79.00, 67.00, 84.00, 70.00],
"Physics": [60, 70, 50, 90],
"Computer": [65.80, 80, 70, 75],
}
)

windows_data_df
```
```

Lag in Pandas

The Lag Window function gets the previous value based on a defined key such as timestamp. Note: It’s best practice to sort values first based on a key column such as date.

```                    ```
windows_data_df=windows_data_df.sort_values(by='Reg_Date', ascending=False)
windows_data_df['Previous_Computer'] = windows_data_df['Computer'].shift(1)
windows_data_df
```
```

The Lead Window function gets the next value based on a defined key such as timestamp. Note: It’s best practice to sort values first based on a key column such as date.

```                    ```
windows_data_df=windows_data_df.sort_values(by='Reg_Date', ascending=False)
windows_data_df['Next_Computer'] = windows_data_df['Computer'].shift(-1)
windows_data_df
```
```

Rolling window

Generic fixed or variable sliding window over the values.

```                    ```
'''Look at how we are calculating the Rolling_Window_Physics column. We're going back and adding previous
number to the current. In this case our window=2'''
windows_data_df['Rolling_Window_Physics']=windows_data_df['Physics'].rolling(window=2).sum()
windows_data_df
```
```

Specifying Weights in the Windows

Passing win_type to .rolling generates a generic rolling window computation, that is weighted according the win_type. Refer to documentation for more details.

```                    ```
windows_data_df['Rolling_Window_Physics']=windows_data_df['Physics'].rolling(window=2, win_type='triang').sum()
windows_data_df
```
```

Expanding window

Accumulating window over the values. An expanding window yields the value of an aggregation statistic with all the data available up to that point in time.

```                    ```
'''Expanding window with min_periods=1 with sum function resembes cumulative sum.
Trying using other aggregation functions such as mean etc.'''
windows_data_df['Expanding_Window_Physics']=windows_data_df['Physics'].expanding(min_periods=1).mean()
windows_data_df
```
```

Exponentially Weighted window

Exponentially Weighted window: Accumulating and exponentially weighted window over the values. An exponentially weighted window is similar to an expanding window but with each prior point being exponentially weighted down relative to the current point. Available EW functions are mean(), var(), std(), corr(), cov()

Let’s first get the data ready

```                    ```
# Let's load our sales data

```
```

Let’s created a sales column by multiplying Quantity by UnitPrice

```                    ```
clv_df['sales']=clv_df['Quantity']*clv_df['UnitPrice']
# Let's order our data in ascending order based on InvoiceDate
clv_df=clv_df.sort_values(by='InvoiceDate',ascending=True)
```
```

exponentially weighted moving average

Exponentially weighted moving average gives more weight to recent observations, which makes it powerful to capture recent trends more quickly.

```                    ```
### Exponential Moving Average on sales
clv_df['5_day_Sales_EWM'] = clv_df['sales'].ewm(span=5).mean()
```
```

For complete code check the jupyter notebook here.

## Conclusion

Mastering SQL and Python as a Data practitioner is a blessing as you can solve almost all the data problems with these two languages. Window functions are powerful techniques for slicing data to present complex analytical insights in simple ways. Pandas offers SQL-lite capabilities of Window functions at our disposal for data manipulation and analysis. In this post we have looked at various common yet powerful Window functions. In the next post we will switch to Data Visualization and its significance in data analytics. To learn about Pandas Statistical Functions check our previous post here.

Window Functions in Pandas