One of the most powerful capability of pandas is to slice based on different criteria. Pandas provides us with different approaches to slice data in both series and dataframe data structures to return the required subset. These approaches include selection by label, position or both. After selecting specific subset of rows and columns we can then sort our data based a specific column/s. This gives us the power to arrange the data in the form we require. Another powerful functionality in pandas is the one to filter data based on a defined criteria. Pandas provides us with SQL-Like functions to filter our data for analytical purpose. In this post we will learn how to use select, sort and filter data in Pandas. You can download the dataset used in this post here. For complete cone check my jupyter notebook here.

pandas-logo

Selecting Data in Pandas

In Pandas we can select data to return only a subset of specific rows and columns to work with. We can select data by label or position. Below are different approaches to select data from pandas  dataframe.

Select specific columns and all records

                    

# Select one column only
titanic_df['Survived'] 
# Use . syntax if the column name don't have spaces
titanic_df.Survived
# Select more than one column
titanic_df[['Survived','Pclass','Age']]

Select data by position

                    

titanic_df.iloc[:10] # select first 10 records
titanic_df.iloc[-10:] # select last 10 records
titanic_df.iloc[5:8] # select records from 5 to 8 records. Including 5 excluding 8
titanic_df.iloc[0:-10] # select all data excluding last 10 records
titanic_df.iloc[0:,0:2] # select first two columns with all records
titanic_df.iloc[0:,3:5] # select all rows and only columns from 3 to 5.
titanic_df.iloc[:5,0:-1] # select first 5 records and all columns excluding last 1 column

Sorting Data in Pandas

Pandas gives us flexibility to order our data either ascending, descending or a combination of both in one or more columns. Below are different ways we can sort our data in pandas.

Sort data ascending

                    

titanic_df.sort_values(by='Age').head()

Sort descending

                    

titanic_df.sort_values(by='Age', ascending=False).head()

Sort data by two columns

                    

titanic_df.sort_values(by=['Age','Fare'], ascending=[True,False]).head() # Sort age by ascending and then Fare by descending

Filtering Data in Pandas

The filtering capabilities enables us to apply logical conditions to slice the data in different ways. Pandas provides SQL-Like functions and mathematical operators to filter our data.

Filter data where age > 50

                    

titanic_df[titanic_df['Age']>50].head()

Filter data where age is between 30 and 40

                    

titanic_df[(titanic_df['Age']>30) & (titanic_df['Age']<40)].head()

Using IN function

                    

titanic_df[titanic_df['Sex'].isin(['male'])].head()

Using NOT IN

                    

titanic_df[~titanic_df['Sex'].isin(['male'])].head()

Conclusion

Slicing and dicing data is a core skill of a data practitioner. Luckily Pandas provides us with all the powerful tools to perform these task. We can select a small subset of data of our interest by applying slicing, sorting and filtering conditions to the dataframe. In the next post we will learn how to reshape data with techniques and approaches such as pivot, pivot_table, cross-tabulation, stacking, unstacking and melt. To learn more about pandas data types and there conversions check our previous post here.

Pandas Select Sort and Filter Data

Post navigation


0 0 votes
Article Rating
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x