Ready for Functional Pandas? – Exploring Python the great and powerful (part 1)
Introduction
Recently, I have had many chances to work on Python projects, which used to be few and far between at Flinters Vietnam. One thing the Python projects I have been involved in have in common is that all of them include data transformation somewhere, and all of them use Pandas DataFrames to do it. Today I will briefly discuss how we use Pandas DataFrames, and suggest a few ways to make our transformation code look more functional.
Pandas dataframe
Pandas is among the most widely used libraries in Python. Numerous online tutorials help beginners get started quickly with this powerful library, making it a natural choice for data transformation.
That said, a lengthy series of bracket indexers, sometimes chained like this:
df[(df["Column 1"] > 0) & df["Col 4"].notna() & df["Col 6"].isna()]["Column 2"]
may easily become overwhelming. (Note the parentheses around the comparison: & binds more tightly than >, so leaving them out changes the meaning of the expression.) Therefore, it is sometimes worth using a more functional API for Pandas transformations. Luckily, Pandas has us covered with the likes of df.assign and df.pipe. There are a number of other APIs, but those will be discussed in a later post. Let’s find out what we can do with these two Pandas APIs!
df.assign
Refer to the official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.assign.html
For instance, if we start with a pseudo-transformation like the following:
df["avg_speed"] = df["distance"] / df["time"]
df["avg_speed_mph"] = df["distance"] / df["time"] / 1.609
df["predicted_speed_next_hour"] = df["avg_speed"] * 1.3
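We can use df.assign to assign multiple columns within the same expression. Because avg_speed does not exist on df until assign has run, the last column refers to it through a callable, which assign evaluates against the DataFrame with the preceding columns already added:
df = df.assign(
    avg_speed=df["distance"] / df["time"],
    avg_speed_mph=df["distance"] / df["time"] / 1.609,
    # df["avg_speed"] would raise a KeyError here, so use a callable instead
    predicted_speed_next_hour=lambda d: d["avg_speed"] * 1.3,
)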
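To try this end to end, here is a minimal, self-contained sketch of the same transformation. The sample data, the variable names trips and result, and the constant KM_PER_MILE are made up for illustration; the sketch assumes distance is in kilometres and time is in hours. Every column is written as a callable, so later columns can build on earlier ones and the original DataFrame is left untouched.

```python
import pandas as pd

# Hypothetical sample data: distance in km, time in hours.
trips = pd.DataFrame({"distance": [100.0, 45.0, 240.0], "time": [2.0, 1.5, 3.0]})

KM_PER_MILE = 1.609  # same conversion factor as in the snippet above

# Each callable receives the DataFrame as built so far, so columns
# defined earlier in the same assign call are already available.
result = trips.assign(
    avg_speed=lambda d: d["distance"] / d["time"],
    avg_speed_mph=lambda d: d["avg_speed"] / KM_PER_MILE,
    predicted_speed_next_hour=lambda d: d["avg_speed"] * 1.3,
)

print(result)
```

Since assign returns a new DataFrame, trips still has only its two original columns afterwards, which makes the step easy to chain with others.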
df.pipe
Refer to the official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pipe.html
This method is useful when you need to apply a chainable function (one that takes a DataFrame and returns another DataFrame) to a DataFrame. Instead of writing:
df = transform(df)
df = do_another_transform(df)
It is possible to write:
df = df.pipe(transform).pipe(do_another_transform)
This comes in handy when you want the transformation steps to read clearly, in the order they are applied.
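As a rough illustration of such a pipeline, the sketch below chains two small steps. The functions drop_invalid_trips and add_avg_speed, the unit_factor parameter, and the sample data are all hypothetical and exist only for this example. It also shows that df.pipe forwards extra keyword arguments to the function it calls.

```python
import pandas as pd


def drop_invalid_trips(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows with a positive travel time (hypothetical cleaning step)."""
    return df[df["time"] > 0]


def add_avg_speed(df: pd.DataFrame, unit_factor: float = 1.0) -> pd.DataFrame:
    """Add an avg_speed column, optionally scaled by a unit conversion factor."""
    return df.assign(avg_speed=lambda d: d["distance"] / d["time"] * unit_factor)


# Hypothetical sample data; the last row has a zero travel time.
trips = pd.DataFrame({"distance": [100.0, 45.0, 10.0], "time": [2.0, 1.5, 0.0]})

result = (
    trips
    .pipe(drop_invalid_trips)
    # Extra arguments after the function are passed through to it.
    .pipe(add_avg_speed, unit_factor=1 / 1.609)
)

print(result)
```

Wrapping the chain in parentheses lets each step sit on its own line, so the pipeline reads from top to bottom.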
Conclusion
In this post, I have shown two lesser-used methods from the Pandas library. Hopefully this helps you in your Python data projects!