While Python has been the most popular programming language since 2019, data scientists often criticize its slow execution speed and its limited ability to handle big data. In this workshop series, we'll explore how to improve Python's performance for data science by looking closely at how it works under the hood and by leveraging tools that turn Python into an effective platform for high-performance big data analytics.
In the second two-hour session, we will focus on techniques for loading and processing extremely large datasets in Python on a single machine, comparing dataframe libraries and tools such as Pandas, Modin, Pandarallel, Dask, and Vaex. No prerequisites are required to attend, but familiarity with Python's NumPy and Pandas packages will help you get the most out of the material.
Instructor: Qiyang Hu