Ashraf Miah
Dec 23, 2022

--

What kind of comparison or benchmark doesn't state what hardware was used? You state that `pandas`' single-core usage is its limiting factor, but most of the issues you've encountered are memory-based.

You don't make clear where `pandas` was being run: locally on an 8 or 16 GB laptop, on a remote server, or in a free environment like Google Colab?

The comparison isn't useful for practical purposes for two reasons:

1) What was the loading time to get the data into `pandas` vs. Gigasheet?

2) Gigasheet may have run on much more capable hardware than a local laptop, in which case wouldn't you expect this result?
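To illustrate both points, here is a minimal sketch of the context I would expect alongside any timing figure: the machine it ran on, and the load step timed separately from the analysis. The helper names `describe_hardware` and `timed` are my own for illustration and do not come from the article; it uses only the Python standard library.

```python
import os
import platform
import time


def describe_hardware():
    """Collect the basic machine facts a benchmark should report."""
    try:
        # Physical RAM via sysconf; works on most Unix-like systems.
        ram_gb = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1024**3
    except (AttributeError, ValueError, OSError):
        # Not available on this platform (e.g. Windows), so report it as unknown.
        ram_gb = None
    return {
        "system": platform.system(),
        "machine": platform.machine(),
        "logical_cpus": os.cpu_count(),
        "ram_gb": ram_gb,
    }


def timed(step):
    """Run step() once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = step()
    return result, time.perf_counter() - start
```

Assuming `pandas` is imported as `pd` and the data sits in a hypothetical `data.csv`, the load could be timed with `df, load_s = timed(lambda: pd.read_csv("data.csv"))` and reported next to the output of `describe_hardware()`. A single run is only a rough indication; repeated runs would give a fairer figure.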


Ashraf Miah

CTO, Data Scientist & Chartered Engineer (MEng CEng EUR ING MRAeS) with over 20 years experience in the Aerospace, Rail & Energy Industry.