Hi Michael,

Thanks for your comment. I’ve been really busy, so I haven’t rerun the experiment with billion of rows.

The main point of this article is to introduce Vaex and not an end-to-end test as only a few people would read it. I give pointers to documentation for those who would like to dive deeper.

I give special emphasis to data load as some datasets are so big that you cannot even load it with pandas, but Vaex or Dask should be able to process them on your laptop. Which means no need for a Hadoop or a Spark cluster.

Written by

Senior Data Scientist, tweeting twitter.com/romanorac.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store