Thanks for your comment. I’ve been really busy, so I haven’t rerun the experiment with billion of rows.
The main point of this article is to introduce Vaex and not an end-to-end test as only a few people would read it. I give pointers to documentation for those who would like to dive deeper.
I give special emphasis to data load as some datasets are so big that you cannot even load it with pandas, but Vaex or Dask should be able to process them on your laptop. Which means no need for a Hadoop or a Spark cluster.