Most property insurers today still rely on a guy with a ladder and a camera on a stick to perform physical inspections

Photo by h heyerlein on Unsplash

Most property insurers today still rely on a guy with a ladder and a camera on a stick to perform physical inspections and assess risk. But smart insurers are enlisting the help of AI researchers who have developed platforms that can evaluate thousands of publicly available images and other data points on the web to deliver a risk assessment within seconds.

“We make sure that the insurer can access that data very, very quickly, especially if it’s being used in a quote engine,” said Ryan Kottenstette, CEO at Cape Analytics, a deep learning company that provides predictive risk analysis…


Pandas doesn’t handle Big Data well. These two libraries do! Which one is better? Faster?

Photo by NASA on Unsplash

I recently wrote two introductory articles about processing Big Data with Dask and Vaex — libraries for processing bigger than memory datasets. While writing, a question popped up in my mind:

Can these libraries really process bigger than memory datasets or is it all just a sales slogan?

This prompted me to run a practical experiment with Dask and Vaex and try to process a bigger-than-memory dataset. The dataset was so big that you cannot even open it with pandas.

What do I mean by Big Data?


I was recently contacted by a recruiter from a Big Tech company. Why now and never before? A few tips on how you can increase your chances.

Photo by Mitchell Luo on Unsplash

I was recently contacted by a recruiter from a Big Tech company. Why now and never before?

In this article, I present my theory of why a recruiter contacted me for a Senior Data Science position. You can use my theory (and develop it further) to increase your chances of getting contacted by a Big Tech company.

Many Software Developers dream about working for a Big Tech company. How do I know? I was one of them.


Start the New Year with one of the best New Year’s resolutions: Learn more pandas.

Photo by Michael Payne on Unsplash

Pandas needs no introduction, as it has become the de facto tool for Data Analysis in Python. As a Data Scientist, I use pandas daily and it never ceases to amaze me with better ways of achieving my goals.

For pandas newbies — Pandas provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

The name pandas is derived from the term “panel data”, an econometrics term for datasets that include observations over multiple time periods for the same individuals.

In this article, I’m going to show you 5 pandas tricks that will make you more productive…


Python is evolving. Don’t get left behind!

Photo by Michael Dziedzic on Unsplash

Start the New Year with one of the best New Year’s resolutions: Learn more Python.

You can start with this article in which I present 5 Python tricks that will make your life easier.

You’ll learn:

  • How to format big integers more clearly
  • What are Magic commands in IPython
  • A simple way to debug code
  • A better way to work with file paths
  • The proper way of string formatting

1. Underscores in Numeric Literals
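
As a minimal sketch of this first trick (the variable names below are illustrative and not taken from the article), Python 3.6 and later let you group digits with underscores, both in literals and in formatted output:

```python
# Underscores in numeric literals are ignored by the interpreter;
# they only make large numbers easier to read.
population = 7_900_000_000
same_population = 7900000000

# Underscores also work in other bases, such as hexadecimal.
hex_mask = 0xFF_FF

# Format specifiers can add separators back when printing.
with_underscores = f"{population:_}"
with_commas = f"{population:,}"

# int() accepts underscore-grouped strings as well.
parsed = int("1_000_000")

print(with_underscores)
print(with_commas)
```

The grouping is purely visual: `population` and `same_population` compare equal.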


Only a fool learns from his own mistakes. The wise man learns from the mistakes of others.

Photo by shiyang xu on Unsplash

Have you ever planned an hour for a short task, only to spend a whole day working on it? If so, welcome to my world!

In this article, I present 3 pandas mistakes that took me much longer to solve than they should have. I also share the link to the Notebook with examples at the end of this article.

Only a fool learns from his own mistakes. The wise man learns from the mistakes of others.

See my pandas articles to learn more about Data Analysis with pandas:

1. How NOT to visualize a Weighted Average


These 8 tips will help you to spot bugs before training a Machine Learning model.

Photo by Michael Dziedzic on Unsplash

One of the most common misconceptions in Machine Learning is that ML Engineers get a CSV dataset and they spend the majority of the time optimizing the hyperparameters of a model.

If you work in the industry, you know that’s far from the truth. ML Engineers spend most of their time planning how to construct a training set that resembles the real-world data distribution for a certain problem.

When you’ve managed to construct such a training set, just add a few well-crafted features and the Machine Learning model won’t have a hard time finding the decision boundary.

In this article, we’re going…


PyTorch and TensorFlow aren’t the only Deep Learning frameworks in Python. There’s another library similar to scikit-learn.

Gif from giphy

scikit-learn is my first choice when it comes to classic Machine Learning algorithms in Python. It has many algorithms, supports sparse datasets, is fast and has many utility functions, like cross-validation, grid search, etc.

When it comes to advanced modeling, scikit-learn often falls short. If you need Boosting, Neural Networks or t-SNE, it’s better to avoid scikit-learn.

scikit-learn has two basic implementations for Neural Nets. There’s MLPClassifier for classification and MLPRegressor for regression.
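
For readers who have not used these estimators, here is a minimal sketch of MLPClassifier on a synthetic dataset. The dataset, the layer sizes and the other parameter values are assumptions chosen for illustration, not settings from the article:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Two hidden layers with 8 and 4 units; hidden_layer_sizes is the main
# knob scikit-learn offers for the network architecture.
clf = MLPClassifier(hidden_layer_sizes=(8, 4), max_iter=500, random_state=0)
clf.fit(X, y)

predictions = clf.predict(X[:5])

# One weight matrix per connection between consecutive layers.
layer_shapes = [weights.shape for weights in clf.coefs_]
print(layer_shapes)
```

MLPRegressor follows the same fit and predict pattern for regression targets.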

While MLPClassifier and MLPRegressor have a rich set of arguments, there’s no option to customize layers of a Neural Network (beyond setting the number of hidden…


A few tips on how NOT to start with Machine Learning. I also present a better way to start learning it.

Photo by Justin Luebke on Unsplash

Every day there’s more and more educational content about Machine Learning. With such a high volume of new content, it’s easy to get confused. Many aspiring Data Scientists don’t know where or how to start learning.

These three questions pop up regularly in my inbox:

  • Should I start learning ML bottom-up by building strong foundations with Math and Statistics?
  • Or top-down by doing practical exercises, like participating in Kaggle challenges?
  • Should I pay for a course from an influencer that I follow?

In this article, I give answers to the questions above and I also present a better way on…


This is my advice to all aspiring Data Scientists

Photo by LinkedIn Sales Navigator on Unsplash

I get many messages asking for advice from aspiring Data Scientists. I am no expert in career advising so take everything that I write with a grain of salt.

I give advice based on my observations of the field and the experience that I’ve developed over the years. This is me, advising younger me as I had similar questions at the start of my career.

What is the best way to learn and practice Data Science?

My advice would be to start with practical projects and then slowly progress with theory. Kaggle notebooks are a great way to learn the practical part.

Ask questions in Reddit communities or in Cross Validated…

Roman Orac

Senior Data Scientist, tweeting twitter.com/romanorac.
