PyTorch and TensorFlow aren’t the only Deep Learning frameworks in Python. There’s another library similar to scikit-learn.

Photo by Uriel SC on Unsplash

scikit-learn is my first choice when it comes to classic Machine Learning algorithms in Python. It has many algorithms, supports sparse datasets, is fast and has many utility functions, like cross-validation, grid search, etc.

When it comes to advanced modeling, scikit-learn often falls short. If you need Boosting, Neural Networks or t-SNE, it’s better to avoid scikit-learn.

scikit-learn has two basic implementations for Neural Nets. There’s MLPClassifier for classification and MLPRegressor for regression.
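As a rough illustration of these two estimators, here is a minimal sketch that fits each on a small synthetic dataset. The datasets, the layer sizes and the iteration limit are assumptions chosen for the example, not values from the article.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.neural_network import MLPClassifier, MLPRegressor

# Small synthetic datasets (illustrative only, not from the article).
X_clf, y_clf = make_classification(n_samples=200, n_features=10, random_state=0)
X_reg, y_reg = make_regression(n_samples=200, n_features=10, random_state=0)

# hidden_layer_sizes sets the number of units in each hidden layer:
# here two hidden layers with 16 and 8 units (arbitrary choice).
clf = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500, random_state=0)
clf.fit(X_clf, y_clf)

reg = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=500, random_state=0)
reg.fit(X_reg, y_reg)

# Predict on the first five rows of each dataset.
clf_predictions = clf.predict(X_clf[:5])
reg_predictions = reg.predict(X_reg[:5])
```

Both estimators follow the usual scikit-learn `fit`/`predict` interface, so they can be combined with utilities such as cross-validation and grid search.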

While MLPClassifier and MLPRegressor have a rich set of arguments, there’s no option to customize layers of a Neural Network (beyond setting the number of hidden…


A 10-step tutorial on how to start and configure a free server anywhere in the world

Photo by Paul Hanaoka on Unsplash

Having an always-on server is a great way to show your work to future employers or to test your Machine Learning model in the real world.

Before we start, I would like to disclose that I’m NOT affiliated with Amazon in any way. The approach you’ll learn in this article should also be applicable to other cloud providers (e.g., Microsoft Azure, Google Cloud Platform).

I wrote this article because I feel it is important that you have this knowledge. I wish someone had taught me this in my college days, when I had too much time and no money.

By reading this article you’ll learn:


Yet another Python library for Data Analysis that You Should Know About — and no, I am not talking about Spark or Dask

Photo by Christian Englmeier on Unsplash

Big Data Analysis in Python is having its renaissance. It all started with NumPy, which is also one of the building blocks behind the tool I am presenting in this article.

In 2006, Big Data was a topic that was slowly gaining traction, especially with the release of Hadoop. Pandas followed soon after with its DataFrames. 2014 was the year when Big Data became mainstream, and Apache Spark was also released that year. In 2018 came Dask and other libraries for data analytics in Python.

Each month I find a new Data Analytics tool, which I am eager to learn. It…


The field offers competitive salaries, and it’s challenging and fun. The best thing about it is that you can learn it for FREE.

Photo by Roméo A. on Unsplash

There is increasing interest in the field of Machine Learning. This doesn’t surprise me, given all the hype around Deep Learning and Neural Networks. Besides that, the field offers competitive salaries, and it’s challenging and fun.

I’m assuming you’re new to the field and don’t know where to start.

The question I get asked the most often is: How to start with Machine Learning? My answer is always the same, hence I wrote this article to guide you in your journey.

My suggestion is: start with free content as there are many great resources available online. …


No need to write additional code — Just use them

Photo by Richard Clark on Unsplash

I’ve been using pandas for a few years and each time I feel I am typing too much, I google the operation and I usually find a shorter way of doing it — a new pandas trick!

I learned about these functions recently and I deem them essential because of ease of use.

By reading this article, you’ll learn:

  1. How to retrieve a column value from a DataFrame
  2. How to change a column value in a DataFrame
  3. The proper way of adding a new column to a DataFrame
  4. How to retrieve a Series or a DataFrame
  5. How to create a…


Data for Change

Let’s learn by example and train a Neural Network with PyTorch that is able to recognize toxicity in online conversation

Photo by Austin Pacheco on Unsplash

Cyberharassment is a form of bullying using electronic means. It has become increasingly common, especially among teenagers, as the digital sphere has expanded and technology has advanced.

Three years ago, the Toxic Comment Classification Challenge was published on Kaggle. The main aim of the competition was to develop tools that would help improve online conversation:

Discussing things you care about can be difficult. The threat of abuse and harassment online means that many people stop expressing themselves and give up on seeking different opinions. …


What career advice would I give to myself if I could go back in time? What did I do right and where did I go wrong?

Photo by Matt Duncan on Unsplash

I’ve been actively writing on Medium for almost two years and some great unexpected things happened along the way.

From time to time, students ask me for advice. While responding, I always do a bit of retrospective on my career in Data Science.

What did I learn? Would I choose a different path?

This article is a write-up of lessons learned through my professional career.

Eight years ago…


Most property insurers today still rely on a guy with a ladder and a camera on a stick to perform physical inspections

Photo by h heyerlein on Unsplash

Most property insurers today still rely on a guy with a ladder and a camera on a stick to perform physical inspections and assess risk. But smart insurers are enlisting the help of AI researchers who have developed platforms that can evaluate thousands of publicly available images and other data points on the web to deliver a risk assessment within seconds.

“We make sure that the insurer can access that data very, very quickly, especially if it’s being used in a quote engine,” said Ryan Kottenstette, CEO at Cape Analytics, a deep learning company that provides predictive risk analysis…


Pandas doesn’t handle Big Data well. These two libraries do! Which one is better? Faster?

Photo by NASA on Unsplash

I recently wrote two introductory articles about processing Big Data with Dask and Vaex — libraries for processing bigger than memory datasets. While writing, a question popped up in my mind:

Can these libraries really process bigger than memory datasets or is it all just a sales slogan?

This prompted me to run a practical experiment with Dask and Vaex and try to process a bigger-than-memory dataset. The dataset was so big that you cannot even open it with pandas.

What do I mean by Big Data?


I was recently contacted by a recruiter from a Big Tech company. Why now and never before? A few tips on how you can increase your chances.

Photo by Mitchell Luo on Unsplash

I was recently contacted by a recruiter from a Big Tech company. Why now and never before?

In this article, I present my theory of why a recruiter contacted me for a Senior Data Science position. You can use my theory (and develop it further) to increase your chances of getting contacted by a Big Tech company.

Many Software Developers dream about working for a Big Tech company. How do I know? I was one of them.

Roman Orac

Senior Data Scientist, tweeting at twitter.com/romanorac.
