A practical guide to interactive Exploratory Data Analysis on the Avocado dataset

Exploratory Data Analysis (EDA) is one of the first steps in the Data Science process; it usually follows data extraction. EDA helps us get familiar with the data before we proceed with modeling or decide to repeat the extraction step.

EDA helps Data Scientists to:

  • get familiar…
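A first EDA pass usually starts with a handful of pandas calls: shape, column types, missing values and summary statistics. The sketch below is a minimal illustration, not the author's workflow. It uses a small hypothetical stand-in for the Avocado data; the column names (`AveragePrice`, `TotalVolume`, `type`, `region`) and all values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical stand-in for the Avocado dataset; columns and values are illustrative only.
df = pd.DataFrame(
    {
        "AveragePrice": [1.33, 1.35, 0.93, 1.08, None],
        "TotalVolume": [64236.62, 54876.98, 118220.22, 78992.15, 51039.60],
        "type": ["conventional", "conventional", "organic", "organic", "conventional"],
        "region": ["Albany", "Albany", "Boston", "Boston", "Chicago"],
    }
)


def first_look(frame: pd.DataFrame) -> dict:
    """Collect the basic facts an EDA pass usually starts with."""
    return {
        "shape": frame.shape,                          # (rows, columns)
        "dtypes": frame.dtypes.astype(str).to_dict(),  # column types
        "missing": frame.isna().sum().to_dict(),       # missing values per column
        "numeric_summary": frame.describe(),           # count, mean, std, quartiles of numeric columns
        "type_counts": frame["type"].value_counts().to_dict(),  # category frequencies
    }


summary = first_look(df)
print(summary["shape"])  # (5, 4)
print(summary["missing"])
print(summary["numeric_summary"])
```

On real data, the same function would be called on the DataFrame loaded from the dataset file instead of the hand-made one.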

A JupyterLab extension generates code on the fly while you work on your analysis.

As the need for people to be data literate grows, Python’s popularity is growing with it. One of the annoyances of Python, whether you are a Ph.D. Data Scientist or just starting to learn, is that the syntax can take a long time to get right. …

You can call Mito into your Jupyter environment, and each edit you make will generate the equivalent Python in the code cell below.

Mito is a spreadsheet interface for Python

Mito allows you to pass your dataframes or CSV files into a spreadsheet interface. It has the feel of Excel, but each edit generates the equivalent Python in the code cell below. At its best, this can be a really fast way to get your data analysis done.
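To make "each edit generates the equivalent Python" concrete, here is a hand-written pandas version of three typical spreadsheet edits: filtering rows, adding a computed column, and sorting. It illustrates the kind of code such edits correspond to; it is not Mito's actual generated output, which may look different. The table and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical sales table; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "region": ["Albany", "Boston", "Chicago", "Boston"],
        "price": [1.33, 0.93, 1.08, 1.10],
        "units": [100, 250, 80, 40],
    }
)

# Spreadsheet edit 1: filter the "region" column to Boston only.
boston = df[df["region"] == "Boston"].copy()

# Spreadsheet edit 2: add a new column computed from two existing ones.
boston["revenue"] = boston["price"] * boston["units"]

# Spreadsheet edit 3: sort by the new column, largest first.
boston = boston.sort_values("revenue", ascending=False)

print(boston)
```

The `.copy()` after filtering keeps the new column from being written into a view of the original DataFrame.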

Mito is a JupyterLab extension that enables exploring and transforming datasets with the ease of Excel… and it’s FREE.

It’s really an exciting time to be a part of the Data Science community with all the new JupyterLab extensions that are coming out. They make Data Science much more enjoyable by minimizing the tedious work.

I remember the old days when we had to rely on numpy and matplotlib…

Getting Started

A short tutorial on how to visualize correlation with pandas without third-party plotting packages.

As a Data Scientist, I use correlation frequently to calculate and visualize relationships between features.

I used to start by importing the matplotlib and seaborn packages, which render good-looking plots. But it’s cumbersome to import both packages just to visualize the correlation when starting with an empty Jupyter Notebook.

Is there a better way?
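pandas can compute the correlation matrix on its own with `DataFrame.corr()`, which uses Pearson correlation by default, and its `Styler` can color the result like a heatmap. The sketch below is one pandas-only approach, not necessarily the exact one from the full tutorial; the column names and values are hypothetical. Note that, in recent pandas versions, `Styler` needs jinja2 and `background_gradient` needs matplotlib installed for the colormap, even though neither has to be imported.

```python
import pandas as pd

# Hypothetical numeric features; names and values are illustrative only.
df = pd.DataFrame(
    {
        "price": [1.0, 2.0, 3.0, 4.0, 5.0],
        "revenue": [2.0, 4.0, 6.0, 8.0, 10.0],  # exactly 2 * price
        "discount": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as price rises
        "volume": [3.0, 1.0, 4.0, 1.0, 5.0],
    }
)

# Pairwise Pearson correlation (pandas' default method) as a square DataFrame.
corr = df.corr()
print(corr.round(2))


def styled_corr(frame: pd.DataFrame):
    """Color the correlation matrix as a heatmap-like table.

    Rendering requires jinja2 and matplotlib to be installed,
    but neither is imported here.
    """
    return frame.corr().style.background_gradient(cmap="coolwarm")
```

In a notebook, making `styled_corr(df)` the last expression of a cell renders the colored table.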



Where do wage disparities occur (and what to do about it)

Software Engineering salaries differ substantially from country to country. While Software Engineers in Silicon Valley report higher wages, they usually talk about gross income (income before taxation). Living expenses in California are also substantially higher than in most European countries.

I don’t feel I’m qualified to do an objective comparison…

A 3-step tutorial on how to migrate your ML model to Java, Go, C++ or any other language. It’s easier than you may think.

I recently worked on a project where I needed to train a Machine Learning model that would run on the Edge, meaning that the processing and prediction occur on the device that collects the data.

As usual, I did the Machine Learning part in Python and didn’t think much…
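The three steps themselves are cut off here. One generic way to move a simple model to another language (not necessarily the author's approach) is to export its learned parameters and emit equivalent source code. For a linear model, a prediction is just `bias + w·x`, so a few lines of Python can generate a dependency-free C function. The weights below are hypothetical stand-ins for what, for example, scikit-learn's `coef_` and `intercept_` would hold; `linear_model_to_c` and `predict_py` are hypothetical helper names.

```python
from typing import Sequence


def linear_model_to_c(weights: Sequence[float], bias: float, name: str = "predict") -> str:
    """Emit a dependency-free C function that reproduces a linear model's prediction."""
    # repr() of a Python float is the shortest string that round-trips to the same double.
    terms = [repr(float(bias))]
    terms += [f"{float(w)!r} * x[{i}]" for i, w in enumerate(weights)]
    body = " + ".join(terms)
    return f"double {name}(const double *x) {{\n    return {body};\n}}\n"


def predict_py(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Reference prediction in Python, useful for checking the exported code."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


# Hypothetical learned parameters.
weights = [0.5, -1.25, 2.0]
bias = 0.75

print(linear_model_to_c(weights, bias))
print(predict_py(weights, bias, [1.0, 2.0, 3.0]))  # 4.75
```

Comparing the generated C function against `predict_py` on a few inputs is a cheap way to confirm the port. Tree ensembles or neural networks need more elaborate code generation than this sketch.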

Python is evolving — don’t get complacent

I’ve been coding in Python for more than 10 years. There was a time when I thought I knew it all, which was a clear sign I was getting complacent.

Then I decided to do a bit of research about Python improvements. Those 3.6, 3.7, 3.8 …
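The teaser stops before the details. As an illustration (not necessarily the features the author covered), here are three well-known additions from those releases: f-strings (3.6), dataclasses (3.7), and both the walrus operator and the self-documenting `=` specifier in f-strings (3.8). The `Listing` class and `expensive` function are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass
class Listing:  # dataclasses were added in Python 3.7
    region: str
    price: float
    units: int = 0  # defaults work like in a regular function signature


def expensive(listings, threshold):
    # The walrus operator (:=, Python 3.8) binds a value inside an expression,
    # so the product is computed once and reused in the result.
    return [total for item in listings if (total := item.price * item.units) > threshold]


items = [Listing("Boston", 1.5, 10), Listing("Albany", 0.9, 5)]
price = items[0].price

print(f"{items[0].region}: {price:.2f}")  # f-strings, Python 3.6 -> Boston: 1.50
print(f"{price=}")                         # "=" specifier, Python 3.8 -> price=1.5
print(expensive(items, 10))                # [15.0]
print(items[0])                            # Listing(region='Boston', price=1.5, units=10)
```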

The missing guide for stream processing with Apache Flink.

I recently started learning stream processing with Apache Flink. I had already worked with Apache Spark and Hadoop (tools for big data processing), so I expected to pick up Flink quickly.

To my surprise, Flink’s Getting Started Guide is really poorly written. IMO the most effective way of learning a…

Working from home has its challenges. I use these 7 tips to overcome them.

I’m a full-time Data Scientist, and I have been working from home for the last year and a half. I was extra productive at first (I finished my work earlier) and had more time because I wasn’t commuting. But after some time, my productivity started to decline.

Working from home (WFH) is not…

Roman Orac

Senior Data Scientist. Get How NOT to write pandas code course: https://romanorac.gumroad.com/l/vxxiV

