TL;DR: process it with a new Python Data Processing Engine in the Cloud.

Data Science is having its renaissance moment. It's hard to keep track of all the new tools that have the potential to change the way Data Science gets done.

I learned about this new Data Processing Engine only recently in a conversation with a colleague, also a Data Scientist…

A few checks to make before training a Machine Learning model on data that could be random.

Time series forecasting is a subfield of Data Science that deals with problems such as forecasting the spread of COVID, the prices of stocks, the daily consumption of electricity… we could go on and on.

Data Science is a wide field of study, and there aren’t many “Jacks of all trades”.

A short tutorial on how to write Excel-like formulas with the Mito package in Python

Many Python users are transitioning from spreadsheets because of a Python package that allows users to use Excel-like syntax. It is a spreadsheet environment for JupyterLab to help you with your Python analysis.

Meet Mito — a Python package that initializes an interactive spreadsheet into your JupyterLab Environment. For each…

A practical guide to interactive Exploratory Data Analysis on the Avocado dataset

Exploratory Data Analysis (EDA) is one of the first steps in the Data Science process — usually, it follows the data extraction. EDA helps us to get familiar with the data before we proceed with modeling or decide to repeat the extraction step.

EDA helps Data Scientists to:

  • get familiar…

You can call Mito into your Jupyter Environment and each edit you make will generate the equivalent Python in the code cell below.

Mito is a spreadsheet interface for Python

Mito allows you to pass your dataframes or CSV files into a spreadsheet interface. It has the feel of Excel, but each edit generates the equivalent Python in the code cell below. At its best, this can be a really fast way to get your data analysis done.

Mito is a JupyterLab extension that enables exploring and transforming datasets with the ease of Excel… and it’s FREE.

It’s really an exciting time to be a part of the Data Science community with all the new JupyterLab extensions that are coming out. They make Data Science much more enjoyable by minimizing the tedious work.

I remember the old days when we had to rely on numpy and matplotlib…

Getting Started

A short tutorial on how to visualize correlation with pandas without third-party plotting packages.

As a Data Scientist, I use correlation frequently to calculate and visualize relationships between features.

I used to start by importing matplotlib and seaborn packages, which render a good-looking plot. But it’s cumbersome to import both packages just to visualize the correlation when starting with an empty Jupyter Notebook.

Is there a better way?

I…
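The teaser above stops before showing the approach, so the following is only a minimal sketch of the general idea, not necessarily the method the tutorial uses. It assumes a small toy DataFrame (the column names and values are made up for illustration) and uses pandas' `DataFrame.corr` to compute pairwise Pearson correlations without importing any plotting package.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "height_cm": [150, 160, 170, 180, 190],
        "weight_kg": [50, 58, 69, 80, 92],
        "shoe_size": [44, 38, 42, 40, 41],
    }
)

# Pearson correlation between every pair of numeric columns.
corr = df.corr(method="pearson")

# Rounding keeps the matrix readable when printed or shown in a notebook cell.
print(corr.round(2))
```

In a Jupyter Notebook, leaving `corr.round(2)` as the last expression in a cell renders the matrix as a table, which is often enough for a first look at relationships between features.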

Opinion

Where do wage disparities occur (and what to do about it)

Software Engineering salaries differ substantially from country to country. While Software Engineers in Silicon Valley report higher wages, they usually talk about gross income (income before taxation). Living expenses in California are also substantially higher than in most European countries.

I don’t feel I’m qualified to make an objective comparison…

A 3-step tutorial on how to migrate your ML model to Java, Go, C++ or any other language. It’s easier than you may think.

I recently worked on a project where I needed to train a Machine Learning model that would run on the Edge, meaning that the processing and prediction occur on the device that collects the data.

As usual, I did my Machine Learning part in Python and I didn’t think much…

Roman Orac

Senior Data Scientist. Get Unlimited Medium Reads: https://romanorac.medium.com/membership
