These tips will help you when you need to share your analysis with others

Photo by Sid Balachandran on Unsplash

These tips will help you when you need to share your analysis with others. Whether you are a Student, Data Scientist or a Ph.D. Researcher, each project ends with some kind of report, be it a post on Confluence, a README on GitHub or a scientific paper.

There is no need to copy-paste values one by one from a DataFrame into other software. With its formatting functions, pandas can convert a DataFrame into many formats.


n = 10
df = pd.DataFrame(
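As a minimal sketch of that idea, assuming a small random DataFrame (the column names and values below are made up for illustration), the following exports the same table as plain text, HTML and CSV with built-in pandas methods. pandas also offers `to_markdown()` and `to_latex()`, but depending on the pandas version they need optional dependencies (`tabulate` and `Jinja2`, respectively), so they are left out here.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: n rows of random values in three columns
n = 10
rng = np.random.default_rng(seed=42)
df = pd.DataFrame(rng.random((n, 3)), columns=["a", "b", "c"]).round(2)

# Plain-text table, e.g. for a terminal or a code block in a README
text_table = df.to_string()

# HTML table, e.g. for a wiki page or a web report
html_table = df.to_html(index=False)

# CSV text, e.g. for pasting into a spreadsheet
csv_text = df.to_csv(index=False)

print(text_table)
```

Each method returns a string when no file path is given, so the result can be pasted directly into the target document.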

A short tutorial on how to set the colors on a pandas DataFrame.

Photo by Robert Katzki on Unsplash

Pandas needs no introduction, as it has become the de facto tool for Data Analysis in Python. As a Data Scientist, I use pandas daily, and it never ceases to amaze me with better ways of achieving my goals.

Another useful feature I learned recently is how to color a pandas DataFrame.

Let’s add colors

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list("ABCD"))

MinMaxScaler can return values smaller than 0 and greater than 1.

Photo by Kelly Sikkema on Unsplash

MinMaxScaler is one of the most commonly used scaling techniques in Machine Learning (right after StandardScaler).

From the scikit-learn documentation:

Transform features by scaling each feature to a given range.

This estimator scales and translates each feature individually such that it is in the given range on the training set, e.g. between zero and one.

Usually, when we use MinMaxScaler, we scale values between 0 and 1.

Did you know that MinMaxScaler can return values smaller than 0 and greater than 1? I didn’t know this and it surprised me.
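The reason is that MinMaxScaler learns the minimum and maximum from the data passed to `fit` and then applies the same linear mapping, (x - min) / (max - min) for the default range, to any data passed to `transform` later. A minimal sketch with made-up numbers: values outside the training range land outside [0, 1]. The `clip=True` argument, available in scikit-learn 0.24 and later, clips the transformed values to the feature range instead.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single-feature data: the training range is [0, 10]
X_train = np.array([[0.0], [5.0], [10.0]])
# New data contains values below and above the training range
X_new = np.array([[-5.0], [5.0], [20.0]])

scaler = MinMaxScaler()  # default feature_range=(0, 1)
scaler.fit(X_train)
scaled = scaler.transform(X_new).ravel()
# (x - 0) / (10 - 0) gives -0.5, 0.5 and 2.0: outside [0, 1] at both ends
print(scaled)

# With clip=True, transformed values are clipped to feature_range
clipped = MinMaxScaler(clip=True).fit(X_train).transform(X_new).ravel()
# Gives 0.0, 0.5 and 1.0
print(clipped)
```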

In case you’re interested, Udacity offers Free Access to:

- Intro to Machine Learning with PyTorch
- Deep Learning Nanodegree…

For each mistake, I also show a proper solution

Photo by Tobias Fischer on Unsplash

SQL is widely used in Data Analysis and Data Science. It’s fairly simple to start writing SQL queries, but bugs can quickly sneak into the code and, consequently, into the reports (or Machine Learning models).

In this article, I show 5 common mistakes (and their solutions) when writing SQL queries. I made some of them myself; others I noticed when performing code reviews.

Examples in this article are concise and show the core of the problem with as little code as possible. …

Deep learning is driving advances in artificial intelligence that are changing our world. Enroll now to build and apply your own deep neural networks to challenges like image classification and generation, time-series prediction, and model deployment.

If you are from the US:

Not from the US:

Udacity’s 30 Day Free Access offer is now available globally excluding the following countries: India, Brazil, Bahrain, Egypt, Jordan, Kuwait, Morocco, Oman, Qatar and Saudi Arabia. …

These tips are also applicable to Software Engineers. Make a few changes in your CV and land that job!

Photo by Christina @ on Unsplash

Writing a good CV can be one of the toughest challenges of job searching.

Most employers spend just a few seconds scanning each CV before sticking it in the Yes or No pile.

Here are the top 5 tips that will increase the chances that your CV lands in the Yes pile.

In case you’re interested, Udacity offers Free Access to:

- Intro to Machine Learning with PyTorch
- Deep Learning Nanodegree and more

1. Beautiful Design

PyTorch and TensorFlow aren’t the only Deep Learning frameworks in Python. There’s another library similar to scikit-learn.

Photo by Uriel SC on Unsplash

scikit-learn is my first choice when it comes to classic Machine Learning algorithms in Python. It has many algorithms, supports sparse datasets, is fast and has many utility functions, like cross-validation, grid search, etc.

When it comes to advanced modeling, scikit-learn often falls short. If you need Boosting, Neural Networks or t-SNE, it’s better to avoid scikit-learn.

scikit-learn has two basic implementations of Neural Networks: MLPClassifier for classification and MLPRegressor for regression.

While MLPClassifier and MLPRegressor have a rich set of arguments, there’s no option to customize layers of a Neural Network (beyond setting the number of hidden…
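For reference, a minimal sketch of what MLPClassifier does let you configure: the `hidden_layer_sizes` tuple sets the number of hidden layers and the number of neurons in each, and `activation` sets one activation function shared by all hidden layers. The synthetic dataset and the layer sizes below are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data (arbitrary sizes)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Two hidden layers with 32 and 16 neurons; one activation for all hidden layers
clf = MLPClassifier(
    hidden_layer_sizes=(32, 16),
    activation="relu",
    max_iter=1000,
    random_state=0,
)
clf.fit(X, y)

# n_layers_ counts the input, hidden and output layers
print(clf.n_layers_)
# Shapes of the weight matrices between consecutive layers
print([w.shape for w in clf.coefs_])
```

The layers are always fully connected; these parameters do not expose other layer types or a different activation per layer.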

A 10 step tutorial on how to start and configure a free server anywhere in the world

Photo by Paul Hanaoka on Unsplash

Having an always-on server is a great way to showcase your work to future employers or to test your Machine Learning model in the real world.

Before we start, I would like to disclose that I’m NOT affiliated with Amazon in any way. The approach you’ll learn in this article should also be applicable to other cloud providers (e.g. Microsoft Azure, Google Cloud Platform).

I wrote this article because I feel it is important that you have this knowledge. I wish someone had taught me this in my college days, when I had too much time and no money.

By reading this article you’ll learn:

Yet another Python library for Data Analysis that You Should Know About — and no, I am not talking about Spark or Dask

Photo by Christian Englmeier on Unsplash

Big Data Analysis in Python is having its renaissance. It all started with NumPy, which is also one of the building blocks behind the tool I am presenting in this article.

In 2006, Big Data was a topic that was slowly gaining traction, especially with the release of Hadoop. Pandas followed soon after with its DataFrames. 2014 was the year Big Data became mainstream, and Apache Spark was also released that year. Dask arrived in 2015, and more libraries for data analytics in Python followed.

Each month I find a new Data Analytics tool, which I am eager to learn. It…

Roman Orac

Senior Data Scientist, tweeting
