These tips will help you share your analysis with others. Whether you are a student, a Data Scientist, or a Ph.D. researcher, every project ends with some kind of report, be it a post on Confluence, a README on GitHub, or a scientific paper.
There is no need to copy-paste values one by one from a DataFrame into another tool. Pandas, with its formatting functions, can convert a DataFrame to many formats.
Let’s create a DataFrame with 10 rows and 3 columns of random values:

import numpy as np
import pandas as pd

n = 10
df = pd.DataFrame(
    np.random.random(size=(n, 3)),
    columns=["a", "b", "c"],
)
Pandas needs no introduction, as it has become the de facto tool for data analysis in Python. As a Data Scientist, I use pandas daily, and it never ceases to amaze me with better ways of achieving my goals.
Another useful feature I learned recently is how to color a pandas DataFrame.
Let’s create a pandas DataFrame with random numbers:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list("ABCD"))
MinMaxScaler is one of the most commonly used scaling techniques in Machine Learning (right after StandardScaler).
Transform features by scaling each feature to a given range.
This estimator scales and translates each feature individually such that it is in the given range on the training set, e.g. between zero and one.
Usually, when we use MinMaxScaler, we scale values between 0 and 1.
Did you know that MinMaxScaler can return values smaller than 0 and greater than 1? I didn’t know this and it surprised me.
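A short sketch shows why: MinMaxScaler learns the minimum and maximum from the training set, so unseen values outside that range map outside [0, 1] (the numbers here are illustrative):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Fit the scaler on training data spanning [10, 20]
train = np.array([[10.0], [15.0], [20.0]])
scaler = MinMaxScaler()  # default feature_range=(0, 1)
scaler.fit(train)

# Unseen values outside the training range land outside [0, 1]:
# 5 is below the training minimum, 25 is above the training maximum
out = scaler.transform(np.array([[5.0], [25.0]]))
print(out)  # [[-0.5], [ 1.5]]
```

The transform is linear in the training min and max, so nothing clips values that fall outside the training range (recent scikit-learn versions offer a `clip` parameter for that).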
SQL is widely used in Data Analysis and Data Science. It’s fairly simple to start writing SQL queries, but bugs can quickly sneak into the code and, consequently, into the reports (or Machine Learning models).
In this article, I show 5 common mistakes (and their solutions) when writing SQL queries. Some of them I made myself; others I noticed while doing code reviews.
Examples in this article are concise and show the core of the problem with as little code as possible. …
Deep learning is driving advances in artificial intelligence that are changing our world. Enroll now to build and apply your own deep neural networks to challenges like image classification and generation, time-series prediction, and model deployment.
If you are from the US:
Not from the US:
Udacity’s 30 Day Free Access offer is now available globally excluding the following countries: India, Brazil, Bahrain, Egypt, Jordan, Kuwait, Morocco, Oman, Qatar and Saudi Arabia. …
Writing a good CV can be one of the toughest challenges of job searching.
Most employers spend just a few seconds scanning each CV before sticking it in the Yes or No pile.
Here are the top 5 tips that will increase the chances that your CV lands in the Yes pile.
scikit-learn is my first choice when it comes to classic Machine Learning algorithms in Python. It has many algorithms, supports sparse datasets, is fast and has many utility functions, like cross-validation, grid search, etc.
When it comes to advanced modeling, however, scikit-learn often falls short. If you need Boosting, Neural Networks, or t-SNE, it’s better to look beyond scikit-learn.
scikit-learn has two basic implementations for Neural Nets. There’s MLPClassifier for classification and MLPRegressor for regression.
While MLPClassifier and MLPRegressor have a rich set of arguments, there’s no option to customize layers of a Neural Network (beyond setting the number of hidden…
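For illustration, a minimal MLPClassifier sketch on a toy dataset (the dataset, layer sizes, and max_iter are my own choices): hidden_layer_sizes sets the width of each hidden layer, which is about as far as the per-layer customization goes.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# A toy binary classification problem
X, y = make_classification(n_samples=200, random_state=0)

# Two hidden layers of widths 32 and 16. You can choose how many layers and
# how wide each one is, but not, say, a different activation per layer.
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0)
clf.fit(X, y)
acc = clf.score(X, y)  # accuracy on the training set
```

This is enough for simple problems, but the lack of per-layer control is exactly where dedicated deep learning frameworks take over.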
Having an always-on server is a great way to showcase your work to future employers or to test your Machine Learning model in the real world.
Before we start, I would like to disclose that I’m NOT affiliated with Amazon in any way. The approach you’ll learn in this article should also be applicable to other cloud providers (e.g., Microsoft Azure, Google Cloud Platform).
I wrote this article because I feel this knowledge is important. I wish someone had taught me this back in my college days, when I had too much time and no money.
Big Data analysis in Python is having a renaissance. It all started with NumPy, which is also one of the building blocks behind the tool I present in this article.
In 2006, Big Data was a topic that was slowly gaining traction, especially with the release of Hadoop. Pandas followed soon after with its DataFrames. 2014 was the year Big Data became mainstream; Apache Spark was also released that year. In 2018 came Dask and other libraries for data analytics in Python.
Each month I find a new Data Analytics tool, which I am eager to learn. It…