scikit-learn is my first choice when it comes to classic Machine Learning algorithms in Python. It has many algorithms, supports sparse datasets, is fast and has many utility functions, like cross-validation, grid search, etc.
When it comes to advanced modeling, scikit-learn often falls short. If you need Boosting, Neural Networks or t-SNE, it’s better to avoid scikit-learn.
scikit-learn has two basic implementations for Neural Nets. There’s MLPClassifier for classification and MLPRegressor for regression.
While MLPClassifier and MLPRegressor have a rich set of arguments, there’s no option to customize layers of a Neural Network (beyond setting the number of hidden…
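To make the limitation concrete, here is a minimal sketch of what MLPClassifier does let you configure: the number and width of the hidden layers, set through `hidden_layer_sizes`. The synthetic dataset, the layer sizes (32 and 16) and the other parameter values are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data (illustrative only, not from the article).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Two fully connected hidden layers with 32 and 16 units.
# Only the sizes are configurable; the layer type itself cannot be swapped.
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=300, random_state=0)
clf.fit(X, y)

# One weight matrix per connection between consecutive layers.
weight_shapes = [w.shape for w in clf.coefs_]
print(weight_shapes)
```

Every hidden layer here is a plain dense layer with the same activation, which is why the estimator suits quick baselines rather than custom architectures.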
Having an always-on server is a great way to show your references to your future employers or to test your Machine Learning model in the real world.
Before we start, I would like to disclose that I’m NOT affiliated with Amazon in any way. The approach you’ll learn in this article should also be applicable to other cloud providers (e.g. Microsoft Azure, Google Cloud Platform).
I wrote this article because I feel it is important that you have this knowledge. I wish someone had taught me this in my college days, when I had too much time and no money.
Big Data Analysis in Python is having its renaissance. It all started with NumPy, which is also one of the building blocks behind the tool I am presenting in this article.
In 2006, Big Data was a topic that was slowly gaining traction, especially with the release of Hadoop. Pandas followed soon after with its DataFrames. 2014 was the year Big Data became mainstream, and Apache Spark was also released that year. In 2018 came Dask and other libraries for data analytics in Python.
Each month I find a new Data Analytics tool, which I am eager to learn. It…
There is an increasing interest in the field of Machine Learning. This doesn’t surprise me with all the hype around Deep Learning and Neural Networks. Besides that, the field offers competitive salaries and is both challenging and fun.
I’m assuming you’re new to the field and don’t know where to start.
The question I get asked most often is: How do I start with Machine Learning? My answer is always the same, so I wrote this article to guide you on your journey.
My suggestion is: start with free content as there are many great resources available online. …
I’ve been using pandas for a few years and each time I feel I am typing too much, I google the operation and I usually find a shorter way of doing it — a new pandas trick!
I learned about these functions recently and I deem them essential because of their ease of use.
By reading this article, you’ll learn:
Cyberharassment is a form of bullying using electronic means. It has become increasingly common, especially among teenagers, as the digital sphere has expanded and technology has advanced.
Three years ago, the Toxic Comment Classification Challenge was published on Kaggle. The main aim of the competition was to develop tools that would help improve online conversation:
Discussing things you care about can be difficult. The threat of abuse and harassment online means that many people stop expressing themselves and give up on seeking different opinions. …
I’ve been actively writing on Medium for almost two years and some great unexpected things happened along the way.
From time to time, students ask me for advice. While responding, I always do a bit of retrospective on my career in Data Science.
What did I learn? Would I choose a different path?
This article is a write-up of lessons learned through my professional career.
Most property insurers today still rely on a guy with a ladder and a camera on a stick to perform physical inspections and assess risk. But smart insurers are enlisting the help of AI researchers who have developed platforms that can evaluate thousands of publicly available images and other data points on the web to deliver a risk assessment within seconds.
“We make sure that the insurer can access that data very, very quickly, especially if it’s being used in a quote engine,” said Ryan Kottenstette, CEO at Cape Analytics, a deep learning company that provides predictive risk analysis…
Can these libraries really process bigger-than-memory datasets, or is it all just a sales slogan?
This question prompted me to run a practical experiment with Dask and Vaex and try to process a bigger-than-memory dataset. The dataset was so big that you cannot even open it with pandas.
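For readers new to the idea, the general out-of-core principle can be sketched with the standard library alone: stream the data piece by piece and keep only small running aggregates in memory. This sketch does not use the Dask or Vaex API and is not how either library is implemented. The function name `streaming_group_mean`, the column names and the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def streaming_group_mean(rows, key, value):
    """Mean of `value` per `key`, holding only one row plus running totals in memory."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[key]] += float(row[value])
        counts[row[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}


# Tiny in-memory stand-in for a CSV file that would not fit in RAM (hypothetical data).
sample = io.StringIO("city,price\nA,10\nB,20\nA,30\nB,40\nB,60\n")
means = streaming_group_mean(csv.DictReader(sample), key="city", value="price")
print(means)
```

With a real file, you would pass `csv.DictReader(open(path, newline=""))` instead of the `StringIO` object. Memory use then depends on the number of distinct groups rather than on the number of rows.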
I was recently contacted by a recruiter from a Big Tech company. Why now and never before?
In this article, I present my theory of why a recruiter contacted me for a Senior Data Science position. You can use my theory (and develop it further) to increase your chances of getting contacted by a Big Tech company.
Many Software Developers dream about working for a Big Tech company. How do I know? I was one of them.