As promised in the article "Your Personal Course in Big Data", here are the answers to the questions most frequently asked by people interested in Data Science and Big Data.
The answers come from experienced data analysis practitioners: Kaggle winners, employees of data science service companies, and others who know the field well.
I collected about 100 of the most common questions, picked the most discussed ones, and answered each in as much detail as possible, so that no questions would remain!
How do you get started on machine learning tasks quickly?
Of course, the answer to this question depends heavily on the background of the person who is going to solve the problems. In general, though, it helps to have at least a little of the following:
– common sense,
– mathematical thinking.
Where to start learning?
First of all, you need to understand what the real work of a Data Scientist looks like. You can study machine learning indefinitely, but what is the point if most of the actual work turns out to be a chore? To avoid wasting time and to make sure this is really what you want, first learn how things work in practice and what you should be prepared for. A series of articles was once written for exactly this purpose.
How to Learn Mathematics the Right Way?
Many people who start learning data analysis find that they lack mathematical thinking. Indeed, to understand the algorithms and to work competently with data, you simply need a certain rigor in your reasoning.
What skills are needed at different professional levels?
Note that the answer to this question cannot be exact, since in the end everything depends on the nature of the work. Nevertheless, the levels can be roughly defined as follows:
- Beginner. As a rule, you need to be good at working with data: preprocessing, cleaning, feature extraction, and bringing the data, roughly speaking, into an "object-feature" matrix.
- Middle. Here it is already important to know machine learning, and experience in Kaggle competitions will come in handy. You need to know the mathematics and the algorithms very well.
- Senior. Here it is important to understand how to work with big data: how it is stored and how it is processed.
- Advanced. Here you need to understand more technical details, have a clear plan for solving the problem, and be able to estimate deadlines. As a rule, you also manage a team of developers.
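The Beginner-level work described above (cleaning, feature extraction, turning raw records into a matrix with one row per object and one column per feature) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production pipeline: the records, the column names `age`, `visits`, `city`, and the helper functions are hypothetical, and mean imputation plus one-hot encoding are just one common choice among many.

```python
from statistics import mean

# Hypothetical raw records: each dict is one object (e.g., a customer).
RAW = [
    {"age": "34", "city": " Moscow ", "visits": "5"},
    {"age": "", "city": "london", "visits": "2"},
    {"age": "51", "city": "Moscow", "visits": None},
]


def clean(record):
    """Strip surrounding whitespace and turn empty strings into None."""
    out = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip() or None
        out[key] = value
    return out


def to_object_feature_matrix(records, numeric, categorical):
    """Return (feature_names, matrix): one row per object, one column per feature."""
    rows = [clean(r) for r in records]  # cleaned copies; the input is not mutated
    # Parse numeric columns and impute missing values with the column mean.
    for col in numeric:
        parsed = [float(r[col]) if r[col] is not None else None for r in rows]
        fill = mean(v for v in parsed if v is not None)
        for r, v in zip(rows, parsed):
            r[col] = fill if v is None else v
    # One-hot encode categorical columns, lowercased so "Moscow" == "moscow".
    levels = {
        col: sorted({r[col].lower() for r in rows if r[col] is not None})
        for col in categorical
    }
    names = list(numeric) + [f"{col}={lvl}" for col in categorical for lvl in levels[col]]
    matrix = []
    for r in rows:
        vec = [r[col] for col in numeric]
        for col in categorical:
            value = r[col].lower() if r[col] is not None else None
            vec.extend(1.0 if value == lvl else 0.0 for lvl in levels[col])
        matrix.append(vec)
    return names, matrix


names, X = to_object_feature_matrix(RAW, numeric=["age", "visits"], categorical=["city"])
print(names)  # ['age', 'visits', 'city=london', 'city=moscow']
```

In real projects this step is usually done with a dataframe library, but the shape of the result is the same: a numeric matrix that a machine learning algorithm can consume.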
What is a typical day in the life of a big data specialist?
This question mostly interests those who have not yet worked in this field. Most of the work is routine that repeats every day. This routine consists of:
– working with data consistently and carefully,
– testing various hypotheses,
– visualizing data.
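Testing hypotheses, one item of the daily routine above, often means checking whether a difference between two groups could be due to chance. Below is a minimal sketch of one such check, a two-sided permutation test for a difference in means, assuming two small numeric samples. The function name and its parameters are illustrative; in practice a statistics library would typically be used instead.

```python
import random
from statistics import fmean


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between samples a and b.

    Returns an estimated p-value: the share of random relabelings whose absolute
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed makes the result reproducible
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Clearly separated groups: expect a p-value well below 0.05.
print(permutation_test([10, 11, 12, 13, 14], [1, 2, 3, 4, 5]))
```

The permutation approach is chosen here because it makes few distributional assumptions and fits in a few lines; a small p-value suggests the observed difference is unlikely under the hypothesis that group labels do not matter.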
Machine learning itself comes in only at the very end. Nevertheless, most of the experts agreed that, in general, the data analysis process more or less follows the CRISP-DM methodology, which consists of six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
In practice, of course, there are deviations from this process.
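The CRISP-DM methodology mentioned above is iterative rather than strictly linear: its six phases (business understanding, data understanding, data preparation, modeling, evaluation, deployment) run in order, but some phases send you back to earlier ones. A minimal sketch of that flow as a small state machine follows. The `Phase`, `next_phase`, and `walk` names are hypothetical, and the back-steps encode a simplified reading of the usual CRISP-DM diagram.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Forward steps follow the phase order.
FORWARD = {p: Phase(p.value + 1) for p in Phase if p is not Phase.DEPLOYMENT}
# Simplified back-steps: where a phase that did not succeed sends you.
BACKWARD = {
    Phase.DATA_UNDERSTANDING: Phase.BUSINESS_UNDERSTANDING,
    Phase.MODELING: Phase.DATA_PREPARATION,
    Phase.EVALUATION: Phase.BUSINESS_UNDERSTANDING,
}


def next_phase(current, ok=True):
    """Move forward if the phase succeeded; otherwise loop back (or redo it)."""
    if ok:
        return FORWARD.get(current)  # None after DEPLOYMENT: the cycle is done
    return BACKWARD.get(current, current)  # no back-step defined: redo the phase


def walk(outcomes, start=Phase.BUSINESS_UNDERSTANDING):
    """Replay a hypothetical project; each outcome says whether the current phase succeeded."""
    trace = [start]
    current = start
    for ok in outcomes:
        current = next_phase(current, ok)
        if current is None:
            break
        trace.append(current)
    return trace


# A model that fails once sends the project back to data preparation.
for phase in walk([True, True, True, False, True, True, True]):
    print(phase.name)
```

Real projects deviate from this flow in many ways; the sketch only shows why the process is drawn as a cycle rather than a straight line.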