We decided to write this piece because, over the past few months, many aspiring data science professionals have asked our project advisors questions like these about getting started with a data science career:
“I want to learn data science but I don’t know where to start.”
“I know Python for web development, but how do I learn Python for data science?”
“What is the best way to learn data science?”
“How to learn data science from scratch?”
“What is the best place to learn data science?”
The objective of this blog is to set you off on the right foot on your data science journey. You can pursue your goal of becoming a data scientist whether or not you have a fancy background or a fancy degree; anyone can become a data scientist regardless of current job role or previous experience. The biggest challenge is knowing where to start. There are tons of resources online for learning data science, but it is important to structure your learning path logically. All that is required is to work hard, learn the required data science skills, and demonstrate that you can deliver results through hands-on data science projects.
Before you get lost down the data science rabbit hole, take a look at our 3 critical tips on how to learn data science from scratch by yourself, faster:
Practicing projects and learning by yourself is the only thing that can make you a data science superhero.
The best way to learn data science is to work on projects, so that you gain skills that can be applied immediately and are useful from a real-world implementation perspective. The sooner you start working on diverse data science projects, the faster you will learn the related concepts. Even if you blaze through a complete book on machine learning algorithms, and a topic like linear regression seems so straightforward that even a novice could implement it, you will still end up scratching your head the first time you are given a real-world business problem that calls for linear regression. You will think, “Wait, what is the formula to estimate the coefficients in linear regression?” The brain plasticity principle of “Use It or Lose It” holds true when learning data science. Build a data science project for every concept that you learn from a book or any other online resource. If you are not actively applying the concepts that you learn, you will not be prepared to do actual data science work on the job. Projects are the best way to learn data science and a great starting point.
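As a small illustration of the coefficient formula mentioned above, here is a minimal sketch that assumes simple (one-feature) linear regression fitted by ordinary least squares. The function name `fit_simple_linear_regression` and the toy data are hypothetical, chosen only for this example; a real project would more likely rely on a library implementation.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Estimate the intercept and slope of y = b0 + b1 * x by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sum of co-deviations of x and y divided by sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    slope = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Toy data (made up for illustration): y is exactly 2 + 3 * x
hours = [1, 2, 3, 4, 5]
score = [5, 8, 11, 14, 17]
b0, b1 = fit_simple_linear_regression(hours, score)
print(f"intercept={b0:.2f}, slope={b1:.2f}")  # intercept=2.00, slope=3.00
```

Writing the formula out once by hand like this, and then checking it against a dataset whose answer you already know, is one way to turn a concept from a book into a tiny project.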
Learning data science fundamentals should always be your priority: the better you understand them, the easier it is to learn more advanced data science and machine learning concepts. Understand the components and concepts that are used in data science instead of jumping straight into courses. Break each concept into smaller chunks, understand the theory behind it, and put it into practice by implementing it. Grasping the fundamentals is an important step in learning data science, so don’t overlook it. We break down the fundamentals of data science into two core categories: Math and Programming.
Python is preferred over other programming languages for data science, as it has become the de facto language for analytics professionals, followed by R. Regardless of whether you are planning to learn Python for data science or R for data science, here are some fundamental programming concepts that you must grasp:
If a particular math or statistics concept does not make sense to you, whether in a book or in class, keep up your confidence and look for alternative online resources that explain it. There are tons of resources online to learn data science for free. Every person learns differently, and just because you have not been able to learn a concept through one source does not mean that you cannot learn it. There is a wealth of content online, be it an explanatory blog, tutorial, video, or podcast, that can make the concept at hand crystal clear.
We had the opportunity to talk with Kaggle expert Sharan Kumar Ravindran, who shared his data science career path with us. Sharan is a leading Data Scientist currently working at Deloitte Australia. He has authored two books on data science topics, with over 2200 copies sold globally, and his books are consistently ranked in the top 500 in the Machine Learning specialty category on Amazon. Sharan ranks in the top 1 percentile on Kaggle, the world's largest community of data scientists, and is well-versed in programming in R and Python. In this interview, he talks about the importance of having a mentor and candidly shares how his mentor helped him shape his data science career.
Can you unpack for us what a Kaggle competition expert is, what the various levels are, and how you achieved top one percentile status?
I would like to begin by informing you that I'm no longer active on Kaggle. I think my last competition was about five years ago. But the points that I received when I was active have helped me to retain my position. My highest ranking was maybe in the top 0.5 percentile, and my current ranking is probably somewhere in the top 3-4 percentile. I started participating in Kaggle data science hackathons with the intention to learn. Kaggle is a good platform for anyone who's trying to learn data science from scratch on their own. It really helped me understand the different machine learning algorithms, their applications, and which datasets suit which algorithms. As a result of my intensive involvement, I learned things from Kaggle that people usually gain through experience. I made sure to be actively involved in at least one Kaggle data science competition at any point in time. I would spend at least two hours every day focusing on my learnings from a particular competition. I read almost every post, as well as the discussions on various forums about the machine learning model and the dataset. This helped me to excel not only on Kaggle but also as a data scientist.
How many data science competitions do you typically have to submit solutions for to get a high rank, let’s say into the top 5 percentile?
There are at least 1000 users participating in a typical data science competition on Kaggle, so getting into the top five percentile takes some time. I usually started at the launch of the competition because that gave me enough runway to fine-tune the machine learning models. The frequency of my participation can be inferred from this: in the competitions in which I ranked in the top 5-10 percentile, I would have made at least 50 submissions, and sometimes even 100. From what I recall, there is a limit of 3 submissions per day.
On your blog, you wrote about a topic that is a favorite in our learning community: how to get your first data science job. Could you list the top three things that are most critical in that journey?
The most critical thing is to stay up to date with new data science tools and technologies. Kaggle runs a survey, and this year it had more than 20,000 respondents. The survey has various questions about the popular tools that data scientists use on the job. It helps you understand which tools and technologies working data scientists are currently using, so that you can familiarize yourself with them and stay relevant in today's demanding world. Chances are high that, for any job opening, most of the questions will be based on current trends.
Secondly, I would suggest that anyone looking for a job in data science be really strong in the fundamental concepts. There is no need to be an expert in a particular area, but you must be really strong in the basics, such as basic statistics, pandas, NumPy, and basic visualization, so that you can handle missing data and other typical data issues.
Also, having a project portfolio is really important when appearing for a data science job interview, because there are hundreds of people applying for the same positions and the portfolio is what differentiates one candidate from another. Lastly, networking: you need to have a good network.
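To make the "basics" in the answer above a little more concrete, here is a minimal sketch of inspecting and filling missing values with pandas and NumPy. The column names, the toy values, and the choice of median imputation are assumptions made purely for illustration; they are not taken from the interview, and the right strategy depends on the dataset.

```python
import numpy as np
import pandas as pd

# Made-up toy data with gaps, for illustration only
df = pd.DataFrame(
    {
        "age": [25, np.nan, 31, 40],
        "city": ["Bangalore", "Sydney", None, "Sydney"],
        "salary": [50000.0, 62000.0, np.nan, 58000.0],
    }
)

# 1. Inspect: count missing values per column
missing_counts = df.isna().sum()
print(missing_counts)

# 2. Impute: median for numeric columns, a placeholder label for text
cleaned = df.copy()
for col in ["age", "salary"]:
    cleaned[col] = cleaned[col].fillna(cleaned[col].median())
cleaned["city"] = cleaned["city"].fillna("unknown")

# 3. Summarise the numeric columns once the gaps are filled
print(cleaned.describe())
```

Median imputation is only one option; dropping incomplete rows with `dropna()` or filling with a domain-specific default may be more appropriate in other cases.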
When you are interviewing candidates, how many data science projects do you expect to see in their portfolio? Apart from the quality of the projects, do you also look at quantity?
I like to see variety. The projects should be from different areas, because variety reflects a candidate's willingness to learn different components of data science. So apart from quality, one thing that I look for is variety. I also always suggest digging deep into a few data science projects in order to acquire knowledge.
How do you suggest aspiring data scientists network?
Due to COVID, things have been tough. I used to expand my network by participating in various data science meetups. Bangalore had a culture of meetups and I made sure to participate in at least two of these in a month where I interacted with individuals working in the field of data science from different organizations.
In another blog of yours, you talk about why everyone should have a mentor in their data science career. Can you talk about the critical things your mentors helped you with? Any specific inflection points and the guidance they gave which impacted you?
As a beginner, I was interested in various things, mostly because data science is huge and it's very tempting to learn and test a lot of things at the same time. Having a mentor helped me focus on my goals and identify the things I'm good at. With his guidance and some introspection, I understood my strengths as well as my weaknesses. My mentor was Derek Jose from Flutura, the first organization I worked at and where I started my data science career. Other than giving me regular input and providing materials, he connected me with a lot of interesting people.
Can you elaborate on your book, which is on Amazon, “Mastering social media mining using R”? Why does this topic need an entire book, and why did you choose R instead of Python?
As for why I chose R instead of Python: the book was written 5 years ago, and at that point R was still popular for data science and a lot of data scientists were using it. This trend has changed in the last two years, however, as Python has become very popular for data science. As for why the topic needs an entire book: initially, social media channels such as Facebook, Instagram, and even LinkedIn had their APIs open for a few months, and these were later restricted. Now you can gain access to them by submitting a request and going through the proper channel. There are a lot of blogs and reviews, the amount of data out there is huge, and so is the number of use cases that we can implement using all this data. I could probably write one more book on this topic, because the amount of data being generated on these platforms is enormous.
What are your favorite online tools or resources that you refer to for help with your data science projects or your data science career? You already mentioned the updates that Kaggle sends out, but what else do you recommend?
Apart from Kaggle, I would say Medium, where Towards Data Science is a publication that has a lot of data science-related content. It sends out daily newsletters curating its best articles. Then there is KDnuggets, which also has a lot of data science-related articles. It sends out weekly newsletters curating its best articles. Data Science Central is a community of data scientists that has related articles as well as events. Referring to these resources helps you stay current with recent technologies.
As a senior data scientist, do you have any favorite tactics or tips which help you get your work done efficiently?
One thing that helps me a lot is having a template. Not only in data science: if I see repetition in whatever I’m currently doing, I try to come up with a template. For example, I publish one article every two weeks, so before writing I note my ideas in a template, and that is an efficient use of my time. Similarly, for data science projects, having a template is advantageous.
For the final question of this session: you said that a portfolio is important for an aspiring data scientist to have. But when you're interviewing people for potential jobs and you find a reasonable resume with some online courses and a portfolio with some interesting projects, is this enough? I'm sure not everybody with a data science portfolio and a course gets a job, so what else is usually missing?
I would say what differentiates candidates is a good understanding of data science concepts. Knowing the basic concepts is fine, but being able to apply them to real-world use cases is what matters. That helps in coming up with a structured way of thinking when working on any real-world data science or machine learning project.
Do you feel that becoming a data scientist is something you need to accomplish, and are you wondering where to start? Look no further than ProjectPro. We are the only solution to train you across diverse data science tools and technologies through a library of 60+ solved end-to-end data science and machine learning projects. Each project comes with solution code, 2-3 hours of videos explaining the end-to-end project lifecycle, a dataset, and documentation.