Over the last decade, interest in data science careers has exploded, and so has the number of people looking to build one. The COVID-19 pandemic caused millions to lose their jobs and forced many to rethink their careers. One question our Project Advisors hear from nearly everyone interested in a data science career, whether they are just getting started or looking to move into one of the most in-demand job markets, is: “How do I switch to a career in data science?” If you’re considering a switch to data science or data analytics, this article can help you chart your data science career path.
But first things first.
Our digital footprint has grown over the years, especially since the pandemic began. You may not realize it, but you are surrounded by massive amounts of data, and that huge volume of data is what makes data science such a coveted career today. A 2020 Dice report said that demand for data scientists increased by an average of 50% across healthcare, telecommunications, media, and the banking, financial services, and insurance (BFSI) sectors, among others. Far from shrinking it, the pandemic has strengthened the data science job market, and data scientist was ranked the second-best job in America for 2021.
Apart from statistical know-how and programming skills, the most important thing is a genuine love for data. For a quick checklist of the skills you need, see our recommended reading, Data Science Skills: Must-Haves.
People from all kinds of backgrounds have successfully transitioned into data science, and their advice can help anyone who wants to design and build their own data scientist career path.
We recently had the opportunity to talk to one of India’s celebrated data scientists about how to make a data science career transition. Joy Mustafi is one of India’s Top 10 Data Scientists and has two decades of experience in data science. Early on, when he took the path less traveled and no open datasets existed, he often relied on his friends for data. Today, with open data widely available, he is still learning. Joy enjoys learning as much as he enjoys teaching. He understands the importance of delivering business projects on time and does not treat his part of the work as isolated from the rest of a project. With 65 patents to his name, he spreads Joy across communities while staying grounded. In this interview with ProjectPro, Joy talks about a wide range of topics, including MUST research and his vision of passing on his technical knowledge to industry beginners.
MUST research exists mainly to promote excellence in the competitive fields of data science, artificial intelligence, and machine learning. When I was a student, I thought about starting something to discuss emerging technologies. At my first job, my colleagues, friends, and I decided to form a club whose overarching goal was to spend our free time contributing to technology for the benefit of the nation. When we started, the term data science did not exist, and neither did the data scientist role. Today we have more than 500 members and volunteers across India, all contributing to data science for the benefit of society and mankind.
As I mentioned, the data science problems we work on are close to society, so we focus on a few business domains that together spell HEARTS:
H - Healthcare
E - Education
A - Agriculture
R - Retail
T - Transportation
S - Smart city
These are our primary focus areas. Horizontally, we cut across them with technologies such as natural language processing (NLP), computer vision, speech technology, and, to some extent, IoT and robotics. We call the latter embedded intelligence, where we try to embed the machine learning or deep learning algorithm itself in the hardware. We have also collaborated with several governments, including the Government of India and the State Governments of Telangana and West Bengal, and with academic institutions such as IIIT Hyderabad, IIT Kharagpur, and ISB Hyderabad.
When you started, nothing was yet defined as data science; it was broadly called analytics. What has changed between then and now, when data science is suddenly cool and in such high demand? How have data science job roles changed?
I started my journey 18 to 19 years ago. In the early 2000s, only a few academic organizations focused on this kind of research. When I was a researcher at the Indian Statistical Institute, Kolkata, I wrote my first program, which was a neural network. I wrote the entire piece of code myself, in C. Of course, academics knew about neural networks, and the IITs and IISc were doing research. But the approach was not well known in the corporate world, because companies always wanted a deterministic approach and did not want to risk a probabilistic one over a typical, deterministic IT software deliverable. Even some of my friends and colleagues questioned my decision to learn AI. Everyone advised me to learn Java and SQL because those skills would help me get a job. That was the scenario.
Although the term analytics existed, it mostly meant descriptive analytics or business intelligence; predictive analytics came later. After I started, things changed gradually as more resources became available. As I mentioned, I was a researcher at ISI, and at that time my desktop had 8 MB of RAM; now my laptop has 32 GB. With 8 MB of RAM, my neural network program took 48 hours to run through all its iterations, so every time I changed a piece of code, I had to wait two days to see the result. Today the same program runs in 5 to 10 minutes in Python. Much of that change comes from the availability of hardware and data. There are now many forums that offer public datasets, and open-source libraries that did not exist then. We had only languages such as C++ and Java and a few packages like MATLAB, which were licensed products; there were no free, open-source packaged libraries. As these changes happened, the corporate world got interested. I still remember working at IBM as an analytics consultant when they launched IBM Watson, one of the first corporate products based on AI. It was built around natural language rather than plain information retrieval, which intrigued me, so I moved to IBM Watson. There I worked on bridging the gap between what customers wanted and what we had at the time. We also wanted to protect the organization’s intellectual property, which is why I now have around 65 patents, all related to data science and artificial intelligence.
So in your experience, the biggest transformation has been the scale-up in processing power and hardware, along with data science and machine learning libraries, interfaces, and APIs?
Data availability was also a major transformation. Back then I had to collect primary data myself: I was working on handwriting recognition, so I asked friends and colleagues to write on A4 sheets and give them back to me, because there were no open datasets at the time.
In today’s environment, beginners try many things to break into data science. What is the single biggest thing you think they should focus on while making a data science career transition?
To switch to a data science career successfully, an aspiring data scientist needs several things. The word scientist carries a lot of weight, so we have to prove ourselves through inventions, which come with patents and papers. First, I recommend that aspiring data scientists do research and think outside the box. Second, a researcher shouldn’t be purely theoretical; they should be able to execute their logic as well. The programming language does not matter, but in whichever language you choose, you should be able to translate mathematical logic into working data science and machine learning solutions. Third, you should always be willing to learn like a student, irrespective of age, experience, or background. Growth stops when learning stops, and new machine learning algorithms keep coming up.
GPT-3 came out last year, and it is one of the most interesting algorithms. But if you don’t know what GPT-3, or even GPT, is, then you can’t call yourself a data scientist. I still love teaching and guiding others. I am currently visiting faculty at IIT Kharagpur and ISB Hyderabad, and I have been visiting faculty at several other academic institutions, mostly tier-one. That has helped me clear up a lot of doubts: what I studied as a student, I also implemented as a practitioner in the corporate world. I really love this REST concept of being a Researcher, Engineer, Scholar, and Teacher.
That’s a very nice framework for a data scientist. Most people obviously go to Google, but are there any specific online data science resources that you have relied on, or that you would suggest, that are especially helpful?
Data science and machine learning are a vast ocean. I don’t have a particular list, but when I started my data science career, I went through Dan Jurafsky’s videos, and before that I read his book on NLP. If you watch Dan’s lectures on YouTube now, he covers the latest technologies and algorithms. Andrew Ng’s open machine learning videos are likewise continually updated with the latest technologies. Self-learning is good, but for many things we need teachers. We still go to school and college; we don’t just go to the bookstore or borrow books from the library. We need a supervisor, teacher, or mentor who can guide us, and that is the reason we started the MUST research academy. As a non-profit organization, we are trying to offer a very advanced applied research program at an affordable fee.
You mentioned GPT-3. I’m curious what is so fundamentally different about it, other than the humongous dataset it was trained on. Why is it getting so much attention?
Definitely, it’s the latest one. GPT stands for generative pre-trained transformer, so it is transformer-based learning. It was released by OpenAI, and a lot of other implementations followed. My data science career began with classical learning, which is based on feature engineering. In feature engineering, I, as a human being, identify the features that distinguish, say, a dog from a cat, and then define those features for an image processing algorithm. For digit recognition, from zero to nine, I know how a 0 and a 9 look, so I define the features based on that knowledge. The domain knowledge could be in healthcare, manufacturing, or telecommunications: I use my own intelligence, knowledge, and experience to identify the features, and once they are determined, I apply a machine learning algorithm. GPT-3, by contrast, is a transformer-based neural network that can perform custom language tasks without special fine-tuning or task-specific training data. The network is deep and learns on its own, which is what makes it interesting. Its popularity is mainly due to the fact that it’s new. All the latest algorithms are compute-hungry and data-hungry: you need an adequate amount of data, and GPUs or TPUs to process it. Those are the constraints, but these algorithms are powerful.
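To make the contrast concrete, here is a minimal Python sketch of the classical, feature-engineering approach Joy describes, applied to telling a 0 from a 1. It is illustrative only. The 5x5 toy images, the three hand-picked features (ink ratio, width, and whether the centre pixel is filled), and the nearest-centroid classifier are our own hypothetical choices, not Joy’s method. What matters is that a human decides what to measure before any learning happens, whereas a deep network learns its own representations.

```python
from math import dist  # Python 3.8+

# Hypothetical 5x5 binary "images" (1 = ink), made up for illustration.
# A 0 is drawn as a ring; a 1 as a vertical stroke with a base.
ZERO = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
]
ONE = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
]


def extract_features(img):
    """Hypothetical hand-crafted features encoding how a 0 and a 1 look."""
    rows, cols = len(img), len(img[0])
    # Share of pixels with ink: a 0 uses more ink than a 1.
    ink_ratio = sum(map(sum, img)) / (rows * cols)
    # Share of columns containing any ink: a 0 is wide, a 1 is narrow.
    width = sum(any(img[r][c] for r in range(rows)) for c in range(cols)) / cols
    # Centre pixel: a 0 is hollow in the middle, a 1 is not.
    centre = float(img[rows // 2][cols // 2])
    return (ink_ratio, width, centre)


def fit_centroids(samples):
    """'Training' step: average the feature vectors for each label."""
    grouped = {}
    for img, label in samples:
        grouped.setdefault(label, []).append(extract_features(img))
    return {
        label: tuple(sum(values) / len(values) for values in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, img):
    """Assign the label whose centroid is closest in feature space."""
    features = extract_features(img)
    return min(centroids, key=lambda label: dist(features, centroids[label]))


if __name__ == "__main__":
    centroids = fit_centroids([(ZERO, 0), (ONE, 1)])
    thin_one = [[0, 0, 1, 0, 0] for _ in range(5)]
    print(predict(centroids, thin_one))  # → 1
```

If the features are poorly chosen, no classifier downstream can recover; that dependence on human insight is exactly the constraint that deep, self-learning models relax.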
At ProjectPro, for example, we help data scientists get their work done faster by giving them reusable templates. From your experience, are there any tactics or hacks you can share for being more efficient and releasing projects faster?
To deliver my data science projects faster, I always prefer a spiral model, so I always start with something like an end-to-end hello world. This is mainly because a data science project does not exist in isolation: we always have to think about how to integrate the solution into the appropriate product pipeline and how to hand off to the engineering, architecture, and even user interface design teams. These are all mutually dependent, so we cannot focus only on accuracy. I prefer to get a basic machine learning algorithm working end to end first, and then focus on improving its accuracy without disrupting the rest of the data science or machine learning pipeline.
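The spiral, end-to-end-first approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Joy’s actual code: the toy dataset, the "churn"/"stay" labels, and every function name are hypothetical. The first iteration wires up every stage (load, preprocess, train, evaluate) with a trivial majority-class baseline. The second iteration swaps in a nearest-centroid model through the same interface and leaves the rest of the pipeline untouched.

```python
from collections import Counter

# Hypothetical toy data standing in for a real source: (features, label) pairs.
TRAIN = [
    ([1.0, 0.2], "churn"), ([0.9, 0.1], "churn"), ([0.8, 0.3], "churn"),
    ([0.1, 0.9], "stay"), ([0.2, 0.8], "stay"),
]
TEST = [([0.95, 0.15], "churn"), ([0.15, 0.85], "stay"), ([0.05, 0.95], "stay")]


def load_data():
    """Stage 1: ingestion. Swap in a real data source in a later spiral."""
    return TRAIN, TEST


def preprocess(rows):
    """Stage 2: cleaning. Here it only drops rows with missing values."""
    return [(x, y) for x, y in rows if None not in x]


def baseline_model(train):
    """Iteration 1 ('hello world'): always predict the most common label."""
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: majority


def centroid_model(train):
    """Iteration 2: nearest-centroid classifier with the same interface."""
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    centroids = {y: [sum(col) / len(col) for col in zip(*xs)] for y, xs in groups.items()}

    def predict(x):
        # Squared Euclidean distance to each label's centroid.
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return predict


def run_pipeline(build_model):
    """Stages 3-4: train and evaluate. Only build_model changes between spirals."""
    train, test = (preprocess(rows) for rows in load_data())
    model = build_model(train)
    correct = sum(model(x) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    print(f"baseline accuracy: {run_pipeline(baseline_model):.2f}")  # → 0.33
    print(f"centroid accuracy: {run_pipeline(centroid_model):.2f}")  # → 1.00
```

Because each stage sits behind a plain function, later spirals can replace the data source or the model independently, so the pipeline other teams integrate with keeps working while accuracy improves.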
If you are looking to dig deep into the world of data rather than feel lost, there is no better time than now. ProjectPro is on a mission to change how data scientists get their work done, and our dedicated team is constantly working to improve the way data scientists deliver their projects. When we say we provide experience with real business problems, we mean it. Every data science project has been carefully designed by industry experts to give users verified, reusable project solutions. Each project comes with datasets, code, documentation, and videos that explain the code, providing just-in-time learning.