Data is the new oil and, truth be told, only a few big players have the strongest hold on that currency. The Googles and Facebooks of this world are so generous with their latest machine learning algorithms and packages (they give those away freely) because the entry barrier to the world of algorithms is pretty low right now. Open source has come a long way from being christened evil by the likes of Steve Ballmer to being an integral part of Microsoft. And plenty of open source initiatives are propelling the vehicles of data science, digital analytics, and machine learning. Standing in 2018, we can safely say that algorithms, programming frameworks, and machine learning packages (or even tutorials and courses on how to learn these techniques) are not the scarce resource; high-quality data is.

This often creates a complicated issue for beginners in data science and machine learning. I faced it myself years back when I started my journey on this path. Let me also be very clear that in this article I am only talking about the scarcity of data for learning purposes, not for running any commercial operation. It is not a discussion about how to get quality data for the cool travel or fashion app you are working on. That kind of consumer, social, or behavioral data collection presents its own issues. However, even something as simple as having access to quality datasets for starting one's journey into data science and machine learning turns out to be not so simple after all.

Critical for self-driven data science

Data science is hot and selling. And people are moving into data science. They are changing careers, paying for boot camps and online MOOCs, and building networks on LinkedIn. But many such new entrants have difficulty maintaining the momentum of learning the new tradecraft once they are past the regular curriculum of their course and into an uncertain zone.

What problem to solve? Which MOOC to focus on? What Kaggle competition to take part in? What new ML package to learn? What kind of projects to showcase on GitHub? How much mathematics skill to acquire?

Basically, how to build a great data science portfolio? As per a highly popular article, the answer is by doing public work, e.g.