Are you trying to get into data science and not sure which languages to learn? Well, you are on the right blog. This is exactly what we’re going to talk about today. I’m going to share what languages you should get started with.
By the end of this article, you’ll hopefully have a really good idea of which programming languages to focus on if you’re trying to get into a data science role. Before we jump into the details, I do want to say that the programming languages I’m going to mention today are based on my own experience.
Alright, so which programming languages should you be focusing on? Let’s say you’ve figured out that you want to be a data scientist and you’re trying to decide which languages to start with. First of all, you should read some of my previous blogs, where I talk about what skills you should learn.
I do not recommend learning programming languages as the first skill for entering or transitioning into data science. I actually recommend that you build your fundamentals in statistics and machine learning before you jump into the coding part. If you want a more in-depth explanation of why I think that’s the case, you can go through some of my previous blogs where I talk about that.
Let’s say you’ve figured out that you want to get into data science and you’ve narrowed down the specific role you want to target. In this blog, I’m going to focus on the data scientist generalist role. A generalist is somebody who is a mix of a research scientist, a machine learning engineer, and a data analyst, who has a good business understanding and is able to combine all of that to solve business problems with data science methodologies.
For example, this is what the job descriptions for a data scientist generalist look like. Alright, so when it comes to programming languages, there are a lot of options out there, and depending on your role and focus area, the preferred language could be different. But there are three that come up over and over again regardless of which niche in the data science domain you are in, and those are the ones I’m going to focus on.
The first one, no prizes for guessing, is SQL. And I know some of you may be surprised that I’m mentioning SQL.
A lot of the time, SQL is thought of as a data analyst’s go-to language. Yes, that is correct. But as a data scientist, you also need to learn SQL, because how else are you going to access the data? You need SQL to access and pull your data and do joins, or whatever else you need to do to the data, so that you can actually do work on top of it.
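To make that a little more concrete, here is a minimal sketch of what "pull your data and do joins" can look like. It uses Python’s built-in `sqlite3` module with an in-memory database so you can run it anywhere. The `customers` and `orders` tables, their columns, and the function names are all hypothetical and made up for illustration; your real data will live in whatever database your company uses.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
        INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.5), (12, 2, 12.0);
        """
    )
    return conn


def total_spend_per_customer(conn):
    """Join customers to orders and total each customer's spend.

    A LEFT JOIN keeps customers with no orders; COALESCE turns their
    missing total into 0 instead of NULL.
    """
    query = """
        SELECT c.name,
               COUNT(o.order_id) AS n_orders,
               COALESCE(SUM(o.amount), 0) AS total
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for name, n_orders, total in total_spend_per_customer(conn):
        print(name, n_orders, total)
```

The query itself is the part that matters here. The Python around it is only there so the example is runnable on its own.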
So in my opinion, a data scientist should be very well versed in SQL. That’s definitely a must-have. You should know where the data lives and be able to access it. You should be able to write your own queries, join multiple tables, and get your data prepared for your data science application. You’ve probably also heard that there is a lot of data cleaning and data manipulation involved.
A lot of the time, the first step of almost all data manipulation or data cleaning involves some sort of SQL. There are so many resources on SQL out there, so definitely learn SQL. I’m going to link one of my favorite courses on Coursera, which I have personally recommended to a lot of people, so you can look into that if you are not familiar with SQL.
But there are hundreds of free resources out there. So you don’t have to listen to my recommendation. Okay, so let’s say you’ve learned SQL and now you need to figure out what is the programming language that you want to learn next.
Here I’m going to present you with two options, and I lean toward one more than the other. These are the two languages that are talked about most often.
One is Python, the other is R. The question is, do you need to learn both languages? No, you don’t. Do you need to pick one over the other? Yes. And here’s why.
When you’re just starting out, you don’t want to overwhelm yourself by learning both languages together. You want to pick one, build your knowledge of it, and try out data science applications in that language before you go too wide, take on two languages at once, and get confused. For that reason alone, I recommend that you limit the number of languages you learn at a time, because it can get overwhelming.
Python and R are both amazing data science languages. Both have amazing libraries and analysis toolkits, so I don’t think either choice is wrong.
But if you do have to pick one, my personal preference is Python. Here is a little bit of the story of my experience when I transitioned into data science. I first learned R, and I was working in R. It was great. I was doing my work in it and, after solving whatever problem I was working on, publishing it and writing papers on it.
My entire team worked in R. We were all data scientists and we all specialized in it. Then there was one project where we had to partner with the software engineering team, and they had to implement the model that we had researched and built and take it into production.
Since we had written it in R, it took them a very long time to translate that R code into a production-level language. I think they were using C#; at that time, they were using Java and C#.
The project got delayed because the translation took so long, and my team had to do a reflection and highlight what the problem was.
It ended up being that R is very hard to communicate in across two different job families, especially when they’re working closely together and the other job family doesn’t know how to work with it. Simply put, because they could not take the R code and copy and paste it into a production environment, they had to translate it into a different language that could live in a production environment before they could roll it out.
That was the moment when my team made a conscious decision to move from R to Python. So as a team, all of us data scientists transitioned from R to Python. Looking back, that did improve the timelines of the projects we worked on going forward, because the software engineers understood Python very well.
And even those who didn’t were able to quickly figure out what each function and each code snippet was doing. From that experience alone, my recommendation is to pick Python as your first language. When you’re just learning, you don’t know what your work will look like when you get a job at the end of your learning process.
You don’t know if you’re going to be working in a team that has to push code to a production environment. For that reason alone, if I were to start all over again, I would definitely encourage starting with Python.
Now, that doesn’t mean you shouldn’t learn R. You can definitely learn R. I feel like if you know Python, it’s very easy to pick up R. And R does have a lot of cool libraries that you can use. For example, its time series analysis libraries are some of the coolest ones. I find it much easier to do time series analysis in R than in Python.
So there are definitely benefits to using R. For example, when I’m doing my work and I have to do something quick and dirty, I actually prefer using R over Python, because I know how quickly I can get something out. So there are definitely benefits to both languages. When you are figuring out what to learn, yes, you start by narrowing down your focus, but eventually you can expand on it and learn R. Anyway, I wanted to talk about this because I know a lot of people get stuck on which languages to learn when they’re starting out.
So I would say go with these options, because they are tried and tested. You can pick a combination of SQL and Python or SQL and R, and neither option is wrong. Again, it’s personal preference, and depending on your target company and target role, SQL and R might be preferred over SQL and Python.
That’s where you would do some research and figure out what works for you. So these are the three languages I mentioned, and they are all very popular options.
And I’m sure that once you start working, you will eventually pick up a lot more languages. For example, at my last company, I ended up learning Scala. That wasn’t my plan, but I had to do a project that required me to use Scala.
So I ended up learning Scala on the job. Learning these three languages doesn’t mean you’re going to be stuck with them for the rest of your career. Eventually, you’re going to get to learn a lot more than this.
But the reason I wanted to focus on only these three is that when you’re starting out, it’s very easy to get overwhelmed, and it’s very easy to get discouraged when you’re trying to learn too many things and aren’t able to grasp them. Let me know in the comments what your preferred language for data science is. Do you prefer R? Do you prefer Python? What are your thoughts on SQL?
I hope you’re having a beautiful day. Talk to you later. Bye.