What You Don’t Know Will Hurt You
More people are studying data science than ever before, yet companies still struggle to hire qualified data scientists. While data scientists are graduating with plenty of technical skills demanded by employers, they lack four critical data science skills that simply are not being taught at universities. If you want to succeed, you must close these knowledge gaps.
1. How to write efficient code regardless of programming language
Everybody learns a few programming languages while studying, and hopefully, these will be enough for the job you get right out of university. Unfortunately, the most popular programming languages used by companies change often, especially as new technologies develop and new approaches to data science become standard. You will inevitably need to learn others in order to stay relevant.
Like most data scientists, my team at Evo primarily uses Python, SQL, and R. Some team members learned them all at university, while others had to pick up one or another while already working or for a particular project. That is perfectly normal and not in itself a problem. Even people who know the basics of a programming language will inevitably have some knowledge gaps and need to learn new functions to complete their work.
In some ways, programming is like painting. You start with a blank canvas and certain basic raw materials. You use a combination of science, art, and craft to determine what to do with them.
Andrew Hunt, Author of The Pragmatic Programmer
I myself only learned C, R, and MATLAB in university. I had to learn some Python and Stata on my own through trainings at work and over the internet. I was mostly motivated to learn these new tools in response to a particular problem, as some issues are best dealt with in a particular programming language.
The real problem arises when someone only knows how to code in a specific language rather than mastering the general programming concepts that apply to any language. Too many universities today teach the details of a language without putting as much focus on the universal principles that apply to any code. My own strategy for learning a new programming language is to start from its similarities with the languages I already know. Most programming patterns carry over from one language to another, but only if you have learned how to write good code in the first place. Graduating without this critical skill will hold you back.
2. How to know when it is time to switch or integrate languages
Related to the above knowledge gap, many students are graduating without fully mastering the limits and functionalities of the programming languages they learn. It is not enough to understand how to use a particular language; you also have to know when a language is the most appropriate tool for your purposes — and when you would be better off choosing another.
As mentioned previously, some topics are best served by a particular language. You will just be wasting time if you spend too much effort trying to force a programming language to do a task it is not well adapted to.
For example, R is extremely useful for manipulating dataframes with many rows using the dplyr package, but only within certain limits. You can run into problems when you have to deal with massive datasets containing millions or even billions of data points. In these cases, you may face memory issues because the tables are loaded as dataframes in the working environment. Using a database, and therefore a language like SQL, would be a better idea, even though SQL is not as flexible as R. The ability to recognize this type of distinction is vital, and unfortunately it is rarely given significant attention at the university level.
Interestingly, even this example is too simplified to represent how we use programming languages in the business world. Often you do not need to choose between two options; rather, you need to figure out how to combine both languages, each used for what it does best, to accomplish a larger task.
R and SQL can communicate. You simply have to understand how to integrate them correctly. Sadly, this is rarely mentioned at university. Programming languages are instead taught as standalone topics. This is incredibly limiting when it comes to real-world application. After all, business solutions often leverage many languages to build complex architectures that carry out a wide array of tasks. It’s crucial to understand how these languages work together in order to work within these frameworks. When you focus solely on mastering a particular language for a class, you often miss the nuance.
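The same division of labor can be sketched in Python, another language mentioned above, standing in here for R: let the database do the heavy aggregation in SQL and pull only the small summary into the working environment. This is a minimal illustration rather than a description of any production setup. The `sales` table, its columns, the sample rows, and the `revenue_by_store` helper are all hypothetical, and an in-memory SQLite database stands in for a real database server.

```python
import sqlite3

# Hypothetical data: an in-memory SQLite database stands in for a real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (store, amount) VALUES (?, ?)",
    [
        ("Turin", 120.0),
        ("Turin", 80.0),
        ("Milan", 200.0),
        ("Milan", 50.0),
        ("Rome", 75.0),
    ],
)


def revenue_by_store(connection):
    """Aggregate inside the database and return only the small summary."""
    query = """
        SELECT store, SUM(amount) AS revenue
        FROM sales
        GROUP BY store
        ORDER BY store
    """
    # Only one row per store crosses into Python memory, not every sale.
    return dict(connection.execute(query).fetchall())


summary = revenue_by_store(conn)
print(summary)
```

With billions of rows instead of five, the database would still return one row per store, so the memory pressure stays on the system built to handle it while the flexible language works with a table small enough to explore or plot.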
3. How to show the results of your analysis
Universities tend to do an excellent job of teaching data scientists how to collect data and analyse the results; however, they rarely cover how to share those results with the world. In many classes, you deliver your code and your technical analysis. You learn to frame your analysis for experts, not for the average person.
Once you are working, your audience is almost never a data science or coding expert. They need to quickly grasp the big picture. This requires data visualization skills and clear, concise business writing.
Most universities do not require data science students to develop their data visualization skills in any significant way. Usually, a single graph with little context suffices. That is not enough in any business setting. Your charts should instantly summarize the point your analysis makes. A picture is worth a thousand words, and an effective chart can say much more. Every aspiring data scientist should focus specifically on developing data visualization skills, even if their university does not require it.
Shiny, for example, is a great tool for anyone with R programming experience to develop easily adaptable templates that generate clear charts, tables and graphs that illustrate patterns in your data. Instead of creating new visuals for your analysis every time, you can build dashboards that illustrate real-time developments that impact the analysis and other key information businesses need to know. Of course, sometimes you just need a simple graph fast. In those cases, you can never go wrong with Microsoft Excel.
Numbers have an important story to tell. They rely on you to give them a voice.
Stephen Few, Information Technology Professor and Consultant
Beyond graphs and charts, data scientists must be able to explain what the data suggests in a non-technical way. Universities may need you to explain the technicalities, but outside this setting, plain language matters most. Unless you have practiced sharing your conclusions clearly without technical language, you will not succeed as a data scientist in the real world — and you may not even get past the interview stages that require this skillset.
4. How to use statistics with a business mindset
Statistics are a key area of study for any data scientist. You will likely have years of statistics by the time you graduate. Despite that, or perhaps because of that, many graduates actually leave university with an approach to statistics that stands in complete opposition to the ways that statistics are used in the business world.
If you complete an analysis that shows something has only a 20% chance of happening, you would not dare suggest to your professor that this is a good option. The probability is far too low. You are taught to ignore anything below at least 90% probability (if not much higher). At university, you are constantly taught to approach statistical probabilities cautiously and to ignore anything that does not reach your threshold.
In business, this cautiousness does not hold up. Ask any CEO if they would take an action with a 20% chance of making them $1 billion, and you will get a yes every time. The risk calculation works differently in the business world. Instead of looking at the pure statistics, data scientists need to be business scientists who understand what motivates executives and what best serves the needs of a company.
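One way to make that reasoning explicit is an expected-value calculation, which weighs each outcome by its probability instead of rejecting anything below a fixed threshold. The sketch below uses the 20% chance and $1 billion payoff from the example above. The $50 million cost of acting and the `expected_value` helper are made up purely for illustration.

```python
def expected_value(p_success, payoff, cost):
    """Expected net gain of an action that costs `cost` up front and
    pays `payoff` with probability `p_success` (and nothing otherwise)."""
    return p_success * payoff - cost


p_success = 0.20        # from the example in the text
payoff = 1_000_000_000  # $1 billion, from the example in the text
cost = 50_000_000       # hypothetical cost of acting (an assumption)

ev = expected_value(p_success, payoff, cost)
print(f"Expected net gain: ${ev:,.0f}")
```

Under these assumed numbers the expected net gain is positive, so an 80% chance of failure does not by itself make the action a bad bet. The decision turns on whether the expected gain outweighs the cost and whether the company can absorb the loss if the bet fails.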
Programmers are not to be measured by their ingenuity and their logic but by the completeness of their case analysis.
Alan J. Perlis, Computer Science Pioneer and First Recipient of the Turing Award
At Evo, the company that makes AI-driven automated pricing and supply chain recommendations where I am currently the Senior Data Scientist, we are confronted by this reality on a daily basis. We have to make sure statistics are not costing our clients opportunities. Instead, we have developed predictive tools that take into account the full business context and deliver recommendations according to each company’s needs and goals.
To succeed as a data scientist, graduates need to learn how to step away from pure statistics and focus on real-world application. A lack of relevant contextual knowledge could keep you from getting a job.
Stepping outside your comfort zone
These four skills are absolutely critical for success in data science. You cannot build a career without developing them. If you, like many students, are not learning them in university, you must dedicate yourself to learning them elsewhere. This may require stretching yourself and stepping outside your comfort zone, but if you do not, you will never build a lasting career as a data scientist.
When I am hiring data scientists at Evo, I am not just looking for technical abilities (although those are critical too). I am looking for these four skills. We hire team members who are ready to move beyond the academic practice of data science to apply it in business, and that means you have to develop these abilities.
Thanks to Kaitlin Goodrich for her contribution.
About the author
Giuseppe Craparotta is the most senior data scientist at Evo.
Previously, he worked as an intern at an aerospace company and in Product Lifecycle Management consultancy. He earned an MSc in Mathematical Engineering from the Polytechnic University of Turin, and he recently received a PhD in Pure and Applied Mathematics.
His research interests span applications of advanced statistics; in particular, he is focusing on sales forecasting for fashion. He loves hiking and singing!