I sometimes receive emails asking for guidance related to data science, which I answer here as a data science advice column. If you have a data science related quandary, email me at
[email protected]
. Note that questions are edited for clarity and brevity. Previous installments of the data science advice column include
how should you structure your data science and engineering teams
and
advice to a student interested in deep learning
Q:
This question is a composite of a few emails I’ve received from people with some limited programming skills, living outside the Bay Area, and interested in becoming data scientists, such as the following:
Q1.
I’m a financial analyst at a major bank. I’m in the process of pivoting to tech as a software engineer and I’m interested in machine learning, which lead me to your post on
The Diversity Crisis in AI
. Do I need a masters or PhD to work in AI?
Q2.
I am in the 6th year of my PhD in pure mathematics and am going to graduate soon. I am really interested in data science and I want to know if I want to get a job in this area, what can I do or what should I prepare myself so that I can have the skill sets that the companies need? I’m now reading books and trying to find some side projects that I can do. Do you have any ideas where I can find these projects that will interest employers?
Q3.
I have a graduate STEM degree and have worked as both a researcher and a teacher. I am currently in the midst of a career transition and looking for industry roles that might need both analytical and instructional skills from their employees. My knowledge is more on the science side though rather than in software. The Internet can get to be a pretty overwhelming place to pull information from without the actual sharing. Do you have recommendations for programming courses and workshops that are also friendly to a teacher’s budget? And what coding languages or skills would you say would be most helpful to focus on developing?
A:
I think of myself as having a somewhat non-traditional background. At first glance it may seem like I have a classic data science education: I took 2 years of C++ in high school, minored in computer science in college (with a math major), did a PhD related to probability, and worked as a quant. However, my computer science coursework was mostly theoretical, my math thesis was entirely theoretical (no computations at all!), and over the years I used less and less C/C++ and more and more MATLAB (why oh why did I do that to myself?!? Somehow I found myself even writing web scrapers in MATLAB…) My college education taught me how to prove if an algorithm was NP-complete or Turing computable, but nothing about testing, version control, web apps, or how the internet works. The company where I was a quant primarily used proprietary software/languages that aren’t used in the tech industry.
After 2 years working as a quant in energy trading, I realized that my favorite parts of the job were programming and working with data. I was frustrated with the bureacracy of working at a large company, and of dealing with outdated and proprietary software tools. I wanted something different, and decided to attend the data science conference
Strata
in February 2012 to learn more about the world of Bay Area data science. I was totally, absolutely, blown away. The huge enthusiasm for data, all the tools I was most excited about (and several more I’d never heard of before), the stories of people who had quit their previous lives in academia or established companies to work on their passion at a startup… it was so different and refreshing to what I was used to. After Strata, I spent a few extra days in San Francisco, interviewing at startups and having coffee with some distant acquaintances that I found had moved to SF - everyone was very helpful and also apparently addicted to
Four Barrel Coffee
(almost everyone I spoke with suggested meeting there!)
I was star-struck, but actually switching into tech made me feel totally out of it… For my first many conversations and interviews with people in tech, I often felt like they were speaking another language. I grew up in Texas, spent my 20s in Pennsylvania and North Carolina, and didn’t know anyone who worked in tech. I’d never taken a statistics class and just thought of probability as real analysis on a space of measure 1. I didn’t know anything about how start-ups and tech companies worked. The first time I interviewed at a start-up, one interviewer boasted about how the company had briefly achieved profitability before embarking on rapid expansion/hiring. “You mean this company isn’t profitable!?!?” I responded in horror (Yes, I actually said that out loud, in a shocked tone of voice). I now cringe in embarrassment at the memory. In another interview, I was so confused by the concept of an
“impression”
(when an internet ad is displayed) that it took me a while to even get to the logic of the question.
I’ve been here five years now, and here’s some things I wish I’d known when I was starting my career move. I’m aware that I’m white, a US citizen, had a generous fellowship in grad school and no student debt, and was single and childless at the time I decided to switch careers, and someone without these privileges will face a much tougher path. While my anecdotes should be taken with a grain of salt, I hope that some of these suggestions turn out to be helpful to you:
Becoming ready for a move to data science
-
Most importantly: find ways to work whatever you want to learn into your current job. Find a project that involves more coding/data analysis and that would be helpful to your employer. Take any boring task you do and try to automate it. Even if the process of automation makes it take 5x as long (and even if you only do the task once!), you are learning by doing this.
-
Analyze any data you have: from research for an upcoming purchase (i.e. deciding which microwave to buy), data from a personal fitness tracker, nutrition data from recipes you’re cooking, pre-schools you’re looking at for your child. Turn it into a mini-data analysis project and write it up in a blog post. E.g. if you are a graduate student, you could analyze grade data from the students you are teaching