Pragmatic Works Nerd News

3 Key Factors to Any Successful Data Science Project

Written by Brad Gall | Jul 06, 2018

Are you implementing, or looking to implement, data science projects in your organization? If you’re just starting out, today I’d like to give you the 3 key factors that make any data science project successful.

1. Ask a sharp question of your data. A model can only give us a specific answer if the question we ask has one. In other words, ask a vague question and you’ll get a vague answer.

For example, in a customer churn scenario, we can ask ‘Is this customer going to cancel their subscription in the next 3 months?’ There is a specific yes-or-no answer here that the model can determine and give back to us. If you make the question vaguer, there is no single answer for the model to learn, and it won’t be as accurate as you’d like.
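To make the churn question concrete, here is a minimal sketch of how it could be turned into a yes/no label for each customer. The customer IDs, the dates, the `churned_within` helper and the choice of 90 days as a stand-in for "3 months" are all assumptions made for illustration; they are not from any real data set.

```python
from datetime import date, timedelta

def churned_within(cancel_date, as_of, days=90):
    """Return True if the customer cancelled within `days` after `as_of`."""
    if cancel_date is None:  # still subscribed
        return False
    return as_of < cancel_date <= as_of + timedelta(days=days)

# Hypothetical customers: id -> cancellation date (None = never cancelled)
customers = {
    "A": date(2018, 2, 15),
    "B": None,
    "C": date(2018, 9, 1),
}

as_of = date(2018, 1, 1)
labels = {cid: churned_within(d, as_of) for cid, d in customers.items()}
print(labels)  # {'A': True, 'B': False, 'C': False}
```

Each customer now has one specific answer, which is exactly what a model needs to learn from.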

2. Prepare your data. I’m sure you’ve heard of ‘garbage in, garbage out’, right? This applies to a data science or machine learning project as well. The data coming in needs to be as clean as we can get it, so we can pass it through that model, train the model and get accurate results out.

One example is to look for rows whose values don’t match the type the column should hold. If a column is primarily text and we have rows with numbers that don’t make sense there, that will throw the model off.
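A type check like this can be sketched in a few lines. This version uses only the Python standard library and assumes the raw values arrive as strings, as they would from a CSV file; the column names and the `find_type_mismatches` helper are hypothetical.

```python
def is_number(value):
    """True if the string parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False

def find_type_mismatches(rows, column, expect_numeric):
    """Return the indexes of rows whose value does not match the expected type."""
    return [
        i for i, row in enumerate(rows)
        if is_number(row[column]) != expect_numeric
    ]

# Hypothetical raw rows, all values read in as text
rows = [
    {"plan": "basic", "monthly_fee": "9.99"},
    {"plan": "42", "monthly_fee": "19.99"},     # number in a text column
    {"plan": "premium", "monthly_fee": "n/a"},  # text in a numeric column
]

bad_plan = find_type_mismatches(rows, "plan", expect_numeric=False)
bad_fee = find_type_mismatches(rows, "monthly_fee", expect_numeric=True)
print(bad_plan, bad_fee)  # [1] [2]
```

What to do with the flagged rows (fix, drop or investigate) depends on the data; the sketch only finds them.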

Also, get rid of missing data. A column that is only 10% populated isn’t going to be of much use to our model when it makes predictions.
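One way to act on this is to measure how populated each column is and drop the sparse ones. In the sketch below, the 50% cutoff (`min_fill`), the column names and the helper functions are assumptions for illustration; the right threshold depends on your data.

```python
def fill_rate(rows, column):
    """Fraction of rows that have a non-empty value in `column`."""
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)

def drop_sparse_columns(rows, min_fill=0.5):
    """Keep only columns populated in at least `min_fill` of the rows."""
    keep = [c for c in rows[0] if fill_rate(rows, c) >= min_fill]
    return [{c: row[c] for c in keep} for row in rows]

# Hypothetical table: fax_number is populated in only 1 of 10 rows (10%)
rows = [
    {
        "customer_id": i,
        "tenure_months": 6 + i,
        "fax_number": "555-0100" if i == 0 else None,
    }
    for i in range(10)
]

cleaned = drop_sparse_columns(rows)
print(list(cleaned[0]))  # ['customer_id', 'tenure_months']
```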

Another point in data preparation is that the model needs a table of numbers and words. We can consume all kinds of data, such as unstructured video or audio files, and perhaps determine the sentiment expressed in them. What we need to do in the model layer is take that unstructured data and somehow map it into a table, so we can do analysis on it, train our models and produce accurate predictions of outcomes.

We also need to create features that are going to best help answer our question. For instance, we may have a couple of columns in our data set, maybe a start and end time, but really the column that helps us predict or answer the question would be the duration between these two.

A feature is just a calculation across multiple columns in our data set that gives us the exact number or word we’re looking for, so we can run it through our model, train the model on it and then use it to answer our question.
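The start and end time example can be sketched like this. The column names, the timestamp format and the `add_duration_minutes` helper are assumptions made for illustration.

```python
from datetime import datetime

def add_duration_minutes(row):
    """Return a copy of the row with a duration feature derived from two columns."""
    start = datetime.fromisoformat(row["start_time"])
    end = datetime.fromisoformat(row["end_time"])
    return {**row, "duration_minutes": (end - start).total_seconds() / 60}

# Hypothetical sessions; the second one crosses midnight
sessions = [
    {"start_time": "2018-07-06 09:00:00", "end_time": "2018-07-06 09:45:00"},
    {"start_time": "2018-07-06 23:30:00", "end_time": "2018-07-07 00:15:00"},
]

featured = [add_duration_minutes(s) for s in sessions]
print([row["duration_minutes"] for row in featured])  # [45.0, 45.0]
```

Neither timestamp on its own says much, but the duration is a single number the model can use directly.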

3. The last step is to create and train a model that can answer your question. After all the work in steps one and two, we need to pick a model and train it with some of that data, preferably historical data that already contains the answers. That gives us a model we can pass new data to in order to answer questions moving forward.
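To show the shape of this step, here is a minimal sketch that trains on historical data with known answers and then checks the result on rows held back from training. The "model" is a deliberately trivial single-threshold rule (a decision stump), swapped in so the example needs no libraries; it is not a recommendation of which model to pick. The data, the feature (logins in the last month) and the function names are all hypothetical.

```python
def train_stump(examples):
    """Pick the threshold on one feature that best separates the labels.

    The rule predicts churn (True) when the feature is at or below the threshold.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted({x for x, _ in examples}):
        correct = sum((x <= threshold) == y for x, y in examples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def predict(threshold, x):
    """Answer the question for a new customer."""
    return x <= threshold

# Hypothetical history: (logins in the last month, did the customer churn?)
history = [
    (1, True), (2, True), (3, True), (8, False), (9, False), (12, False),
    (2, True), (10, False),
]

# Train on the first rows, hold the rest back to check the model
train, holdout = history[:6], history[6:]
threshold = train_stump(train)
accuracy = sum(predict(threshold, x) == y for x, y in holdout) / len(holdout)
print(threshold, accuracy)  # 3 1.0
```

Holding some historical rows back matters: it is the only way to see how the model does on data it has not already seen.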

So, focus on these key factors, put that model into use and get some ROI from it, and you will have a successful project. If you have more questions about data science projects or how you may be able to execute them in your organization, you’re in the right place. Click the link below or contact us; we’d love to help.