Pragmatic Works

Enabling your business intelligence enterprise.

Brian Knight

Customer Labs: SQL Server Data Mining Bagging

Recently I had a client who was getting very good performance out of the decision tree algorithm on the top cases, but the predictions for the cases in the middle left room for improvement. We tried each of the other algorithms, such as Logistic Regression and Naive Bayes, but each had its own flaws. In every case, the customer loved parts of each of the four primary SQL Server data mining algorithms, but each one either left some key cases down at the bottom of the pile or promoted cases too aggressively.

Enter a concept called data mining bagging. With data mining bagging, you can take all four algorithms that give you reasonable performance and combine their scores into a more balanced prediction. In this customer's case, we chose the top four algorithms for their data: Decision Tree, Clustering, Naive Bayes and Logistic Regression. We were trying to predict the probability of a salesperson wanting to chase a given customer, based on the odds of them getting the project. All the salespeople voted on whether each customer was a good customer or not, and these votes were all introduced as evidence.

So the end game was this. We took all four algorithms and ran their respective queries in SSIS, loading one table per algorithm that held each customer and the probability of it being a good customer (the score). Then we bagged the results in SSIS by taking the four tables and averaging their scores for each customer. The result improved the already splendid results from data mining by 40%.
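The averaging step itself is simple enough to sketch outside of SSIS. The Python below is only an illustration of the idea, not the package we built: the function name `bag_scores`, the customer names and every score in it are made up for the example, and it assumes each algorithm's table has already been reduced to a customer-to-score mapping.

```python
from statistics import mean


def bag_scores(score_tables):
    """Average each customer's score across every model that scored it.

    score_tables maps a model name to a {customer: probability} dict,
    standing in for the one-table-per-algorithm layout described above.
    """
    # Collect every customer that appears in at least one model's table.
    customers = set().union(*(table.keys() for table in score_tables.values()))
    bagged = {}
    for customer in customers:
        # Only average over the models that actually scored this customer.
        scores = [table[customer]
                  for table in score_tables.values()
                  if customer in table]
        bagged[customer] = mean(scores)
    return bagged


# Hypothetical scores, invented purely for illustration.
score_tables = {
    "decision_tree":       {"Contoso": 0.90, "Fabrikam": 0.20, "Northwind": 0.55},
    "clustering":          {"Contoso": 0.80, "Fabrikam": 0.30, "Northwind": 0.30},
    "naive_bayes":         {"Contoso": 0.85, "Fabrikam": 0.10, "Northwind": 0.60},
    "logistic_regression": {"Contoso": 0.85, "Fabrikam": 0.20, "Northwind": 0.55},
}

bagged = bag_scores(score_tables)

# Rank customers from most to least promising by their averaged score.
ranked = sorted(bagged, key=bagged.get, reverse=True)
for customer in ranked:
    print(f"{customer}: {bagged[customer]:.2f}")
```

In this made-up data, the clustering model scores Northwind well below the other three, and the average pulls that one low score back toward the consensus instead of letting it decide the ranking.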

By bagging your models, you ensure that one weak model doesn't inappropriately grade a case. Each algorithm essentially votes on each case, and the more models that vote for a case, the higher it is pushed up. It also helps you get past the weaknesses that each model will inherently have. After seeing these results in SQL Server Analysis Services, I would highly recommend using bagging on future projects.

-- Brian Knight
