Data Mining For Dummies
Data Mining For Dummies takes you step by step through a real-world data-mining project using open-source tools that allow you to get immediate hands-on experience working with large amounts of data. You'll gain the confidence you need to start making data-mining practices a routine part of your successful business. If you're serious about doing everything you can to push your company to the top, Data Mining For Dummies is your ticket to effective data mining.
Catching the Data-Mining Train
You've picked an exciting moment to become a data miner.
By some estimates, more than 15 exabytes of new data are now produced each year. How much is that? It's really, ridiculously big - that's how much! Why is this important? Most organizations have access to only a teeny, tiny fraction of that data, and they aren't getting much value from what they have.
Data can be a valuable resource for business, government, and nonprofit organizations, but quantity isn't what's important about it. A greater quantity of data does not guarantee better understanding or competitive advantage. In fact, used well, a little bit of relevant data provides more value than any poorly used gargantuan database. As a data miner, it's your mission to make the most of the data you have.
This chapter goes over the basics of data mining. Here I explain what data miners do and the tools and methods they use to do it.
Getting Real about Data Mining
Maybe you've heard news reports or ads hinting that all you need to make valuable information pop out like magic is a big database and the latest software. That's nonsense. Data miners have to work and think to make valuable discoveries.
Maybe you've heard that to get results out of your database, you must first hire one of a special breed of people who have nearly super-human knowledge of data, people known to be very expensive, nearly impossible to find, and absolutely necessary to your success. That's nonsense, too. Data miners are ordinary, motivated people who complement their business knowledge with the fundamentals of data analysis.
Data mining is not magic and not art. It's a craft, one that mere mortals learn every day. You can find out about it, too.
Not your professor's statistics
Perhaps you took a class in statistics a long time ago and felt overwhelmed by the professor's insistence on rigorous methods. Relax. You're out to find information to support everyday business decisions, and many everyday business problems can be solved using less formal analysis methods than the ones you learned at school. Give yourself some slack.
How do you give yourself slack? By data mining, that's how.
Data mining is the way that ordinary businesspeople use a range of data analysis techniques to uncover useful information from data and put that information into practical use. Data miners use tools designed to help the work go quickly. They don't fuss over theory and assumptions. They validate their discoveries by testing. And they understand that things change, so when the discovery that worked like a charm yesterday doesn't hold up today, they adapt.
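To make "validate their discoveries by testing" concrete, here is a minimal Python sketch of one common approach: find a rule on one part of the data, then check it on a part you set aside. Everything in it is an assumption made for illustration, not something from a real project: the customer data is synthetic, and the names make_customers, discover_threshold, and accuracy are hypothetical helpers.

```python
import random


def make_customers(n, seed):
    """Build synthetic (hypothetical) customers as (first_visit_spend, is_high_spender)."""
    rng = random.Random(seed)
    customers = []
    for _ in range(n):
        first_visit = rng.uniform(0, 100)
        # Assumed relationship: bigger first visits tend to mean high spenders,
        # with random noise so the rule is never perfect.
        is_high = (first_visit + rng.gauss(0, 20)) > 60
        customers.append((first_visit, is_high))
    return customers


def accuracy(customers, threshold):
    """Share of customers for whom 'first visit > threshold' matches reality."""
    hits = sum((spend > threshold) == is_high for spend, is_high in customers)
    return hits / len(customers)


def discover_threshold(customers):
    """Try a handful of candidate cutoffs and keep the one that fits best."""
    candidates = range(0, 101, 5)
    return max(candidates, key=lambda t: accuracy(customers, t))


data = make_customers(1000, seed=42)
explore, holdout = data[:700], data[700:]  # discover on one part, test on the other

threshold = discover_threshold(explore)
print("Discovered cutoff:", threshold)
print("Accuracy on exploration data:", round(accuracy(explore, threshold), 3))
print("Accuracy on held-out data:", round(accuracy(holdout, threshold), 3))
```

If the held-out accuracy stays close to the exploration accuracy, the discovery has passed a basic test. If it drops sharply, the rule probably fit quirks of the exploration data, and it's time to adapt.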
The value of data mining
Business managers already have desks piled high with reports. Some have access to computer dashboards that let them see their data in myriad segments and summaries. Can data mining really add value? It can.
Typical business reports provide summaries of what has happened in the past. They don't offer much, if anything, to help you understand why those things happened, or how you might influence what will happen next.
Data mining is different.
Here are examples of information that has been uncovered through data mining:
A retailer discovered that loyalty program sign-ups could be used to identify which customers were most likely to spend a lot and which would spend a little over time, based on just the information gathered on the customer's first visit. This information enabled the retailer to focus marketing investment on the high spenders to maximize revenue and reduce marketing costs.
A manufacturer discovered a sequence of events that preceded accidental releases of toxic materials. This inf