Data Mining, Data Warehouse. A process of extracting patterns from data
Health care is probably the largest, and at times the most expensive, business on earth.
However, many diseases and conditions can be diagnosed
even before their symptoms appear.
This is possible through extensive data mining and prediction techniques,
and it gives rise to an area called preventive health care.
Data mining is the science of retrieving knowledge from huge volumes of raw,
otherwise uninterpretable data. In health care, these data come from patients' medical records,
which can be voluminous when years of records are processed.
In such cases, efficient data mining techniques and knowledge
discovery approaches come to the rescue.
Data mining is the process of extracting patterns from data.
It is becoming an increasingly important tool
for transforming data into information.
It is commonly used in a wide range of profiling practices, such as
marketing, surveillance, fraud detection and scientific discovery.
Data Mining Techniques
An Introduction to Data Mining
Data mining can be used to uncover patterns in data but is often
carried out only on samples of data.
The mining process will be ineffective if the samples are not a good representation
of the larger body of data.
Data mining cannot discover patterns that may be present in the larger body of data
if those patterns are not present in the sample being "mined". Inability to find patterns
may become a cause for some disputes between customers and service providers.
Therefore, data mining is not foolproof, but it may be useful if sufficiently representative
data samples are collected. The discovery of a particular pattern in a particular
set of data does not necessarily mean that the pattern is found elsewhere in the larger data
from which that sample was drawn.
An important part of the process is the verification and validation
of patterns on other samples of data.
The related terms data dredging, data fishing and data snooping refer to the use of data
mining techniques on samples that are (or may be) too small for statistical inferences
to be made about the validity of any patterns discovered (see also data-snooping bias).
Data dredging may, however, be used to develop new hypotheses,
which must then be validated with sufficiently large sample sets.
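The risk described above can be illustrated with a small sketch. It is not a method from the text: it assumes purely random data (so there is no real pattern to find), mines one half of the rows for the fields most correlated with a target, and then checks those same fields on the other half. The names pearson, fields, target, and the sizes used are all hypothetical.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Assumption: every value is independent random noise, so any "pattern" is spurious.
rng = random.Random(0)
N_ROWS, N_FIELDS, HALF, TOP = 40, 200, 20, 10

target = [rng.gauss(0, 1) for _ in range(N_ROWS)]
fields = [[rng.gauss(0, 1) for _ in range(N_ROWS)] for _ in range(N_FIELDS)]

# Dredging step: on the first sample, keep the fields that look most correlated.
mined = [abs(pearson(f[:HALF], target[:HALF])) for f in fields]
top = sorted(range(N_FIELDS), key=lambda i: mined[i], reverse=True)[:TOP]

# Validation step: measure the same fields on a second, untouched sample.
validated = [abs(pearson(fields[i][HALF:], target[HALF:])) for i in top]

mean_mined = statistics.fmean([mined[i] for i in top])
mean_validated = statistics.fmean(validated)
print(f"mean |r| on mined sample:      {mean_mined:.2f}")
print(f"mean |r| on validation sample: {mean_validated:.2f}")
```

With only 20 rows and 200 candidate fields, some fields look strongly correlated by chance alone; the same fields typically look much weaker on the second sample, which is why validation on other data is treated as part of the process.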
Continuous Innovation
Although data mining is a relatively new term, the technology is not.
Companies have used powerful computers to sift through volumes of
supermarket scanner data and analyze market research reports for years.
However, continuous innovations in computer processing power, disk storage,
and statistical software are dramatically increasing the accuracy of analysis
while driving down the cost.
Data Mining an Overview
Generally, data mining (sometimes called data or knowledge discovery)
is the process of analyzing data from different perspectives and summarizing it
into useful information - information that can be used to increase revenue,
cut costs, or both. Data mining software is one of a number of analytical tools
for analyzing data.
It allows users to analyze data from many different dimensions or angles,
categorize it, and summarize the relationships identified. Technically,
data mining is the process of finding correlations or patterns among dozens
of fields in large relational databases.
Example
For example, one Midwest grocery chain used the data mining capacity
of Oracle software to analyze local buying patterns.
They discovered that when men bought diapers on Thursdays and Saturdays,
they also tended to buy beer. Further analysis showed that these shoppers
typically did their weekly grocery shopping on Saturdays.
On Thursdays, however, they only bought a few items.
The retailer concluded that they purchased the beer to have
it available for the upcoming weekend.
The grocery chain could use this newly discovered information in various ways
to increase revenue.
For example, they could move the beer display closer to the diaper display.
They could also make sure beer and diapers were sold at full price on Thursdays.
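A pattern like the diapers-and-beer one is commonly expressed as an association rule with support, confidence, and lift. The sketch below is a minimal illustration, not the Oracle software mentioned above; the baskets are made-up sample data and pair_rules is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.2):
    """Return {(lhs, rhs): (support, confidence, lift)} for item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n                       # share of baskets with both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules

# Hypothetical baskets, invented for illustration only.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
    {"bread", "milk", "chips"},
    {"diapers", "beer", "bread"},
]

rules = pair_rules(baskets)
support, confidence, lift = rules[("diapers", "beer")]
print(f"diapers -> beer: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the two items appear together more often than they would if purchases were independent, which is the kind of signal a retailer might act on.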
Data, Information, and Knowledge
Data
Data are any facts, numbers, or text that can be processed by a computer.
Today, organizations are accumulating vast and growing amounts of data
in different formats and different databases. This includes:
operational or transactional data, such as sales, cost, inventory, payroll, and accounting
nonoperational data, such as industry sales, forecast data, and macroeconomic data
metadata - data about the data itself, such as logical database design or data dictionary definitions
Information
The patterns, associations, or relationships among all this data can provide information.
For example, analysis of retail point of sale transaction data
can yield information on which products are selling and when.
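As a rough sketch of turning point-of-sale data into information about which products sell and when, the snippet below totals units by product and weekday. The transaction records and the function name units_by_product_and_weekday are hypothetical.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def units_by_product_and_weekday(transactions):
    """Sum units sold for each (product, weekday name) pair."""
    totals = defaultdict(int)
    for day, product, units in transactions:
        totals[(product, WEEKDAYS[day.weekday()])] += units
    return dict(totals)

# Hypothetical point-of-sale records: (date, product, units sold).
transactions = [
    (date(2024, 1, 4), "beer", 2),      # a Thursday
    (date(2024, 1, 4), "diapers", 1),
    (date(2024, 1, 6), "beer", 3),      # a Saturday
    (date(2024, 1, 6), "milk", 2),
    (date(2024, 1, 11), "beer", 1),     # the following Thursday
]

totals = units_by_product_and_weekday(transactions)
for (product, weekday), units in sorted(totals.items()):
    print(f"{product:<8} {weekday:<9} {units}")
```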
Knowledge
Information can be converted into knowledge about historical patterns and future trends.
For example, summary information on retail supermarket sales
can be analyzed in light of promotional efforts to provide knowledge of consumer
buying behavior. Thus, a manufacturer or retailer could determine which items
are most susceptible to promotional efforts.
Data Warehouses
Dramatic advances in data capture, processing power, data transmission,
and storage capabilities are enabling organizations to integrate their various databases
into data warehouses.
Data warehousing is defined as a process of centralized data management and retrieval.
Data warehousing, like data mining, is a relatively new term although the concept itself
has been around for years.
Data warehousing represents an ideal vision of maintaining a central
repository of all organizational data. Centralization of data is needed
to maximize user access and analysis.
Dramatic technological advances are making this vision a reality for many companies.
Equally dramatic advances in data analysis software are allowing users to access
this data freely. The data analysis software is what supports data mining.
What can data mining do?
Data mining is primarily used today by companies with a strong consumer focus -
retail, financial, communication, and marketing organizations.
It enables these companies to determine relationships among "internal" factors
such as price, product positioning, or staff skills, and "external" factors such
as economic indicators, competition, and customer demographics.
It also enables them to determine the impact on sales, customer satisfaction,
and corporate profits. Finally, it enables them to "drill down"
into summary information to view detailed transactional data.
With data mining, a retailer could use point-of-sale records of
customer purchases to send targeted promotions
based on an individual's purchase history.
By mining demographic data from comment or warranty cards,
the retailer could develop products and promotions to appeal to specific customer segments.
WalMart is pioneering massive data mining to transform its supplier relationships.
WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries
and continuously transmits this data to its massive 7.5 terabyte Teradata data warehouse.
WalMart allows more than 3,500 suppliers
to access data on their products and perform data analyses.
These suppliers use this data to identify customer buying patterns at the store display level.
They use this information to manage local store inventory and
identify new merchandising opportunities. In 1995, WalMart computers processed over 1 million
The National Basketball Association (NBA) is exploring a data mining application
that can be used in conjunction with image recordings of basketball games.
The Advanced Scout software analyzes the movements of players
to help coaches orchestrate plays and strategies.
For example, an analysis of the play-by-play sheet
of the game played between the New York Knicks and the Cleveland Cavaliers
on January 6, 1995 reveals that when Mark Price played the guard position,
John Williams attempted four jump shots and made each one!
Advanced Scout not only finds this pattern,
but explains that it is interesting because
it differs considerably from the average shooting percentage
of 49.30% for the Cavaliers during that game.
By using the NBA universal clock, a coach can automatically bring up
the video clips showing each of the jump shots attempted by Williams
with Price on the floor, without needing to comb through hours of video footage.
Those clips show a very successful pick-and-roll play in which Price
draws the Knicks' defense and then finds Williams for an open jump shot.
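To see why a four-for-four run stands out against a 49.30% average, one can estimate how likely it would be by chance. The sketch below is not how Advanced Scout works; it simply assumes each shot is independent and made at the team's game average, which is a simplifying assumption, and prob_all_made is a hypothetical helper.

```python
def prob_all_made(attempts, baseline_rate):
    """Chance of making every shot if each is independent at baseline_rate."""
    return baseline_rate ** attempts

team_rate = 0.4930        # Cavaliers' average shooting percentage in that game
attempts = made = 4       # Williams' jump shots with Price on the floor

observed_rate = made / attempts
p_chance = prob_all_made(attempts, team_rate)

print(f"observed rate: {observed_rate:.0%}, team average: {team_rate:.2%}")
print(f"chance of 4 of 4 at the team average: {p_chance:.3f}")
```

Under that assumption the chance is roughly 6%, so the run is unusual but, with only four attempts, still a small sample.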
Data mining consists of five major elements:
#1. Extract, transform, and load transaction data onto the data warehouse system.
#2. Store and manage the data in a multidimensional database system.
#3. Provide data access to business analysts and information technology professionals.
#4. Analyze the data with application software.
#5. Present the data in a useful format, such as a graph or table.
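The elements above can be sketched end to end on a very small scale. This is only an illustration under stated assumptions: an in-memory SQLite table stands in for the data warehouse (it is relational, not multidimensional), the raw rows and the sales table layout are hypothetical, and step #3 (user access) is left out.

```python
import sqlite3

# Hypothetical raw transaction rows: (day, store, product, units, unit_price), all text.
raw_rows = [
    ("2024-01-04", "store-1", "beer", "2", "7.50"),
    ("2024-01-04", "store-1", "diapers", "1", "12.00"),
    ("2024-01-06", "store-2", "beer", "1", "7.50"),
    ("2024-01-06", "store-2", "diapers", "2", "12.00"),
]

def transform(row):
    """Convert text fields to numbers and derive revenue."""
    day, store, product, units, unit_price = row
    units = int(units)
    return day, store, product, units, units * float(unit_price)

# Steps 1 and 2: extract, transform, load, and store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (day TEXT, store TEXT, product TEXT, units INTEGER, revenue REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [transform(r) for r in raw_rows]
)

# Step 4: analyze.
summary = conn.execute(
    "SELECT product, SUM(units), SUM(revenue) FROM sales "
    "GROUP BY product ORDER BY product"
).fetchall()

# Step 5: present as a simple table.
print(f"{'product':<10}{'units':>6}{'revenue':>10}")
for product, units, revenue in summary:
    print(f"{product:<10}{units:>6}{revenue:>10.2f}")
conn.close()
```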
Different levels of analysis are available:
Artificial neural networks: Non-linear predictive models
that learn through training and resemble biological neural networks in structure.
Genetic algorithms: Optimization techniques that use processes such as genetic
combination, mutation, and natural selection in a design based on the concepts
of natural evolution.
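A genetic algorithm of the kind described can be sketched in a few lines. The toy problem here (evolving a bit string toward all ones) and every parameter value are assumptions chosen for illustration; real applications use a problem-specific fitness function.

```python
import random

def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Evolve bit strings toward all ones; return (best, best-fitness history)."""
    rng = random.Random(seed)
    fitness = sum  # toy fitness: number of ones in the string
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # natural selection: keep the fitter half
        children = [population[0][:]]           # carry the best individual over unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)
            child = mum[:cut] + dad[cut:]       # genetic combination (one-point crossover)
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]          # mutation: occasionally flip a bit
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve()
print(f"best fitness went from {history[0]} to {history[-1]} out of {len(best)}")
```

Because the best individual is always copied into the next generation, the best fitness never decreases from one generation to the next.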
What technological infrastructure is required?
Today, data mining applications are available on systems of all sizes for mainframe,
client/server, and PC platforms. System prices range from several
thousand dollars for the smallest applications up to $1 million a terabyte
for the largest. Enterprise-wide applications generally range in
size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver
applications exceeding 100 terabytes. There are two critical technological driver