Data Mining, Data Warehouse: A Process of Extracting Patterns from Data

March 07, 2015


Health care is probably the largest, and at times the most expensive, business on earth. However, many diseases and conditions can be diagnosed even before their symptoms appear. This is possible through extensive data mining and prediction techniques, and it gives rise to an area called preventive health care.

Data mining is the science of retrieving knowledge from huge volumes of raw, uninterpreted data. In this setting the data come from patients' medical records, which can be very large when years of records are processed. In such cases, efficient data mining techniques and knowledge discovery approaches come to the rescue.

Data mining is the process of extracting patterns from data. It is becoming an increasingly important tool for transforming data into information, and it is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.

Data Mining Techniques: An Introduction to Data Mining

Data mining can be used to uncover patterns in data but is often carried out only on samples of data. The mining process will be ineffective if the samples are not a good representation of the larger body of data. Data mining cannot discover patterns that may be present in the larger body of data if those patterns are not present in the sample being "mined". Inability to find patterns may become a cause for some disputes between customers and service providers. Therefore data mining is not foolproof, but it may be useful if sufficiently representative data samples are collected.

The discovery of a particular pattern in a particular set of data does not necessarily mean that the pattern is found elsewhere in the larger data from which that sample was drawn. An important part of the process is the verification and validation of patterns on other samples of data.

The related terms data dredging, data fishing and data snooping refer to the application of data mining techniques to samples that are (or may be) too small for statistical inferences to be made about the validity of any patterns discovered (see also data-snooping bias). Data dredging may, however, be used to develop new hypotheses, which must then be validated with sufficiently large sample sets.
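To make the validation step concrete, here is a minimal Python sketch. It assumes a pattern of the form "baskets that contain X also contain Y", measures how often it holds in a small discovery sample, and then re-checks it on a separate, larger sample. The function names, the `tolerance` parameter and the basket data are all made up for illustration; they do not come from any particular tool.

```python
def rule_confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`.

    Returns None when the antecedent never occurs in the sample.
    """
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return None
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


def holds_on_validation(discovery, validation, antecedent, consequent, tolerance=0.2):
    """True if the rule is about as strong on the validation sample as on the discovery sample.

    `tolerance` is an arbitrary illustrative threshold, not a statistical test.
    """
    found = rule_confidence(discovery, antecedent, consequent)
    checked = rule_confidence(validation, antecedent, consequent)
    if found is None or checked is None:
        return False
    return checked >= found - tolerance


# Hypothetical data: a tiny sample where "bread -> jam" looks perfect...
discovery = [frozenset({"bread", "jam"})] * 3 + [frozenset({"milk"})]
# ...and a larger sample where it mostly does not hold.
validation = (
    [frozenset({"bread", "jam"})] * 2
    + [frozenset({"bread"})] * 8
    + [frozenset({"milk"})] * 5
)

print(rule_confidence(discovery, "bread", "jam"))
print(rule_confidence(validation, "bread", "jam"))
print(holds_on_validation(discovery, validation, "bread", "jam"))
```

In this toy case the rule looks certain in the four-basket sample and falls apart on the larger one, which is exactly the risk that data dredging on small samples carries.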

Continuous Innovation

Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.

Data Mining: An Overview

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
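As a rough illustration of "finding correlations among fields", the Python sketch below computes the Pearson correlation for every pair of numeric columns in a small table and keeps the strong ones. Pearson correlation is only one of many possible measures, and the column names, the values and the 0.8 threshold are assumptions chosen for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / sqrt(var_x * var_y)


def strongest_pairs(table, threshold=0.8):
    """Return (field_a, field_b, r) for every pair of fields with |r| >= threshold."""
    pairs = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical table, stored column by column.
table = {
    "price": [10, 12, 14, 16, 18],
    "units_sold": [50, 44, 38, 32, 26],
    "store_id": [3, 1, 4, 1, 5],
}

print(strongest_pairs(table))
```

Here only the price and units_sold columns pass the threshold, with a strong negative correlation. A real database would have far more fields and rows, but the idea of scanning field pairs for relationships is the same.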

Example

For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they only bought a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend.

The grocery chain could use this newly discovered information in various ways to increase revenue. For example, they could move the beer display closer to the diaper display. And they could make sure beer and diapers were sold at full price on Thursdays.
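A pattern like "diapers and beer" is usually expressed as an association rule and scored with support, confidence and lift. The Python sketch below computes those three numbers on a handful of invented baskets. It is not the grocery chain's data or the method Oracle's software used, just a small illustration of the measures.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    s_both = support(baskets, set(antecedent) | set(consequent))
    s_ante = support(baskets, antecedent)
    s_cons = support(baskets, consequent)
    confidence = s_both / s_ante if s_ante else 0.0
    lift = confidence / s_cons if s_cons else 0.0
    return {"support": s_both, "confidence": confidence, "lift": lift}


# Hypothetical Thursday baskets.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"diapers", "beer", "milk"},
    {"eggs", "milk"},
]

print(rule_metrics(baskets, {"diapers"}, {"beer"}))
```

A lift above 1 means that beer shows up with diapers more often than it would if the two purchases were independent, which is the kind of signal that would justify moving the displays closer together.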

Data, Information, and Knowledge

Data

Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. This includes:

operational or transactional data, such as sales, cost, inventory, payroll, and accounting

nonoperational data, such as industry sales, forecast data, and macroeconomic data

metadata: data about the data itself, such as logical database design or data dictionary definitions

Information

The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point-of-sale transaction data can yield information on which products are selling and when.
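The step from raw point-of-sale data to "which products are selling and when" can be as simple as a grouped total. The Python sketch below is a minimal version that assumes each transaction is a (date, product, units) tuple; both the record layout and the sample rows are invented for the example.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def units_by_product_and_weekday(transactions):
    """Total units sold per (product, weekday name) pair."""
    totals = Counter()
    for sold_on, product, units in transactions:
        totals[(product, WEEKDAYS[sold_on.weekday()])] += units
    return totals


# Hypothetical point-of-sale records: (date, product, units).
transactions = [
    (date(2015, 3, 5), "beer", 6),
    (date(2015, 3, 5), "diapers", 2),
    (date(2015, 3, 5), "beer", 6),
    (date(2015, 3, 7), "beer", 12),
    (date(2015, 3, 7), "milk", 4),
]

totals = units_by_product_and_weekday(transactions)
for (product, weekday), units in sorted(totals.items()):
    print(product, weekday, units)
```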

Knowledge

Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.

Data Warehouses

Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term, although the concept itself has been around for years.

Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies. And equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.

What can data mining do?

Data mining is primarily used today by companies with a strong consumer focus: retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. It also enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detailed transactional data.

With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.

WalMart is pioneering massive data mining to transform its supplier relationships. WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its massive 7.5 terabyte Teradata data warehouse. WalMart allows more than 3,500 suppliers to access data on their products and perform data analyses. These suppliers use this data to identify customer buying patterns at the store display level. They use this information to manage local store inventory and identify new merchandising opportunities. In 1995, WalMart computers processed over 1 million

The National Basketball Association (NBA) is exploring a data mining application that can be used in conjunction with image recordings of basketball games. The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies.

For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the guard position, John Williams attempted four jump shots and made each one! Advanced Scout not only finds this pattern, but explains that it is interesting because it differs considerably from the average shooting percentage of 49.30% for the Cavaliers during that game.

By using the NBA universal clock, a coach can automatically bring up the video clips showing each of the jump shots attempted by Williams with Price on the floor, without needing to comb through hours of video footage. Those clips show a very successful pick-and-roll play in which Price draws the Knicks' defense and then finds Williams for an open jump shot.
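The idea of flagging a pattern as "interesting" because it departs from the game average can be sketched in a few lines of Python. This is not how Advanced Scout is actually implemented. The `Shot` record, the `min_gap` threshold and every shot other than the four Williams jump shots described above are assumptions for illustration; the 49.30% baseline is the figure quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    shooter: str
    shot_type: str
    guard_on_floor: str
    made: bool


def shooting_pct(shots):
    """Fraction of shots made, or None for an empty list."""
    if not shots:
        return None
    return sum(s.made for s in shots) / len(shots)


def interesting_pattern(shots, condition, baseline, min_gap=0.2):
    """Compare the shooting percentage under `condition` with a baseline.

    `min_gap` is an arbitrary illustrative threshold for "differs considerably".
    """
    subset = [s for s in shots if condition(s)]
    pct = shooting_pct(subset)
    if pct is None:
        return None
    return {
        "attempts": len(subset),
        "pct": pct,
        "interesting": abs(pct - baseline) >= min_gap,
    }


# Four made jump shots by Williams with Price at guard, as in the example above,
# plus two invented shots so the filter has something to exclude.
shots = [Shot("Williams", "jump", "Price", True) for _ in range(4)] + [
    Shot("Williams", "layup", "Price", False),
    Shot("Williams", "jump", "Other", False),
]

result = interesting_pattern(
    shots,
    lambda s: s.shooter == "Williams" and s.shot_type == "jump" and s.guard_on_floor == "Price",
    baseline=0.493,
)
print(result)
```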

Data mining consists of five major elements:

#1. Extract, transform, and load transaction data onto the data warehouse system.

#2. Store and manage the data in a multidimensional database system.

#3. Provide data access to business analysts and information technology professionals.

#4. Analyze the data by application software.

#5. Present the data in a useful format, such as a graph or table.
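The five elements above can be walked through end to end in a toy Python pipeline. The sketch uses an in-memory SQLite database as a stand-in for the warehouse. SQLite is relational, not the multidimensional system the list describes, and the table layout and sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical raw transaction export: every value arrives as text.
raw_rows = [
    ("2015-03-05", "store-1", " Beer ", "6", "5.00"),
    ("2015-03-05", "store-1", "diapers", "2", "9.50"),
    ("2015-03-07", "store-2", "beer", "12", "5.00"),
]


def run_pipeline(rows):
    conn = sqlite3.connect(":memory:")
    # 2. Store and manage: one simple fact table.
    conn.execute(
        "CREATE TABLE sales (sold_on TEXT, store TEXT, product TEXT, units INTEGER, revenue REAL)"
    )
    # 1. Extract, transform, load: clean product names, parse numbers, derive revenue.
    cleaned = [
        (day, store, product.strip().lower(), int(units), int(units) * float(price))
        for day, store, product, units, price in rows
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", cleaned)
    # 3 and 4. Provide access and analyze: a query an analyst might run.
    summary = conn.execute(
        "SELECT product, SUM(units), SUM(revenue) FROM sales GROUP BY product ORDER BY product"
    ).fetchall()
    conn.close()
    return summary


def format_table(summary):
    # 5. Present the result as a plain-text table.
    lines = [f"{'product':<10}{'units':>6}{'revenue':>10}"]
    for product, units, revenue in summary:
        lines.append(f"{product:<10}{units:>6}{revenue:>10.2f}")
    return "\n".join(lines)


summary = run_pipeline(raw_rows)
print(format_table(summary))
```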

Different levels of analysis are available:

Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.
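To show how combination, mutation and selection fit together, here is a minimal genetic algorithm in Python. It is set up for a toy problem, maximizing the number of 1 bits in a list, and every parameter (population size, mutation rate, tournament selection, keeping the best individual) is an assumption chosen for brevity, not a recommended configuration.

```python
import random


def fitness(individual):
    """Toy objective: count the 1 bits."""
    return sum(individual)


def tournament(population, rng):
    """Selection: pick two individuals at random and keep the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(bits=20, population_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(bits)] for _ in range(population_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = [best[:]]  # carry the best individual over unchanged
        while len(next_population) < population_size:
            mother = tournament(population, rng)
            father = tournament(population, rng)
            cut = rng.randrange(1, bits)  # combination: one-point crossover
            child = mother[:cut] + father[cut:]
            # mutation: flip each bit with a small probability
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best


best = evolve()
print(fitness(best), "of", len(best), "bits set")
```

Because the best individual is always carried into the next generation, the best fitness can never get worse from one generation to the next. In a real data mining setting the bit list would encode something like a rule or a set of model parameters, and the fitness function would score it against data.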

What technological infrastructure is required?

Today, data mining applications are available on systems of all sizes, across mainframe, client/server, and PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million a terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver applications exceeding 100 terabytes. There are two critical technological driver
