Data Pre-processing

Data pre-processing is an important step in the data mining process. The phrase "garbage in, garbage out" is particularly applicable to data mining and machine learning projects. Data-gathering methods are often loosely controlled, resulting in out-of-range values (e.g., Income: −100), impossible data combinations (e.g., Sex: Male, Pregnant: Yes), missing values, etc. Analyzing data that has not been carefully screened for such problems can produce misleading results. Thus, ensuring the representation and quality of the data must come first, before running any analysis.^[1]
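As an illustration of the screening described above, the following minimal sketch flags the three kinds of problems mentioned: missing values, out-of-range values and impossible combinations. The `Record` type, its field names and the rule that income must be non-negative are hypothetical assumptions chosen to mirror the examples in the text; real screening rules depend on the domain.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    # Hypothetical fields mirroring the examples in the text.
    income: Optional[float]
    sex: Optional[str]
    pregnant: Optional[str]


def screen(record: Record) -> List[str]:
    """Return the problems found in one record (an empty list if none)."""
    problems = []
    # Missing values: any field left as None.
    for name in ("income", "sex", "pregnant"):
        if getattr(record, name) is None:
            problems.append(f"missing {name}")
    # Out-of-range value: income is assumed to be non-negative.
    if record.income is not None and record.income < 0:
        problems.append("out-of-range income")
    # Impossible combination: a male record marked as pregnant.
    if record.sex == "Male" and record.pregnant == "Yes":
        problems.append("impossible sex/pregnant combination")
    return problems


records = [
    Record(income=52000, sex="Female", pregnant="Yes"),
    Record(income=-100, sex="Male", pregnant="Yes"),
    Record(income=None, sex="Female", pregnant="No"),
]

# Keep only records that pass every check.
clean = [r for r in records if not screen(r)]
for r in records:
    print(screen(r))
```

Here flagged records are simply discarded; in practice one might instead correct or impute them, which is a separate design decision.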

If much irrelevant or redundant information is present, or the data are noisy and unreliable, knowledge discovery during the training phase becomes more difficult. Data preparation and filtering steps can take a considerable amount of processing time. Data pre-processing includes cleaning, normalization, transformation, feature extraction and selection, etc. The product of data pre-processing is the final training set. Kotsiantis et al. (2006) present a well-known algorithm for each step of data pre-processing.^[2]
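The steps listed above can be chained so that raw rows go in and a training set comes out. The sketch below is a simplified assumption, not the algorithms surveyed by Kotsiantis et al.: cleaning drops rows with missing (`None`) values, normalization is min-max scaling to [0, 1], and feature selection only removes constant (zero-variance) columns. The function names are hypothetical.

```python
from statistics import pvariance


def drop_incomplete(rows):
    """Cleaning: keep only rows with no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row)]


def min_max_normalize(rows):
    """Normalization: rescale each column to [0, 1]; constant columns become 0."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_cols.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(r) for r in zip(*scaled_cols)]


def drop_constant_features(rows, threshold=0.0):
    """Feature selection: remove columns whose variance is <= threshold."""
    keep = [i for i, col in enumerate(zip(*rows)) if pvariance(col) > threshold]
    return [[row[i] for i in keep] for row in rows], keep


def preprocess(rows):
    """Cleaning, then normalization, then feature selection."""
    cleaned = drop_incomplete(rows)
    normalized = min_max_normalize(cleaned)
    return drop_constant_features(normalized)


raw = [
    [1.0, 10.0, 5.0],
    [2.0, None, 5.0],  # missing value: row is dropped
    [3.0, 30.0, 5.0],
    [5.0, 50.0, 5.0],  # third column is constant: feature is dropped
]
training_set, kept_columns = preprocess(raw)
print(kept_columns)
print(training_set)
```

For simplicity the scaling statistics are computed on the same rows that are returned; when a separate test set exists, they would normally be taken from the training data only.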

References

[1] Pyle, D., 1999. Data Preparation for Data Mining. Morgan Kaufmann Publishers, Los Altos, California.
[2] Kotsiantis, S., Kanellopoulos, D., Pintelas, P., 2006. "Data Preprocessing for Supervised Leaning". International Journal of Computer Science, Vol. 1, No. 2, pp. 111–117.
