
3 Stages of model building

3 Stages of model building in machine learning

1. Data collection, preprocessing, data cleaning

    In this step we collect the data: the features (inputs) and the labels (target values).

    Data preprocessing:

    1. Data cleaning: simple techniques such as removing outliers and rows with missing values, or replacing (imputing) missing values with the mean or median for numeric data, as in regression, or with the mode for categorical data, as in classification.

  • Mean is the average of a sequence: the sum of the values divided by the number of values.
  • Median is the middle value of a sorted sequence, or the mean of the two middle values when the count is even; it is used when outliers in the sequence would skew the mean.
  • Mode is the value that appears most often in a sequence.
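A minimal sketch of these three statistics and of the cleaning options above, assuming pandas is installed. The values, the salary/city columns and the outlier are hypothetical, chosen only to show why the median resists an outlier while the mean does not.

```python
import statistics

import pandas as pd

values = [2, 3, 3, 5, 100]  # hypothetical sequence; 100 is an outlier
print(statistics.mean(values))    # 22.6, pulled up by the outlier
print(statistics.median(values))  # 3
print(statistics.mode(values))    # 3

# Hypothetical data set with missing values (None becomes NaN / missing in pandas)
df = pd.DataFrame({
    "salary": [40000, 52000, None, 61000, 1000000],
    "city": ["Pune", None, "Delhi", "Pune", "Mumbai"],
})

# Option 1: remove every row that has a missing value
dropped = df.dropna()

# Option 2: replace missing values instead of dropping rows
df["salary"] = df["salary"].fillna(df["salary"].median())  # median ignores NaN and resists the outlier
df["city"] = df["city"].fillna(df["city"].mode()[0])        # most frequent category
print(df)
```

Dropping rows is simplest but throws data away; imputing keeps every row at the cost of inventing a plausible value.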

    2. Handling (encoding) categorical variables:

    Transform categorical values into numbers; tool: LabelEncoder()
    Represent each category as its own binary (0/1) column so the model does not read an order into the numbers; tools: OneHotEncoder(), get_dummies()
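A short sketch of the three tools, assuming scikit-learn and pandas are installed; the city column is hypothetical. Note that scikit-learn's documentation describes LabelEncoder as intended for target labels; OrdinalEncoder is its counterpart for input features.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

df = pd.DataFrame({"city": ["Pune", "Delhi", "Mumbai", "Delhi"]})  # hypothetical data

# LabelEncoder: one integer per category; classes are sorted, so Delhi=0, Mumbai=1, Pune=2
label_codes = LabelEncoder().fit_transform(df["city"])
print(label_codes)  # [2 0 1 0]

# OneHotEncoder: one 0/1 column per category (expects 2-D input, returns a sparse matrix by default)
one_hot = OneHotEncoder().fit_transform(df[["city"]]).toarray()

# pandas get_dummies: same idea, returned as a DataFrame with readable column names
dummies = pd.get_dummies(df["city"], prefix="city", dtype=int)
print(dummies)
```

The integer codes imply Pune > Mumbai > Delhi, which is meaningless for cities; the one-hot columns avoid that false ordering.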

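    3. Feature scaling:

    Scaling brings all numeric features to the same magnitude, so that features with large values do not dominate features with small values.
  • Normalization: rescales values to the range 0 to 1 using (x - min) / (max - min). It is also known as Min-Max scaling.
  • Standardization: centers values around the mean with unit standard deviation using (x - mean) / standard deviation, so the result has mean 0 and standard deviation 1.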
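A minimal sketch comparing scikit-learn's scalers with the two formulas written out by hand; the age values are hypothetical. StandardScaler uses the population standard deviation (ddof=0), which is also NumPy's default.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

ages = np.array([[20.0], [30.0], [40.0], [50.0], [60.0]])  # hypothetical feature column

normalized = MinMaxScaler().fit_transform(ages)      # (x - min) / (max - min)
standardized = StandardScaler().fit_transform(ages)  # (x - mean) / std

# The same results computed directly from the formulas
manual_norm = (ages - ages.min()) / (ages.max() - ages.min())
manual_std = (ages - ages.mean()) / ages.std()  # np.std defaults to ddof=0

print(normalized.ravel())    # [0.   0.25 0.5  0.75 1.  ]
print(standardized.ravel())
```

In practice the scaler is usually fitted on the training set only and then reused to transform the testing set.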

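    4. Splitting the data set into a training set and a testing set: the model learns from the training set and is evaluated on the testing set, which it has not seen during training.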
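One common way to do the split, sketched with scikit-learn's train_test_split on hypothetical data (10 samples, 2 features). The 80/20 ratio and random_state=42 are illustrative choices, not requirements.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: row i of X is [2i, 2i + 1] and its label is i
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows for testing; random_state makes the shuffle repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

The rows are shuffled, but each feature row stays paired with its own label.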


2. Algorithm selection 

After exploring the data set, we decide which algorithm to use, mainly based on the type of problem.

If we have a regression problem (predicting a continuous value), such as salary prediction or price estimation,
we use linear regression, polynomial regression, or support vector regression.
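A minimal regression sketch with scikit-learn, assuming hypothetical salary data that follows salary = 30 + 5 × years (in thousands). Polynomial regression is shown as PolynomialFeatures feeding a LinearRegression, which is one common way to build it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: years of experience -> salary in thousands
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([35, 40, 45, 50, 55])

# Linear regression learns a weight (slope) and a bias (intercept)
linear = LinearRegression().fit(X, y)
print(linear.coef_[0], linear.intercept_)  # approximately 5.0 and 30.0

# Polynomial regression: expand x into [1, x, x^2], then fit a linear model on those features
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print(linear.predict([[6]]), poly.predict([[6]]))  # both approximately [60.]
```

Support vector regression follows the same fit/predict pattern through sklearn.svm.SVR.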

If we have a classification problem (predicting a category), such as spam detection or deciding whether a loan is approved,
we use logistic regression, the k-nearest neighbors classifier, the support vector classifier, decision trees, or random forests.
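A sketch that fits each of the listed classifiers with scikit-learn and compares their accuracy on a held-out test set. It assumes the Iris data set bundled with scikit-learn; the hyperparameters are illustrative defaults, not tuned values.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "K-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "Support vector classifier": SVC(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on unseen data
    print(f"{name}: {scores[name]:.2f}")
```

Because every scikit-learn classifier shares fit/score, trying several algorithms on the same split is a cheap way to choose one.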

If we have a clustering problem (grouping unlabeled data), we use the k-means algorithm.
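A minimal k-means sketch with scikit-learn on six hypothetical 2-D points forming two obvious groups. No labels are given; k-means assigns each point to the nearest of k=2 centroids.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled points: three near (1, 1) and three near (8.5, 8.5)
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])

# n_init runs k-means several times from different starting centroids and keeps the best result
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
print(labels)                   # cluster id per point (which group gets 0 or 1 can vary)
print(kmeans.cluster_centers_)  # mean position of each cluster
```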

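3. Model evaluation and error analysis

  • Measure the model's performance (for example its accuracy) on the testing set; the metrics differ between regression and classification problems.
  • For regression, the most commonly used metrics are the mean squared error (MSE) and the root mean squared error (RMSE), the square root of the MSE, which is in the same units as the target.
  • For classification, we use the confusion matrix, which counts correct and incorrect predictions for each class, for model analysis.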
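A small sketch of these metrics with scikit-learn, using hypothetical true and predicted values so the arithmetic can be checked by hand. RMSE is taken as the square root of the MSE with NumPy.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, mean_squared_error

# Regression: errors are 0.5, 0, -1, -1, so MSE = (0.25 + 0 + 1 + 1) / 4 = 0.5625
y_true_reg = [3.0, 5.0, 2.0, 7.0]
y_pred_reg = [2.5, 5.0, 3.0, 8.0]
mse = mean_squared_error(y_true_reg, y_pred_reg)
rmse = np.sqrt(mse)
print(mse, rmse)  # 0.5625 0.75

# Classification: rows are true classes (0, 1), columns are predicted classes (0, 1)
y_true_clf = [1, 0, 1, 1, 0, 1]
y_pred_clf = [1, 0, 0, 1, 0, 1]
cm = confusion_matrix(y_true_clf, y_pred_clf)
acc = accuracy_score(y_true_clf, y_pred_clf)
print(cm)   # [[2 0]
            #  [1 3]]
print(acc)  # 5 correct out of 6
```

The single off-diagonal entry shows the one error: a class-1 sample predicted as class 0.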





