
Getting started with Machine learning?


In this article we cover the basics of machine learning. We also discuss the most common and frequently used terms in machine learning, which will help us get started with the subject.
Let's start.

What is Machine Learning?

Machine learning is a field of computer science that enables a computer or any other machine to learn without being explicitly programmed. Its main focus is to provide algorithms that can be trained to accomplish a task. It is a subset of artificial intelligence. Machine learning algorithms build a mathematical model from sample data, known as "training data," so that predictions or decisions can be made without explicit programming for the task.

What are the Types of Machine Learning problems?

The most common classification of Machine Learning problems includes:
  • Supervised Learning: The majority of practical machine learning problems use supervised learning algorithms. Supervised learning is the type of learning where you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output.
                    Y = f(X)
Our aim is to approximate this mapping function so well that when we supply new input data (X) we can predict the output variable (Y) for that data. It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as an instructor supervising the learning process. We know the correct answers; the algorithm iteratively makes predictions on the training data and is corrected by the instructor.
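
As a rough illustration of learning the mapping Y = f(X), the sketch below fits a straight line to a few labeled examples and then predicts the output for a new input. The straight-line form of f, the function names fit_line and predict, and the training values are all assumptions made for this example; real problems usually need a richer model.

```python
# Minimal sketch: learn f in Y = f(X) from labeled examples.
# Assumption: f is a straight line, f(x) = slope * x + intercept.

def fit_line(xs, ys):
    """Ordinary least squares fit for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the learned mapping to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Made-up training data: inputs X with known correct outputs Y (here Y = 2X + 1).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)
prediction = predict(model, 5.0)  # an input that was not in the training data
print(prediction)
```

Because the made-up data lie exactly on the line Y = 2X + 1, the prediction for X = 5 comes out at 11.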

Supervised learning problems can be further grouped into regression and classification problems.
  • Classification: A classification problem is one where the output variable is a category, such as “red” or “blue”, or “male” or “female”.
  • Regression: A regression problem is one where the output is a real value, such as the price of a house or a weight.
Some popular examples of supervised machine learning algorithms are:
  1. Linear and polynomial regression for regression problems.
  2. Decision trees and random forests for classification and regression problems.
  3. Support vector machines for classification problems.
  4. Artificial neural networks.
  5. K-nearest neighbors (KNN).
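
To make the classification case concrete, here is a minimal KNN sketch in plain Python. It labels a new point by majority vote among its k nearest training points. The function name knn_classify, the choice of k = 3, the Euclidean distance, and the sample points are illustrative assumptions, not a reference implementation.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up labeled training data: two well-separated groups.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
                (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["red", "red", "red", "blue", "blue", "blue"]

label_a = knn_classify(train_points, train_labels, (1.8, 1.6))
label_b = knn_classify(train_points, train_labels, (6.4, 6.2))
print(label_a, label_b)
```

The first query sits inside the “red” group and the second inside the “blue” group, so each receives the label of its neighbors.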
  • Unsupervised Learning: Unsupervised learning is where you have only input data (X) and no corresponding output variables, commonly called labels. The main aim of unsupervised learning is to model the structure of the data in order to learn more about it. Unlike supervised learning, there is no instructor, meaning we have no output label for the input data.

Unsupervised learning problems can be further grouped into clustering and association problems.
  • Clustering: A clustering problem is one where you group data points into several clusters based on the similarity of their features.
  • Association: An association rule learning problem is one where you want to discover rules that describe large portions of your data, such as people who buy X also tend to buy Y.
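
A rule such as "people who buy X also tend to buy Y" is usually judged by two numbers: support (how often X and Y appear together) and confidence (how often Y appears when X does). The sketch below computes both for a handful of made-up shopping baskets; the basket contents and the function names support and confidence are assumptions for illustration.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
    {"bread", "butter", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

# Rule under test: people who buy bread also tend to buy butter.
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(rule_support, rule_confidence)
```

Here bread and butter appear together in 3 of 5 baskets (support 0.6), and 3 of the 4 baskets with bread also contain butter (confidence of about 0.75).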
Some popular examples of unsupervised learning algorithms are:
  1. k-means and hierarchical clustering for clustering problems.
  2. The Apriori algorithm for association rule learning problems.
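
The k-means idea can be sketched in a few lines for one-dimensional data: assign each value to its nearest centroid, move each centroid to the mean of its cluster, and repeat until nothing changes. The function name k_means_1d, the starting centroids, and the data values are assumptions chosen for this example; practical implementations work in many dimensions and pick starting centroids more carefully.

```python
def k_means_1d(values, centroids, iterations=10):
    """Plain k-means on one-dimensional data, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Made-up unlabeled data with two obvious groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(values, centroids=[1.0, 12.0])
print(centroids, clusters)
```

No labels are supplied anywhere; the two groups emerge purely from the distances between the values.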
    Supervised and unsupervised learning
  • Semi-supervised learning: Semi-supervised learning covers problems where you have a large amount of input data (the "training data") and only some of it is labeled. These problems sit between supervised and unsupervised learning. For example, a photo archive where only some of the images are labeled (e.g. dog, cat, person) and the majority are unlabeled.
  • Reinforcement learning: Reinforcement learning covers algorithms in which a computer program interacts with an external environment while pursuing a certain goal. The algorithm receives feedback in the form of rewards and punishments as it navigates its problem space. In formal terms, reinforcement learning is a method of machine learning in which an agent (sometimes called the model) learns to take the actions in an environment that lead it toward maximum reward. It does so by exploring new actions and exploiting the knowledge it gains through repeated trials.
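
The reward-driven loop described above can be sketched with tabular Q-learning, one common reinforcement learning technique (the article does not name a specific algorithm, so this choice is an assumption). The environment here is a hypothetical five-cell corridor in which the agent starts at the left end and is rewarded for reaching the right end; the step function, the reward values, and the learning settings are all invented for illustration.

```python
import random

N_STATES = 5          # positions 0..4; reaching position 4 ends an episode
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Hypothetical environment: +1 reward at the goal, small punishment otherwise."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01
    return next_state, reward, done

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table q[state][action] of estimated long-term rewards."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore: try a random action
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])  # exploit
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# Greedy policy: the best-known action in each non-goal state.
policy = [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q_table[s][i])]
          for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy moves right (+1) from every cell, which is the shortest route to the reward.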
    Reinforcement Learning
When learning anything new, a learner faces many difficulties. One of them is not knowing the technical terms of the subject. Machine learning also has technical terms that trouble learners who do not know them. So let's look at the most common terms a learner should know before going through any material.

Some of the most common and frequently used terms in machine learning:
  • Algorithm: A step-by-step procedure for solving a problem.
  • Attribute: Also called a feature, field, or variable. It is a measurable property of the data that is given to the model as input.
  • Label: The final output value for a data point; in classification, the class the data point belongs to.
  • Dimension: The number of features is called the dimension.
  • Model: The mathematical expression obtained by processing real-world data.
  • Training: The process of generating a model by passing training data to an algorithm. It is sometimes also called learning.
  • Testing: The process of predicting results from a machine learning model by passing testing data to it.
  • Training Data: The data set on the basis of which a learning model is built.
  • Testing Data: The data whose output is to be predicted by passing it to the model.
  • Target: The output value that corresponds to the input variables.
  • Regression: Techniques used to predict real numeric values.
  • Classification: Categorizing data into predefined classes.
  • Overfitting: A model is said to be overfitted if it fits the training data very closely but gives poor predictions for new input data.
  • Underfitting: The opposite of overfitting. The model is too simple to capture the pattern in the data, so it performs poorly even on the training data.
  • Regularization: A method of controlling the complexity of a machine learning model, typically by penalizing overly complex models, so that the model generalizes well and overfitting is avoided.
  • Hyper-Parameter: A parameter whose value is set before the learning process begins.
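
Several of these terms can be seen together in one small sketch. It trains a one-parameter model on training data, evaluates it on separate testing data, and uses an L2 (ridge) penalty as the regularization, with the penalty strength lam as a hyper-parameter. Ridge is only one of several regularization techniques, and the data, the function names, and the lam values are assumptions made for illustration.

```python
def fit_ridge_slope(xs, ys, lam):
    """Fit y = w * x (no intercept) with an L2 penalty lam * w**2.

    Minimizing sum((y - w*x)**2) + lam * w**2 over w gives the closed form
    w = sum(x*y) / (sum(x*x) + lam).
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mean_squared_error(w, xs, ys):
    """Average squared gap between the targets and the model's predictions."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up data, roughly y = 2x with noise, split into training and testing data.
train_x, train_y = [1.0, 2.0, 3.0], [2.2, 3.9, 6.3]
test_x, test_y = [4.0, 5.0], [8.1, 9.8]

slopes, train_mse, test_mse = {}, {}, {}
# lam is a hyper-parameter: it is chosen before training, not learned from data.
for lam in (0.0, 1.0, 10.0):
    w = fit_ridge_slope(train_x, train_y, lam)          # training
    slopes[lam] = w
    train_mse[lam] = mean_squared_error(w, train_x, train_y)
    test_mse[lam] = mean_squared_error(w, test_x, test_y)  # testing
    print(lam, round(w, 3), round(train_mse[lam], 3), round(test_mse[lam], 3))
```

A larger lam shrinks the learned slope toward zero. With no penalty the model fits the training data as closely as it can, while a very large penalty makes the model too rigid to follow the data, which is the underfitting case.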

These are some basic and common terms needed to understand any material on machine learning. There are many other terms, which we will cover under the respective topics in later articles.


Also Read- Data Pre-processing
Also visit - Kraj Education
