Getting Started with Machine Learning

In this article we cover the basics of Machine Learning. We also discuss the most common and frequently used terminology in Machine Learning, which will help us get started with the subject.
Let's start.

What is Machine Learning?

Machine Learning is a field of computer science that enables a computer or any other machine to learn without being explicitly programmed. Its main focus is to provide algorithms that can be trained to accomplish a task. It is a subset of artificial intelligence. Machine learning algorithms build a mathematical model from sample data, known as "training data", so that predictions or decisions can be made without explicit programming for the task.

What are the Types of Machine Learning problems?

The most common classification of Machine Learning problems includes:

Supervised Learning: The majority of practical machine learning problems use supervised learning algorithms. Supervised learning is the type of learning where you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output.

Y = f(X)
Our aim is to learn this mapping function so well that when we supply new input data (X) to the function, we can predict the output variable (Y) for that data. It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as an instructor supervising the learning process. We know the correct answers; the algorithm iteratively makes predictions on the training data and is corrected by the instructor.
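
To make the idea of learning a mapping function Y = f(X) concrete, here is a minimal sketch that fits a straight line to a few made-up (X, Y) pairs using ordinary least squares. The data and the names fit_line and f are assumptions chosen for illustration; a straight line is only one of many possible forms for f.

```python
def fit_line(xs, ys):
    """Learn slope and intercept of Y = slope * X + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: inputs X with their known correct outputs Y.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(train_x, train_y)


def f(x):
    """The learned mapping function."""
    return slope * x + intercept


# Predict the output for a new input that was not in the training data.
prediction = f(5.0)
print(prediction)
```

The known outputs in train_y play the role of the instructor: the line is chosen so that its predictions on the training data are as close as possible to those correct answers.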

Supervised learning problems can be further grouped into regression and classification problems.

Classification: A classification problem is one where the output variable is a category, such as “red” or “blue”, or “male” or “female”.
Regression: A regression problem is one where the output is a real value, such as the price of a house or a weight.
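
The difference between the two is easiest to see in the type of the output. The sketch below is only an illustration with made-up numbers: the classifier is a simple nearest-centroid rule and the regressor is a line through the origin, both chosen here for brevity rather than taken from any library.

```python
def train_centroids(xs, labels):
    """Compute the mean feature value (centroid) of each class."""
    sums, counts = {}, {}
    for x, label in zip(xs, labels):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def fit_through_origin(xs, ys):
    """Least-squares slope of y = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Classification: the output is a category ("red" or "blue").
centroids = train_centroids([1.0, 2.0, 8.0, 9.0], ["red", "red", "blue", "blue"])
predicted_class = classify(centroids, 7.5)

# Regression: the output is a real value (a hypothetical house price from its area).
price_per_m2 = fit_through_origin([50.0, 100.0], [100000.0, 200000.0])
predicted_price = price_per_m2 * 80.0

print(predicted_class, predicted_price)
```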

Some of the popular examples of supervised machine learning algorithms are:

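Linear and polynomial regression for regression problems.
Decision trees and random forests for classification and regression problems.
Support vector machines for classification problems.
Artificial neural networks for classification and regression problems.
KNN (k-nearest neighbors) for classification and regression problems.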
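
As an example of one algorithm from this list, here is a minimal sketch of a KNN classifier written with the Python standard library only. The points, the labels "A" and "B", and the function name knn_predict are assumptions made up for this illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of query by majority vote among its k nearest neighbors."""
    # Sort the training points by Euclidean distance to the query point.
    neighbors = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in neighbors[:k]]
    # The most frequent label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
labels = ["A", "A", "A", "B", "B", "B"]

predicted = knn_predict(points, labels, (2.0, 2.0), k=3)
print(predicted)
```

Note that KNN has no separate training step: the training data itself is kept and searched at prediction time.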

Unsupervised Learning: Unsupervised learning is where you only have input data (X) and no corresponding output variables, commonly called labels. The main aim of unsupervised learning is to model the structure of the data in order to learn more about it. Unlike supervised learning, there is no instructor, which means we don’t have a corresponding output label for the input data.

Unsupervised learning problems can be further grouped into clustering and association problems.

Clustering: A clustering problem is one where you group data points into clusters based on the similarity of their features.
Association: An association rule learning problem is one where you want to discover rules that describe large portions of your data, such as people who buy X also tend to buy Y.
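
A rule such as "people who buy X also tend to buy Y" is usually scored with two measures, support and confidence. These two measures are not named in the text above but are the standard ones for association rules. The sketch below computes them for a made-up list of shopping baskets; the items and function names are assumptions for illustration.

```python
# Made-up shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


# Score the rule "people who buy bread also tend to buy butter".
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(rule_support, rule_confidence)
```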

Some popular examples of unsupervised learning algorithms are:

k-means and hierarchical clustering for clustering problems.
Apriori algorithm for association rule learning problems.
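
To show how clustering works without any labels, here is a minimal sketch of k-means on one-dimensional data. The values, the starting centroids, and the fixed number of iterations are assumptions for this illustration; practical implementations usually choose the starting centroids randomly and stop when the centroids no longer move.

```python
def k_means_1d(values, centroids, iterations=10):
    """Alternate between assigning values to the nearest centroid and updating centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(values, centroids=[1.0, 2.0])
print(centroids)
print(clusters)
```

No output labels are given anywhere; the two groups emerge only from how close the values are to each other.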

Supervised and unsupervised learning

Semi-supervised learning: Semi-supervised learning covers problems where you have a large amount of input data, also called "training data", of which only some is labeled. These problems lie between supervised and unsupervised learning. For example, a photo archive where only some of the images are labeled (e.g. dog, cat, person) and the majority are unlabeled.
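
One simple way to use the unlabeled data is to spread the few known labels to nearby unlabeled points, a form of pseudo-labeling. The sketch below is only one possible approach, shown with made-up one-dimensional values standing in for image features; the function name pseudo_label is hypothetical.

```python
def pseudo_label(labeled, unlabeled):
    """Repeatedly give the unlabeled value closest to any labeled value that value's label."""
    labeled = dict(labeled)      # maps feature value -> label
    remaining = list(unlabeled)
    while remaining:
        # Find the (unlabeled, labeled) pair with the smallest distance.
        candidate, source = min(
            ((u, l) for u in remaining for l in labeled),
            key=lambda pair: abs(pair[0] - pair[1]),
        )
        labeled[candidate] = labeled[source]
        remaining.remove(candidate)
    return labeled


# Only two values carry a label; the majority are unlabeled.
known = {1.0: "cat", 10.0: "dog"}
unknown = [2.0, 3.0, 4.0, 7.0, 8.0, 9.0]

result = pseudo_label(known, unknown)
print(result)
```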

Reinforcement learning: Reinforcement learning covers algorithms in which a computer program interacts with an external environment in which it must achieve a certain goal. The algorithm receives feedback in the form of rewards and punishments as it navigates its problem space. In formal terms, reinforcement learning is a method of machine learning in which an agent (sometimes called the model) learns to perform actions in an environment that lead it toward the maximum reward. It does so by exploring the environment and exploiting the knowledge it gains through repeated trials.
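
The balance between exploring and exploiting can be sketched with a very small problem, often called a multi-armed bandit: the agent repeatedly picks one of several actions and receives a reward of 1 or 0. The strategy below is epsilon-greedy, one common choice among many; the reward probabilities, step count, and function name run_bandit are assumptions for illustration.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-looking action, sometimes explore."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)   # the agent's current estimate of each action's reward
    counts = [0] * len(true_means)        # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(true_means))
        else:
            # Exploit: pick the action with the highest estimated reward so far.
            action = max(range(len(true_means)), key=lambda a: estimates[a])
        # The environment answers with a reward (1.0) or no reward (0.0).
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        counts[action] += 1
        # Move the running average for this action toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Hidden reward probability of each action; the agent does not see these.
estimates, counts = run_bandit([0.2, 0.8, 0.5])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, counts)
```

The agent is never told which action is best; it finds out only from the rewards it collects over repeated trials.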

Reinforcement Learning

While learning anything new, a learner faces many difficulties. One of them is not knowing the technical terminology of the subject. Machine Learning also has technical terms that trouble learners who do not know them. So let's look at the most common technical terms a learner should know before going through any material.

Some of the most common and frequently used terms in Machine Learning:

Algorithm: A step-by-step procedure for solving a problem.
Attribute: Also called a feature, field, or variable; a property of the data that is used as an input to the model.
Label: The output value attached to a data point; in classification, the class to which the data point belongs.
Dimension: The number of features in the data.
Model: The mathematical expression obtained by processing real-world data.
Training: The process of generating a model by passing training data to an algorithm. It is sometimes also called learning.
Testing: The process of predicting results with a machine learning model by passing testing data to it.
Training Data: The data set on the basis of which a learning model is built.
Testing Data: The data whose output is to be predicted by passing it to the model.
Target: The output variable to be predicted from the input variables.
Regression: Techniques used to predict real numeric values.
Classification: Categorizing data into predefined classes.
Over fitting: A model is said to be over fitted if it fits the training data very closely but gives poor predictions for new input data.
Under fitting: The opposite of over fitting: the model is too simple to capture the pattern in the data, so it gives poor predictions even on the training data.
Regularization: A method of controlling the complexity of a machine learning model, typically by penalizing overly complex models, so that the model generalizes and over fitting is avoided.
Hyper-Parameter: A parameter whose value is set before the learning process begins.
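
Several of these terms can be seen together in one small sketch. Below, a model y = w * x is trained on training data, evaluated on testing data, and regularized with a penalty strength lam that acts as a hyper-parameter. The penalty used here is the squared size of w (often called ridge or L2 regularization), which is just one form of regularization; the data and function names are made up for this illustration.

```python
def fit_ridge_slope(xs, ys, lam):
    """Slope w of y = w * x that minimizes squared error plus lam * w**2 (closed form)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mean_squared_error(w, xs, ys):
    """Average squared difference between predictions w * x and the targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data roughly following y = 2x, split into training and testing data.
train_x, train_y = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
test_x, test_y = [4.0, 5.0], [8.0, 10.1]

slopes = {}
for lam in (0.0, 1.0, 100.0):       # lam is a hyper-parameter: chosen before training
    w = fit_ridge_slope(train_x, train_y, lam)            # training
    train_error = mean_squared_error(w, train_x, train_y)
    test_error = mean_squared_error(w, test_x, test_y)    # testing
    slopes[lam] = w
    print(lam, round(w, 3), round(train_error, 3), round(test_error, 3))
```

A larger lam shrinks the slope toward zero. A very large value makes the model too simple, so the error grows on both the training and the testing data, which is under fitting.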

These are some of the basic and most common terms needed to understand any material on machine learning. There are many other terms, which we will cover under their respective topics in later articles.


