**Classification**

Classification plays an important role in pharma, health care, economics, and other fields. These domains hold huge data repositories that are used for planning and innovation. Patterns in the data are identified and analyzed for forecasting and prediction.

A good example of a classification problem is the analysis of X-ray images. Labels are assigned for disease characteristics such as tumors, and each label can take the values yes or no. Automated image analysis helps reduce the time needed to analyze X-ray images.

The classifier executes the classification algorithm, which should be fast and precise. A training data set is selected that is small to start with and covers all the parameters that serve as features of the model. For X-ray images, these are X-ray parameters such as bones, head, and other body parts.

Training can follow different learning patterns. The classification algorithm works on a feature vector, which represents the features in numeric form. Suppose the goal is to classify images of dogs into different classes based on a set of features. The feature vector would then consist of size, appearance, purpose, and hair color.
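As a rough illustration of how such features might be turned into numbers, the sketch below uses one-hot encoding. The category lists, the function names, and the example dog are assumptions made up for this illustration; a real pipeline would derive the features from the images.

```python
# Hypothetical category lists for each feature (illustrative only).
SIZES = ["small", "medium", "large"]
APPEARANCES = ["short-coated", "long-coated"]
PURPOSES = ["companion", "herding", "hunting"]
HAIR_COLORS = ["black", "brown", "white"]


def one_hot(value, categories):
    """Return a list with 1.0 at the position of `value` and 0.0 elsewhere."""
    return [1.0 if value == category else 0.0 for category in categories]


def dog_feature_vector(size, appearance, purpose, hair_color):
    """Concatenate the one-hot encodings of the four features into one vector."""
    return (
        one_hot(size, SIZES)
        + one_hot(appearance, APPEARANCES)
        + one_hot(purpose, PURPOSES)
        + one_hot(hair_color, HAIR_COLORS)
    )


vector = dog_feature_vector("large", "long-coated", "herding", "brown")
print(vector)
```

One-hot encoding is only one possible choice. It avoids implying an order between categories such as hair colors, at the cost of a longer vector.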

The techniques used for classification are presented below.

**Linear Regression**

Linear regression is a technique that models the relationships between the observed parameters. In simple linear regression, the observed values are fitted numerically to a line, chosen so that it is the best fit, that is, closest to the points. With a threshold applied to its output, the fitted line can be used as a classifier.

Consider a scenario where one group of values is labeled Yes, with value 1, and the other group is labeled No, with value 0. Linear regression might fail to classify such data correctly, as shown in the picture below.
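One way this failure can arise is sketched below: a least-squares line is fitted to 0/1 labels and thresholded at 0.5. The data points, the 0.5 threshold, and the function names are assumptions chosen for illustration. Adding a single extreme point flattens the line and shifts the decision boundary.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def classify(x, slope, intercept, threshold=0.5):
    """Label a point 1 (Yes) if the fitted line reaches the threshold, else 0 (No)."""
    return 1 if slope * x + intercept >= threshold else 0


def misclassified(xs, ys):
    """Return the x values whose predicted label differs from the true label."""
    slope, intercept = fit_line(xs, ys)
    return [x for x, y in zip(xs, ys) if classify(x, slope, intercept) != y]


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

print(misclassified(xs, ys))                # well-separated data
print(misclassified(xs + [30], ys + [1]))   # same data plus one extreme Yes point
```

On the first data set the thresholded line separates the two groups. With the extra point at x = 30 the boundary moves to the right, and the point at x = 5 falls on the wrong side even though its label did not change.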

**Perceptron**

A perceptron is an algorithm for binary classification. It takes input data that belongs to one of two classes and produces a linear partition that separates one class from the other. A binary classifier labels each data element with a boolean value such as yes or no.
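A minimal sketch of the classic perceptron learning rule is given below, trained on the logical AND function as a small linearly separable example. The data set, learning rate, epoch limit, and function names are assumptions for illustration.

```python
def predict(weights, bias, x):
    """Return 1 if the point lies on the positive side of the linear boundary, else 0."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, learning_rate=1.0, epochs=20):
    """Adjust weights and bias after each misclassified sample (perceptron rule)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)
            if error != 0:
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
                errors += 1
        if errors == 0:  # every sample is classified correctly, so stop early
            break
    return weights, bias


# Logical AND: only the input (1, 1) belongs to the "yes" class.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(weights, bias, predictions)
```

The learned weights and bias define the line that partitions the two classes. For data that is not linearly separable, this rule does not settle on a solution and only stops because of the epoch limit.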

**Naive Bayes Classifier**

A Naive Bayes classifier is based on Bayes' theorem. According to Bayes' theorem, the probability of an event C happening, given that D has occurred, can be calculated. D is the evidence for C happening, and C is referred to as the hypothesis. The features (predictors) are assumed to be independent of each other given the class. The algorithm is called naive because it assumes that one feature does not affect another.

The Bayesian posterior probability depends on the prior probability, the likelihood, and the evidence (data).

P(C|D) = (P(D|C) P(C)) / P(D)

To give an example of Naive Bayes classification, consider objects that need to be classified by color as YELLOW or BLUE. New objects need to be classified as they arrive, and the class label is assigned based on the objects already observed. We compare the number of BLUE objects observed with the number of YELLOW objects. Suppose there are three times as many BLUE objects as YELLOW ones. A new case is then three times as likely to have the BLUE label as the YELLOW label. In Bayesian analysis, this belief is known as the prior probability. Prior probabilities are based on previous observations, here the proportions of BLUE and YELLOW objects, and are used to predict outcomes.

Suppose there are 80 objects in total, 60 of which are BLUE and 20 of which are YELLOW. The prior probabilities of class membership are:

Prior probability for BLUE = 60/80 = 3/4 = 0.75

Prior probability for YELLOW = 20/80 = 1/4 = 0.25

A new object (the white circle) needs to be classified, as shown in the picture above. In addition to the prior probabilities, the naive Bayes classifier uses the likelihood of this new object under each class. The number of points of each color near the new object is used for calculating the probability of the new object being BLUE or YELLOW. Here, the likelihood of the object given YELLOW is higher than the likelihood given BLUE.

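We draw a circle around the white object and count how many BLUE and YELLOW objects fall inside it. The circle contains 5 BLUE and 10 YELLOW objects, so the likelihood given BLUE is 5/60 and the likelihood given YELLOW is 10/20. Multiplying each likelihood by its prior gives 0.75 x 5/60 = 1/16 for BLUE and 0.25 x 10/20 = 1/8 for YELLOW, so the white object is classified as YELLOW. The membership of the new object thus depends both on the data near it and on the overall numbers of YELLOW and BLUE objects observed.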
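The arithmetic for this example can be sketched with exact fractions. The sketch assumes that the likelihood of the new object under a class is the fraction of that class's objects that fall inside the circle; the variable names are illustrative.

```python
from fractions import Fraction

# Counts taken from the example: all observed objects, and those inside the circle.
total = {"BLUE": 60, "YELLOW": 20}
nearby = {"BLUE": 5, "YELLOW": 10}

n_objects = sum(total.values())

# Prior: share of each class among all observed objects.
prior = {c: Fraction(total[c], n_objects) for c in total}

# Likelihood (assumed form): share of each class's objects that lie inside the circle.
likelihood = {c: Fraction(nearby[c], total[c]) for c in total}

# Unnormalized posterior: prior times likelihood.
score = {c: prior[c] * likelihood[c] for c in total}

# Normalize by the evidence so the posteriors sum to 1.
evidence = sum(score.values())
posterior = {c: score[c] / evidence for c in total}

predicted = max(posterior, key=posterior.get)
print(prior, likelihood, posterior, predicted)
```

Using `Fraction` keeps every intermediate value exact, which makes it easy to compare the output with the hand calculation.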

Let us take another example that applies Bayes' theorem. A deck consists of 52 cards. The goal is to find the probability that a card drawn from it is a Queen.

The total number of cards in the deck is 52, and the number of Queens is 4. The probability of a card being a Queen is:

P(Queen) = 4/52 = 1/13

The probability of a card being a Queen, given that the card has a face on it, can be calculated using Bayes' theorem.

P(Queen|Face) = (P(Face|Queen) P(Queen)) / P(Face)

Every Queen is a face card, and the deck has 12 face cards (the Jack, Queen, and King of each of the four suits):

P(Face|Queen) = 1

P(Queen) = 1/13

P(Face) = 12/52 = 3/13

P(Queen|Face) = (1 x (1/13)) / (3/13) = 1/3
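A calculation of this kind can be checked by enumerating the deck. The sketch below assumes a standard 52-card deck in which the face cards are the Jack, Queen, and King of each suit; the rank and suit names are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
FACE_RANKS = {"J", "Q", "K"}  # assumption: these three ranks carry a face

deck = list(product(RANKS, SUITS))

# Probabilities obtained by counting cards in the deck.
p_queen = Fraction(sum(1 for rank, _ in deck if rank == "Q"), len(deck))
p_face = Fraction(sum(1 for rank, _ in deck if rank in FACE_RANKS), len(deck))
p_face_given_queen = Fraction(1)  # every Queen is a face card

# Bayes' theorem: P(Queen|Face) = P(Face|Queen) * P(Queen) / P(Face)
p_queen_given_face = p_face_given_queen * p_queen / p_face

# Cross-check by counting Queens among the face cards directly.
face_cards = [card for card in deck if card[0] in FACE_RANKS]
direct = Fraction(sum(1 for rank, _ in face_cards if rank == "Q"), len(face_cards))

print(p_queen, p_face, p_queen_given_face, direct)
```

The direct count serves as an independent check on the value obtained through Bayes' theorem.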

**Decision Trees**

A decision tree is used for representing the classified groups. Among supervised classification methods, decision tree learning is very popular. The input features come from finite, discrete domains, and an element of the classified domain is called a class. Each non-leaf node of the tree is labeled with an input feature. The arcs coming out of a node are labeled with the possible values of that feature. The leaves of the tree are labeled with the probability values of the classes.

Classifying an input starts at the root and follows the arcs whose values match the input's features. The algorithm stops when a leaf is reached and the input is classified.
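The structure described above can be sketched as nested dictionaries. The tree, its features, its classes, and the probability values are all made up for illustration; in practice the tree would be learned from training data.

```python
# Non-leaf nodes name a feature and map each feature value (arc) to a subtree.
# Leaves hold the probability of each class. All values here are hypothetical.
tree = {
    "feature": "size",
    "branches": {
        "small": {"leaf": {"companion": 0.9, "working": 0.1}},
        "large": {
            "feature": "hair",
            "branches": {
                "short": {"leaf": {"companion": 0.3, "working": 0.7}},
                "long": {"leaf": {"companion": 0.6, "working": 0.4}},
            },
        },
    },
}


def classify(node, example):
    """Follow the arcs matching the example's feature values until a leaf is reached."""
    while "leaf" not in node:
        value = example[node["feature"]]
        node = node["branches"][value]
    probabilities = node["leaf"]
    best_class = max(probabilities, key=probabilities.get)
    return best_class, probabilities


label, probabilities = classify(tree, {"size": "large", "hair": "short"})
print(label, probabilities)
```

An example whose feature value has no matching arc raises a `KeyError` in this sketch; a fuller implementation would need a policy for unseen values.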