There are different methods of Regression used in machine learning. The different techniques are listed below:
Non Parametric Regression
K-Nearest Neighbor Regression
The type of regression depends on the number of explanatory variables: single (simple) or multiple.
In the next section, linear regression is discussed in detail.
Linear regression is a very popular modeling method. It involves dependent and independent variables. The dependent variable is continuous, while the independent variables can be continuous or discrete. In linear regression, the independent variables (Z) and the dependent variable (W) are used for identifying the relationship between them. The relationship is modeled as a straight line which is the best fit to the data. This line is also referred to as the regression line.
It is represented by the equation W = mZ + c + err, where c is the intercept, m is the slope of the line, and err is the error term. The function W is used to predict the value of the dependent variable. Simple linear regression has a single independent variable.
Multiple linear regression has more than one independent variable. In that case, multiple linear regression finds the fit which relates the dependent variable to all of the independent variables.
The least squares method is used for finding the fit in linear regression, including multiple linear regression. The method minimizes the sum of the squares of the differences between each point and the line. The deviations are squared before they are added so that positive and negative values do not cancel out.
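The least squares idea can be sketched for the single-variable case W = mZ + c. The snippet below is a minimal illustration, not a full multiple regression solver; the function names and the data points are hypothetical and chosen only for this example.

```python
def fit_line(z, w):
    """Return slope m and intercept c that minimize the sum of squared errors."""
    n = len(z)
    mean_z = sum(z) / n
    mean_w = sum(w) / n
    # Closed-form least squares solution for one independent variable.
    s_zw = sum((zi - mean_z) * (wi - mean_w) for zi, wi in zip(z, w))
    s_zz = sum((zi - mean_z) ** 2 for zi in z)
    m = s_zw / s_zz
    c = mean_w - m * mean_z
    return m, c


def sum_squared_error(z, w, m, c):
    """Sum of squared vertical distances from each point to the line."""
    return sum((wi - (m * zi + c)) ** 2 for zi, wi in zip(z, w))


# Hypothetical observations that lie roughly on W = 2Z + 1.
z = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [3.1, 4.9, 7.2, 8.8, 11.0]

m, c = fit_line(z, w)
best_error = sum_squared_error(z, w, m, c)
other_error = sum_squared_error(z, w, 2.0, 1.0)  # any other line does no better
print(m, c, best_error, other_error)
```

Squaring each deviation keeps positive and negative errors from cancelling, which is why the fitted line never has a larger squared error than any other candidate line on the same data.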
In pharma, health care, economics, and other fields, classification plays an important role, as huge data repositories exist in these domains and are used for planning and innovation. Patterns are identified and analyzed for forecasting and prediction.
A good example of a classification algorithm is the analysis of X-ray images. Labels are assigned for disease characteristics such as tumors. The label values can be yes and no. Automated image analysis helps in bringing down the time needed to analyze X-ray images.
The classifier executes the classification algorithm, which should be fast and precise. The training data set is small to start with and covers all the parameters which are features of the model. For X-ray images, these are parameters such as bones, head, and other body parts.
Different learning patterns can be used for the training. A feature vector is used as the input for the classification algorithm. The vector represents the features in numeric form. Let us say the goal is to classify images of dogs into different classes based on a set of features. The feature vector will consist of size, appearance, purpose, and hair color.
The techniques used for the classification are presented in the next section.
Linear regression models the relationships between the observed parameters, and it can be tried as a classification technique. The observed parameters are numerically fit to a line using simple linear regression. The line is drawn to fit best, that is, to lie closest to the points.
Consider a scenario where one group of values is labeled Yes, with value 1, and the other group is labeled No, with value 0. Linear regression might fail to classify such data, as shown in the picture below.
A perceptron is an algorithm for binary classification. The algorithm takes input data which belongs to one of two classes. The output is a linear partition which separates the data of one class from the other. A binary classifier assigns boolean labels, such as yes or no, to the data elements.
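A perceptron that learns a linear partition between Yes (1) and No (0) can be sketched as follows. This is a minimal illustration using the classic perceptron update rule; the training points, the learning rate, and the number of epochs are assumptions made for the example.

```python
def predict(weights, bias, x):
    """Return 1 (Yes) if the point lies on the positive side of the line, else 0 (No)."""
    activation = sum(wi * xi for wi, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Classic perceptron rule: nudge the weights only when a point is misclassified."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            update = learning_rate * (y - predict(weights, bias, x))
            weights = [wi + update * xi for wi, xi in zip(weights, x)]
            bias += update
    return weights, bias


# Hypothetical, linearly separable data: only the point (1, 1) is labeled Yes.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(weights, bias, predictions)
```

The rule is only guaranteed to settle on a separating line when the two classes are linearly separable, as they are in this small data set.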
Naive Bayes Classifier
A Naive Bayes classifier is based on Bayes' theorem. According to Bayes' theorem, the probability of an event C happening, given that D has occurred, can be calculated. D is the evidence for C happening and C is referred to as the hypothesis. The features (predictors) are assumed to be independent of each other. The algorithm is referred to as naive because it assumes that one feature does not affect another.
The Bayesian posterior probability depends on the prior, the likelihood, and the evidence (data).
P(C/D) = (P(D/C) P(C) ) / P(D)
To give an example of Naive Bayes classification, let us look at objects which need to be classified by color, YELLOW or BLUE. New objects need to be classified as they arrive. The class label is applied based on the objects observed previously. We compare the number of BLUE objects observed so far with the number of YELLOW ones. Let us say there are thrice as many BLUE objects as YELLOW. A new case is then thrice as likely to have the BLUE label as the YELLOW one. Bayesian analysis refers to this belief as the prior probability. The previous observations decide the prior probabilities, and the BLUE and YELLOW percentages are used for prediction of outcomes.
Let us say there is a total of 80 objects, 60 of which are BLUE and 20 of which are YELLOW. The prior probabilities of class membership are:
Prior probability for BLUE = 60/80 = 3/4 = 0.75
Prior probability for YELLOW = 20/80 = 1/4 = 0.25
A new object (white circle) needs to be classified, as shown in the picture above. The naive Bayes classifier combines the prior probabilities with the likelihood of this new object under each class. The number of points of each color near the new object is used for calculating the probability of the new object being BLUE or YELLOW. The likelihood of the object given YELLOW is higher than the likelihood given BLUE.
We look at the circle around the white object to check how many BLUE and YELLOW objects are inside it. The circle contains 5 BLUE and 10 YELLOW objects. The class membership of the new white object depends on this evidence and on the number of YELLOW and BLUE objects observed previously in the system.
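The counts in this example can be combined into posterior probabilities with a short calculation. The sketch below assumes the common convention that the likelihood of the new object under a class is the number of objects of that class inside the circle divided by the total number of objects of that class; the text does not spell this formula out, so treat it as an assumption.

```python
from fractions import Fraction

# Counts taken from the example: totals per class and neighbors inside the circle.
total = {"BLUE": 60, "YELLOW": 20}
in_circle = {"BLUE": 5, "YELLOW": 10}

n_objects = sum(total.values())

# Prior: share of each class among all previously observed objects.
prior = {c: Fraction(total[c], n_objects) for c in total}

# Likelihood (assumed convention): neighbors of class c divided by all objects of class c.
likelihood = {c: Fraction(in_circle[c], total[c]) for c in total}

# Posterior is proportional to prior times likelihood; the evidence normalizes it.
unnormalized = {c: prior[c] * likelihood[c] for c in total}
evidence = sum(unnormalized.values())
posterior = {c: unnormalized[c] / evidence for c in total}

predicted = max(posterior, key=posterior.get)
print(prior, likelihood, posterior, predicted)
```

Under this assumption the new object is labeled YELLOW: the prior favors BLUE, but the neighborhood evidence for YELLOW is strong enough to outweigh it.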
Let us take another example which applies Bayes' theorem. A deck of cards consists of 52 cards. The goal is to find the probability of a card being a Queen.
The total number of cards in the deck is 52. The total number of Queen cards is 4. The probability of a card being a Queen is:
P(Queen) = 4/52 = 1/13
The probability of a card being a Queen, given that the card has a face on it, can be calculated using Bayes' theorem.
P(Queen/Face) = (P(Face/Queen) P (Queen))/ P(Face)
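The terms needed for the probability that the card is a Queen given that it has a face are listed below. Every Queen is a face card, and a standard deck has 12 face cards (Jack, Queen, and King in each of the four suits).
P(Face/Queen) = 1
P(Queen) = 1/13
P(Face) = 12/52 = 3/13
P(Queen/Face) = (1 x (1/13)) / (3/13) = 1/3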
A decision tree is used for representing the classified groups. Among the supervised classification learning methods, decision tree learning is a very popular method. The features come from domains which are finite and discrete. Each element of the domain of the classification is called a class. In the tree, each non-leaf node is labeled with an input feature. The arcs coming out of a node are labeled with the values of that feature. The tree leaves are labeled with probability values of the classes.
Classification follows the arcs whose values match the features of the input. The algorithm stops when a leaf is reached.
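The structure described above can be sketched with nested dictionaries. The tree below is hand-built and entirely hypothetical (the features, values, classes, and probabilities are made up for illustration, loosely following the dog example); it shows only how an input is classified, not how a tree is learned from data.

```python
# Non-leaf nodes name a feature, arcs ("branches") carry feature values,
# and leaves hold class probabilities.
tree = {
    "feature": "size",
    "branches": {
        "small": {"leaf": {"toy": 0.9, "working": 0.1}},
        "large": {
            "feature": "purpose",
            "branches": {
                "herding": {"leaf": {"toy": 0.05, "working": 0.95}},
                "companion": {"leaf": {"toy": 0.4, "working": 0.6}},
            },
        },
    },
}


def classify(node, sample):
    """Follow the arcs matching the sample's feature values until a leaf is reached."""
    while "leaf" not in node:
        value = sample[node["feature"]]
        node = node["branches"][value]
    probabilities = node["leaf"]
    best_class = max(probabilities, key=probabilities.get)
    return best_class, probabilities


label, probs = classify(tree, {"size": "large", "purpose": "herding"})
print(label, probs)
```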
“Machine learning will increase productivity throughout the supply chain.” ~Dave Waters
The training data is labeled by the domain experts. The machine learning model is trained by the labeled data. The data which is ambiguous is evaluated and validated by the domain experts. The training data set is used for learning purposes.
Machine learning plays an important role in solving complex problems. Machine learning techniques are applied to develop learning models for forecasting. The machine learning models help in generating business value for the enterprise.
The evaluated model is used for predictions. The learning model is used for forecasting, reporting, discovery, planning, optimization, and analysis purposes in the organization.
Machine learning models are built on the training data, but unseen data is very important for making the model more effective. To validate and check the predictions, we need more unseen data, which is what makes the model trustworthy. The model should not simply memorize the training data when forecasting future scenarios. The training data sets might or might not be linearly separable.
Note: A data set is linearly separable if the input set can be split by a line, plane, or hyperplane. The points of one class lie in one half space and the points of the other class lie in the other half space.
Machine learning models are evaluated using measures such as the number of errors and the mean squared error. The performance of the model is very important for any machine learning engagement. The evaluation of the model is based on predictions for unseen, out-of-sample data. The accuracy of the predictions is an important evaluation measure.
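The evaluation measures mentioned here are simple to compute. The following sketch uses hypothetical labels and predictions; the function names are illustrative and not taken from any particular library.

```python
def error_count(actual, predicted):
    """Number of predictions that differ from the actual labels."""
    return sum(1 for a, p in zip(actual, predicted) if a != p)


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = len(actual) - error_count(actual, predicted)
    return correct / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared differences, used for numeric forecasts."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical classification results on unseen data (1 = yes, 0 = no).
actual_labels = [1, 0, 1, 1, 0]
predicted_labels = [1, 0, 0, 1, 1]
n_errors = error_count(actual_labels, predicted_labels)
acc = accuracy(actual_labels, predicted_labels)

# Hypothetical regression results on unseen data.
actual_values = [3.0, 5.0, 7.0]
predicted_values = [2.5, 5.0, 8.0]
mse = mean_squared_error(actual_values, predicted_values)
print(n_errors, acc, mse)
```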
The model's evaluation is based on two methods: holdout and cross-validation.
A test data set is a prerequisite for the model's evaluation. The data which is used for developing the model needs to be different from the test data set. Otherwise, the prediction algorithm can simply hold in memory the label for each training set point. This scenario is called overfitting. Holdout evaluation is about testing the model on unseen data instead of just the data it was trained on. The effectiveness of the learning model is measured by its accuracy on the unseen data. In the holdout method, the data set has three subsets. The subsets are:
Training set
Validation set
Test set (Unseen data)
The training data set is used for building the forecasting models. The validation set is used for evaluating and tuning the learning model during the training phase. The test data, or unseen data, is used for evaluating the future effectiveness of the model. The holdout method is simple and quick to run. Its results can have high variance, because the measured accuracy depends on which data points end up in each subset.
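A holdout split into the three subsets can be sketched as below. The 60/20/20 proportions, the fixed seed, and the function name are assumptions for illustration; the text does not prescribe particular split sizes.

```python
import random


def holdout_split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the data and cut it into training, validation, and test subsets."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    validation = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains is the unseen data
    return train, validation, test


# Hypothetical data set of ten observations, identified by index.
train, validation, test = holdout_split(range(10))
print(train, validation, test)
```

Changing the seed changes which points land in the test set, which is one way to see how much a holdout estimate can vary from one split to another.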
Cross-validation separates the data used for testing from the data used for training. The training data set is used for the model learning and training. The unseen data set is used for evaluating the effectiveness of the model.
K-fold cross-validation is one of the cross-validation methods. The data set is divided into k subsets which are referred to as folds. k typically ranges from 5 to 10. Each of those subsets is used in turn for testing and validating the model. The model performance is based on the average error over the k different subsets.
In four-fold cross-validation, the data is separated into 4 subsets. The models are trained step by step. The first model uses the first subset for testing and the other subsets for training. This is repeated for each of the 4 separations of the data. The effectiveness of the model is measured by 4 trials with 4 folds (data sets). Every data point is used for testing once and for training in k-1 trials. The bias of the error estimate comes down because most of the data is used for fitting. The variance is also reduced, because every data point is eventually used for both testing and training.
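The four-fold procedure can be sketched as follows. To keep the example self-contained, the "model" is just the mean of the training values, which is an assumption made for illustration; any learning method could be plugged in at that step. The data is hypothetical and is not shuffled here, although in practice it usually would be.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) so that each point is tested exactly once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


def fit_mean(values):
    """Stand-in model: always predict the mean of the training values."""
    return sum(values) / len(values)


def mse(prediction, values):
    """Mean squared error of a constant prediction on the test values."""
    return sum((v - prediction) ** 2 for v in values) / len(values)


def cross_validate(w, k):
    """Average the test error over the k folds."""
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(w), k):
        model = fit_mean([w[i] for i in train_idx])
        fold_errors.append(mse(model, [w[i] for i in test_idx]))
    return sum(fold_errors) / k


# Hypothetical observations.
w = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
folds = list(k_fold_indices(len(w), 4))
average_error = cross_validate(w, 4)
print(folds, average_error)
```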
In the next blog article, we look into different types of machine learning algorithms such as supervised learning and unsupervised learning.
AI is becoming popular in real life. Many applications use computer vision by implementing Convolutional Neural Network (CNN) algorithms. Agriculture apps use CNN-based techniques to analyze crop images for crop health and seed viability. Self-driving cars use them to detect and classify moving cars and other vehicles. Video analysis software uses CNNs for finding automobiles, road blocks, and human beings on the road. Security software also uses them for crime and violence detection. Breast cancer, pneumonia, and other diseases can be diagnosed from medical images by using CNN algorithms.
The CNN technique consists of two main steps, convolution and pooling. These steps help in reducing the image to its basic features for image classification. Convolution views the image by breaking it into small patches. A CNN can have multiple convolution and activation layers. A convolution layer acts like a filter by taking the dot product of the pixel values in each patch and the assigned weights. The resulting sums form a matrix, the feature map, which is smaller than the actual image. The activation layer applies a function to this matrix, which helps in training the network with the backpropagation algorithm. The activation layer uses the ReLU function. The pooling step helps in reducing the size of the filtered output by downsampling. Max pooling is the term for keeping only the largest value in each region of the previous layer's output. Pooling helps in training the network on the essential features of the image. The fully connected layer in a Convolutional Neural Network is a multilayer perceptron. The input for the fully connected layer is a one-dimensional vector, which is the flattened result of the last layer. The output is a set of probabilities for the different labels. Each label represents a class, and the one with the highest probability will be the classification decision.
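The convolution, activation, and pooling steps can be sketched on a tiny image using plain Python lists. The image, the filter weights, and the pooling size below are hypothetical; a real CNN learns its filter weights during training and would use a numerical library rather than nested loops.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and take the dot product at each position.
    (As in most CNN libraries, the kernel is not flipped.)"""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace negative values with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size region."""
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 5x5 grayscale image containing a vertical bright stripe.
image = [[0, 1, 1, 0, 0] for _ in range(5)]
# Hypothetical 2x2 filter that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1], [-1, 1]]

feature_map = convolve2d(image, kernel)   # 4x4, smaller than the 5x5 image
activated = relu(feature_map)             # negative responses are dropped
pooled = max_pool(activated)              # 2x2 after downsampling
flattened = [v for row in pooled for v in row]  # input for a fully connected layer
print(feature_map, activated, pooled, flattened)
```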
Enterprise AI uses neural network techniques such as ANN, CNN, and RNN. Machine learning algorithms use neural network methods for data analysis and predictive analytics. An AI architecture will have machine learning components and neural network algorithms. For an Enterprise Architect, AI architecture skills will be very important for creating an AI Architecture practice within the organization. An Artificial Intelligence Architecture certification adds a step in the career path of an Enterprise Architect. AI architecture creation is an important skill needed for a qualified Enterprise Architect.
AI architecture has key factors such as the selection of machine learning frameworks and scalable solutions for automation. The AI reference architecture typically shows a workflow for automation solutions. Many AI frameworks such as Google TensorFlow, IBM Watson, SciPy, Azure ML, Keras, Google AI, NLTK, PyTorch, and AWS SageMaker are evolving and changing features rapidly. The AI architecture needs to have the flexibility and adaptability to handle this change. AI architecture helps in scaling, delivering speed, and automating processes in the organization.
The AI architecture course explains the machine learning workflows and features such as the following: feature derivation, model training, data analytics, data collation, data analysis & selection, project packaging, machine learning model tuning, evaluation, model inference, model validation, and deployment. The course will help in architecting AI applications for the below:
Document, Video and Image Analysis
Voice to Text, Speech Recognition
NLP/NLU, Conversational Agents and Intelligent Assistants
Deep Learning, Knowledge Studio and Machine Learning
Knowledge Mining, Cognitive Search and Decision-Making Applications
In daily life, we come across many applications while working with customers and enterprises. The typical use cases where AI Architecture will help are:
Spam & Email – Filtering & User preferences based content analysis
Predictive Analytics – Credit Worthiness and Loan Applications
OCR: Pattern Recognition – Text, Images, Video and Audio
Biometrics: Identity Management & Security
Machine Learning Models: Life Insurance – Mortality rates, life expectancy
Medical Expense Prediction Model: patient history & medical claim history
Fraud Detection: Credit Card usage and activity patterns
Social Network Analysis: Relationship & Influence Analysis
Ecommerce websites use AI techniques and methods in their implementation. They have the below features related to AI:
Historical data related to customer transactions analysed for customer demographics
Shopping carts of the customer analysed for abandonment
Price analysis of the products using the historical data
Next best action for the customer based on their preferences and previous purchases
Web page analytics related to customer browsing time for a product
Customer information related to profile, billing, and shipping addresses analysed for demographics
Referral websites tracked by the customer views and click stream analysis
Patterns related to customer rating and reviews of the products
Marketing campaign effectiveness based on email, sms and web channels
Recommendations based on customer history related to browsing, usage and behavior
Conversion analysis of shopping from a view to a buy
The recommendations made to the customer, by other customers and by the merchant, are analysed using the approaches mentioned below:
Content based Filtering
Train Matchbox Recommendation
Score Matchbox Recommendation
AI modeling and architectural development involves identifying modeling techniques, selecting algorithms, designing tests, developing models, assessing models, and training the models. Other methods like ensemble techniques help in combining and selecting multiple approaches based on scenarios. The AI model is validated and tested before it is used for unseen scenarios.
Enterprises are keen to evaluate AI & machine learning techniques and develop models for decision making using data science and algorithms. Enterprise leadership is interested in getting its architects trained through experiential learning and in avoiding failures by using reference architectures, patterns, and anti-patterns. RPA is another area which enterprises want to evaluate and implement with AI & machine learning, voice, and natural language processing algorithms. Leadership is interested in knowing the domain-specific use cases where RPA is successful.
The IASA Architect AI Architecture Training Program is a basic course related to AI enterprise architecture. This program is a defined baseline for successful IT architects who are implementing AI in enterprises. This initiative involves the advancement of best practices and education while delivering AI enterprise programs and services to IT architects of all levels around the world.
Supervised learning creates a model from historical data which can be used for forecasting on unseen data. The machine learning technique reads the input data set and the expected output data. The model is trained for forecasting the outputs for new scenarios.
Supervised machine learning can be categorized as regression and classification.
The fitting of the data is done in the regression method. The data is partitioned in the classification method. Supervised learning is very popular in the machine learning space.
The input variable Z is transformed by the mapping function g to create the output variable W in the supervised learning technique.
W = g(Z)
New input data Z is used for forecasting the output variable W using the mapping function. The aim is to find the mapping function. This method is referred to as supervised learning because it is like a manager supervising an employee's learning process. The supervisor checks the training process and the forecasts on the training data set. The supervisor also validates the outputs for unseen data, and the technique targets a goal set for effectiveness.
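One simple way to obtain a mapping function g from labeled examples is k-nearest neighbor regression, which appears in the list of regression techniques earlier in this series. The sketch below is a minimal one-dimensional version; the training pairs and the choice of k = 2 are hypothetical.

```python
def knn_regress(train_z, train_w, z_new, k=2):
    """Approximate W = g(Z) by averaging the outputs of the k closest training inputs."""
    by_distance = sorted(zip(train_z, train_w), key=lambda pair: abs(pair[0] - z_new))
    nearest = by_distance[:k]
    return sum(w for _, w in nearest) / k


# Hypothetical historical data: inputs Z with their expected outputs W.
train_z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_w = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

# Forecast the output for an unseen input.
forecast = knn_regress(train_z, train_w, 3.4)
print(forecast)
```

Here the "supervision" is the set of known outputs in train_w: the forecast for the new input is built only from examples whose correct answers were provided.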
Let us look at the examples for classification in the following section. The first example is related to classification of dogs.
Classification of Dogs
There are different types of dogs. Dogs can be classified into the following groups.
Dogs have different characteristics, and each group has a set of features which are used to identify the dog. This is a good example of supervised learning, where we have to classify the dog images into various groups based on features.
There are around 560 breeds of dogs presented in the word cloud below:
Another example is classification of cats.
Below is the word cloud of 100 cat breeds. Each breed has different characteristics and features which are used to categorize the images.
Some of the features or characteristics of a cat are body type, coat, and the pattern of the skin and coat. The shape of the face is another important factor for cat classification.
Note: A clowder is a group of cats. It is also referred to as a glaring. For a group of cats which are very different from each other, glaring is the right word. A kindle is a group of kittens.
In the case of regression, data is distributed in different dimensions. Information needs to be retrieved from it. The models need to evolve based on the data set, and the errors need to be minimized for prediction. Regression is the method which is described above.
The dog and cat problems pose different challenges, and the learning is different in each case. Features need to be analyzed and the models need to be fitted to the data available for prediction.
In terms of machine learning, we define these two types as part of a broader class called supervised learning. Machine learning has evolved with the data and processing power available at each particular time.
Classical Machine learning consists of different phases such as modeling, evaluation and methods such as supervised and unsupervised learning. There are different techniques within the supervised and unsupervised learning which are presented in the next sections.
Classical Machine Learning
Machine learning is about code which learns from data rather than from explicitly programmed logic. The input for the code is provided by the data for training and learning purposes. Machine learning is part of computer science and is related to Artificial Intelligence. The data is gathered, staged, and cleansed for training and learning purposes.
The real world has different workflows and procedures which can be modeled using mathematics. A machine learning model is based on the mathematical model of the procedure. Learning is achieved by using the data provided. Data is collated from databases and devices. The data ingestion is done from different data sources.
Data is transformed, normalized, and cleansed before the data set is created for learning. Data is analyzed and patterns are identified for forecasting. Data set features are analyzed and identified for feature set creation.
Different sets of features from the data are used for selection of the approach. For example, for regression the complexity and the degree of the polynomial are the key factors. The model based on mathematics is chosen from a group of candidates. Most of the time, the simplest model is the best one for prediction and forecasting.
“We consider it a good principle to explain the phenomena by the simplest hypothesis possible”. – Ptolemy
Models can be selected from different approaches such as listed below:
Support Vector Machine
Machine Learning Algorithms are categorized into three types.
Supervised Machine Learning
Unsupervised Machine Learning
Before we look at different types of machine learning algorithms, let us look at the machine learning models, features and model creation, training and evaluation of the models in the next blog article.
Artificial Intelligence based deep learning is an evolving field. Architect Corner has a next-generation AI deep learning platform.
The following are the use cases targeted by Architect Corner:
The social dynamics in Twitter are characterized by signatures representing the tweet’s popularity, contagiousness, stickiness, and interactivity.
The social dynamics in Yelp are characterized by signatures representing how different groups of reviewers rate individual businesses. We have found the patterns where these signatures interact by generating, enhancing, or dominating one another.
Deep networks have also had spectacular successes for pedestrian detection and image segmentation and yielded superhuman performance in traffic sign classification
such as indexation attribution modeling, collaborative filtering, or recommendation
Detect Emotions from Photos
The recommendation engine predicts a proxy of interest:
User clicks on an ad
User enters a rating
User clicks on a “like” button
User buys a product
User spends some amount of money on the product
User spends time visiting a page for the product
Collaborative filtering systems: when a new item or a new user is introduced, its lack of rating history means that there is no way to evaluate its similarity with other items or users, or the degree of association between, say, that new user and existing items. This is called the problem of cold-start recommendations. We also get a biased and incomplete view of the preferences of users: we only see the responses of users to the items they were recommended and not to the other items. In addition, in some cases we may not get any information on users for whom no recommendation has been made (for example, with ad auctions, it may be that the price proposed for an ad was below a minimum price threshold, or did not win the auction, so the ad is not shown at all). More importantly, we get no information about what outcome would have resulted from recommending any of the other items.
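The cold-start problem can be seen directly in a similarity calculation. The sketch below uses cosine similarity between item rating vectors, which is one common choice in collaborative filtering and an assumption here; the ratings are hypothetical, and a value of 0 is taken to mean "no rating yet".

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two rating vectors, or None when it cannot be computed."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return None  # an item with no rating history has no usable direction
    return dot / (norm_a * norm_b)


# Hypothetical ratings from four users (one vector entry per user, 0 = no rating).
item_a = [5, 4, 0, 1]
item_b = [4, 5, 0, 2]
new_item = [0, 0, 0, 0]  # just introduced, nobody has rated it

similar_items = cosine_similarity(item_a, item_b)
cold_start = cosine_similarity(new_item, item_a)
print(similar_items, cold_start)
```

The two established items can be compared, but the new item yields no similarity score at all, so a purely rating-based recommender has nothing to rank it by.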