Artificial General Intelligence – Pro et Contra

AGI

“There is a thin domain of research that, while having ambitious goals of making progress towards human-level intelligence, is also sufficiently grounded in science and engineering methodologies to bring real progress in technology. That’s the sweet spot.” – Yann LeCun

Mind Map

Before we delve into the benefits of AGI, let us first look at what AGI is:

Artificial General Intelligence is based on four key principles:

  • The essence of intelligence is thought, i.e., rational deliberation, which is necessarily sequential
  • The ideal model of thought is logical inference based on concepts
  • Perception is at a lower level than thought
  • Intelligence is based on ontology

Computers with AGI can think, comprehend, learn and apply AI techniques to solve real-life challenges, and AGI can handle unfamiliar problems. AGI is also referred to as deep AI or strong AI. The capabilities of AGI are listed below:

  • Sensory perception
  • Fine motor skills
  • Natural Language Understanding
  • Natural Language Processing

Now let us look at the benefits of AGI:

  • AGI can provide solutions to the world’s problems related to health, hunger and poverty.
  • AGI can automate processes and improve efficiency in companies.
  • AGI can execute tasks without manual supervision.
Singularity

AGI also has problems and disadvantages, as listed below:

  • The singularity can be an effect of AGI
  • AI can be destructive to mankind
  • It can be a weapon for human extinction
  • AGI can evolve without any rules or principles

Note : Singularity is a hypothetical point in time when technological growth becomes uncontrollable and irreversible, resulting in uncontrolled changes to mankind.

“The robot has some objective and pursues it brilliantly to the destruction of mankind. And it’s because it’s the wrong objective. It’s the old King Midas problem. We’ve got to get the right objective,” he explains, “and since we don’t seem to know how to program it, the right answer seems to be that the robot should learn – from interacting with and watching humans – what it is humans care about.” – Stuart Russell

The King Midas problem comes from the “Golden Touch” myth. King Midas, a rich and greedy king in ancient Greece, acquired the ability to turn everything he touched into gold. But as soon as he started using it, everything he touched was transformed into gold, including his daughter.

One solution to the singularity problem is to develop a simulation tool that simulates an AGI machine using different techniques. Such a simulation would help predict how AGI might behave toward mankind.

Simulation

Regression

Several regression methods are used in machine learning. The different techniques are listed below:

  1. Linear Regression
  2. Polynomial Regression
  3. Ridge Regression
  4. Lasso Regression
  5. Non-parametric Regression
  6. K-Nearest Neighbor Regression
  7. Kernel Regression

The type of regression also depends on the number of explanatory variables: a single variable (simple regression) or several (multiple regression).

Regression types

                      

 In the next section, linear regression is discussed in detail.

 Linear Regression

Linear Regression

                                  

Linear regression is a very popular modeling method. It relates a dependent variable to one or more independent variables. The dependent variable is continuous, while the independent variables can be continuous or discrete. In linear regression, the independent variable (Z) and the dependent variable (W) are used to identify the relationship between them. That relationship is modeled as the straight line that fits the data best, hence the name linear regression.

It is represented by the equation W = mZ + c + err, where c is the intercept, m is the slope of the line and err is the error term. The fitted function W is then used to predict the value of the dependent variable. Simple linear regression has a single independent variable.

Multiple linear regression has more than one independent variable. In that case, it finds the best fit relating the dependent variable to all of the independent variables.

The least squares method is used to find the fit, including for multiple linear regression. It minimizes the sum of the squared differences between each data point and the fitted line. The deviations are squared before they are added so that positive and negative values do not cancel each other out.
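To make the least squares idea concrete, here is a minimal NumPy sketch. It is not the article's Linear_Regression.py (which is not reproduced here); the helper name `least_squares_fit` and the sample numbers are illustrative assumptions. It appends a column of ones so the intercept c is estimated together with the slope(s) m, and lets `np.linalg.lstsq` minimize the sum of squared residuals, which covers both simple and multiple linear regression.

```python
import numpy as np

def least_squares_fit(Z, W):
    """Fit W ≈ Z·m + c by minimizing the sum of squared residuals."""
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, np.newaxis]                    # simple regression: one column
    X = np.column_stack([Z, np.ones(len(Z))])   # column of ones estimates c
    coef, _, _, _ = np.linalg.lstsq(X, np.asarray(W, dtype=float), rcond=None)
    return coef[:-1], coef[-1]                  # slopes m, intercept c

# Simple linear regression: one independent variable Z.
Z = np.array([1.0, 2.0, 3.0, 4.0])
W = np.array([3.1, 4.9, 7.2, 8.8])
m, c = least_squares_fit(Z, W)                   # slope ≈ 1.94, intercept ≈ 1.15
sse = np.sum((W - (m[0] * Z + c)) ** 2)          # the quantity being minimized

# Multiple linear regression: two independent variables on the exact
# plane W = 1 + 2*z1 - 3*z2, so the fit recovers m ≈ [2, -3] and c ≈ 1.
Z2 = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
W2 = 1 + 2 * Z2[:, 0] - 3 * Z2[:, 1]
m2, c2 = least_squares_fit(Z2, W2)
print(m, c, m2, c2)
```

Solving with `lstsq` rather than inverting a matrix by hand keeps the sketch numerically stable when the columns are nearly collinear.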

Code Snippet: Linear_Regression.py

Instructions for Running the Code

pip install numpy

pip install tensorflow

python linear_regression.py

Output of the Code Execution

Classification

Classification 

Classification plays an important role in pharma, health care, economics and other fields, because huge data repositories exist in these domains and are used for planning and innovation. Patterns in the data are identified and analyzed for forecasting and prediction.

A good example of a classification problem is the analysis of X-ray images. Labels are assigned for disease characteristics such as tumors, and the label values can be yes or no. Automated image analysis helps bring down the time needed to analyze X-ray images.

X Ray Images

The classifier executes the classification algorithm, which should be fast and precise. The training data set is small to start with, but it covers all the parameters that are features of the model for X-ray images, such as bones, head and other body parts.

  Xray Image analysis

                          

Training can follow different learning patterns. The classification algorithm takes a feature vector as input, which represents the features in numeric form. For example, if the goal is to classify images of dogs into different classes based on a set of features, the feature vector could consist of size, appearance, purpose and hair color.
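A feature vector for the dog example might be built as below. This is only a sketch: the article names the features but not how they are encoded, so the category lists, the ordinal size scale, the numeric `appearance_score` and the function `dog_feature_vector` are hypothetical. Categorical features are one-hot encoded so each one becomes a set of numeric 0/1 flags.

```python
import numpy as np

# Hypothetical encodings; the article does not say how features are measured.
SIZES = {"small": 0, "medium": 1, "large": 2}       # ordinal scale
PURPOSES = ["herding", "sporting", "working"]
HAIR_COLORS = ["black", "brown", "white"]

def one_hot(value, categories):
    """Turn a categorical value into 0/1 flags, one per category."""
    return [1.0 if value == category else 0.0 for category in categories]

def dog_feature_vector(size, appearance_score, purpose, hair_color):
    """Concatenate numeric and one-hot features into a single vector."""
    return np.array(
        [float(SIZES[size]), float(appearance_score)]
        + one_hot(purpose, PURPOSES)
        + one_hot(hair_color, HAIR_COLORS)
    )

# 8 numbers: size, appearance, 3 purpose flags, 3 hair-color flags
vec = dog_feature_vector("large", 0.7, "herding", "black")
print(vec.tolist())
```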

The techniques used for classification are presented in the following sections.

Linear Regression

Linear regression can also be used for classification. It models the relationship between the observed parameters by numerically fitting them to a line using simple linear regression, where the line is drawn to fit the points as closely as possible.

Consider a scenario where one group of values is labeled Yes (value 1) and the other is labeled No (value 0). Linear regression might fail to classify these values correctly, as shown in the picture below.
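The failure mode can be reproduced with a small sketch, assuming one common scheme: fit a least squares line to the 0/1 labels and call a point Yes when the line's value is at least 0.5. The data and the function name `fit_and_classify` are made up for illustration. A single distant Yes point flattens the line and shifts the 0.5 crossing to the right, so a point that was classified correctly becomes misclassified.

```python
import numpy as np

def fit_and_classify(x, y, threshold=0.5):
    """Fit a straight line to 0/1 labels, then label points by thresholding it."""
    slope, intercept = np.polyfit(x, y, 1)          # least squares line
    scores = slope * np.asarray(x, dtype=float) + intercept
    return (scores >= threshold).astype(int)

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])                    # No = 0, Yes = 1
print(fit_and_classify(x, y).tolist())              # [0, 0, 0, 1, 1, 1]

# One extra, far-away Yes point drags the line flatter, so x = 4 is now labeled No.
x_out = np.append(x, 30.0)
y_out = np.append(y, 1)
print(fit_and_classify(x_out, y_out).tolist())      # [0, 0, 0, 0, 1, 1, 1]
```

This sensitivity to distant points is one reason dedicated classifiers such as logistic regression are usually preferred for Yes/No labels.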

Linear Regression

                             

Perceptron

A perceptron is an algorithm for binary classification. It takes input data labeled for binary classification and outputs a linear partition that separates the data of one class from the other. As a binary classifier, it labels each data element with a boolean value such as yes or no.
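Below is a minimal sketch of the classic perceptron learning rule in NumPy, assuming 0/1 labels, a step activation and a learning rate of 1 (these choices and the function names are illustrative). Whenever a point is misclassified, the weights are nudged toward or away from it; on linearly separable data such as the logical AND shown here, the loop stops once a full pass makes no mistakes.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, max_epochs=100):
    """Classic perceptron learning rule for labels in {0, 1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0   # step activation
            update = lr * (target - pred)       # 0 when the prediction is correct
            if update != 0:
                w += update * xi
                b += update
                errors += 1
        if errors == 0:                         # a full pass with no mistakes
            break
    return w, b

def predict(w, b, X):
    """Label each row 1 if it lies on the positive side of the hyperplane."""
    return (np.asarray(X, dtype=float) @ w + b > 0).astype(int)

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]                                # logical AND: linearly separable
w, b = train_perceptron(X, y)
print(predict(w, b, X).tolist())                # [0, 0, 0, 1]
```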

Perceptron

                  

Naive Bayes Classifier

A Naive Bayes classifier is based on Bayes' theorem. According to Bayes' theorem, the probability of an event C happening, given that an event D has occurred, can be calculated. D is the evidence for C, and C is referred to as the hypothesis. The classifier assumes that the features are independent of each other given the class. The algorithm is called naive because of this assumption that one feature does not affect another.

The Bayesian posterior probability depends on the prior probability, the likelihood and the evidence (data):

P(C|D) = P(D|C) × P(C) / P(D)
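The naive independence assumption can be sketched for categorical features as follows. The training rows (dog size and coat length, labeled with two of the dog groups mentioned later in this article) are hypothetical, and the add-one (Laplace) smoothing is an extra assumption that keeps a feature value never seen with a class from forcing that class's probability to zero. Each class score is the prior P(C) times the product of per-feature likelihoods, and dividing by the sum of the scores plays the role of P(D).

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each feature value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, feature index) -> value counts
    vocab = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab

def posterior(model, row):
    """Return P(C|D) for every class C, where D is the feature values in `row`."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total           # prior P(C)
        for i, value in enumerate(row):
            # Naive assumption: features are independent given the class, so
            # P(D|C) is a product of per-feature likelihoods (add-one smoothed).
            score *= (value_counts[(label, i)][value] + 1) / (n_label + len(vocab[i]))
        scores[label] = score
    evidence = sum(scores.values())       # plays the role of P(D)
    return {label: s / evidence for label, s in scores.items()}

# Hypothetical training data: (size, coat) -> dog group
rows = [("large", "long"), ("large", "short"), ("large", "long"),
        ("small", "long"), ("small", "short"), ("small", "short")]
labels = ["Herding", "Herding", "Herding", "Toy", "Toy", "Toy"]
model = train_naive_bayes(rows, labels)
probs = posterior(model, ("small", "short"))
print(max(probs, key=probs.get))          # Toy
```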

           

Naive Bayes Method

As an example of Naive Bayes classification, let us look at objects that need to be classified as either YELLOW or BLUE. New objects need to be classified as they arrive, and the class label is assigned based on the objects seen so far. We compare the number of BLUE objects seen so far with the number of YELLOW ones. Let us say there are three times as many BLUE objects as YELLOW. Then a new case is three times as likely to be BLUE as YELLOW. In Bayesian analysis, this is known as the prior probability: previous observations determine the prior probabilities, and the BLUE and YELLOW percentages are used to predict outcomes.

Let us say there are 80 objects in total, 60 of which are BLUE and 20 YELLOW. The prior probabilities of class membership are:

Prior probability for BLUE  = 60/80 = 3/4 = 0.75

Prior probability for YELLOW = 20/80 = 1/4 = 0.25

A new object (the white circle) needs to be classified, as shown in the picture above. The Naive Bayes classifier combines the prior probabilities with the likelihood of this new object. The numbers of nearby points are used to calculate the likelihood of the new object being BLUE or YELLOW. The likelihood of the object given YELLOW is higher than the likelihood given BLUE.

Naive Bayes Classifier

We draw a circle around the white object and count how many BLUE and YELLOW objects are inside it. The circle contains 5 BLUE and 10 YELLOW objects. The class membership of the new white object therefore depends on this local data as well as on the total numbers of YELLOW and BLUE objects seen by the system.
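The numbers above can be combined in a short calculation. This sketch assumes the usual convention for this kind of example: the likelihood of the white object under a class is the fraction of that class's objects that fall inside the circle (5 of 60 BLUE, 10 of 20 YELLOW). Multiplying each prior by its likelihood and normalizing gives the posterior probabilities; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

totals = {"BLUE": 60, "YELLOW": 20}          # objects seen so far
in_circle = {"BLUE": 5, "YELLOW": 10}        # objects near the new white object
n = sum(totals.values())

prior = {c: Fraction(totals[c], n) for c in totals}                 # 3/4, 1/4
likelihood = {c: Fraction(in_circle[c], totals[c]) for c in totals} # 1/12, 1/2

# Posterior is proportional to prior x likelihood; normalize so the values sum to 1.
unnormalized = {c: prior[c] * likelihood[c] for c in totals}        # 1/16, 1/8
evidence = sum(unnormalized.values())
posterior = {c: unnormalized[c] / evidence for c in totals}
print(posterior)                             # BLUE 1/3, YELLOW 2/3
```

Even though BLUE has the larger prior, the much higher local likelihood of YELLOW wins, so the white object is labeled YELLOW.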

Likelihood Analysis

                         

As another example, let us apply Bayes' theorem to a deck of 52 cards. The goal is to find the probability of a card being a Queen.

The deck has 52 cards in total, 4 of which are Queens. The probability of a card being a Queen is:

P(Queen) = 4/52 = 1/13

Bayes' theorem gives the probability that a card is a Queen given that it is a face card:

P(Queen|Face) = P(Face|Queen) × P(Queen) / P(Face)

The terms needed to compute this probability are:

P(Face|Queen) = 1 (every Queen is a face card)

P(Queen) = 1/13

P(Face) = 12/52 = 3/13 (the Jack, Queen and King of each of the four suits)

P(Queen|Face) = (1 × 1/13) / (3/13) = 1/3
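The card probabilities can be checked by brute-force counting over a standard 52-card deck, where the face cards are the Jack, Queen and King of each suit. This sketch computes P(Queen|Face) both through Bayes' theorem and directly as the share of Queens among the face cards; the two must agree.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))          # 13 ranks x 4 suits = 52 cards

def is_queen(card):
    return card[0] == "Q"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

def prob(event, cards):
    """Share of `cards` for which `event` holds, as an exact fraction."""
    return Fraction(sum(1 for card in cards if event(card)), len(cards))

p_queen = prob(is_queen, DECK)                                        # 1/13
p_face = prob(is_face, DECK)                                          # 3/13
p_face_given_queen = prob(is_face, [c for c in DECK if is_queen(c)])  # 1

# Bayes' theorem versus counting Queens among the face cards directly.
p_queen_given_face = p_face_given_queen * p_queen / p_face
direct = prob(is_queen, [c for c in DECK if is_face(c)])
print(p_queen_given_face, direct)                                     # 1/3 1/3
```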

Decision Trees

Decision Tree

A decision tree represents the classified groups as a tree. Among supervised classification learning methods, decision tree learning is a very popular one. The features come from finite, discrete domains, and each element of the classified domain is called a class. Each non-leaf node of the tree is labeled with an input feature, and the arcs coming out of a node are labeled with the possible values of that feature. The leaves of the tree are labeled with a class or with probability values of the classes.

To classify an input, the algorithm starts at the root and, at each node, follows the arc whose value matches the input's value for that node's feature. The algorithm stops when it reaches a leaf, which gives the classification.
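The traversal can be sketched with a hand-built tree stored as nested dictionaries. The tree, its features (outlook, humidity, wind) and the yes/no classes are hypothetical; they only illustrate how a classifier walks from the root along matching arcs to a leaf. Learning the tree from data (for example with ID3 or CART) is not shown.

```python
# Hypothetical tree: non-leaf nodes name the feature to test, arcs map each
# feature value to a subtree, and leaves are plain class labels.
TREE = {
    "feature": "outlook",
    "arcs": {
        "sunny": {"feature": "humidity", "arcs": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"feature": "wind", "arcs": {"strong": "no", "weak": "yes"}},
    },
}

def classify(node, example):
    """Walk from the root, following the arc that matches the example's value."""
    while isinstance(node, dict):            # non-leaf node: test its feature
        value = example[node["feature"]]
        node = node["arcs"][value]           # follow the matching arc
    return node                              # leaf reached: this is the class

print(classify(TREE, {"outlook": "sunny", "humidity": "normal", "wind": "weak"}))  # yes
print(classify(TREE, {"outlook": "rain", "humidity": "high", "wind": "strong"}))   # no
```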

ML Modeling

“Machine learning will increase productivity throughout the supply chain.” – Dave Waters

Training Models

The training data is labeled by domain experts, and the machine learning model is trained on this labeled data. Ambiguous data is evaluated and validated by the domain experts. The training data set is used for learning.

Training flow

                      

Evaluation

Machine learning plays an important role in solving complex problems. Machine learning techniques are applied to develop learning models for forecasting, and these models help generate business value for the enterprise.

Model Evaluation

                                    

Once evaluated, the model is used for predictions. In the organization, the learning model is used for forecasting, reporting, discovery, planning, optimization and analysis.

Model Usage

                                     

Machine learning models use the training data as their basis, but unseen data is very important for making the model more effective. To validate the predictions and make the model trustworthy, we need more unseen data. The model should not simply memorize the training data and use it to forecast future scenarios. The training data sets might be linearly separable or not linearly separable.

Nonlinear and linear separability

                     

Note: A data set is linearly separable if a line, plane or hyperplane can split the input set so that the points of one class lie in one half-space and the points of the other class lie in the other half-space.

Machine learning models are evaluated using measures such as the number of errors and the mean squared error. The performance of the model is very important for any machine learning engagement. The evaluation is based on predictions for unseen, out-of-sample data, and the accuracy of these predictions is an important evaluation measure.
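These evaluation measures are straightforward to compute. Below is a small NumPy sketch with made-up labels and predictions: the number of errors and accuracy apply to classification outputs, and the mean squared error applies to regression outputs. The function names are illustrative.

```python
import numpy as np

def error_count(y_true, y_pred):
    """Number of predictions that differ from the true labels."""
    return int(np.sum(np.asarray(y_true) != np.asarray(y_pred)))

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))

labels = [1, 0, 1, 1, 0]              # classification: true classes
preds = [1, 0, 0, 1, 0]               # one mistake
print(error_count(labels, preds))     # 1
print(accuracy(labels, preds))        # 0.8

targets = [3.0, 5.0, 7.0]             # regression: true values
outputs = [2.5, 5.0, 8.0]             # squared errors 0.25, 0, 1
print(mean_squared_error(targets, outputs))   # 1.25 / 3 ≈ 0.4167
```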

The model’s evaluation is based on two methods:

  1. Hold out
  2. Cross Validation

Hold out

A test data set is a prerequisite for evaluating a model, and the data used to develop the model needs to be different from the test data set. If a model is tested on its own training data, the prediction algorithm can simply remember the label of each training point; this scenario is called overfitting. Hold-out evaluation tests the model on unseen data instead of only on the training data, and the effectiveness of the learning model is measured by its accuracy on that unseen data. In the hold-out method, the data set is split into three subsets:

  1. Training Set
  2. Validation Set
  3. Test set (Unseen data)
  Data Subsets

                                            

The training set is used to build the forecasting models. The validation set is used to evaluate and tune the learning model during the training phase. The test data (unseen data) is used to evaluate how effective the model will be in the future. The hold-out method is efficient because the model is trained and evaluated only once. However, its results can vary considerably, because the measured accuracy depends on which data points end up in each subset.
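A three-way hold-out split can be sketched with a random permutation of the row indices. The 60/20/20 proportions, the fixed seed and the function name `holdout_split` are assumptions for illustration; changing the seed changes which rows land in each subset, which is one source of the variability of hold-out results.

```python
import numpy as np

def holdout_split(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle row indices once and cut them into train/validation/test subsets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test_idx = indices[:n_test]                   # unseen data, used only at the end
    val_idx = indices[n_test:n_test + n_val]      # used while tuning the model
    train_idx = indices[n_test + n_val:]          # used to fit the model
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = holdout_split(100)
print(len(train_idx), len(val_idx), len(test_idx))   # 60 20 20
```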

Cross Validation

Cross validation also separates the data used for evaluation from the data used for training. The training data is used for model learning and training, and the held-out data is used to evaluate the effectiveness of the model.

K-fold cross validation is one of the cross validation methods. The data set is divided into k subsets, referred to as folds, where k is typically between 5 and 10. Each fold is used in turn for testing the model while the remaining folds are used for training, and the model performance is the average error over the k folds.

In 4-fold cross validation, the data is split into 4 subsets, and the models are trained step by step. The first model uses the first subset for testing and the other three for training; the next model tests on the second subset, and so on, until all 4 splits have been used. The effectiveness of the model is measured over these 4 trials. Every data point is used for testing once and for training in k-1 = 3 trials. Because most of the data is used for fitting in every trial, the bias of the error estimate comes down, and because every point serves as both test data and training data across the trials, the variance is lower than with a single hold-out split.
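K-fold cross validation can be sketched as follows, using `np.array_split` to cut the indices into k folds and a least squares line (`np.polyfit` with degree 1) as the model being evaluated. The toy data and the function names are assumptions; real data would normally be shuffled before it is split.

```python
import numpy as np

def k_fold_indices(n_samples, k):
    """Yield (train, test) index arrays; each fold is the test set exactly once."""
    folds = np.array_split(np.arange(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def cross_val_mse(x, y, k=4):
    """Average test-fold mean squared error of a least squares line."""
    errors = []
    for train, test in k_fold_indices(len(x), k):
        slope, intercept = np.polyfit(x[train], y[train], 1)
        predictions = slope * x[test] + intercept
        errors.append(np.mean((y[test] - predictions) ** 2))
    return float(np.mean(errors))

# Toy data lying exactly on a line, so the cross-validated error is ~0.
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
print(cross_val_mse(x, y, k=4))
```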

In the next section, we look into different types of machine learning, such as supervised learning and unsupervised learning.

Types of Machine learning

Supervised Learning

Supervised learning creates a model from historical data that can be used to forecast outcomes for unseen data. The machine learning technique reads the input data set together with the expected output data, and the model is trained to forecast the outputs for new scenarios.

Supervised machine learning can be categorized as:

  1. Regression
  2. Classification

Regression fits the data, while classification partitions it. Supervised learning is very popular in the machine learning space.

In supervised learning, the input variable Z is transformed by the mapping function g to create the output variable W:

W = g(Z)

The aim is to find the mapping function g, so that it can be used to forecast the output W for new input data Z. The method is called supervised learning because it resembles a manager supervising an employee's learning process: the supervisor checks the training process and the forecasts on the training data set, validates the outputs for unseen data, and makes sure the technique meets a target level of effectiveness.

Let us look at some examples of classification. The first example is the classification of dogs.

Classification of Dogs

There are different types of dogs. Dogs can be classified into the following groups.

  1. Herding
  2. Sporting
  3. Non-Sporting
  4. Working
  5. Hounds
  6. Terriers
  7. Toy

Dogs have different characteristics, and each group has a set of features that are used to identify the dog. This is a good example of supervised learning, where we have to classify dog images into groups based on their features.

There are around 560 breeds of dogs presented in the  word cloud  below:

Dog Breed Word cloud

                              

Another example is classification of cats.

Classification of Cats

Below is a word cloud of 100 cat breeds. Each breed has different characteristics and features that can be used to categorize the images.

Cat breed word cloud

                           

Features or characteristics of a cat include its body type, coat, and the pattern of its skin and coat. The shape of the face is another important factor for cat classification.

Note: A group of cats is called a clowder. It is also referred to as a glaring; when the cats in a group are very different from each other, glaring is the right word. A group of kittens is called a kindle.

In the case of regression, the data is distributed across different dimensions, and information needs to be retrieved from it. The models need to be evolved based on the data set, and the errors need to be minimized for prediction. Regression is described in the earlier sections.

The dog and cat problems pose different challenges, and learning is different in each case. Features need to be analyzed, and the models need to be fitted to the data available for prediction.

  Regression

                               

In machine learning, classification and regression are both part of a broader class called supervised learning. Machine learning has evolved with the data and processing power available at each point in time.

Classical Machine Learning

Classical machine learning consists of phases such as modeling and evaluation, and methods such as supervised and unsupervised learning. There are different techniques within supervised and unsupervised learning, which are presented in the next sections.

Classical Machine Learning

Machine learning is about code that learns its logic implicitly from data instead of having that logic written explicitly. The data provides the input to the code for training and learning purposes. Machine learning is part of computer science and is related to artificial intelligence. The data is gathered, staged and cleansed for training and learning purposes.

Modelling

Real-world workflows and procedures can be modeled using mathematics, and a machine learning model is based on the mathematical model of the procedure. Learning is achieved using the data provided. Data is collated from databases and devices and ingested from different data sources.

Data is transformed, normalized and cleansed before the data set is created for learning. The data is analyzed and patterns are identified for forecasting, and the data set's features are analyzed and identified to create the feature set.
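Normalization can take several forms. Two common ones are sketched below, as an assumption about what this step might involve: min-max scaling to the range [0, 1] and standardization to zero mean and unit standard deviation. The sample column is hypothetical, and the guard for constant columns avoids dividing by zero.

```python
import numpy as np

def min_max_scale(column):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    col = np.asarray(column, dtype=float)
    span = col.max() - col.min()
    return np.zeros_like(col) if span == 0 else (col - col.min()) / span

def standardize(column):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    col = np.asarray(column, dtype=float)
    std = col.std()
    return np.zeros_like(col) if std == 0 else (col - col.mean()) / std

ages = [22, 35, 47, 58, 61]                       # hypothetical raw feature column
print(min_max_scale([10, 20, 30]).tolist())       # [0.0, 0.5, 1.0]
scaled = standardize(ages)                        # mean ~0, standard deviation ~1
```

In practice the scaling parameters (minimum, maximum, mean, standard deviation) are computed on the training set only and then reused for the validation and test data.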

Different sets of features from the data are used to select the approach. For example, in regression, the complexity of the model, such as the degree of the polynomial, is a key factor. The mathematical model is chosen from a group of candidates, and most of the time, the simplest model is the best one for prediction and forecasting.
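Choosing the polynomial degree for regression can be sketched with a validation set: fit each candidate degree, measure its validation error, and keep the simplest degree that does about as well as the best one. The noise-free quadratic data, the absolute tolerance `tol` and the function name `select_degree` are illustrative assumptions; with noisy real data, the tolerance would need to reflect the noise level.

```python
import numpy as np

def select_degree(x_train, y_train, x_val, y_val, max_degree=5, tol=1e-9):
    """Return the lowest degree whose validation MSE is within tol of the best."""
    val_mse = {}
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(x_train, y_train, degree)
        predictions = np.polyval(coeffs, x_val)
        val_mse[degree] = float(np.mean((y_val - predictions) ** 2))
    best = min(val_mse.values())
    # Prefer the simplest candidate that does (almost) as well as the best one.
    chosen = min(d for d, mse in val_mse.items() if mse <= best + tol)
    return chosen, val_mse

x = np.linspace(0.0, 1.0, 30)
y = 1.0 + 2.0 * x + 3.0 * x ** 2              # noise-free quadratic (toy data)
is_val = np.arange(30) % 3 == 0               # every third point for validation
degree, scores = select_degree(x[~is_val], y[~is_val], x[is_val], y[is_val])
print(degree)                                 # 2
```

Degrees 3 to 5 fit this data just as well, but the rule keeps degree 2, in line with the preference for the simplest model.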

“We consider it a good principle to explain the phenomena by the simplest hypothesis possible”. – Ptolemy

Models can be selected from different approaches, such as those listed below:

  • Support Vector Machine
  • Logistic Regression
  • Others

Machine learning algorithms are categorized into three types:

  1. Supervised Machine Learning
  2. Unsupervised Machine Learning
  3. Reinforcement Learning

In the next blog article, before looking at the different types of machine learning algorithms, we will look at machine learning models, features, and model creation, training and evaluation.