Artificial Intelligence

A Generic Searching Algorithm

4 min read
A Generic Searching Algorithm: A generic algorithm to search for a solution path in a graph. The algorithm is independent of any particular search strategy and any particular graph. The intuitive idea behind the generic search algorithm, given a graph, a set of start nodes, and a set of goal nodes, is to incrementally explore paths from the start nodes. This is done by maintaining a frontier (or fringe) of paths from the start nodes that have been explored. The frontier contains all of the paths that could form initial segments of paths from a start node to a goal node. (See Figure 3.3, where the frontier is the set of paths to the gray shaded nodes.) Initially, the frontier contains trivial paths containing no arcs from the start nodes. As the search proceeds, the frontier expands into the unexplored nodes until a goal node is encountered. To expand the frontier, the searcher selects and removes a path from the frontier, extends the path with each arc leaving the last node, and adds these new paths to the frontier. A search strategy defines which element of the frontier is selected at each step. The generic search algorithm is shown in Figure 3.4. Initially, the frontier is the set of empty paths from start nodes. At each step, the algorithm advances the frontier by removing a path s0, . . . , sk from the frontier. If goal(sk) is true (i.e., sk is a goal node), it has found a solution and returns the path that was found, namely s0, . . . , sk. Otherwise, the path is extended by one more arc by finding the neighbors of sk. For every neighbor s of sk, the path s0, . . . , sk, s is added to the frontier. This step is known as expanding the node sk. This algorithm has a few features that should be noted:
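As a concrete illustration, here is a minimal Python sketch of the generic search loop described above (it is not the book's Figure 3.4 code): the graph is assumed to be given as a neighbors function, the goal as a predicate, and the choice of which frontier element to remove is what encodes the search strategy.

    def generic_search(start_nodes, is_goal, neighbors, select_index=0):
        # The frontier holds paths; each path is a tuple of nodes.
        frontier = [(s,) for s in start_nodes]
        while frontier:
            # Which element is removed encodes the strategy:
            # select_index=0 (with append) behaves breadth-first, -1 behaves depth-first.
            path = frontier.pop(select_index)
            last = path[-1]
            if is_goal(last):
                return path                      # solution path s0, ..., sk
            for s in neighbors(last):            # expand the node `last`
                frontier.append(path + (s,))
        return None                              # the frontier is empty: no solution

For example, with graph = {'a': ['b', 'c'], 'b': ['g']}, calling generic_search(['a'], lambda n: n == 'g', lambda n: graph.get(n, [])) returns the path ('a', 'b', 'g').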

Genetic Algorithms

3 min read
Introduction: A typical example of a heuristic method for problem solving is the genetic approach used in what are known as genetic algorithms. Genetic algorithms solve complex combinatorial and organizational problems with many variants, by employing an analogy with nature's evolution. Genetic algorithms were introduced by John Holland (1975) and further developed by him and other researchers. Nature's diversity of species is tremendous. How did humankind evolve into such an enormous variety of variants; in other words, how does nature solve the optimization problem of perfecting humankind? One answer to this question may be found in Charles Darwin's theory of evolution. The most important terms used in genetic algorithms are analogous to the terms used to explain the evolutionary processes. They are: Genetic algorithms are usually illustrated by game problems. Such is a version of the "mastermind" game, in which one of two players thinks up a number (e.g., 001010) and the other has to find it out with a minimal number of questions. Each question is a hypothesis (solution) to which the first player replies with another number indicating the number of correctly guessed figures. This number is the criterion for the selection of the most promising or prospective variant, which will take the second player to eventual success. If there is no improvement after a certain number of steps, this is a hint that a change should be introduced. Such a change is called mutation. In this game success is achieved after 16 questions, which is four times faster than checking all the possible combinations, as there are 2^6 = 64 possible variants. There is no need for mutation in this example. If it were needed, it could be introduced by changing a bit (a gene) chosen by random selection. Mutation would have been necessary if, for example, there was 0 in the third bit of all three initial individuals, because no matter how the most prospective individuals are combined, by copying a precise part of their code we can never change this bit into 1. Mutation takes evolution out of a "dead end."
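The guessing game above can be sketched in code. The following Python fragment is only illustrative: it assumes a 6-bit target, uses the number of correctly guessed figures as the fitness (the first player's reply), combines the two most prospective variants by copying a prefix from one and a suffix from the other, and applies an occasional random bit-flip as mutation; the population size, mutation rate, and generation limit are arbitrary choices.

    import random

    TARGET = [0, 0, 1, 0, 1, 0]                   # the number the first player thought up

    def fitness(guess):
        # the reply: how many figures are guessed correctly
        return sum(g == t for g, t in zip(guess, TARGET))

    def crossover(a, b):
        cut = random.randrange(1, len(a))
        return a[:cut] + b[cut:]                  # copy a precise part of each parent's code

    def mutate(guess, rate=0.1):
        # mutation: flip a bit (a gene) chosen at random, with small probability
        return [1 - g if random.random() < rate else g for g in guess]

    population = [[random.randint(0, 1) for _ in range(6)] for _ in range(4)]
    for generation in range(50):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == 6:
            break                                 # the hidden number has been found
        parents = population[:2]                  # select the most prospective variants
        children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(4)]
        population = parents + children
    print(generation, population[0])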

Breadth-first Search

4 min read
Before proving the various properties of breadth-first search, we take on the somewhat easier job of analyzing its running time on an input graph G = (V, E). We use aggregate analysis, as we saw in the section on aggregate analysis. After initialization, no vertex is ever whitened, and thus the test in line 13 ensures that each vertex is enqueued at most once, and hence dequeued at most once. The operations of enqueuing and dequeuing take O(1) time, so the total time devoted to queue operations is O(V). Because the adjacency list of each vertex is scanned only when the vertex is dequeued, each adjacency list is scanned at most once. Since the sum of the lengths of all the adjacency lists is Θ(E), the total time spent in scanning adjacency lists is O(E). The overhead for initialization is O(V), and thus the total running time of BFS is O(V + E). Thus, breadth-first search runs in time linear in the size of the adjacency-list representation of G. At the beginning of this section, we claimed that breadth-first search finds the distance to each reachable vertex in a graph G = (V, E) from a given source vertex s ∈ V. Define the shortest-path distance δ(s, v) from s to v as the minimum number of edges in any path from vertex s to vertex v; if there is no path from s to v, then δ(s, v) = ∞. A path of length δ(s, v) from s to v is said to be a shortest path from s to v. Before showing that breadth-first search actually computes shortest-path distances, we investigate an important property of shortest-path distances. The procedure BFS builds a breadth-first tree as it searches the graph, as illustrated in Figure 22.3. The tree is represented by the π field in each vertex. More formally, for a graph G = (V, E) with source s, we define the predecessor subgraph of G as Gπ = (Vπ, Eπ), where Vπ = {v ∈ V : π[v] ≠ NIL} ∪ {s} and Eπ = {(π[v], v) : v ∈ Vπ − {s}}.
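A compact Python sketch of BFS in this adjacency-list setting is given below; it records the distance and predecessor (π) fields, and the dist map doubles as the "vertex is no longer white" test, so colors are not stored explicitly.

    from collections import deque

    def bfs(adj, s):
        dist = {s: 0}                    # dist[v] = number of edges on the discovered path from s
        pred = {s: None}                 # the predecessor (pi) field, defining the breadth-first tree
        q = deque([s])
        while q:
            u = q.popleft()              # each vertex is dequeued at most once
            for v in adj.get(u, []):     # u's adjacency list is scanned exactly once
                if v not in dist:        # corresponds to the "is v white?" test
                    dist[v] = dist[u] + 1
                    pred[v] = u
                    q.append(v)          # each vertex is enqueued at most once
        return dist, pred

Every vertex and every adjacency-list entry is touched a constant number of times, which is the O(V + E) bound argued above.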

Heuristic Search

3 min read
Heuristic Search: All of the search methods in the preceding section are uninformed in that they did not take into account the goal. They do not use any information about where they are trying to get to unless they happen to stumble on a goal. One form of heuristic information about which nodes seem the most promising is a heuristic function h(n), which takes a node n and returns a non-negative real number that is an estimate of the path cost from node n to a goal node. The function h(n) is an underestimate if h(n) is less than or equal to the actual cost of a lowest-cost path from node n to a goal. The heuristic function is a way to inform the search about the direction to a goal. It provides an informed way to guess which neighbor of a node will lead to a goal. There is nothing magical about a heuristic function. It must use only information that can be readily obtained about a node. Typically a trade-off exists between the amount of work it takes to derive a heuristic value for a node and how accurately the heuristic value of a node measures the actual path cost from the node to a goal. A standard way to derive a heuristic function is to solve a simpler problem and to use the actual cost in the simplified problem as the heuristic function of the original problem. The h function can be extended to be applicable to (non-empty) paths. The heuristic value of a path is the heuristic value of the node at the end of the path. That is: h(n0, . . . , nk) = h(nk) A simple use of a heuristic function is to order the neighbors that are added to the stack representing the frontier in depth-first search. The neighbors can be added to the frontier so that the best neighbor is selected first. This is known as heuristic depth-first search. This search chooses the locally best path, but it explores all paths from the selected path before it selects another path. Although it is often used, it suffers from the problems of depth-first search. Another way to use a heuristic function is to always select a path on the frontier with the lowest heuristic value. This is called best-first search. It
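A sketch of best-first search in this sense, where the frontier is ordered by h alone (the heuristic value of the node at the end of each path); the names h, neighbors, and is_goal are illustrative.

    import heapq, itertools

    def best_first_search(start, is_goal, neighbors, h):
        counter = itertools.count()                     # tie-breaker so paths are never compared
        frontier = [(h(start), next(counter), (start,))]
        while frontier:
            _, _, path = heapq.heappop(frontier)        # select a frontier path with lowest h
            node = path[-1]
            if is_goal(node):
                return path
            for nxt in neighbors(node):
                heapq.heappush(frontier, (h(nxt), next(counter), path + (nxt,)))
        return None

Replacing the priority queue with a stack, and pushing each node's neighbors in order of decreasing h so that the best neighbor is popped first, gives heuristic depth-first search instead.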

Games

6 min read
Mathematical game theory, a branch of economics, views any multiagent environment as a game, provided that the impact of each agent on the others is "significant," regardless of whether the agents are cooperative or competitive. In AI, the most common games are of a rather specialized kind: what game theorists call deterministic, turn-taking, two-player, zero-sum games of perfect information (such as chess). In our terminology, this means deterministic, fully observable environments in which two agents act alternately and in which the utility values at the end of the game are always equal and opposite. For example, if one player wins a game of chess, the other player necessarily loses. It is this opposition between the agents' utility functions that makes the situation adversarial. Games have engaged the intellectual faculties of humans, sometimes to an alarming degree, for as long as civilization has existed. For AI researchers, the abstract nature of games makes them an appealing subject for study. The state of a game is easy to represent, and agents are usually restricted to a small number of actions whose outcomes are defined by precise rules. Physical games, such as croquet and ice hockey, have much more complicated descriptions, a much larger range of possible actions, and rather imprecise rules defining the legality of actions. With the exception of robot soccer, these physical games have not attracted much interest in the AI community. Games, unlike most of the toy problems studied in Chapter 3, are interesting because they are too hard to solve. For example, chess has an average branching factor of about 35, and games often go to 50 moves by each player, so the search tree has about 35^100 or 10^154 nodes (although the search graph has "only" about 10^40 distinct nodes). Games, like the real world, therefore require the ability to make some decision even when calculating the optimal decision is infeasible. Games also penalize inefficiency severely. Whereas an implementation of A* search that is half as efficient will simply take twice as long to run to completion, a chess program that is half as efficient in using its available time probably will be beaten into the ground, other things being equal. Game-playing research has therefore spawned a number of interesting ideas on how to make the best possible use of time.

Minimax Algorithm

4 min read
The minimax algorithm: The minimax algorithm (Figure 5.3) computes the minimax decision from the current state. It uses a simple recursive computation of the minimax values of each successor state, directly implementing the defining equations. The recursion proceeds all the way down to the leaves of the tree, and then the minimax values are backed up through the tree as the recursion unwinds. For example, in Figure 5.2, the algorithm first recurses down to the three bottom-left nodes and uses the UTILITY function on them to discover that their values are 3, 12, and 8, respectively. Then it takes the minimum of these values, 3, and returns it as the backed-up value of node B. A similar process gives the backed-up values of 2 for C and 2 for D. Finally, we take the maximum of 3, 2, and 2 to get the backed-up value of 3 for the root node. The minimax algorithm performs a complete depth-first exploration of the game tree. If the maximum depth of the tree is m and there are b legal moves at each point, then the time complexity of the minimax algorithm is O(b^m). The space complexity is O(bm) for an algorithm that generates all actions at once, or O(m) for an algorithm that generates actions one at a time. For real games, of course, the time cost is totally impractical, but this algorithm serves as the basis for the mathematical analysis of games and for more practical algorithms. Optimal decisions in multiplayer games: Many popular games allow more than two players. Let us examine how to extend the minimax idea to multiplayer games. This is straightforward from the technical viewpoint, but raises some interesting new conceptual issues.
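A recursive Python sketch of this computation on an explicit game tree is shown below. The leaf utilities for node B (3, 12, 8) are the ones quoted above; the leaves chosen here for C and D are illustrative values consistent with the backed-up values of 2 and 2.

    def minimax_value(node, to_move):
        # leaves carry utilities; values are backed up as the recursion unwinds
        if isinstance(node, (int, float)):
            return node
        values = [minimax_value(child, "MIN" if to_move == "MAX" else "MAX")
                  for child in node]
        return max(values) if to_move == "MAX" else min(values)

    game_tree = [[3, 12, 8],    # node B (MIN to move)
                 [2, 4, 6],     # node C (MIN to move)
                 [14, 5, 2]]    # node D (MIN to move)
    print(minimax_value(game_tree, "MAX"))   # backed-up value 3 at the root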

Bi-directional Search

3 min read
Bi-directional search: Suppose that the search problem is such that the arcs are bidirectional. That is, if there is an operator that maps from state A to state B, there is another operator that maps from state B to state A. Many search problems have reversible arcs; the 8-puzzle, the 15-puzzle, and path planning are examples of such problems. However, there are other state space search formulations which do not have this property; the water jug problem is one example. But if the arcs are reversible, you can see that instead of starting from the start state and searching for the goal, one may start from a goal state and try reaching the start state. If there is a single state that satisfies the goal property, the search problems are identical. How do we search backwards from the goal? One should be able to generate predecessor states. Predecessors of node n are all the nodes that have n as a successor. This is the motivation to consider bidirectional search. Algorithm: Bidirectional search involves alternate searching from the start state toward the goal and from the goal state toward the start. The algorithm stops when the frontiers intersect. A search algorithm has to be selected for each half. How does the algorithm know when the frontiers of the search tree intersect? For bidirectional search to work well, there must be an efficient way to check whether a given node belongs to the other search tree. Bidirectional search can sometimes lead to finding a solution more quickly. The reason can be seen from inspecting the following figure.
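A sketch of the idea for breadth-first search in both directions on a graph with reversible arcs: each side keeps a visited/distance map, and the search stops as soon as a newly generated node is found in the other side's map (this membership test is the "efficient way to check" mentioned above).

    from collections import deque

    def bidirectional_search(adj, start, goal):
        # returns the length of one connecting path, or None if the searches never meet
        if start == goal:
            return 0
        dist_f, dist_b = {start: 0}, {goal: 0}      # forward and backward distance maps
        qf, qb = deque([start]), deque([goal])
        while qf and qb:
            # expand one node from each direction per round
            for dist, q, other in ((dist_f, qf, dist_b), (dist_b, qb, dist_f)):
                if not q:
                    continue
                u = q.popleft()
                for v in adj.get(u, []):            # arcs are assumed reversible
                    if v in other:                  # the two frontiers intersect
                        return dist[u] + 1 + other[v]
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        q.append(v)
        return None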

Reinforcement Learning

4 min read
A supervised learning agent needs to be told the correct move for each position it encounters, but such feedback is seldom available. In the absence of feedback from a teacher, an agent can learn a transition model for its own moves and can perhaps learn to predict the opponent's moves, but without some feedback about what is good and what is bad, the agent will have no grounds for deciding which move to make. The agent needs to know that something good has happened when it (accidentally) checkmates the opponent, and that something bad has happened when it is checkmated (or vice versa, if the game is suicide chess). This kind of feedback is called a reward, or reinforcement. In games like chess, the reinforcement is received only at the end of the game. In other environments, the rewards come more frequently. In ping-pong, each point scored can be considered a reward; when learning to crawl, any forward motion is an achievement. Our framework for agents regards the reward as part of the input percept, but the agent must be "hardwired" to recognize that part as a reward rather than as just another sensory input. Thus, animals seem to be hardwired to recognize pain and hunger as negative rewards and pleasure and food intake as positive rewards. Reinforcement has been carefully studied by animal psychologists for over 60 years. An optimal policy is a policy that maximizes the expected total reward. The task of reinforcement learning is to use observed rewards to learn an optimal (or nearly optimal) policy for the environment. Imagine playing a new game whose rules you do not know; after a hundred or so moves, your opponent announces, "You lose." This is reinforcement learning in a nutshell.

Decision Trees

4 min read
Decision Trees: Rooted trees can be used to model problems in which a series of decisions leads to a solution. For instance, a binary search tree can be used to locate items based on a series of comparisons, where each comparison tells us whether we have located the item, or whether we should go right or left in a subtree. A rooted tree in which each internal vertex corresponds to a decision, with a subtree at these vertices for each possible outcome of the decision, is called a decision tree. The possible solutions of the problem correspond to the paths to the leaves of this rooted tree. Example 1 illustrates an application of decision trees.

EXAMPLE 1: Suppose there are seven coins, all with the same weight, and a counterfeit coin that weighs less than the others. How many weighings are necessary using a balance scale to determine which of the eight coins is the counterfeit one? Give an algorithm for finding this counterfeit coin.

Solution: There are three possibilities for each weighing on a balance scale. The two pans can have equal weight, the first pan can be heavier, or the second pan can be heavier. Consequently, the decision tree for the sequence of weighings is a 3-ary tree. There are at least eight leaves in the decision tree because there are eight possible outcomes (because each of the eight coins can be the counterfeit lighter coin), and each possible outcome must be represented by at least one leaf. The largest number of weighings needed to determine the counterfeit coin is the height of the decision tree. From Corollary 1 of Section 11.1 it follows that the height of the decision tree is at least ⌈log3 8⌉ = 2. Hence, at least two weighings are needed. It is possible to determine the counterfeit coin using two weighings. The decision tree that illustrates how this is done is shown in Figure 3.

THE COMPLEXITY OF COMPARISON-BASED SORTING ALGORITHMS: Many different sorting algorithms have been developed. To decide whether a particular sorting algorithm is efficient, its complexity is determined. Using decision trees as models, a lower bound for the worst-case complexity of sorting algorithms that are based on binary comparisons can be found. We can use decision trees to model sorting algorithms and to determine an estimate for the worst-case complexity of these algorithms. Note that given n elements, there are n! possible orderings of these elements, because each of the n! permutations of these elements can be the correct order. The sorting algorithms studied in this book, and most commonly used sorting algorithms, are based on binary comparisons, that is, the comparison of two elements at a time. The result of each such comparison narrows down the set of possible orderings. Thus, a sorting algorithm based on binary comparisons can be represented by a binary decision tree in which each internal vertex represents a comparison of two elements. Each leaf represents one of the n! permutations of n elements.
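A short Python sketch of the two-weighing procedure for the eight coins is given below; the weigh helper simulating the balance scale is purely illustrative (the counterfeit coin is modelled as weighing slightly less than the genuine ones).

    def weigh(left, right, fake):
        # balance scale: pretend every genuine coin weighs 1 and the counterfeit weighs 0
        lw = len(left) - (fake in left)
        rw = len(right) - (fake in right)
        return "equal" if lw == rw else ("left" if lw < rw else "right")

    def find_counterfeit(fake):
        # Weighing 1: coins 1-3 against coins 4-6; coins 7 and 8 stay off the scale
        r = weigh({1, 2, 3}, {4, 5, 6}, fake)
        if r == "equal":
            group = [7, 8]
        elif r == "left":
            group = [1, 2, 3]        # the lighter pan contains the counterfeit
        else:
            group = [4, 5, 6]
        # Weighing 2: the first coin of the suspect group against the second
        r = weigh({group[0]}, {group[1]}, fake)
        if r == "equal":
            return group[2]          # only possible when the suspect group has three coins
        return group[0] if r == "left" else group[1]

    assert all(find_counterfeit(c) == c for c in range(1, 9))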

Neural Networks

3 min read
Neural Networks: Artificial neural networks are among the most powerful learning models. They have the versatility to approximate a wide range of complex functions representing multi-dimensional input-output maps. Neural networks also have inherent adaptability, and can perform robustly even in noisy environments. An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected simple processing elements (neurons) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. This is true of ANNs as well. ANNs can process information at great speed owing to their massive parallelism. Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyse. This expert can then be used to provide projections given new situations of interest and answer "what if" questions. Other advantages include: Biological Neural Networks: Much is still unknown about how the brain trains itself to process information, so theories abound. In the human brain, a typical neuron collects signals from others through a host of fine structures called dendrites. The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches. At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons. When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.
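The artificial counterpart of this picture can be sketched in a few lines: a single model neuron sums weighted inputs (the weights playing the role of synaptic strengths, positive for excitatory and negative for inhibitory) and fires when the total exceeds a threshold; the learning rule shown is a simple perceptron-style update, and all numbers are arbitrary.

    def neuron_output(inputs, weights, threshold=0.0):
        # positive weights act like excitatory synapses, negative weights like inhibitory ones
        activation = sum(x * w for x, w in zip(inputs, weights))
        return 1 if activation > threshold else 0     # emit a "spike" or stay silent

    def update_weights(weights, inputs, target, output, rate=0.1):
        # learning: change the effectiveness of the synapses
        return [w + rate * (target - output) * x for w, x in zip(weights, inputs)]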

Learning Hidden Markov Models

2 min read
Learning hidden Markov models: Our final application of EM involves learning the transition probabilities in hidden Markov models (HMMs). Recall from Chapter 15 that a hidden Markov model can be represented by a dynamic Bayes net with a single discrete state variable, as illustrated in Figure 20.11. Each data point consists of an observation sequence of finite length, so the problem is to learn the transition probabilities from a set of observation sequences (or possibly from just one long sequence). We have already worked out how to learn Bayes nets, but there is one complication: in Bayes nets, each parameter is distinct; in a hidden Markov model, on the other hand, the individual transition probabilities from state i to state j at time t, θijt = P(Xt+1 = j | Xt = i), are repeated across time; that is, θijt = θij for all t. To estimate the transition probability from state i to state j, we simply calculate the expected proportion of times that the system undergoes a transition to state j when in state i: θij = Σt N̂(Xt+1 = j, Xt = i) / Σt N̂(Xt = i), where N̂ denotes an expected count. Again, the expected counts are computed by any HMM inference algorithm. The forward–backward algorithm shown in Figure 15.4 can be modified very easily to compute the necessary probabilities. One important point is that the probabilities required are those obtained by smoothing rather than filtering; that is, we need to pay attention to subsequent evidence in estimating the probability that a particular transition occurred. As we said in Chapter 15, the evidence in a murder case is usually obtained after the crime (i.e., the transition from state i to state j) occurs.
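The re-estimation step itself is tiny once the expected counts are available. The sketch below assumes they have already been produced by a smoothing (forward-backward) pass and are stored as expected_counts[i][j], playing the role of the expected count N̂ above.

    def reestimate_transitions(expected_counts):
        # expected_counts[i][j] = expected number of i -> j transitions over all time steps,
        # computed from smoothed (not filtered) posteriors
        theta = {}
        for i, row in expected_counts.items():
            total = sum(row.values())                  # expected number of times in state i
            theta[i] = {j: n / total for j, n in row.items()}
        return theta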

Learning With Complete Data

4 min read
Introduction: Our development of statistical learning methods begins with the simplest task: parameter learning with complete data. A parameter learning task involves finding the numerical parameters for a probability model whose structure is fixed. For example, we might be interested in learning the conditional probabilities in a Bayesian network with a given structure. Data are complete when each data point contains values for every variable in the probability model being learned. Complete data greatly simplify the problem of learning the parameters of a complex model. We will also look briefly at the problem of learning structure. Maximum-likelihood parameter learning: Discrete models. Suppose we buy a bag of lime and cherry candy from a new manufacturer whose lime-cherry proportions are completely unknown; that is, the fraction could be anywhere between 0 and 1. In that case, we have a continuum of hypotheses. The parameter in this case, which we call θ, is the proportion of cherry candies, and the hypothesis is hθ. (The proportion of limes is just 1 − θ.) If we assume that all proportions are equally likely a priori, then a maximum-likelihood approach is reasonable. If we model the situation with a Bayesian network, we need just one random variable, Flavor (the flavor of a randomly chosen candy from the bag). It has values cherry and lime, where the probability of cherry is θ (see Figure 20.2(a)). Now suppose we unwrap N candies, of which c are cherries and ℓ = N − c are limes. According to Equation (3), the likelihood of this particular data set is P(d | hθ) = θ^c (1 − θ)^ℓ.
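Maximizing this likelihood (or its logarithm c log θ + ℓ log(1 − θ)) gives θ = c/N, the observed fraction of cherries. A tiny sketch of the computation, with illustrative numbers:

    import math

    def ml_theta(c, N):
        # maximum-likelihood estimate of the cherry proportion
        return c / N

    def log_likelihood(theta, c, N):
        ell = N - c
        return c * math.log(theta) + ell * math.log(1 - theta)

    # e.g. 75 cherries out of 100 candies: theta = 0.75 maximizes the log-likelihood
    print(ml_theta(75, 100), log_likelihood(0.75, 75, 100))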

Learning With Hidden Variables

2 min read
Introduction: The preceding section dealt with the fully observable case. Many real-world problems have hidden variables (sometimes called latent variables) which are not observable in the data that are available for learning. For example, medical records often include the observed symptoms, the treatment applied, and perhaps the outcome of the treatment, but they seldom contain a direct observation of the disease itself! One might ask, “If the disease is not observed, why not construct a model without it?” The answer appears in Figure 20.7, which shows a small, fictitious diagnostic model for heart disease. There are three observable predisposing factors and three observable symptoms (which are too depressing to name). Assume that each variable has three possible values (e.g., none, moderate, and severe). Removing the hidden variable from the network in (a) yields the network in (b); the total number of parameters increases from 78 to 708. Thus, latent variables can dramatically reduce the number of parameters required to specify a Bayesian network. This, in turn, can dramatically reduce the amount of data needed to learn the parameters. Hidden variables are important, but they do complicate the learning problem. In Figure 20.7(a), for example, it is not obvious how to learn the conditional distribution for HeartDisease, given its parents, because we do not know the value of HeartDisease in each case; the same problem arises in learning the distributions for the symptoms. This section

Unsupervised Clustering: Learning Mixtures Of Gaussians

6 min read
Unsupervised clustering: Learning mixtures of Gaussians Unsupervised clustering is the problem of discerning multiple categories in a collection of objects. The problem is unsupervised because the category labels are not given. For example, suppose we record the spectra of a hundred thousand stars; are there different types of stars revealed by the spectra, and, if so, how many and what are their characteristics? We are all familiar with terms such as “red giant” and “white dwarf,” but the stars do not carry these labels on their hats—astronomers had to perform unsupervised clustering to identify these categories. Other examples include the identification of species, genera, orders, and so on in the Linnæan taxonomy of organisms and the creation of natural kinds to categorize ordinary objects. Unsupervised clustering begins with data. Figure 20.8(a) shows 500 data points, each of which specifies the values of two continuous attributes. The data points might correspond to stars, and the attributes might correspond to spectral intensities at two particular frequencies. Next, we need to understand what kind of probability distribution might have generated the data. Clustering presumes that the data are generated from a mixture distribution, P. Such a distribution has k components, each of which is a distribution in its own right. A data point is generated by first choosing a component and then generating a sample from that component. Let the random variable C denote the component, with values 1, . . . , k; then the mixture
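This generative story is easy to sketch: pick a component C according to its mixing weight, then sample the two attributes from that component's Gaussian. All parameters below are illustrative, and the components are axis-aligned for simplicity.

    import random

    weights = [0.6, 0.4]                     # mixing weights of the k = 2 components
    means   = [(0.0, 0.0), (3.0, 3.0)]
    stdevs  = [(1.0, 1.0), (0.5, 0.5)]

    def sample_point():
        c = random.choices(range(len(weights)), weights)[0]       # choose the component C
        mu, sd = means[c], stdevs[c]
        return tuple(random.gauss(m, s) for m, s in zip(mu, sd))  # sample from that component

    data = [sample_point() for _ in range(500)]   # 500 unlabeled points, as in the figure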

Interleaving Vs. Non-interleaving Of Sub-plan Steps

4 min read
Interleaving vs. Non-Interleaving of Sub-Plan Steps: Given a conjunctive goal, G1 ∧ G2, if the steps that solve G1 must either all come before or all come after the steps that solve G2, then the planner is called a non-interleaving planner. Otherwise, the planner allows interleaving of sub-plan steps. This constraint is different from the issue of partial-order vs. total-order planners. STRIPS is a non-interleaving planner because of its use of a stack to order goals to be achieved.

Partial-Order Planner (POP) Algorithm

    function pop(initial-state, conjunctive-goal, operators)
        // non-deterministic algorithm
        plan = make-initial-plan(initial-state, conjunctive-goal);
        loop:
        begin
            if solution?(plan) then return plan;
            (S-need, c) = select-subgoal(plan);          // choose an unsolved goal
            choose-operator(plan, operators, S-need, c); // select an operator to solve that goal and revise plan
            resolve-threats(plan);                       // fix any threats created
        end
    end

    function solution?(plan)
        if causal-links-establishing-all-preconditions-of-all-steps(plan)
           and all-threats-resolved(plan)
           and all-temporal-ordering-constraints-consistent(plan)
           and all-variable-bindings-consistent(plan)
        then return true;
        else return false;
    end

    function select-subgoal(plan)
        pick a plan step S-need from steps(plan) with a precondition c
            that has not been achieved;
        return (S-need, c);
    end

    procedure choose-operator(plan, operators, S-need, c)
        // solve "open precondition" of some step
        choose a step S-add by either
            Step Addition: adding a new step from operators that has c in its Add-list, or
            Simple Establishment: picking an existing step in Steps(plan) that has c in its Add-list;
        if no such step then return fail;
        add causal link "S-add --->c S-need" to Links(plan);
        add temporal ordering constraint "S-add < S-need" to Orderings(plan);
        if S-add is a newly added step then
        begin
            add S-add to Steps(plan);
            add "Start < S-add" and "S-add < Finish" to Orderings(plan);
        end
    end

    procedure resolve-threats(plan)
        foreach S-threat that threatens link "Si --->c Sj" in Links(plan)
        begin   // "declobber" the threat
            choose either
                Demotion: add "S-threat < Si" to Orderings(plan), or
                Promotion: add "Sj < S-threat" to Orderings(plan);
            if not(consistent(plan)) then return fail;
        end
    end

Syntax Of Propositional Calculus

2 min read
Introduction: A proposition is a sentence, written in a language, that has a truth value (i.e., it is true or false) in a world. A proposition is built from atomic propositions using logical connectives. An atomic proposition, or just an atom, is a symbol (page 114) that starts with a lower-case letter. Intuitively, an atom is something that is true or false. For example, ai_is_fun, lit_l1, live_outside, and sunny can all be atoms. In terms of the algebraic variables of the preceding chapter, an atom can be seen as a statement that a variable has a particular value or that the value is in a set of values. For example, the proposition classtimeAfter3 may mean ClassTime > 3, which is true when the variable ClassTime has a value greater than 3 and is false otherwise. It is traditional in propositional calculus not to make the variable explicit and we follow that tradition. A direct connection exists to Boolean variables (page 113), which are variables with domain {true, false}. An assignment X = true is written as the proposition x, using the variable name, but in lower case. So the proposition happy can mean there exists a Boolean variable Happy, where happy means Happy = true. Propositions can be built from simpler propositions using logical connectives. A proposition is one of the following:
¬p (read "not p"): the negation of p
p ∧ q (read "p and q"): the conjunction of p and q
p ∨ q (read "p or q"): the disjunction of p and q
p → q (read "p implies q"): the implication of q from p
p ← q (read "p if q"): the implication of p from q
p ↔ q (read "p if and only if q" or "p is equivalent to q"): the equivalence of p and q
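One way to make this concrete is to represent compound propositions as nested structures and evaluate them against a truth assignment for the atoms; the tuple encoding below is just one illustrative choice.

    def evaluate(prop, world):
        # world maps atom names (lower-case strings) to True/False
        if isinstance(prop, str):
            return world[prop]                       # an atom
        op, *args = prop
        vals = [evaluate(a, world) for a in args]
        if op == "not":     return not vals[0]
        if op == "and":     return vals[0] and vals[1]
        if op == "or":      return vals[0] or vals[1]
        if op == "implies": return (not vals[0]) or vals[1]
        if op == "iff":     return vals[0] == vals[1]
        raise ValueError(op)

    # e.g. classtimeAfter3 -> happy, in a world where both atoms are true
    print(evaluate(("implies", "classtimeAfter3", "happy"),
                   {"classtimeAfter3": True, "happy": True}))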

Choice Between Forward And Backward Chaining

2 min read
Choice between forward and backward chaining: Forward chaining is often preferable in cases where there are many rules with the same conclusions. A well-known category of such rule systems are taxonomic hierarchies. E.g. the taxonomy of the animal kingdom includes such rules as:
animal(X) :- sponge(X).
animal(X) :- arthropod(X).
animal(X) :- vertebrate(X).
...
vertebrate(X) :- fish(X).
vertebrate(X) :- mammal(X).
...
mammal(X) :- carnivore(X).
...
carnivore(X) :- dog(X).
carnivore(X) :- cat(X).
...
(I have skipped family and genus in the hierarchy.) Now, suppose we have such a knowledge base of rules, we add the fact "dog(fido)" and we query whether "animal(fido)". In forward chaining, we will successively add "carnivore(fido)", "mammal(fido)", "vertebrate(fido)", and "animal(fido)". The query will then succeed immediately. The total work is proportional to the height of the hierarchy. By contrast, if you use backward chaining, the query "animal(fido)" will unify with the head of the first rule above, and generate the subquery "sponge(fido)", which will initiate a search for Fido through all the subdivisions of sponges, and so on. Ultimately, it searches the entire taxonomy of animals looking for Fido. In some cases, it is desirable to combine forward and backward chaining. For example, suppose we augment the above animal taxonomy with the features of these various categories:
breathes(X) :- animal(X).
...
backbone(X) :- vertebrate(X).
has(X,brain) :- vertebrate(X).
...
furry(X) :- mammal(X).
warm_blooded(X) :- mammal(X).
...
If all these rules are implemented as forward chaining, then as soon as we state that Fido is a dog, we have to add all his known properties to Gamma: that he breathes, is warm-blooded, has a liver and kidney, and so on. The solution is to mark these property rules as backward chaining and mark the hierarchy rules as forward chaining. You then implement the knowledge base with both the forward chaining algorithm, restricted to rules marked as forward chaining, and the backward chaining algorithm, restricted to rules marked as backward chaining. However, it is hard to guarantee that such a mixed inference system will be complete.
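The cost argument for forward chaining can be seen in a small sketch that works with ground (propositional) versions of the hierarchy rules for fido; this is only an illustration, not the inference engine described above.

    def forward_chain(facts, rules):
        # rules: list of (conclusion, [premises]); facts: the set of known atoms (Gamma)
        facts = set(facts)
        changed = True
        while changed:
            changed = False
            for head, body in rules:
                if head not in facts and all(p in facts for p in body):
                    facts.add(head)                  # fire the rule
                    changed = True
        return facts

    rules = [("carnivore(fido)",  ["dog(fido)"]),
             ("mammal(fido)",     ["carnivore(fido)"]),
             ("vertebrate(fido)", ["mammal(fido)"]),
             ("animal(fido)",     ["vertebrate(fido)"])]
    print("animal(fido)" in forward_chain({"dog(fido)"}, rules))   # True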

Typical Ai Problems

4 min read
Typical AI problems: While studying the typical range of tasks that we might expect an “intelligent entity” to perform, we need to consider both “common-place” tasks as well as expert tasks. Examples of common-place tasks include:
– Recognizing people and objects.
– Communicating (through natural language).
– Navigating around obstacles on the streets.
These tasks are done matter-of-factly and routinely by people and some other animals. Expert tasks include: These tasks cannot be done by all people, and can only be performed by skilled specialists. Now, which of these tasks are easy and which ones are hard? Clearly tasks of the first type are easy for humans to perform, and almost all are able to master them. The second range of tasks requires skill development and/or intelligence and only some specialists can perform them well. However, when we look at what computer systems have been able to achieve to date, we see that their achievements include performing sophisticated tasks like medical diagnosis, performing symbolic integration, proving theorems and playing chess. On the other hand it has proved to be very hard to make computer systems perform many routine tasks that all humans and a lot of animals can do. Examples of such tasks include navigating our way without running into things, catching prey and avoiding predators. Humans and animals are also capable of interpreting complex sensory information. We are able to recognize objects and people from the visual image that we receive. We are also able to perform complex social functions.

Limits Of Ai

3 min read
Limits of AI Today: Today’s successful AI systems operate in well-defined domains and employ narrow, specialized knowledge. Common sense knowledge is needed to function in complex, open-ended worlds. Such a system also needs to understand unconstrained natural language. However, these capabilities are not yet fully present in today’s intelligent systems. What can AI systems do: Today’s AI systems have been able to achieve limited success in some of these tasks. Limitations of Artificial Intelligence: The ultimate goal of research in AI and Robotics is to produce an android which can interact meaningfully with human beings. A huge amount of research effort is being exerted in order to achieve this aim and a lot of progress has already been made. Researchers have manufactured androids that can walk on two legs, that can climb stairs, that can grasp objects without breaking or dropping them, that can recognise faces and a variety of physical objects, that can imitate what they see human beings doing and so on. It is hard to make robots that can do these things and I have no desire to belittle the scientific achievements that have already been made, but even if a robot succeeds in doing all these things as well as a human being it will still lack at least one essential human ability, namely that of learning from other people by accepting what they say and by believing what they have written. The ultimate goal of AI cannot be achieved until we have implemented in a computer system the ability to acquire information from testimony.

Intelligent Agents

5 min read
Intelligent Agents: An Intelligent Agent must sense, must act, and must be autonomous (to some extent). It must also be rational. AI is about building rational agents. An agent is something that perceives and acts. A rational agent always does the right thing. Rationality: Perfect Rationality assumes that the rational agent knows all and will take the action that maximizes her utility. Human beings do not satisfy this definition of rationality. Rational Action is the action that maximizes the expected value of the performance measure given the percept sequence to date. However, a rational agent is not omniscient. It does not know the actual outcome of its actions, and it may not know certain aspects of its environment. Therefore rationality must take into account the limitations of the agent. The agent has to select the best action to the best of its knowledge depending on its percept sequence, its background knowledge and its feasible actions. An agent also has to deal with the expected outcome of the actions where the action effects are not deterministic. Bounded Rationality: “Because of the limitations of the human mind, humans must use approximate methods to handle many tasks.” (Herbert Simon, 1972) Evolution did not give rise to optimal agents, but to agents which are in some senses locally optimal at best. In 1957, Simon proposed the notion of Bounded Rationality: that property of an agent that behaves in a manner that is nearly optimal with respect to its goals as its resources will allow. Under these premises an intelligent agent will be expected to act optimally to the best of its abilities and its resource constraints.

Structure Of Intelligent Agents

5 min read
Introduction: The job of AI is to design the agent program: a function that implements the agent mapping from percepts to actions. We assume this program will run on some sort of computing device, which we will call the architecture. Obviously, the program we choose has to be one that the architecture will accept and run. The architecture might be a plain computer, or it might include special-purpose hardware for certain tasks, such as processing camera images or filtering audio input. It might also include software that provides a degree of insulation between the raw computer and the agent program, so that we can program at a higher level. In general, the architecture makes the percepts from the sensors available to the program, runs the program, and feeds the program’s action choices to the effectors as they are generated. The relationship among agents, architectures, and programs can be summed up as follows: agent = architecture + program. Most of this book is about designing agent programs, although Chapters 24 and 25 deal directly with the architecture. Before we design an agent program, we must have a pretty good idea of the possible percepts and actions, what goals or performance measure the agent is supposed to achieve, and what sort of environment it will operate in. These come in a wide variety. Figure 2.3 shows the basic elements for a selection of agent types. It may come as a surprise to some readers that we include in our list of agent types some programs that seem to operate in the entirely artificial environment defined by keyboard input and character output on a screen. “Surely,” one might say, “this is not a real environment, is it?” In fact, what matters is not the distinction between “real” and “artificial” environments, but the complexity of the relationship among the behavior of the agent, the percept sequence generated by the environment, and the goals that the agent is supposed to achieve. Some “real” environments are actually quite simple. For example, a robot designed to inspect parts as they come by on a conveyer belt can make use of a number of simplifying assumptions: that the lighting is always just so, that the only thing on the conveyer belt will be parts of a certain kind, and that there are only two actions—accept the part or mark it as a reject.

Types Of Agent Program

5 min read
We will find that different aspects of driving suggest different types of agent program. We will consider four types of agent program: simple reflex agents, agents that keep track of the world, goal-based agents, and utility-based agents. The option of constructing an explicit lookup table is out of the question. The visual input from a single camera comes in at the rate of 50 megabytes per second (25 frames per second, 1000 × 1000 pixels with 8 bits of color and 8 bits of intensity information). So the lookup table for an hour would be 2^(60*60*50M) entries. However, we can summarize portions of the table by noting certain commonly occurring input/output associations. For example, if the car in front brakes, and its brake lights come on, then the driver should notice this and initiate braking. In other words, some processing is done on the visual input to establish the condition we call “The car in front is braking”; then this triggers some established connection in the agent program to the action “initiate braking”. We call such a connection a condition–action rule, written as: if car-in-front-is-braking then initiate-braking. Humans also have many such connections, some of which are learned responses (as for driving) and some of which are innate reflexes (such as blinking when something approaches the eye). In the course of the book, we will see several different ways in which such connections can be learned and implemented. Figure 2.7 gives the structure of a simple reflex agent in schematic form, showing how the condition–action rules allow the agent to make the connection from percept to action. (Do not worry if this seems trivial; it gets more interesting shortly.) We use rectangles to denote
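The condition-action idea can be sketched in a few lines of Python; the rule set, percept encoding, and function names here are purely illustrative.

    def interpret_input(percept):
        # abstract the raw percept into the conditions the rules mention
        return {"car-in-front-is-braking": percept.get("front_car_brake_lights_on", False)}

    RULES = [("car-in-front-is-braking", "initiate-braking")]      # condition-action rules

    def simple_reflex_agent(percept):
        state = interpret_input(percept)
        for condition, action in RULES:
            if state.get(condition):
                return action                   # the first matching rule fires
        return "no-op"

    print(simple_reflex_agent({"front_car_brake_lights_on": True}))   # initiate-braking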

Agents And Environments

5 min read
In this section we introduce several different kinds of agents and environments. In all cases, however, the nature of the connection between them is the same: actions are done by the agent on the environment, which in turn provides percepts to the agent. First, we will describe the different types of environments and how they affect the design of agents. Then we will describe environment programs that can be used as testbeds for agent programs. Properties of environments: Environments come in several flavors. The principal distinctions to be made are as follows: Accessible vs. inaccessible: If an agent’s sensory apparatus gives it access to the complete state of the environment, then we say that the environment is accessible to that agent. An environment is effectively accessible if the sensors detect all aspects that are relevant to the choice of action. An accessible environment is convenient because the agent need not maintain any internal state to keep track of the world. Deterministic vs. nondeterministic: If the next state of the environment is completely determined by the current state and the actions selected by the agents, then we say the environment is deterministic. In principle, an agent need not worry about uncertainty in an accessible, deterministic environment. If the environment is inaccessible, however, then it may appear to be nondeterministic. This is particularly true if the environment is complex, making it hard to keep track of all the inaccessible aspects. Thus, it is often better to think of an environment as deterministic or nondeterministic from the point of view of the agent.

Utility-based Agents

2 min read
Utility-based agents: Goals alone are not really enough to generate high-quality behavior. For example, there are many action sequences that will get the taxi to its destination, thereby achieving the goal, but some are quicker, safer, more reliable, or cheaper than others. Goals just provide a crude distinction between “happy” and “unhappy” states, whereas a more general performance measure should allow a comparison of different world states (or sequences of states) according to exactly how happy they would make the agent if they could be achieved. Because “happy” does not sound very scientific, the customary terminology is to say that if one world state is preferred to another, then it has higher utility for the agent. Utility is therefore a function that maps a state onto a real number, which describes the associated degree of happiness. A complete specification of the utility function allows rational decisions in two kinds of cases where goals have trouble. First, when there are conflicting goals, only some of which can be achieved (for example, speed and safety), the utility function specifies the appropriate trade-off. Second, when there are several goals that the agent can aim for, none of which can be achieved with certainty, utility provides a way in which the likelihood of success can be weighed up against the importance of the goals. An agent that possesses an explicit utility function therefore can make rational decisions, but may have to compare the utilities achieved by different courses of action. Goals, although cruder, enable the agent to pick an action right away if it satisfies the goal. In some cases, moreover, a utility function can be translated into a set of goals, such that the decisions made by a goal-based agent using those goals are identical to those made by the utility-based agent. The overall utility-based agent structure appears in Figure 2.12. Actual utility-based agent programs appear in Chapter 5, where we examine game-playing programs that must make fine distinctions among various board positions, and in later chapters, where we tackle the general problem of designing decision-making agents.

Regression Algorithms

3 min read
Regression Algorithms: In statistics, regression analysis includes any techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. These are statistical algorithms that predict real-valued labels, and the family spans both supervised and unsupervised learning. PCA (Principal Component Analysis): Principal component analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. This transformation is defined in such a way that the first principal component has as high a variance as possible (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to (uncorrelated with) the preceding components. Principal components are guaranteed to be independent only if the data set is jointly normally distributed. PCA was invented in 1901 by Karl Pearson. Now it is mostly used as a tool in exploratory data analysis and for making predictive models.
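A minimal sketch of PCA via the singular value decomposition of the centered data is given below (NumPy is assumed to be available; the synthetic data are illustrative).

    import numpy as np

    def pca(X, n_components):
        # center the observations, then take the top right-singular vectors as the components
        Xc = X - X.mean(axis=0)
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        components = Vt[:n_components]             # orthogonal directions of decreasing variance
        scores = Xc @ components.T                 # data projected onto the principal components
        explained_variance = (S[:n_components] ** 2) / (len(X) - 1)
        return components, scores, explained_variance

    X = np.random.randn(100, 3) @ np.array([[2.0, 0.3, 0.0],
                                            [0.0, 1.0, 0.1],
                                            [0.0, 0.0, 0.2]])
    comps, scores, var = pca(X, 2)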

Pattern Recognition Methodologies

4 min read
Pattern Recognition Methodologies: There are four major methodologies in pattern recognition; these are the statistical approach, the syntactic approach, template matching, and neural networks. Statistical Approach: Typically, statistical pattern recognition systems are based on statistics and probabilities. In these systems, features are converted to numbers which are placed into a vector to represent the pattern. This approach is most intensively used in practice because it is the simplest to handle. In this approach, patterns to be classified are represented by a set of features defining a specific multidimensional vector: by doing so, each pattern is represented by a point in the multidimensional feature space. To compare patterns, this approach uses measures based on distances between points in this statistical space. Syntactic Approach: Also called structural pattern recognition systems, these systems are based on the relations between features. In this approach, patterns are represented by structures which can take into account more complex relations between features than the numerical feature vectors used in statistical PRSs. Patterns are described in a hierarchical structure composed of sub-structures, themselves composed of smaller sub-structures. The shape is represented with a set of predefined primitives called the codebook, and the primitives are called code words. For example, given the code words on the left of the figure, the shape on the right of the figure can be represented as the following string S, when starting from the pointed codeword on the figure: