Tips for problem-solving and material we've encountered in problems. Sources: midterms, past finals, assignments, sample questions.
Under construction
1 Search and game-playing
1.1 Heuristics
Admissible = optimistic about the distance left (so $h \leq h^*$ where $h^*$ is the actual distance to the goal). The closer it is to the actual distance, the more useful the heuristic for all kinds of informed search algorithms.
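As a concrete illustration, a minimal A* sketch — `neighbours` and `h` are assumed helpers, and with an admissible `h` the first time the goal is popped its cost is optimal:

```python
import heapq

def a_star(start, goal, neighbours, h):
    """A* search. `neighbours(n)` yields (next_node, edge_cost) pairs and
    `h(n)` is an admissible estimate of the remaining cost (assumed helpers)."""
    frontier = [(h(start), 0, start, [start])]
    best = {}                       # cheapest g seen per expanded node
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path          # admissible h => this g is optimal
        if best.get(node, float("inf")) <= g:
            continue                # already expanded more cheaply
        best[node] = g
        for nxt, cost in neighbours(node):
            heapq.heappush(frontier,
                           (g + cost + h(nxt), g + cost, nxt, path + [nxt]))
    return None

# Walking the integers with unit steps; |goal - n| is admissible.
print(a_star(0, 4, lambda n: [(n - 1, 1), (n + 1, 1)], lambda n: abs(4 - n)))
```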
1.2 Genetic algorithms
For optimisation. Maintains multiple candidate solutions; not guaranteed to converge to the correct answer. Describe the individuals, fitness function, operations (crossover, mutation), population, etc.
Crossover: easy, just swap the tails of the two parents at the specified point.
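A minimal sketch of one-point crossover on bitstring individuals (the string representation is just an example):

```python
import random

def one_point_crossover(parent_a, parent_b, point=None):
    """Swap everything after the crossover point between two parents."""
    assert len(parent_a) == len(parent_b)
    if point is None:
        point = random.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

# Cross two bitstrings after position 3:
print(one_point_crossover("00000", "11111", point=3))  # ('00011', '11100')
```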
1.3 Alpha-beta
Searching to depth $k+1$ instead of $k$ might be better, but it depends on the quality of the heuristic function.
Needs good move ordering for efficiency.
Only leads to optimal play against an opponent using the same heuristic function, and only if the function is good enough (i.e., not trivial). If the heuristic is shitty then who knows what will happen.
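A minimal sketch of minimax with alpha-beta pruning; `children` and `evaluate` are assumed helpers (successor states, and a heuristic score for leaves):

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax with alpha-beta pruning."""
    succ = children(state)
    if depth == 0 or not succ:
        return evaluate(state)
    if maximizing:
        value = -math.inf
        for child in succ:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:   # MIN will never let play reach here: prune
                break
        return value
    value = math.inf
    for child in succ:
        value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                     True, children, evaluate))
        beta = min(beta, value)
        if beta <= alpha:       # symmetric cutoff at MIN nodes
            break
    return value

# A two-ply game tree as nested lists; leaves are scores for MAX.
tree = [[3, 5], [2, 9]]
children = lambda s: s if isinstance(s, list) else []
print(alphabeta(tree, 2, -math.inf, math.inf, True, children, lambda s: s))  # 3
```

In the example, once MIN's second subtree has seen the leaf 2 (below MAX's guaranteed 3), its remaining leaf 9 is never examined.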
1.4 Monte Carlo tree search
Works well with a high branching factor. Not necessarily optimal (depends on the policy used to generate trajectories, and on how many were generated).
1.5 Uniform cost search
If we add a positive constant to all edge weights, this could change the result: paths with more edges are penalised more. E.g., a two-edge path of cost $1+1$ beats a single edge of cost $3$, but after adding $2$ to every edge, $3+3 > 5$.
1.6 Beam search
Not guaranteed to converge on an optimal solution even with an admissible heuristic, unless the beam is wide enough to include all successor nodes every time. For beam width $k$, branching factor $b$, and depth $d$, time is $O(kbd)$ and space is $O(kb)$ — a much lower memory requirement than A*.
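A minimal sketch (assumed `successors` and `h` helpers): keep only the `width` most promising states at each level, which is exactly why the optimum can be pruned away.

```python
def beam_search(start, goal, successors, h, width=2):
    """Greedy layered search: keep the `width` lowest-h states per level."""
    beam = [start]
    for _ in range(100):                        # depth cap for the sketch
        if goal in beam:
            return goal
        children = [c for s in beam for c in successors(s)]
        if not children:
            return None
        beam = sorted(set(children), key=h)[:width]   # prune to the beam
    return None

# Searching the integers for 5, preferring states close to it.
print(beam_search(0, 5, lambda n: [n + 1, n + 2], lambda n: abs(5 - n)))  # 5
```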
1.7 Simulated annealing
For optimisation. Only keeps one current solution; guaranteed to converge with the right choice of parameters (a slow enough cooling schedule). If you lower the temperature too quickly, you might get stuck in a local optimum; if you lower it too slowly, it could take longer to reach the optimum than it will take to get our midterms back.
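A minimal sketch, assuming a geometric cooling schedule and made-up parameter values; worse moves are accepted with probability $e^{-\Delta/T}$:

```python
import math, random

def simulated_annealing(initial, cost, neighbour, t0=10.0, cooling=0.95,
                        steps=2000):
    """Minimise `cost` from `initial`; `neighbour` proposes a random move."""
    current, t = initial, t0
    for _ in range(steps):
        candidate = neighbour(current)
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept worse moves with prob exp(-delta/T).
        if delta < 0 or random.random() < math.exp(-delta / t):
            current = candidate
        t *= cooling            # temperature decays each step
    return current

# Minimise a 1-D quadratic; neighbours are small random perturbations.
random.seed(0)
x = simulated_annealing(5.0, lambda x: (x - 2) ** 2,
                        lambda x: x + random.uniform(-0.5, 0.5))
print(round(x, 2))
```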
1.8 Minimax trees
Self-explanatory. To ensure the maximum amount of pruning, sort children by heuristic value: increasing order at min nodes, decreasing at max nodes (so the best option for the player to move is examined first).
1.9 Constraint satisfaction
Useful when you just want to find a valid solution, not necessarily the optimal solution. Describe with variables, domains, and constraints. Should indicate how the problem can be solved, too (e.g., depth-first search with backtracking).
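A minimal backtracking sketch on a made-up map-colouring instance; `consistent` is an assumed helper that checks the constraints over the variables assigned so far:

```python
def backtrack(assignment, variables, domains, consistent):
    """Depth-first search with backtracking for a CSP."""
    if len(assignment) == len(variables):
        return assignment                       # every variable assigned
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if consistent(assignment) and backtrack(assignment, variables,
                                                domains, consistent):
            return assignment
        del assignment[var]                     # undo and try the next value
    return None

# Tiny map colouring: adjacent regions must get different colours.
adj = {("A", "B"), ("B", "C")}
ok = lambda a: all(a[x] != a[y] for x, y in adj if x in a and y in a)
print(backtrack({}, ["A", "B", "C"], {v: ["red", "green"] for v in "ABC"}, ok))
```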
1.10 Iterative deepening
Depth-limited search, but you increase the depth limit and start again if there's still time. Leads to duplicated work but better memory requirements (linear in the depth). Good for large state spaces.
Can also be done for game-tree search. The con is again that a lot of work gets duplicated.
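A minimal sketch (assumed `successors` helper): the outer loop just restarts a depth-limited DFS with a bigger limit, which is where the duplicated work comes from.

```python
def depth_limited(state, goal, successors, limit):
    """DFS that gives up below `limit`; returns a path or None."""
    if state == goal:
        return [state]
    if limit == 0:
        return None
    for child in successors(state):
        path = depth_limited(child, goal, successors, limit - 1)
        if path is not None:
            return [state] + path
    return None

def iterative_deepening(start, goal, successors, max_depth=50):
    """Re-run depth-limited search with an increasing limit."""
    for limit in range(max_depth + 1):
        path = depth_limited(start, goal, successors, limit)
        if path is not None:
            return path
    return None

# Search the integers, where n's successors are n+1 and n*2.
print(iterative_deepening(1, 10, lambda n: [n + 1, n * 2]))  # [1, 2, 4, 5, 10]
```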
2 Machine learning
2.1 Linear perceptron
Can learn AND, OR, but not XOR (not linearly separable).
Takes a vector of weights, computes its dot product with some input vector, and returns yes if the result is above a certain threshold, no otherwise. Can be used to take a majority vote (equal weights, threshold at half the number of inputs).
To represent XOR, you need multiple perceptrons, e.g. one each for $x \lor y$ and $x \land y$, feeding a third that computes $(x \lor y) \land \lnot (x \land y)$.
To represent a boolean formula (a single AND or OR of literals), use weight $+1$ for positive literals, $-1$ for negative literals, and an appropriate threshold.
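A minimal sketch of the weight/threshold recipe (the specific thresholds below are just one working choice):

```python
def perceptron(weights, threshold):
    """Returns a classifier: 1 if w . x exceeds the threshold, else 0."""
    return lambda x: int(sum(w * xi for w, xi in zip(weights, x)) > threshold)

# AND of two inputs: both must be on to clear the threshold.
AND = perceptron([1, 1], 1.5)
# OR: either input alone suffices.
OR = perceptron([1, 1], 0.5)
# A negative literal gets weight -1: this computes (NOT x) AND y.
NOT_X_AND_Y = perceptron([-1, 1], 0.5)

print([AND((a, b)) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
```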
2.2 Sigmoid neuron
Also a binary classifier; on its own it can still only separate linearly separable data. But if we form a network with a single hidden layer, we can represent any boolean function; the number of hidden units required can be exponential in the number of inputs.
2.3 Decision trees
Not good for taking a majority vote (the tree would need exponentially many nodes).
2.4 Gradient descent for training neural networks
The weight update rule: $w \leftarrow w - \alpha \nabla J(w)$, where $J$ is a least-squares cost like $J(w) = \frac{1}{2}\sum_i (o_i - t_i)^2$ and $\nabla J$ is its gradient with respect to $w$.
Can we replace this with simulated annealing? Yes, if we let the algorithm move uphill with some small probability, but it would be slower, less reliable, etc.
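A minimal sketch of batch gradient descent for a single linear unit (the data, learning rate, and iteration count are made up):

```python
# Minimise J(w) = (1/2) * sum_i (w . x_i - t_i)^2 for a linear unit.
xs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ts = [1.0, 2.0, 3.0]          # targets generated by the true weights (1, 2)
w = [0.0, 0.0]
alpha = 0.1                   # learning rate

for _ in range(200):
    grad = [0.0, 0.0]
    for x, t in zip(xs, ts):
        o = w[0] * x[0] + w[1] * x[1]       # unit's output on this example
        for j in range(2):
            grad[j] += (o - t) * x[j]       # dJ/dw_j, summed over examples
    for j in range(2):
        w[j] -= alpha * grad[j]             # step against the gradient

print([round(wi, 3) for wi in w])  # approaches [1.0, 2.0]
```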
2.5 Function approximation
- Training error: error rate on the training set.
- Testing error: error rate on the test set.
- Overfitting: when the error rate on the training set is low but the testing error is high.
- k-fold cross-validation: split the data into $k$ folds; train on $(k-1)/k$ of the data, test on the remaining $1/k$, and repeat $k$ times so each fold is the test set once.
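A minimal sketch of the k-fold split (striding by $k$ is just one simple way to form the folds):

```python
def k_fold_splits(data, k):
    """Yield (train, test) splits; each fold serves as the test set once."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# 10 points, 5 folds: each test set has 2 points, each train set 8.
for train, test in k_fold_splits(list(range(10)), 5):
    assert len(train) == 8 and len(test) == 2
```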
3 Probability
3.1 Basic probability calculations
If you don't know this already then you'll probably fail the exam
3.2 Maximum likelihood estimation
Find the parameter value that makes the observed data most likely. Take the log of the likelihood function (the product of the density function over the $n$ observations), find its derivative, set it to 0 (to find an optimum). This can also be done numerically, e.g. with gradient ascent on the log-likelihood.
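A small worked example, assuming coin flips: the log-likelihood for bias $p$ after $h$ heads in $n$ flips is $h \ln p + (n-h)\ln(1-p)$, and setting its derivative to zero gives the closed form $\hat{p} = h/n$. A grid search finds the same maximum:

```python
import math

def log_likelihood(p, h, n):
    """Log-likelihood of coin bias p given h heads in n flips."""
    return h * math.log(p) + (n - h) * math.log(1 - p)

h, n = 7, 10
candidates = [i / 100 for i in range(1, 100)]      # grid over (0, 1)
best = max(candidates, key=lambda p: log_likelihood(p, h, n))
print(best)  # 0.7, matching the closed form h/n
```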
3.3 Drawing Bayes nets
Draw a node for each variable, arrows to indicate direct influence, then write the probabilities (conditioned on any influencing variables, i.e., the parents) next to each node.
3.4 Bayes' theorem
$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$
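A made-up worked example: suppose a disease has $P(D) = 0.01$, and a test has $P(+|D) = 0.9$ and $P(+|\lnot D) = 0.1$. Then

$$P(D|+) = \frac{P(+|D)P(D)}{P(+)} = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.1 \times 0.99} = \frac{0.009}{0.108} \approx 0.083$$

Note how the denominator $P(B)$ is usually expanded with the law of total probability, as here.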
3.5 Utility theory
- Value of information: expected value of the best action with the information minus the expected value of the best action without the information.
3.6 Bandit problems
Difference between UCB and epsilon-greedy: the former deliberately tries to reduce uncertainty (its exploration bonus shrinks as an arm is pulled more), the latter is just greedy (with some small probability of picking a random arm); the differences even out over time.
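A minimal sketch of the two selection rules side by side (the constant `c` and `eps` values are assumed, and `values` holds each arm's empirical mean reward):

```python
import math, random

def ucb_pick(counts, values, t, c=2.0):
    """UCB arm selection: mean reward plus an uncertainty bonus that
    shrinks as an arm is pulled more often."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm              # pull every arm once first
    return max(range(len(counts)),
               key=lambda a: values[a] + math.sqrt(c * math.log(t) / counts[a]))

def eps_greedy_pick(values, eps=0.1):
    """Greedy, except explore a uniformly random arm with probability eps."""
    if random.random() < eps:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])
```

The key difference is visible in the code: UCB's exploration is targeted at under-sampled arms, while epsilon-greedy's is uniform.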
4 Logic
4.1 Converting to CNF
ANDs of ORs (a conjunction of clauses). The harder one. Strictly, $a \to b$ should be rewritten as $\lnot a \lor b$, though it looks like $\to$ can be left as-is, judging from the solutions to 4b for the sample questions.
Skolemisation lets you replace an existentially quantified variable with a fresh constant, say $X0$ (or a function of the enclosing universally quantified variables).
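A small worked conversion on a made-up formula: take $(a \to b) \land \lnot(c \land d)$. Eliminate $\to$: $(\lnot a \lor b) \land \lnot(c \land d)$. Push the negation inward with De Morgan: $(\lnot a \lor b) \land (\lnot c \lor \lnot d)$, which is now a conjunction of clauses, i.e. CNF.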
4.2 Translating into first-order logic
Use constants for named things (like "Bob").
Note that you can't do things like "most" in FOL. You can represent a finite, constant number though, like "there are two different boys in the world": $\exists x\, \exists y\, (Boy(x) \land Boy(y) \land x \neq y)$ (just state that they're not equal).
4.3 Proving statements from first-order logic
Use skolemisation. Resolution: from $a \lor b$ and $\lnot b \lor c$, conclude $a \lor c$ (equivalently, $\lnot a \to b$ and $b \to c$ give $\lnot a \to c$: transitivity). Unification: substitute terms for variables to make two literals match (e.g., replace a variable by a Skolem constant).
5 MDPs
5.1 Describing an MDP
States, actions, transition probabilities, rewards (and usually a discount factor $\gamma$).
5.2 Optimal policies and values
- Policy: a mapping from states to actions; the optimal policy is whatever gets the highest expected reward.
- Value at a state: the expected (discounted) reward accumulated from that state if we behave according to the optimal policy.
- Optimal value function: $V^*(s)$, a function indicating the optimal value if we start at state $s$ (and continue until the end of time). It looks like we can leave it in recurrence-relation form, judging from the solutions to 6c for the sample questions.
- Value iteration, $V_k(s)$: start from $V_0 = 0$ and repeatedly apply the update $V_{k+1}(s) = \max_a \sum_{s'} P(s'|s,a)\,[R(s,a,s') + \gamma V_k(s')]$ until the values stop changing.
5.3 Bellman equations
$$V^*(s) = \max_a \sum_{s'} P(s'|s,a)\left[R(s,a,s') + \gamma V^*(s')\right]$$
The optimal value of a state is the best action's expected immediate reward plus the discounted optimal value of wherever you end up.
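A minimal value-iteration sketch on a made-up two-state MDP: in state 0, "go" moves to state 1 for no reward; state 1 is absorbing and pays 1 per step.

```python
gamma = 0.9
# transitions[s][a] = list of (probability, next_state, reward) triples
transitions = {
    0: {"go": [(1.0, 1, 0.0)], "stay": [(1.0, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 1.0)]},
}

V = {s: 0.0 for s in transitions}
for _ in range(200):
    # Bellman update: best action's expected reward + discounted next value.
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in transitions[s].values())
         for s in transitions}

print(round(V[1], 2))  # 1/(1-gamma) = 10.0
```

State 1's value is the geometric series $1 + \gamma + \gamma^2 + \dots = 1/(1-\gamma)$, and state 0's is $\gamma$ times that, which is a handy sanity check.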
5.4 Temporal difference learning
Learn values from experience, without a model: after a transition from $s$ to $s'$ with reward $r$, update $V(s) \leftarrow V(s) + \alpha\,[r + \gamma V(s') - V(s)]$ (the bracketed term is the TD error).
6 Problem formulation
Basically an essay question. Given a problem, how would you solve it using AI techniques, etc. MDPs are a good bet. Describe rewards, transition models, states, etc. For optimisation problems, genetic algorithms or simulated annealing are good bets; describe the individuals and fitness function, or the cost function, neighbourhood, and cooling schedule.