A list of posts in the "artificial-intelligence" category.
Posts
Bayes' Nets
$$ % latex command here \gdef\independent{\perp\kern-5pt \perp} \gdef\ConditionallyIndependent#1#2#3{#1 \perp\kern-5pt \perp #2 \mid #3} $$
Probability
- Marginal Distribution
- Conditional Probabilities
- Normalization Trick

Inference
- Inference by Enumeration
- Inference with Bayes' Rule

Bayes' Nets
- Independence: $X \independent Y \iff \forall x, y: P(x, y) = P(x)P(y) \iff \forall x, y: P(x|y) = P(x)$
- Conditional Independence: $\ConditionallyIndependent{X}{Y}{Z} \iff \forall x, y, z: P(x, y|z) = P(x|z)P(y|z) \iff \forall x, y, z: P(x|z, y) = P(x|z)$
- D-Separation
  - Query: $\ConditionallyIndependent{X_i}{X_j}{\{X_{k_1}, \dots, X_{k_n}\}}$?
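The independence definition above can be checked directly on a joint table. A minimal sketch, assuming a small made-up joint distribution $P(X, Y)$ stored as a dict (the values and variable names are hypothetical):

```python
from itertools import product

# hypothetical joint distribution P(X, Y); this one factorizes by design
P = {
    ("a", "t"): 0.2, ("a", "f"): 0.2,
    ("b", "t"): 0.3, ("b", "f"): 0.3,
}

def marginal(P, axis):
    """Marginal distribution over one variable (axis 0 = X, axis 1 = Y)."""
    m = {}
    for xy, p in P.items():
        m[xy[axis]] = m.get(xy[axis], 0.0) + p
    return m

def independent(P, tol=1e-9):
    """X ⊥ Y  <=>  for all x, y: P(x, y) = P(x) P(y)."""
    Px, Py = marginal(P, 0), marginal(P, 1)
    return all(abs(P[(x, y)] - Px[x] * Py[y]) <= tol
               for x, y in product(Px, Py))

print(independent(P))  # → True
```

Here $P(a) = 0.4$, $P(t) = 0.5$, and indeed $P(a, t) = 0.2 = P(a)P(t)$, so the check passes for every cell.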
Reinforcement Learning
- don't know $T$ or $R$
- model-based: estimate $T$ and $R$ from experience
- model-free
  - passive: learn state values
    - temporal difference (TD) learning
      - $sample = R(s, \pi(s), s') + \gamma V^{\pi}(s')$
      - $V^\pi(s) \gets (1 - \alpha)V^\pi(s) + \alpha \cdot sample$, or equivalently $V^\pi(s) \gets V^\pi(s) + \alpha (sample - V^\pi(s))$
      - limitation: learned values alone cannot be turned into a new policy without $T$ and $R$
  - active: Q-learning
    - $Q_{k+1}(s, a) \gets \sum_{s'} T(s, a, s')[R(s, a, s') + \gamma \max_{a'} Q_k(s', a')]$
    - from sample $(s, a, s', r)$: $Q(s, a) \gets (1 - \alpha)Q(s, a) + \alpha [r + \gamma \max_{a'} Q(s', a')]$
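The sample-based Q-learning update can be sketched in a few lines. This is a minimal tabular version on a hypothetical state/action pair (names like `"s0"` and `"right"` are made up, not from any particular environment):

```python
from collections import defaultdict

def q_update(Q, s, a, s_next, r, actions, alpha=0.5, gamma=0.9):
    """Q(s,a) <- (1 - alpha) Q(s,a) + alpha [r + gamma max_a' Q(s',a')]."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
    return Q[(s, a)]

Q = defaultdict(float)  # all Q-values start at 0
q_update(Q, "s0", "right", "s1", 1.0, actions=["left", "right"])
# after one sample: Q(s0, right) = 0.5 * 0 + 0.5 * (1.0 + 0.9 * 0) = 0.5
```

Note that no $T$ or $R$ model appears anywhere: the transition is observed, not looked up, which is exactly what makes this model-free.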
MDP (Markov Decision Process)
- random, known environment: non-deterministic search problems
- solve w/ expectimax
- find optimal policy $\pi^*: S \rightarrow A$
- discounting: makes rewards that come sooner worth more, so shorter solutions cost less
- value iteration
  - time: $O(S^2 A)$ per iteration
  - will converge to unique optimal values
  - init $V_0(s) = 0$
  - $V_{k+1}(s) \gets \max_a \sum_{s'} T(s, a, s')[R(s, a, s') + \gamma V_k(s')]$
- policy iteration
  - policy evaluation
    - $V_0^\pi(s) = 0$
    - $V_{k+1}^\pi(s) \gets \sum_{s'} T(s, \pi(s), s')[R(s, \pi(s), s') + \gamma V_k^\pi(s')]$
    - time: $O(S^2)$ per iteration
  - policy improvement: update with one-step look-ahead
- policy extraction
  - $\pi^*(s) = \arg\max_a \sum_{s'} T(s, a, s')[R(s, a, s') + \gamma V^*(s')]$
  - $\pi^*(s) = \arg\max_a Q^*(s, a)$
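The value-iteration update above fits in a short sketch. The 2-state MDP below (states, actions, transitions, and the single reward) is entirely made up for illustration:

```python
# T maps (s, a) -> list of (s', prob); R maps (s, a, s') -> reward
T = {
    ("A", "go"):   [("B", 1.0)],
    ("A", "stay"): [("A", 1.0)],
    ("B", "go"):   [("A", 1.0)],
    ("B", "stay"): [("B", 1.0)],
}
R = {("A", "go", "B"): 1.0}  # every other transition has reward 0
states, actions, gamma = ["A", "B"], ["go", "stay"], 0.9

def value_iteration(iters=100):
    V = {s: 0.0 for s in states}  # init V_0(s) = 0
    for _ in range(iters):
        # V_{k+1}(s) = max_a sum_{s'} T(s,a,s') [R(s,a,s') + gamma V_k(s')]
        V = {s: max(sum(p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                        for s2, p in T[(s, a)])
                    for a in actions)
             for s in states}
    return V

V = value_iteration()
```

At the fixed point, $V^*(A) = 1 + \gamma V^*(B)$ and $V^*(B) = \gamma V^*(A)$, giving $V^*(A) = 1/(1 - \gamma^2) \approx 5.26$.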
Game Tree
Game Trees: Adversarial Search
- zero-sum: one player maximizes, the other minimizes
- do the best you can from the current state: w/ evaluation functions; depth matters
- minimax
  - time: $O(b^m)$, space: $O(bm)$
- alpha-beta pruning
  - time: $O(b^{m/2})$ with good move ordering
  - exact solution: still impossible for large games

Uncertainty & Utilities
- expectimax: against a random agent
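A minimal alpha-beta sketch over a hand-written game tree (leaves are utilities for the maximizer; the tree values are a made-up textbook-style example, not from any specific game):

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Minimax value of `node`, pruning branches that cannot affect the root."""
    if isinstance(node, (int, float)):  # leaf: utility value
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:           # prune the remaining children
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

# root is MAX, its children are MIN nodes over the listed leaves
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, maximizing=True))  # → 3
```

In this tree the second MIN node is cut off after its first leaf (2 ≤ α = 3), which is the pruning that buys the $O(b^{m/2})$ best case.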
CSP (Constraint Satisfaction Problem)
- ex) map coloring
- constraint graphs
- constraints
  - unary constraints
  - binary constraints
  - high-order constraints: involve ≥3 variables (e.g. cryptarithmetic column constraints)
  - preferences (soft constraints): cost
- search: backtracking search
  - DFS, one variable at a time, check constraints as you go
  - improvements
    - ordering: of variables, of values
    - filtering: detect inevitable failure early
    - structure: exploit problem structure
  - filtering: forward checking — keep domains for unassigned variables
  - filtering: arc consistency — check all arcs; arc $X \rightarrow Y$ is consistent $\iff \forall x \in D(X), \exists y \in D(Y)$ satisfying the constraint
  - ordering: MRV (minimum remaining values) — choose the variable with the fewest values remaining; fail-fast, most constrained variable first
  - ordering: LCV (least constraining value)
  - structure: nearly tree-structured CSPs
    - conditioning: instantiate a variable, prune its neighbors' domains
    - cutset conditioning: make the remaining constraint graph a tree; time: $O(d^c (n-c)d^2)$
- hill climbing: start wherever; repeat: move to the best neighboring state
- beam search: like greedy hill climbing search, but keep $K$ states at all times
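The "one variable at a time, check constraints as you go" loop can be sketched on a map-coloring instance. The three-region map below is a made-up example (loosely in the style of the classic Australia map), with only binary "adjacent regions differ" constraints:

```python
def backtrack(assignment, variables, domains, neighbors):
    """DFS over assignments; undo and try the next value on failure."""
    if len(assignment) == len(variables):
        return assignment
    var = next(v for v in variables if v not in assignment)  # fixed order
    for value in domains[var]:
        # check constraints as we go: consistent with assigned neighbors?
        if all(assignment.get(n) != value for n in neighbors[var]):
            assignment[var] = value
            result = backtrack(assignment, variables, domains, neighbors)
            if result is not None:
                return result
            del assignment[var]   # dead end: undo, triggering backtracking
    return None

variables = ["WA", "NT", "SA"]   # hypothetical 3-region map, fully connected
domains = {v: ["red", "green", "blue"] for v in variables}
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA"], "SA": ["WA", "NT"]}
solution = backtrack({}, variables, domains, neighbors)
```

This plain version uses a fixed variable order and raw domains; MRV, LCV, and forward checking would slot in at the `var = ...` line, the `for value ...` line, and after each assignment respectively.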
Search
- graph vs tree algorithms
- DFS (Depth-First Search)
  - strategy: expand deepest node first
  - fringe: LIFO stack
  - for $m$ tiers (depth), $b$ nodes (children): time $O(b^m)$, space $O(bm)$
  - $m$ infinite -> not complete
  - non-optimal: finds the leftmost solution
- BFS (Breadth-First Search)
  - strategy: expand shallowest node first
  - fringe: FIFO queue
  - for $s$ shallowest tiers (depth), $b$ nodes (children): time $O(b^s)$, space $O(b^s)$
  - complete (for given $s$)
  - optimal: only if all costs are 1
- UCS (Uniform Cost Search)
  - strategy: expand cheapest node first
  - fringe: priority queue (w/ cumulative cost)
  - for solution cost $C^*$, minimum arc cost $\varepsilon$ => effective depth $C^*/\varepsilon$ (tiers): time $O(b^{C^*/\varepsilon})$, space $O(b^{C^*/\varepsilon})$
  - complete / optimal
- Greedy Search
  - strategy: expand the node we think is closest to a goal state
- A*
  - uniform-cost: orders by path cost / backward cost $g(n)$
  - greedy: orders by goal proximity / forward cost $h(n)$
  - A*: orders by the sum: $f(n) = g(n) + h(n)$
  - admissible heuristics $\iff 0 \le h(n) \le h^*(n)$, where $h^*(n)$ is the true cost to the nearest goal -> obtained by making the problem simpler
  - consistent heuristics
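The $f(n) = g(n) + h(n)$ ordering can be sketched with a priority-queue fringe. The graph, edge costs, and heuristic below are all made up (the heuristic is chosen to be admissible for this graph):

```python
import heapq

graph = {"S": [("A", 1), ("B", 4)], "A": [("G", 5)], "B": [("G", 1)], "G": []}
h = {"S": 2, "A": 4, "B": 1, "G": 0}  # admissible: never overestimates h*

def astar(start, goal):
    """Graph-search A*: fringe ordered by f = g + h, closed-set check on pop."""
    fringe = [(h[start], 0, start, [start])]   # (f, g, node, path)
    closed = set()
    while fringe:
        f, g, node, path = heapq.heappop(fringe)
        if node == goal:
            return path, g
        if node in closed:
            continue
        closed.add(node)
        for succ, cost in graph[node]:
            heapq.heappush(fringe,
                           (g + cost + h[succ], g + cost, succ, path + [succ]))
    return None, float("inf")

path, cost = astar("S", "G")  # → (['S', 'B', 'G'], 5)
```

Greedy search would order the fringe by `h[succ]` alone and UCS by `g + cost` alone; A* combines both, and with the admissible heuristic it returns the cheapest path (cost 5 via B) rather than the greedy-looking one through A.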