首页 An Introduction to Probabilistic Graphical Models

An Introduction to Probabilistic Graphical Models

举报
开通vip

An Introduction to Probabilistic Graphical Models PROBABILISTIC GRAPHICAL MODELS David Madigan Rutgers University madigan@stat.rutgers.edu Expert Systems •Explosion of interest in “Expert Systems” in the early 1980’s IF the infection is primary-bacteremia AND the site of the culture is one of th...

An Introduction to Probabilistic Graphical Models
PROBABILISTIC GRAPHICAL MODELS David Madigan Rutgers University madigan@stat.rutgers.edu Expert Systems •Explosion of interest in “Expert Systems” in the early 1980’s IF the infection is primary-bacteremia AND the site of the culture is one of the sterile sites AND the suspected portal of entry is the gastrointestinal tract THEN there is suggestive evidence (0.7) that infection is bacteroid. •Many companies (Teknowledge, IntelliCorp, Inference, etc.), many IPO’s, much media hype •Ad-hoc uncertainty handling Uncertainty in Expert Systems If A then C (p1) If B then C (p2) What if both A and B true? Then C true with CF: p1 + (p2 X (1- p1)) “Currently fashionable ad-hoc mumbo jumbo” A.F.M. Smith Eschewed Probabilistic Approach •Computationally intractable •Inscrutable •Requires vast amounts of data/elicitation e.g., for n dichotomous variables need 2n - 1 probabilities to fully specify the joint distribution Conditional Independence X Y | Z )|()|()|,( |||, zyfzxfzyxf ZYZXZYX =! Conditional Independence •Suppose A and B are marginally independent. Pr(A), Pr(B), Pr(C|AB) X 4 = 6 probabilities •Suppose A and C are conditionally independent given B: Pr(A), Pr(B|A) X 2, Pr(C|B) X 2 = 5 •Chain with 50 variables requires 99 probabilities versus 250-1 A B C C A | B Properties of Conditional Independence (Dawid, 1980) CI 1: A B [P] ⇒ B A [P] CI 2: A B ∪ C [P] ⇒ A B [P] CI 3: A B ∪ C [P] ⇒ A B | C [P] CI 4: A B and A C | B [P] ⇒ A B ∪ C [P] For any probability measure P and random variables A, B, and C: Some probability measures also satisfy: CI 5: A B | C and A C | B [P] ⇒ A B ∪ C [P] CI5 satisfied whenever P has a positive joint probability density with respect to some product measure Markov Properties for Undirected Graphs X1 X5 X2 X3 X4 (Global) S separates A from B ⇒ A B | S (Local) α V \ cl(α) | bd (α) (Pairwise) α β | V \ {α,β} (G) ⇒ (L) ⇒ (P) X2 X5, X4 | X1, X3 (1) ⇒ X2 X4 | X1, X3, X5 (2) To go from (2) to (1) need X5 X2 | X1,X3? or CI5 Lauritzen, Dawid, Larsen & Leimer (1990) Factorizations A density f is said to “factorize according to G” if: f(x) = Π ψC(xC) C ε C Proposition: If f factorizes according to a UG G, then it also obeys the global Markov property “Proof”: Let S separate A from B in G and assume Let CA be the set of cliques with non-empty intersection with A. Since S separates A from B, we must have for all C in CA. Then: “clique potentials”• cliques are maximally complete subgraphs .SBAV !!= !="CB )()()()()( 21 \ SBSA CCC CC CC CC xfxfxxxf AA !! "" == ## $$ Markov Properties for Acyclic Directed Graphs (Bayesian Networks) X3 X1 (Global) S separates A from B in Gan(A,B,S)m ⇒ A B | S (Local) α nd(α)\pa(α) | pa (α) (G) ⇔ (L) Lauritzen, Dawid, Larsen & Leimer (1990) X2 X3 X1 X2 Factorizations ADG Global Markov Property ⇔ f(x) = Π f(xv | xpa(v) ) v ε V A density f admits a “recursive factorization” according to an ADG G if f(x) = Π f(xv | xpa(v) ) Lemma: If P admits a recursive factorization according to an ADG G, then P factorizes according GM (and chordal supergraphs of GM) Lemma: If P admits a recursive factorization according to an ADG G, and A is an ancestral set in G, then PA admits a recursive factorization according to the subgraph GA Factorizations D B A C E F G H S p(A,B,C,D,E,F,G,H,S) = p(A)p(C|A)p(D|C)p(S|D,F)p(E|S) p(F|G)p(G|B)p(H|S,B)p(B) ⇒ p(S|A,B,C,D,E,F,G,H) ∝ p(S|D,F)p(E|S)p(H|S,B) {D,F,W,H,B} is the “Markov Blanket” of S. It contains the parents of S, the children of S, and the other parents of the children of S. Markov Properties for Acyclic Directed Graphs (Bayesian Networks) (Global) S separates A from B in Gan(A,B,S)m ⇒ A B | S (Local) α nd(α)\pa(α) | pa (α) (G) ⇒ (L) α ∪ nd(α) is an ancestral set; pa(α) obviously separates α from nd(α)\pa(α) in Gan(α∪nd(α))m (L) ⇒ (factorization) induction on the number of vertices d-separation A chain π from a to b in an acyclic directed graph G is said to be blocked by S if it contains a vertex γ ∈ π such that either: - γ ∈ S and arrows of π do not meet head to head at γ, or - γ ∉ S nor has γ any descendents in S, and arrows of π do meet head to head at g Two subsets A and B are d-separated by S if all chains from A to B are blocked by S d-separation and global markov property Let A, B, and S be disjoint subsets of a directed, acyclic graph, G. Then S d-separates A from B if and only if S separates A from B in Gan(A,B,S)m UG – ADG Intersection A C B A B C A B C A C C A | B A C | B A C D A B CA B | C,D C D | A,B A C | B B A B C A C | B UG – ADG Intersection UG ADG Decomposable •UG is decomposable if chordal •ADG is decomposable if moral •Decomposable ~ closed-form log-linear models No CI5 Chordal Graphs and RIP •Chordal graphs (uniquely) admit clique orderings that have the Running Intersection Property T V L A X D B S 1. {V,T} 2. {A,L,T} 3. {L,A,B} 4. {S,L,B} 5. {A,B,D} 6. {A,X} •The intersection of each set with those earlier in the list is fully contained in previous set •Can compute cond. probabilities (e.g. Pr(X|V)) by message passing (Lauritzen & Spiegelhalter, Dawid, Jensen) Probabilistic Expert System •Computationally intractable •Inscrutable •Requires vast amounts of data/elicitation •Chordal UG models facilitate fast inference •ADG models better for expert system applications – more natural to specify Pr( v | pa(v) ) Factorizations UG Global Markov Property ⇔ f(x) = Π ψC(xC) C ε C ADG Global Markov Property ⇔ f(x) = Π f(xv | xpa(v) ) v ε V Lauritzen-Spiegelhalter Algorithm B A S C D ψ (C,S,D) ← Pr(S|C, D) ψ(A,E) ← Pr(E|A) Pr(A) ψ (C,E) ← Pr(C|E) ψ(F,D,B) ← Pr(D|F)Pr(B|F)Pr(F) ψ (D,B,S) ← 1 ψ (B,S,G) ← Pr(G|S,B) ψ (H,S) ← Pr(H|S) Algorithm is widely deployed in commercial software E F G H B A S C D E F G H •Moralize •Triangulate L&S Toy Example A B C Pr(C|B)=0.2 Pr(C|¬B)=0.6Pr(B|A)=0.5 Pr(B|¬A)=0.1 Pr(A)=0.7 ψ(A,B) ← Pr(B|A)Pr(A) ψ (B,C) ← Pr(C|B) A B C AB B BC A B 0.35 0.35 ¬A 0.03 0.27 ¬B B C 0.2 0.8 ¬B 0.6 0.4 ¬C B 1 1 ¬B Message Schedule: AB BC BC AB B 0.38 0.62 ¬B B C 0.076 0.304 ¬B 0.372 0.248 ¬C B C 0.076 0 ¬B 0.372 0 ¬C Pr(A|C) Other Theoretical Developments Do the UG and ADG global Markov properties identify all the conditional independences implied by the corresponding factorizations? Yes. Completeness for ADGs by Geiger and Pearl (1988); for UGs by Frydenberg (1988) Graphical characterization of collapsibility in hierarchical log-linear models (Asmussen and Edwards, 1983) Collapsibility Care Less Survival No Yes 3 176 1.7% More 4 293 1.4% Care Less Survival No Yes 17 197 7.9% More 2 23 8.0% Clinic A Clinic B Care Less Survival No Yes 20 373 5.1% More 6 316 1.9% Pooled Collapsibility Care Clinic Surv. Theorem: A graphical log-linear model L is collapsible onto A iff every connected component of Ac is complete. Bayesian Learning for Discrete ADG’s • Example: three binary variables • Five parameters: Local and Global Independence Bayesian learning Consider a particular state pa(v)+ of pa(v) • ADG models for a fixed set of vertices decompose into Markov equivalence classes: A B C A B C A B C A C | B A D | B,C B C | A a c b d a c b da c b d D 1 D 2 D 3 a c b d D 4 A D | B,C B C Equivalence Classes and Chain Graphs • Repeating analyses for equivalent ADGs leads to significant computational inefficiencies. • Ensuring that equivalent ADGs have equal posterior probabilities imposes severe constraints on prior distributions (Geiger and Heckerman, 1995). • Bayesian model averaging procedures that average across ADGs assign weights to statistical models that are proportional to equivalence class sizes. Why is this a problem? Theorem (Verma & Pearl, Glymour et al, Frydenberg, AMP94): Two ADGs are Markov equivalent iff they have the same skeletons and the same immoralities. Definition The essential graph D* associated with D is the graph D* := ∪(D’|D’ ~ D), a c b d a c b d a c b d Equivalence Class Characterization a c b d a c b da c b d D 1 D 2 D 3 Essential Graphs AMP (1995) • Essential graphs are chain graphs • D* is the unique smallest chain graph Markov equivalent to D • A graph G = (V, E) is equal to D* for some ADG D if and only if G satisfies the following four conditions: (i) G is a chain graph; (ii) For every chain component t of G, Gt is chordal; (iii) The configuration a→bc does not occur as an induced subgraph of G; (iv) Every arrow a→b ∈ G is strongly protected in G: a b c (a) a b c (b) a b c (c) a b c 1 (d) c 2 (c 1 !c 2 ) also Meek (1995) and Chickering (1995) What’s a Chain Graph? “Equivalence”: a ~ b iff a b Chain Components (“Boxes”) Chain Graphs UG ADG Decomposable •Chain graph Markov property, Frydenberg (1990) •Equivalence results (LWF, AMP, Meek, Studeny) CG A C D C D | A,B B C D | Aor ? Cox & Wermuth (1996)
本文档为【An Introduction to Probabilistic Graphical Models】,请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑, 图片更改请在作品中右键图片并更换,文字修改请直接点击文字进行修改,也可以新增和删除文档中的内容。
该文档来自用户分享,如有侵权行为请发邮件ishare@vip.sina.com联系网站客服,我们会及时删除。
[版权声明] 本站所有资料为用户分享产生,若发现您的权利被侵害,请联系客服邮件isharekefu@iask.cn,我们尽快处理。
本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权,请谨慎使用。
网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传,仅限个人学习分享使用,禁止用于任何广告和商用目的。
下载需要: 免费 已有0 人下载
最新资料
资料动态
专题动态
is_917499
暂无简介~
格式:pdf
大小:460KB
软件:PDF阅读器
页数:35
分类:
上传时间:2010-12-01
浏览量:1363