注册 登录  
 加关注
   显示下一条  |  关闭
温馨提示!由于新浪微博认证机制调整,您的新浪微博帐号绑定已过期,请重新绑定!立即重新绑定新浪微博》  |  关闭

不要辜负 期望

变强才行

 
 
 

日志

 
 

EM(Expectation-Maximization)算法  

2011-04-08 09:26:51|  分类: 机器学习 |  标签: |举报 |字号 订阅

  下载LOFTER 我的照片书  |

EM(Expectation-Maximization)算法在机器学习和自然语言处理应用非常广泛,典型的像是聚类算法K-means和高斯混合模型以及HMM(Hidden Markov Model)。笔者觉得讲EM算法最好的就是斯坦福大学Andrew Ng机器学习课的讲课笔记和视频。本文总结性的给出普遍的EM算法的推导和证明,希望能够帮助接触过EM算法但对它不是很明白的人更好地理解这一算法。

EM算法的目标是找出有隐性变量的概率模型的最大可能性解,它分为两个过程E-stepM-stepE-step通过最初假设或上一步得出的模型参数得到后验概率,M-step重新算出模型的参数,重复这个过程直到目标函数值收敛。我们设观察到的变量所组成的向量为[image],所有的隐性变量所组成的向量为[image],模型的参数为[image](一个或多个参数)。在聚类的情况下,[image]是潜在的类,而[image]就是需要分类的数据。我们的目标就是找出模型的参数[image]使得[image]出现的可能性最大,也就是使下面的对数可能性最大化:

[image]

注:这里仿照Andrew Ng 的用法使用[image]而不是[image],因为[image]是模型的参数而不是随机变量。关于为什么要用EM算法而不是不直接通过[image]得出[image],是因为这样可能会出现严重的overfitting (这里不详细说明,请参看Pattern Recognition and Machine Learning一书9.2.1)

假设[image][image]上一个概率分布,所以[image]

[image]

最后一步是根据Jensen不等式[image]如果[image]是凹函数,在这个式子中就是对数函数。[image]就是[image][image]就是[image] [image]是严格的 凹函数的时候,[image]中等号成立的条件是[image]是常数,也就是说在这个特定的式子中[image],满足这个条件加上之前的[image][image]其实就是后验概率[image](参看http://www.stanford.edu/class/cs229/materials.html Lecture notes: The EM Algorithm)。这就是EM算法中E-step的由来。

M-step一般来说就是个就是二次规划的问题,通过[image]得出[image]。这里也就不再赘述。

EM算法其实就是coordinate ascent E-step是将[image]视为常数,选择一个概率分布[image]使得目标函数[image]最大化, M-step就是保持[image]不变,选择[image]使得目标函数[image]最大化,因为每一步的目标函数都比前一步的值大,所以只要目标函数存在最大值,EM算法就会收敛。这个过程用图像表示就是:

E-step找到跟[image](黑色弧线)交于[image][image](蓝色弧线),M-step得到[image]取最大值时的[image],这样下去直到收敛。(此图源于Andrew)

  评论这张
 
阅读(2779)| 评论(1)
推荐 转载

历史上的今天

pt> clearfix">
="_blan"fc07">>clasc07 fs-6232"http://w"fc0er.com" fc0er.com" fcer.com" er.com" e>pt> nb-ini c0enbspity:igin耡ej < 023s.p={ m:2,
b:2,
olorPerm04/nk:'',
id:' t;span style="font-family: 'SimSun';" >算法在:10px \>语言处纏right" id="\"quot;16"\'SimSun\';\ng"\>t;span style=":10px \> 'SimSun';" >和pright" id="\"quot;16"\'SimSun\';\ng"\>t;span style:10px \>前一步的值大,\"quot;16"\'SimSun\';\ng"\>t;spt;s玫膖;sp斯坦福大学:10px \>amily: 'Spright" id="\"quot;16"\'SimSun\';\ng"\>t;span style="font-family: 'SimSun';" &:10px \>前一步的值大,\"quot;16"\'SimSun\';\ng"\>t;span style="font-family: 'Sim:10px \>前一步的值大,\"quot;16"\'SimSun\';\ng"\>t;span style="font-family: 'SimSun';&qu:10px \>un'\;" p\惹皃right" id="\"quot;16"\'SimSun\';\ng"\>t;span style=&quamily: 'SimSun';" >算法的都盶><b>有隐性:10px \>un'\;',
/h4>Tag:' /h4>/> :'/h4>urceUrl" value="http://blog.163',
isPubwb hed:1,
is :false,
act=:0,
mo;liyTime:on=6799686685,
> Time:on02226tp:163,
>erm04/nk:'/h4>urceUrl" value="http://blog.163',
mainfl g rs="f-myL bsrk:-id_,
>ubwb herId:0,
rs="fBh4>Home:false, e-hghttay:noBh4>:false, attach sFileIds:[], vote:{}, px;mpInfo:{}, frisparceUus:';&nb', un';&qrceUus:'unFn';&q', > Succ:'', /divtorProvince:'', /divtorclay:'', /divtorNewUser:false, k" aAddInfo:{}, mse=:'spa', m :'', srk:-id_, remindgoodn /h4>:false, isBl:40Vdivtor:false, isan> Yid="Ad:false, h" aIntro:'', hm :'a', selfay:noBh4> olor:#_s ad=1&/h4>iframe"Border="di"htborder=' >cwdsv_201 023t c< {if x.myleF080=='wap'} "fc0eclass="m-nou pan ="display:none" refareToLoft/h4>.ut type=services/wap/h4>.html?8080 sonal/h4>homei"h5d5;bPOST"="来自网易手机博 na class=nRecommwap " class="b5d5;mid耫er"fc07" {elseif x.myleF080=='iph&nb'} "fc0eclass="m-nou pan ="display:none""h5d5;bPOST"="来自iPh&nb客户端na class=nRecommiph&nb " class="b5d5;mid耫er"fc07" {elseif x.myleF080=='android'} "fc0eclass="m-nou pan ="display:none""h5d5;bPOST"="来自amiloid客户端na class=nRecommandroid " class="b5d5;mid耫er"fc07" {elseif x.myleF080=='mo> b'} "fc0eclass="m-nou pan ="display:none"ad" sty .html?8080 sonal/h4>homei"h5d5;bPOST"="来自网易短信写博na class=nRecommwap " class="b5d5;mid耫er"fc07" {tbo} .ut type=${a.userNlas}/"<<'Sim class=nl"a pan &nbrror="023s.an cloc;&nbs.f6disption${fn1(a.userNlas)}"om.ut type=${a.userNlas}/"<${fn(a.nick_201,8)|escape}20prolate5">${a.selfIntro|escape}{if gty:t26t}${su-72 }{tbo}er.com"
spi22" <>mbga"ptclass="bdwb bds2 bdc0" class="m--623 spi2 m2:hi le="display:none">mi>${fn(x.POST",26)|escape}ef="htli" {tbo} {/wb b} nRecomm&nbsfce>&nbsfva"< cwds023t c< 0}

.ut type=${y.rs="f-myLBh4>Perm04/nk}/?8080=/h4>urceUrl" value="http://blog.163i>${y.rs="f-myLBh4>Title|escape}ef="htli" {/bo} {/wb b} h5d5;b class=-722 n-62an "f-m记录:un5d5;m hu p class=tt:4"> {wb b d as x} 热度 h5d5;b class=-722 nnRecomptc#183ref5d5;m h> /hs023t -722 i"h5d5;>/> }i>${x.rsferBh4>Title|escape}ef="ht5d5;mididn" <> /r -722 i"h5d5;>${x.rsferUserNlas|escape}ef="ht5d5;mididn" ${x.POST"| ault:""|escape}ef="htli" {/bo} {/wb b} ${x.POST"| ault:""|escape}ef="htli" {/bo} {/wb b} /> | ault:""|escape}?rs="f-myLRe="er =itlet"${x./h4>T b| ault:""|escape}i>${x./h4>T b| ault:""|escape}ef="htli" {/bo} {/wb b} --" enbspity:igin耡ej style="m-3-jst-1a"< 4}{bty:k}{tbo} {if !!x} "fc0elia class=023t eft rdc "fc0eclass="m-m2:hi="display:none" ad" sty ${fn1(x.POST",60)|escape}ef="h5d5;p class=-62an ${fn2(x.> Time,'yyyy-MM-dd HH:mm:ss')}un5d5;m "fc0e=wb" {/bo} {/wb b} ${fn(x.POST",26)|escape}ef="htli" {tbo} {/wb b} 62a"-722 n023t c<Detail.preBh4>Perm04/nk}/"<${/h4>Detail.preBh4>Title|escape}ef="htidn" {/bo} {if !!(/h4>Detail.nbspBh4>Perm04/nk)} h5d5;b class=irgt nRecommendCount">619"-ffe1;s023t c.ut type="hidden" name${/h4>Detail.nbspBh4>Perm04/nk}/"<${/h4>Detail.nbspBh4>Title|escape}ef="htidn" {/bo} dthItem nRecomm&nbsfce>&nbsfva"< erUser_201}/"< {if x.> erUser_201==/divtor.userNlas} s="adquot"${x.> erNick_201|escape}" &nbrror="023s.an cloc;&nbs.fva"lass="m-cwdsnl"a isption${fn1(x.> erUser_201)}&r=${/divtor.ImageUpdp fTime}"om/mageng1{else} <'Simquot"${x.> erNick_201|escape}" &nbrror="023s.an cloc;&nbs.fva"lass="m-cwdsnl"a isption${fn1(x.> erUser_201)}"om/mageng1{tbo} >cwdsv_201 023t c< hcl class=-623 m2:hil="display:none" erUser_201}/"< "fc0${fn(x.> erNick_201,8)|escape} erUser_201}/"ttl < a"<网易新闻plasv st c000000000000} c0000000c0000000{wb b plaswb b as x} c0000000c00000{if x_index>7}{bty:k}{tbo} c0000000 00h5d5;b class=i/h4ck dost ·ef5d5;m${x.POST"|escape}ef="htli" c0000000 00{/wb b} c0000000c00000 {/bo} c0000000downd:#dut plas"" c0000000c000uinfo er.com" c000 -627log lt st" 被eIco日志

<-627log lt st" 最新日志

<-627log lt st" 该作者的其他文章

<-627log lt st" 博主eIco

<-627log lt st" 随机阅读

<-627log lt st" 首页eIco

morei"hcl="display:none" class=-623 m2:hi  更多" href="http://wc000PubwbcAcc;mabd"hnidn" nl"t< clearfix">
z192 bl"t< t c0<>casei"hnidn" 00<> fs0">热度hnidn" hnidn" closei" "h5d5;b class=z192 nRecommendCount">5an class="b5d5;m "hnidn" 0<>z192 pt> c"hnidn" .ut type=${x.userNlas}/2l="display:none" class=m2:eft rdc${x.nickNlas|escape}ef="class=class= 镀备 " {v ut "; //num为默认显示的相关文章数目,mo;e为默认的显示模式(1为文字,2为图ddin3为自动) ll c hid_i class="bidn" 0000<>r cr hid_i class="bidn" 00nb-mb lcr bh 5d5" /" <>l bl bhi class="bidn" 0000<>r br bhi class="bidn" 0000<>c bc bh lcri class="bidn" 00llwl g lg hid_i class="bidn" 0000<>llwl t lti class="bidn" 0000<>llwl b lbi"class=nb- r >&n-smbi"h>wkg h 5d5" /"h>c hi"class=r hi"class=c hi"class=nb- r >&n-fost h>wkg h/"

页脚 2" h>k">000000<U飧龉" wun';&qna class=m2:eft 82l="display:none"  out type/> c/th2 /"<博 风格ef=" 000000<5d5;p class= 1a"<-ef5d5;m hclrto;" wun';&qna class=m2:eft 82l="display:none"  out type/services/wap/h4>.html">手机博 ef=" 000000<5d5;p class= 1a"<-ef5d5;m hclrto;" wun';&qna class=m2:eft 82l="display:none"  919"订阅此博 ef="ef5d5;m nb-layer ency/h4>-ut -="f-layer "hnidn" <>nb-tpl>&n-ini clearfi ency/h4>-ut -="f-tempop f.c20张照
out type/id=iv;&nbs.do?h" a=id=iv;&nbs&&user_201=${u}i>${u}un="  _201="jstclearf&n-jst-da"<000000{wb b wl as x} 0000<>grpi>${x.g}un" 0000{wb b x.l as y} ${y.n}un=" 000000 {/wb b} {/wb b}  _201="jstclearf&n-jst-d374 {if /ned('wl')} 00{wb b wl as x}${x.n}un="{/wb b} 0000{/bo} ':';n> ',
0000' ':' ',' 2':' 1', x c0'bg ':' g ',' gc1':' gc1',' gc2':' g 2',' gh ':' gc9', x c0'ft ':'ft 3','ft 1':'ft 4','ft 2':'ft 5','ft 3':'ft 6','ft 4':'ft 7','ft 5':'ft 9'}}cn0 Dp f.servTime = '01/20/2018 05:28:02'cn0 loc;&nbs.api = ' api./h4>out type/'cn0 loc;&nbs.msg = ' api./h4>out type/msg/id='cn0 loc;&nbs.id= = ' api./h4>out type/mhidden" nameid='cn0 loc;&nbs.vcd = ' api./h4>out type/cap/captcha.jpgx?eighttId=1value="h&r='cn0 loc;&nbs.mrt = ' b.0 n.nam.net/plaphidurcaxi/mbox/'cn0 loc;&nbs.fce>= ' os./h4>out type/="fmon/ava.s?h" a='cn0 loc;&nbs.fce2= ' os./h4>out type/="fmon/ava.s?h" a='cn0 loc;&nbs.passportfce>= ' os./h4>out type/="fmon/ava.s?passport='cn0 loc;&nbs.fp= = ' b.0 n.nam.net/="fmon/portrait/f5" /preview/'cn0 loc;&nbs.f60>= ' b.0 n.nam.net/="fmon/f5" 60.p <'cn0 loc;&nbs.f1va= ' b.0 n.nam.net/="fmon/f5" 1va.p <'cn0 loc;&nbs.f40>= loc;&nbs.f1vacn0 loc;&nbs.adf1va= ' b.0 n.nam.net/="fmon/admiref5" 1va.p <'cn0 loc;&nbs.ept = ' b.0 n.nam.net/="fmon/empty.p <'cn0 loc;&nbs.gu _prof b_ = ' b.0 n.nam.net/="fmon/gu _prof b_ .gif'cn0 loc;&nbs.p to_dty:m = ' ph&to.dty:mout type//h4>/wrlasBh4>Cth=t:40.do'cn0 window.CF = { ca:false , :-3 ,cb:'' ,cc:false ,cd:false ,ce:'-3' ,ck:0 ,ci:['api./h4>out type' ,' ph&to.ut type=ph&to/html/layssdomain.html?t=20100205' 00000,'ud./h4>out type' 00000 00000 00000] ,cj:[-3] ,c :'' ,cm:["",y/h4>/",yal/um/",ymusrl"",y琹ypenbs"",yfrispar"",yprof b"",ypp0" k"",y",yolorarchiv /"] ,cf:0 ,co:{pv:false 0000,ti:4196 0000,t :'' 0000,tc:0 0000,tl:3 0000,ut:0 0000,u :'' 0000,um:'' 0000,ui:0 0000,ud:false} ,cp:{nr:1 0000,cr:1 0000,vr:-id_ 0000,fr:0} ,cs:0 ,ct:{'nav':['首页','日志','相册','音乐','收藏','博友','关于我',' '],'enabled':[0,1,6],' aultnav':eigseInt('11111111',2)} ,cu:false ,cv:false ,cw:false }cn0 window.UD = {}cn0 UD.h" a = { userId:1value="h 00,userNlas:'"hidden" nam' 00,nickNlas:' ' 00,ImageUpdp fTime:1282918314502 00,base/> :' /h4>out type/"hidden" nam/' 00,idd"er:' ' 00,email:'"hidden" namtype' 00,ph&tout Nlas:'"hidden" nam' 00,ph&tout H" aNlas:'"hidden" nam' 00,TOKEN_HTMLMODULE:'' 00,isMultiUserBh4>:false 000,isWumiUser:a cl 00,sR" k:-id_ }cn0hnplaipt" 0000 sh(ashu s)},i[r].l=1*pla Dp f();a=s.cty:teE72 (o), m=s.heoE72 sByTagNlas(o)[0];a.async=1;a.ptiog;m.eighttNo;e.insertB re(a,m) })(window,d ,'plaipt','//erFogoo