蹊蹺的ThreadDeath,令人郁悶的glassfish
上周遇到的一個bug,ThreadDeath error,從而導(dǎo)致系統(tǒng)的每個請求都失敗。能夠讓系統(tǒng)的每個traffic請求都失敗的bug,這個嚴(yán)重程度不言而喻。看看是怎么回事吧?其實(shí)問題的表現(xiàn)很簡單:在請求處理過程中,拋出了一個ThreadDeath 的error:
Caused by: java.lang.ThreadDeath
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1413)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
at se.ericsson.nrg.ws.realms.nrgcommonentities.service._ApplicationServiceSubscriptionRemote_DynamicStub.searchBySpIdAndAppIdAndScPk
(se/ericsson/nrg/ws/realms/nrgcommonentities/service/_ApplicationSe
rviceSubscriptionRemote_DynamicStub.java)
at se.ericsson.nrg.ws.realms.nrgcommonentities.service.ApplicationServiceSubscriptionDAORemoteImpl.searchBySpIdAndAppIdAndScPk(ApplicationServiceSubscriptionDAORemoteImpl.java:104)
... 74 more
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1413)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
at se.ericsson.nrg.ws.realms.nrgcommonentities.service._ApplicationServiceSubscriptionRemote_DynamicStub.searchBySpIdAndAppIdAndScPk
(se/ericsson/nrg/ws/realms/nrgcommonentities/service/_ApplicationSe
rviceSubscriptionRemote_DynamicStub.java)
at se.ericsson.nrg.ws.realms.nrgcommonentities.service.ApplicationServiceSubscriptionDAORemoteImpl.searchBySpIdAndAppIdAndScPk(ApplicationServiceSubscriptionDAORemoteImpl.java:104)
... 74 more
坦白說,每次看到ThreadDeath這個error,我都有種一頭撞死的感覺, ^0^,誰取的這名字!由于之前遇到過類似的問題,因此倒是不特別緊張,先看看請求處理的過程:首先是一個webapp接收請求,然后請求被交給一個EJB的模塊處理,ThreadDeath error就是在這里拋出。從ThreadDeath的exception trace上看到,
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1413)
說明當(dāng)前線程是試圖用webapp的classloader來裝載類,然后遭遇ThreadDeath,隨即在日志文件中查找到這樣的內(nèi)容:
[#|2010-05-24T15:19:26.702+0800|INFO|sun-glassfish-comms-server2.0|org.apache.catalina.loader.WebappClassLoader|_ThreadID=47;_ThreadName=httpWorkerThread-48080WorkerThread-8080-337;|PWC1635: Illegal access: this web application instance has been stopped already (the eventual following stack trace is caused by an error thrown for debugging purposes as well as to attempt to
terminate the thread which caused the illegal access, and has no functional impact)|#]
"this web application instance has been stopped already",進(jìn)一步證實(shí)了我們的猜測,這個是最常見的ThreadDeath 的發(fā)生場景了,簡單描述一下:
1. 工作線程執(zhí)行webapp的請求,webapp的classloader被保存在thread
類似Thread.setContextClassLoader(WebappClassLoader)這樣的方式
2. webapp被restart/redeploy ,但是由于某些原因上面的工作線程中保留的webapp的classloader并沒有被清理
3. 這個工作線程繼續(xù)處理請求,由于他使用的classloader是restart/redeploy前的webapp的,現(xiàn)在這個webapp已經(jīng)不復(fù)存在,因此出現(xiàn)ThreadDeath 錯誤
有關(guān)ThreadDeath的類似錯誤,glassfish上有個wiki特地描述了這個問題。
http://wiki.glassfish.java.net/Wiki.jsp?page=FaqWebAppStoppedIllegalAccessError
正在我們以為找到問題準(zhǔn)備繼續(xù)查找看是哪里的Thread.setContextClassLoader()方法出現(xiàn)問題時(shí),被另外一個發(fā)現(xiàn)擊倒!我們發(fā)現(xiàn)上述的ThreadDeath 在重新啟動之后居然可以立即重現(xiàn)!很暈,按照上面的推斷邏輯,如果我們關(guān)閉glassfish,即關(guān)閉jvm,退出進(jìn)程,然后重啟啟動,ThreadDeath 就應(yīng)該消失才是,因?yàn)槌鰡栴}的Thread隨jvm一起退出了。新jvm中啟動的Thread不再有殘余的classloader來導(dǎo)致ThreadDeath。
這個證據(jù)直接推翻前面的推斷,看來問題不是這個簡單了。
隨后進(jìn)行了大量的查找,推斷和嘗試,過程不敘述了。最后發(fā)現(xiàn)在日志中有一個異常,坦白說我們的日志有點(diǎn)多,有點(diǎn)亂,不是很好查找問題:
javax.servlet.ServletException: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
at se.ericsson.nrg.ws.service.http.SIGAuthenticationFilter.init(SIGAuthenticationFilter.java:162)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:273)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:385)
at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:119)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4521)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:5369)
at com.sun.enterprise.web.WebModule.start(WebModule.java:345)
at com.sun.enterprise.web.LifecycleStarter.doRun(LifecycleStarter.java:58)
at com.sun.appserv.management.util.misc.RunnableBase.runSync(RunnableBase.java:304)
at com.sun.appserv.management.util.misc.RunnableBase.run(RunnableBase.java:341)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
at se.ericsson.nrg.ws.service.http.SIGAuthenticationFilter.init(SIGAuthenticationFilter.java:162)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:273)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:385)
at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:119)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4521)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:5369)
at com.sun.enterprise.web.WebModule.start(WebModule.java:345)
at com.sun.enterprise.web.LifecycleStarter.doRun(LifecycleStarter.java:58)
at com.sun.appserv.management.util.misc.RunnableBase.runSync(RunnableBase.java:304)
at com.sun.appserv.management.util.misc.RunnableBase.run(RunnableBase.java:341)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
webapp中定義有一個filter,現(xiàn)在這個filter在init()方法中出現(xiàn)異常,而我們編碼的時(shí)候會包裝這個異常然后以ServletException的方式拋給容器。
init()方法中出現(xiàn)的異常是一個配置項(xiàng)的類型不正確,做強(qiáng)制類型轉(zhuǎn)換時(shí)失敗導(dǎo)致。由于出現(xiàn)這個配置問題的腳本偶爾才運(yùn)行,因此過去這個錯誤發(fā)生的幾率極低。隨即修復(fù)了這個配置項(xiàng),重啟glassfish后驗(yàn)證成功,ThreadDeath 錯誤消失,系統(tǒng)工作正常。
OK,問題找到并得到確認(rèn)。
可是這里依然有一個巨大的疑問:為什么一個filter拋出的ServletException,會導(dǎo)致glassfish出現(xiàn)"判斷失誤"從而認(rèn)定webapp是"stopped"狀態(tài)以致最后以ThreadDeath的形式來體現(xiàn)錯誤?
這個案例比較迷惑人的地方是,filter拋出的ServletException后,這個webapp是可以正常接收請求也可以正常將請求轉(zhuǎn)交后面的業(yè)務(wù)處理模塊,控制臺上也看不到這個webapp的任何狀態(tài)異常。
特地找了一下j2ee5規(guī)范,沒有發(fā)現(xiàn)有對filter的詳細(xì)要求,隨即找到了j2ee 5的 API,在API文檔中發(fā)現(xiàn)以下內(nèi)容:
void init(FilterConfig filterConfig)
throws ServletException
Called by the web container to indicate to a filter that it is being placed into service. The servlet container calls the init method exactly once after instantiating the filter. The init method
must complete successfully before the filter is asked to do any filtering work.
The web container cannot place the filter into service if the init method either
1.Throws a ServletException
2.Does not return within a time period defined by the web container
throws ServletException
Called by the web container to indicate to a filter that it is being placed into service. The servlet container calls the init method exactly once after instantiating the filter. The init method
must complete successfully before the filter is asked to do any filtering work.
The web container cannot place the filter into service if the init method either
1.Throws a ServletException
2.Does not return within a time period defined by the web container
這里可以看到,當(dāng)發(fā)生ServletException 異常時(shí),“The web container cannot place the filter into service”,既這個filter應(yīng)該不能生效才是。而且也沒有任何其他說明說ServletException會導(dǎo)致其他問題比如webapp不能啟動,不能工作之類的。
當(dāng)時(shí)實(shí)際上,我們在檢查ThreadDeath的調(diào)用信息時(shí),發(fā)現(xiàn)有下面的調(diào)用
at se.ericsson.nrg.ws.service.http.SIGAuthenticationFilter.doFilter(SIGAuthenticationFilter.java:89)
說明這個出現(xiàn)init()錯誤的filter還是被glassfish正常調(diào)用去執(zhí)行doFilter()方法,這里和j2ee API的要求是不符合的。有點(diǎn)奇怪的是,glassfish一向是以嚴(yán)格遵循j2ee規(guī)范而著稱,居然在這里一反常態(tài)。
而更令人 郁悶的是,glassfish在處理這個有filter初始化出現(xiàn)ServletException異常的webapp時(shí)的前后表現(xiàn):首先這個webapp的啟動沒有問題,狀態(tài)正常。filter也被認(rèn)為可以正常工作并加入了filter鏈。webapp中的功能正常,可以正常的接收請求并轉(zhuǎn)發(fā)給內(nèi)容業(yè)務(wù)處理模塊。從這些跡象看這個webapp基本沒有問題。但是后面glassfish卻莫名其妙的認(rèn)定,“this web application instance has been stopped already”,從而以ThreadDeath這種非常規(guī)的error來報(bào)錯。
這個做法令人比較難于接收,如果filter初始化出現(xiàn)ServletException異常會導(dǎo)致webapp的狀體異常,那么glassfish應(yīng)該在第一時(shí)間直接給出這個判斷,比如直接讓webapp就啟動失敗之類的,這樣不至于將這個錯誤推遲到一個非常遙遠(yuǎn)的地方才被暴露,而且filter ServletException異常 -》 ThreadDeath 的這種因果關(guān)系未免有點(diǎn)牽強(qiáng),查錯時(shí)根本沒有可能直接聯(lián)系上,導(dǎo)致查錯的難度大增。
不得不再次批評一下glassfish,從產(chǎn)品質(zhì)量上看,glassfish和weblogic的差距還真是不小。
posted on 2010-05-25 11:38 sky ao 閱讀(3683) 評論(0) 編輯 收藏 所屬分類: ejb