Background: why the Spark History Server exists
Taking standalone mode as an example: while a Spark Application is running, Spark provides a Web UI that lists the application's runtime information. This Web UI, however, is shut down as soon as the Application finishes (whether it succeeds or fails), which means that once a Spark Application has completed, its history can no longer be viewed.
The Spark History Server exists to address exactly this situation. With the proper configuration, event log information is recorded while the Application runs, so that after the Application has finished, the Web UI can be re-rendered to display that Application's runtime information.
When Spark runs on YARN or Mesos, the History Server can likewise reconstruct the runtime information of an already completed Application (provided the Application's event log was recorded).
Configuring & using the Spark History Server
Start the Spark History Server with the default configuration:
cd $SPARK_HOME/sbin
start-history-server.sh
It fails with an error:
starting org.apache.spark.deploy.history.HistoryServer, logging to /home/spark/software/source/compile/deploy_spark/sbin/../logs/spark-spark-org.apache.spark.deploy.history.HistoryServer-1-hadoop000.out
failed to launch org.apache.spark.deploy.history.HistoryServer:
        at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:44)
        ... 6 more
The log directory needs to be specified at startup:
start-history-server.sh hdfs://hadoop000:8020/directory
hdfs://hadoop000:8020/directory can also be set in a configuration file, in which case it does not need to be specified when starting the history server; how to configure this is described below.
Note: this directory must be created on HDFS beforehand, otherwise the history server will fail to start.
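For example, the directory can be created ahead of time like this (a minimal sketch, assuming the hdfs://hadoop000:8020/directory path used throughout this post):

# pre-create the event log directory on HDFS
hadoop fs -mkdir -p hdfs://hadoop000:8020/directory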
Once started, the History Server can be accessed through its Web UI; the default port is 18080: http://hadoop000:18080
The application list on this page is empty by default; the screenshot below was taken after I had run spark-sql a few times.
Description of the History Server configuration parameters
1) spark.history.updateInterval
  Default: 10
  The interval, in seconds, at which the log information is refreshed.
2) spark.history.retainedApplications
  Default: 50
  The number of Application histories kept in memory. When this limit is exceeded, the oldest application's information is evicted, and visiting an evicted application again requires its page to be rebuilt.
3) spark.history.ui.port
  Default: 18080
  The web port of the HistoryServer.
4) spark.history.kerberos.enabled
  Default: false
  Whether to use Kerberos login to access the HistoryServer. This is useful when the persistence layer lives on HDFS in a secured cluster. If set to true, the following two properties must also be configured.
5) spark.history.kerberos.principal
  The Kerberos principal name used by the HistoryServer.
6) spark.history.kerberos.keytab
  The location of the Kerberos keytab file used by the HistoryServer.
7) spark.history.ui.acls.enable
  Default: false
  Whether to check ACLs when authorizing users to view application information. If enabled, only the application owner and the users specified in spark.ui.view.acls may view the application; otherwise no check is performed.
8) spark.eventLog.enabled
  Default: false
  Whether to log Spark events, which are used to reconstruct the Web UI after the application has finished.
9) spark.eventLog.dir
  Default: file:///tmp/spark-events
  The path where the event log information is stored. It can be an HDFS path starting with hdfs:// or a local path starting with file://; in either case the directory must be created in advance.
10) spark.eventLog.compress
  Default: false
  Whether to compress the logged Spark events (only meaningful when spark.eventLog.enabled is true); snappy is used by default.
Properties starting with spark.history must be configured in SPARK_HISTORY_OPTS in spark-env.sh, while properties starting with spark.eventLog go into spark-defaults.conf.
The configuration I used during testing is as follows:
spark-defaults.conf
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://hadoop000:8020/directory
spark.eventLog.compress true
spark-env.sh
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=7777 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://hadoop000:8020/directory"
Parameter descriptions:
spark.history.ui.port=7777: changes the port used to access the Web UI to 7777.
spark.history.fs.logDirectory=hdfs://hadoop000:8020/directory: with this property set, the path no longer needs to be specified explicitly when running start-history-server.sh.
spark.history.retainedApplications=3: the number of Application histories to keep; once this value is exceeded, the oldest application's information is evicted.
After adjusting the parameters, start the History Server:
start-history-server.sh
Access the Web UI at http://hadoop000:7777
A few questions that came up while using the Spark History Server:
Question 1: what is the difference between the directories specified by spark.history.fs.logDirectory and spark.eventLog.dir?
After testing I found:
spark.eventLog.dir: all of the information produced while an Application is running is recorded under the path specified by this property;
spark.history.fs.logDirectory: the Spark History Server page only displays the information found under this path.
For example: suppose spark.eventLog.dir initially points to hdfs://hadoop000:8020/directory and is later changed to hdfs://hadoop000:8020/directory2.
If spark.history.fs.logDirectory is set to hdfs://hadoop000:8020/directory, then only the run logs of the Applications under that directory will be shown, and vice versa.
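In practice the simplest setup is to point both properties at the same directory; a sketch based on the path used in this post:

# spark-defaults.conf: where running Applications write their event logs
spark.eventLog.dir  hdfs://hadoop000:8020/directory

# spark-env.sh: where the History Server reads logs from (same path)
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop000:8020/directory"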
Question 2: spark.history.retainedApplications=3 does not seem to take effect?!
The History Server will list all applications. It will just retain a max number of them in memory. That option does not control how many applications are shown, it controls how much memory the HS will need.
Note: this parameter does not control the number of application records shown on the page; it controls how many are kept in memory. Information already in memory is simply read and rendered when the page is visited.
For example, if this parameter is set to 10, at most 10 applications' log information is held in memory. When the 11th one is added, the first is evicted; visiting the first application's page again then requires re-reading its log from the configured path to render the page.
See the official documentation for details: http://spark.apache.org/docs/latest/monitoring.html
While testing spark-sql on YARN today, I happened to notice a problem in the logs:
spark-sql --master yarn
14/12/29 15:23:17 INFO Client: Requesting a new application from cluster with 1 NodeManagers
14/12/29 15:23:17 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
14/12/29 15:23:17 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
14/12/29 15:23:17 INFO Client: Setting up container launch context for our AM
14/12/29 15:23:17 INFO Client: Preparing resources for our AM container
14/12/29 15:23:17 INFO Client: Uploading resource file:/home/spark/software/source/compile/deploy_spark/assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar -> hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0093/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
14/12/29 15:23:18 INFO Client: Setting up the launch environment for our AM container
Opening a second spark-sql command line, the logs show the same thing again:
14/12/29 15:24:03 INFO Client: Requesting a new application from cluster with 1 NodeManagers
14/12/29 15:24:03 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
14/12/29 15:24:03 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
14/12/29 15:24:03 INFO Client: Setting up container launch context for our AM
14/12/29 15:24:03 INFO Client: Preparing resources for our AM container
14/12/29 15:24:03 INFO Client: Uploading resource file:/home/spark/software/source/compile/deploy_spark/assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar -> hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0094/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
14/12/29 15:24:05 INFO Client: Setting up the launch environment for our AM container
Then look at the files on HDFS:
hadoop fs -ls hdfs://hadoop000:8020/user/spark/.sparkStaging/
drwx------   - spark supergroup          0 2014-12-29 15:23 hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0093
drwx------   - spark supergroup          0 2014-12-29 15:24 hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0094
Every Application uploads its own spark-assembly-x.x.x-SNAPSHOT-hadoopx.x.x-cdhx.x.x.jar, which hurts HDFS performance and takes up HDFS space.
The Spark documentation (http://spark.apache.org/docs/latest/running-on-yarn.html) describes the spark.yarn.jar property, so I placed spark-assembly-xxxxx.jar under hdfs://hadoop000:8020/spark_lib/.
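Uploading the assembly jar can be done with something like the following (a sketch; the local jar path is the one that appears in the logs above):

# create the shared lib directory and upload the assembly jar once
hadoop fs -mkdir -p hdfs://hadoop000:8020/spark_lib
hadoop fs -put /home/spark/software/source/compile/deploy_spark/assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar hdfs://hadoop000:8020/spark_lib/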
Add the property to spark-defaults.conf:
spark.yarn.jar hdfs://hadoop000:8020/spark_lib/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
Start spark-sql --master yarn again and watch the logs:
14/12/29 15:39:02 INFO Client: Requesting a new application from cluster with 1 NodeManagers
14/12/29 15:39:02 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
14/12/29 15:39:02 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
14/12/29 15:39:02 INFO Client: Setting up container launch context for our AM
14/12/29 15:39:02 INFO Client: Preparing resources for our AM container
14/12/29 15:39:02 INFO Client: Source and destination file systems are the same. Not copying hdfs://hadoop000:8020/spark_lib/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
14/12/29 15:39:02 INFO Client: Setting up the launch environment for our AM container
Check the files on HDFS:
hadoop fs -ls hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0097
The directory for this Application no longer contains spark-assembly-xxxxx.jar, which saves the assembly upload step as well as the HDFS space it would occupy.
During testing I also ran into an error similar to the following:
Application application_xxxxxxxxx_yyyy failed 2 times due to AM Container for application_xxxxxxxxx_yyyy
exited with exitCode: -1000 due to: java.io.FileNotFoundException: File /tmp/hadoop-spark/nm-local-dir/filecache does not exist
Creating a filecache folder under /tmp/hadoop-spark/nm-local-dir resolves this error.
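For instance, on the NodeManager machine (assuming the exact path from the error message above):

# create the missing local filecache directory for the NodeManager
mkdir -p /tmp/hadoop-spark/nm-local-dir/filecache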