Spark History Server: Configuration and Usage

Source: http://www.cnblogs.com/luogankun/p/3981645.html

Background of the Spark History Server

Take standalone mode as an example: while a Spark Application is running, Spark provides a WebUI that lists the application's runtime information. That WebUI, however, is shut down as soon as the Application finishes (whether it succeeds or fails), which means that once a Spark Application has completed, its history can no longer be viewed.

The Spark History Server exists precisely for this situation. With the right configuration, log event information is recorded while the Application runs, so that after the Application has finished, the WebUI can be re-rendered to show the runtime information of that Application.

When Spark runs on YARN or Mesos, the Spark History Server can still reconstruct the runtime information of an Application that has already finished (provided the Application's event log information was recorded).

 

Configuring & Using the Spark History Server

Start the Spark History Server with the default configuration:

cd $SPARK_HOME/sbin
./start-history-server.sh

It fails with an error:

starting org.apache.spark.deploy.history.HistoryServer, logging to /home/spark/software/source/compile/deploy_spark/sbin/../logs/spark-spark-org.apache.spark.deploy.history.HistoryServer-1-hadoop000.out
failed to launch org.apache.spark.deploy.history.HistoryServer:
        at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:44)
        ... 6 more

A log directory has to be specified at startup:

start-history-server.sh hdfs://hadoop000:8020/directory

hdfs://hadoop000:8020/directory can also be put in a configuration file, so that it does not have to be given when starting the history server; how to configure that is described later.

Note: the directory must be created on HDFS beforehand, otherwise the history server fails to start.
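For example, the directory used throughout this post can be created ahead of time like this (a minimal sketch; the host name and port follow the hadoop000 example used in this article):

hadoop fs -mkdir -p hdfs://hadoop000:8020/directory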

Once started, it can be accessed through the WebUI; the default port is 18080: http://hadoop000:18080

The application list on the page is empty at first; it showed entries after I had run a few spark-sql tests.

 

Description of the history-server-related configuration parameters

1) spark.history.updateInterval
  Default: 10
  The interval, in seconds, at which the log information is refreshed.

2) spark.history.retainedApplications
  Default: 50
  The number of Application history records kept in memory. When this value is exceeded, the oldest application's information is dropped, and its page has to be rebuilt the next time it is visited.

3) spark.history.ui.port
  Default: 18080
  The web port of the HistoryServer.

4) spark.history.kerberos.enabled
  Default: false
  Whether to use Kerberos to log in to the HistoryServer; useful when the persistence layer lives on HDFS in a secured cluster. If set to true, the following two properties must also be configured (see the sketch after this list).

5) spark.history.kerberos.principal
  The Kerberos principal name used by the HistoryServer.

6) spark.history.kerberos.keytab
  The location of the Kerberos keytab file used by the HistoryServer.

7) spark.history.ui.acls.enable
  Default: false
  Whether to check ACLs when authorizing users to view application information. If enabled, only the application owner and the users listed in spark.ui.view.acls can view the application's information; otherwise no check is performed.

8) spark.eventLog.enabled
  Default: false
  Whether to log Spark events, used to reconstruct the web UI after the application has finished.

9) spark.eventLog.dir
  Default: file:///tmp/spark-events
  The path where the event log information is stored; it can be an HDFS path starting with hdfs:// or a local path starting with file://, and in both cases it must be created in advance.

10) spark.eventLog.compress
  Default: false
  Whether to compress the logged Spark events; requires spark.eventLog.enabled to be true. Snappy is used by default.
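A minimal sketch of the Kerberos-related settings from item 4, expressed as SPARK_HISTORY_OPTS (how these options are passed is explained right below); the principal and keytab path here are hypothetical placeholders, not values from this cluster:

export SPARK_HISTORY_OPTS="-Dspark.history.kerberos.enabled=true -Dspark.history.kerberos.principal=spark/hadoop000@EXAMPLE.COM -Dspark.history.kerberos.keytab=/etc/security/keytabs/spark.keytab"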

Properties starting with spark.history need to be configured through SPARK_HISTORY_OPTS in spark-env.sh, while properties starting with spark.eventLog are configured in spark-defaults.conf.

 

The configuration I used while testing is as follows:

spark-defaults.conf

spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://hadoop000:8020/directory
spark.eventLog.compress true

spark-env.sh

export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=7777 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://hadoop000:8020/directory"

Parameter descriptions:

spark.history.ui.port=7777  changes the WebUI port to 7777

spark.history.fs.logDirectory=hdfs://hadoop000:8020/directory  with this property set, the path no longer has to be given explicitly when running start-history-server.sh

spark.history.retainedApplications=3   sets the number of Application history records to keep; beyond this value, the oldest application information is dropped

 

After adjusting the parameters, start the history server again:

start-history-server.sh 

Access the WebUI at http://hadoop000:7777

 

A few questions that came up while using the Spark History Server:

Question 1: what is the difference between the directories specified by spark.history.fs.logDirectory and spark.eventLog.dir?

Testing showed the following:

spark.eventLog.dir: all information produced while an Application is running is recorded under the path specified by this property;

spark.history.fs.logDirectory: the Spark History Server page only shows the information found under this path;

For example: spark.eventLog.dir was initially set to hdfs://hadoop000:8020/directory and later changed to hdfs://hadoop000:8020/directory2.

If spark.history.fs.logDirectory is set to hdfs://hadoop000:8020/directory, the page can only show the logs of the Applications that ran under that directory, and vice versa.
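As a minimal sketch of keeping the two properties consistent (reusing the hadoop000 paths from above), point both at the same directory:

In spark-defaults.conf:
spark.eventLog.dir      hdfs://hadoop000:8020/directory

In spark-env.sh:
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop000:8020/directory"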

 

Question 2: spark.history.retainedApplications=3 does not seem to take effect?

The History Server will list all applications. It will just retain a max number of them in memory. That option does not control how many applications are shown, it controls how much memory the HS will need.

Note: this parameter is not the number of application records shown on the page, but the number of applications kept in memory; information that is already in memory can be read and rendered directly when the page is visited;

For example, if the parameter is set to 10, then at most 10 applications' log information can be kept in memory. When the 11th is added, the first one is evicted, and when the first application's page is visited again, its log information has to be re-read from the configured path in order to render the page.

See the official documentation for details: http://spark.apache.org/docs/latest/monitoring.html



Using the spark.yarn.jar Property with Spark on YARN

Source: http://www.cnblogs.com/luogankun/p/4191796.html

Today, while testing spark-sql running on YARN, I happened to notice a problem in the logs:

spark-sql --master yarn
14/12/29 15:23:17 INFO Client: Requesting a new application from cluster with 1 NodeManagers
14/12/29 15:23:17 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
14/12/29 15:23:17 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
14/12/29 15:23:17 INFO Client: Setting up container launch context for our AM
14/12/29 15:23:17 INFO Client: Preparing resources for our AM container
14/12/29 15:23:17 INFO Client: Uploading resource file:/home/spark/software/source/compile/deploy_spark/assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar -> hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0093/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
14/12/29 15:23:18 INFO Client: Setting up the launch environment for our AM container

Opening another spark-sql command line, the logs showed the same thing again:

14/12/29 15:24:03 INFO Client: Requesting a new application from cluster with 1 NodeManagers
14/12/29 15:24:03 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
14/12/29 15:24:03 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
14/12/29 15:24:03 INFO Client: Setting up container launch context for our AM
14/12/29 15:24:03 INFO Client: Preparing resources for our AM container
14/12/29 15:24:03 INFO Client: Uploading resource file:/home/spark/software/source/compile/deploy_spark/assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar -> hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0094/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
14/12/29 15:24:05 INFO Client: Setting up the launch environment for our AM container

Then look at the files on HDFS:

hadoop fs -ls hdfs://hadoop000:8020/user/spark/.sparkStaging/
drwx------   - spark supergroup          0 2014-12-29 15:23 hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0093
drwx------   - spark supergroup          0 2014-12-29 15:24 hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0094

Every Application uploads its own spark-assembly-x.x.x-SNAPSHOT-hadoopx.x.x-cdhx.x.x.jar, which hurts HDFS performance and takes up HDFS space.

 

The Spark documentation (http://spark.apache.org/docs/latest/running-on-yarn.html) mentions the spark.yarn.jar property: store spark-assembly-xxxxx.jar under hdfs://hadoop000:8020/spark_lib/, as sketched below.
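A minimal sketch of staging the assembly jar at that location; the local jar path below is the one that appeared in the upload logs above and may differ on other installations:

hadoop fs -mkdir -p hdfs://hadoop000:8020/spark_lib/
hadoop fs -put /home/spark/software/source/compile/deploy_spark/assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar hdfs://hadoop000:8020/spark_lib/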

Add the property to spark-defaults.conf:

spark.yarn.jar hdfs://hadoop000:8020/spark_lib/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar

Start spark-sql --master yarn again and watch the logs:

14/12/29 15:39:02 INFO Client: Requesting a new application from cluster with 1 NodeManagers
14/12/29 15:39:02 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
14/12/29 15:39:02 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
14/12/29 15:39:02 INFO Client: Setting up container launch context for our AM
14/12/29 15:39:02 INFO Client: Preparing resources for our AM container
14/12/29 15:39:02 INFO Client: Source and destination file systems are the same. Not copying hdfs://hadoop000:8020/spark_lib/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
14/12/29 15:39:02 INFO Client: Setting up the launch environment for our AM container

Check the files on HDFS:

hadoop fs -ls hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0097

The directory for that Application no longer contains spark-assembly-xxxxx.jar, which saves both the assembly upload step and the HDFS space it would occupy.

 

During testing I also ran into an error like the following:

Application application_xxxxxxxxx_yyyy failed 2 times due to AM Container for application_xxxxxxxxx_yyyy 

exited with exitCode: -1000 due to: java.io.FileNotFoundException: File /tmp/hadoop-spark/nm-local-dir/filecache does not exist

Creating a filecache folder under /tmp/hadoop-spark/nm-local-dir resolves the error.
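A minimal sketch of the fix, run on the NodeManager host that reports the error (the path comes from the error message above):

mkdir -p /tmp/hadoop-spark/nm-local-dir/filecache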


