Source: http://www.cnblogs.com/luogankun/p/4191796.html

While testing spark-sql running on YARN today, I happened to notice a problem in the logs:

          spark-sql --master yarn
14/12/29 15:23:17 INFO Client: Requesting a new application from cluster with 1 NodeManagers
14/12/29 15:23:17 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
14/12/29 15:23:17 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
14/12/29 15:23:17 INFO Client: Setting up container launch context for our AM
14/12/29 15:23:17 INFO Client: Preparing resources for our AM container
14/12/29 15:23:17 INFO Client: Uploading resource file:/home/spark/software/source/compile/deploy_spark/assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar -> hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0093/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
14/12/29 15:23:18 INFO Client: Setting up the launch environment for our AM container

Starting a second spark-sql shell, the logs show the same thing again:

14/12/29 15:24:03 INFO Client: Requesting a new application from cluster with 1 NodeManagers
14/12/29 15:24:03 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
14/12/29 15:24:03 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
14/12/29 15:24:03 INFO Client: Setting up container launch context for our AM
14/12/29 15:24:03 INFO Client: Preparing resources for our AM container
14/12/29 15:24:03 INFO Client: Uploading resource file:/home/spark/software/source/compile/deploy_spark/assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar -> hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0094/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
14/12/29 15:24:05 INFO Client: Setting up the launch environment for our AM container

Then look at the files on HDFS:

          hadoop fs -ls hdfs://hadoop000:8020/user/spark/.sparkStaging/
drwx------   - spark supergroup          0 2014-12-29 15:23 hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0093
drwx------   - spark supergroup          0 2014-12-29 15:24 hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0094

Every application uploads its own copy of the spark-assembly-x.x.x-SNAPSHOT-hadoopx.x.x-cdhx.x.x.jar assembly jar, which hurts HDFS performance and eats HDFS space.
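To see how much staging space this actually costs, you can sum up the staging directories (a quick sanity check, using the same staging path shown in the listing above):

hadoop fs -du -s -h hdfs://hadoop000:8020/user/spark/.sparkStaging    # total across all applications
hadoop fs -du -h hdfs://hadoop000:8020/user/spark/.sparkStaging       # per-application breakdown; each entry holds its own copy of the assembly jar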

           

The Spark documentation (http://spark.apache.org/docs/latest/running-on-yarn.html) describes the spark.yarn.jar property. The idea is to store spark-assembly-xxxxx.jar once under hdfs://hadoop000:8020/spark_lib/ and point the property at it.
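A minimal sketch of that one-time upload (the /spark_lib/ directory follows the post; the local jar path is the one from the logs above):

# Create the shared library directory on HDFS and upload the assembly jar once
hadoop fs -mkdir -p hdfs://hadoop000:8020/spark_lib/
hadoop fs -put /home/spark/software/source/compile/deploy_spark/assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar hdfs://hadoop000:8020/spark_lib/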

Then add the property to spark-defaults.conf:

          spark.yarn.jar hdfs://hadoop000:8020/spark_lib/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
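If you prefer not to touch spark-defaults.conf, the same property should also be settable per invocation through the generic --conf flag (standard spark-submit behavior that spark-sql inherits; not tested in this post):

spark-sql --master yarn --conf spark.yarn.jar=hdfs://hadoop000:8020/spark_lib/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar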

Start spark-sql --master yarn again and watch the logs:

14/12/29 15:39:02 INFO Client: Requesting a new application from cluster with 1 NodeManagers
14/12/29 15:39:02 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
14/12/29 15:39:02 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
14/12/29 15:39:02 INFO Client: Setting up container launch context for our AM
14/12/29 15:39:02 INFO Client: Preparing resources for our AM container
14/12/29 15:39:02 INFO Client: Source and destination file systems are the same. Not copying hdfs://hadoop000:8020/spark_lib/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
14/12/29 15:39:02 INFO Client: Setting up the launch environment for our AM container

And check the files on HDFS:

          hadoop fs -ls hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0097

The application's staging directory no longer contains the spark-assembly-xxxxx.jar, which saves both the assembly upload on every submission and the HDFS space it occupied.

           

During testing I also ran into an error like the following:

Application application_xxxxxxxxx_yyyy failed 2 times due to AM Container for application_xxxxxxxxx_yyyy exited with exitCode: -1000 due to: java.io.FileNotFoundException: File /tmp/hadoop-spark/nm-local-dir/filecache does not exist

Creating the filecache directory under /tmp/hadoop-spark/nm-local-dir resolves the error.
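As a one-line fix on the NodeManager host (the path is taken verbatim from the error above):

# Create the missing NodeManager local filecache directory
mkdir -p /tmp/hadoop-spark/nm-local-dir/filecache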

posted on 2016-05-26 14:11 by SIMONE, category: spark
