<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0"><channel><link>http://www.aygfsteel.com/paulwong/category/54858.html</link><language>zh-cn</language><lastBuildDate>Thu, 18 Jun 2015 05:28:33 GMT</lastBuildDate><pubDate>Thu, 18 Jun 2015 05:28:33 GMT</pubDate><ttl>60</ttl><item><title>How Spark and Shark Work</title><link>http://www.aygfsteel.com/paulwong/archive/2015/06/18/425773.html</link><dc:creator>paulwong</dc:creator><author>paulwong</author><pubDate>Thu, 18 Jun 2015 05:20:00 GMT</pubDate><guid>http://www.aygfsteel.com/paulwong/archive/2015/06/18/425773.html</guid><wfw:comment>http://www.aygfsteel.com/paulwong/comments/425773.html</wfw:comment><comments>http://www.aygfsteel.com/paulwong/archive/2015/06/18/425773.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.aygfsteel.com/paulwong/comments/commentRss/425773.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/paulwong/services/trackbacks/425773.html</trackback:ping><description><![CDATA[<p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;"><strong>1. The Spark ecosystem</strong></p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;">The figure below shows the whole Spark ecosystem. At the bottom is the resource manager: a resource-management cluster such as Mesos or YARN, or Spark's own Standalone mode. The underlying storage is a file system or another kind of storage system such as HBase. Spark, as the compute framework, serves the various applications above it. GraphX and MLBase provide data-mining services such as graph computation and iterative mining algorithms. Shark provides SQL queries, is compatible with Hive syntax, and is reported to run up to 50 times faster than Hive. BlinkDB is an interactive SQL query engine that trades data accuracy for shorter query response times. Both can be used for interactive queries. Spark Streaming breaks a streaming computation into a series of short batch jobs and provides highly reliable, high-throughput service.</p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;"><br /><img src="http://dl2.iteye.com/upload/attachment/0109/5585/a0237912-0cd8-3032-a9fe-cff365afb9e1.png" alt="" style="border: 0px;" /><br /> <strong>2. Spark fundamentals</strong></p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: 
#efefef;"><strong>The figure below shows Spark's runtime framework. First there are the cluster resource-management service (Cluster Manager) and the nodes that run job tasks (Worker Node). Then there are the task-control node of each application (Driver) and, on each machine, the processes that execute the actual tasks (Executor).</strong></p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;"><strong><br /><img src="http://dl2.iteye.com/upload/attachment/0109/5587/99853dc0-1bb5-323a-9f99-dea35197b965.png" alt="" style="border: 0px;" /><br /> </strong>Compared with the MapReduce (MR) framework, the Executor has two advantages. First, it runs tasks in multiple threads rather than using a process model as MR does, which reduces task start-up overhead. Second, each Executor has a BlockManager storage module, similar to a key-value store that uses memory and disk together as its storage devices. When a computation needs several iterations, intermediate data can be put into this store and read back directly the next time it is needed, instead of being written to and read from HDFS or a similar file system. In interactive-query scenarios, tables can be cached in this store in advance to improve read and write I/O performance. In addition, Spark drops the unnecessary sort during shuffle in cases such as group-by and join. And whereas MapReduce offers only two modes, Map and Reduce, Spark provides a richer and more complete set of operations such as filter, groupby, and join.</p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;"> </p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;">Notes: In cluster mode, the Cluster Manager runs in one JVM process and each worker runs in another. In a local cluster, all of these JVM processes are on the same machine. In a real Standalone, Mesos, or YARN cluster, the workers and the master may be spread across different hosts.</p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;"> </p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;">Job generation and execution</p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;">The basic flow of job generation is as follows:</p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;">1. The application first creates a SparkContext instance, for example sc.</p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;">2. The SparkContext instance is used to create an RDD.</p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;">3. A chain of transformation operations converts the original RDD into RDDs of other types.</p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;">4. When an action is applied to the transformed RDD, SparkContext's runJob method is called.</p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;">5. The call to sc.runJob is the starting point of the chain of reactions that follows; this is where the key transition happens.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;">The call path is roughly as follows:</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;">1. sc.runJob->dagScheduler.runJob->submitJob</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: 
#efefef;">2. DAGScheduler::submitJob creates a JobSubmitted event and sends it to the nested class eventProcessActor.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;">3. On receiving JobSubmitted, eventProcessActor calls the processEvent handler.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;">4. The job is converted into stages: finalStage is generated and submitted for execution. The key call is submitStage.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;">5. submitStage computes the dependencies between stages. Dependencies are of two kinds: wide and narrow.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;">6. If the computation finds that the current stage has no dependencies, or that all of its dependencies are ready, its tasks are submitted.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;">7. Tasks are submitted by calling submitMissingTasks.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;">8. Which worker a task actually runs on is managed by TaskScheduler: submitMissingTasks above calls TaskScheduler::submitTasks.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;">9. TaskSchedulerImpl creates the backend that matches Spark's current run mode. When running on a single machine, it creates a LocalBackend.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;">10. LocalBackend receives the ReceiveOffers event passed in by TaskSchedulerImpl.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;">11. receiveOffers->executor.launchTask->TaskRunner.run</p><p style="margin: 0px 0px 10px; padding: 
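0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;"> </p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;">Spark is written in Scala. Scala is naturally strong at expressing functions, so it expresses complex machine-learning algorithms more powerfully than other languages while staying simple and easy to read. Spark provides a range of operator functions for building a DAG computation model over RDDs. Every operation is treated as constructing an RDD, and an RDD represents a data set that is distributed across many machines and can carry these operator functions, as the figure below shows:</p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;"><br /><img src="http://dl2.iteye.com/upload/attachment/0109/5589/76cf1a9e-5b47-33e8-b265-a5ca97b743d7.png" alt="" style="border: 0px;" /><br /> The text is first read from an HDFS file to build an RDD. A filter() operation then filters that RDD, a map() operation extracts the first field of each record, and the result is cached in memory so that later operations can work on the cached data. The whole process forms a DAG computation graph. Every step is fault tolerant, and data that is needed several times can be cached for later iterations.</p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;"> </p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;"><strong>3. How Shark works</strong></p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;">Shark is a SQL execution engine built on the Spark compute framework and compatible with Hive syntax. Because the underlying computation runs on Spark, it is generally faster than Hive on MapReduce, faster again for SQL computed purely in memory, and more than 10 times faster when all of the data is loaded into memory. Shark can therefore serve as an interactive query service.<br /><img src="http://dl2.iteye.com/upload/attachment/0109/5591/3c1dd765-73b0-3c56-8abb-38a0d3ddb915.png" alt="" style="border: 0px;" /><br /> The figure above shows the overall Shark architecture. Compared with other SQL engines, besides inheriting Spark's characteristics, Shark is fully compatible with Hive's syntax, table structures, and UDFs, so existing HiveQL can be migrated to Shark directly.</p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;"><br /><strong>Compared with Hive, Shark has the following characteristics:</strong></p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 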
20px; background-color: #efefef;">1. Tasks run as an online service, which avoids the cost of starting and destroying task processes. In MapReduce, each task normally runs by starting and then shutting down a process. In Shark, once the Server is running, all worker nodes start as well and then keep accepting tasks from the Server as resident services.</p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;">2. Group-by and join operations need no sort. When the data fits in memory, computation proceeds as the data is received. In Hive, every operation sorts by key on the way from Map to Reduce.</p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;">3. For tables with higher performance requirements, a distributed cache system loads the table data into memory in advance, so later queries read from memory with no further disk overhead.</p><p style="margin: 0px 0px 10px; padding: 0px; color: #616161; font-family: 'microsoft yahei'; line-height: 20px; background-color: #efefef;">4. Shark also inherits many other Spark features: variables and data can be broadcast with Torrent, execution plans are sent directly to tasks, and intermediate data in the DAG need not be written to HDFS.</p><img src ="http://www.aygfsteel.com/paulwong/aggbug/425773.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/paulwong/" target="_blank">paulwong</a> 2015-06-18 13:20 <a href="http://www.aygfsteel.com/paulwong/archive/2015/06/18/425773.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Spark Architecture and Workflow</title><link>http://www.aygfsteel.com/paulwong/archive/2015/06/18/425772.html</link><dc:creator>paulwong</dc:creator><author>paulwong</author><pubDate>Thu, 18 Jun 2015 05:17:00 GMT</pubDate><guid>http://www.aygfsteel.com/paulwong/archive/2015/06/18/425772.html</guid><wfw:comment>http://www.aygfsteel.com/paulwong/comments/425772.html</wfw:comment><comments>http://www.aygfsteel.com/paulwong/archive/2015/06/18/425772.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.aygfsteel.com/paulwong/comments/commentRss/425772.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/paulwong/services/trackbacks/425772.html</trackback:ping><description><![CDATA[<p style="margin: 0px; padding: 0px; font-family: 
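Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;">The overall Spark flow is as follows. The Client submits an application. The Master finds a Worker to start the Driver. The Driver requests resources from the Master or the resource manager and then turns the application into an RDD Graph. The DAGScheduler converts the RDD Graph into a directed acyclic graph of Stages and submits it to the TaskScheduler, which submits tasks to Executors for execution. While tasks run, the other components cooperate to make sure the whole application completes smoothly.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;">Spark's architecture uses the Master-Slave model of distributed computing. The Master is the cluster node that runs the Master process (the ClusterManager), and a Slave is a cluster node that runs a Worker process. The Master is the controller of the whole cluster and is responsible for keeping it running normally. A Worker acts as a compute node: it receives commands from the master node and reports its status. An Executor executes tasks. The Client is the user's client and submits applications. The Driver controls the execution of one application, as the figure below shows:</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><br /><img alt="" src="http://dl2.iteye.com/upload/attachment/0109/5772/08f9d15b-10ea-3486-b6c9-a65809e27b0b.png" title="Click to view the full-size image" width="760" height="401" style="border: 0px; cursor: url(http://www.iteye.com/images/magplus.gif), pointer;" /><br />Spark architecture diagram</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"> </p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;">After a Spark cluster is deployed, the Master process must be started on the master node and a Worker process on each slave node to control the cluster. During the execution of a Spark application, the Driver and the Workers are the two key roles. The Driver program is the starting point of the application logic and is responsible for job scheduling, that is, for distributing Tasks. The Workers manage the compute nodes and create Executors that process tasks in parallel. In the execution phase, the Driver serializes each Task, together with the files and jars it depends on, and sends them to the corresponding Worker machine, and the Executors process the tasks for their data partitions.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"> </p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: 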
#efefef;"><strong>The basic components of the Spark architecture:</strong></p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><strong>ClusterManager</strong>: In Standalone mode this is the Master (master node), which controls the whole cluster and monitors the Workers. In YARN mode it is the resource manager.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><strong>Worker</strong>: The slave node, which controls a compute node and starts Executors or the Driver. In YARN mode it is the NodeManager, which controls the compute node.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><strong>Driver</strong>: Runs the Application's main() function and creates the SparkContext.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><strong>Executor</strong>: The component that executes tasks on a worker node; it starts a thread pool to run them. Each Application has its own independent set of Executors.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><strong>SparkContext</strong>: The context of the whole application, which controls the application's life cycle.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><strong>RDD</strong>: Spark's basic unit of computation. A group of RDDs can form the directed acyclic graph that is executed, the RDD Graph.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><strong>DAG Scheduler</strong>: Breaks a Spark job into one or more Stages. For each Stage, the number of Tasks is determined by the number of Partitions of the RDD, and the resulting Task set is handed to the TaskScheduler.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><strong>TaskScheduler</strong>: Distributes Tasks to Executors for execution.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; 
background-color: #efefef;"><strong>Stage</strong>: A Spark job generally consists of one or more Stages.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><strong>Task</strong>: A Stage consists of one or more Tasks. Running several Tasks is what provides parallel execution.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><strong>Transformations</strong>: Transformations (for example map, filter, groupBy, and join) are lazy: an operation that derives one RDD from another is not executed immediately. When Spark meets a transformation it only records that the operation is needed. The computation actually starts only when an action is invoked.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><strong>Actions</strong>: Actions (for example count, collect, and save) return a result or write the RDD's data to a storage system. Actions are what trigger Spark to start computing.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><strong>SparkEnv</strong>: A thread-level context that stores references to important runtime components.</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><em>SparkEnv creates and holds references to the following important components.</em></p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><em>MapOutputTracker: stores shuffle metadata.</em></p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><em>BroadcastManager: controls broadcast variables and stores their metadata.</em></p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><em>BlockManager: manages storage and creates and looks up blocks.</em></p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 
25.2000007629395px; background-color: #efefef;"><em>MetricsSystem: monitors runtime performance metrics.</em></p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><em>SparkConf: stores configuration information.</em></p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"> </p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"><br /><img alt="" src="http://dl2.iteye.com/upload/attachment/0109/5774/5fce0961-1030-3805-b649-acafae85170b.png" style="border: 0px;" /><br />Spark execution logic diagram</p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;"> </p><p style="margin: 0px; padding: 0px; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25.2000007629395px; background-color: #efefef;">In a Spark application, the whole execution flow logically forms a directed acyclic graph (DAG). Once an action operator is triggered, all of the accumulated operators are assembled into a DAG, and the scheduler then schedules the tasks on that graph for computation. Spark's scheduling differs from MapReduce's. Spark splits the graph into stages (Stage) according to the different dependencies between RDDs, and each stage contains a pipeline of functions to execute. In the figure, A, B, C, D, E, and F are different RDDs, and the boxes inside an RDD are its partitions. Data enters Spark from HDFS and forms RDD A and RDD C. A map operation on RDD C turns it into RDD D. RDD B and RDD E are joined to produce F, and a shuffle takes place during that join. Finally, RDD F is written out through saveAsSequenceFile and saved to HDFS or HBase.</p><img src ="http://www.aygfsteel.com/paulwong/aggbug/425772.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/paulwong/" target="_blank">paulwong</a> 2015-06-18 13:17 <a href="http://www.aygfsteel.com/paulwong/archive/2015/06/18/425772.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>