<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><link>http://www.aygfsteel.com/ricdong/</link><language>zh-cn</language><lastBuildDate>Mon, 16 Jun 2025 07:46:00 GMT</lastBuildDate><pubDate>Mon, 16 Jun 2025 07:46:00 GMT</pubDate><ttl>60</ttl>
<item><title>MapReduce: Data Distribution Skew</title><link>http://www.aygfsteel.com/ricdong/articles/366991.html</link><dc:creator>Ric Dong</dc:creator><author>Ric Dong</author><pubDate>Thu, 22 Dec 2011 02:17:00 GMT</pubDate><guid>http://www.aygfsteel.com/ricdong/articles/366991.html</guid><wfw:comment>http://www.aygfsteel.com/ricdong/comments/366991.html</wfw:comment><comments>http://www.aygfsteel.com/ricdong/articles/366991.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.aygfsteel.com/ricdong/comments/commentRss/366991.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/ricdong/services/trackbacks/366991.html</trackback:ping>
<description><![CDATA[<span id="wmqeeuq" class="Apple-style-span" style="font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 25px; background-color: #ffffff; "><p class="MsoNormal" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; "><span style="font-size: small; ">Data distribution skew means that data is overly concentrated at one end of the data space, producing an uneven distribution that is “top-heavy” or leans like the “Tower of Pisa”. Skew causes a “bottleneck” in computational efficiency and makes analysis results “overgeneralized”, drawing conclusions about the whole from one part.</span></p>
<p class="MsoNormal" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; "><span style="font-size: small; "><br /></span></p>
<p class="MsoNormal" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; "><span style="font-size: small; "><strong style="font-weight: bold; ">The efficiency “bottleneck”</strong></span></p>
<p class="MsoNormal" style="margin-top: 0px; margin-right: 0px; 
margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; "><span style="font-size: small; ">Suppose a large mall has ten shops: A, B1, B2, ..., B9. Shop A stocks 990,000 items, and each of the nine shops B1 to B9 stocks 10,000. We want to count the total number of items in the mall. We start with a HashMap as the storage structure, with the shop as the key and its items as the value. The computation first counts the items in each shop and then sums the per-shop results. Because A has 990,000 items, counting them one at a time (assume each 1+1 addition takes 1 second) means adding 990,000 ones to get A's total, which takes 990,000 seconds, while B1 to B9 each need only 10,000 additions, or 10,000 seconds apiece. To get the mall-wide total we must wait until every shop has finished its own count before we can sum the results, so the bottleneck clearly sits in counting shop A's items.</span></p>
<p class="MsoNormal" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; "><span style="font-size: small; ">This situation arises often in distributed computation, for example in a Hadoop job. The map/reduce process handles data as key-value pairs, so if one key holds too much data, the move/shuffle/sort time for that key far exceeds that of the other keys, and that key becomes the efficiency “bottleneck”. The usual fix is a custom partitioner that regroups all the values so that each group holds roughly the same amount, which removes the time bottleneck.</span></p>
<p class="MsoNormal" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; "><span style="font-size: small; "><br /></span></p>
<p class="MsoNormal" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; "><span style="font-size: small; "><strong style="font-weight: bold; ">“Overgeneralized” analysis results</strong></span></p>
<p class="MsoNormal" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; "><span 
style="font-size: small; ">Take the same mall example, and further assume that shops A and B9 sell low-end goods while B1, B2, ..., B8 sell high-end goods with lower sales volume. Suppose we want to judge how popular each shop is with buyers from its sales figures. Shop A carries a large stock and prices for small margins and high turnover, so if we look only at sales volume we will conclude that A is the most popular shop in the mall, which is a “one-sided” result.</span></p>
<p class="MsoNormal" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; "><span style="font-size: small; ">In fact, in this situation we should first analyze the nature of the sellers and of the buyers, and evaluate with relative quantities. For example, shop A sells low-end goods and sells 10,000 items a day, and 10,000/990,000 ≈ 1%; shop B9 also sells low-end goods and sells 5,000 items a day, and 5,000/10,000 = 50%. So among low-end buyers, the low-end shop B9 should be the most popular.</span></p></span><img src ="http://www.aygfsteel.com/ricdong/aggbug/366991.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/ricdong/" target="_blank">Ric Dong</a> 2011-12-22 10:17 <a href="http://www.aygfsteel.com/ricdong/articles/366991.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item>
<item><title>MapReduce: Some Thoughts on an XML Parsing Algorithm</title><link>http://www.aygfsteel.com/ricdong/articles/366960.html</link><dc:creator>Ric Dong</dc:creator><author>Ric Dong</author><pubDate>Wed, 21 Dec 2011 13:15:00 GMT</pubDate><guid>http://www.aygfsteel.com/ricdong/articles/366960.html</guid><wfw:comment>http://www.aygfsteel.com/ricdong/comments/366960.html</wfw:comment><comments>http://www.aygfsteel.com/ricdong/articles/366960.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.aygfsteel.com/ricdong/comments/commentRss/366960.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/ricdong/services/trackbacks/366960.html</trackback:ping>
<description><![CDATA[<div><div>I did not expect XML parsing to be this awkward in Hadoop: the new mapreduce API has dropped the XML input format and reader altogether. The streaming module of the old release (hadoop-0.19.*) provided such an API. Since I am on hadoop-0.20.2 (the 3U1 build), I needed to port the few classes that handle XML.</div><div> </div><div>Porting brings problems with dependency packages everywhere and assorted API incompatibilities. No matter: I can read the source and write my own. Reading the reader's code closely, I found that it uses mark and reset on a BufferedInputStream to locate the XML tags. The tags are the ones we configure when submitting the job, such as &lt;log&gt; and &lt;/log&gt;. mark and reset on a Java stream let the read position go back: on finding &lt;log&gt;, mark the position, then look for the &lt;/log&gt; tag, and finally call reset to return to the marked position and read out the data inside &lt;log&gt;&lt;/log&gt;. During matching, though, mapred calls the read() method of BufferedInputStream, which returns the next available byte. The whole process is therefore: read a byte, compare a byte. I have not used this algorithm inside mapreduce, but I have tested it: reading from the buffered stream (BufferedInputStream) one byte at a time performs very poorly. read() took about ?31 ns per call on average, and processing one 170 MB XML document (with a lot of tags) took ?00+ seconds. (The streaming module also has a faster* method; sigh, it is painfully slow.)</div><div> </div><div>My colleague Zhou Min pointed me to the reader that Pig uses for XML, but I have not read the Pig code closely yet, and I do not know whether Hadoop's JIRA has a new feature that addresses the current XML problem. If it does, please let me know.</div><div> </div><div>I now have an idea. It still centers on byte comparison, because string matching is even less efficient. The algorithm is modeled on String.indexOf(""): find &lt;log&gt;, remember its position, then find &lt;/log&gt;, a full match each time, and copy the content in between into a new byte array with System.arraycopy. So far I have implemented half of it: after finding &lt;log&gt; and &lt;/log&gt;, strip both tags. On the 170 MB document this took ?.2 seconds (fastest ?.3 seconds).</div><div> 
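The indexOf-style idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the post or from Hadoop: the class name LogTagExtractor and the methods indexOf and extract are hypothetical, and the whole input is assumed to already sit in one byte array, so chunked reading is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: byte-level tag search in the spirit of String.indexOf.
class LogTagExtractor {

    // Index of the first occurrence of pattern in data at or after from, or -1.
    static int indexOf(byte[] data, byte[] pattern, int from) {
        int last = data.length - pattern.length;
        outer:
        for (int i = Math.max(from, 0); i <= last; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer; // mismatch: try the next start position
                }
            }
            return i;
        }
        return -1;
    }

    // Copies out the bytes between each open/close pair, with the tags stripped.
    static List<byte[]> extract(byte[] data, byte[] open, byte[] close) {
        List<byte[]> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = indexOf(data, open, pos);
            if (start < 0) {
                break;
            }
            int bodyStart = start + open.length;
            int end = indexOf(data, close, bodyStart);
            if (end < 0) {
                break; // unterminated record; a real reader would fetch more input here
            }
            byte[] body = new byte[end - bodyStart];
            System.arraycopy(data, bodyStart, body, 0, body.length);
            records.add(body);
            pos = end + close.length;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "<log>first</log>noise<log>second</log>".getBytes(StandardCharsets.UTF_8);
        byte[] open = "<log>".getBytes(StandardCharsets.UTF_8);
        byte[] close = "</log>".getBytes(StandardCharsets.UTF_8);
        for (byte[] record : extract(data, open, close)) {
            System.out.println(new String(record, StandardCharsets.UTF_8));
        }
    }
}
```

Running main prints first and then second, one per line. The search compares raw bytes and never builds a String, which is the point of the approach; a production reader would also have to handle a record that is cut off at the end of the buffer.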
</div><div>Algorithm and open problems:</div><div>Start with a BufferedInputStream (default size ?k) and create a byte array of 4k in the program, so that each read pulls 4k from the BufferedInputStream; this is quite efficient. Then search the array for the bytes of the &lt;log&gt; tag, and this step is strikingly fast. There is one small problem: when reading 4k at a time, a &lt;log&gt;&lt;/log&gt; pair may well straddle two reads. My idea is a semi-circular byte array: if &lt;log&gt; is found at the end of the 4k array, discard the unmatched bytes before it, move the &lt;log&gt; tag to the front of the array, and then read another 4k-5 bytes from the BufferedInputStream into the same array (5 is the byte length of &lt;log&gt;). As for the 4k size, first sample the XML data to determine the length of the content between &lt;log&gt; and &lt;/log&gt;, and then settle the size of this buffer.</div></div><img src ="http://www.aygfsteel.com/ricdong/aggbug/366960.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/ricdong/" target="_blank">Ric Dong</a> 2011-12-21 21:15 <a href="http://www.aygfsteel.com/ricdong/articles/366960.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>