Hadoop Weekly, Issue 176
Compiled and translated by the Venustech Platform and Big Data General Group
June 29, 2016
Hadoop Summit is being held in San Jose this week, so we look forward to seeing new project releases and great talks in the next issue (please send us any related slides). As for this issue, there are plenty of articles on Kafka Streams, streaming data from Amazon Kinesis to Google BigQuery, and Google's dataset search system.
Technical News
Shine describes how they use Amazon Lambda and Amazon Kinesis, together with a Kinesis agent for Apache web servers (used to collect logs), to move data from EC2 to Google BigQuery. The post includes snippets of the Lambda function (written in JavaScript), information on scale and cost, and a description of how gzip-compressing the data reduces transfer costs.
https://blog.shinetech.com/2016/06/21/kinesis-lambda-bigquery/
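The gzip step mentioned in the Shine item above can be sketched in a few lines. This is only a minimal illustration in Python, not the JavaScript Lambda code from the post; the record layout and the `compress_batch` helper are hypothetical.

```python
import gzip
import json


def compress_batch(records):
    """Serialize records as newline-delimited JSON and gzip the result."""
    payload = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    return payload, gzip.compress(payload)


# Hypothetical web-server log records; repetitive text compresses well.
records = [
    {"path": "/index.html", "status": 200, "agent": "Mozilla/5.0"}
    for _ in range(1000)
]
raw, packed = compress_batch(records)
print(len(raw), len(packed))  # the compressed size is much smaller
```

Compressing a whole batch rather than individual records gives the compressor more repeated text to exploit, which is why the sketch joins the records first.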
The Cloudera blog describes how to analyze fantasy sports data with Apache Spark, Apache Impala (incubating), and Hue. The post focuses mainly on the analysis, with some Spark code and a demonstration of Hue's features.
http://blog.cloudera.com/blog/2016/06/how-to-analyze-fantasy-sports-with-apache-spark-and-sql-part-2-data-exploration/
KDnuggets covers 13 key APIs, projects, and terms related to Apache Spark, including RDD, DataFrame, Dataset, Structured Streaming, GraphX, and Tungsten. Each entry gets a short section, which makes for a good overview of Spark's main features.
http://www.kdnuggets.com/2016/06/spark-key-terms-explained.html
This post from the Confluent blog walks through a Kafka Streams application that looks simple but is not. The example joins a stream of user click events with user location data. The location data is stored in a KTable, which provides an abstraction similar to a database table with a primary key (the API exposes the latest value for each key). The resulting program is nonetheless short: only a few lines of code.
http://www.confluent.io/blog/distributed-real-time-joins-and-aggregations-on-user-activity-events-using-kafka-streams
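The idea behind the stream-table join in the Confluent item above can be illustrated without Kafka at all. The following is a plain-Python simulation under stated assumptions, not the Kafka Streams API: the event tuples, the `clicks_per_region` function, and the "UNKNOWN" fallback are all hypothetical.

```python
from collections import Counter


def clicks_per_region(events):
    """Join click events against the latest known region of each user.

    `events` is an ordered list of ("region", user, region_name) updates and
    ("click", user, click_count) events, standing in for two Kafka topics.
    """
    user_region = {}    # table view: latest region per user (the KTable role)
    totals = Counter()  # running aggregate: clicks per region
    for kind, user, value in events:
        if kind == "region":
            user_region[user] = value  # a newer value replaces the older one
        else:
            region = user_region.get(user, "UNKNOWN")
            totals[region] += value
    return dict(totals)


events = [
    ("region", "alice", "asia"),
    ("click", "alice", 13),
    ("region", "alice", "europe"),  # alice moves; later clicks count there
    ("click", "alice", 4),
    ("click", "bob", 7),            # no region known yet for bob
]
print(clicks_per_region(events))
```

Each click is attributed to whatever region the table holds for that user at the moment the click arrives, which is the behaviour a table keyed by user with latest-value semantics provides.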
The Cloudera blog describes the HTTP request anomaly detection system that meinstadt.de built on Apache Flume, Apache Spark Streaming, and Apache Impala (incubating). The implementation is available on GitHub.
http://blog.cloudera.com/blog/2016/06/how-to-detect-and-report-web-traffic-anomalies-in-near-real-time/
The AWS Big Data blog has a tutorial on using Apache Spark and Apache Zeppelin on an Amazon EMR cluster to process data from Amazon Kinesis streams. It includes several example visualizations produced by running SQL in a Zeppelin notebook.
http://blogs.aws.amazon.com/bigdata/post/Tx3K805CZ8WFBRP/Analyze-Realtime-Data-from-Amazon-Kinesis-Streams-Using-Zeppelin-and-Spark-Strea
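Apache Kudu (incubating) is approaching its 1.0 release, which will fully support high availability. The post explains how the last piece of the puzzle, multi-master support, is being implemented. It lists the JIRA issues being tracked as well as the completed and remaining testing.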
http://kudu.apache.org/2016/06/24/multi-master-1-0-0.html
Across all of its data platforms, Google has more than 26 billion datasets, with 1.6 billion dataset paths added and deleted each day. To track, query, and compare datasets, they built Google Dataset Search (GOODS). GOODS tracks metadata that is exposed through an API and used for search, monitoring, and more.
http://dl.acm.org/citation.cfm?id=2903730
Other News
SiliconAngle interviewed Hortonworks CEO Rob Bearden. Topics include industry trends, Hortonworks' finances, Hortonworks' non-Hadoop technologies, and the Internet of Things.
http://siliconangle.com/blog/2016/06/24/hadoop-and-beyond-a-conversation-with-hortonworks-ceo-rob-bearden/
Product Releases
Apache Sentry released version 1.7.0 this week, with bug fixes, new features, and other improvements. The release upgrades the Hive authorization framework to its second version.
http://mail-archives.us.apache.org/mod_mbox/www-announce/201606.mbox/%3CCAPOmu3sDqdzu9ntDSvkMaDRQnVfHrkGV5qhyh-ZRiMmwgMMvBA@mail.gmail.com%3E
DataStax Enterprise 5.0, built on Apache Cassandra 3.0, adds support for graph data, tiered storage, and multi-instance Cassandra. The release also adds security features such as encryption and role-based access control.
https://www.datastax.com/2016/06/introducing-datastax-enterprise-5-0
Driven, an application performance monitoring system for big data, released version 2.2. The highlight of the release is support for monitoring Apache Spark.
BlueData announced its EPIC enterprise big-data-as-a-service offering for Amazon Web Services. The product can automatically deploy Docker-based Hadoop clusters with a few clicks.
http://www.bluedata.com/blog/2016/06/big-data-as-a-service-on-prem-or-cloud-bdaas/
Apache Accumulo released version 1.7.2. The release fixes write-ahead log handling, optimizes RFiles, and includes minor performance improvements.
https://accumulo.apache.org/release_notes/1.7.2.html
Apache Curator, the high-level SDK for Apache ZooKeeper, released versions 2.11.0 and 3.2.0.
https://cwiki.apache.org/confluence/display/CURATOR/Releases#Releases-June23,2016,Releases2.11.0and3.2.0available
Apache Hive released version 2.1.0. It contains a large number of bug fixes and enhancements, including improvements to Hive LLAP (Live Long and Process) and to JDBC support.
Events
China
July 2: third Shanghai BigData Streaming meetup
Hadoop Weekly, Issue 175
Compiled and translated by the Venustech Platform and Big Data General Group
June 19, 2016
Hadoop Summit is about a week away, and several products and projects have already timed releases around it. In technical news, there is coverage of Hadoop Kerberos authentication as well as an article on Salsify's use of Avro. In releases, several projects have new versions, including a columnar database newly open-sourced by Yandex.
Technical News
The OpenCore blog demonstrates several tools for debugging Hadoop Kerberos authentication. In particular, it shows how to use the "main()" method of UserGroupInformation to print useful debugging information.
http://www.opencore.com/blog/2016/5/user-name-handling-in-hadoop/
In the fourth part of its YARN series, the Cloudera blog explains how to configure Fair Scheduler queues, covering resource constraint settings, queue placement policies, and preemption in detail.
Salsify built an asynchronous microservice architecture on Apache Kafka and uses Apache Avro for data serialization. The applications are written in Ruby, and the team created several new tools to make Avro work well with Ruby. The post describes these tools and what they offer: avro-builder for defining records, a Postgres-backed schema registry, and avromatic for generating models from Avro schemas.
http://blog.salsify.com/engineering/adventures-in-avro
Apache Drill can infer schemas dynamically and also supports data with multiple (but mutually compatible) schemas. This combination enables some interesting use cases, such as querying across multiple JSON files with different schemas. The MapR blog explores and demonstrates these features.
https://www.mapr.com/blog/sql-query-mixed-schema-data-using-apache-drill
This tutorial shows how to combine Druid with Apache Kafka to build a streaming analytics and visualization application (using Pivot, a web UI for Druid).
http://www.confluent.io/blog/building-a-streaming-analytics-stack-with-apache-kafka-and-druid
The Apache Beam (incubating) blog describes the project's progress on running batch pipelines on Apache Flink. Beam is an open-source SDK, originally from Google, that exposes a backend-agnostic API for data pipelines.
http://beam.incubator.apache.org/blog/2016/06/13/flink-batch-runner-milestone.html
Cask Hydrator is a tool for building data pipelines by drag and drop in a UI. This tutorial shows how to use Hydrator to import data from MySQL into HDFS.
http://blog.cask.co/2016/06/bringing-relational-data-into-data-lakes/
Databricks describes the new SQL subquery support in the upcoming Apache Spark 2.0. Interestingly, the post is presented as a notebook, which shows the code and sample data directly.
https://databricks.com/blog/2016/06/17/sql-subqueries-in-apache-spark-2-0.html
The Apache Kudu (incubating) blog has a post on using Raft on a single-node cluster, as a step toward dynamically growing to a multi-master cluster.
http://getkudu.io/2016/06/17/raft-consensus-single-node.html
Other News
This article argues that, without care, the Apache Spark community could repeat the fragmentation that has troubled the Apache Hadoop ecosystem. For example, the latest versions of CDH and HDP support different versions of Spark.
https://techcrunch.com/2016/06/12/spark-fragmentation-undermines-community/
The New Stack has an article about Concord, a new stream processing framework built on Apache Mesos (currently in public beta). Concord is written in C++ and supports dynamic topologies (adding and removing pipeline stages without downtime).
http://thenewstack.io/concord-leverages-mesos-high-performance-stream-processing/
Alongside the general availability of Databricks Community Edition, Databricks published the first in a series of tutorials on writing Apache Spark applications on Databricks.
https://databricks.com/blog/2016/06/15/an-introduction-to-writing-apache-spark-applications-on-databricks.html
Hadoop Summit San Jose takes place in the coming weeks and will host a "Women in Big Data" luncheon. The Hortonworks blog interviewed the luncheon's host, Hortonworks CMO Ingrid Burton.
http://hortonworks.com/blog/summer-hortonworks-part-2-wibd-assertive-innovative-take-risks/
Product Releases
Apache SystemML (incubating) recently released version 0.10.0. SystemML is a machine learning framework that runs on several backends, including Apache Spark and Apache Hadoop. The release includes a new Spark Matrix Block type, deep learning support, performance improvements, a new KNN algorithm, and more.
http://systemml.apache.org/0.10.0-incubating/release_notes.html
Apache Mahout, another machine learning framework, released version 0.12.2. The release is a step toward integrating with Apache Zeppelin for visualization and notebook support.
http://mail-archives.us.apache.org/mod_mbox/www-announce/201606.mbox/%3CCAOtpBjgBAuQs5FiX5X_5A+Rd-A1fVz0R7SKttGe4cJuCLRiGww@mail.gmail.com%3E
Qubole announced that its HBase-as-a-Service is now available on AWS. It offers a number of nice features for long-running clusters: support for Hannibal and other monitoring tools, integration with Apache Zeppelin, and configuration of OpenTSDB and Apache Phoenix through node bootstrap scripts.
https://www.qubole.com/blog/product/quboles-hbase-as-a-service-is-generally-available-on-aws/
Altiscale announced the Real-Time Edition of the Altiscale Insight Cloud. The system is powered by Apache HBase and Spark Streaming.
https://www.altiscale.com/blog/announcing-the-altiscale-insight-cloud-real-time-edition/
`hs2client` is a new C++ library for Apache Hive and Apache Impala (incubating). In addition to C++, the library has Python bindings and can read data into a pandas DataFrame.
MapR now supports the Apache Spark 2.0 developer preview in its distribution.
https://www.mapr.com/blog/spark-20-now-developer-preview-mode-mapr-platform
Apache Beam released version 0.1.0-incubating, the project's first release since joining the Apache Incubator.
http://beam.incubator.apache.org/beam/release/2016/06/15/first-release.html
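Yandex open-sourced ClickHouse, a columnar analytics database. The system is designed to scale both horizontally and vertically, and it supports complex data types (such as arrays) and approximate queries. The team also published benchmark results comparing it with other databases.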
https://clickhouse.yandex/
Events
China
Hadoop Weekly, Issue 174
Compiled and translated by the Venustech Platform and Big Data General Group
June 12, 2016
Spark Summit was held in San Francisco this week, so as expected this issue has plenty of news, announcements, and releases related to Apache Spark. Beyond Spark, there are articles on Kafka, Cask, and Ambari. In releases, there is the first new version of Apache Pig in a year, a neat new tool for distributed systems design called Runway, and a new version of Apache Kudu (incubating).
Technical News
Debezium is a relatively new project for row-level change data capture from databases to Apache Kafka topics. It currently works with MySQL, ZooKeeper, and Kafka, and this tutorial shows how to set up ZooKeeper, Kafka, and MySQL in Docker containers on Kubernetes.
http://debezium.io/blog/2016/05/31/Debezium-on-Kubernetes/
Some people were surprised when the Apache Kafka project announced yet another stream processing engine, Kafka Streams. Kafka Streams differs from other systems in several key ways. This post does a good job of describing those differences: abstractions, deployment model, and support for stateful computation.
https://softwaremill.com/kafka-streams-how-does-it-fit-stream-landscape/
Anyone who has used MapReduce, Spark, or a similar system has run into hard-to-debug, data-specific bugs. BigDebug is a research project and paper from UCLA (University of California, Los Angeles) that aims to give developers the kinds of tools they use to find problems on a single machine: identifying the input that caused a crash, tracing, breakpoints, watchpoints, latency alerts, and more. The tool is built on Apache Spark 1.2.1.
https://blog.acolyer.org/2016/06/07/bigdebug-debugging-primitives-for-interactive-big-data-processing-in-spark/
Cask describes running Spark on the open-source Cask Data Application Platform (CDAP). Spark programs running on CDAP get fine-grained transaction support through Apache Tephra (incubating). This makes it straightforward, for example, to copy data consistently from one table to another using snapshot isolation. Spark on CDAP can also use Cask Tracker, which provides data lineage information (when data was created, used, and so on). Depending on the application, other CDAP tools can add further value.
http://blog.cask.co/2016/06/cdap-spark-prototype-to-production/
The IBM Hadoop Dev blog has a tutorial on calling the Ambari REST API with cURL. It shows how to establish a session on both vanilla and Kerberos-enabled clusters and reuse that session for subsequent requests.
https://developer.ibm.com/hadoop/2016/06/07/ambari-rest-calls-for-kerberos-enabled-clusters/
The Google Cloud Platform blog describes how to debug Apache Beam (incubating) jobs running on Google Dataflow. To help track down performance bottlenecks, Dataflow provides useful statistics and a UI for drilling into each step.
https://cloud.google.com/blog/big-data/2016/06/understanding-timing-in-cloud-dataflow-pipelines
Other News
The Transaction Processing Performance Council (TPC) released TPCx-BB, a benchmark designed for big data systems. In addition to SQL, it exercises machine learning clustering and classification problems.
http://www.datanami.com/2016/06/01/big-data-benchmark-gauges-hadoop-platforms/
Strata + Hadoop World London took place two weeks ago. Speakers' presentations and slides have been posted on the conference website.
http://conferences.oreilly.com/strata/hadoop-big-data-eu/public/schedule/proceedings
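Splice Machine, maker of an RDBMS on Hadoop, announced that it is open-sourcing its software. The company is currently looking for contributors, mentors, and champions to help make the open-source effort a success. Splice Machine has a number of interesting features, such as ACID transactions, secondary indexes, and referential integrity.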
http://www.splicemachine.com/were_going_open_source/
The Altiscale blog compiled a number of articles on big data use cases in customer service, sentiment analysis, climate change, smart cities, bias, and more. It also collects a few articles from big data skeptics.
https://www.altiscale.com/blog/big-data-news-health-and-public-safety-sentiment-analysis-fixing-education-2/
Spark Summit was held in San Francisco this week. Databricks, the conference organizer, summarized the highlights of the two days and linked to many of the talks and presentations.
https://databricks.com/blog/2016/06/08/another-record-setting-spark-summit.html
Qubole, a big-data-as-a-service (BDaaS) company, wrote about Spark adoption among its customers. Adoption has been rapid: more than half of its customers now use Spark. Qubole also supports Presto, where it has seen similar growth.
https://www.qubole.com/blog/big-data/spark-usage/
Twitter has proposed DistributedLog, its replicated log service, to the Apache Incubator.
https://wiki.apache.org/incubator/DistributedLogProposal
Big Data Day LA, June 9 at West Los Angeles College: the event is free (with advance registration), and speakers come from Confluent, Databricks, Yahoo, Netflix, and others.
http://www.bigdatadayla.com/
Product Releases
Apache Spark released a preview of Spark 2.0. The announcement notes that neither the APIs nor the functionality are final yet.
https://spark.apache.org/news/spark-2.0.0-preview.html
JustOne built and open-sourced a Kafka-to-PostgreSQL connector. The post covers the connector's performance, explains in detail how messages are converted to rows, and describes how to configure it.
http://www.confluent.io/blog/kafka-connect-sink-for-postgresql-from-justone-database
Salesforce open-sourced Runway, a tool for modeling, simulating, and visualizing distributed systems. There is an online demo at runway.system, featuring a "too many bananas" model, an elevator system, and the Raft consensus system.
https://medium.com/salesforce-open-source/runway-intro-dc0d9578e248
Bloomberg recently open-sourced Presto Accumulo, a Presto connector for Apache Accumulo. The announcement links to an 11-page paper with benchmark results comparing Presto-based queries against queries written with the Accumulo Java API.
http://www.bloomberg.com/company/announcements/open-source-at-bloomberg-reducing-application-development-time-via-presto-accumulo/
Microsoft Azure announced the generally available release of Azure HDInsight based on Apache Spark 1.6.1. The release adds support for Project Livy, a REST job server for Spark, and includes integration with Azure Data Lake Store (with role-based access control), IntelliJ integration, Jupyter notebook support, and more.
https://azure.microsoft.com/en-us/blog/apache-spark-for-azure-hdinsight-now-generally-available/
LinkedIn open-sourced Photon ML, its library for large-scale regression. Photon is built on Spark and runs on YARN at LinkedIn (it was previously based on MapReduce and appears to have been migrated to improve performance).
https://engineering.linkedin.com/blog/2016/06/open-sourcing-photon-ml
Hortonworks released a technical preview of its Spark-HBase connector. It has native Avro support, works on secure clusters, natively supports the Spark Datasource API, and includes optimizations for partition pruning, column pruning, and predicate pushdown.
http://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/
Databricks announced the first phase of security features for its Apache Spark platform. This phase adds support for cluster ACLs, SAML 2.0, and end-to-end audit logs.
https://databricks.com/blog/2016/06/08/achieving-end-to-end-security-for-apache-spark-with-databricks.html
Apache ORC 1.1.0 has been released. The release completes the migration of the Java code out of Apache Hive, fixes timestamp handling in the C++ code, and adds a Hadoop MapReduce connector.
http://orc.apache.org/news/2016/06/10/ORC-1.1.0/
Apache Kudu released version 0.9.0. It adds an UPSERT command, a new Spark data source that does not depend on the MapReduce API, and improved Tablet Server write performance.
http://getkudu.io/2016/06/10/apache-kudu-0-9-0-released.html
The Google Cloud Platform team announced Google Cloud Dataproc support for the Spark 2.0 preview.
https://cloud.google.com/blog/big-data/2016/06/google-cloud-dataproc-the-fast-easy-and-safe-way-to-try-spark-20-preview
Dory (the successor to Bruce), a producer daemon for Kafka, now supports receiving data over UNIX domain sockets or local TCP.
http://mail-archives.apache.org/mod_mbox/kafka-users/201606.mbox/%3C1465683894.608424023@apps.rackspace.com%3E
Apache Pig 0.16.0 is the project's first release in a year. It solidifies support for Tez.
http://pig.apache.org/releases.html#8+June%2C+2016%3A+release+0.16.0+available
Events
China
Spark Meetup (Shanghai) – Saturday, June 18
Hadoop Weekly, Issue 173
Compiled and translated by the Venustech Platform and Big Data General Group
June 5, 2016
This week there are only a handful of items, covering Spark, NiFi, Netflix Meson, and Storm. Spark Summit is in San Francisco this week, so next week's issue will surely have plenty of content.
Technical News
The Databricks blog introduces a new feature in Apache Spark 2.0: saving and loading machine learning models across languages. Models are saved and loaded through a simple API, with model metadata and parameters stored as JSON and model data stored as Parquet.
https://databricks.com/blog/2016/05/31/apache-spark-2-0-preview-machine-learning-model-persistence.html
Meson is Netflix's framework for executing machine learning workflows. It acts as the glue between big data technologies such as Apache Hive, Spark, and Mesos. Workflows are written in a DSL, and Meson also provides a sophisticated UI for visualizing pipelines. Netflix has not open-sourced Meson yet but plans to.
http://techblog.netflix.com/2016/05/meson_31.html
The IBM Hadoop Dev blog gives a brief introduction to and demonstration of HDFS archival storage.
https://developer.ibm.com/hadoop/2016/06/01/use-hdfs-archival-storage/
Apache Storm 1.0 comes with an impressive set of new features. This post focuses on several debugging enhancements: dynamic log levels, unified log search, event sampling, and integrated jstack, heap dump, and Java Flight Recorder profiling of workers.
http://hortonworks.com/blog/whats-new-apache-storm-1-0-part-1-enhanced-debugging/
The Cloudera blog shows how to use Apache Spark for exploratory analysis of historical NBA statistics stored in CSV files. The analysis uses a mix of Scala and SQL.
http://blog.cloudera.com/blog/2016/06/how-to-analyze-fantasy-sports-using-apache-spark-and-sql/
Apache NiFi has received a lot of attention as a general-purpose tool. It is designed for "flow-based processing", which may not mean much to many people, but NiFi supports standard ETL, stream processing, and more. Many NiFi examples show how to move data from the Twitter firehose into HDFS; this post focuses on a different capability, demonstrating a few simple flows that pull data over HTTP.
http://hortonworks.com/blog/apache-nifi-not-scratch/
Amazon Redshift is built on the PostgreSQL engine, so you can use PostgreSQL extensions to connect a Redshift cluster with a PostgreSQL instance. This enables interesting applications such as cross-database joins, converting Redshift results to JSON, creating views of Redshift data in Postgres, and copying data between the databases.
http://blogs.aws.amazon.com/bigdata/post/Tx1GQ6WLEWVJ1OX/JOIN-Amazon-Redshift-AND-Amazon-RDS-PostgreSQL-WITH-dblink
Other News
FeatherCast has published more than 100 recordings from ApacheCon North America.
http://feathercast.apache.org/tag/apacheconna2016/
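InfoWorld covers Heron, the Apache Storm-compatible project that Twitter recently open-sourced. The article describes the architectural differences between the two projects. It notes that Heron diverged from Storm some months ago (Storm has shipped releases since), so Storm is ahead of Heron in some features.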
http://www.infoworld.com/article/3078134/analytics/had-it-with-apache-storm-heron-swoops-to-the-rescue.html
Databricks is launching a new course on edX, "Introduction to Apache Spark". The course starts on June 15 and runs for two weeks.
launch-first-of-five-free-big-data-courses-on-apache-spark.html
Product Releases
Amazon EMR released version 4.7.0. The release adds support for Apache Tez and Apache Phoenix and ships newer versions of Apache HBase, Apache Mahout, and Presto. In addition, the AWS Big Data blog has a guide to getting started with Phoenix.
http://aws.amazon.com/blogs/aws/amazon-emr-4-7-0-apache-tez-phoenix-updates-to-existing-apps/
Apache Hive released version 2.0.1 this week, the first point release since 2.0.0 in February. It fixes 60 bugs.
http://mail-archives.us.apache.org/mod_mbox/www-announce/201605.mbox/%3CD37344A3.77A64%25sershe@apache.org%3E
Events
China
None
Hadoop Weekly, Issue 172
Compiled and translated by the Venustech Platform and Big Data General Group
May 22, 2016
This week's main focus is stream processing: Twitter and Cloudera introduced new streaming frameworks, there is an article on streaming SQL in Apache Flink, DataTorrent describes fault tolerance in Apache Apex, there is coverage of Concord, another new streaming framework, and Apache Kafka 0.10 was released. In other news, there is activity in the Apache Incubator: Apache TinkerPop and Apache Zeppelin graduated to top-level projects, and Tephra entered the Incubator. Beyond that, there are new articles on Apache Spark, Apache HBase, Apache Drill, Apache Ambari, and more.
Technical News
The DataTorrent blog describes how Apache Apex provides fault tolerance when reading and writing files. Apex is built for processing streaming data, and streaming brings some subtle but important details to consider. For example, when writing output to HDFS, the HDFS lease mechanism can cause problems.
https://www.datatorrent.com/blog/fault-tolerant-file-processing/
The Databricks blog describes the performance improvements brought by the Tungsten code generation engine in Spark 2.0. The post explains with examples how the generated code runs faster by avoiding virtual function calls, making better use of CPU registers, and unrolling loops. Besides the Databricks post, The Morning Paper covers the VLDB paper that inspired these techniques.
https://blog.acolyer.org/2016/05/23/efficiently-compiling-efficient-query-plans-for-modern-hardware/
StreamScope, Microsoft's stream processing system, is the subject of another streaming post from The Morning Paper this week. It covers the system's characteristics: throughput and cluster size, programming model (SQL), time model, semantics and guarantees, and its applications in Microsoft products.
https://blog.acolyer.org/2016/05/24/streamscope-continuous-reliable-distributed-processing-of-big-data-streams/
The Apache blog has a post on the HubSpot team's experience tuning G1GC for Apache HBase. It reviews how HubSpot experimented to improve stability, 99th percentile performance, and time spent in garbage collection. The team used many techniques and gives a good explanation of the intricacies of the GC algorithm. The post closes with a step-by-step guide to tuning G1GC for HBase.
https://blogs.apache.org/hbase/entry/tuning_g1gc_for_your_hbase
LinkedIn writes about the difficulties of debugging Kafka offset management problems. The post focuses on the symptoms of two so-called "offset rewind" incidents, how such incidents can be detected through monitoring, and the root causes of (and fixes for) both.
https://engineering.linkedin.com/blog/2016/05/kafkaesque-days-at-linkedin--part-1
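One simple way to picture the monitoring side of the LinkedIn item above is a check that flags any committed offset that moves backwards. The sketch below is an assumption-laden illustration, not LinkedIn's tooling: the `find_rewinds` function and the sample readings are hypothetical, and a real monitor would read offsets from Kafka rather than from a list.

```python
def find_rewinds(samples):
    """Return (partition, previous, current) for every backwards move.

    `samples` is an ordered list of (partition, committed_offset) readings,
    as a monitoring job might collect them over time.
    """
    last_seen = {}
    rewinds = []
    for partition, offset in samples:
        previous = last_seen.get(partition)
        if previous is not None and offset < previous:
            rewinds.append((partition, previous, offset))
        last_seen[partition] = offset
    return rewinds


# Hypothetical readings: partition 0 jumps back from 150 to 20.
samples = [(0, 100), (1, 40), (0, 150), (1, 55), (0, 20), (1, 60)]
print(find_rewinds(samples))
```

A rewind shows up as a committed offset lower than the previous reading for the same partition, so the consumer would reprocess messages it had already handled.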
The Databricks blog published the third and final part of its series on genome variant analysis with Apache Spark. The post goes from preparation (converting files to Parquet and loading them into a Spark RDD), to loading genotype data, to running k-means clustering to predict geographic population from genotype features.
https://databricks.com/blog/2016/05/24/predicting-geographic-population-using-genome-variants-and-k-means.html
Many batch-oriented big data systems have moved from custom APIs back to SQL, so it will be interesting to see whether stream processing frameworks follow the same path. In this post, the Apache Flink team describes its plans to support streaming SQL. Flink already has a Table API, and the team is using Apache Calcite to add SQL support. For windowing, they plan to use Calcite's streaming SQL extensions. Initial SQL support will arrive in version 1.1.0, with enhancements in 1.2.0.
http://flink.apache.org/news/2016/05/24/stream-sql.html
This post introduces an XML plugin for Apache Drill. Although it is not yet bundled with Drill, it is fairly easy to build the jar and configure XML support.
https://www.mapr.com/blog/how-use-xml-plugin-apache-drill
The Hortonworks blog gives a brief overview of the architecture of the Ambari Metrics System, which recently added Grafana as its dashboard front end. The system uses Apache Phoenix and Apache HBase for storage, so it scales horizontally.
http://hortonworks.com/blog/hood-ambari-metrics-grafana/
This tutorial shows how to use Spark SQL on Amazon EMR, together with Hue and Apache Zeppelin, to run SQL queries over tab-delimited data stored in S3. It closes by showing how to write data from Spark to DynamoDB.
http://blogs.aws.amazon.com/bigdata/post/Tx2D93GZRHU3TES/Using-Spark-SQL-for-ETL
The Heroku team shares its experience with the latest version of Apache Kafka: the newly introduced timestamp field (8 bytes) leads to some counterintuitive changes in performance.
https://engineering.heroku.com/blogs/2016-05-27-apache-kafka-010-evaluating-performance-in-distributed-systems/
Other News
The O'Reilly data podcast interviewed Michael Armbrust of Databricks about Structured Streaming in Spark 2.0. An accompanying article on the site quotes selected topics from the conversation: Spark SQL, the goals of Structured Streaming, guarantees for end-to-end pipelines, and applying Spark machine learning algorithms to online processing.
https://www.oreilly.com/ideas/structured-streaming-comes-to-apache-spark-2-0
Two big data projects graduated from the Apache Incubator this week: Apache TinkerPop and Apache Zeppelin. TinkerPop is a graph computing framework, and Zeppelin is a web-based notebook for data analytics.
https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces91
https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces92
Tephra, a transaction engine for HBase, has entered the Apache Incubator. Tephra was originally created by the team at Cask and is currently integrated with Apache Phoenix.
http://blog.cask.co/2016/05/tephra-a-transaction-engine-for-hbase-moves-to-apache-incubation/
TechRepublic covers Concord.io, a stream processing framework written in C++ that aims to fill a gap in the market for high-performance stream processing.
http://www.techrepublic.com/article/could-concord-topple-apache-spark-from-its-big-data-throne/
Product Releases
Apache Avro released version 1.8.1 this week, with more than 20 bug fixes and other improvements.
http://mail-archives.us.apache.org/mod_mbox/www-announce/201605.mbox/%3CCAO4re1nYMm79WQ2LUeODWjHmJ9EiYOF=mty6p2aiq-S_4R95iQ@mail.gmail.com%3E
Confluent released a Python client for Kafka built on librdkafka.
https://pypi.python.org/pypi/confluent-kafka/0.9.1.1
Apache Kafka 0.10 has been released, along with Kafka's new approach to stream processing. The new version adds rack awareness and timestamps in messages, along with improvements to SASL, Kafka Connect, and more.
http://mail-archives.us.apache.org/mod_mbox/www-announce/201605.mbox/%3CCAPuboUuRyCRxDp5CLjv2yVM77SpYFF+HdnBeiiyeumYTJNpY4g@mail.gmail.com%3E
Confluent released Confluent Platform 3.0, based on Apache Kafka 0.10. Beyond the core Kafka features, Confluent Platform includes a commercial component that provides a configuration tool for Kafka Connect and end-to-end stream monitoring.
http://www.confluent.io/blog/announcing-apache-kafka-0.10-and-confluent-platform-3.0
Apache Kylin, an OLAP engine for big data, released version 1.5.2. Although it is a patch-level release, 1.5.2 contains a number of new features, improvements, and bug fixes, including support for CDH 5.7 and MapR.
http://mail-archives.us.apache.org/mod_mbox/www-announce/201605.mbox/%3CCA+LQBaTDxb4wVYVvtOC22gMbJ0p9cvhAWzEY_x2n1oNGvEDPSQ@mail.gmail.com%3E
Twitter open-sourced Heron, its stream processing system. Heron is Twitter's replacement for Apache Storm, with a focus on performance, debuggability, and developer productivity.
https://blog.twitter.com/2016/open-sourcing-twitter-heron
Envelope is a new project from Cloudera Labs that provides configuration-driven streaming ETL. It is built on Spark Streaming, and connectors for Kafka and Kudu are currently under development.
http://blog.cloudera.com/blog/2016/05/new-in-cloudera-labs-envelope-for-apache-spark-streaming/
Events
China
Spark Meetup 4 (Hangzhou) – Sunday, June 5
http://www.meetup.com/Hangzhou-Apache-Spark-Meetup/events/231071384/
Hadoop Weekly, Issue 171
Compiled and translated by the Venustech Platform and Big Data General Group
May 22, 2016
This week several projects have releases, including a newly open-sourced project from LinkedIn. In technical and other news, several articles look back at the Apache: Big Data North America conference, and there is a series of posts analyzing New York City taxi data across several different data systems.
Technical News
The Databricks blog looks at two approximate algorithms in Apache Spark. The first, "approxCountDistinct", estimates the number of distinct values; the second, "approxQuantile", produces approximate percentiles. The post describes the algorithms and visualizes the residual error at different precision settings.
https://databricks.com/blog/2016/05/19/approximate-algorithms-in-apache-spark-hyperloglog-and-quantiles.html
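To give a feel for how an approximate distinct count works, here is a minimal sketch of the plain HyperLogLog estimator in Python. It is not Spark's implementation; the `hll_estimate` function, the choice of SHA-1 as the hash, and the default of 4096 registers are assumptions made for illustration.

```python
import hashlib
import math


def hll_estimate(items, p=12):
    """Estimate the number of distinct items with 2**p HyperLogLog registers."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")   # 64-bit hash value
        idx = h >> (64 - p)                     # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)        # remaining 64 - p bits
        # Rank: position of the leftmost 1-bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)            # bias constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)          # small-range correction
    return raw


estimate = hll_estimate(range(100000))
print(round(estimate))  # close to 100000, typically within a few percent
```

With 4096 registers the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%, while the state kept is only one small integer per register regardless of how many items are seen.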
This tutorial describes how to use Apache Hadoop HDFS, Apache Solr, and Hue to store, index, and search medical images in DICOM format. It walks through all the steps for loading and retrieving the data.
http://blog.cloudera.com/blog/2016/05/how-to-process-and-index-medical-images-with-apache-hadoop-and-apache-solr/
MapR Streams is a system whose API is compatible with Apache Kafka. This post gives a high-level comparison of MapR Streams and Kafka, and also explains how Kafka Streams relates to MapR Streams.
https://www.mapr.com/blog/apache-kafka-and-mapr-streams-terms-techniques-and-new-designs
In my view, this is one of the clearest introductions to Paxos, a consensus protocol for distributed systems. It illustrates the protocol with diagrams and a distributed auction example.
http://ifeanyi.co/posts/understanding-consensus/
Based on a talk at the Apache: Big Data North America conference, Datanami previews the new features in the upcoming Apache Hadoop 3. These include a rewrite of the shell scripts, task-level native optimization, automatic tuning of memory sizes, and support for HDFS erasure coding. The article concentrates on erasure coding and the storage efficiency it brings (reducing disk overhead from 3x to 1.5x).
http://www.datanami.com/2016/05/18/hadoop-3-poised-boost-storage-capacity-resilience-erasure-coding/
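The 3x and 1.5x figures in the Hadoop 3 item above follow from simple arithmetic, sketched below. The Reed-Solomon layout with 6 data units and 3 parity units is an assumption chosen because it matches the 1.5x figure; the `storage_overhead` helper is hypothetical.

```python
def storage_overhead(data_units, parity_units):
    """Raw bytes stored per byte of user data."""
    return (data_units + parity_units) / data_units


# Triple replication: one original block plus two extra full copies.
replication = storage_overhead(1, 2)
# Assumed Reed-Solomon layout: 6 data units protected by 3 parity units.
reed_solomon = storage_overhead(6, 3)
print(replication, reed_solomon)  # 3.0 1.5
```

Under this assumed layout, either scheme survives the loss of any two units, but erasure coding spreads the redundancy over a stripe instead of storing whole copies.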
This talk from PyData Berlin describes Apache Arrow and the Feather file format and explores how data interoperability across languages and frameworks works.
http://www.slideshare.net/wesm/python-data-ecosystem-thoughts-on-building-for-the-future
Videos of two Apache Kafka talks from different conferences have been posted. The first discusses Kafka's security features, and the second explores how Kafka can be used to share data across systems.
https://www.oreilly.com/learning/securing-apache-kafka
https://www.infoq.com/presentations/event-streams-kafka
This blog collects several posts on loading and querying the New York City taxi dataset with Amazon Redshift, Google BigQuery, Postgres, and Presto. Beyond raw benchmarks, the posts detail how to handle failures, apply optimizations, and compare alternatives (such as S3 on AWS versus HDFS).
http://tech.marksblogg.com/all-billion-nyc-taxi-rides-redshift.html
O'Reilly has an article on implementing the kappa architecture with Kafka, Flink, Elasticsearch, and Kibana. It gives an overview of the lambda and kappa architectures, introduces the main architectural components, and shows how to set up a Bayesian model for novelty detection.
http://www.oreilly.com/ideas/applying-the-kappa-architecture-in-the-telco-industry
Other News
This article lists several big data ecosystem projects mentioned at the recent Apache: Big Data North America conference. Quite a few of them had not been on our radar.
http://www.datanami.com/2016/05/11/open-source-tour-de-force-apache-big-data-2016/
The Pivotal blog has an interesting article on big data and agile development. Big data systems often remain stuck in a non-agile world: for example, requirements must be gathered and models defined before any data is loaded. The article argues that, outside of the cloud, constraints such as limited capacity and performance and siloed data force this way of working.
https://blog.pivotal.io/big-data-pivotal/features/when-it-comes-to-big-data-cloud-and-agility-go-hand-in-hand
Databricks posted the recording of its webinar "Apache Spark MLlib: From Quick Start to Scikit-Learn". Along with the video, they answer eight common questions raised during the session.
https://databricks.com/blog/2016/05/18/spark-mllib-from-quick-start-to-scikit-learn.html
The Hortonworks blog looks back at the history of Apache Storm: open-sourced in 2011, accepted into the Apache Incubator in 2013, promoted to a top-level project in 2014, and released as version 1.0 earlier this year. The post discusses the main technical advances at each milestone.
http://hortonworks.com/blog/brief-history-apache-storm/
HBaseCon took place in San Francisco this week, with presentations from Apple, Yahoo, and Facebook.
MapR published an infographic celebrating what Apache Drill has achieved over the past year: seven releases and a number of milestones.
https://www.mapr.com/blog/happy-anniversary-apache-drill-what-difference-year-makes
Datanami published a Q&A from the Apache: Big Data North America conference with ASF director Jim Jagielski and ODPi program director John Mertic. As expected, the main topic was the relationship between the ASF and ODPi.
http://www.datanami.com/2016/05/20/apache-foundation-keeps-eyes-wide-open-odpi/
Releases
LinkedIn has open-sourced Ambry, their distributed object store. The code is on GitHub, and this blog post describes Ambry's service guarantees, design goals, architecture, and interfaces.
https://engineering.linkedin.com/blog/2016/05/introducing-and-open-sourcing-ambry---linkedins-new-distributed-
Pivotal HDB, which is powered by Apache HAWQ (incubating), released version 2.0 this week. HDB provides an analytic database for Hadoop.
https://blog.pivotal.io/big-data-pivotal/products/fail-fast-and-ask-more-questions-of-your-data-with-hdb-2-0
Apache Mahout, a machine learning and data mining system, released version 0.12.1 this week. The release is aimed at advancing the integration of Flink with Mahout.
http://mail-archives.us.apache.org/mod_mbox/www-announce/201605.mbox/%3CCAOtpBjhshagyLN3Qnt0xRnc7YbnMVJjTS4piVXL7LiS2pQguXw@mail.gmail.com%3E
Apache Tajo, a data warehouse for Hadoop, released version 0.11.3. The release fixes five bugs.
http://tajo.apache.org/releases/0.11.3/announcement.html
MongoDB has released a new MongoDB Connector for Apache Spark. The connector offers more than a shim over the Hadoop InputFormat for Spark, and the post also explains some key MongoDB features.
http://rosslawley.co.uk/introducing-a-new=mongodb-spark-connector/
Syncsort released DMX-h v9, which adds support for Kafka and a new intelligent execution framework.
http://insidebigdata.com/2016/05/20/syncsorts-latest-innovations-simplify-integration-of-streaming-data-in-spark-kafka-and-hadoop-for-real-time-analytics/
Events
China
None
Hadoop Weekly, Issue 169
Compiled and translated by the 启明星辰 (Venustech) Platform and Big Data Group
May 8, 2016
This week's issue is short and to the point. Topics include Apache Beam, MapR's quarterly results, the recent Kafka Summit, and a newly open-sourced distributed unit testing framework from Cloudera.
Technical News
Elastic has published a root-cause analysis of an outage. A misconfigured ZooKeeper memory setting caused excessive GC, which ultimately led to the loss of the ZooKeeper cluster. The post describes mitigation strategies intended to prevent similar problems in the future.
https://www.elastic.co/blog/elastic-cloud-outage-april-2016
The Cask blog has a concise recap of the recent Big Data Applications Meetup. The first presentation was on Pachyderm, which provides "Git for data" semantics on top of Docker containers. The second was on the TubeMogul big data platform, which is built on Hadoop, Hive, Spark, and Presto.
http://blog.cask.co/2016/05/pachyderm-and-tubemogul-share-their-big-data-application-platforms-and-experience/
Google and data Artisans have both written about Apache Beam (formerly the Google Dataflow SDK). Google's post explains the motivation for open-sourcing and developing Beam. The data Artisans post describes their support for the Beam model and how they see the relationship between the Flink and Beam APIs.
https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
http://data-artisans.com/why-apache-beam/
The IBM Hadoop Dev blog has instructions for installing Python, Scala, and R kernels for Jupyter notebook. It also explains how to connect to Spark and how to expose the notebook over SSL.
https://developer.ibm.com/hadoop/blog/2016/05/04/install-jupyter-notebook-spark/
This post describes how the Mongo Hadoop connector ties Spark and MongoDB together.
https://x.ai/using-the-mongo-hadoop-connector-as-a-translation-layer-to-spark/
The Qubole blog compares three popular programming languages for big data analysis: Python, R, and Scala.
http://www.qubole.com/blog/big-data/programming-language/
Other News
MapR announced a record quarter, with 99% growth in software subscription license bookings and a 146% dollar-based net growth rate.
https://www.mapr.com/company/press-releases/mapr-achieves-another-record-quarter-99-software-subscription-license-growth
This article describes a recent benchmark of Google Cloud Dataflow and Apache Spark on Google Compute Engine. Dataflow outperformed Spark by 2 to 5.7 times (as always, it is better to evaluate your own workload in your own environment than to trust benchmarks blindly). The article also describes the competition between the tools as one that benefits everyone who uses big data tools.
http://www.datanami.com/2016/05/02/dataflow-tops-spark-benchmark-test/
The Confluent blog recaps the recent Kafka Summit, including the programming challenge held beforehand, the keynotes, the breakout sessions, and more.
http://www.confluent.io/blog/log-compaction-kafka-summit-edition-may-2016
Forbes covers American Express's adoption of big data technology over the past five years. In the article, American Express shares tips and lessons learned, such as the difficulty of adopting new technology (and how important buy-in from senior leadership is) and the challenges of hiring and retaining engineers.
http://www.forbes.com/sites/ciocentral/2016/04/27/inside-american-express-big-data-journey/
Releases
Cask released version 3.4 of the Cask Data Application Platform (CDAP). The new version adds Cask Tracker, a new system for data integration, audit, and search, along with an upgraded UI for Cask Hydrator, enhanced Spark support, and more.
http://blog.cask.co/2016/05/announcing-cdap-release-3-4-introducing-tracker-next-gen-hydrator-enhanced-spark-support-and-much-more/
Cloudera has open-sourced "dist_test", a new tool for running unit tests in parallel. With it, the unit test suites for the Hadoop and Kudu projects can complete in minutes rather than hours. The tool has bindings for C++ and Java, and a demo is available on the website.
http://blog.cloudera.com/blog/2016/05/quality-assurance-at-cloudera-distributed-unit-testing/
Google announced that Google BigQuery can now integrate with Drive, including saving query output to Google Sheets.
http://techcrunch.com/2016/05/06/google-connects-bigquery-to-google-drive-and-sheets/
Events
China
None
Hadoop Weekly, Issue 168
Compiled and translated by the 启明星辰 (Venustech) Platform and Big Data Group
May 1, 2016
Kafka Summit took place in San Francisco this week, so unsurprisingly this issue has plenty of Kafka content. There are also a number of articles on Impala performance, Kudu, and Druid. In other news, Apache Apex became an Apache top-level project, and Qubole open-sourced its StreamX project.
Technical News
This post takes a quick look at which Spark RDD operations may or may not produce a new partitioning. In particular, `mapValues` and `filter` preserve the partitioning, while `map` does not.
https://medium.com/@corentinanjuna/apache-spark-rdd-partitioning-preservation-2187a93bc33e
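The reason behind this rule can be sketched with a toy model in plain Python. This is not the Spark API: `ToyPairRDD`, `partitioned_by_key`, and the method names are hypothetical, chosen only to illustrate why a transformation that cannot touch keys may keep the partitioner, while one that can rewrite keys has to discard it.

```python
# Toy model (NOT the Spark API): an RDD-like object that tracks whether
# its hash partitioning by key is still known to be valid.
class ToyPairRDD:
    def __init__(self, partitions, partitioned_by_key):
        self.partitions = partitions                  # list of lists of (key, value)
        self.partitioned_by_key = partitioned_by_key  # True if the layout can be trusted

    @classmethod
    def from_pairs(cls, pairs, num_partitions):
        # Place each record in the partition chosen by hashing its key.
        parts = [[] for _ in range(num_partitions)]
        for key, value in pairs:
            parts[hash(key) % num_partitions].append((key, value))
        return cls(parts, True)

    def map_values(self, f):
        # Keys are untouched, so every record is still in the right partition.
        new_parts = [[(k, f(v)) for k, v in part] for part in self.partitions]
        return ToyPairRDD(new_parts, self.partitioned_by_key)

    def filter(self, predicate):
        # Dropping records never moves a key to a different partition.
        new_parts = [[kv for kv in part if predicate(kv)] for part in self.partitions]
        return ToyPairRDD(new_parts, self.partitioned_by_key)

    def map(self, f):
        # f may return a different key, so the layout can no longer be trusted.
        new_parts = [[f(kv) for kv in part] for part in self.partitions]
        return ToyPairRDD(new_parts, False)


rdd = ToyPairRDD.from_pairs([(1, "a"), (2, "b"), (3, "c")], num_partitions=2)
print(rdd.map_values(str.upper).partitioned_by_key)
print(rdd.filter(lambda kv: kv[0] > 1).partitioned_by_key)
print(rdd.map(lambda kv: kv).partitioned_by_key)
```

Note that in this model `map` drops the flag even when the function happens to leave keys alone, because the framework cannot inspect what the function does.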
This post describes how to use Conda to build a self-contained Python environment (including packages such as pandas) that is shipped to the cluster nodes as part of a Spark job. With this approach, PySpark jobs can run without the native Python packages being installed on the host operating system. The same technique also works for SparkR.
http://quasiben.github.io/blog/2016/4/15/conda-spark/
The Datadog blog has a three-part series on monitoring Kafka. The first part gives a detailed overview of the key metrics for brokers, producers, consumers, and ZooKeeper. The second describes how to view the metrics over JMX with JConsole and other tools, and the third covers the Datadog integration.
https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/
Salesforce writes about how Kafka has grown within their organization. Initially they used Kafka to power operational metrics analysis, and it gradually became a large platform driving many systems. Salesforce runs Kafka in multiple data centers and uses MirrorMaker to replicate and aggregate data between clusters.
https://medium.com/salesforce-engineering/expanding-visibility-with-apache-kafka-e305b12c4aba#.5k7j921o3
The Metamarkets blog has an interesting post on optimizing a large-scale distributed system. Druid, their distributed data store, recently added a "first in, first out" query mode, which they tested on a large, heavily loaded cluster. The post lays out their hypotheses about what might happen and the interesting metrics they collected.
https://metamarkets.com/2016/impact-on-query-speed-from-forced-processing-ordering-in-druid/
The Google Cloud Big Data blog describes Capacitor, BigQuery's internal columnar storage format, along with other optimizations that make storing data more efficient.
https://cloud.google.com/blog/big-data/2016/04/inside-capacitor-bigquerys-next-generation-columnar-storage-format
The Apache Kudu (incubating) blog summarizes the results of recent performance analysis and tuning of the system using the YCSB tool.
http://getkudu.io/2016/04/26/ycsb.html
Impala 2.5 brings significant performance improvements, both on TPC benchmarks and elsewhere. The improvements include runtime filters, LLVM code generation support for `SORT` and `DECIMAL`, faster metadata-only queries, and more.
http://blog.cloudera.com/blog/2016/04/apache-impala-incubating-in-cdh-5-7-4x-faster-for-bi-workloads-on-apache-hadoop/
This post describes how to configure MariaDB for the Hive Metastore in order to support high availability.
https://developer.ibm.com/hadoop/blog/2016/04/26/bigsql-ha-configure-ha-hive-metastore-db-using-mariadb10-1/
The Altiscale blog describes the hunt for a NodeGroup-related bug (a follow-up to a post from March). If you have ever been discouraged by failing to find the root cause of a bug in Hadoop or another distributed system, don't be. The post shows that it really is hard, even for engineers who work at a company that sells Hadoop as a service.
Netflix now runs more than 4,000 Kafka brokers across 36 clusters. Running Kafka in the cloud involves trade-offs, and the team has balanced cost against data loss (daily data loss is below 0.01%). The post shares the team's experience running Kafka in AWS, covering typical problems, the deployment strategy (small clusters and isolated ZooKeeper clusters), cluster-level failover, support for AWS availability zones, a Kafka UI for visualization, and more.
http://techblog.netflix.com/2016/04/kafka-inside-keystone-pipeline.html
The Amazon Big Data blog describes how to work with encrypted data stored in S3 from Amazon EMR. The integration supports both client-side and server-side encryption (using AWS KMS).
http://blogs.aws.amazon.com/bigdata/post/TxBQTAF3X7VLEP/Process-Encrypted-Data-in-Amazon-EMR-with-Amazon-S3-and-AWS-KMS
TubeMogul describes the history of their big data platform, which serves trillions of data analysis requests per month. The team adopted Amazon EMR early on, brought in Storm for real-time processing, and eventually moved its big data services to Qubole.
https://www.tubemogul.com/engineering/the-big-data-lifecycle-at-tubemogul/
CaffeOnSpark integrates the Caffe deep learning framework with Spark. MapR describes how to run it on MapR YARN, and the post also covers the performance optimizations they applied.
https://www.mapr.com/blog/distributed-deep-learning-caffe-using-mapr-cluster
Other News
Apache Apex, a big data stream and batch processing system, is now a top-level project of the Apache Software Foundation. Apex entered the incubator last August.
https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces90
Heroku Kafka is a managed Kafka service from Heroku. It recently entered early access and is approaching a beta release.
https://blog.heroku.com/archives/2016/4/26/announcing-heroku-kafka-early-access
A post on the MapR blog explains why gender diversity matters and discusses the Women in Big Data Forum. The post aims to encourage women to enter the field. The Women in Big Data Forum event, organized by MapR, was held in San Jose this week.
https://www.mapr.com/blog/case-women-big-data
Releases
StreamX is an open-source project from Qubole that copies data from Kafka to target stores such as Amazon S3. Qubole also offers StreamX as a managed service.
http://www.qubole.com/blog/big-data/streamx/
SnappyData is a new platform (and company) for OLAP and OLTP queries on streaming data. SnappyData is powered by Apache Spark and GemFire's in-memory storage technology.
Apache Geode (incubating), a distributed data platform aimed at high performance and low latency, released version 1.0.0-incubating.M2. The new version adds features such as point-to-point connections over a WAN.
http://mail-archives.apache.org/mod_mbox/incubator-geode-dev/201604.mbox/%3CCAFh%2B7k2eiK2TMGKsLqrY9CZDjxjYwiuTQ4QGUVC2s3geyJYwnA%40mail.gmail.com%3E
Apache Knox, a REST API gateway for Hadoop, released version 0.9.0. The new version adds support for the Ranger and Ambari UIs, along with other improvements and bug fixes.
http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCACRbFyjRF7zShb-NQ29d3FJ0hKZ57ts0Qfo31ffuNODpskwqPQ@mail.gmail.com%3E
Events
China
None
Welcome to a special Monday edition of Hadoop Weekly. This week has plenty of technical news on Spark, Kafka, Beam, and Kudu. If you are looking for something more cutting-edge, Apache Metron (incubating) made its first release. Metron is an evolving general-purpose security system built on Hadoop.
Technical News
This post describes how to build stream processing systems on AWS. It covers simple combinations such as Amazon Kinesis, AWS Lambda, and the Kinesis S3 connector, as well as more complex setups such as real-time analytics on AWS.
This post describes how to use Spark Testing Base, a Spark testing framework written in Scala, from Java. The sample code shows how to refactor Spark code so that the logic can be tested in isolation, and how to deal with some of the clunkier Scala APIs from Java.
http://www.jesse-anderson.com/2016/04/unit-testing-spark-with-java/
The Altiscale blog outlines the pros and cons of building thin versus uber jars for Spark, and demonstrates how to build both kinds with Maven and with SBT.
https://www.altiscale.com/blog/spark-on-hadoop-thin-jars/
LinkedIn describes their Kafka ecosystem, which includes a specialized Kafka producer, a REST API for non-Java clients, an Avro schema registry, Gobblin (a tool for loading data into Hadoop), and more.
https://engineering.linkedin.com/blog/2016/04/kafka-ecosystem-at-linkedin
This Spark Streaming tutorial shows how to pull tweets through the twitter4j API, filter them by hashtag, and run sentiment analysis on them.
https://www.mapr.com/blog/spark-streaming-and-twitter-sentiment-analysis
Apache Kudu (incubating) is an excellent companion to Apache Impala (incubating) because it efficiently serves both broad analytic queries and targeted queries. This post describes the technical details of the integration, such as how Kudu's design enables efficient queries and how to perform insert, update, and delete operations through Impala and Kudu.
http://blog.cloudera.com/blog/2016/04/how-to-use-impala-and-kudu-together-for-analytic-workloads/
MapR describes how to use spark-sklearn to scale out an existing scikit-learn model. The post walks through building a model on an Airbnb data set and shows how to run cross-validation with spark-sklearn.
https://www.mapr.com/blog/predicting-airbnb-listing-prices-scikit-learn-and-apache-spark
The AWS Big Data blog has a tutorial on using HBase and Hive on Amazon EMR. It introduces HBase, describes how to restore an HBase table from S3, demonstrates how Hive and HBase integrate, and more.
This post describes the challenges of giving students hands-on experience in a big data course. After several iterations and alternatives, the author seems to have found a good solution: Altiscale's Hadoop-as-a-Service.
https://www.altiscale.com/blog/hadoop-as-a-service-in-the-classroom/
In a guest post on the Cloudera blog, the author compares how Parquet and Avro handle two data sets, one narrow (3 columns) and one wide (103 columns). After testing queries and operations with Spark and Spark SQL, the author found that Parquet and Avro sometimes perform similarly when querying serialized data, although in most cases queries on Parquet data are faster (and the serialized data is smaller).
http://blog.cloudera.com/blog/2016/04/benchmarking-apache-parquet-the-allstate-experience/
This post describes how to use SparkR in a distributed environment such as CDH, even though this setup is not yet officially supported. With the R packages installed locally on the workers, a job can run on YARN with only minor changes.
http://www.nodalpoint.com/sparkr-in-cloudera-hadoop/
Many open-source frameworks can execute MapReduce-style work, or similar work expressed in higher-level programming models. Historically they have each depended on a single execution framework (for example MapReduce or Storm), but recent developments have put that in flux. Apache Beam (incubating) goes a step further by spanning both batch and streaming execution modes and by building in a more sophisticated computation model.
http://www.datanami.com/2016/04/22/apache-beam-emerges-ambitious-goal-unify-big-data-development/
The Apache blog has published a seven-part series comparing HBase write performance on HDD, SSD, and RAMDISK. Based on the analysis, the authors identify and propose several features that have yet to be implemented in HBase and HDFS.
https://blogs.apache.org/hbase/entry/hdfs_hsm_and_hbase_part
Other News
Tom White, author of "Hadoop: The Definitive Guide", writes about how he got started with Apache Hadoop. His early contributions centered on integrating Hadoop with Amazon Web Services, and AWS has since become an important part of the Hadoop project's success.
http://vision.cloudera.com/how-i-got-into-hadoop/
Fluo, a distributed processing engine for Apache Accumulo, has submitted a proposal to the Apache Incubator.
https://wiki.apache.org/incubator/FluoProposal
Apache Phoenix, a SQL-on-HBase system, announced a conference to be held after HBaseCon. It is a half-day event focused on Phoenix internals and use cases.
http://hortonworks.com/blog/announcing-first-annual-phoenixcon-apache-phoenix-user-conference/
Releases
Apache Metron, a security framework built on Hadoop, released version 0.1. Hortonworks is supporting it as a technical preview and has written posts on how to get started, how to contribute, how to use the Metron UI, and more.
http://hortonworks.com/blog/apache-metron-tech-preview-1-come-get/
http://hortonworks.com/blog/apache-metron-use-case-finding-needle-haystack/
Apache NiFi released version 0.6.1 this week, a maintenance release that fixes more than 10 bugs.
Apache Flink released version 1.0.2 this week. The release includes bug fixes, performance improvements when using RocksDB, and several documentation improvements.
http://flink.apache.org/news/2016/04/22/release-1.0.2.html
Amazon released a new version of Amazon EMR that adds support for HBase 1.2.
https://aws.amazon.com/blogs/aws/amazon-emr-update-apache-hbase-1-2-is-now-available/
Events
China
None
April 17, 2016
Compiled and translated by the 启明星辰 (Venustech) Platform and Big Data Group
Hortonworks made several announcements at this week's Hadoop Summit Europe, and they run throughout this issue. Apache Storm released version 1.0.0 with an impressive set of new features. In technical news, there are a number of articles on building large-scale services on Kafka and on testing distributed systems. If you missed Hadoop Summit, don't worry: videos of the talks are already online.
Technical News
Smyte describes their infrastructure for detecting spam and fraud in real time from event streams. The original event processing system was built on Kafka, Redis, Secor, and S3. To keep up with growing scale at low cost, they migrated to a disk-based solution that speaks the Redis protocol in front of RocksDB and uses Kafka for replication.
https://medium.com/the-smyte-blog/counting-with-domain-specific-databases-73c660472da
This post combines rsyslog, Kafka, and AWS with the ELK stack (Elasticsearch, Logstash, Kibana) to address back pressure, scale, and maintenance issues. It covers tips for integrating rsyslog with Kafka and for handling schemas, and describes how to run Kafka and ZooKeeper in AWS auto scaling groups.
https://www.bashton.com/blog/2016/elk-on-ark/
Hortonworks describes the data governance features coming to Apache Atlas and Apache Ranger: classification-based access control, data expiry policies, location-specific policies, prohibition of dataset combinations, and cross-component lineage (for example, tracking data from Kafka to Storm to Hive).
http://hortonworks.com/blog/the-next-generation-of-hadoop-based-security-data-governance/
Apache HAWQ (incubating) is a SQL engine, based on Greenplum, for querying data in HDFS. This post discusses its design and the many improvements in the new version. It covers how HAWQ differs from Spark and MapReduce, how Hadoop challenged the classic MPP design, and how HAWQ's new design combines MPP and batch processing techniques to get the benefits of both.
The Cloudera blog describes AgenTEST, a tool for fault injection and network partition testing of Hadoop distributed systems. It can inject network faults (such as packet loss), resource exhaustion (such as CPU, IO, and disk space), and more. For network partition testing, it can evaluate ring partitions, bridge partitions, and others.
The Hortonworks blog previews HDP 2.4.2, which will include new versions of Spark and Zeppelin. A Spark 2.0 preview and new Zeppelin features will both be part of the release.
http://hortonworks.com/blog/apache-spark-apache-zeppelin-whats-coming-in-hdp-2-4-2/
Cask describes how they use long-running tests to evaluate the correctness of a distributed system before and after rare events such as an HBase region compaction.
http://blog.cask.co/2016/04/long-running-tests-in-cdap/
This post describes how to combine SparkR with Amazon EMR for geospatial analysis. Using SparkR's Hive integration, you can quickly define Hive external tables over data in S3. From there, the data can be loaded directly into memory and analyzed in R, which makes it easy to produce high-quality visualizations.
MapR has written a tutorial on using Pig and Hive to analyze Major League Baseball team statistics. Pig is used for initial data processing, and Hive provides a SQL-based query environment. With the Hive ODBC driver and HiveServer, Microsoft Excel can also be used to retrieve and analyze the data.
https://www.mapr.com/blog/using-hive-and-pig-baseball-statistics
SignalFx processes more than 70 billion messages per day through a 27-node Kafka cluster. Drawing on the experience of running Kafka at that volume, they share a number of tips on tuning Kafka, on alerting (for example, on increases in log flush latency), and on scaling Kafka out.
http://www.confluent.io/blog/how-we-monitor-and-run-kafka-at-scale-signalfx
The data Artisans blog has a post assessing Flink's stream processing efficiency, low latency, and correctness. To demonstrate efficiency, they ran the recent Yahoo! streaming benchmark in a high-throughput setting. On correctness, the post highlights Flink's strengths in distinguishing event time from processing time (using the chronology of the Star Wars films as an analogy). Finally, the post describes plans for querying in-memory state in a future version of Flink.
http://data-artisans.com/counting-in-streams-a-hierarchy-of-needs/
This tutorial shows how to turn a stream of text data from a TCP socket into a Spark Streaming source.
https://medium.com/@anicolaspp/spark-custom-streaming-sources-e7d52da72e80
This post describes how to avoid accidentally committing AWS credentials to a patch or a git repository while building Hadoop. Beyond Hadoop itself, the post recommends the "git-secrets" tool for preventing accidental commits of access and secret keys. If you use Hadoop with S3, it also points to new patches worth evaluating.
http://steveloughran.blogspot.co.uk/2016/04/testing-against-s3-and-object-stores.html
Big Data & Brews interviewed Ted Dunning of MapR and Jacques Nadeau. Apache Arrow was among the topics covered.
https://www.youtube.com/watch?v=l3mDDKjDjMk
https://www.youtube.com/watch?v=Xo9CO0a0VJI
Other News
DataEngConf was recently held in San Francisco. This post summarizes the talks from Uber, Stripe, Microsoft, Instacart, and Jawbone. It also introduces the conference theme: "data science in the real world is a product and engineering discipline".
Hortonworks was in the spotlight at last week's Hadoop Summit Europe in Dublin. ZDNet reports on the highlights, which include an expanded partnership with Pivotal (which now resells HDP), a reseller agreement with Syncsort, and technical previews of Atlas, Ranger, Zeppelin, and Metron. The article also describes how the Hortonworks, Cloudera, and MapR offerings differ.
Flink Forward 2016 will be held in Berlin, Germany in September. The call for papers closes at the end of June.
http://flink.apache.org/news/2016/04/14/flink-forward-announce.html
Videos of the talks from Hadoop Summit in Dublin have been posted on YouTube. As expected, the talks cover every part of the Hadoop ecosystem.
Releases
Metascope is a new metadata management tool for Hadoop clusters that works together with Schedoscope. Through a web interface, it provides insight into large amounts of data using data lineage. It also offers search, inline documentation, a REST API, and more.
https://github.com/ottogroup/metascope
Apache HBase 1.2.1 was released this week, resolving 27 issues on top of 1.2.0. The release announcement highlights four high-priority issues.
The Apache Mahout machine learning library released version 0.12.0. In this version, the "Samsara" math environment adds support for Apache Flink and is platform independent. The release announcement covers the Flink integration, known issues, and the project roadmap.
Apache Storm 1.0.0 was released this week. Highlights include performance improvements (generally 3x or more), a new distributed cache API, Nimbus high availability, automatic back pressure, dynamic worker profiling, and more.
http://storm.apache.org/2016/04/12/storm100-released.html
Apache Kudu (incubating) released version 0.8.0 this week. The release adds an Apache Flume sink, several improvements, and a batch of bug fixes.
http://getkudu.io/releases/0.8.0/docs/release_notes.html
Cloudbreak, which provisions Docker-based Hadoop clusters in cloud environments, released version 1.2 this week. New features include support for OpenStack and provisioning scripts for custom servers.
http://hortonworks.com/blog/announcing-cloudbreak-1-2/
Cloudera released Cloudera Enterprise 5.4.10, which covers Flume, Hadoop, HBase, Hive, Impala, and other components.
Presto Accumulo is a new project that provides a Presto connector for reading and writing data in Accumulo.
https://github.com/bloomberg/presto-accumulo
Events
China
None
Issue 165, April 10, 2016
Compiled and translated by the 启明星辰 (Venustech) Platform and Big Data Group
Several products had major releases this week, including new open-source projects from LinkedIn and Airbnb. The technical section of this issue is about stream processing (Spark, Flink, Kafka, and more), and the news section covers the agendas for Spark Summit and HBaseCon.
Zalando has written about how they chose Apache Flink as their stream processing framework. The post presents the conclusions they reached after evaluating candidates against their criteria, and gives the main reasons for choosing Apache Flink: low latency even at high throughput, true stream processing, and developer support.
https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/
The Cloudera blog has a post from Wargaming.net on how they built a real-time processing infrastructure with Kafka, HBase, Drools, and Spark. For the data pipeline, they describe optimizations to HBase lookups and serialization, to data locality between HBase and Spark, and to the Spark computation.
http://blog.cloudera.com/blog/2016/04/inside-wargamings-data-driven-real-time-rules-engine/
InfoQ has posted a video introducing the SMACK stack (Spark, Mesos, Akka, Cassandra, and Kafka) for large-scale stream processing. The talk discusses why the SMACK stack is simpler than the Lambda architecture for solving the same problems.
http://www.infoq.com/presentations/stream-analytics-scalability
Confluent's "Log Compaction" blog series has a new installment covering what happened in the Kafka project in March. There is a lot of noteworthy development, including progress on rack awareness, Kerberos support, and time-based indexes, plus plenty of recent work that you (and I) have not had time to follow.
Apache Flink 1.0 introduced a new complex event processing (CEP) library. In short, CEP provides a way to detect patterns in streams of events. This post illustrates Flink's CEP pattern API with a possible anomaly detection use case based on sensor data collected from servers in a data center.
http://flink.apache.org/news/2016/04/06/cep-monitoring.html
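To make the idea of detecting an event pattern concrete, here is a minimal sketch in plain Python. It does not use Flink or its CEP API. The pattern (two consecutive readings above a threshold within a time window), the function name `detect_warnings`, and the sample values are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

Reading = Tuple[int, float]  # (timestamp in seconds, temperature)


def detect_warnings(
    readings: Iterable[Reading], threshold: float, within: int
) -> List[Tuple[Reading, Reading]]:
    """Return pairs of consecutive readings that both exceed `threshold`
    and are at most `within` seconds apart."""
    matches = []
    previous = None  # last over-threshold reading while the pattern is open
    for reading in readings:
        timestamp, temperature = reading
        if temperature > threshold:
            if previous is not None and timestamp - previous[0] <= within:
                matches.append((previous, reading))
            previous = reading
        else:
            # Strict contiguity: a normal reading resets the partial match.
            previous = None
    return matches


sample = [(0, 95.0), (5, 101.0), (8, 104.0), (30, 102.0), (33, 90.0), (35, 110.0)]
print(detect_warnings(sample, threshold=100.0, within=10))
```

A CEP library generalizes this kind of hand-written state machine: the pattern is declared once, and the engine tracks partial matches per key and handles time windows.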
The Genome Analysis Toolkit (GATK) recently announced that its next version (currently in alpha) will support Apache Spark. This post briefly introduces the toolkit and shows how to use Spark to detect duplicate DNA fragments.
InfoWorld summarizes the plans for structured streaming in Spark 2.0. Micro-batching will remain, and there are new features such as Infinite DataFrames and first-class support for repeated queries.
The AWS Big Data blog has a post on loading data into S3 and Redshift using encryption keys stored in the AWS Key Management Service (KMS). In addition to the required steps, the post describes how to encrypt data in S3 with KMS keys.
The Confluent blog describes how to write a non-trivial "hello world" program with Kafka Connect and Kafka Streams. Specifically, the example pulls Wikipedia data from IRC, parses the messages, and computes a number of statistics. The post uses several code snippets to walk through the whole implementation.
http://www.confluent.io/blog/hello-world-kafka-connect-kafka-streams
This post converts a simple schema from Postgres to Cassandra and describes the main differences: replication, data types (Cassandra does not support JSON), primary keys, and eventual consistency.
http://neovintage.org/2016/04/07/data-modeling-in-cassandra-from-a-postgres-perspective/
The ESG blog reports on the recent Strata + Hadoop World conference, highlighting Spark's strong momentum, machine learning, and cloud services.
http://blog.esg-global.com/riding-high-at-stratahadoop-world
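InformationWeek also covers the Strata conference, focusing on presentations from MapR and Pivotal, artificial intelligence, and more.
The agenda for Spark Summit 2016 has been finalized. The conference will be held June 6-8 in San Francisco, with two days of sessions across five tracks.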
https://databricks.com/blog/2016/04/04/agenda-announced-for-sparksummit-2016-in-san-francisco.html
Forbes interviewed Cloudera CEO Tom Reilly, who discussed the company's opportunities, the competitive market, IPO plans, and more.
Datanami writes about the rise of Apache Kafka as a backbone for stream processing. The article includes an interview with Confluent co-founder and CTO Neha Narkhede, who talks about the upcoming Kafka Connect and Kafka Streams.
http://www.datanami.com/2016/04/06/real-time-rise-apache-kafka/
HBaseCon will be held on May 24 in San Francisco, and the agenda has just been officially announced. There will be more than 20 talks across three tracks.
http://blog.cloudera.com/blog/2016/04/hbasecon-2016-speaker-lineup-announced/
Apache HBase 0.98.18 and 1.1.4 were both released recently. Version 1.1.4 contains a number of fixes, including nine that address correctness. HBase 0.98.18 resolves about 50 issues (bugs, improvements, and two new features).
http://mail-archives.apache.org/mod_mbox/hbase-user/201603.mbox/%3CCANZa%3DGu-mAxKEtfoRjctHcE0KD7z52oE010Fgsf6AMmW2tDZLA%40mail.gmail.com%3E
http://mail-archives.apache.org/mod_mbox/hbase-user/201603.mbox/%3CCA%2BRK%3D_CtZ1L07nS6Og2ekfVwet0qTE7jw-bmyD2pp5UPweUehQ%40mail.gmail.com%3E
Apache Lens released 2.5.0-beta. Lens is a unified analytics interface that supports the execution engines and data stores of the Hadoop ecosystem. The release resolves 87 issues, mainly bug fixes and new features.
Airbnb has open-sourced Caravel, their data exploration system (a data visualization platform). Caravel supports many features normally found only in commercial products and can connect to any system that speaks a SQL dialect. Notably, it supports real-time analysis on Druid.
https://medium.com/airbnb-engineering/caravel-airbnb-s-data-exploration-platform-15a72aa610e5
MapR announced support for Apache Drill 1.6 in its distribution. Highlights of the release include a new MapR-DB storage plugin, support for new SQL window functions, and end-to-end security. The announcement page includes examples of loading data with the MapR-DB API and querying it through Drill.
Apache Flink released 1.0.1, a bug-fix release in the 1.0.x series. The release resolves 23 issues, and all users of 1.0.0 are encouraged to upgrade.
http://flink.apache.org/news/2016/04/06/release-1.0.1.html
Cloudera Enterprise 5.7 was released with upgraded versions of Spark, HBase, Impala, Kafka, and other components. Highlights include Hive-on-Spark and HBase-Spark graduating from Cloudera Labs, significant Impala performance improvements, and support for placing the HBase WAL on SSD.
http://blog.cloudera.com/blog/2016/04/cloudera-enterprise-5-7-is-released/
Apache Tajo, a data warehouse system built on Hadoop, released version 0.11.2. The new version adds Kerberos support, fixes Hive compatibility for ORC tables, and more.
http://tajo.apache.org/releases/0.11.2/announcement.html
LinkedIn has open-sourced Dr. Elephant, a tool that diagnoses performance problems in Hadoop and Spark jobs. It collects metrics for completed jobs from the YARN resource manager, evaluates them, and generates a diagnostic report covering issues such as data skew and GC overhead. LinkedIn says it can resolve 80% of problems with the tool.
China
None