石建 | Fat Mind

          March 29, 2012


          http://incubator.apache.org/kafka/design.html

          1. Why we built this
              Activity stream data (asd) is a part of any website and reflects how the site is used, e.g. which content is searched for or displayed. Traditionally, this data is recorded in log files and then periodically aggregated and analyzed. Operational data (od) is data about machine performance, together with operational data integrated from various other channels.
              In recent years, activity and operational data have become a critical part of a website, and more sophisticated infrastructure is required.
              Characteristics of the data:
                  a. High-throughput, immutable activity data is a challenge for real-time computation; the volume can easily grow by 10x or 100x.

                  b. The traditional way of recording logs to files is a respectable and scalable way to support offline processing, but its latency is too high for online use.
              Kafka is intended to be a single queuing platform that can support both offline and online use cases.

          2. Major Design Elements

          There is a small number of major design decisions that make Kafka different from most other messaging systems:

          1. Kafka is designed for persistent messages as the common case; messages are persistent
          2. Throughput rather than features is the primary design constraint; throughput is the first requirement
          3. State about what has been consumed is maintained as part of the consumer, not the server; state is maintained by the client
          4. Kafka is explicitly distributed. It is assumed that producers, brokers, and consumers are all spread over multiple machines; it is distributed by design
          3. Basics
              Messages are the fundamental unit of communication.
              Messages are published to a topic by a producer, which means they are physically sent to a server acting as a broker.
              If multiple consumers subscribe to a topic, every message published to that topic is delivered to each of them.
              Kafka is distributed: producers, brokers, and consumers can all consist of multiple machines in a cluster, cooperating as a logical group.
              Within a consumer group, each message is consumed by exactly one of the group's consumer processes. A more common case in our own usage is that we have multiple logical consumer groups, each consisting of a cluster of consuming machines that act as a logical whole.
              No matter how many consumers a topic has, Kafka stores each message only once.
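              For concreteness, here is a minimal producer sketch against the modern Java client (org.apache.kafka:kafka-clients), which postdates the API this 2012 document describes; the broker address and topic name are assumptions for illustration.

          import java.util.Properties;
          import org.apache.kafka.clients.producer.KafkaProducer;
          import org.apache.kafka.clients.producer.ProducerRecord;

          public class ProducerSketch {
              public static void main(String[] args) {
                  Properties props = new Properties();
                  props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
                  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
                  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
                  try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                      // publishing to a topic physically sends the message to a broker
                      producer.send(new ProducerRecord<>("activity-events", "user-42", "page_view"));
                  }
              }
          }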

          4. Message Persistence and Caching

          4.1 Don't fear the filesystem!
              Kafka relies entirely on the filesystem to store and cache messages.
              The common intuition about disks is that they are "slow", which makes people doubt that a persistent structure can offer competitive performance. In reality, how slow or fast a disk is depends entirely on how it is used; a properly designed disk structure can often be as fast as the network.
              RAID-5: http://baike.baidu.com/view/969385.htm
              Disk types: http://www.china001.com/show_hdr.php?xname=PPDDMV0&dname=66IP341&xpos=172
              Sequential disk I/O is very fast: linear writes on a six-disk 7200rpm SATA RAID-5 array run at about 300MB/sec. These linear reads and writes are the most predictable of all usage patterns, and hence the ones detected and optimized best by the operating system, using read-ahead and write-behind techniques.
              Modern operating systems use main memory as a disk cache. Any modern OS will happily divert all free memory to disk caching, with little performance penalty when the memory is reclaimed. All disk reads and writes will go through this unified cache.
              On the JVM: (a) the memory overhead of objects is very high, often doubling the size of the data stored; (b) as heap data grows, garbage collection becomes more and more expensive.
              As a result of these factors, using the filesystem and relying on pagecache is superior to maintaining an in-memory cache or other structure.
              Combined with data compression, doing so will result in a cache of up to 28-30GB on a 32GB machine, without GC penalties.
              This suggests a very simple design: rather than keeping as much as possible in memory and flushing to the filesystem only when necessary, invert that and write all data to the filesystem immediately.
              Data that is immediately written to the persistent file without a call to flush has only reached the OS pagecache; the OS flushes it to the physical disk at some later point. On top of this we add a configuration-driven flush policy, to allow the user of the system to control how often data is flushed to the physical disk (every N messages or every M seconds) and so put a bound on the amount of data "at risk" in the event of a hard crash.
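              As a sketch of what such a policy looks like in broker configuration (the property names below are taken from current Kafka releases and are assumed, not quoted from the 2012 version):

          # bound the data "at risk" after a hard crash: fsync the log either
          # every N messages or every M milliseconds, whichever comes first
          log.flush.interval.messages=10000
          log.flush.interval.ms=1000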

          4.2 Constant Time Suffices
              
          The persistent data structure used for messaging-system metadata is often a BTree. BTrees are the most versatile data structure available, and make it possible to support a wide variety of transactional and non-transactional semantics in the messaging system.
              Disk seeks come at 10 ms a pop, and each disk can do only one seek at a time, so parallelism is limited. Hence even a handful of disk seeks leads to very high overhead. Furthermore, BTrees require a very sophisticated page or row locking implementation to avoid locking the entire tree on each operation; the implementation must pay a fairly high price for row-locking or else effectively serialize all reads.
              In short: persistent message metadata is usually a BTree, but as a disk structure its cost is too high, for two reasons: seeks, and the locking needed to avoid locking the whole tree.
              
          Intuitively, a persistent queue could be built on simple reads and appends to files, as is commonly the case with logging solutions.
              A persistent queue built on reads and appends to files gives up some of the semantics a BTree can support, but in exchange all operations are O(1) and reads do not block writes.
              
          Moreover, the performance is completely decoupled from the data size: one server can take full advantage of a number of cheap, low-rotational-speed 1+TB SATA drives. Though they have poor seek performance, these drives often have comparable performance for large reads and writes, at 1/3 the price and 3x the capacity.
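              To make the contrast with a BTree concrete, here is a toy append-only log in Java; it is an illustration of the idea, not Kafka's actual implementation. A record's "offset" is simply its byte position in the file, so both append and read are O(1) regardless of log size.

          import java.io.IOException;
          import java.nio.ByteBuffer;
          import java.nio.channels.FileChannel;
          import java.nio.file.Path;
          import java.nio.file.StandardOpenOption;

          public class AppendOnlyLog {
              private final FileChannel channel;

              public AppendOnlyLog(Path file) throws IOException {
                  channel = FileChannel.open(file, StandardOpenOption.CREATE,
                          StandardOpenOption.READ, StandardOpenOption.WRITE);
              }

              // append a length-prefixed record at the end of the file; returns its offset
              public long append(byte[] payload) throws IOException {
                  long offset = channel.size();
                  ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
                  buf.putInt(payload.length).put(payload);
                  buf.flip();
                  channel.write(buf, offset);
                  return offset;
              }

              // read the record at a known offset; positional reads do not block appends
              public byte[] read(long offset) throws IOException {
                  ByteBuffer len = ByteBuffer.allocate(4);
                  channel.read(len, offset);
                  len.flip();
                  ByteBuffer data = ByteBuffer.allocate(len.getInt());
                  channel.read(data, offset + 4);
                  return data.array();
              }
          }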

          4.3 Maximizing Efficiency
              Furthermore, we assume each message published is read at least once (and often multiple times), hence we optimize for consumption rather than production.
              There are two common causes of inefficiency: too many network requests, and excessive byte copying.
              Against the first, the APIs are built around a "message set" abstraction. This allows network requests to group messages together and amortize the overhead of the network roundtrip, rather than sending a single message at a time. Because only batch operations are offered, each roundtrip's network cost is spread over a set of messages instead of a single one.
              Against the second, the message log maintained by the broker is itself just a directory of message sets that have been written to disk. Maintaining this common format allows optimization of the most important operation: network transfer of persistent log chunks.
              To understand the impact of sendfile, it is important to understand the common data path for transfer of data from file to socket:
          1. The operating system reads data from the disk into pagecache in kernel space
          2. The application reads the data from kernel space into a user-space buffer
          3. The application writes the data back into kernel space into a socket buffer
          4. The operating system copies the data from the socket buffer to the NIC buffer where it is sent over the network
              Using the zero-copy path the OS provides (sendfile), only the final copy to the NIC buffer is needed.
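              In Java this path is exposed as FileChannel.transferTo, which delegates to sendfile(2) on Linux. A minimal sketch (the file name, host, and port are made up):

          import java.io.IOException;
          import java.net.InetSocketAddress;
          import java.nio.channels.FileChannel;
          import java.nio.channels.SocketChannel;
          import java.nio.file.Path;
          import java.nio.file.StandardOpenOption;

          public class ZeroCopySend {
              public static void main(String[] args) throws IOException {
                  try (FileChannel file = FileChannel.open(Path.of("segment.log"), StandardOpenOption.READ);
                       SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9092))) {
                      long position = 0;
                      long remaining = file.size();
                      while (remaining > 0) {
                          // bytes move from pagecache to the socket without the two user-space copies
                          long sent = file.transferTo(position, remaining, socket);
                          position += sent;
                          remaining -= sent;
                      }
                  }
              }
          }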

          4.4 End-to-end Batch Compression
              In many cases the bottleneck is actually not CPU but network. This is particularly true for a data pipeline that needs to send messages across data centers. Efficient compression requires compressing multiple messages together rather than compressing each message individually. Ideally this would be possible in an end-to-end fashion: data would be compressed prior to sending by the producer and remain compressed on the server, only being decompressed by the eventual consumers.
              A batch of messages can be clumped together, compressed, and sent to the server in this form. The batch will be delivered in its entirety to the same consumer and will remain in compressed form until it arrives there.
              In other words: the Kafka producer API supports batch compression; the broker performs no operation on such a batch, and it is delivered to the consumer still compressed.
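              With the modern Java producer this is a single switch added to the Properties of the ProducerSketch above; compression.type is today's client's name for it (the 2012 client used a different property), so treat it as illustrative:

          props.put("compression.type", "gzip"); // whole batches stay compressed from producer to consumer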

          4.5 Consumer state
              Keeping track of what has been consumed is one of the key things a messaging system must provide. State tracking requires updating a persistent entity and potentially causes random accesses.
              Most messaging systems keep metadata about what messages have been consumed on the broker. That is, as a message is handed out to a consumer, the broker records that fact locally.
              Problem: if the consumer fails while processing, the message is lost. Improvement: the consumer acknowledges each message after consuming it; if the broker does not receive the ack within a timeout, it redelivers the message.
              Remaining problems: 1. if a message is consumed successfully but the ack is lost, it is consumed twice; 2. the broker must now keep multiple states about every single message; 3. with multiple broker machines, this state has to be kept in sync between them.

          4.5.1 Message delivery semantics
              So clearly there are multiple possible message delivery guarantees that could be provided: at most once, at least once, exactly once.
              This problem is heavily studied, and is a variation of the "transaction commit" problem. Algorithms that provide exactly-once semantics exist (two- or three-phase commits and Paxos variants are examples), but they come with drawbacks: they typically require multiple round trips and may have poor guarantees of liveness (they can halt indefinitely).
              Kafka does two unusual things with respect to metadata. First, the stream is partitioned on the brokers into a set of distinct partitions. Within a partition, messages are stored in the order in which they arrive at the broker, and will be given out to consumers in that same order. This means that rather than storing metadata for each message (marking it as consumed, say), we just need to store the "high water mark" for each combination of consumer, topic, and partition.
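              The bookkeeping this enables is tiny. A toy illustration (not Kafka code) of the per-(group, topic, partition) high-water-mark map:

          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;

          public class HighWaterMarks {
              private final Map<String, Long> marks = new ConcurrentHashMap<>();

              private static String key(String group, String topic, int partition) {
                  return group + "/" + topic + "/" + partition;
              }

              // instead of per-message consumed flags, consumption state is one number per key
              public void advance(String group, String topic, int partition, long offset) {
                  marks.merge(key(group, topic, partition), offset, Math::max);
              }

              public long get(String group, String topic, int partition) {
                  return marks.getOrDefault(key(group, topic, partition), 0L);
              }
          }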
              
          4.5.2 Consumer state
              In Kafka, the consumers are responsible for maintaining state information (the offset) on what has been consumed. Typically, the Kafka consumer library writes this state data to zookeeper.
              
          This solves a distributed consensus problem, by removing the distributed part!
              
          There is a side benefit of this decision. A consumer can deliberately rewind back to an old offset and re-consume data.
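              A sketch of such a rewind with the modern Java consumer (which stores offsets in Kafka itself rather than zookeeper, but the principle is unchanged; broker address, group, and topic are made up):

          import java.time.Duration;
          import java.util.List;
          import java.util.Properties;
          import org.apache.kafka.clients.consumer.ConsumerRecord;
          import org.apache.kafka.clients.consumer.KafkaConsumer;
          import org.apache.kafka.common.TopicPartition;

          public class RewindSketch {
              public static void main(String[] args) {
                  Properties props = new Properties();
                  props.put("bootstrap.servers", "localhost:9092");
                  props.put("group.id", "replay-demo");
                  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                  try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                      TopicPartition tp = new TopicPartition("activity-events", 0);
                      consumer.assign(List.of(tp));
                      consumer.seek(tp, 0L); // deliberately rewind to an old offset and re-consume
                      for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
                          System.out.println(r.offset() + ": " + r.value());
                      }
                  }
              }
          }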

          4.5.3 Push vs. pull
              A related question is whether consumers should pull data from brokers or brokers should push data to the subscriber. There are pros and cons to both approaches.
              A push-based system aims to have the consumer consume at the maximum possible rate; unfortunately, it has difficulty dealing with diverse consumers, because the broker controls the rate at which data is transferred. When the consumption rate falls below the production rate, the consumer tends to be overwhelmed.
              A pull-based system has the nicer property that the consumer simply falls behind and catches up when it can. The push problem can be mitigated with some kind of backoff protocol by which the consumer can indicate it is overwhelmed, but getting the rate of transfer to fully utilize (and never over-utilize) the consumer is trickier than it seems. Previous attempts at building systems in this fashion led us to go with a more traditional pull model: it avoids overwhelming the consumer while still making full use of its capacity.

          5. Distribution
              Kafka is built to be run across a cluster of machines as the common case. There is no central "master" node. Brokers are peers to each other and can be added and removed at anytime without any manual configuration changes. Similarly, producers and consumers can be started dynamically at any time. Each broker registers some metadata (e.g., available topics) in Zookeeper. Producers and consumers can use Zookeeper to discover topics and to co-ordinate the production and consumption. The details of producers and consumers will be described below.

          6. Producer

          6.1 Automatic producer load balancing
              Kafka supports client-side load balancing for message producers, or the use of a dedicated load balancer to balance TCP connections.

              The advantage of using a level-4 load balancer is that each producer only needs a single TCP connection, and no connection to zookeeper is needed. The disadvantage is that the balancing is done at the TCP connection level, and hence it may not be well balanced (if some producers produce many more messages than others, evenly dividing up the connections per broker may not result in evenly dividing up the messages per broker).
              
          Client-side zookeeper-based load balancing solves some of these problems. It allows the producer to dynamically discover new brokers, and balance load on a per-request basis. It allows the producer to partition data according to some key instead of randomly.

          The working of the zookeeper-based load balancing is described below. Zookeeper watchers are registered on the following events:

          • a new broker comes up
          • a broker goes down
          • a new topic is registered
          • a broker gets registered for an existing topic

              Internally, the producer maintains an elastic pool of connections to the brokers, one per broker. This pool is kept updated to establish/maintain connections to all the live brokers, through the zookeeper watcher callbacks. When a producer request for a particular topic comes in, a broker partition is picked by the partitioner (see section on semantic partitioning). The available producer connection is used from the pool to send the data to the selected broker partition.
              That is: the producer manages its broker connections through zookeeper. For each request, the partitioner computes the target partition according to the partitioning rule, the corresponding connection is taken from the pool, and the data is sent to the selected broker partition.

          6.2 Asynchronous send

              Asynchronous non-blocking operations are fundamental to scaling messaging systems.

          This allows buffering of produce requests in an in-memory queue and batch sends that are triggered by a time interval or a pre-configured batch size.
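              Continuing the Properties from the ProducerSketch above, today's Java client exposes both triggers as batch.size and linger.ms (names assumed from the current client, given only as an illustration):

          props.put("batch.size", 16384); // send a batch once it holds this many bytes...
          props.put("linger.ms", 50);     // ...or once this many milliseconds have passed, whichever comes first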

          6.3 Semantic partitioning
              The producer is able to semantically map messages to the available kafka nodes and partitions. This allows partitioning the stream of messages with some semantic partition function, based on some key in the message, to spread them over broker machines; a sketch of such a function follows.
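              A minimal sketch of a key-based partition function; it mirrors the idea (messages with the same key, e.g. a user id, always land on the same partition) without claiming to be Kafka's built-in partitioner:

          public class KeyPartitioner {
              public int partition(Object key, int numPartitions) {
                  if (key == null) {
                      return (int) (Math.random() * numPartitions); // no key: spread randomly
                  }
                  // mask the sign bit so the index is non-negative, then bucket by key hash
                  return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
              }
          }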


          posted @ 2013-07-06 14:57 石建 | Fat Mind

          1. JavaScript code, file login.js

          //write the user's login info into a cookie
          function SetCookie(form) //takes the login form; reads its name and password fields
          {   
              var name = form.name.value;
              var password = form.password.value;
              var Days = 1; //this cookie will be kept for 1 day
              var exp = new Date(); //take the current date, add the retention period, and use it as the cookie's expiry
              exp.setTime(exp.getTime() + Days*24*60*60*1000);
              document.cookie = "user=" + escape(name) + "/" + escape(password) + ";expires=" + exp.toGMTString();
          }
          //read a cookie -- regular-expression version
          function getCookie(name)      
          {
              var arr = document.cookie.match(new RegExp("(^| )" + name + "=([^;]*)(;|$)"));
              if (arr != null) return unescape(arr[2]); 
              return null;
          }
          //read the "user" cookie -- plain string version
          function readCookie(form) {
              var cookieValue = "";
              var search = "user=";
              if (document.cookie.length > 0) {
                  var offset = document.cookie.indexOf(search);
                  if (offset != -1) {
                      offset += search.length;
                      var end = document.cookie.indexOf(";", offset);
                      if (end == -1)
                          end = document.cookie.length;
                      //extract the value stored in the cookie
                      cookieValue = unescape(document.cookie.substring(offset, end));
                      if (cookieValue != null) {
                          var str = cookieValue.split("/");
                          form.name.value = str[0];
                          form.password.value = str[1];
                      }
                  }
              }
          }
          //delete the cookie (in a servlet: max-age 0 deletes it, -1 gives it session scope); javascript differs slightly
          function delCookie()
          {
              var name = "user"; //must match the cookie name used in SetCookie
              var exp = new Date();
              exp.setTime(exp.getTime() - 1);
              var cval = getCookie(name);
              if (cval != null) document.cookie = name + "=" + cval + ";expires=" + exp.toGMTString();
          }

           

          2. JSP code, file login.jsp

          <%@ page contentType="text/html; charset=gb2312" language="java"
              import="java.sql.*" errorPage=""%>
              
          <html>
              <head>
                  <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
                  <title>javascript cookie control</title>
                  <link href="css/style.css" rel="stylesheet" type="text/css">
                  <script type="text/javascript" src="js/login.js"></script>
              </head>
              <script language="javascript">
              function checkEmpty(form){
                  for(i=0;i<form.length;i++){
                      if(form.elements[i].value==""){
                          alert("表單信息不能為空");
                          return false;
                      }
                  }
              }
          </script>
              <body onload="readCookie(form)"> <!-- the cookie is managed by JavaScript -->
                  <div align="center">
                      <table width="324" height="225" border="0" cellpadding="0" cellspacing="0">
                          <tr height="50">
                              <td ></td>
                          </tr>
                          <tr align="center">
                              <td background="images/back.jpg">
                                  <br>
                                  <br>
                                  Login
                                  <form name="form" method="post" action="" onSubmit="return checkEmpty(form)">
                                      <input type="hidden" name="id" value="-1">
                                      <table width="268" border="1" cellpadding="0" cellspacing="0">
                                          <tr align="center">
                                              <td width="63" height="30">
                                                  Username:
                                              </td>
                                              <td width="199">
                                                  <input type="text" name="name" id="name">
                                              </td>
                                          </tr>
                                          <tr align="center">
                                              <td height="30">
                                                  Password:
                                              </td>
                                              <td>
                                                  <input type="password" name="password" id="password">
                                              </td>
                                          </tr>
                                      </table>
                                      <br>
                                      <input type="submit" value="提交">
                                      <input type="checkbox" name="cookie" onclick="SetCookie(form)">記住我          
                                  </form>
                              </td>
                          </tr>
                      </table>
                  </div>
              </body>
          </html>

           


          Purpose: when you open login.jsp again, the form is already filled in with your previous login information.


          Problems: 1. The cookie name in the JavaScript is hard-coded, which is inflexible.
                      2. JavaScript cookies are stored as plain strings, so when reading them back you must parse them in the same format you stored them.
                      3. The original goal was automatic login, but every page checks the session; one part is client-side and one is server-side, so that was not achieved.

           

           

          posted @ 2012-09-09 15:18 石建 | Fat Mind
          1. Variable types
            - undefined
            - null
            - string
                  - the difference between == and ===
            - number
            - boolean
            - string, number, and boolean each have a corresponding 'wrapper object class'
          2. Functions
            - defining a function
                  - the function keyword
                  - parameters (see the examples), arguments
                  - variables declared inside a function, and what var changes
            - scope
                  - chained (a nested function can see its enclosing function's variables)
            - anonymous functions
                - typical uses (one-off code, e.g. a jsonp callback)
                - behavior of this
          Examples:
          var add = function(x) {
              return x + 1;
          }
          add(1,2,3); // any number of arguments may be passed, similar to Java's varargs (int x ...)

          var fn = function(name, pass) {
              alert(name);
              alert(pass);
          };
          fn("hello","1234",5); // arguments are bound in the order they are passed


          var name = "windows";
          var fn = function() {
              var name = "hello";
              alert(this.name);
          }
          fn(); // "windows": called as a plain function, this is the global window object, and the local var does not touch the global name

          var name = "windows";
          var fn = function() {
              name = "hello";
              alert(this.name);
          }
          fn(); // "hello": name is assigned without var, so it overwrites the global variable that this.name reads

          function add(a) {
              return ++a;
          }
          var fn = function(x, add){
              return add(x);
          }
          fn(1, add);  // a function passed as an argument

          3. Closures
          http://www.ruanyifeng.com/blog/2009/08/learning_javascript_closures.html [good]
          closures in other languages: http://www.ibm.com/developerworks/cn/linux/l-cn-closure/

          4. Objects
              - new Object()
              - object literals
              - constructor functions
              - all of the above go through these steps:
                  - create a new object
                  - bind the constructor's scope to the new object (the new operator)
                  - add properties and methods to the object
                  - return the object

          var obj = new Object();  // the new Object() way
          obj.name = 'zhangsan';

          var obj = {                   // object literal
              name : 'zhangsan',
              showName : function (){
                  alert(this.name);
              }
          };
          obj.showName();         // showName alerts the name itself
          function Person(name) { // constructor function
              this.name = name;
              this.showName = function(){
                  return this.name;
              };
          }
          var obj = new Person("zhangsan"); // new is required; without it this is just an ordinary function call
          obj.showName();
          alert(obj.name);


          Source: internal training slides (ppt)
          posted @ 2012-05-20 13:50 石建 | Fat Mind


          1. A handle is just an identifier; once we have an object's handle, we can perform arbitrary operations on the object.

          2. A handle is not a pointer. The operating system uses a handle to locate a block of memory; the handle itself may be an identifier, a map key, or a pointer, depending on how the operating system chooses to implement it.

          A file descriptor (fd) serves, to some degree, as the Unix counterpart of a handle. Linux has equivalent mechanisms but no single unified handle type: each kind of system resource is identified by its own type and manipulated through its own interfaces.

          3. http://tech.ddvip.com/2009-06/1244006580122204_11.html

          At the operating-system level, file operations also have a concept similar to C's FILE. In Linux it is called a file descriptor (fd), and in Windows it is called a handle (below, when there is no ambiguity, both are called handles). A user opens a file through some function to obtain a handle, and from then on operates on the file through that handle.

           

          The reason for designing such a handle is that it prevents users from arbitrarily reading or writing the operating-system kernel's file objects. On both Linux and Windows, a file handle is always associated with a kernel file object, but the details of the association are invisible to the user. The kernel can compute the address of the kernel file object from the handle, but that ability is not exposed to user space.

           

          Here is a concrete example. In Linux, the fds 0, 1, and 2 represent standard input, standard output, and standard error. fds obtained by opening files in a program start at 3 and grow from there. What exactly is an fd? In the kernel, every process has a private open-file table. The table is an array of pointers, each element pointing to a kernel open-file object. The fd is simply an index into this table. When a user opens a file, the kernel creates an open-file object internally, finds an empty slot in the table, points that slot at the new object, and returns the slot's index as the fd. Because the table lives in the kernel and the user cannot access it, a user holding an fd still cannot obtain the address of the open-file object, and can only operate on it through the functions the system provides.

           

          In C, files are manipulated through the FILE structure instead. As you would expect, each FILE structure has a one-to-one relationship with an fd, and records the unique fd it corresponds to.


          Handles
          http://zh.wikipedia.org/wiki/%E5%8F%A5%E6%9F%84

          In programming, a handle is a kind of smart pointer. When an application needs to reference memory blocks or objects managed by another system (a database, the operating system), a handle is used.

          The difference between a handle and an ordinary pointer is that a pointer contains the memory address of the referenced object, while a handle is a reference identifier managed by the system, which the system can relocate to a different memory address. This indirect way of accessing objects gives the system more control over the referenced objects (see encapsulation).

          In the memory management of 1980s operating systems (such as Mac OS and Windows), handles were used extensively. Unix file descriptors are essentially handles as well. Like other desktop environments, the Windows API uses handles heavily to identify objects in the system and to establish communication channels between the operating system and user space. For example, a window on the desktop is identified by a handle of type HWND.

          Today, larger memories and virtual-memory algorithms have made plain pointers more attractive, and the pointer-to-pointer style of handle has fallen out of favor. Even so, many operating systems still refer to pointers to private objects, and to indexes into internal arrays handed to clients, as handles.


           

          posted @ 2012-04-06 14:02 石建 | Fat Mind

          Official site: http://code.google.com/p/powermock/ 

          1. If you already use mockito, the following parts are recommended reading:

              - documentation [required]
                  - getting started
                  - motivation
              - mockito extensions [required]
                  - mockito 1.8+ usage
              - common
              - tutorial
              - faq [required]


          2. Attachment: powermock features used in real development, simplified into small examples (only to illustrate the powermock api). They mainly cover:

           

          - modifying private fields
          - private methods
              - testing a private method
              - mocking
              - verifying
          - static methods
              - mocking
              - throwing exceptions
              - verifying
          - mocking part of a class's methods (partial mock)
          - mocking a Java core library class, e.g. Thread
          - mocking constructors

          A minimal static-method example is sketched after the attachment link below.

          /Files/shijian/powermock.rar
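              As a taste of what is in the attachment, here is a minimal static-method test; IdService is a made-up class under test, and the PowerMock 1.x mockito API is assumed:

          import static org.junit.Assert.assertEquals;
          import static org.mockito.Mockito.when;

          import org.junit.Test;
          import org.junit.runner.RunWith;
          import org.powermock.api.mockito.PowerMockito;
          import org.powermock.core.classloader.annotations.PrepareForTest;
          import org.powermock.modules.junit4.PowerMockRunner;

          @RunWith(PowerMockRunner.class)
          @PrepareForTest(IdService.class) // the class whose static method will be mocked
          public class IdServiceTest {

              @Test
              public void mocksAStaticMethod() {
                  PowerMockito.mockStatic(IdService.class);
                  when(IdService.next()).thenReturn(42L);

                  assertEquals(42L, IdService.next());

                  PowerMockito.verifyStatic(); // the next static call is the one being verified
                  IdService.next();
              }
          }

          // made-up class under test
          class IdService {
              static long next() { return System.nanoTime(); }
          }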



          posted @ 2012-03-29 12:39 石建 | Fat Mind
