ivaneeo's blog

The power of freedom, a life of freedom.


In Android client programming (especially for SNS-style clients), you often need to implement a registration Activity: the user enters a username, password, and email address and uploads a photo before registering. This raises a question: in HTML, a form can implement such a registration page, and the required information is automatically packaged into a complete HTTP request. In Android, how do we package these parameters and the files to be uploaded into an HTTP request ourselves?

Let's first run an experiment to see exactly what information an HTML form packages.

Step 1: Write a servlet that saves the received HTTP entity body to a file. The code is as follows:

    public void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {

        // Get the input stream, i.e. the entity body of the HTTP request
        ServletInputStream sis = request.getInputStream();

        // Buffer
        byte buffer[] = new byte[1024];

        FileOutputStream fos = new FileOutputStream("d:\\file.log");

        int len = sis.read(buffer, 0, 1024);

        // Copy the stream into file.log in a loop
        while (len != -1) {
            fos.write(buffer, 0, len);
            len = sis.read(buffer, 0, 1024);
        }

        fos.close();
        sis.close();
    }

Step 2: Implement a form page like the one below. The full code is:

    <form action="servlet/ReceiveFile" method="post" enctype="multipart/form-data">
        First parameter <input type="text" name="name1"/> <br/>
        Second parameter <input type="text" name="name2"/> <br/>
        First file to upload <input type="file" name="file1"/> <br/>
        Second file to upload <input type="file" name="file2"/> <br/>
        <input type="submit" value="Submit">
    </form>

Note: because attachments are being uploaded, enctype must be set to multipart/form-data or the file upload will not work.

Step 3: Fill in the form, click the "Submit" button, then find file.log on drive D and open it with Notepad. The data looks like this:

-----------------------------7d92221b604bc
Content-Disposition: form-data; name="name1"

hello
-----------------------------7d92221b604bc
Content-Disposition: form-data; name="name2"

world
-----------------------------7d92221b604bc
Content-Disposition: form-data; name="file1"; filename="C:\2.GIF"
Content-Type: image/gif

GIF89a
(binary GIF image data)
-----------------------------7d92221b604bc
Content-Disposition: form-data; name="file2"; filename="C:\2.txt"
Content-Type: text/plain

hello everyone!!!
-----------------------------7d92221b604bc--

From the form source we know the form uploads four items: the parameters name1 and name2, and the files file1 and file2.

First, look at how the two parameters name1 and name2 appear in file.log. This time open file.log with UltraEdit (some characters cannot be displayed in Notepad, so a hex editor is needed).

Combining the hex data with what Notepad shows, each uploaded parameter part follows this format:

1.       The first line is the separator "-----------------------------7d92221b604bc", followed by "\r\n" (0D 0A in the hex editor), the carriage-return/line-feed pair.

2.       The second line:

(1)       First comes the HTTP extension header "Content-Disposition: form-data;", indicating that form data is being uploaded.

(2)       "name="name1"" - the name of the parameter.

(3)       "\r\n" (0D 0A in the hex editor), the CRLF line terminator.

3.       The third line: just "\r\n" (0D 0A in the hex editor), an empty line.

4.       The fourth line: the value of the parameter, terminated by "\r\n" (0D 0A in the hex editor).

From this observation, every parameter the form uploads is encoded into the parameter part of the HTTP request following rules 1-4 above. A small sketch of building such a part programmatically is shown below.
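As an illustration (not part of the original post), the four rules can be turned into a few lines of Java; the boundary string below is an arbitrary example value:

    // Sketch: build the part for one text parameter following rules 1-4 above.
    String BOUNDARY = "---------------------------7d92221b604bc"; // arbitrary example boundary
    StringBuilder part = new StringBuilder();
    part.append("--").append(BOUNDARY).append("\r\n");                             // rule 1: separator line
    part.append("Content-Disposition: form-data; name=\"name1\"").append("\r\n");  // rule 2: header with the field name
    part.append("\r\n");                                                            // rule 3: empty line
    part.append("hello").append("\r\n");                                            // rule 4: the value, ending with CRLF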

Combining the hex data with what Notepad shows, each uploaded file part follows this format:

1.       The first line is the separator "-----------------------------7d92221b604bc", followed by "\r\n" (0D 0A in the hex editor), the CRLF pair.

2.       The second line:

a)         First comes the HTTP extension header "Content-Disposition: form-data;", indicating that form data is being uploaded.

b)        "name="file2";" - the name of the form field.

c)        "filename="C:\2.txt"" - the name of the uploaded file.

d)        "\r\n" (0D 0A in the hex editor), the CRLF line terminator.

3.       The third line: the HTTP entity header "Content-Type: text/plain", describing the file format of the received entity content. Computer applications use many common file formats, and each common format has a registered name; these names are MIME types, where MIME stands for "Multipurpose Internet Mail Extensions".

4.       The fourth line: "\r\n" (0D 0A in the hex editor), an empty line.

5.       From the fifth line on: the binary content of the uploaded file.

6.       Finally comes the end marker "-----------------------------7d92221b604bc--"; note that it differs from the ordinary separator only by the trailing "--".

One question remains: how is the separator "-----------------------------7d92221b604bc" determined? Does it have to contain exactly the string "7d92221b604bc"?

        Our analysis so far has only looked at the entity body of the HTTP request; we can use a tool to capture the complete HTTP request and look for clues.

HttpWatch under IE, or the HttpFox add-on under Firefox, can capture the page's network traffic. As Figure 4 shows, the boundary string is actually specified in the Content-Type header of the request.
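For example, the captured request header looks roughly like this (the boundary value is simply whatever the browser generated for that request):

    Content-Type: multipart/form-data; boundary=---------------------------7d92221b604bc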

           
Based on the parameter-passing and file-upload rules summarized above, we can write a utility class for Android that implements user registration (filling in personal information plus uploading a picture).

First, we need a JavaBean class, FormFile, that encapsulates the file's information:

    public class FormFile {
        /* data of the uploaded file */
        private byte[] data;
        /* file name */
        private String filname;
        /* form field name */
        private String formname;
        /* content type */
        private String contentType = "application/octet-stream"; // look up the proper MIME type as needed

        public FormFile(String filname, byte[] data, String formname, String contentType) {
            this.data = data;
            this.filname = filname;
            this.formname = formname;
            if (contentType != null) this.contentType = contentType;
        }

        public byte[] getData() {
            return data;
        }

        public void setData(byte[] data) {
            this.data = data;
        }

        public String getFilname() {
            return filname;
        }

        public void setFilname(String filname) {
            this.filname = filname;
        }

        public String getFormname() {
            return formname;
        }

        public void setFormname(String formname) {
            this.formname = formname;
        }

        public String getContentType() {
            return contentType;
        }

        public void setContentType(String contentType) {
            this.contentType = contentType;
        }
    }

           
The code that performs the upload is as follows:

/**
 * Submit data directly to the server over HTTP, emulating a form submission
 * @param actionUrl upload URL
 * @param params request parameters; key is the parameter name, value is the parameter value
 * @param files files to upload
 */
public static String post(String actionUrl, Map<String, String> params, FormFile[] files) {
    try {
        String BOUNDARY = "---------7d4a6d158c9"; // boundary separating the parts
        String MULTIPART_FORM_DATA = "multipart/form-data";

        URL url = new URL(actionUrl);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoInput(true);    // allow input
        conn.setDoOutput(true);   // allow output
        conn.setUseCaches(false); // do not use the cache
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Connection", "Keep-Alive");
        conn.setRequestProperty("Charset", "UTF-8");
        conn.setRequestProperty("Content-Type", MULTIPART_FORM_DATA + "; boundary=" + BOUNDARY);

        StringBuilder sb = new StringBuilder();

        // form parameter parts; see the format described earlier in this article
        for (Map.Entry<String, String> entry : params.entrySet()) { // build the form field content
            sb.append("--");
            sb.append(BOUNDARY);
            sb.append("\r\n");
            sb.append("Content-Disposition: form-data; name=\"" + entry.getKey() + "\"\r\n\r\n");
            sb.append(entry.getValue());
            sb.append("\r\n");
        }
        DataOutputStream outStream = new DataOutputStream(conn.getOutputStream());
        outStream.write(sb.toString().getBytes()); // send the form field data

        // file parts; see the format described earlier in this article
        for (FormFile file : files) {
            StringBuilder split = new StringBuilder();
            split.append("--");
            split.append(BOUNDARY);
            split.append("\r\n");
            split.append("Content-Disposition: form-data;name=\"" + file.getFormname() + "\";filename=\"" + file.getFilname() + "\"\r\n");
            split.append("Content-Type: " + file.getContentType() + "\r\n\r\n");
            outStream.write(split.toString().getBytes());
            outStream.write(file.getData(), 0, file.getData().length);
            outStream.write("\r\n".getBytes());
        }
        byte[] end_data = ("--" + BOUNDARY + "--\r\n").getBytes(); // end-of-data marker
        outStream.write(end_data);
        outStream.flush();
        int cah = conn.getResponseCode();
        if (cah != 200) throw new RuntimeException("request to url failed");
        InputStream is = conn.getInputStream();
        int ch;
        StringBuilder b = new StringBuilder();
        while ((ch = is.read()) != -1) {
            b.append((char) ch);
        }
        outStream.close();
        conn.disconnect();
        return b.toString();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
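A usage sketch (not from the original post; the URL, file path, and field names are placeholders, and imports and exception handling are omitted):

    Map<String, String> params = new HashMap<String, String>();
    params.put("name1", "hello");
    params.put("name2", "world");

    // Read a local image into memory (placeholder path)
    File image = new File("/sdcard/avatar.jpg");
    byte[] data = new byte[(int) image.length()];
    FileInputStream fis = new FileInputStream(image);
    fis.read(data);
    fis.close();

    FormFile file = new FormFile("avatar.jpg", data, "file1", "image/jpeg");
    String response = post("http://example.com/servlet/ReceiveFile", params, new FormFile[] { file });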

posted @ 2011-06-09 16:26 ivaneeo

1. Configuration instances with identical settings (set...) can share a single instance across the whole application:

          Create a configuration instance

          First you have to create a freemarker.template.Configuration instance and adjust its settings. A Configuration instance is a central place to store the application level settings of FreeMarker. Also, it deals with the creation and caching of pre-parsed templates.

          Probably you will do it only once at the beginning of the application (possibly servlet) life-cycle:
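For example, a minimal sketch of that one-time setup (the template directory and settings here are illustrative, exception handling is omitted, and the calls assume the classic freemarker.template.Configuration API):

    Configuration cfg = new Configuration();
    // Where the template files live; adjust the path for your application
    cfg.setDirectoryForTemplateLoading(new File("/where/you/store/templates"));
    cfg.setObjectWrapper(new DefaultObjectWrapper());
    cfg.setDefaultEncoding("UTF-8");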

2. Configurations with different settings (set...) should be created as separate, independent instances:

          From now you should use this single configuration instance. Note however that if a system has multiple independent components that use FreeMarker, then of course they will use their own private Configuration instance.

3. A shared Configuration instance also lets you benefit from the MRU cache:

          Multithreading

          In a multithreaded environment Configuration instances, Template instances and data models should be handled as immutable (read-only) objects. That is, you create and initialize them (for example with set... methods), and then you don't modify them later (e.g. you don't call set...). This allows us to avoid expensive synchronized blocks in a multithreaded environment. Beware with Template instances; when you get a Template instance with Configuration.getTemplate, you may get an instance from the template cache that is already used by other threads, so do not call its set... methods (calling process is of course fine).

          The above restrictions do not apply if you access all objects from the same single thread only.

4. Enabling the MRU cache policy

          Template caching

          FreeMarker caches templates (assuming you use the Configuration methods to create Template objects). This means that when you call getTemplate, FreeMarker not only returns the resulting Template object, but stores it in a cache, so when next time you call getTemplate with the same (or equivalent) path, it just returns the cached Template instance, and will not load and parse the template file again.

          cfg.setCacheStorage(new freemarker.cache.MruCacheStorage(20, 250))  

          Or, since MruCacheStorage is the default cache storage implementation:

          cfg.setSetting(Configuration.CACHE_STORAGE_KEY, "strong:20, soft:250");  

When you create a new Configuration object, initially it uses an MruCacheStorage where maxStrongSize is 0, and maxSoftSize is Integer.MAX_VALUE (that is, in practice, infinite). But using non-0 maxStrongSize is maybe a better strategy for high load servers, since it seems that, with only softly referenced items, the JVM tends to cause just higher resource consumption if the resource consumption was already high, because it constantly throws frequently used templates from the cache, which then have to be re-loaded and re-parsed.

5. How the MRU (Most Recently Used) cache automatically refreshes template content

          If you change the template file, then FreeMarker will re-load and re-parse the template automatically when you get the template next time. However, since checking if the file has been changed can be time consuming, there is a Configuration level setting called ``update delay''. This is the time that must elapse since the last checking for a newer version of a certain template before FreeMarker will check that again. This is set to 5 seconds by default. If you want to see the changes of templates immediately, set it to 0. Note that some template loaders may have problems with template updating. For example, class-loader based template loaders typically do not notice that you have changed the template file.
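For instance, to see template changes immediately you could lower the delay to zero (a sketch; setTemplateUpdateDelay takes seconds in the classic Configuration API):

    cfg.setTemplateUpdateDelay(0); // check the template file for changes on every getTemplate() call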

6. The MRU cache's two-level caching strategy

          A template will be removed from the cache if you call getTemplate and FreeMarker realizes that the template file has been removed meanwhile. Also, if the JVM thinks that it begins to run out of memory, by default it can arbitrarily drop templates from the cache. Furthermore, you can empty the cache manually with the clearTemplateCache method of Configuration.

          The actual strategy of when a cached template should be thrown away is pluggable with the cache_storage setting, by which you can plug any CacheStorage implementation. For most users freemarker.cache.MruCacheStorage will be sufficient. This cache storage implements a two-level Most Recently Used cache. In the first level, items are strongly referenced up to the specified maximum (strongly referenced items can't be dropped by the JVM, as opposed to softly referenced items). When the maximum is exceeded, the least recently used item is moved into the second level cache, where they are softly referenced, up to another specified maximum. The size of the strong and soft parts can be specified with the constructor. For example, set the size of the strong part to 20, and the size of soft part to 250:

posted @ 2011-06-09 15:50 ivaneeo

First thing this morning I suddenly found that the -put command could no longer upload data into HDFS and kept reporting a pile of errors, so I ran bin/hadoop dfsadmin -report to check the system status:

          admin@adw1:/home/admin/joe.wangh/hadoop-0.19.2>bin/hadoop dfsadmin -report
          Configured Capacity: 0 (0 KB)
          Present Capacity: 0 (0 KB)
          DFS Remaining: 0 (0 KB)
          DFS Used: 0 (0 KB)
          DFS Used%: ?%

          -------------------------------------------------
          Datanodes available: 0 (0 total, 0 dead)

Shut down Hadoop with bin/stop-all.sh:

          admin@adw1:/home/admin/joe.wangh/hadoop-0.19.2>bin/stop-all.sh
          stopping jobtracker
          172.16.197.192: stopping tasktracker
          172.16.197.193: stopping tasktracker
          stopping namenode
          172.16.197.193: no datanode to stop
          172.16.197.192: no datanode to stop

          172.16.197.191: stopping secondarynamenode

See that? The datanodes were never actually started. Go to a datanode and check its log:

          admin@adw2:/home/admin/joe.wangh/hadoop-0.19.2/logs>vi hadoop-admin-datanode-adw2.hst.ali.dw.alidc.net.log

          ************************************************************/
          2010-07-21 10:12:11,987 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /home/admin/joe.wangh/hadoop/data/dfs.data.dir: namenode namespaceID = 898136669; datanode namespaceID = 2127444065
                  at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:233)
                  at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:148)
                  at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:288)
                  at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:206)
                  at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1239)
                  at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1194)
                  at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1202)
                  at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1324)
          ......

The error says the namespaceIDs are inconsistent.

Two workarounds are given below; I used the second one.

          Workaround 1: Start from scratch

          I can testify that the following steps solve this error, but the side effects won't make you happy (me neither). The crude workaround I have found is to:

          1.     stop the cluster

          2.     delete the data directory on the problematic datanode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml; if you followed this tutorial, the relevant directory is /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data

          3.     reformat the namenode (NOTE: all HDFS data is lost during this process!)

          4.     restart the cluster

If deleting all the HDFS data and starting from scratch does not sound like a good idea (it might be ok during the initial setup/testing), you might give the second approach a try.

          Workaround 2: Updating namespaceID of problematic datanodes

          Big thanks to Jared Stehler for the following suggestion. I have not tested it myself yet, but feel free to try it out and send me your feedback. This workaround is "minimally invasive" as you only have to edit one file on the problematic datanodes:

          1.     stop the datanode

          2.     edit the value of namespaceID in <dfs.data.dir>/current/VERSION to match the value of the current namenode

          3.     restart the datanode

          If you followed the instructions in my tutorials, the full path of the relevant file is /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data/current/VERSION (background: dfs.data.dir is by default set to ${hadoop.tmp.dir}/dfs/data, and we set hadoop.tmp.dir to /usr/local/hadoop-datastore/hadoop-hadoop).

If you wonder what the contents of VERSION look like, here's one of mine:

          #contents of <dfs.data.dir>/current/VERSION

          namespaceID=393514426

          storageID=DS-1706792599-10.10.10.1-50010-1204306713481

          cTime=1215607609074

          storageType=DATA_NODE

          layoutVersion=-13

           

Cause: each namenode format creates a new namespaceID, but tmp/dfs/data still contains the ID from the previous format. Formatting the namenode wipes the namenode's data but does not clear the datanode's data, so the datanode fails to start. What you need to do is clear all the directories under tmp before each format.

posted @ 2011-06-09 14:20 ivaneeo

    private void buildZK() {
        System.out.println("Build zk client");
        try {
            // Connect to ZooKeeper with a 10 second session timeout; this object acts as the Watcher
            zk = new ZooKeeper(zookeeperConnectionString, 10000, this);
            Stat s = zk.exists(rootPath, false);
            if (s == null) {
                // Create the root path and the election parent node if they do not exist yet
                zk.create(rootPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
                zk.create(rootPath + "/ELECTION", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            }
            // Register this host as a candidate with an ephemeral sequential node
            String value = zk.create(rootPath + "/ELECTION/n_", hostAddress, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        } catch (Exception e) {
            e.printStackTrace();
            System.err.println("Error connecting to ZooKeeper");
        }
    }


    public void process(WatchedEvent event) {
        System.out.println(event);
        if (event.getState() == Event.KeeperState.Disconnected || event.getState() == Event.KeeperState.Expired) {
            // Session lost; rebuild the client and re-register
            System.out.println("Zookeeper connection timeout.");
            buildZK();
        }
    }
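The snippet above only registers this host as a candidate under /ELECTION. A common follow-up, sketched here under the assumption that the candidate with the smallest sequence number wins (rootPath, zk and value are the names used above; exception handling omitted; this is not part of the original post):

    // Decide leadership by finding the smallest EPHEMERAL_SEQUENTIAL node under /ELECTION
    List<String> children = zk.getChildren(rootPath + "/ELECTION", false);
    Collections.sort(children);
    String myNode = value.substring(value.lastIndexOf('/') + 1);
    boolean isLeader = children.get(0).equals(myNode);
    System.out.println(isLeader ? "I am the leader" : "Following " + children.get(0));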
posted @ 2011-06-09 13:38 ivaneeo

Modify the configuration


Copy conf/zoo_sample.cfg to conf/zoo.cfg and change the data directory in it.

# cat /opt/apps/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/opt/zkdata
clientPort=2181

The relevant settings are:

• tickTime: the heartbeat interval used between ZooKeeper servers, and between servers and clients, in milliseconds.
• initLimit: the initial delay for leader election. Because a server needs some time to load its data at startup (especially when there is a lot of configuration data), time must be allowed to finish initialization before data is synchronized right after a leader is elected. It can be increased if needed. The delay is initLimit*tickTime, i.e. the value is expressed as a number of ticks.
• syncLimit: the maximum response time between the leader and a follower. If this limit (syncLimit*tickTime) is exceeded, the leader considers the follower dead and removes it from the server list.

In standalone mode only the three parameters tickTime/dataDir/clientPort are needed, which is handy for single-machine debugging.

Cluster configuration


Add the entries for the other machines:

# cat /opt/apps/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/opt/zkdata
clientPort=2181
server.1=10.11.5.202:2888:3888
server.2=192.168.105.218:2888:3888
server.3=192.168.105.65:2888:3888

The server.X entries give each machine's parameters. X is a unique sequence number (e.g. 1/2/3), and the value is IP:PORT:PORT. The IP is the ZooKeeper server's IP address or host name; the first PORT (e.g. 2888) is the port used for data exchange between servers, i.e. the port followers use to connect to the leader, and the second port (e.g. 3888) is the port the servers use to elect a leader. To run a cluster on a single machine, use different ports for each instance.

Synchronize the installation directory

# rsync --inplace -vzrtLp --delete-after --progress /opt/apps/zookeeper root@192.168.105.218:/opt/apps
# rsync --inplace -vzrtLp --delete-after --progress /opt/apps/zookeeper root@192.168.106.65:/opt/apps

Create each server's id

Note that this id must match the server.X entries in zoo.cfg.

ssh root@10.11.5.202 'echo 1 > /opt/zkdata/myid'
ssh root@192.168.105.218 'echo 2 > /opt/zkdata/myid'
ssh root@192.168.106.65 'echo 3 > /opt/zkdata/myid'

Start the servers


ssh root@10.11.5.202 '/opt/apps/zookeeper/bin/zkServer.sh start'
ssh root@192.168.105.218 '/opt/apps/zookeeper/bin/zkServer.sh start'
ssh root@192.168.106.65 '/opt/apps/zookeeper/bin/zkServer.sh start'

Firewall configuration


If the iptables firewall is enabled, add the following rules to /etc/sysconfig/iptables:

-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2181 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2888 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 3888 -j ACCEPT

Restart the firewall:

          service iptables restart 
posted @ 2011-06-08 18:07 ivaneeo

Lately I've often been posing questions to myself and then finding the answers by googling, reading code, and testing; maybe these questions can help others too.

Quora is a nice Q&A site; go take a look at the topics that interest you.

1) What does the TTL parameter in HBase mean?
TTL == "Time To Live". You can specify how long a cell lives in HBase.
Once its "TTL" has expired, it is removed.
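As a sketch (not in the original notes), TTL can also be set per column family through the Java admin API; the table and family names below are made up, and admin is assumed to be an HBaseAdmin connected to the cluster:

    HColumnDescriptor hcd = new HColumnDescriptor("cf");
    hcd.setTimeToLive(86400);                 // cells in this family expire after one day (value in seconds)
    HTableDescriptor htd = new HTableDescriptor("t1");
    htd.addFamily(hcd);
    admin.createTable(htd);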
2) Which configuration parameters affect read performance?
hbase-env.sh:
export HBASE_HEAPSIZE=4000
hbase-default.xml:
hfile.block.cache.size
3) Does HBase update the LruBlockCache during write operations?
Judging from the code, writes do not update the LruBlockCache!
4) How do you mark an HBase CF as IN_MEMORY?
The CF attributes can be specified when creating the table: create 'taobao', {NAME => 'edp', IN_MEMORY => true}
5) The smallest unit HBase loads into the cache at a time is a block.
6) If a block is loaded into the cache once and never read again, it contributes nothing to the block cache
hit ratio, so why is the block cache hit ratio still 60%+? (This was a misunderstanding on my part; my
reasoning had plenty of holes.)
Note that the unit of measurement for the block cache hit ratio is the record, while the smallest cache unit is the block.
A block contains many records, and later records benefit from the cache filled when the first record was read,
which is why the block cache hit ratio reaches 60%+.


7) With only one row and one CF, will writing a very large amount of data trigger a region split?

<property>
  <name>hbase.hregion.max.filesize</name>
  <value>67108864</value>
  <description>
  Maximum HStoreFile size. If any one of a column families' HStoreFiles has
  grown to exceed this value, the hosting HRegion is split in two.
  Default: 256M.
  </description>
</property>

Test: after setting hbase.hregion.max.filesize to 64M, I created a table with only one CF and, during the test, pushed data into just one row + CF. The data volume was roughly 80M (the web UI reported 107M), yet no region split happened. This shows that the smallest unit of a region split is the row key: since there is only one row here, no split occurs even though the data volume has gone past the threshold.

posted @ 2011-06-08 18:02 ivaneeo

          Hi all.
I have a ThinkPad T60 with 9.10 installed. I did some searching on the forums and found the workaround with the tpb package to fix the ThinkPad volume buttons issue.
My problems with that fix are:
-the tpb package depends on xosd (or whatever it's called, NOT Notify-OSD), so the result is not the best...
-the tpb package is not necessary at all, because the thinkpad_acpi module can take care of the volume buttons as well, you just have to enable the hotkey mask! http://www.thinkwiki.org/wiki/Thinkpad-acpi

          So my workaround on T60 (in terminal):
          9.04 jaunty:
          Code:
          sudo echo enable,0x00ffffff > /proc/acpi/ibm/hotkey
          9.10 karmic: (using sysfs): (also works on 10.04 and 10.10 as well...)
          Code:
          sudo cp /sys/devices/platform/thinkpad_acpi/hotkey_all_mask /sys/devices/platform/thinkpad_acpi/hotkey_mask
          Update:
          The solutions only works till next reboot or suspend/resume cycle.
          you should put the commands in:
          /etc/rc.local
          without sudo of course, to make it permanent.


Please confirm if the solution works on other ThinkPad models.

As soon as I find solutions for all the things I need on my T60 I will put them up on ThinkWiki and paste the link here.
(Active protection - hdaps)
(Trackpoint additional functions - you just have to install the gpointing-device-settings package)
(fingerprint reader - thinkfinger)

Hope it helps someone.
posted @ 2011-05-31 15:16 ivaneeo

          Distributed File Systems (DFS) are a new type of file systems which provides some extra features over normal file systems and are used for storing and sharing files across wide area network and provide easy programmatic access. File Systems like HDFS from Hadoop and many others falls in the category of distributed file systems and has been widely used and are quite popular.

This tutorial provides a step by step guide for accessing and using a distributed file system for storing and retrieving data using Java. The Hadoop Distributed File System has been used for this tutorial because it is freely available, easy to set up, and is one of the most popular and well known distributed file systems. The tutorial demonstrates how to access the Hadoop distributed file system using Java, showing all the basic operations.

          Introduction
          Distributed File Systems (DFS) are a new type of file systems which provides some extra features over normal file systems and are used for storing and sharing files across wide area network and provide easy programmatic access. 

          Distributed file system is used to make files distributed across multiple servers appear to users as if they reside in one place on the network. Distributed file system allows administrators to consolidate file shares that may exist on multiple servers to appear as if they all are in the same location so that users can access them from a single point on the network. 
          HDFS stands for Hadoop Distributed File System and is a distributed file system designed to run on commodity hardware. Some of the features provided by Hadoop are:
          •    Fault tolerance: Data can be replicated, so if any of the servers goes down, resources still will be available for user.
•    Resource management and accessibility: Users do not need to know the physical location of the data; they can access all the resources through a single point. HDFS also provides a web browser interface to view the contents of files.
          •    It provides high throughput access to application data.

          This tutorial will demonstrate how to use HDFS for basic distributed file system operations using Java. Java 1.6 version and Hadoop driver has been used (link is given in Pre-requisites section). The development environment consists of Eclipse 3.4.2 and Hadoop 0.19.1 on Microsoft Windows XP – SP3.


          Pre-requisites

          1.      Hadoop-0.19.1 installation - here and here -

          2.      Hadoop-0.19.1-core.jar file

          3.      Commons-logging-1.1.jar file

          4.      Java 1.6

          5.      Eclipse 3.4.2



          Creating New Project and FileSystem Object

          First step is to create a new project in Eclipse and then create a new class in that project. 
          Now add all the jar files to the project, as mentioned in the pre-requisites.
          First step in using or accessing Hadoop Distributed File System (HDFS) is to create file system object.
          Without creating an object you cannot perform any operations on the HDFS, so file system object is always required to be created.
          Two input parameters are required to create object. They are “Host name” and “Port”. 
          Code below shows how to create file system object to access HDFS. 

          Configuration config = new Configuration();

          config.set("fs.default.name","hdfs://127.0.0.1:9000/");

          FileSystem dfs = FileSystem.get(config);


          Here Host name = “127.0.0.1” & Port = “9000”.

          Various HDFS operations

          Now we will see various operations that can be performed on HDFS.

          Creating Directory

          Now we will start with creating a directory.
          First step for using HDFS is to create a directory where we will store our data. 
          Now let us create a directory named “TestDirectory”.

          String dirName = "TestDirectory";

          Path src = new Path(dfs.getWorkingDirectory()+"/"+dirName);

          dfs.mkdirs(src);

Here the dfs.getWorkingDirectory() function returns the path of the working directory, which is the basic working directory; all the data will be stored inside this directory. mkdirs() accepts an object of type Path, so as shown above a Path object is created first. The directory needs to be created inside the basic working directory, so the Path object is created accordingly. The dfs.mkdirs(src) call will create a directory in the working folder with the name "TestDirectory".

          Sub directories can also be created inside the “TestDirectory”; in that case path specified during creation of Path object will change. For example a directory named “subDirectory” can be created inside directory “TestDirectory” as shown in below code.

          String subDirName = "subDirectory";

          Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/"+ subDirName);

          dfs.mkdirs(src);

          Deleting Directory or file

          Existing directory in the HDFS can be deleted. Below code shows how to delete the existing directory.

          String dirName = "TestDirectory";

          Path src = new Path(dfs.getWorkingDirectory()+"/"+dirName);

dfs.delete(src);


          Please note that delete() method can also be used to delete files. What needs to be deleted should be specified in the Path object.

          Copying file to/from HDFS from/to Local file system

          Basic aim of using HDFS is to store data, so now we will see how to put data in HDFS.
          Once directory is created, required data can be stored in HDFS from the local file system.
          So consider that a file named “file1.txt” is located at “E:\HDFS” in the local file system, and it is required to be copied under the folder “subDirectory” (that was created earlier) in HDFS.
          Code below shows how to copy file from local file system to HDFS.

          Path src = new Path("E://HDFS/file1.txt");

          Path dst = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/");

          dfs.copyFromLocalFile(src, dst);


          Here src and dst are the Path objects created for specifying the local file system path where file is located and HDFS path where file is required to be copied respectively. copyFromLocalFile() method is used for copying file from local file system to HDFS.

          Similarly, file can also be copied from HDFS to local file system. Code below shows how to copy file from HDFS to local file system.

          Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/file1.txt");

          Path dst = new Path("E://HDFS/");

          dfs.copyToLocalFile(src, dst);

          Here copyToLocalFile() method is used for copying file from HDFS to local file system.


          Creating a file and writing data in it

          It is also possible to create a file in HDFS and write data in it. So if required instead of directly copying the file from the local file system, a file can be first created and then data can be written in it.
          Code below shows how to create a file name “file2.txt” in HDFS directory.

          Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/file2.txt");

          dfs.createNewFile(src);


          Here createNewFile() method will create the file in HDFS based on the input provided in src object.

          Now as the file is created, data can be written in it. Code below shows how to write data present in the “file1.txt” of local file system to “file2.txt” of HDFS.

          Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/file2.txt");

          FileInputStream fis = new FileInputStream("E://HDFS/file1.txt");

          int len = fis.available();

          byte[] btr = new byte[len];

          fis.read(btr);

          FSDataOutputStream fs = dfs.create(src);

          fs.write(btr);

          fs.close();


          Here write() method of FSDataOutputStream is used to write data in file located in HDFS.

          Reading data from a file

          It is always necessary to read the data from file for performing various operations on data. It is possible to read data from the file which is stored in HDFS. 
          Code below shows how to retrieve data from the file present in the HDFS. Here data is read from the file (file1.txt) which is present in the directory (subDirectory) that was created earlier.

          Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/file1.txt");

          FSDataInputStream fs = dfs.open(src);

          String str = null;

while ((str = fs.readLine()) != null)
          {
          System.out.println(str);
          }


Here the readLine() method of FSDataInputStream is used to read data from the file located in HDFS. Also, src is the Path object used to specify the path of the file in HDFS which has to be read.

          Miscellaneous operations that can be performed on HDFS

          Below are some of the basic operations that can be performed on HDFS.

Below is the code that can be used to check whether a particular file or directory exists in HDFS. If it exists, it returns true, and if it doesn't exist it returns false. The dfs.exists() method is used for this.

          Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/HDFS/file1.txt");

          System.out.println(dfs.exists(src));

Below is the code that can be used to check the default block size into which a file would be split. It returns the block size as a number of bytes. The dfs.getDefaultBlockSize() method is used for this.

          System.out.println(dfs.getDefaultBlockSize());

To check the default replication factor, the dfs.getDefaultReplication() method can be used, as shown below.

          System.out.println(dfs.getDefaultReplication());

To check whether a given path is an HDFS directory or a file, the dfs.isDirectory() or dfs.isFile() methods can be used, as shown below.

          Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/file1.txt");
          System.out.println(dfs.isDirectory(src));
          System.out.println(dfs.isFile(src));

          Conclusion
So we just learned some of the basics of the Hadoop Distributed File System: how to create and delete a directory, how to copy a file to/from HDFS from/to the local file system, how to create and delete a file inside a directory, how to write data to a file, and how to read data from a file. We also saw various other operations that can be performed on HDFS. From what we have done, we can say that HDFS is easy to use for data storage and retrieval.

          References:
          http://hadoop.apache.org/common/docs/current/hdfs_design.html

          http://en.wikipedia.org/wiki/Hadoop

posted @ 2011-05-17 10:43 ivaneeo

This article mainly introduces the election of the leader among ZooKeeper servers. ZooKeeper uses the Paxos algorithm (mainly Fast Paxos) when electing a leader; two implementations are covered here: LeaderElection and FastLeaderElection.

First we need to be clear on the following points:

• How does one server know about the other servers?


In ZooKeeper, the number of servers in a cluster is fixed, and every server's election IP and PORT are listed in the configuration file.

• Besides IP and PORT, is there another way to identify a server?

Every server has a unique numeric ID. Each server is numbered according to the configuration file; this step is done manually at deployment time by creating a file named myid in the data directory and writing the server's own number into it. This number is very useful when several servers submit the same value.

• What is the necessary condition for becoming leader?

Obtaining the agreement of n/2 + 1 servers (meaning that n/2 + 1 servers must agree on the server whose zxid is the largest of all servers).

• Do ZooKeeper elections use UDP or TCP?

Elections in ZooKeeper mainly use UDP; there is also an implementation that uses TCP. The two implementations introduced here use UDP.

• What states exist in ZooKeeper?

LOOKING: initial state

LEADING: leader state

FOLLOWING: follower state

• If all zxids are the same (for example, right after initialization), it may not be possible to gather n/2+1 servers behind one candidate. What then?

Every server in ZooKeeper has a unique ID, and the IDs can be ordered by size; in this situation, ZooKeeper recommends the server with the largest ID as the leader.

• How does the leader know a follower is still alive, and how does a follower know the leader is still alive?

The leader periodically pings the followers and the followers periodically ping the leader; when a follower finds that the leader can no longer be pinged, it changes its own state to LOOKING and starts a new round of election.

Terminology

zookeeper Server: one server in the ZooKeeper ensemble, referred to simply as "Server" below

zxid (zookeeper transaction id): the ZooKeeper transaction id. It is the key factor in whether a server can become leader during an election, and it determines which server the current server casts its vote for (it is the "value" I propose during the election; zxid is one component, the id is the other).

myid/id (zookeeper server id): the ZooKeeper server id, also a factor in whether a server can become leader

epoch/logicalclock: mainly describes whether the leader has changed. Every server has an epoch at startup, initially 0; the epoch is incremented by 1 when a new election starts and incremented by 1 again when the election completes.

tag/sequencer: message number

xid: a randomly generated number with the same function as the epoch

Fast Paxos message flow compared with Basic Paxos

Message flow diagrams

• Basic Paxos message flow
Client   Proposer      Acceptor     Learner
|         |          |  |  |       |  |
X-------->|          |  |  |       |  |  Request
|         X--------->|->|->|       |  |  Prepare(N)            // propose to all servers
|         |<---------X--X--X       |  |  Promise(N,{Va,Vb,Vc}) // reply to the proposer whether the proposal is accepted (if not, go back to the previous step)
|         X--------->|->|->|       |  |  Accept!(N,Vn)         // send the accept-this-proposal message to everyone
|         |<---------X--X--X------>|->|  Accepted(N,Vn)        // reply to the proposer that the proposal has been accepted
|<---------------------------------X--X  Response
|         |          |  |  |       |  |
          
• Fast Paxos message flow

Election process without conflicts

Client    Leader         Acceptor      Learner
|         |          |  |  |  |       |  |
|         X--------->|->|->|->|       |  |  Any(N,I,Recovery)
|         |          |  |  |  |       |  |
X------------------->|->|->|->|       |  |  Accept!(N,I,W)  // propose to all servers; every server accepts the proposal on receipt
|         |<---------X--X--X--X------>|->|  Accepted(N,I,W) // tell the proposer that the proposal has been accepted
|<------------------------------------X--X  Response(W)
|         |          |  |  |  |       |  |
          

First implementation: LeaderElection

LeaderElection is the simplest implementation of Fast Paxos. After starting, each server asks all other servers whom they intend to vote for; after receiving all the replies, it works out which server has the largest zxid and records that server's information as the server it will vote for next time.


Each server has a response thread and an election thread; let's first look at what each thread does.

The response thread

Its main job is to passively accept requests from peers and reply according to its own current state. Each reply carries the server's own id and the xid. Depending on its state it replies with the following:

LOOKING state:

Information about the server it currently recommends (id, zxid)

LEADING state:

Its own myid and the id of the server it recommended last time

FOLLOWING state:

The id of the current leader, and the id of the last processed transaction (zxid)

The election thread

The election thread is the thread on the current server that initiates the election. Its main job is to tally the votes and pick the recommended server. The election thread first sends a query to every server (including itself); each queried server replies according to its own current state. When the election thread receives a reply, it verifies that the reply belongs to the query it issued (by checking that the xid matches), then records the peer's id (myid) in the list of queried peers, and finally records the leader information the peer proposes (id, zxid) in the vote table of the current election round. After all servers have been queried, the results are filtered and tallied to work out which server wins this round, and the server with the largest zxid is set as the server the current server will recommend (it may be itself or another server, depending on the vote, but every server votes for itself in the first round). If the winning server obtains n/2 + 1 votes, the currently recommended leader is set to that server, and the local state is set according to the winning server's information. Every server repeats this process until a leader is elected.

Now that we know what each thread does, let's look at the election process.

• A server joining during the election

Whenever a server starts it initiates an election, and the election thread carries out the process described above, so every server learns which server currently has the largest zxid. If that server does not obtain n/2+1 votes in this round, then in the next round each server votes for the server with the largest zxid; repeating this process, a leader is eventually elected.

• A server leaving during the election

As long as n/2+1 servers remain alive there is no problem at all; if fewer than n/2+1 servers are alive, no leader can be elected.

• The leader dying during the election

Once a leader has been elected, the state each server should be in (FOLLOWING) is already determined. Since the leader has died, we simply ignore it; the other followers continue with the normal flow. When that flow completes, all followers send ping messages to the leader, and if the leader cannot be reached they change their state (FOLLOWING ==> LOOKING) and start a new round of election.

• The leader dying after the election has completed

This is handled the same way as a leader dying during the election, so it is not described again here.

Second implementation: FastLeaderElection

FastLeaderElection is a standard implementation of Fast Paxos. A server first proposes to all servers that it wants to become leader; when the other servers receive the proposal, they resolve any epoch and zxid conflicts, accept the proposal, and then send back a message saying the proposal has been accepted.

Data structures

Local message structure:

static public class Notification {
    long leader;   // id of the recommended server
    long zxid;     // zxid of the recommended server (zookeeper transaction id)
    long epoch;    // describes whether the leader has changed (every server starts with a logicalclock, initially 0)
    QuorumPeer.ServerState state;   // current state of the sender
    InetSocketAddress addr;         // IP address of the sender
}

Network message structure:

static public class ToSend {
    int type;       // message type
    long leader;    // server id
    long zxid;      // zxid of the server
    long epoch;     // epoch of the server
    QuorumPeer.ServerState state;   // state of the server
    long tag;       // message number
    InetSocketAddress addr;
}

How each server implements it

Each server has a receiving thread pool (3 threads) and a sending thread pool (3 threads). When no election is in progress, both pools are blocked and only wake up to process messages as they arrive. In addition, each server has an election thread (the thread that can initiate an election). Let's look at what each thread does:

Handling on the passive receiving side (the receiving thread pool):

notification: first check whether the incoming recommendation beats the zxid and epoch currently recommended on this server, i.e. whether currentServer.epoch <= currentMsg.epoch && (currentMsg.zxid > currentServer.zxid || (currentMsg.zxid == currentServer.zxid && currentMsg.id > currentServer.id)). If it does, update the values recommended by the current server with the message's zxid, epoch and id. The received message is then converted into a Notification and put into the receive queue, and an ack message is sent back to the peer. (A small sketch of this comparison follows below.)
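A minimal sketch of that comparison predicate (the field and variable names simply mirror the pseudocode above, not the actual ZooKeeper source):

    // Does the incoming vote (msgEpoch, msgZxid, msgId) beat the vote this server currently recommends?
    static boolean messageWins(long epoch, long zxid, long id,
                               long msgEpoch, long msgZxid, long msgId) {
        return epoch <= msgEpoch
                && (msgZxid > zxid
                    || (msgZxid == zxid && msgId > id));
    }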

ack: put the message number into the ack queue and check whether the peer's state is LOOKING. If it is not, a leader has already been elected, and the received message is converted into a Notification and put into the receive queue.

Handling on the active sending side (the sending thread pool):

notification: convert the message to be sent from a Notification into a ToSend message, send it to the peer, and wait for the reply. If no reply arrives before the wait expires, retry up to three times; if there is still no reply after the retries, check whether the current election (epoch) has changed. If it has not, put the message back into the send queue, and keep repeating this until a leader is elected or a reply is received.

ack: mainly sends the server's own information to the peer.

Handling on the side that initiates the election (the election thread):

First increment its own epoch by 1, then generate notification messages and put them into the send queue; as many messages are generated as there are servers configured in the system, to make sure every server receives one. While the current server's state is LOOKING, it keeps looping and checking whether the receive queue holds messages; when a message arrives, it is handled according to the peer state recorded in the message.

LOOKING state:

First check whether the epoch in the message is valid, i.e. larger than the current server's epoch. If it is larger, update the epoch, check whether the zxid and id in the message beat those of the currently recommended server, and if so update the corresponding values, generate new notification messages into the send queue, and clear the vote tally table. If the message carries a smaller epoch, do nothing. If the epochs are equal, check whether the zxid and id in the message win; if they do, update the current server's information and generate new notification messages into the send queue. Then record the sender's IP and its vote in the tally table, compute the tally, and set the local state according to the result.

LEADING state:

Record the sender's IP and its vote in the tally table (a separate table in this case), compute the tally, and set the local state according to the result.

FOLLOWING state:

Record the sender's IP and its vote in the tally table (a separate table in this case), compute the tally, and set the local state according to the result.

Now that we know what each thread does, let's look at the election process, which is the same as in the first implementation.

• A server joining during the election

Whenever a server starts it initiates an election; the election thread carries out the process by telling the other servers its own zxid and epoch. In the end every server learns the information of the server with the largest zxid and votes for that server in the next round; repeating this process, a leader is eventually elected.

• A server leaving during the election

As long as n/2+1 servers remain alive there is no problem at all; if fewer than n/2+1 servers are alive, no leader can be elected.

• The leader dying during the election

Once a leader has been elected, the state each server should be in (FOLLOWING) is already determined. Since the leader has died, we simply ignore it; the other followers continue with the normal flow. When that flow completes, all followers send ping messages to the leader, and if the leader cannot be reached they change their state (FOLLOWING ==> LOOKING) and start a new round of election.

• The leader dying after the election has completed

This is handled the same way as a leader dying during the election, so it is not described again here.

posted @ 2011-05-05 13:16 ivaneeo

Abstract: Introduction to ZooKeeper. ZooKeeper is an open-source distributed service that provides distributed coordination, distributed synchronization, configuration management, and more. What it implements is essentially the same as Google's Chubby. The official ZooKeeper website has a classic overview article; see: ZooKeeper: A Distributed Coordination Service for Distributed Applications. Here I...  (read the full article)
posted @ 2011-05-05 13:15 ivaneeo
