
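Hadoop: Unit Testing with MRUnit

Introduction

In the spirit of the year-end festivities, I am continuing my Hadoop series. This article shows how to unit test Hadoop MapReduce code. The MapReduce development cycle goes roughly like this: write the mapper and reducer, compile, package, submit the job, and retrieve the results. The process is tedious, and once a problem shows up after submission to the distributed environment, repeating the whole cycle just to locate and debug it is no fun. Unit testing the MapReduce code first, to weed out obvious bugs, is therefore well worth the effort.

About MRUnit

MRUnit is a framework developed by Cloudera specifically for writing unit tests for MapReduce code on Hadoop. MapDriver tests a Map on its own, ReduceDriver tests a Reduce on its own, and MapReduceDriver tests a MapReduce job.

Hands-on

We will use MRUnit to unit test the word count feature from the previous article in this series, Basic MapReduce Programming.

· Add the MRUnit dependency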

<dependency>
    <groupId>com.cloudera.hadoop</groupId>
    <artifactId>hadoop-mrunit</artifactId>
    <!-- add a <version> element matching your Hadoop distribution -->
</dependency>

· Testing the Map on its own

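import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mrunit.MapDriver;
import org.junit.Before;
import org.junit.Test;

// The imports assume the old org.apache.hadoop.mapred API.
// Raw types are kept as in the listing; parameterize them to match WordCountMapper.
public class WordCountMapperTest {

    private Mapper mapper;
    private MapDriver driver;

    @Before
    public void init() {
        mapper = new WordCountMapper();
        driver = new MapDriver(mapper);
    }

    @Test
    public void test() throws IOException {
        String line = "Taobao is a great website";
        // One input record (null key, the line as value) and one expected (word, 1) pair per word.
        driver.withInput(null, new Text(line))
              .withOutput(new Text("Taobao"), new IntWritable(1))
              .withOutput(new Text("is"), new IntWritable(1))
              .withOutput(new Text("a"), new IntWritable(1))
              .withOutput(new Text("great"), new IntWritable(1))
              .withOutput(new Text("website"), new IntWritable(1))
              .runTest();
    }
}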

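The example above uses MapDriver's withInput and withOutput to set up the map function's input key/value pair and the expected output key/value pairs, then calls runTest to run the job and test the Map function. The test passes.

· Testing the Reduce on its own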

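import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mrunit.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

// The imports assume the old org.apache.hadoop.mapred API.
public class WordCountReducerTest {

    private Reducer reducer;
    private ReduceDriver driver;

    @Before
    public void init() {
        reducer = new WordCountReducer();
        driver = new ReduceDriver(reducer);
    }

    @Test
    public void test() throws IOException {
        String key = "taobao";
        List<IntWritable> values = new ArrayList<IntWritable>();
        values.add(new IntWritable(2));
        values.add(new IntWritable(3));
        // The reducer should sum the values for the key: 2 + 3 = 5.
        driver.withInput(new Text(key), values)
              .withOutput(new Text(key), new IntWritable(5))
              .runTest();
    }
}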

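The example above tests the reduce function and is written much like the Map test. Because the reduce function adds values up, we assume the input <taobao,[2,3]>, so the expected result is <taobao,5>. The test passes.

· Testing MapReduce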

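import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mrunit.MapReduceDriver;
import org.junit.Before;
import org.junit.Test;

// The imports assume the old org.apache.hadoop.mapred API.
public class WordCountTest {

    private Mapper mapper;
    private Reducer reducer;
    private MapReduceDriver driver;

    @Before
    public void init() {
        mapper = new WordCountMapper();
        reducer = new WordCountReducer();
        driver = new MapReduceDriver(mapper, reducer);
    }

    @Test
    public void test() throws RuntimeException, IOException {
        String line = "Taobao is a great website, is it not?";
        // Expected reducer output, listed in sorted key order.
        driver.withInput("", new Text(line))
              .withOutput(new Text("Taobao"), new IntWritable(1))
              .withOutput(new Text("a"), new IntWritable(1))
              .withOutput(new Text("great"), new IntWritable(1))
              .withOutput(new Text("is"), new IntWritable(2))
              .withOutput(new Text("it"), new IntWritable(1))
              .withOutput(new Text("not"), new IntWritable(1))
              .withOutput(new Text("website"), new IntWritable(1))
              .runTest();
    }
}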

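This time we test the MapReduce job as a whole: MapReduceDriver's withInput supplies the map function's input key/value pair, and withOutput supplies the reduce function's expected output key/value pairs. Running this word count test threw an exception and the test failed, but JUnit gave no detailed exception information. The console showed:

2010-11-5 11:14:08 org.apache.hadoop.mrunit.TestDriver lookupExpectedValue SEVERE: Received unexpected output (not?, 1)

2010-11-5 11:14:08 org.apache.hadoop.mrunit.TestDriver lookupExpectedValue SEVERE: Received unexpected output (website,, 1)

2010-11-5 11:14:08 org.apache.hadoop.mrunit.TestDriver validate SEVERE: Missing expected output (not, 1) at position 5

2010-11-5 11:14:08 org.apache.hadoop.mrunit.TestDriver validate SEVERE: Missing expected output (website, 1) at position 6

Something has clearly gone wrong, but the console log is not very easy to read. So we change the test code: instead of calling runTest, we call run to obtain the output and then compare it with the expected result. MRUnit provides the helper class org.apache.hadoop.mrunit.testutil.ExtendedAssert, whose assertListEquals method can assert on the output.

The refactored test code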

// Needs: import static org.apache.hadoop.mrunit.testutil.ExtendedAssert.assertListEquals;
//        import org.apache.hadoop.mrunit.types.Pair; plus java.util.ArrayList and java.util.List.
@Test
public void test() throws RuntimeException, IOException {
    String line = "Taobao is a great website, is it not?";
    // run() returns the actual output instead of validating it internally.
    List<Pair> out = driver.withInput("", new Text(line)).run();

    List<Pair> expected = new ArrayList<Pair>();
    expected.add(new Pair(new Text("Taobao"), new IntWritable(1)));
    expected.add(new Pair(new Text("a"), new IntWritable(1)));
    expected.add(new Pair(new Text("great"), new IntWritable(1)));
    expected.add(new Pair(new Text("is"), new IntWritable(2)));
    expected.add(new Pair(new Text("it"), new IntWritable(1)));
    expected.add(new Pair(new Text("not"), new IntWritable(1)));
    expected.add(new Pair(new Text("website"), new IntWritable(1)));

    // Compares the two lists element by element and reports the first mismatch.
    assertListEquals(expected, out);
}

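Running it again, the test still fails, but now with a clear assertion message:

java.lang.AssertionError: Expected element (not, 1) at index 5 != actual element (not?, 1)

The assertion shows that the actual output is "not?" rather than the "not" we expected. Why? Checking the Map function reveals that the program splits on spaces and does not account for punctuation. Ha, a bug found; time to fix it. This also shows why unit testing matters: imagine a more complex computation sent straight to a distributed cluster without unit tests. When the results came out wrong, the problem would not be nearly this easy to locate.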
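One possible fix for the punctuation bug is to split on anything that is not a letter or digit, instead of on spaces only. The sketch below is plain Java with no Hadoop dependency. The class WordTokenizer and its tokenize method are hypothetical names, not code from the original WordCountMapper, and the separator pattern is an assumption about what should count as a word.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original WordCountMapper.
class WordTokenizer {

    // Splits a line into words, treating any run of characters that are
    // neither letters nor digits (spaces, commas, question marks, ...) as a separator.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.split("[^\\p{L}\\p{N}]+")) {
            // A separator at the start of the line yields one empty token; skip it.
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [Taobao, is, a, great, website, is, it, not]
        System.out.println(tokenize("Taobao is a great website, is it not?"));
    }
}
```

Inside the mapper, each returned word would then be emitted with a count of 1. Note that this pattern also splits contractions such as "isn't" into two tokens, which may or may not be what you want.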

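Summary

Unit testing with MRUnit comes down to a few points: use MapDriver to test a Map on its own, ReduceDriver to test a Reduce on its own, and MapReduceDriver to test a MapReduce job; prefer calling run to obtain the output and comparing it with the expected result over calling runTest; and use org.apache.hadoop.mrunit.testutil.ExtendedAssert.assertListEquals to assert on the result.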
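To illustrate why comparing lists gives a clearer failure than the console log, here is a rough plain-Java sketch of an element-by-element list assertion. It is not MRUnit's implementation: ListAssertSketch, firstDifference and assertSameElements are hypothetical names, and the message format simply imitates the assertion message shown earlier.

```java
import java.util.List;
import java.util.Objects;

// Illustrative sketch only; MRUnit's ExtendedAssert may work differently.
class ListAssertSketch {

    // Returns null when both lists hold equal elements in the same order,
    // otherwise a message describing the first difference found.
    static String firstDifference(List<?> expected, List<?> actual) {
        int common = Math.min(expected.size(), actual.size());
        for (int i = 0; i < common; i++) {
            if (!Objects.equals(expected.get(i), actual.get(i))) {
                return "Expected element " + expected.get(i) + " at index " + i
                        + " != actual element " + actual.get(i);
            }
        }
        if (expected.size() != actual.size()) {
            return "Expected " + expected.size() + " elements but got " + actual.size();
        }
        return null;
    }

    // Throws AssertionError carrying the first difference, if there is one.
    static void assertSameElements(List<?> expected, List<?> actual) {
        String difference = firstDifference(expected, actual);
        if (difference != null) {
            throw new AssertionError(difference);
        }
    }
}
```

Because the failure names the index and both elements, a mismatch such as "not" versus "not?" is visible at a glance.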

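If you have made it this far, I am delighted, though I would bet you only skimmed the large blocks of code above. That is normal: not everyone is interested in hands-on test code, or only when they actually need it. To thank you for your attention, here is a little secret: this article is about more than how to unit test MapReduce. Reading the test code gives you a deeper understanding of how MapReduce works. The inputs and expected results in the tests show exactly what map and reduce take in and put out, where the results get sorted, and other such details.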
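The data flow that the tests reveal can be modelled in a few lines of plain Java. This is a simplified sketch, not Hadoop or MRUnit code: WordCountSimulation and its methods are hypothetical names, the map step splits on whitespace like the buggy mapper described above, and a TreeMap stands in for the sort-and-group step that happens between map and reduce.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified single-process model of map -> sort/group -> reduce.
class WordCountSimulation {

    // Map: emit one (word, 1) pair per whitespace-separated token, in input order.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // Sort/group: collect all values per key; TreeMap keeps the keys sorted,
    // so uppercase "Taobao" comes before lowercase "a".
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three steps and returns (word, count) in sorted key order.
    static Map<String, Integer> run(String line) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(map(line)).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {Taobao=1, a=1, great=1, is=2, it=1, not?=1, website,=1}
        System.out.println(run("Taobao is a great website, is it not?"));
    }
}
```

The output order matches the order of the expected values in the MapReduce test, and the keys "not?" and "website," show the same punctuation problem that the failing test exposed.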

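Unit tests are essential because they let you find and locate problems earlier and more easily, but unit tests alone are not enough. We also need integration tests for MapReduce, and before running those you need to know how to get a MapReduce job running on a Hadoop cluster. Later articles in this series will cover that.

Posted on 2014-01-29 10:44 by 順其自然EVO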
