Introduction
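In the spirit of the year-end feasting season, I am continuing my Hadoop series. This article explains how to unit test Hadoop MapReduce code. The MapReduce development cycle looks roughly like this: write the mapper and reducer, compile, package, submit the job, and retrieve the results. The process is tedious, and once a job fails in the distributed environment, locating and debugging the problem means repeating the whole cycle. That gets old fast, so it is well worth unit testing MapReduce code first to eliminate the obvious bugs.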
Introduction to MRUnit
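MRUnit is a framework developed by Cloudera specifically for unit testing MapReduce code written for Hadoop. Use MapDriver to test a Map on its own, ReduceDriver to test a Reduce on its own, and MapReduceDriver to test a complete MapReduce job.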
Hands-on
We will use MRUnit to unit test the word count program from the previous article in this series, "MapReduce Basic Programming".
· Add the MRUnit dependency
<dependency>
    <groupId>com.cloudera.hadoop</groupId>
    <artifactId>hadoop-mrunit</artifactId>
    <version>0.20.2-320</version>
    <scope>test</scope>
</dependency>
· Testing Map on its own
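import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Mapper;      // assumes WordCountMapper uses the old mapred API
import org.apache.hadoop.mrunit.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class WordCountMapperTest {
    private Mapper mapper;
    private MapDriver driver;

    @Before
    public void init() {
        mapper = new WordCountMapper();
        driver = new MapDriver(mapper);
    }

    @Test
    public void test() throws IOException {
        String line = "Taobao is a great website";
        // The word count mapper ignores the input key, so null is passed here.
        driver.withInput(null, new Text(line))
              .withOutput(new Text("Taobao"), new IntWritable(1))
              .withOutput(new Text("is"), new IntWritable(1))
              .withOutput(new Text("a"), new IntWritable(1))
              .withOutput(new Text("great"), new IntWritable(1))
              .withOutput(new Text("website"), new IntWritable(1))
              .runTest();
    }
}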
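The example above uses MapDriver's withInput and withOutput to set up the map function's input key/value and its expected output key/value pairs, then calls runTest to run the map function and check the result. The test passes.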
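The expected pairs above are simply the words of the line in their original order, each paired with 1. The plain-Java class below sketches that map behavior without Hadoop types. `NaiveMapSketch` and `map` are hypothetical names, and splitting on single spaces is an assumption based on what the author later discovers about the mapper.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the map step: emits (word, 1) pairs without Hadoop types.
public class NaiveMapSketch {
    // Splits on single spaces only (assumed behavior of the original mapper).
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1)); // emit (word, 1) in input order
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map("Taobao is a great website"));
    }
}
```

Running `main` prints `[Taobao=1, is=1, a=1, great=1, website=1]`: the same five pairs, in the same order, that the MapDriver test expects.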
· Testing Reduce on its own
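import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Reducer;     // assumes WordCountReducer uses the old mapred API
import org.apache.hadoop.mrunit.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class WordCountReducerTest {
    private Reducer reducer;
    private ReduceDriver driver;

    @Before
    public void init() {
        reducer = new WordCountReducer();
        driver = new ReduceDriver(reducer);
    }

    @Test
    public void test() throws IOException {
        String key = "taobao";
        // Two partial counts for the same key: <taobao, [2, 3]>
        List<IntWritable> values = new ArrayList<IntWritable>();
        values.add(new IntWritable(2));
        values.add(new IntWritable(3));
        driver.withInput(new Text(key), values)
              .withOutput(new Text(key), new IntWritable(5))
              .runTest();
    }
}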
Testing the reduce function works much like testing the map function. Because the reduce function adds up its values, we give it the input <taobao, [2, 3]> and expect the output <taobao, 5>. The test passes.
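Since the reducer here only sums counts, its contract can be sketched in plain Java. `SumReduceSketch.reduce` is a hypothetical helper that uses `int` instead of `IntWritable`; it is not the article's WordCountReducer, only an illustration of the <taobao, [2, 3]> to <taobao, 5> behavior the test checks.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java version of the reduce step: sum all counts for one key.
public class SumReduceSketch {
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v; // each value is a partial count for `key`
        }
        return sum;
    }

    public static void main(String[] args) {
        // Same input as the ReduceDriver test: <taobao, [2, 3]>
        System.out.println("taobao\t" + reduce("taobao", Arrays.asList(2, 3)));
    }
}
```

Running `main` prints `taobao` and `5` separated by a tab. The key plays no part in the arithmetic, which is why the reducer can be tested with any single key and a list of values.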
· Testing the MapReduce job
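import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Mapper;      // assumes the old mapred API
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mrunit.MapReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class WordCountTest {
    private Mapper mapper;
    private Reducer reducer;
    private MapReduceDriver driver;

    @Before
    public void init() {
        mapper = new WordCountMapper();
        reducer = new WordCountReducer();
        driver = new MapReduceDriver(mapper, reducer);
    }

    @Test
    public void test() throws RuntimeException, IOException {
        String line = "Taobao is a great website, is it not?";
        // Expected output is sorted by key, as it is after the shuffle:
        // "Taobao" comes first because uppercase letters sort before lowercase ones.
        driver.withInput("", new Text(line))
              .withOutput(new Text("Taobao"), new IntWritable(1))
              .withOutput(new Text("a"), new IntWritable(1))
              .withOutput(new Text("great"), new IntWritable(1))
              .withOutput(new Text("is"), new IntWritable(2))
              .withOutput(new Text("it"), new IntWritable(1))
              .withOutput(new Text("not"), new IntWritable(1))
              .withOutput(new Text("website"), new IntWritable(1))
              .runTest();
    }
}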
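This time we test the whole MapReduce job: MapReduceDriver's withInput supplies the map function's input key/value, and withOutput specifies the expected output key/value pairs of the reduce function. When we ran this word count test, it threw an exception and failed, but JUnit gave no detailed failure message. The console showed only the following: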
2010-11-5 11:14:08 org.apache.hadoop.mrunit.TestDriver lookupExpectedValue SEVERE: Received unexpected output (not?, 1)
2010-11-5 11:14:08 org.apache.hadoop.mrunit.TestDriver lookupExpectedValue SEVERE: Received unexpected output (website,, 1)
2010-11-5 11:14:08 org.apache.hadoop.mrunit.TestDriver validate SEVERE: Missing expected output (not, 1) at position 5
2010-11-5 11:14:08 org.apache.hadoop.mrunit.TestDriver validate SEVERE: Missing expected output (website, 1) at position 6
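Something is clearly wrong, but the console log is not easy to read. So we change the test code: instead of calling runTest, we call run to get the actual output and compare it with the expected result ourselves. MRUnit provides the helper org.apache.hadoop.mrunit.testutil.ExtendedAssert.assertListEquals for asserting on the output.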
重構(gòu)后的測試代碼
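// Additional imports needed in WordCountTest:
// import java.util.ArrayList;
// import java.util.List;
// import org.apache.hadoop.mrunit.types.Pair;
// import static org.apache.hadoop.mrunit.testutil.ExtendedAssert.assertListEquals;

@Test
public void test() throws RuntimeException, IOException {
    String line = "Taobao is a great website, is it not?";
    // run() returns the actual (key, value) output instead of asserting internally
    List<Pair> out = driver.withInput("", new Text(line)).run();

    List<Pair> expected = new ArrayList<Pair>();
    expected.add(new Pair(new Text("Taobao"), new IntWritable(1)));
    expected.add(new Pair(new Text("a"), new IntWritable(1)));
    expected.add(new Pair(new Text("great"), new IntWritable(1)));
    expected.add(new Pair(new Text("is"), new IntWritable(2)));
    expected.add(new Pair(new Text("it"), new IntWritable(1)));
    expected.add(new Pair(new Text("not"), new IntWritable(1)));
    expected.add(new Pair(new Text("website"), new IntWritable(1)));
    // On mismatch, fails with the index and the expected vs. actual element
    assertListEquals(expected, out);
}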
We run it again. The test still fails, but now with a clear assertion message:
java.lang.AssertionError: Expected element (not, 1) at index 5 != actual element (not?, 1)
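The assertion shows that the actual output is "not?" rather than the expected "not". Why? Checking the map function reveals that it splits the line on spaces only and ignores punctuation. Ha, we found a bug; let's fix it right away. This also shows why unit testing matters: imagine a more complex computation sent straight to a distributed cluster without unit tests. When the results came out wrong, the problem would be far harder to track down.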
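One possible fix, assuming punctuation should never be part of a word, is to split on runs of characters that are neither letters nor digits instead of on spaces. The sketch below isolates that tokenization in a hypothetical helper, `PunctuationAwareTokenizer.tokenize`; inside a fixed WordCountMapper you would emit each returned token with a count of 1. It is an illustration, not the author's actual fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tokenizer for a fixed WordCountMapper: punctuation acts as a separator.
public class PunctuationAwareTokenizer {
    public static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        // Split on any run of characters that are not Unicode letters or digits.
        for (String token : line.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // a leading separator produces one empty token
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Taobao is a great website, is it not?"));
    }
}
```

Running `main` prints `[Taobao, is, a, great, website, is, it, not]`, so "website," and "not?" become "website" and "not" as the MapReduce test expects. The trade-off is that contractions such as "don't" split into "don" and "t"; whether that is acceptable depends on the job.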
Summary
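Unit testing with MRUnit comes down to a few points: use MapDriver to test a Map on its own, ReduceDriver to test a Reduce on its own, and MapReduceDriver to test a complete MapReduce job; instead of calling runTest, call run to get the actual output and compare it with the expected result; and use org.apache.hadoop.mrunit.testutil.ExtendedAssert.assertListEquals to assert on the results.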
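If you have stuck with me this far, I am delighted, but I bet you only skimmed the big blocks of code. That is normal: not everyone is interested in hands-on test code (or only when they actually need it). To thank you for reading, here is a little secret: this article is not only about unit testing MapReduce. Reading the test code can also deepen your understanding of how MapReduce works. From the test inputs and expected results, you can see exactly what map and reduce take in and emit, where the results get sorted, and similar details.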
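As an illustration of that point, the sketch below replays the failing word count test in plain Java: map (the original space-only split), shuffle (group and sort by key), and reduce (sum). It is a hypothetical simulation, not MRUnit's or Hadoop's implementation. A `TreeMap` stands in for the shuffle's sort by key, and Java `String` ordering stands in for the byte-wise ordering of Hadoop's `Text`; the two agree for plain ASCII words like these.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of map -> shuffle/sort -> reduce for word count.
public class WordCountPipelineSketch {
    public static TreeMap<String, Integer> run(String line) {
        // TreeMap keeps keys sorted, standing in for the shuffle's sort by key.
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String word : line.split(" ")) {    // map: naive space-only split
            if (word.isEmpty()) {
                continue;
            }
            counts.merge(word, 1, Integer::sum); // group by key and sum the 1s (reduce)
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : run("Taobao is a great website, is it not?").entrySet()) {
            System.out.println("(" + e.getKey() + ", " + e.getValue() + ")");
        }
    }
}
```

It prints (Taobao, 1), (a, 1), (great, 1), (is, 2), (it, 1), (not?, 1), (website,, 1), one per line. "Taobao" comes first because uppercase letters sort before lowercase ones, and "not?" and "website," land at positions 5 and 6, the same positions reported in the MRUnit log.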
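Unit tests are essential because they surface problems early and make them easy to locate, but unit tests alone are not enough: MapReduce code also needs integration testing. Before running integration tests, you need to know how to run a MapReduce job on a Hadoop cluster, which later articles in this series will cover.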