單個node新加硬盤
1.修改需要新加硬盤的node的dfs.data.dir,用逗號分隔新、舊文件目錄
2.重啟dfs
同步hadoop 代碼
hadoop-env.sh
# host:path where hadoop code should be rsync'd from. Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop
用命令合并HDFS小文件
hadoop fs -getmerge <src> <dest>
重啟reduce job方法
Introduced recovery of jobs when JobTracker restarts. This facility is off by default.
Introduced config parameters "mapred.jobtracker.restart.recover", "mapred.jobtracker.job.history.block.size", and "mapred.jobtracker.job.history.buffer.size".
還未驗證過。
IO寫操作出現問題
0-1246359584298, infoPort=50075, ipcPort=50020):Got exception while serving blk_-5911099437886836280_1292 to /172.16.100.165:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/
172.16.100.165:50010 remote=/172.16.100.165:50930]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:293)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:387)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:179)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:94)
at java.lang.Thread.run(Thread.java:619)
It seems there are many reasons that it can timeout, the example given in
HADOOP-3831 is a slow reading client.
解決辦法:在hadoop-site.xml中設置dfs.datanode.socket.write.timeout=0試試;
My understanding is that this issue should be fixed in Hadoop 0.19.1 so that
we should leave the standard timeout. However until then this can help
resolve issues like the one you're seeing.
HDFS退服節點的方法
目前版本的dfsadmin的幫助信息是沒寫清楚的,已經file了一個bug了,正確的方法如下:
1. 將 dfs.hosts 置為當前的 slaves,文件名用完整路徑,注意,列表中的節點主機名要用大名,即 uname -n 可以得到的那個。
2. 將 slaves 中要被退服的節點的全名列表放在另一個文件里,如 slaves.ex,使用 dfs.host.exclude 參數指向這個文件的完整路徑
3. 運行命令 bin/hadoop dfsadmin -refreshNodes
4. web界面或 bin/hadoop dfsadmin -report 可以看到退服節點的狀態是 Decomission in progress,直到需要復制的數據復制完成為止
5. 完成之后,從 slaves 里(指 dfs.hosts 指向的文件)去掉已經退服的節點
附帶說一下 -refreshNodes 命令的另外三種用途:
2. 添加允許的節點到列表中(添加主機名到 dfs.hosts 里來)
3. 直接去掉節點,不做數據副本備份(在 dfs.hosts 里去掉主機名)
4. 退服的逆操作——停止 exclude 里面和 dfs.hosts 里面都有的,正在進行 decomission 的節點的退服,也就是把 Decomission in progress 的節點重新變為 Normal (在 web 界面叫 in service)