[Repost] HDFS Metadata Directories Explained - An overview of files and configurations under the HDFS meta-directory

HDFS metadata represents the structure of HDFS directories and files in a tree. It also includes the various attributes of directories and files, such as ownership, permissions, quotas, and replication factor. In this blog post, I’ll describe how HDFS persists its metadata in Hadoop 2 by exploring the underlying local storage directories and files. All examples shown are from testing a build of the soon-to-be-released Apache Hadoop 2.6.0.

WARNING: Do not attempt to modify metadata directories or files. Unexpected modifications can cause HDFS downtime, or even permanent data loss. This information is provided for educational purposes only.

Persistence of HDFS metadata broadly breaks down into 2 categories of files:

  • fsimage – An fsimage file contains the complete state of the file system at a point in time. Every file system modification is assigned a unique, monotonically increasing transaction ID. An fsimage file represents the file system state after all modifications up to a specific transaction ID.
  • edits – An edits file is a log that lists each file system change (file creation, deletion or modification) that was made after the most recent fsimage.

Checkpointing is the process of merging the content of the most recent fsimage with all edits applied after that fsimage, in order to create a new fsimage. Checkpointing is triggered automatically by configuration policies or manually by HDFS administration commands.

NameNode

Here is an example of an HDFS metadata directory taken from a NameNode. This shows the output of running the tree command on the metadata directory, which is configured by setting dfs.namenode.name.dir in hdfs-site.xml.

data/dfs/name
├── current
│ ├── VERSION
│ ├── edits_0000000000000000001-0000000000000000007
│ ├── edits_0000000000000000008-0000000000000000015
│ ├── edits_0000000000000000016-0000000000000000022
│ ├── edits_0000000000000000023-0000000000000000029
│ ├── edits_0000000000000000030-0000000000000000030
│ ├── edits_0000000000000000031-0000000000000000031
│ ├── edits_inprogress_0000000000000000032
│ ├── fsimage_0000000000000000030
│ ├── fsimage_0000000000000000030.md5
│ ├── fsimage_0000000000000000031
│ ├── fsimage_0000000000000000031.md5
│ └── seen_txid
└── in_use.lock

In this example, the same directory has been used for both fsimage and edits. Alternatively, configuration options are available that allow separating fsimage and edits into different directories. Each file within this directory serves a specific purpose in the overall scheme of metadata persistence:

  • VERSION – Text file that contains:
    • layoutVersion – The version of the HDFS metadata format. When we add new features that require changing the metadata format, we change this number. An HDFS upgrade is required when the current HDFS software uses a layout version newer than what is currently tracked here.
    • namespaceID/clusterID/blockpoolID – These are unique identifiers of an HDFS cluster. The identifiers are used to prevent DataNodes from registering accidentally with an incorrect NameNode that is part of a different cluster. These identifiers also are particularly important in a federated deployment. Within a federated deployment, there are multiple NameNodes working independently. Each NameNode serves a unique portion of the namespace (namespaceID) and manages a unique set of blocks (blockpoolID). The clusterID ties the whole cluster together as a single logical unit. It’s the same across all nodes in the cluster.
    • storageType – This is either NAME_NODE or JOURNAL_NODE. Metadata on a JournalNode in an HA deployment is discussed later.
    • cTime – Creation time of file system state. This field is updated during HDFS upgrades.
  • edits_start transaction ID-end transaction ID – These are finalized (unmodifiable) edit log segments. Each of these files contains all of the edit log transactions in the range defined by the start and end transaction IDs in the file name. In an HA deployment, the standby can only read up through the finalized log segments. It will not be up-to-date with the current edit log in progress (described next). However, when an HA failover happens, the failover finalizes the current log segment so that it’s completely caught up before switching to active.
  • edits_inprogress_start transaction ID – This is the current edit log in progress. All transactions starting from the start transaction ID are in this file, and all new incoming transactions will get appended to this file. HDFS pre-allocates space in this file in 1 MB chunks for efficiency, and then fills it with incoming transactions. You’ll probably see this file’s size as a multiple of 1 MB. When HDFS finalizes the log segment, it truncates the unused portion of the space that doesn’t contain any transactions, so the finalized file’s space will shrink down.
  • fsimage_end transaction ID – This contains the complete metadata image up through that transaction ID. Each fsimage file also has a corresponding .md5 file containing an MD5 checksum, which HDFS uses to guard against disk corruption. (These binary files can be inspected with the offline viewers shown after this list.)
  • seen_txid – This contains the last transaction ID of the last checkpoint (merge of edits into an fsimage) or edit log roll (finalization of current edits_inprogress and creation of a new one). Note that this is not the last transaction ID accepted by the NameNode. The file is not updated on every transaction, only on a checkpoint or an edit log roll. The purpose of this file is to try to identify if edits are missing during startup. It’s possible to configure the NameNode to use separate directories for fsimage and edits files. If the edits directory accidentally gets deleted, then all transactions since the last checkpoint would go away, and the NameNode would start up using just fsimage at an old state. To guard against this, NameNode startup also checks seen_txid to verify that it can load transactions at least up through that number. It aborts startup if it can’t.
  • in_use.lock – This is a lock file held by the NameNode process, used to prevent multiple NameNode processes from starting up and concurrently modifying the directory.
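
The fsimage and edits files are stored in a binary format, so they cannot be read directly with a text editor. Hadoop ships offline viewers that convert them to a readable form without involving the NameNode: hdfs oiv for fsimage files and hdfs oev for edit log segments. The commands below are a minimal sketch; the input file names are taken from the example tree above, and the output paths are arbitrary.

hdfs oiv -p XML -i data/dfs/name/current/fsimage_0000000000000000031 -o /tmp/fsimage.xml
hdfs oev -p xml -i data/dfs/name/current/edits_0000000000000000001-0000000000000000007 -o /tmp/edits.xml

Both tools only read their input and never modify the metadata directory, which makes them a safe way to inspect these files.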

JournalNode

In an HA deployment, edits are logged to a separate set of daemons called JournalNodes. A JournalNode’s metadata directory is configured by setting dfs.journalnode.edits.dir. The JournalNode will contain a VERSION file, multiple finalized edits_start transaction ID-end transaction ID files and an edits_inprogress_start transaction ID file, just like the NameNode. The JournalNode will not have fsimage files or seen_txid. In addition, it contains several other files relevant to the HA implementation. These files help prevent a split-brain scenario, in which multiple NameNodes could think they are active and all try to write edits. (A configuration sketch showing how the NameNodes are pointed at the JournalNodes follows the list below.)

  • committed-txid – Tracks last transaction ID committed by a NameNode.
  • last-promised-epoch – This file contains the “epoch,” which is a monotonically increasing number. When a new writer (a new NameNode) starts as active, it increments the epoch and presents it in calls to the JournalNode. This scheme is the NameNode’s way of claiming that it is active, and that requests from another NameNode presenting a lower epoch must be ignored.
  • last-writer-epoch – Similar to the above, but this contains the epoch number associated with the writer who last actually wrote a transaction. (This was a bug fix for an edge case not handled by last-promised-epoch alone.)
  • paxos – Directory containing temporary files used in implementation of the Paxos distributed consensus protocol. This directory often will appear as empty.
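
To show how the pieces connect, here is a hypothetical hdfs-site.xml excerpt that points the NameNodes at a quorum of JournalNodes and sets the JournalNode storage directory discussed above. dfs.namenode.shared.edits.dir is not covered in this article but is the standard property for this purpose; the hostnames, port, journal ID (mycluster) and local path are all placeholders.

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://journal1.example.com:8485;journal2.example.com:8485;journal3.example.com:8485/mycluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/data/dfs/journal</value>
</property>

8485 is the default JournalNode RPC port; each JournalNode then keeps the files described above under a sub-directory named after the journal ID (here, /data/dfs/journal/mycluster).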

DataNode

Although DataNodes do not contain metadata about the directories and files stored in an HDFS cluster, they do contain a small amount of metadata about the DataNode itself and its relationship to a cluster. This shows the output of running the tree command on the DataNode’s directory, configured by setting dfs.datanode.data.dir in hdfs-site.xml.

data/dfs/data/
├── current
│   ├── BP-1079595417-192.168.2.45-1412613236271
│   │   ├── current
│   │   │   ├── VERSION
│   │   │   ├── finalized
│   │   │   │   └── subdir0
│   │   │   │       └── subdir1
│   │   │   │           ├── blk_1073741825
│   │   │   │           └── blk_1073741825_1001.meta
│   │   │   ├── lazyPersist
│   │   │   └── rbw
│   │   ├── dncp_block_verification.log.curr
│   │   ├── dncp_block_verification.log.prev
│   │   └── tmp
│   └── VERSION
└── in_use.lock

The purpose of these files is:

  • BP-random integer-NameNode-IP address-creation time – The naming convention on this directory is significant and constitutes a form of cluster metadata. The name is a block pool ID. “BP” stands for “block pool,” the abstraction that collects a set of blocks belonging to a single namespace. In the case of a federated deployment, there will be multiple “BP” sub-directories, one for each block pool. The remaining components form a unique ID: a random integer, followed by the IP address of the NameNode that created the block pool, followed by creation time.
  • VERSION – Much like the NameNode and JournalNode, this is a text file containing multiple properties, such as layoutVersion, clusterId and cTime, all discussed earlier. There is a VERSION file tracked for the entire DataNode as well as a separate VERSION file in each block pool sub-directory. In addition to the properties already discussed earlier, the DataNode’s VERSION files also contain:
    • storageType – In this case, the storageType field is set to DATA_NODE.
    • blockpoolID – This repeats the block pool ID information encoded into the sub-directory name. (A sample block pool VERSION file is shown after this list.)
  • finalized/rbw - Both finalized and rbw contain a directory structure for block storage. This holds numerous block files, which contain HDFS file data and the corresponding .meta files, which contain checksum information. “Rbw” stands for “replica being written”. This area contains blocks that are still being written to by an HDFS client. The finalized sub-directory contains blocks that are not being written to by a client and have been completed.
  • lazyPersist – HDFS is incorporating a new feature to support writing transient data to memory, followed by lazy persistence to disk in the background. If this feature is in use, then a lazyPersist sub-directory is present and used for lazy persistence of in-memory blocks to disk. We’ll cover this exciting new feature in greater detail in a future blog post.
  • dncp_block_verification.log – This file tracks the last time each block was verified by checking its contents against its checksum. The last verification time is significant for deciding how to prioritize subsequent verification work. The DataNode orders its background block verification work in ascending order of last verification time. This file is rolled periodically, so it’s typical to see a .curr file (current) and a .prev file (previous).
  • in_use.lock – This is a lock file held by the DataNode process, used to prevent multiple DataNode processes from starting up and concurrently modifying the directory.
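
As a concrete illustration, a block pool VERSION file on a DataNode might look roughly like the following. Only fields mentioned above are shown; the blockpoolID is the one from the example tree, while the clusterID, timestamp and layoutVersion values are made-up placeholders, and the exact field set differs slightly between the DataNode-level VERSION file and the per-block-pool one.

#Sun Jan 25 15:28:00 UTC 2015
storageType=DATA_NODE
clusterID=CID-0f4f8a33-7c6f-4a3a-9a61-1b0c9f3e2d10
cTime=0
blockpoolID=BP-1079595417-192.168.2.45-1412613236271
layoutVersion=-56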

Commands

Various HDFS commands impact the metadata directories:

  • hdfs namenode – NameNode startup automatically saves a new checkpoint. As stated earlier, checkpointing is the process of merging any outstanding edit logs with the latest fsimage, saving the full state to a new fsimage file, and rolling edits. Rolling edits means finalizing the current edits_inprogress and starting a new one.
  • hdfs dfsadmin -safemode enter / hdfs dfsadmin -saveNamespace – This saves a new checkpoint (much like restarting NameNode) while the NameNode process remains running. Note that the NameNode must be in safe mode, so all attempted write activity would fail while this is running. (A full command sequence is sketched after this list.)
  • hdfs dfsadmin -rollEdits – This manually rolls edits. Safe mode is not required. This can be useful if a standby NameNode is lagging behind the active and you want it to get caught up more quickly. (The standby NameNode can only read finalized edit log segments, not the current in progress edits file.)
  • hdfs dfsadmin -fetchImage – Downloads the latest fsimage from the NameNode. This can be helpful for a remote backup type of scenario.
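
For example, a manual checkpoint on a running NameNode can be taken with the sequence below. This is just a sketch of the procedure described above; leaving safe mode at the end assumes you entered it solely for the checkpoint.

hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave

After saveNamespace completes, a new fsimage_end transaction ID file and its .md5 companion appear under the dfs.namenode.name.dir directory.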

Configuration Properties

Several configuration properties in hdfs-site.xml control the behavior of HDFS metadata directories; a sample hdfs-site.xml excerpt follows the list below.

  • dfs.namenode.name.dir – Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
  • dfs.namenode.edits.dir - Determines where on the local filesystem the DFS name node should store the transaction (edits) file. If this is a comma-delimited list of directories then the transaction file is replicated in all of the directories, for redundancy. Default value is same as dfs.namenode.name.dir.
  • dfs.namenode.checkpoint.period - The number of seconds between two periodic checkpoints.
  • dfs.namenode.checkpoint.txns – The standby will create a checkpoint of the namespace every ‘dfs.namenode.checkpoint.txns’ transactions, regardless of whether ‘dfs.namenode.checkpoint.period’ has expired.
  • dfs.namenode.checkpoint.check.period – How frequently to query for the number of uncheckpointed transactions.
  • dfs.namenode.num.checkpoints.retained - The number of image checkpoint files that will be retained in storage directories. All edit logs necessary to recover an up-to-date namespace from the oldest retained checkpoint will also be retained.
  • dfs.namenode.num.extra.edits.retained – The number of extra transactions which should be retained beyond what is minimally necessary for a NN restart. This can be useful for audit purposes or for an HA setup where a remote Standby Node may have been offline for some time and need to have a longer backlog of retained edits in order to start again.
  • dfs.namenode.edit.log.autoroll.multiplier.threshold – Determines when an active namenode will roll its own edit log. The actual threshold (in number of edits) is determined by multiplying this value by dfs.namenode.checkpoint.txns. This prevents extremely large edit files from accumulating on the active namenode, which can cause timeouts during namenode startup and pose an administrative hassle. This behavior is intended as a failsafe for when the standby fails to roll the edit log by the normal checkpoint threshold.
  • dfs.namenode.edit.log.autoroll.check.interval.ms – How often an active namenode will check if it needs to roll its edit log, in milliseconds.
  • dfs.datanode.data.dir – Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored. Heterogeneous storage allows specifying that each directory resides on a different type of storage: DISK, SSD, ARCHIVE or RAM_DISK.
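
For illustration only, a minimal hdfs-site.xml excerpt that sets a few of these properties might look like the following. The paths are placeholders, and the numeric values simply echo common defaults rather than tuned recommendations.

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/1/dfs/name,/data/2/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/1/dfs/data,/data/2/dfs/data</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
</property>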

Conclusion

We briefly discussed how HDFS persists its metadata in Hadoop 2 by exploring the underlying local storage directories and files, the relevant configuration properties that drive specific behaviors, and the HDFS commands that inspect the metadata directories, initiate a checkpoint, and create an fsimage.

In a future blog, we’ll explore lazy persistence, a scheme to persist in-memory data to disk, in more detail.



DLevin 2015-01-25 23:28
Configuring Eclipse to Run Shell Scripts Directly with Cygwin
Of course, running shell scripts on Windows requires installing Cygwin first. I won’t cover the Cygwin installation here, and configuring Hadoop under Cygwin has apparently already been written up elsewhere, so I won’t cover that either; interested readers can refer to: http://blog.csdn.net/yanical/article/details/4474830

This post therefore focuses only on how to hook Cygwin into Eclipse so that shell scripts can be run from there. Eclipse supports embedding external programs, including Cygwin, through its External Tools mechanism. The steps are:
1. eclipse->Run->External Tools->External Tools Configuration....
2. On the configuration page, the name can be anything you like, e.g. cygwin_hadoop. Location is the path to the external tool, in this case the bash that interprets the shell scripts. The simplest option is to point it directly at C:\cygwin\bin\bash.exe, which works for simple commands, but as soon as the script calls other tools it breaks. For example, running Hadoop’s shell scripts produces a “dirname: command not found” error. One workaround is to write a small .bat wrapper that adds the needed directories to PATH. I wrote the following .bat script (add more directories to PATH if your commands need them):
@echo off
rem The line above turns off command echoing

PATH=C:\cygwin\bin\;%PATH%

bash %1

rem Turn command echoing back on
echo on
Then point Location at this .bat file.
3. Working Directory is the working directory; set it to the directory containing your scripts. My Hadoop scripts live under scripts, so I specified: ${workspace_loc:/hadoop/scripts}
4. For Arguments I specified the current file name, ${resource_name}; when actually running a Hadoop script, additional arguments can be appended after it.

With that, the configuration is done: clicking Run executes the script directly, which makes development much more convenient.
鍙﹀錛岃繕鍙戠幇浜嗕竴涓潪甯告湁瓚g殑涓滀笢錛屼竴鍚岃褰曞垎浜?br />涓轟簡鍦╳indows涓嬬偣鍑籹hell鑴氭湰鏂囦歡灝卞彲浠ョ洿鎺ヨ繍琛宻hell鑴氭湰錛屾湁浜烘兂鍑轟簡濡備笅鍛戒護錛堝嚭鑷細http://stackoverflow.com/questions/105075/how-can-i-associate-sh-files-with-cygwin錛夛細
assoc .sh=bashscript
ftype bashscript=C:\cygwin\bin\bash.exe --login -c 'cd "$(dirname "$(cygpath -u "%1")")"; bash "$(cygpath -u "%1")"'
鍗寵緗?.sh鏂囦歡鐨勯粯璁ゆ墽琛岃蔣浠舵槸bash錛屽鏋滃湪win7涓嬮渶瑕佺敤綆$悊鍛樿韓浠芥墦寮cmd錛岀劧鍚庤繍琛岃繖涓や釜鎸囦護銆傚彲鎯滄垜濂藉儚鏈ㄦ湁榪愯鎴愬姛錛屾病鏈変粩緇嗘壘鍘熷洜錛屼笉榪囨垜灝濊瘯浜嗕竴涓嬪懡浠ょ‘瀹炲彲浠ヨ繍琛岋細
assoc .sh=bashscript
ftype bashscript=C:\cygwin\bin\bash.exe %1
鎰熻鎸哄ソ鐜╃殑銆傘傘傘?

DLevin 2012-04-12 00:38