求最少的分割次數。 閱讀全文
posted @ 2013-04-11 11:24 小明 閱讀(4150) | 評論 (3) | 編輯 收藏
|
|||
摘要: 給定一個字符串s,分割s使得每個子串都是回文的(即倒過來和原字符串是一樣的,如aba)
求最少的分割次數。 閱讀全文 posted @ 2013-04-11 11:24 小明 閱讀(4150) | 評論 (3) | 編輯 收藏 問題描述: Problem Statement THIS PROBLEM WAS TAKEN FROM THE SEMIFINALS OF THE TOPCODER INVITATIONAL TOURNAMENT DEFINITION Class Name: MatchMaker Method Name: getBestMatches Paramaters: String[], String, int Returns: String[] Method signature (be sure your method is public): String[] getBestMatches(String[] members, String currentUser, int sf); PROBLEM STATEMENT A new online match making company needs some software to help find the "perfect couples". People who sign up answer a series of multiple-choice questions. Then, when a member makes a "Get Best Mates" request, the software returns a list of users whose gender matches the requested gender and whose answers to the questions were equal to or greater than a similarity factor when compared to the user's answers. Implement a class MatchMaker, which contains a method getBestMatches. The method takes as parameters a String[] members, String currentUser, and an int sf: - members contains information about all the members. Elements of members are of the form "NAME G D X X X X X X X X X X" * NAME represents the member's name * G represents the gender of the current user. * D represents the requested gender of the potential mate. * Each X indicates the member's answer to one of the multiple-choice questions. The first X is the answer to the first question, the second is the answer to the second question, et cetera. - currentUser is the name of the user who made the "Get Best Mates" request. - sf is an integer representing the similarity factor. The method returns a String[] consisting of members' names who have at least sf identical answers to currentUser and are of the requested gender. The names should be returned in order from most identical answers to least. If two members have the same number of identical answers as the currentUser, the names should be returned in the same relative order they were inputted. TopCoder will ensure the validity of the inputs. Inputs are valid if all of the following criteria are met: - members will have between 1 and 50 elements, inclusive. - Each element of members will have a length between 7 and 44, inclusive. - NAME will have a length between 1 and 20, inclusive, and only contain uppercase letters A-Z. - G can be either an uppercase M or an uppercase F. - D can be either an uppercase M or an uppercase F. - Each X is a capital letter (A-D). - The number of Xs in each element of the members is equal. The number of Xs will be between 1 and 10, inclusive. - No two elements will have the same NAME. - Names are case sensitive. - currentUser consists of between 1 and 20, inclusive, uppercase letters, A-Z, and must be a member. - sf is an int between 1 and 10, inclusive. - sf must be less than or equal to the number of answers (Xs) of the members. NOTES The currentUser should not be included in the returned list of potential mates. EXAMPLES For the following examples, assume members = {"BETTY F M A A C C", "TOM M F A D C A", "SUE F M D D D D", "ELLEN F M A A C A", "JOE M F A A C A", "ED M F A D D A", "SALLY F M C D A B", "MARGE F M A A C C"} If currentUser="BETTY" and sf=2, BETTY and TOM have two identical answers and BETTY and JOE have three identical answers, so the method should return {"JOE","TOM"}. If currentUser="JOE" and sf=1, the method should return {"ELLEN","BETTY","MARGE"}. If currentUser="MARGE" and sf=4, the method should return []. Definition Class: MatchMaker Method: getBestMatches Parameters: String[], String, int Returns: String[] Method signature: String[] getBestMatches(String[] param0, String param1, int param2) (be sure your method is public) ================================================================我的代碼============= import java.util.ArrayList; import java.util.Collections; import java.util.Comparator; import java.util.List; public class MatchMaker { enum GENDER{MALE,FEMALE}; //"NAME G D X X X X X X X X X X" private static class Member{ String name; GENDER gender; GENDER mate; String[] answers; int index; int matched = 0; } String[] getBestMatches(String[] members, String currentUser, int sf){ List<Member> allMembers = new ArrayList<Member>(); Member cu = null; for(int i=0;i<members.length;++i){ String m = members[i]; String[] c = m.split(" "); Member mem = new Member(); mem.name= c[0]; mem.gender = c[1].equals("M")?GENDER.MALE:GENDER.FEMALE; mem.mate = c[2].equals("M")?GENDER.MALE:GENDER.FEMALE; mem.index = i; mem.matched = 0; String[] answers = mem.answers = new String[c.length-3]; for(int j=3;j<c.length;++j){ answers[j-3] = c[j]; } allMembers.add(mem); if(c[0].equals(currentUser)){ cu = mem; } } List<Member> matched = new ArrayList<Member>(); if(cu!=null){ for(Member mem:allMembers){ if(mem!=cu && mem.gender==cu.mate){ for(int i=0;i<mem.answers.length;++i){ if(mem.answers[i].equals(cu.answers[i])){ ++mem.matched; } } if(mem.matched>=sf){ matched.add(mem); } } } Collections.sort(matched, new Comparator<Member>(){ public int compare(Member ma, Member mb) { if(ma.matched!=mb.matched){ return mb.matched - ma.matched; } return ma.index-mb.index; } }); String[] result = new String[matched.size()]; for(int i=0;i<result.length;++i){ result[i] = matched.get(i).name; } return result; } return new String[0]; } } posted @ 2013-04-02 14:04 小明 閱讀(406) | 評論 (0) | 編輯 收藏 以下是我在上一家公司出的java筆試題,有些難度,感興趣的同學可以做做看。
---Question--- 1.What is the output of the following program? public class Foo { public static void main(String[] args){ Map<byte[], String> m = new HashMap<byte[], String>(); byte[] key = "abcd".getBytes(); m.put(key, "abcd"); System.out.println(m.containsKey(key)); System.out.println(m.containsKey("abcd")); System.out.println(m.containsKey("abcd".getBytes())); } } a) true,true,false b)true,false,false c)true,true,true d) false,false,false e)Program throws an exception
2. What is the proper string filled in the following program? Public class Foo { public static void main(String[] args) { String s=”1\\2\\3\\4”; //split the string with “\” String []result = s.split(“____”); for(String r:result){ System.out.println(r); } } } a) \ b) \\ c) \\\ d)\\\\ e)\\\\\
3. What is the output of the following program? public class Foo { public static void main(String[] args) { char[] c = new char[] { '1' }; String s = new String(c); System.out.println("abcd" + c); System.out.println("abcd" + s); } } a) Compile error b)abcd1,abcd1 c) abcd49,abcd1 d) Program throws an exception e)none of above
4. Which class is threading safe which one object can be used between multi-threads without extra synchronized? a) Vector b) HashMap c) ArrayList d)StringBuilder e)HashSet
5. What is the output of the following program? public class Foo { public static void main(String[] args) throws IOException { ByteArrayOutputStream baos = new ByteArrayOutputStream(); byte[] b = new byte[]{(byte)0x0,(byte)0x1,(byte)0x2}; baos.write(b); baos.write(0x0102); byte[] result = baos.toByteArray(); ByteArrayInputStream bais = new ByteArrayInputStream(result); System.out.println(bais.available()); } } a) 0 b) 1 c)4 d) 5 e) Program throws an exception
6. What is return value of function “calc”? public class Foo { public static int calc() throws IOException{ int ret = 0; try{ ++ret; throw new IOException("try"); } catch(IOException ioe){ --ret; return ret; } finally{ ++ret; return ret; } } } a) 0 b) 1 c)2 d)3 e) throws an exception
7. What is the output of the following program? public class Foo { public static class Value { private int value; public int get(){ return value; } public void set(int v){ value = v; } } public static class Values implements Iterable<Value>{ public Values(int capacity){ this.capacity = capacity; }
int count =1 ; int capacity; Value v = new Value(); public Iterator<Value> iterator() { return new Iterator<Value>(){ public boolean hasNext() { return count<=capacity; }
public Value next() { v.set(count++); return v; }
public void remove() { throw new UnsupportedOperationException(); } }; } } public static void main(String[] args) { Values vs = new Values(10); Value result = null; for(Value v:vs){ if(result == null){ result = v; } else{ result.set(result.get()+v.get()); } } System.out.println(result.get()); } } a) 20 b)40 c)45 d)55 e)throws NullpointerException
8. If add keyword “final” before a class member function, it means: a) The method can’t access the non-final member variable. b) The method can’t modify the member variable. c) The method can’t be override by subclass. d) The method is a thread-safe function. e) The method can’t be accessed by other non-final function.
9. About java memory and garbage collector, which statement is correct? a) Moving variable from locale to class will make GC more effectively. b) When Full GC is executing, all the user threads will be paused. c) If object A contains a reference of object B and object B contains a reference of object A, the two objects can’t be reclaimed by GC. d) When a thread exits, all objects which created by that thread will be reclaimed e) It is recommended that calling “System.gc()” to control the memory usage.
10. About java classpath and classloader, which statement is NOT correct? a) User can specify the classpath by using the option “-cp” in Java command line. b) If user doesn’t specify classpath, the JVM search the class from the current folder by default. c) A JVM can load two different versions of a library. d) To define customized class loader, it is possible to load class from internet at runtime.
11. Which data structure has best performance when remove an element from it? a) Vector b)ArrayList c)LinkedList d)HashMap e)HashSet
12. Which is the correct way to convert bytes from charset “gb2312” to “utf-8”? byte[] src , dst; a) dst = new String(src,”utf-8”).getBytes(“gb2312”); b) dst = new String(src,”gb2312”).getBytes(“utf-8”); c) dst = new String(src,”utf-16”).getBytes(); d) dst = new String(src).getBytes(); e) None of above.
posted @ 2012-11-07 09:46 小明 閱讀(2546) | 評論 (3) | 編輯 收藏 摘要: 準備工作:1. 下載Snappy庫Download source code from: http://code.google.com/p/snappy編譯并安裝./configure & make & sudo make install2. 編譯leveldb自帶的db_benchmake db_bench注意:在ubuntu 11.04上編譯會出錯,修改makefile:$(CX... 閱讀全文
posted @ 2012-03-22 17:32 小明 閱讀(4174) | 評論 (0) | 編輯 收藏 leveldb讀數據
先看看ReadOptions有哪些參數可以指定: // Options that control read operations struct ReadOptions { // 是否檢查checksum // Default: false bool verify_checksums; // 是否將此次結果放入cache // Default: true bool fill_cache; //是否指定snapshot,否則讀取當前版本 // Default: NULL const Snapshot* snapshot; ReadOptions() : verify_checksums(false), fill_cache(true), snapshot(NULL) { } }; 下面看看讀取的詳細過程: 查詢memtable=>查詢previous memtable(imm_)=>查詢文件(緩沖) Status DBImpl::Get(const ReadOptions& options, const Slice& key, std::string* value) { Status s; MutexLock l(&mutex_); SequenceNumber snapshot; //設置snapshot if (options.snapshot != NULL) { snapshot = reinterpret_cast<const SnapshotImpl*>(options.snapshot)->number_; } else { snapshot = versions_->LastSequence(); } MemTable* mem = mem_; MemTable* imm = imm_; Version* current = versions_->current(); mem->Ref(); if (imm != NULL) imm->Ref(); current->Ref(); bool have_stat_update = false; Version::GetStats stats; // Unlock while reading from files and memtables { mutex_.Unlock(); LookupKey lkey(key, snapshot); //先查詢memtable if (mem->Get(lkey, value, &s)) { // Done } else if (imm != NULL && imm->Get(lkey, value, &s)) { //然后查詢previous memtable:imm_ // Done } else { //從文件中讀取 s = current->Get(options, lkey, value, &stats); have_stat_update = true; } mutex_.Lock(); } //是否有文件需要被compaction,參見allowed_seek if (have_stat_update && current->UpdateStats(stats)) { MaybeScheduleCompaction(); } mem->Unref(); if (imm != NULL) imm->Unref(); current->Unref(); return s; } 重點來看看從version中讀取: Status Version::Get(const ReadOptions& options, const LookupKey& k, std::string* value, GetStats* stats) { Slice ikey = k.internal_key(); Slice user_key = k.user_key(); const Comparator* ucmp = vset_->icmp_.user_comparator(); Status s; stats->seek_file = NULL; stats->seek_file_level = -1; FileMetaData* last_file_read = NULL; int last_file_read_level = -1; //從level0向高層查找,如果再低級level中查到,則不再查詢 std::vector<FileMetaData*> tmp; FileMetaData* tmp2; for (int level = 0; level < config::kNumLevels; level++) { size_t num_files = files_[level].size(); //本層文件數為空,則返回 if (num_files == 0) continue; // Get the list of files to search in this level FileMetaData* const* files = &files_[level][0]; if (level == 0) { //level0特殊處理,因為key是重疊,所有符合條件的文件必須被查找 tmp.reserve(num_files); for (uint32_t i = 0; i < num_files; i++) { FileMetaData* f = files[i]; if (ucmp->Compare(user_key, f->smallest.user_key()) >= 0 && ucmp->Compare(user_key, f->largest.user_key()) <= 0) { tmp.push_back(f); } } if (tmp.empty()) continue; std::sort(tmp.begin(), tmp.end(), NewestFirst); files = &tmp[0]; num_files = tmp.size(); } else { // 二分法查找,某個key只可能屬于一個文件 uint32_t index = FindFile(vset_->icmp_, files_[level], ikey); //沒有查到 if (index >= num_files) { files = NULL; num_files = 0; } else { tmp2 = files[index]; if (ucmp->Compare(user_key, tmp2->smallest.user_key()) < 0) { // All of "tmp2" is past any data for user_key files = NULL; num_files = 0; } else { files = &tmp2; num_files = 1; } } } for (uint32_t i = 0; i < num_files; ++i) { //遍歷本層符合條件的文件 if (last_file_read != NULL && stats->seek_file == NULL) { //seek_file只記錄第一個 stats->seek_file = last_file_read; stats->seek_file_level = last_file_read_level; } FileMetaData* f = files[i]; last_file_read = f; last_file_read_level = level; //從table cache中讀取 Iterator* iter = vset_->table_cache_->NewIterator( options, f->number, f->file_size); iter->Seek(ikey); const bool done = GetValue(ucmp, iter, user_key, value, &s); if (!iter->status().ok()) { //查找到 s = iter->status(); delete iter; return s; } else { delete iter; if (done) { return s; } } } } return Status::NotFound(Slice()); // Use an empty error message for speed } 繼續跟蹤:TableCache Iterator* TableCache::NewIterator(const ReadOptions& options, uint64_t file_number, uint64_t file_size, Table** tableptr) { if (tableptr != NULL) { *tableptr = NULL; } char buf[sizeof(file_number)]; EncodeFixed64(buf, file_number); Slice key(buf, sizeof(buf)); //從LRU cache中查找 Cache::Handle* handle = cache_->Lookup(key); if (handle == NULL) { /加載文件 std::string fname = TableFileName(dbname_, file_number); RandomAccessFile* file = NULL; Table* table = NULL; Status s = env_->NewRandomAccessFile(fname, &file); if (s.ok()) { s = Table::Open(*options_, file, file_size, &table); } if (!s.ok()) { assert(table == NULL); delete file; // We do not cache error results so that if the error is transient, // or somebody repairs the file, we recover automatically. return NewErrorIterator(s); } //插入Cache TableAndFile* tf = new TableAndFile; tf->file = file; tf->table = table; handle = cache_->Insert(key, tf, 1, &DeleteEntry); } Table* table = reinterpret_cast<TableAndFile*>(cache_->Value(handle))->table; //從Table對象中生成iterator Iterator* result = table->NewIterator(options); result->RegisterCleanup(&UnrefEntry, cache_, handle); if (tableptr != NULL) { *tableptr = table; } return result; } posted @ 2012-03-21 17:30 小明 閱讀(2736) | 評論 (0) | 編輯 收藏 總體來說,leveldb的寫操作有兩個步驟,首先是針對log的append操作,然后是對memtable的插入操作。
影響寫性能的因素有: 1. write_buffer_size 2. kL0_SlowdownWritesTrigger and kL0_StopWritesTrigger.提高這兩個值,能夠增加寫的性能,但是降低讀的性能 看看WriteOptions有哪些參數可以指定 struct WriteOptions { //設置sync=true,leveldb會調用fsync(),這會降低插入性能 //同時會增加數據的安全性 //Default: false bool sync; WriteOptions() : sync(false) { } }; 首先把Key,value轉成WriteBatch Status DB::Put(const WriteOptions& opt, const Slice& key, const Slice& value) { WriteBatch batch; batch.Put(key, value); return Write(opt, &batch); } 接下來就是真正的插入了 這里使用了兩把鎖,主要是想提高并發能力,減少上鎖的時間。 首先是檢查是否可寫,然后append log,最后是插入memtable <db/dbimpl.cc> Status DBImpl::Write(const WriteOptions& options, WriteBatch* updates) { Status status; //加鎖 MutexLock l(&mutex_); LoggerId self; //拿到寫log的權利 AcquireLoggingResponsibility(&self); //檢查是否可寫 status = MakeRoomForWrite(false); // May temporarily release lock and wait uint64_t last_sequence = versions_->LastSequence(); if (status.ok()) { WriteBatchInternal::SetSequence(updates, last_sequence + 1); last_sequence += WriteBatchInternal::Count(updates); // Add to log and apply to memtable. We can release the lock during // this phase since the "logger_" flag protects against concurrent // loggers and concurrent writes into mem_. { assert(logger_ == &self); mutex_.Unlock(); //IO操作:寫入LOG status = log_->AddRecord(WriteBatchInternal::Contents(updates)); if (status.ok() && options.sync) { status = logfile_->Sync(); } //插入memtable if (status.ok()) { status = WriteBatchInternal::InsertInto(updates, mem_); } mutex_.Lock(); assert(logger_ == &self); } //設置新的seqence number versions_->SetLastSequence(last_sequence); } //釋放寫LOG鎖 ReleaseLoggingResponsibility(&self); return status; } 寫流量控制: <db/dbimpl.cc> Status DBImpl::MakeRoomForWrite(bool force) { mutex_.AssertHeld(); assert(logger_ != NULL); bool allow_delay = !force; Status s; while (true) { if (!bg_error_.ok()) { // Yield previous error s = bg_error_; break; } else if ( allow_delay && versions_->NumLevelFiles(0) >= config::kL0_SlowdownWritesTrigger) { mutex_.Unlock(); //如果level0的文件大于kL0_SlowdownWritesTrigger閾值,則sleep 1s,這樣給compaction更多的CPU env_->SleepForMicroseconds(1000); allow_delay = false; // Do not delay a single write more than once mutex_.Lock(); } else if (!force && (mem_->ApproximateMemoryUsage() <= options_.write_buffer_size)) { //可寫 break; } else if (imm_ != NULL) { // imm_:之前的memtable 沒有被compaction,需要等待 bg_cv_.Wait(); } else if (versions_->NumLevelFiles(0) >= config::kL0_StopWritesTrigger) { // level0文件個數大于kL0_StopWritesTrigger,需要等待 Log(options_.info_log, "waiting ![]() bg_cv_.Wait(); } else { //生成新的額memtable和logfile,把當前memtable傳給imm_ assert(versions_->PrevLogNumber() == 0); uint64_t new_log_number = versions_->NewFileNumber(); WritableFile* lfile = NULL; s = env_->NewWritableFile(LogFileName(dbname_, new_log_number), &lfile); if (!s.ok()) { break; } delete log_; delete logfile_; logfile_ = lfile; logfile_number_ = new_log_number; log_ = new log::Writer(lfile); imm_ = mem_; has_imm_.Release_Store(imm_); mem_ = new MemTable(internal_comparator_); mem_->Ref(); force = false; // Do not force another compaction if have room // 發起compaction,dump imm_ MaybeScheduleCompaction();} } return s; } posted @ 2012-03-21 14:41 小明 閱讀(3458) | 評論 (0) | 編輯 收藏 摘要: leveldb 是通過Open函數來打開/新建數據庫。Code highlighting produced by Actipro CodeHighlighter (freeware)http://www.CodeHighlighter.com/-->static Status Open(const Options& options, &nb... 閱讀全文
posted @ 2012-03-20 16:02 小明 閱讀(3443) | 評論 (0) | 編輯 收藏 我們知道,leveldb在寫數據的時候,除了log文件,都要在內存中寫一份。
先看看SkipList【跳表】這個數據結構: ![]() SkipList有如下特點: 1. 本質上一個排序好的鏈表 2. 分層,上層節點比下層的少,更具有跳躍性 3. 查詢的復雜度是O(logn) SkipList跟紅黑樹等還是比較容易實現和理解的,主要長處是比較容易實現Lock free和遍歷。 我們來看看leveldb的實現 插入: //插入一個新的key template<typename Key, class Comparator> void SkipList<Key,Comparator>::Insert(const Key& key) { //查找插入節點,prev為各層的前置節點 Node* prev[kMaxHeight]; Node* x = FindGreaterOrEqual(key, prev); // Our data structure does not allow duplicate insertion assert(x == NULL || !Equal(key, x->key)); //生成隨機高度 int height = RandomHeight(); if (height > GetMaxHeight()) { for (int i = GetMaxHeight(); i < height; i++) { prev[i] = head_; } //設置當前最大高度 max_height_.NoBarrier_Store(reinterpret_cast<void*>(height)); } //生成新節點 x = NewNode(key, height); for (int i = 0; i < height; i++) { //設置新節點的各層的下一跳 x->NoBarrier_SetNext(i, prev[i]->NoBarrier_Next(i)); //設置前節點的next為當前節點,完成插入 prev[i]->SetNext(i, x); } } 查詢: template<typename Key, class Comparator> typename SkipList<Key,Comparator>::Node* SkipList<Key,Comparator>::FindGreaterOrEqual(const Key& key, Node** prev) const { Node* x = head_; int level = GetMaxHeight() - 1; //從高層開始查找,依次到0 level while (true) { Node* next = x->Next(level); if (KeyIsAfterNode(key, next)) { //比next key 要大 // Keep searching in this list x = next; } else { //比next key小,查找下一層 //標記當前level的前置節點 if (prev != NULL) prev[level] = x; if (level == 0) { return next; } else { level--; } } } } template<typename Key, class Comparator> bool SkipList<Key,Comparator>::Contains(const Key& key) const { Node* x = FindGreaterOrEqual(key, NULL); if (x != NULL && Equal(key, x->key)) { return true; } else { return false; } } 接著我們看看leveldb MemTable的實現,很簡單了,基本是對SkipList訪問一個封裝 <db/memtable.h> class MemTable { public: explicit MemTable(const InternalKeyComparator& comparator); //增加引用計數 void Ref() { ++refs_; } //減少引用計數 void Unref() { --refs_; assert(refs_ >= 0); if (refs_ <= 0) { delete this; } } //內存使用量 size_t ApproximateMemoryUsage(); //遍歷操作 Iterator* NewIterator(); //插入 void Add(SequenceNumber seq, ValueType type, const Slice& key, const Slice& value); //查詢 bool Get(const LookupKey& key, std::string* value, Status* s); private: ~MemTable(); // Private since only Unref() should be used to delete it //key compartor,用于排序 struct KeyComparator { const InternalKeyComparator comparator; explicit KeyComparator(const InternalKeyComparator& c) : comparator(c) { } int operator()(const char* a, const char* b) const; }; friend class MemTableIterator; friend class MemTableBackwardIterator; typedef SkipList<const char*, KeyComparator> Table; KeyComparator comparator_; int refs_; //引用計數 Arena arena_; //內存分配器 Table table_; //數據存放SkipList // No copying allowed MemTable(const MemTable&); void operator=(const MemTable&); }; 先看看插入 <db/memtable.cc> void MemTable::Add(SequenceNumber s, ValueType type, const Slice& key, const Slice& value) { //數據結構: //1.internal key size : Varint32 (length of 2+3) //2.key data //3.SequenceNumber+Key type: 8 bytes //4 value size: Varint32 //5 value data size_t key_size = key.size(); size_t val_size = value.size(); size_t internal_key_size = key_size + 8; const size_t encoded_len = VarintLength(internal_key_size) + internal_key_size + VarintLength(val_size) + val_size; char* buf = arena_.Allocate(encoded_len); char* p = EncodeVarint32(buf, internal_key_size); memcpy(p, key.data(), key_size); p += key_size; EncodeFixed64(p, (s << 8) | type); p += 8; p = EncodeVarint32(p, val_size); memcpy(p, value.data(), val_size); assert((p + val_size) - buf == encoded_len); table_.Insert(buf); } 查詢 <db/memtable.cc> bool MemTable::Get(const LookupKey& key, std::string* value, Status* s) { Slice memkey = key.memtable_key(); Table::Iterator iter(&table_); iter.Seek(memkey.data()); if (iter.Valid()) { // entry format is: // klength varint32 // userkey char[klength] // tag uint64 // vlength varint32 // value char[vlength] // Check that it belongs to same user key. We do not check the // sequence number since the Seek() call above should have skipped // all entries with overly large sequence numbers. const char* entry = iter.key(); uint32_t key_length; const char* key_ptr = GetVarint32Ptr(entry, entry+5, &key_length); if (comparator_.comparator.user_comparator()->Compare( Slice(key_ptr, key_length - 8), key.user_key()) == 0) { // Correct user key const uint64_t tag = DecodeFixed64(key_ptr + key_length - 8); switch (static_cast<ValueType>(tag & 0xff)) { case kTypeValue: { Slice v = GetLengthPrefixedSlice(key_ptr + key_length); value->assign(v.data(), v.size()); return true; } case kTypeDeletion: *s = Status::NotFound(Slice()); return true; } } } return false; } posted @ 2012-03-19 16:31 小明 閱讀(3003) | 評論 (0) | 編輯 收藏 leveldb 使用 version 來保存數據庫的狀態。
先看看一個重要的數據結果,sst file的META info <db/version_edit.h> struct FileMetaData {
int refs; //引用計數 int allowed_seeks; //允許的seeks次數 uint64_t number;//文件編號 uint64_t file_size; //文件大小 InternalKey smallest; //最小的key InternalKey largest; //最大的key FileMetaData() : refs(0), allowed_seeks(1 << 30), file_size(0) { } }; 這里面有一個很有意思的字段: allowed_seeks,代表了可以seek的次數,為0的時候表示這個文件需要被compaction.如何設置seeks次數呢?文件大小除以16k,不到100算100。 f->allowed_seeks = (f->file_size / 16384);
if (f->allowed_seeks < 100) f->allowed_seeks = 100; 原因,請看leveldb的注釋: // We arrange to automatically compact this file after a certain number of seeks. Let's assume:
// (1) One seek costs 10ms // (2) Writing or reading 1MB costs 10ms (100MB/s) // (3) A compaction of 1MB does 25MB of IO: // 1MB read from this level // 10-12MB read from next level (boundaries may be misaligned) // 10-12MB written to next level // This implies that 25 seeks cost the same as the compaction // of 1MB of data. I.e., one seek costs approximately the // same as the compaction of 40KB of data. We are a little // conservative and allow approximately one seek for every 16KB // of data before triggering a compaction. 接下來看Version的定義,version其實就是一系列的SST file的集合。 class Version { public: //生成iterator用于遍歷 void AddIterators(const ReadOptions&, std::vector<Iterator*>* iters); //根據key來查詢,若沒有查到,更新GetStats struct GetStats { FileMetaData* seek_file; int seek_file_level; }; Status Get(const ReadOptions&, const LookupKey& key, std::string* val, GetStats* stats); //是否需要進行compaction bool UpdateStats(const GetStats& stats); //引用計算,避免在被引用時候刪除 void Ref(); void Unref(); //查詢和key range有關的files void GetOverlappingInputs( int level, const InternalKey* begin, // NULL means before all keys const InternalKey* end, // NULL means after all keys std::vector<FileMetaData*>* inputs); //計算是否level對某個key range是否有overlap bool OverlapInLevel(int level, const Slice* smallest_user_key, const Slice* largest_user_key); //memtable output應該放到哪個level int PickLevelForMemTableOutput(const Slice& smallest_user_key, const Slice& largest_user_key); //某個level的文件個數 int NumFiles(int level) const { return files_[level].size(); } // Return a human readable string that describes this version's contents. std::string DebugString() const; private: friend class Compaction; friend class VersionSet; class LevelFileNumIterator; Iterator* NewConcatenatingIterator(const ReadOptions&, int level) const; VersionSet* vset_; // VersionSet to which this Version belongs Version* next_; // Next version in linked list Version* prev_; // Previous version in linked list int refs_; // Number of live refs to this version //sst files std::vector<FileMetaData*> files_[config::kNumLevels]; //下一個要被compaction的文件 FileMetaData* file_to_compact_; int file_to_compact_level_; //compaction score:>1表示要compaction double compaction_score_; int compaction_level_; explicit Version(VersionSet* vset) : vset_(vset), next_(this), prev_(this), refs_(0), file_to_compact_(NULL), file_to_compact_level_(-1), compaction_score_(-1), compaction_level_(-1) { } ~Version(); // No copying allowed Version(const Version&); void operator=(const Version&); }; 那VersionSet呢?VersionSet 是version組成一個雙向循環鏈表。 class VersionSet{ //. . . Env* const env_; const std::string dbname_; const Options* const options_; TableCache* const table_cache_; const InternalKeyComparator icmp_; uint64_t next_file_number_; uint64_t manifest_file_number_; uint64_t last_sequence_; uint64_t log_number_; WritableFile* descriptor_file_; log::Writer* descriptor_log_; Version dummy_versions_; // Head of circular doubly-linked list of versions. Version* current_; // == dummy_versions_.prev_ //每層都有一個compact pointer用于指示下次從哪里開始compact,以用于實現循環compact std::string compact_pointer_[config::kNumLevels]; //. . . } VersionEdit是version對象的變更記錄,用于寫入manifest.這樣通過原始的version加上一系列的versionedit的記錄,就可以恢復到最新狀態。 class VersionEdit { public: VersionEdit() { Clear(); } ~VersionEdit() { } void Clear(); void SetComparatorName(const Slice& name) { has_comparator_ = true; comparator_ = name.ToString(); } void SetLogNumber(uint64_t num) { has_log_number_ = true; log_number_ = num; } void SetPrevLogNumber(uint64_t num) { has_prev_log_number_ = true; prev_log_number_ = num; } void SetNextFile(uint64_t num) { has_next_file_number_ = true; next_file_number_ = num; } void SetLastSequence(SequenceNumber seq) { has_last_sequence_ = true; last_sequence_ = seq; } void SetCompactPointer(int level, const InternalKey& key) { compact_pointers_.push_back(std::make_pair(level, key)); } //添加meta file void AddFile(int level, uint64_t file, uint64_t file_size, const InternalKey& smallest, const InternalKey& largest) { FileMetaData f; f.number = file; f.file_size = file_size; f.smallest = smallest; f.largest = largest; new_files_.push_back(std::make_pair(level, f)); } //刪除特定的文件 void DeleteFile(int level, uint64_t file) { deleted_files_.insert(std::make_pair(level, file)); } //編碼,解碼:用于寫入manifest void EncodeTo(std::string* dst) const; Status DecodeFrom(const Slice& src); std::string DebugString() const; private: friend class VersionSet; typedef std::set< std::pair<int, uint64_t> > DeletedFileSet; std::string comparator_; uint64_t log_number_; uint64_t prev_log_number_; uint64_t next_file_number_; SequenceNumber last_sequence_; bool has_comparator_; bool has_log_number_; bool has_prev_log_number_; bool has_next_file_number_; bool has_last_sequence_; std::vector< std::pair<int, InternalKey> > compact_pointers_; DeletedFileSet deleted_files_; std::vector< std::pair<int, FileMetaData> > new_files_; }; posted @ 2012-03-16 17:10 小明 閱讀(3963) | 評論 (0) | 編輯 收藏 摘要: leveldb之所以使用level作為數據庫名稱,精華就在于level的設計。本質是一種歸并排序算法。這樣設計的好處主要是可以減少compaction的次數和每次的文件個數。Compaction為什么要compaction? compaction可以提高數據的查詢效率,沒有經過compaction,需要從很多SST file去查找,而做過compaction后,只需要從有限的SST文件去... 閱讀全文
posted @ 2012-03-15 17:28 小明 閱讀(7900) | 評論 (0) | 編輯 收藏 |
|||