Change Dir

先知cd——熱愛生活是一切藝術的開始

公告

寫下來的都是資源，分享給互聯網~~均屬原創隨筆。
轉載引用請注明作者changedi。
喜歡應用研究，熱愛編程，歡迎交流。

隨筆分類(125)

隨筆檔案(123)

統計

隨筆 - 222
文章 - 0
評論 - 182
引用 - 0

留言簿(18)

積分與排名

積分 - 421894
排名 - 132

閱讀排行榜

評論排行榜

分享代碼系列——HashedArrayTree

HashedArrayTree是一種可變動態數組結構，不像通用動態數組一樣是線性結構。HashedArrayTree（以下簡稱HAT）內部維護了一個數組列表（或者說一個二維數組更好理解）

（圖來源wikipedia）

首先會有m個數組（我簡稱其為根數組），表示為上圖的p0到pm，然后每個數組里有m個元素，這m個元素的數組稱之為葉子數組。HAT可以有O(1)的隨機訪問性能——因為是基于數組的，有O(1)的append以及remove last性能。乍一看，god damn，這和線性數組有區別嗎？是的，沒有。HAT的優勢體現在數組大小不足時進行擴容以及實際內容太少產生稀疏時的自動緊縮能力上。HAT的擴容不像基于array的線性list（做2倍的copy），HAT會double一下m（上面說到的根數組大小），同時也會double每個葉子數組，但是這種double不是直接new的，而是根據現有情況new出來，然后copy原有元素到new出來的array里，新擴容的根數組元素對應的葉子array將是null，也就是說暫不分配內存空間，等到執行真正的add操作時，才會判斷如果是null就進行一次new（分配）。這樣延遲了內存分配，使得真正的空間消耗在性能上有提升，線性動態數組的擴容策略的空間性能是Ω(n)，而HAT可以做到O(n^1/2)。同樣，在數組元素經過多次remove慢慢變少時，HAT會進行縮緊操作，一般是當實際存儲容量達到總容量的1/8時，進行一次縮緊，根數組和葉子數組都half。

下面我簡單畫一個內存分配的對比，來說明HAT的行為。默認數組原有4個元素，且總容量都是4，需要擴容。其擴容策略具體對比：

線性動態數組擴容：

HAT擴容：

源代碼如下，同樣繼承了jdk中的AbstractList<T>，其中有一些計算二維數組行列的小技巧可以學習一下computeOffset和computeIndex。其實這個數據結構最讓人不解的地方可能是它的名字，可能是我的理解還不夠深度和通透，我并不能認可這個結構和hash以及tree有多么強的關系，tree可能還好點，hash實在有點費解。不多說廢話，上代碼：

   1: /***************************************************************************   2:  * File: HashedArrayTree.java   3:  * Author: Keith Schwarz (htiek@cs.stanford.edu)   4:  *   5:  * An implementation of the List abstraction backed by a hashed array tree   6:  * (HAT), a data structure supporting amortized O(1) lookup, append, and   7:  * last-element removal.  In this sense it is akin to a standard dynamic   8:  * array implementation.  However, a hashed array tree also has the advantage   9:  * that its memory overhead is only O(sqrt(n)) rather than the typical O(n)  10:  * found in dynamic arrays and their variants.  11:  *  12:  * Internally, the hashed array tree is implemented as an array of pointers  13:  * that optionally point to an array of elements.  The topmost array and each  14:  * element always have the same size, which is always a power of two.  In  15:  * this sense, the hashed array tree is essentially a two-dimensional array  16:  * of elements.  However, the advantage of the hashed array tree is that the  17:  * topmost array pointers are all initially null and only filled in when space  18:  * is needed.  This means that the maximum overhead of the structure is the  19:  * size of the topmost array, plus the number of unused elements in the current  20:  * block.  Here is one sample HAT:  21:  *  22:  *           [ ] [ ] [ ] [ ]  23:  *            |   |   |  24:  *            v   v   v  25:  *           [0] [4] [8]  26:  *           [1] [5] [9]  27:  *           [2] [6] [ ]  28:  *           [3] [7] [ ]  29:  *  30:  * Here, the topmost array of pointers has three pointers in use, each of which  31:  * point to an array of the corresponding number of elements.  32:  *  33:  * Whenever an element is added to a hashed array tree, one of three cases  34:  * must hold:  35:  *  36:  * 1. There is extra space at the end of the final subarray (for example, in  37:  *    the top picture).  In that case, the element is added to that position.  38:  * 2. The final subarray is full, but space for another subarray exists in the  39:  *    topmost array.  In that case, a new array is allocated and the element  40:  *    is added to that array.  41:  * 3. The final subarray is full and no new arrays remain open.  In that case,  42:  *    since the topmost array has size 2^n and each array has size 2^n, there  43:  *    must be a total of 2^(2n) elements in the hashed array tree.  We  44:  *    next double the size of the topmost array to 2^(n + 1), then allocate  45:  *    2^(n - 1) subarrays of size 2^(n + 1) elements for a total of 2^(2n)  46:  *    elements' worth of space.  The elements from the old HAT are then copied  47:  *    over, a new array is allocated for the new element, and it is added as  48:  *    the first element of that array.  49:  *  50:  * Let's now talk about the performance and memory usage of this structure.  51:  * First, we note that we can perform lookups in O(1), assuming that the  52:  * machine is transdichotomous (meaning that a single machine word can hold  53:  * the size of any array).  This can be done by breaking the input index in  54:  * half, then using the first half to choose which array to look into (the  55:  * "hashed" part of HAT) and the second half to choose which index to select.  56:  * This trick is similar to the trick used to represent a two-dimensional  57:  * array using a linear structure.  58:  *  59:  * Next, let's think about how much time it takes to do an append operation.  60:  * Each append when space remains takes O(1) to look up the proper position  61:  * in the hashed array tree for writing, but appends can be much more expensive  62:  * when the HAT needs to be resized.  Fortunately, this does not happen very  63:  * often.  Whenever the HAT doubles in size, its capacity grows from 2^(2n) to  64:  * 2^(2n + 2), meaning that four times as many elements must be inserted before  65:  * the next copy operation.  If we define a potential function as twice the   66:  * number of filled-in elements in the HAT above half capacity (i.e. the  67:  * number of elements in the arrays in the second half), then we can prove an  68:  * amortized O(1) time for append.  Consider any series of appends.  If the  69:  * append does not expand the HAT, then it takes O(1) time and increases the  70:  * potential by 1/2.  If the append does expand the HAT, and the HAT's topmost  71:  * array has size 2^n, then there must be 2^(n-1) elements in the latter half  72:  * of the HAT, so the potential is 2^(2n).  The time required to move each  73:  * of the 2^(2n) elements is 2^(2n), and in the new HAT there are no elements  74:  * in the latter half of the array.  Consequently, the new potential is zero.  75:  * The actual time required to perform the append is thus 2^2n + O(1), and  76:  * the decrease in potential is -2^2n, so the amortized cost is O(1) as  77:  * expected.  78:  *  79:  * Last, let's talk about the cost to do a remove.  This is similar to the   80:  * append case - we delete the last element of the last array, removing the  81:  * array from the topmost array if it becomes empty.  We also compact the  82:  * HAT if it becomes too sparse by shrinking from a HAT of size 2^(2n + 2) to  83:  * a HAT of size 2^(2n) if the HAT becomes one-eighth full.  A similar  84:  * potential method can be used to show that this operation runs in amortized  85:  * O(1).  86:  *  87:  * Finally, let's consider the memory overhead of the HAT.  For any HAT of  88:  * topmost array size 8 or more, since the HAT is always at least one-eighth  89:  * full, there must be at least one full array.  This array has size equal to  90:  * the size of the topmost array (call this k), and so if every array were to  91:  * be filled in to capacity, there would be a total of k arrays of size k,  92:  * for a total of k^2 elements.  Of this capacity, we know that at least an  93:  * eighth of them are filled in, so k^2 must be at most 8n, and so   94:  * k = O(sqrt(n)).  To finish the analysis, the overhead of the structure is  95:  * at most the overhead of this top-level array, plus potentially k - 1  96:  * unused elements in some array.  This is a total of O(k) = O(sqrt(n))  97:  * overhead, which is what we originally desired.  98:  */  99: import java.util.*; // For AbstractList 100:   101: @SuppressWarnings("unchecked") 102: public final class HashedArrayTree<T> extends AbstractList<T> { 103:     /* To simplify the implementation, we enforce that the size of the topmost 104:      * array never drops below 2.  This prevents weirdness when we try to 105:      * allocate 2^(n-1) arrays during a doubling and find that n = 0. 106:      */ 107:     private static final int kMinArraySize = 2; 108:   109:     /* The topmost array of elements; initially of size two. */ 110:     private T[][] mArrays = (T[][]) new Object[kMinArraySize][]; 111:   112:     /* Number of elements, initially zero since the HAT is created empty. */ 113:     private int mSize = 0; 114:   115:     /* A constant containing lg2 of the topmost array size.  This enables some 116:      * cute bit-twiddling tricks to improve efficiency. 117:      */ 118:     private int mLgSize = 1; 119:   120:     /** 121:      * Returns the number of elements in the HashedArrayTree. 122:      * 123:      * @return The number of elements in the HashedArrayTree. 124:      */ 125:     @Override  126:     public int size() { 127:         return mSize; 128:     } 129:   130:     /** 131:      * Adds a new element to the HashedArrayTree. 132:      * 133:      * @param elem The element to add. 134:      * @return true 135:      */ 136:     @Override  137:     public boolean add(T elem) { 138:         /* First, check if we're completely out of space.  If so, do a resize 139:          * to ensure we do indeed have room. 140:          */ 141:         if (size() == mArrays.length * mArrays.length) 142:             grow(); 143:   144:         /* Compute the (arr, index) pair for the next position.  The next 145:          * position is at the location indicated by size(), but we know that 146:          * space exists from the previous call. 147:          */ 148:         final int offset = computeOffset(size()); 149:         final int index  = computeIndex(size()); 150:   151:         /* Check if an array exists here.  If not, make one up. */ 152:         if (mArrays[offset] == null) 153:             mArrays[offset] = (T[]) new Object[mArrays.length]; 154:   155:         /* Write the element to its location. */ 156:         mArrays[offset][index] = elem; 157:   158:         /* Update the element count. */ 159:         ++mSize; 160:   161:         /* Per the Collections contract, return true to signal a successful 162:          * add. 163:          */ 164:         return true; 165:     } 166:   167:     /** 168:      * Sets the element at the specified position to the indicated value. 169:      * If the index is out of bounds, throws an IndexOutOfBounds exception. 170:      * 171:      * @param index The index at which to set the value. 172:      * @param elem The element to store at that position. 173:      * @return The value initially at that location. 174:      * @throws IndexOutOfBoundsException If index is invalid. 175:      */ 176:     @Override  177:     public T set(int index, T elem) { 178:         /* Find out where to look. */ 179:         final int offset   = computeOffset(index); 180:         final int arrIndex = computeIndex(index); 181:   182:         /* Cache the value there and write the new one. */ 183:         T result = mArrays[offset][arrIndex]; 184:         mArrays[offset][arrIndex] = elem; 185:   186:         /* Hand back the old value. */ 187:         return result; 188:     } 189:   190:     /** 191:      * Returns the value of the element at the specified position. 192:      * 193:      * @param index The index at which to query. 194:      * @return The value of the element at that position. 195:      * @throws IndexOutOfBoundsException If the index is invalid. 196:      */ 197:     @Override  198:     public T get(int index) { 199:         /* Check that this is a valid index. */ 200:         if (index < 0 || index >= size()) 201:             throw new IndexOutOfBoundsException("Index " + index + ", size " + size()); 202:   203:         /* Look up the element. */ 204:         return mArrays[computeOffset(index)][computeIndex(index)]; 205:     } 206:   207:     /** 208:      * Adds the specified element at the position just before the specified 209:      * index. 210:      * 211:      * @param index The index just before which to insert. 212:      * @param elem The value to insert 213:      * @throws IndexOutOfBoundsException if the index is invalid. 214:      */ 215:     @Override  216:     public void add(int index, T elem) { 217:         /* Confirm the validity of the index. */ 218:         if (index < 0 || index >= size()) 219:             throw new IndexOutOfBoundsException("Index " + index + ", size " + size()); 220:          221:         /* Add a dummy element to ensure that everything resizes correctly. 222:          * There's no reason to repeat the logic. 223:          */ 224:         add(null); 225:   226:         /* Next, we need to shuffle down every element that appears after 227:          * the inserted element.  We'll do this using our own public interface. 228:          */ 229:         for (int i = size(); i > index; ++i) 230:             set(i, get(i - 1)); 231:   232:         /* Finally, write the element. */ 233:         set(index, elem); 234:     } 235:   236:     /** 237:      * Removes the element at the specified position from the HashedArrayTree. 238:      * 239:      * @param index The index of the element to remove. 240:      * @return The value of the element at that position. 241:      * @throws IndexOutOfBoundsException If the index is invalid. 242:      */ 243:     @Override  244:     public T remove(int index) { 245:         /* Cache the value at the indicated position; this also does the bounds 246:          * check. 247:          */ 248:         T result = get(index); 249:   250:         /* Use a naive shuffle-down algorithm to reposition elements after 251:          * the removed one. 252:          */ 253:         for (int i = index + 1; i < size(); ++i) 254:             set(i - 1, get(i)); 255:   256:         /* Clobber the last element to play nicely with the garbage collector. */ 257:         set(size() - 1, null); 258:   259:         /* Decrement our size. */ 260:         --mSize; 261:   262:         /* If we are now at 1/8 total capacity, shrink the structure. */ 263:         if (size() * 8 <= mArrays.length * mArrays.length) 264:             shrink(); 265:         /* Otherwise, if the size is now an even multiple of the array size, 266:          * we can drop the very last array.  This is the array whose offset 267:          * is one after the end of the elements. 268:          */ 269:         else if (size() % mArrays.length == 0) 270:             mArrays[computeOffset(size())] = null; 271:   272:         return result; 273:     } 274:   275:     /** 276:      * Given an index, returns the offset into the master array at which the 277:      * element with that index can be found. 278:      * 279:      * @return The index into the topmost array where the given element can 280:      *         be found. 281:      */ 282:     private int computeOffset(int index) { 283:         /* This can be computed by dividing the index by the index by the 284:          * topmost array.  However, if we want to be very clever, we can do 285:          * this efficiently by bit-shifting downard by the lg2 of the size 286:          * of the topmost array. 287:          */ 288:         return index >> mLgSize; 289:     } 290:   291:     /** 292:      * Given an index, returns the offset into the appropriate subarray in 293:      * which the element with that index can be found. 294:      * 295:      * @return The index into the subarray array where the given element can 296:      *         be found. 297:      */ 298:     private int computeIndex(int index) { 299:         /* This can be computed by modding the index by the index by the 300:          * topmost array.  But we can do this more efficiently with a different 301:          * tactic.  Since the array size is a perfect power of two, it must 302:          * look like this: 303:          * 304:          * 00..010..0 305:          * 306:          * Subtracting one yields 307:          * 308:          * 00..001..1 309:          * 310:          * ANDing this with the index produces the value we're looking for. 311:          */ 312:         return index & (mArrays.length - 1); 313:     } 314:   315:     /** 316:      * Grows the internal representation by doubling the size of the topmost 317:      * array and copying the appropriate number of elements over. 318:      */ 319:     private void grow() { 320:         /* Double the size of the topmost array. */ 321:         T[][] newArrays = (T[][]) new Object[mArrays.length * 2][]; 322:   323:         /* The new arrays each have size 2^(n + 1).  We need 2^(n - 1) of them 324:          * to hold the old elements.  Allocate those here and copy everything 325:          * over. 326:          */ 327:         for (int i = 0; i < mArrays.length; i += 2) { 328:             /* Allocate the array. */ 329:             newArrays[i / 2] = (T[]) new Object[newArrays.length]; 330:   331:             /* Use System.arraycopy to move everything over. */ 332:             System.arraycopy(mArrays[i], 0, newArrays[i / 2], 0, mArrays.length); 333:             System.arraycopy(mArrays[i + 1], 0, newArrays[i / 2], mArrays.length, mArrays.length); 334:   335:             /* Null out the old arrays to be nice to the GC during this  336:              * potentially stressful time. 337:              */ 338:             mArrays[i] = mArrays[i + 1] = null; 339:         } 340:   341:         /* Switch out this new array for the old. */ 342:         mArrays = newArrays; 343:   344:         /* Bump up lg2 of the size. */ 345:         ++mLgSize; 346:     } 347:   348:     /** 349:      * Decreases the size of the HAT by shrinking into a better fit. 350:      */ 351:     private void shrink() { 352:         /* If the size of the topmost array is at its minimum, don't do 353:          * anything.  This doesn't change the asymptotic memory usage because 354:          * we only do this for small arrays. 355:          */ 356:         if (mArrays.length == kMinArraySize) return; 357:   358:         /* Otherwise, we currently have 2^(2n) / 8 = 2^(2n - 3) elements. 359:          * We're about to shrink into a grid of 2^(2n - 2) elements, and so 360:          * we'll fill in half of the elements. 361:          */ 362:         T[][] newArrays = (T[][]) new Object[mArrays.length / 2][]; 363:   364:         /* Copy everything over.  We'll need half as many arrays as before. */ 365:         for (int i = 0; i < newArrays.length / 2; ++i) { 366:             /* Create the arrays. */ 367:             newArrays[i] = (T[]) new Object[newArrays.length]; 368:   369:             /* Move everything into it.  If this is an odd array, it comes  370:              * from the upper half of the old array; otherwise it comes from 371:              * the lower half. 372:              */ 373:             System.arraycopy(mArrays[i / 2], (i % 2 == 0)? 0 : newArrays.length, 374:                              newArrays[i], 0, newArrays.length); 375:   376:             /* Play nice with the GC.  If this is an odd-numbered array, we 377:              * just copied over everything we needed and can clear out the 378:              * old array. 379:              */ 380:             if (i % 2 == 1) 381:                 mArrays[i / 2] = null; 382:         } 383:   384:         /* Copy the arrays over. */ 385:         mArrays = newArrays; 386:   387:         /* Drop the lg2 of the size. */ 388:         --mLgSize; 389:     } 390: }

參考資料：

http://www.keithschwarz.com/interesting/code/?dir=hashed-array-tree

http://en.wikipedia.org/wiki/Hashed_array_tree

posted on 2012-06-08 14:40 changedi 閱讀(1655) 評論(0) 編輯收藏所屬分類: 好代碼分享

新用戶注冊刷新評論列表


只有注冊用戶登錄后才能發表評論。




網站導航: 博客園 IT新聞 Chat2DB C++博客博問管理
相關文章: 分享代碼系列——HashedArrayTree 分享一個LRUMap的實現——來自apache common-collections框架分享代碼系列——vlist 分享代碼系列——parseInt(包含java和c語言的atoi方法）分享一個ThreadMonitor

Change Dir

導航

公告

隨筆分類(125)

隨筆檔案(123)

統計

留言簿(18)

積分與排名

“牛”們的博客

各個公司技術

我的鏈接

淘寶技術

閱讀排行榜

評論排行榜

分享代碼系列——HashedArrayTree