如何高效的實現一個計數器map
這本是多年前一個stackoverflow上的一個討論,回答中涉及到了多種計數方法。對于一個key-value結構的map,我們在編程時會經常涉及到key是對象,而value是一個integer或long來負責計數,從而統計多個key的頻率。
面對這樣一個基本需求,可能有很多種實現。比如最基本的使用jdk的map直接實現——value是一個integer或者long。其基本代碼型如下:
1: final Map<String, Integer> freq = new HashMap<String, Integer>();
2: int count = freq.containsKey(word) ? freq.get(word) : 0;
3: freq.put(word, count + 1);
邏輯簡單,判斷是否存在,是則get取值,否則為0,再put進去一個加1后的值。總共要contain判斷,get,put做三次方法調用。
當然進一步我們可以把contain判斷去掉,代碼如下:
1: final Map<String, Integer> freq = new HashMap<String, Integer>();
2: Integer count = freq.get(word);
3: if (count == null) {
4: freq.put(word, 1);
5: } else {
6: freq.put(word, count + 1);
7: }
一般情況,我們做到這個地步,多數人對其邏輯已經滿足,簡單性能也能接受,試著想一下,難道不是這樣嗎?get加put,解決了。
當然這樣的實現還不夠高效,于是我們開始去嘗試實現或尋找更高效的方法,看看開源的集合類庫是否有需要的:
有個Trove,可以讓我們參考一下:
1: final TObjectIntHashMap<String> freq = new TObjectIntHashMap<String>();
2: freq.adjustOrPutValue(word, 1, 1);
這樣做,非常優雅啊,性能如何呢?不知道,需要看源碼了解細節。那再看看大名鼎鼎的guava如何呢?
1: AtomicLongMap<String> map = AtomicLongMap.create();
2: map.getAndIncrement(word);
實現依然優雅,但是,但是看這名字,再看源碼,好吧,線程安全的,支持并發,這不好搞了,我們場景需要嗎?不需要的話直覺告訴我們這肯定是“慢”的。再找:
1: Multiset<String> bag = HashMultiset.create();
2: bag.add(word);
這個看上去合適了,bag的實現明顯好很多,而且從語義理解上,這樣的接口更容易讓人理解。
那么這些方法,性能如何呢?做了個簡單的比較,將26個英文字母做key,均勻循環若干次比較各個方法的效率(單純時間效率),而且時間不統計構建開銷。外加了一個線程安全版的concurrentMap實現,而其實google的guava里的AtomicLongMap也是包裝了juc的concurrentMap而已。里面有最終極的MutableInt方法,找找看吧,性能最好的就是它了。
1: /**
2: *
3: */
4:
5:
6: import gnu.trove.map.hash.TObjectIntHashMap;
7:
8: import java.util.HashMap;
9: import java.util.Map;
10: import java.util.concurrent.ConcurrentHashMap;
11: import java.util.concurrent.ConcurrentMap;
12: import java.util.concurrent.atomic.AtomicLong;
13:
14: import com.google.common.collect.HashMultiset;
15: import com.google.common.collect.Multiset;
16: import com.google.common.util.concurrent.AtomicLongMap;
17:
18: /**
19: * @author Administrator
20: *
21: */
22: public class IntMapTest {
23:
24: /**
25: * @param args
26: */
27: public static void main(String[] args) {
28: // TODO Auto-generated method stub
29: int cycles[] = { 100, 1000, 10000, 100000 };
30: Tester baseLine = new BaseLine();
31: Tester testForNull = new UseNullTest();
32: Tester useAtomicLong = new UseAtomicLong();
33: Tester useTrove = new UseTrove();
34: Tester useMutableInt = new UseMutableInt();
35: Tester useGuava = new UseGuava();
36: Tester useGuava2 = new UseGuava2();
37:
38: for (int i = 0; i < cycles.length; i++) {
39: System.out.println("-----With " + cycles[i] + " cycles-----");
40: baseLine.test(cycles[i]);
41: testForNull.test(cycles[i]);
42: useAtomicLong.test(cycles[i]);
43: useTrove.test(cycles[i]);
44: useMutableInt.test(cycles[i]);
45: useGuava.test(cycles[i]);
46: useGuava2.test(cycles[i]);
47: System.out.println("------------------------");
48: }
49:
50: }
51:
52: }
53:
54: abstract class Tester {
55: long ms;
56: static String[] strs = "abcdefghijklmnopqrstuvwxyz".split("");
57:
58: void pre() {
59: System.out.println("====" + this.getName() + "Test Case ");
60: ms = System.currentTimeMillis();
61: System.out.println("start at " + ms);
62: }
63:
64: void post() {
65: ms = System.currentTimeMillis() - ms;
66: System.out.println("Time used: " + ms + " ms");
67: }
68:
69: abstract void doAction(int cycles);
70:
71: public void test(int cycles) {
72: pre();
73: doAction(cycles);
74: post();
75: }
76:
77: abstract String getName();
78: }
79:
80: class BaseLine extends Tester {
81: final Map<String, Integer> freq = new HashMap<String, Integer>();
82:
83: @Override
84: void doAction(int cycles) {
85: for (int i = 0; i < cycles; i++) {
86: for (String word : strs) {
87: int count = freq.containsKey(word) ? freq.get(word) : 0;
88: freq.put(word, count + 1);
89: }
90: }
91: }
92:
93: @Override
94: String getName() {
95: return "BaseLine";
96: }
97:
98: }
99:
100: class UseNullTest extends Tester {
101: final Map<String, Integer> freq = new HashMap<String, Integer>();
102:
103: @Override
104: void doAction(int cycles) {
105: for (int i = 0; i < cycles; i++) {
106: for (String word : strs) {
107: Integer count = freq.get(word);
108: if (count == null) {
109: freq.put(word, 1);
110: } else {
111: freq.put(word, count + 1);
112: }
113: }
114: }
115: }
116:
117: @Override
118: String getName() {
119: return "TestForNull";
120: }
121:
122: }
123:
124: class UseAtomicLong extends Tester {
125: final ConcurrentMap<String, AtomicLong> map = new ConcurrentHashMap<String, AtomicLong>();
126:
127: @Override
128: void doAction(int cycles) {
129: for (int i = 0; i < cycles; i++) {
130: for (String word : strs) {
131: map.putIfAbsent(word, new AtomicLong(0));
132: map.get(word).incrementAndGet();
133: }
134: }
135: }
136:
137: @Override
138: String getName() {
139: return "AtomicLong";
140: }
141:
142: }
143:
144: class UseTrove extends Tester {
145: final TObjectIntHashMap<String> freq = new TObjectIntHashMap<String>();
146:
147: @Override
148: void doAction(int cycles) {
149: for (int i = 0; i < cycles; i++) {
150: for (String word : strs) {
151: freq.adjustOrPutValue(word, 1, 1);
152: }
153: }
154: }
155:
156: @Override
157: String getName() {
158: return "Trove";
159: }
160:
161: }
162:
163: class MutableInt {
164: int value = 1; // note that we start at 1 since we're counting
165:
166: public void increment() {
167: ++value;
168: }
169:
170: public int get() {
171: return value;
172: }
173: }
174:
175: class UseMutableInt extends Tester {
176: Map<String, MutableInt> freq = new HashMap<String, MutableInt>();
177:
178: @Override
179: void doAction(int cycles) {
180: for (int i = 0; i < cycles; i++) {
181: for (String word : strs) {
182: MutableInt count = freq.get(word);
183: if (count == null) {
184: freq.put(word, new MutableInt());
185: } else {
186: count.increment();
187: }
188: }
189: }
190: }
191:
192: @Override
193: String getName() {
194: return "MutableInt";
195: }
196:
197: }
198:
199: class UseGuava extends Tester {
200: AtomicLongMap<String> map = AtomicLongMap.create();
201:
202: @Override
203: void doAction(int cycles) {
204: for (int i = 0; i < cycles; i++) {
205: for (String word : strs) {
206: map.getAndIncrement(word);
207: }
208: }
209: }
210:
211: @Override
212: String getName() {
213: return "Guava AtomicLongMap";
214: }
215:
216: }
217:
218: class UseGuava2 extends Tester {
219: Multiset<String> bag = HashMultiset.create();
220:
221: @Override
222: void doAction(int cycles) {
223: for (int i = 0; i < cycles; i++) {
224: for (String word : strs) {
225: bag.add(word);
226: }
227: }
228: }
229:
230: @Override
231: String getName() {
232: return "Guava HashMultiSet";
233: }
234:
235: }
輸出結果如下:
1: -----With 100 cycles-----
2: ====BaseLineTest Case
3: start at 1358655702729
4: Time used: 7 ms
5: ====TestForNullTest Case
6: start at 1358655702736
7: Time used: 3 ms
8: ====AtomicLongTest Case
9: start at 1358655702739
10: Time used: 14 ms
11: ====TroveTest Case
12: start at 1358655702753
13: Time used: 2 ms
14: ====MutableIntTest Case
15: start at 1358655702755
16: Time used: 2 ms
17: ====Guava AtomicLongMapTest Case
18: start at 1358655702757
19: Time used: 4 ms
20: ====Guava HashMultiSetTest Case
21: start at 1358655702761
22: Time used: 7 ms
23: ------------------------
24: -----With 1000 cycles-----
25: ====BaseLineTest Case
26: start at 1358655702768
27: Time used: 17 ms
28: ====TestForNullTest Case
29: start at 1358655702785
30: Time used: 7 ms
31: ====AtomicLongTest Case
32: start at 1358655702792
33: Time used: 44 ms
34: ====TroveTest Case
35: start at 1358655702836
36: Time used: 17 ms
37: ====MutableIntTest Case
38: start at 1358655702853
39: Time used: 5 ms
40: ====Guava AtomicLongMapTest Case
41: start at 1358655702858
42: Time used: 9 ms
43: ====Guava HashMultiSetTest Case
44: start at 1358655702868
45: Time used: 50 ms
46: ------------------------
47: -----With 10000 cycles-----
48: ====BaseLineTest Case
49: start at 1358655702918
50: Time used: 16 ms
51: ====TestForNullTest Case
52: start at 1358655702934
53: Time used: 14 ms
54: ====AtomicLongTest Case
55: start at 1358655702948
56: Time used: 29 ms
57: ====TroveTest Case
58: start at 1358655702977
59: Time used: 10 ms
60: ====MutableIntTest Case
61: start at 1358655702988
62: Time used: 5 ms
63: ====Guava AtomicLongMapTest Case
64: start at 1358655702993
65: Time used: 15 ms
66: ====Guava HashMultiSetTest Case
67: start at 1358655703009
68: Time used: 77 ms
69: ------------------------
70: -----With 100000 cycles-----
71: ====BaseLineTest Case
72: start at 1358655703086
73: Time used: 124 ms
74: ====TestForNullTest Case
75: start at 1358655703210
76: Time used: 118 ms
77: ====AtomicLongTest Case
78: start at 1358655703329
79: Time used: 240 ms
80: ====TroveTest Case
81: start at 1358655703569
82: Time used: 102 ms
83: ====MutableIntTest Case
84: start at 1358655703671
85: Time used: 45 ms
86: ====Guava AtomicLongMapTest Case
87: start at 1358655703716
88: Time used: 126 ms
89: ====Guava HashMultiSetTest Case
90: start at 1358655703842
91: Time used: 98 ms
92: ------------------------
一般結論:單線程使用MutableInt,多線程使用guava的AtomicLongMap,其實可以看看guava對addAndGet的實現,循環,很有趣。
最后總結一下,我們在對這個問題做優化的時候,明顯的思路就是減少方法調用,而MutableInt效率最高,明顯的是它將方法調用減少到最小——1次get,指針的威力頓時顯現。當然實際業務代碼實現的時候還要考慮到多個因素,比如代碼可讀性,與業務結合等等,我們現實中不一定要追求如此的效率,但是也要避免毫無思考的寫下baseline里的代碼,因為明顯是可優化的,why not?
注:文中單個實現代碼來自stackoverflow的各個回答,測試代碼本人編寫。
ref:
本文提到的so的原帖:http://stackoverflow.com/questions/81346/most-efficient-way-to-increment-a-map-value-in-java
trove:http://trove.starlight-systems.com/
guava:http://code.google.com/p/guava-libraries/
posted on 2013-01-20 12:40 changedi 閱讀(4662) 評論(0) 編輯 收藏 所屬分類: Java技術