如何高效的實(shí)現(xiàn)一個(gè)計(jì)數(shù)器map
這本是多年前一個(gè)stackoverflow上的一個(gè)討論,回答中涉及到了多種計(jì)數(shù)方法。對(duì)于一個(gè)key-value結(jié)構(gòu)的map,我們?cè)诰幊虝r(shí)會(huì)經(jīng)常涉及到key是對(duì)象,而value是一個(gè)integer或long來(lái)負(fù)責(zé)計(jì)數(shù),從而統(tǒng)計(jì)多個(gè)key的頻率。
面對(duì)這樣一個(gè)基本需求,可能有很多種實(shí)現(xiàn)。比如最基本的使用jdk的map直接實(shí)現(xiàn)——value是一個(gè)integer或者long。其基本代碼型如下:
1: final Map<String, Integer> freq = new HashMap<String, Integer>();
2: int count = freq.containsKey(word) ? freq.get(word) : 0;
3: freq.put(word, count + 1);
邏輯簡(jiǎn)單,判斷是否存在,是則get取值,否則為0,再put進(jìn)去一個(gè)加1后的值。總共要contain判斷,get,put做三次方法調(diào)用。
當(dāng)然進(jìn)一步我們可以把contain判斷去掉,代碼如下:
1: final Map<String, Integer> freq = new HashMap<String, Integer>();
2: Integer count = freq.get(word);
3: if (count == null) {
4: freq.put(word, 1);
5: } else {
6: freq.put(word, count + 1);
7: }
一般情況,我們做到這個(gè)地步,多數(shù)人對(duì)其邏輯已經(jīng)滿足,簡(jiǎn)單性能也能接受,試著想一下,難道不是這樣嗎?get加put,解決了。
當(dāng)然這樣的實(shí)現(xiàn)還不夠高效,于是我們開(kāi)始去嘗試實(shí)現(xiàn)或?qū)ふ腋咝У姆椒ǎ纯撮_(kāi)源的集合類庫(kù)是否有需要的:
有個(gè)Trove,可以讓我們參考一下:
1: final TObjectIntHashMap<String> freq = new TObjectIntHashMap<String>();
2: freq.adjustOrPutValue(word, 1, 1);
這樣做,非常優(yōu)雅啊,性能如何呢?不知道,需要看源碼了解細(xì)節(jié)。那再看看大名鼎鼎的guava如何呢?
1: AtomicLongMap<String> map = AtomicLongMap.create();
2: map.getAndIncrement(word);
實(shí)現(xiàn)依然優(yōu)雅,但是,但是看這名字,再看源碼,好吧,線程安全的,支持并發(fā),這不好搞了,我們場(chǎng)景需要嗎?不需要的話直覺(jué)告訴我們這肯定是“慢”的。再找:
1: Multiset<String> bag = HashMultiset.create();
2: bag.add(word);
這個(gè)看上去合適了,bag的實(shí)現(xiàn)明顯好很多,而且從語(yǔ)義理解上,這樣的接口更容易讓人理解。
那么這些方法,性能如何呢?做了個(gè)簡(jiǎn)單的比較,將26個(gè)英文字母做key,均勻循環(huán)若干次比較各個(gè)方法的效率(單純時(shí)間效率),而且時(shí)間不統(tǒng)計(jì)構(gòu)建開(kāi)銷。外加了一個(gè)線程安全版的concurrentMap實(shí)現(xiàn),而其實(shí)google的guava里的AtomicLongMap也是包裝了juc的concurrentMap而已。里面有最終極的MutableInt方法,找找看吧,性能最好的就是它了。
1: /**
2: *
3: */
4:
5:
6: import gnu.trove.map.hash.TObjectIntHashMap;
7:
8: import java.util.HashMap;
9: import java.util.Map;
10: import java.util.concurrent.ConcurrentHashMap;
11: import java.util.concurrent.ConcurrentMap;
12: import java.util.concurrent.atomic.AtomicLong;
13:
14: import com.google.common.collect.HashMultiset;
15: import com.google.common.collect.Multiset;
16: import com.google.common.util.concurrent.AtomicLongMap;
17:
18: /**
19: * @author Administrator
20: *
21: */
22: public class IntMapTest {
23:
24: /**
25: * @param args
26: */
27: public static void main(String[] args) {
28: // TODO Auto-generated method stub
29: int cycles[] = { 100, 1000, 10000, 100000 };
30: Tester baseLine = new BaseLine();
31: Tester testForNull = new UseNullTest();
32: Tester useAtomicLong = new UseAtomicLong();
33: Tester useTrove = new UseTrove();
34: Tester useMutableInt = new UseMutableInt();
35: Tester useGuava = new UseGuava();
36: Tester useGuava2 = new UseGuava2();
37:
38: for (int i = 0; i < cycles.length; i++) {
39: System.out.println("-----With " + cycles[i] + " cycles-----");
40: baseLine.test(cycles[i]);
41: testForNull.test(cycles[i]);
42: useAtomicLong.test(cycles[i]);
43: useTrove.test(cycles[i]);
44: useMutableInt.test(cycles[i]);
45: useGuava.test(cycles[i]);
46: useGuava2.test(cycles[i]);
47: System.out.println("------------------------");
48: }
49:
50: }
51:
52: }
53:
54: abstract class Tester {
55: long ms;
56: static String[] strs = "abcdefghijklmnopqrstuvwxyz".split("");
57:
58: void pre() {
59: System.out.println("====" + this.getName() + "Test Case ");
60: ms = System.currentTimeMillis();
61: System.out.println("start at " + ms);
62: }
63:
64: void post() {
65: ms = System.currentTimeMillis() - ms;
66: System.out.println("Time used: " + ms + " ms");
67: }
68:
69: abstract void doAction(int cycles);
70:
71: public void test(int cycles) {
72: pre();
73: doAction(cycles);
74: post();
75: }
76:
77: abstract String getName();
78: }
79:
80: class BaseLine extends Tester {
81: final Map<String, Integer> freq = new HashMap<String, Integer>();
82:
83: @Override
84: void doAction(int cycles) {
85: for (int i = 0; i < cycles; i++) {
86: for (String word : strs) {
87: int count = freq.containsKey(word) ? freq.get(word) : 0;
88: freq.put(word, count + 1);
89: }
90: }
91: }
92:
93: @Override
94: String getName() {
95: return "BaseLine";
96: }
97:
98: }
99:
100: class UseNullTest extends Tester {
101: final Map<String, Integer> freq = new HashMap<String, Integer>();
102:
103: @Override
104: void doAction(int cycles) {
105: for (int i = 0; i < cycles; i++) {
106: for (String word : strs) {
107: Integer count = freq.get(word);
108: if (count == null) {
109: freq.put(word, 1);
110: } else {
111: freq.put(word, count + 1);
112: }
113: }
114: }
115: }
116:
117: @Override
118: String getName() {
119: return "TestForNull";
120: }
121:
122: }
123:
124: class UseAtomicLong extends Tester {
125: final ConcurrentMap<String, AtomicLong> map = new ConcurrentHashMap<String, AtomicLong>();
126:
127: @Override
128: void doAction(int cycles) {
129: for (int i = 0; i < cycles; i++) {
130: for (String word : strs) {
131: map.putIfAbsent(word, new AtomicLong(0));
132: map.get(word).incrementAndGet();
133: }
134: }
135: }
136:
137: @Override
138: String getName() {
139: return "AtomicLong";
140: }
141:
142: }
143:
144: class UseTrove extends Tester {
145: final TObjectIntHashMap<String> freq = new TObjectIntHashMap<String>();
146:
147: @Override
148: void doAction(int cycles) {
149: for (int i = 0; i < cycles; i++) {
150: for (String word : strs) {
151: freq.adjustOrPutValue(word, 1, 1);
152: }
153: }
154: }
155:
156: @Override
157: String getName() {
158: return "Trove";
159: }
160:
161: }
162:
163: class MutableInt {
164: int value = 1; // note that we start at 1 since we're counting
165:
166: public void increment() {
167: ++value;
168: }
169:
170: public int get() {
171: return value;
172: }
173: }
174:
175: class UseMutableInt extends Tester {
176: Map<String, MutableInt> freq = new HashMap<String, MutableInt>();
177:
178: @Override
179: void doAction(int cycles) {
180: for (int i = 0; i < cycles; i++) {
181: for (String word : strs) {
182: MutableInt count = freq.get(word);
183: if (count == null) {
184: freq.put(word, new MutableInt());
185: } else {
186: count.increment();
187: }
188: }
189: }
190: }
191:
192: @Override
193: String getName() {
194: return "MutableInt";
195: }
196:
197: }
198:
199: class UseGuava extends Tester {
200: AtomicLongMap<String> map = AtomicLongMap.create();
201:
202: @Override
203: void doAction(int cycles) {
204: for (int i = 0; i < cycles; i++) {
205: for (String word : strs) {
206: map.getAndIncrement(word);
207: }
208: }
209: }
210:
211: @Override
212: String getName() {
213: return "Guava AtomicLongMap";
214: }
215:
216: }
217:
218: class UseGuava2 extends Tester {
219: Multiset<String> bag = HashMultiset.create();
220:
221: @Override
222: void doAction(int cycles) {
223: for (int i = 0; i < cycles; i++) {
224: for (String word : strs) {
225: bag.add(word);
226: }
227: }
228: }
229:
230: @Override
231: String getName() {
232: return "Guava HashMultiSet";
233: }
234:
235: }
輸出結(jié)果如下:
1: -----With 100 cycles-----
2: ====BaseLineTest Case
3: start at 1358655702729
4: Time used: 7 ms
5: ====TestForNullTest Case
6: start at 1358655702736
7: Time used: 3 ms
8: ====AtomicLongTest Case
9: start at 1358655702739
10: Time used: 14 ms
11: ====TroveTest Case
12: start at 1358655702753
13: Time used: 2 ms
14: ====MutableIntTest Case
15: start at 1358655702755
16: Time used: 2 ms
17: ====Guava AtomicLongMapTest Case
18: start at 1358655702757
19: Time used: 4 ms
20: ====Guava HashMultiSetTest Case
21: start at 1358655702761
22: Time used: 7 ms
23: ------------------------
24: -----With 1000 cycles-----
25: ====BaseLineTest Case
26: start at 1358655702768
27: Time used: 17 ms
28: ====TestForNullTest Case
29: start at 1358655702785
30: Time used: 7 ms
31: ====AtomicLongTest Case
32: start at 1358655702792
33: Time used: 44 ms
34: ====TroveTest Case
35: start at 1358655702836
36: Time used: 17 ms
37: ====MutableIntTest Case
38: start at 1358655702853
39: Time used: 5 ms
40: ====Guava AtomicLongMapTest Case
41: start at 1358655702858
42: Time used: 9 ms
43: ====Guava HashMultiSetTest Case
44: start at 1358655702868
45: Time used: 50 ms
46: ------------------------
47: -----With 10000 cycles-----
48: ====BaseLineTest Case
49: start at 1358655702918
50: Time used: 16 ms
51: ====TestForNullTest Case
52: start at 1358655702934
53: Time used: 14 ms
54: ====AtomicLongTest Case
55: start at 1358655702948
56: Time used: 29 ms
57: ====TroveTest Case
58: start at 1358655702977
59: Time used: 10 ms
60: ====MutableIntTest Case
61: start at 1358655702988
62: Time used: 5 ms
63: ====Guava AtomicLongMapTest Case
64: start at 1358655702993
65: Time used: 15 ms
66: ====Guava HashMultiSetTest Case
67: start at 1358655703009
68: Time used: 77 ms
69: ------------------------
70: -----With 100000 cycles-----
71: ====BaseLineTest Case
72: start at 1358655703086
73: Time used: 124 ms
74: ====TestForNullTest Case
75: start at 1358655703210
76: Time used: 118 ms
77: ====AtomicLongTest Case
78: start at 1358655703329
79: Time used: 240 ms
80: ====TroveTest Case
81: start at 1358655703569
82: Time used: 102 ms
83: ====MutableIntTest Case
84: start at 1358655703671
85: Time used: 45 ms
86: ====Guava AtomicLongMapTest Case
87: start at 1358655703716
88: Time used: 126 ms
89: ====Guava HashMultiSetTest Case
90: start at 1358655703842
91: Time used: 98 ms
92: ------------------------
一般結(jié)論:?jiǎn)尉€程使用MutableInt,多線程使用guava的AtomicLongMap,其實(shí)可以看看guava對(duì)addAndGet的實(shí)現(xiàn),循環(huán),很有趣。
最后總結(jié)一下,我們?cè)趯?duì)這個(gè)問(wèn)題做優(yōu)化的時(shí)候,明顯的思路就是減少方法調(diào)用,而MutableInt效率最高,明顯的是它將方法調(diào)用減少到最小——1次get,指針的威力頓時(shí)顯現(xiàn)。當(dāng)然實(shí)際業(yè)務(wù)代碼實(shí)現(xiàn)的時(shí)候還要考慮到多個(gè)因素,比如代碼可讀性,與業(yè)務(wù)結(jié)合等等,我們現(xiàn)實(shí)中不一定要追求如此的效率,但是也要避免毫無(wú)思考的寫下baseline里的代碼,因?yàn)槊黠@是可優(yōu)化的,why not?
注:文中單個(gè)實(shí)現(xiàn)代碼來(lái)自stackoverflow的各個(gè)回答,測(cè)試代碼本人編寫。
ref:
本文提到的so的原帖:http://stackoverflow.com/questions/81346/most-efficient-way-to-increment-a-map-value-in-java
trove:http://trove.starlight-systems.com/
guava:http://code.google.com/p/guava-libraries/
posted on 2013-01-20 12:40 changedi 閱讀(4662) 評(píng)論(0) 編輯 收藏 所屬分類: Java技術(shù)