分享代碼系列——parseInt(包含java和c語言的atoi方法)
jdk中的Integer類是int對象的包裝類,正常的Integer占用內存開銷要比int大,比例大概是1:4 。今天分享的代碼是Integer類中的靜態方法parseInt(String, int)。這個方法眾所周知,甚至在我們一開始學習編程時就嘗試的寫過這樣的代碼,一個正常的思路:遍歷輸入的字符數組(java的字符串就是一個字符數組),然后parse每個char,依據參數給定的進制,判斷每個char是否滿足,滿足則繼續,否則拋出異常或中斷,直到處理完畢所有字符,返回結果。
那么我們看看jdk給出的實現:
public static int parseInt(String s, int radix)
throws NumberFormatException
{
if (s == null) {
throw new NumberFormatException("null");
}
if (radix < Character.MIN_RADIX) {
throw new NumberFormatException("radix " + radix +
" less than Character.MIN_RADIX");
}
if (radix > Character.MAX_RADIX) {
throw new NumberFormatException("radix " + radix +
" greater than Character.MAX_RADIX");
}
int result = 0;
boolean negative = false;
int i = 0, max = s.length();
int limit;
int multmin;
int digit;
if (max > 0) {
if (s.charAt(0) == '-') {
negative = true;
limit = Integer.MIN_VALUE;
i++;
} else {
limit = -Integer.MAX_VALUE;
}
multmin = limit / radix;
if (i < max) {
digit = Character.digit(s.charAt(i++),radix);
if (digit < 0) {
throw NumberFormatException.forInputString(s);
} else {
result = -digit;
}
}
while (i < max) {
// Accumulating negatively avoids surprises near MAX_VALUE
digit = Character.digit(s.charAt(i++),radix);
if (digit < 0) {
throw NumberFormatException.forInputString(s);
}
if (result < multmin) {
throw NumberFormatException.forInputString(s);
}
result *= radix;
if (result < limit + digit) {
throw NumberFormatException.forInputString(s);
}
result -= digit;
}
} else {
throw NumberFormatException.forInputString(s);
}
if (negative) {
if (i > 1) {
return result;
} else { /* Only got "-" */
throw NumberFormatException.forInputString(s);
}
} else {
return -result;
}
}
過程就是按照思路來的,但是更全面一些,首先做一些參數檢查,然后定義了局部變量用于計算:result是對應的int結果,negative對應是否是負數的判斷,i是遍歷用的索引指針,max代表字符串的長度,limit是合法數字的上限(下限),digit是當前掃描到的字符對應的數字,multmin是在做乘法計算時能走到的合法下限。
嚴謹是這段程序最大的特點,因為有符號int的上下限是-2147483648~2147483647,可見負數表達的范圍比正數多一個,這樣就好理解為什么在開頭要把limit全部表達為負數(下限),這樣的操作減少了后續的判斷,可以一步到位,相當于二者選擇取其大一樣,大的包含了小的。同理,那么multmin也就是負數了,而且可以認為是只和進制參數radix有關系。接著每個char的掃描計算digit利用到了Character.digit(char,int) 方法,這個方法就是在調用CharacterDataLatin1.digit(codePoint, radix) 方法,而這個新的方法其實只是去靜態數組中取個映射而已。最后當順利的執行完while循環后,result結果也就計算好了。
作為程序設計人員,我最初接觸的語言是C++,當初用到的庫函數是atoi,那么我們看看atoi的庫標準實現:
int atoi(str) const char *str; { _DIAGASSERT(str != NULL); return((int)strtol(str, (char **)NULL, 10)); }
其中調用了strtol方法,參數傳遞的radix是10,也就是說我們常用的atoi是默認轉化字符串到10進制的。其中開始時還進行了一個trim的操作,而且支持16進制的0x開頭,可謂完全的盡善盡美啊。
strtol方法:
#define _FUNCNAME strtol #define __INT long #define __INT_MIN LONG_MIN #define __INT_MAX LONG_MAX
__INT _FUNCNAME(const char *nptr, char **endptr, int base) { const char *s; __INT acc, cutoff; char c; int i, neg, any, cutlim; _DIAGASSERT(nptr != NULL); /* endptr may be NULL */ /* check base value */ if (base && (base < 2 || base > 36)) { errno = EINVAL; return(0); } /* * Skip white space and pick up leading +/- sign if any. * If base is 0, allow 0x for hex and 0 for octal, else * assume decimal; if base is already 16, allow 0x. */ s = nptr; do { c = *s++; } while (isspace(c)); if (c == '-') { neg = 1; c = *s++; } else { neg = 0; if (c == '+') c = *s++; } if ((base == 0 || base == 16) && c == '0' && (*s == 'x' || *s == 'X')) { c = s[1]; s += 2; base = 16; } if (base == 0) base = c == '0' ? 8 : 10; /* * Compute the cutoff value between legal numbers and illegal * numbers. That is the largest legal value, divided by the * base. An input number that is greater than this value, if * followed by a legal input character, is too big. One that * is equal to this value may be valid or not; the limit * between valid and invalid numbers is then based on the last * digit. For instance, if the range for longs is * [-2147483648..2147483647] and the input base is 10, * cutoff will be set to 214748364 and cutlim to either * 7 (neg==0) or 8 (neg==1), meaning that if we have accumulated * a value > 214748364, or equal but the next digit is > 7 (or 8), * the number is too big, and we will return a range error. * * Set any if any `digits' consumed; make it negative to indicate * overflow. */ cutoff = neg ? __INT_MIN : __INT_MAX; cutlim = (int)(cutoff % base); cutoff /= base; if (neg) { if (cutlim > 0) { cutlim -= base; cutoff += 1; } cutlim = -cutlim; } for (acc = 0, any = 0;; c = *s++) { if (!isascii(c)) break; if (isdigit(c)) i = c - '0'; else if (isalpha(c)) i = c - (isupper(c) ? 'A' - 10 : 'a' - 10); else break; if (i >= base) break; if (any < 0) continue; if (neg) { if (acc < cutoff || (acc == cutoff && i > cutlim)) { any = -1; acc = __INT_MIN; errno = ERANGE; } else { any = 1; acc *= base; acc -= i; } } else { if (acc > cutoff || (acc == cutoff && i > cutlim)) { any = -1; acc = __INT_MAX; errno = ERANGE; } else { any = 1; acc *= base; acc += i; } } } if (endptr != 0) /* LINTED interface specification */ *endptr = __DECONST(char *, (any ? s - 1 : nptr)); return(acc); }
當然,類似的代碼還有很多,這里只列出了兩大語言的庫實現,總體思路是一致的,當我們設計api時,這種編程思路和風格以及功能的考慮是我們需要學習的。
下面這兩篇stackoverflow的問答給出了一些比較全面的c風格代碼,可以參考,這里不貼全文只給link:
http://stackoverflow.com/questions/194465/how-to-parse-a-string-to-an-int-in-c
http://stackoverflow.com/questions/4442658/c-parse-int-from-string
參考文獻:
jdk文檔及源碼
c庫函數源碼及文檔
posted on 2012-04-06 12:20 changedi 閱讀(2667) 評論(2) 編輯 收藏 所屬分類: 好代碼分享