Change Dir

先知cd——熱愛生活是一切藝術的開始

公告

寫下來的都是資源，分享給互聯網~~均屬原創隨筆。
轉載引用請注明作者changedi。
喜歡應用研究，熱愛編程，歡迎交流。

隨筆分類(125)

隨筆檔案(123)

統計

隨筆 - 222
文章 - 0
評論 - 182
引用 - 0

留言簿(18)

積分與排名

積分 - 421895
排名 - 132

閱讀排行榜

評論排行榜

分享代碼系列——parseInt(包含java和c語言的atoi方法）

jdk中的Integer類是int對象的包裝類，正常的Integer占用內存開銷要比int大，比例大概是1:4 。今天分享的代碼是Integer類中的靜態方法parseInt(String, int)。這個方法眾所周知，甚至在我們一開始學習編程時就嘗試的寫過這樣的代碼，一個正常的思路：遍歷輸入的字符數組（java的字符串就是一個字符數組），然后parse每個char，依據參數給定的進制，判斷每個char是否滿足，滿足則繼續，否則拋出異常或中斷，直到處理完畢所有字符，返回結果。

那么我們看看jdk給出的實現：

    public static int parseInt(String s, int radix)
        throws NumberFormatException
    {
        if (s == null) {
            throw new NumberFormatException("null");
        }

    if (radix < Character.MIN_RADIX) {
        throw new NumberFormatException("radix " + radix +
                        " less than Character.MIN_RADIX");
    }

    if (radix > Character.MAX_RADIX) {
        throw new NumberFormatException("radix " + radix +
                        " greater than Character.MAX_RADIX");
    }

    int result = 0;
    boolean negative = false;
    int i = 0, max = s.length();
    int limit;
    int multmin;
    int digit;

    if (max > 0) {
        if (s.charAt(0) == '-') {
        negative = true;
        limit = Integer.MIN_VALUE;
        i++;
        } else {
        limit = -Integer.MAX_VALUE;
        }
        multmin = limit / radix;
        if (i < max) {
        digit = Character.digit(s.charAt(i++),radix);
        if (digit < 0) {
            throw NumberFormatException.forInputString(s);
        } else {
            result = -digit;
        }
        }
        while (i < max) {
        // Accumulating negatively avoids surprises near MAX_VALUE
        digit = Character.digit(s.charAt(i++),radix);
        if (digit < 0) {
            throw NumberFormatException.forInputString(s);
        }
        if (result < multmin) {
            throw NumberFormatException.forInputString(s);
        }
        result *= radix;
        if (result < limit + digit) {
            throw NumberFormatException.forInputString(s);
        }
        result -= digit;
        }
    } else {
        throw NumberFormatException.forInputString(s);
    }
    if (negative) {
        if (i > 1) {
        return result;
        } else {    /* Only got "-" */
        throw NumberFormatException.forInputString(s);
        }
    } else {
        return -result;
    }
    }

過程就是按照思路來的，但是更全面一些，首先做一些參數檢查，然后定義了局部變量用于計算：result是對應的int結果，negative對應是否是負數的判斷，i是遍歷用的索引指針，max代表字符串的長度，limit是合法數字的上限（下限），digit是當前掃描到的字符對應的數字，multmin是在做乘法計算時能走到的合法下限。

嚴謹是這段程序最大的特點，因為有符號int的上下限是-2147483648~2147483647，可見負數表達的范圍比正數多一個，這樣就好理解為什么在開頭要把limit全部表達為負數（下限），這樣的操作減少了后續的判斷，可以一步到位，相當于二者選擇取其大一樣，大的包含了小的。同理，那么multmin也就是負數了，而且可以認為是只和進制參數radix有關系。接著每個char的掃描計算digit利用到了Character.digit(char,int) 方法，這個方法就是在調用CharacterDataLatin1.digit(codePoint, radix) 方法，而這個新的方法其實只是去靜態數組中取個映射而已。最后當順利的執行完while循環后，result結果也就計算好了。

作為程序設計人員，我最初接觸的語言是C++，當初用到的庫函數是atoi，那么我們看看atoi的庫標準實現：

int
atoi(str)
	const char *str;
{
	_DIAGASSERT(str != NULL);

	return((int)strtol(str, (char **)NULL, 10));
}

其中調用了strtol方法，參數傳遞的radix是10，也就是說我們常用的atoi是默認轉化字符串到10進制的。其中開始時還進行了一個trim的操作，而且支持16進制的0x開頭，可謂完全的盡善盡美啊。

strtol方法：

#define	_FUNCNAME	strtol
#define	__INT		long
#define	__INT_MIN	LONG_MIN
#define	__INT_MAX	LONG_MAX

__INT
_FUNCNAME(const char *nptr, char **endptr, int base)
{
	const char *s;
	__INT acc, cutoff;
	char c;
	int i, neg, any, cutlim;

	_DIAGASSERT(nptr != NULL);
	/* endptr may be NULL */

	/* check base value */
	if (base && (base < 2 || base > 36)) {
		errno = EINVAL;
		return(0);
	}

	/*
	 * Skip white space and pick up leading +/- sign if any.
	 * If base is 0, allow 0x for hex and 0 for octal, else
	 * assume decimal; if base is already 16, allow 0x.
	 */
	s = nptr;
	do {
		c = *s++;
	} while (isspace(c));
	if (c == '-') {
		neg = 1;
		c = *s++;
	} else {
		neg = 0;
		if (c == '+')
			c = *s++;
	}
	if ((base == 0 || base == 16) &&
	    c == '0' && (*s == 'x' || *s == 'X')) {
		c = s[1];
		s += 2;
		base = 16;
	}
	if (base == 0)
		base = c == '0' ? 8 : 10;

	/*
	 * Compute the cutoff value between legal numbers and illegal
	 * numbers.  That is the largest legal value, divided by the
	 * base.  An input number that is greater than this value, if
	 * followed by a legal input character, is too big.  One that
	 * is equal to this value may be valid or not; the limit
	 * between valid and invalid numbers is then based on the last
	 * digit.  For instance, if the range for longs is
	 * [-2147483648..2147483647] and the input base is 10,
	 * cutoff will be set to 214748364 and cutlim to either
	 * 7 (neg==0) or 8 (neg==1), meaning that if we have accumulated
	 * a value > 214748364, or equal but the next digit is > 7 (or 8),
	 * the number is too big, and we will return a range error.
	 *
	 * Set any if any `digits' consumed; make it negative to indicate
	 * overflow.
	 */
	cutoff = neg ? __INT_MIN : __INT_MAX;
	cutlim = (int)(cutoff % base);
	cutoff /= base;
	if (neg) {
		if (cutlim > 0) {
			cutlim -= base;
			cutoff += 1;
		}
		cutlim = -cutlim;
	}
	for (acc = 0, any = 0;; c = *s++) {
		if (!isascii(c))
			break;
		if (isdigit(c))
			i = c - '0';
		else if (isalpha(c))
			i = c - (isupper(c) ? 'A' - 10 : 'a' - 10);
		else
			break;
		if (i >= base)
			break;
		if (any < 0)
			continue;
		if (neg) {
			if (acc < cutoff || (acc == cutoff && i > cutlim)) {
				any = -1;
				acc = __INT_MIN;
				errno = ERANGE;
			} else {
				any = 1;
				acc *= base;
				acc -= i;
			}
		} else {
			if (acc > cutoff || (acc == cutoff && i > cutlim)) {
				any = -1;
				acc = __INT_MAX;
				errno = ERANGE;
			} else {
				any = 1;
				acc *= base;
				acc += i;
			}
		}
	}
	if (endptr != 0)
		/* LINTED interface specification */
		*endptr = __DECONST(char *, (any ? s - 1 : nptr));
	return(acc);
}

當然，類似的代碼還有很多，這里只列出了兩大語言的庫實現，總體思路是一致的，當我們設計api時，這種編程思路和風格以及功能的考慮是我們需要學習的。

下面這兩篇stackoverflow的問答給出了一些比較全面的c風格代碼，可以參考，這里不貼全文只給link：

http://stackoverflow.com/questions/194465/how-to-parse-a-string-to-an-int-in-c

http://stackoverflow.com/questions/4442658/c-parse-int-from-string

參考文獻：

jdk文檔及源碼

c庫函數源碼及文檔

posted on 2012-04-06 12:20 changedi 閱讀(2677) 評論(2) 編輯收藏所屬分類: 好代碼分享


只有注冊用戶登錄后才能發表評論。




網站導航: 博客園 IT新聞 Chat2DB C++博客博問管理
相關文章: 分享代碼系列——HashedArrayTree 分享一個LRUMap的實現——來自apache common-collections框架分享代碼系列——vlist 分享代碼系列——parseInt(包含java和c語言的atoi方法）分享一個ThreadMonitor

Change Dir

導航

公告

隨筆分類(125)

隨筆檔案(123)

統計

留言簿(18)

積分與排名

“牛”們的博客

各個公司技術

我的鏈接

淘寶技術

閱讀排行榜

評論排行榜

分享代碼系列——parseInt(包含java和c語言的atoi方法）

評論

# re: 分享代碼系列&mdash;&mdash;parseInt(包含java和c語言的atoi方法） 2014-04-29 22:46 zuidaima

# re: 分享代碼系列——parseInt(包含java和c語言的atoi方法） 2014-04-29 22:46 zuidaima