亚洲精品国产电影,秋霞久久久久久一区二区,国产精品一区二区在线观看

求整數中比特為1的二進制位數（轉）

求整數中比特為1的二進制位數

分類： C/C++ 2009-11-14 01:05 2306人閱讀評論(3) 收藏舉報

performance 匯編測試 methods iostream express

好幾次在CSDN上看到別人討論如何求出一個整數的二進制表示中狀態為1的比特位數。今天寫了個程序把從網上看來的加上自己想出來的共有5種方法測試了一下，覺得好玩，就寫了這篇博客。

首先簡單介紹一下這五種方法。

第一種：最簡單的，通過移位操作逐位測試并計數，不妨稱為“逐位測試法”；

第二種：注意到對于“單比特二進制”而言，其狀態與數值“相同”。即對于單個比特的“數”而言，為0即表示數值0，“全部為1”即表示數值1（注意多比特數值顯然沒有這個特點，比如一個字節有8個比特，當8個比特全為1時并不代表整數8，而代表255）。利用這一特點，從單個比特開始，以相鄰的一位、兩位、四位、八位和十六位為分組，一組一組地相加并逐步累計，最終得出結果；不妨稱為“分組統計法”；

第三種：注意到一個整數減1的會把最后一位1變成0，而其后的0全變成1，利用這一特點，把一個數不斷與它減1后的結果做“按位與”，直到它變成0，那么循環的次數便是其狀態為1的比特位數。不妨稱之為“循環減一相與法”；

第四種：考虛到多數處理器都提供“找到第一個為1的比特位”這種指令，在我的PC上直接調用X86處理器的BSF或BSR指令，每次直接找到第一個不為0的位并消掉，直到該數變成0，那么循環的次數即為狀態為1的比特位數。這種不妨稱為“匯編嵌入法”；

第五種，一個字節一共有256種狀態，將每一個取值所對應的0比特位數的事先寫在程序中（注意這些數值是有規律的），也耗不了太多內存，然后程序運行的時候，把整數的四個字節拆開逐字節查表并相加，這種可稱為“查表法”。

以下是程序。程序中對應以上五種方法的函數分別命名為f1到f5。另外還有三個函數，correctness_test通過幾個簡單而又特殊的取值測試各個函數的正確性，相當于一個小單元測試；performance_test則分別將這5個函數作用于一億個隨機整數同進行性能測試；prepare_test_data則是準備1億個隨機整數的程序（這個函數實際并沒有為測試數據提供足夠的隨機性，但這一點對性能測試的結果應該沒有太大影響）

[cpp] view plain copy print ?

#include <iostream>
#include <cstdlib>
#include <ctime>
using namespace std;
int f1(unsigned int num);
int f2(unsigned int num);
int f3(unsigned int num);
int f4(unsigned int num);
int f5(unsigned int num);
void correctness_test();
void performance_test();
void prepare_test_data(unsigned int data[], int size);
int main() {
correctness_test();
performance_test();
return 0;
}
int f1(unsigned int num) {
int count = 0;
while(num) {
if(num & 1) ++count;
num >>= 1;
}
return count;
}
int f2(unsigned int num) {
const unsigned int M1 = 0x55555555;
const unsigned int M2 = 0x33333333;
const unsigned int M4 = 0x0f0f0f0f;
const unsigned int M8 = 0x00ff00ff;
const unsigned int M16 = 0x0000ffff;
num = (num & M1) + ((num >> 1) & M1);
num = (num & M2) + ((num >> 2) & M2);
num = (num & M4) + ((num >> 4) & M4);
num = (num & M8) + ((num >> 8) & M8);
num = (num & M16) + ((num >> 16) & M16);
return num;
}
int f3(unsigned int num) {
int count = 0;
while(num) {
num &= (num - 1);
++count;
}
return count;
}
int f4(unsigned int num) {
int count = 0;
while(num) {
int n;
__asm {
bsr eax, num
mov n, eax
}
++count;
num ^= (1 << n);
}
return count;
}
int f5(unsigned int num) {
static const signed char TABLE[256] = {
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
};
unsigned char* p = reinterpret_cast<unsigned char*>(&num);
int count = 0;
while(p != reinterpret_cast<unsigned char*>(&num + 1)) {
count += TABLE[*p++];
}
return count;
}
void correctness_test() {
unsigned int test_data[] = {0, 1, 2, 3, 0x01234567, 0x89abcdef, 0xffffffff};
unsigned int corect_result[] = {0, 1, 1, 2, 12, 20, 32};
int (*fn[])(unsigned int) = {f1, f2, f3, f4, f5};
for(int i = 0; i < sizeof(fn) / sizeof(*fn); ++i) {
for(int j = 0; j < sizeof(test_data) / sizeof(*test_data); ++j) {
if(fn[i](test_data[j]) != corect_result[j]) {
cout << "f" << (i + 1) << " failed!" << endl;
exit(-1);
}
}
}
cout << "All methods passed correctness test." << endl;
}
void performance_test() {
const int TEST_DATA_SIZE = 100000000;
unsigned int* test_data = new unsigned int[TEST_DATA_SIZE];
prepare_test_data(test_data, TEST_DATA_SIZE);
int (*fn[])(unsigned int) = {f1, f2, f3, f4, f5};
for(int i = 0; i < sizeof(fn) / sizeof(*fn); ++i) {
clock_t start = clock();
for(int j = 0; j < TEST_DATA_SIZE; ++j) {
fn[i](test_data[j]);
}
int ticks = clock() - start;
double seconds = ticks * 1.0 / CLOCKS_PER_SEC;
cout << "f" << (i + 1) << " consumed " << seconds << " seconds." << endl;
}
delete[] test_data;
}
void prepare_test_data(unsigned int* data, int len) {
srand(0);
for(int i = 0; i < len; ++i) {
data[i] = static_cast<unsigned int>(rand() * 1.0 / RAND_MAX * 0xffffffff);
}
}

#include <iostream> 
#include <cstdlib>
#include <ctime>
using namespace std;

int f1(unsigned int num);
int f2(unsigned int num);
int f3(unsigned int num);
int f4(unsigned int num);
int f5(unsigned int num);

void correctness_test();
void performance_test();
void prepare_test_data(unsigned int data[], int size);

int main() {
	correctness_test();
	performance_test();
	return 0; 
}

int f1(unsigned int num) {
	int count = 0;
	while(num) {
		if(num & 1) ++count;
		num >>= 1;
	}
	return count;
}

int f2(unsigned int num) {
	const unsigned int M1 = 0x55555555;
	const unsigned int M2 = 0x33333333;
	const unsigned int M4 = 0x0f0f0f0f;
	const unsigned int M8 = 0x00ff00ff;
	const unsigned int M16 = 0x0000ffff;

num = (num & M1) + ((num >> 1) & M1);
	num = (num & M2) + ((num >> 2) & M2);
	num = (num & M4) + ((num >> 4) & M4);
	num = (num & M8) + ((num >> 8) & M8);
	num = (num & M16) + ((num >> 16) & M16);
	return num;
}

int f3(unsigned int num) {
	int count = 0;
	while(num) {
		num &= (num - 1);
		++count;
	}
	return count;
}

int f4(unsigned int num) {
	int count = 0;
	while(num) {
		int n;
		__asm {
			bsr eax, num
			mov n, eax
		}
		++count;
		num ^= (1 << n);
	}
	return count;
}

int f5(unsigned int num) {
	static const signed char TABLE[256] = { 
		0, 1, 1, 2, 1, 2, 2, 3,	1, 2, 2, 3, 2, 3, 3, 4,
		1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
		1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
		2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
		1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
		2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
		2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
		3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
		1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
		2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
		2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
		3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
		2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
		3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
		3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
		4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
	};

unsigned char* p = reinterpret_cast<unsigned char*>(&num);
	int count = 0;
	while(p != reinterpret_cast<unsigned char*>(&num + 1)) {
		count += TABLE[*p++];		
	}
	return count;
}

void correctness_test() {
	unsigned int test_data[] = {0, 1, 2, 3, 0x01234567, 0x89abcdef, 0xffffffff};
	unsigned int corect_result[] = {0, 1, 1, 2, 12, 20, 32};

int (*fn[])(unsigned int) = {f1, f2, f3, f4, f5};
	for(int i = 0; i < sizeof(fn) / sizeof(*fn); ++i) {
		for(int j = 0; j < sizeof(test_data) / sizeof(*test_data); ++j) {
			if(fn[i](test_data[j]) != corect_result[j]) {
				cout << "f" << (i + 1) << " failed!" << endl;
				exit(-1);
			}
		}
	}
	cout << "All methods passed correctness test." << endl;
}

void performance_test() {
	const int TEST_DATA_SIZE = 100000000;
	unsigned int* test_data = new unsigned int[TEST_DATA_SIZE];
	prepare_test_data(test_data, TEST_DATA_SIZE);

int (*fn[])(unsigned int) = {f1, f2, f3, f4, f5};
	for(int i = 0; i < sizeof(fn) / sizeof(*fn); ++i) {
		clock_t start = clock();
		for(int j = 0; j < TEST_DATA_SIZE; ++j) {
			fn[i](test_data[j]);
		}
		int ticks = clock() - start;
		double seconds = ticks * 1.0 / CLOCKS_PER_SEC;

cout << "f" << (i + 1) << " consumed " << seconds << " seconds." << endl;
	}
	delete[] test_data;
}

void prepare_test_data(unsigned int* data, int len) {
	srand(0);
	for(int i = 0; i < len; ++i) {
		data[i] = static_cast<unsigned int>(rand() * 1.0 / RAND_MAX * 0xffffffff);
	}
}

在我的機器上（AMD Phenom 8560處理器，Windows XP SP2），使用Visual C++ 2008 Express Edition編譯并運行（Release版），某一次得到的輸出如下：

All methods passed correctness test.

f1 consumed 14.156 seconds.

f2 consumed 1.032 seconds.

f3 consumed 4.656 seconds.

f4 consumed 13.687 seconds.

f5 consumed 1.422 seconds.

從結果來看，最慢的是第一種“逐位測試法”，最快的是第二種“分組統計法”。兩者相差近14倍！

第三種“循環減一相與法”表現也很不錯，雖然跟最快的相比遜色很多，但比最慢的強多了；

第四種“匯編嵌入法”，表面上看，其復雜度是跟數值中1的位數相關，這一點與方法三一樣。而不像方法一那樣復雜度跟整個數的位數有關。但其表現并不令人滿意，結果幾乎跟方法一一樣差，而無法跟方法三相比。查了一下指令手冊，發現BSR指令并不是一條固定周期的指令，作用于32位整數時，快的時候它只需幾個CPU時鐘周期，慢的時候需要40幾個時鐘周期，我想它極有可能是在CPU內部通過類似于“逐位查找”的微命令實現的。

第五種“查表法”的表現倒讓人相當滿意，性能逼近方法二。注意我只用了基于字節編碼的表，如果實際使用中需要這種運算，我們完全可以構造一個基于unsigned short編碼的表，這樣一個表將占用64K內存，在現代PC上仍屬小菜一碟，而每個32位數只需要把前后兩半查表的結果一加即可。那樣一來，其性能會不會比方法二還要強呢？有興趣的朋友可以試試。：P

最后，或許有朋友會問：第四種方法中既然采用嵌入匯編，為何不把整個函數都用匯編來寫呢？那樣或許效率還會好一些。但那對其它幾種方法來說是不公平的，因為所有的方法都可以改用匯編來寫。所以，在這個函數中我只是把依賴于特定處理器（X86）、且無法使用C++語言及其標準庫直接實現的部分用匯編實現，其它的計算仍然用C++語言寫出。剩下的事情，跟其它幾種方法的實現一樣——讓編譯器看著辦吧，它愛怎么優化就怎么優化。

posted on 2013-11-11 18:57 ** 閱讀(230) 評論(0) 編輯收藏

新用戶注冊刷新評論列表


只有注冊用戶登錄后才能發表評論。




網站導航: 博客園 IT新聞 Chat2DB C++博客博問管理

Hopes

求整數中比特為1的二進制位數（轉）

求整數中比特為1的二進制位數

導航

統計

公告

常用鏈接

留言簿(2)

隨筆檔案

文章分類

文章檔案

新聞檔案

相冊

收藏夾

C#學習

友情鏈接

搜索

最新評論

閱讀排行榜

評論排行榜

Hopes

求整數中比特為1的二進制位數 （轉）

求整數中比特為1的二進制位數

導航

統計

公告

常用鏈接

留言簿(2)

隨筆檔案

文章分類

文章檔案

新聞檔案

相冊

收藏夾

C#學習

友情鏈接

搜索

最新評論

閱讀排行榜

評論排行榜

求整數中比特為1的二進制位數（轉）