
GP-GPU Reading Notes (3)

4. GPGPU Techniques
4.1. Stream Operations
4.1.1. Map
Given a stream of data elements and a function, map will apply the function to every element in the stream.
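A minimal CPU-side sketch of map in C. The kernel function `scale_and_bias` and the name `stream_map` are made up for illustration and do not come from the survey; on a GPU the loop body would be the fragment program and the iterations would run in parallel.

```c
#include <stddef.h>

/* Hypothetical kernel: the function applied to each stream element. */
float scale_and_bias(float x) {
    return 2.0f * x + 1.0f;
}

/* map: apply `kernel` to every element of `in`, writing results to `out`.
 * No iteration depends on another, which is what makes map parallel. */
void stream_map(const float *in, float *out, size_t n, float (*kernel)(float)) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = kernel(in[i]);
    }
}
```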
4.1.2. Reduce
Sometimes a computation requires computing a smaller stream from a larger input stream, possibly to a single element stream. This type of computation is called a reduction. For example, computing the sum or maximum of all the elements in a stream.
On GPUs, reductions can be performed by alternately rendering to and reading from a pair of textures.
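In other words, this is divide and conquer: the input and output buffers swap roles on every pass, and each pass shrinks the data by a fixed ratio.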
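The texture ping-pong can be imitated on the CPU with two plain arrays. This is only a sketch under my own assumptions: the name `reduce_sum`, the `MAX_N` limit and the choice of halving per pass are illustrative, and a real GPU version would render each pass into a texture instead of writing an array.

```c
#include <stddef.h>
#include <string.h>

#define MAX_N 1024  /* illustrative upper bound on the stream length */

/* Sum-reduce n floats by alternating between two buffers,
 * halving the amount of data on every pass. */
float reduce_sum(const float *data, size_t n) {
    float buf_a[MAX_N], buf_b[MAX_N];
    float *src = buf_a, *dst = buf_b;

    if (n == 0 || n > MAX_N) return 0.0f;
    memcpy(src, data, n * sizeof(float));

    while (n > 1) {
        size_t half = (n + 1) / 2;               /* output size of this pass */
        for (size_t i = 0; i < half; ++i) {      /* each output reads two inputs */
            size_t j = 2 * i + 1;
            dst[i] = src[2 * i] + (j < n ? src[j] : 0.0f);
        }
        float *tmp = src; src = dst; dst = tmp;  /* swap input and output roles */
        n = half;
    }
    return src[0];
}
```

Replacing the `+` with a max gives the maximum instead of the sum; the pass structure stays the same.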
4.1.3. Scatter and Gather
If the write and read operations access memory indirectly, they are called scatter and gather respectively.
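The difference between the two is easiest to see side by side. A small C sketch (function names are mine): gather indexes the read, scatter indexes the write.

```c
#include <stddef.h>

/* gather: indirect read, out[i] = in[idx[i]] */
void gather(const float *in, const size_t *idx, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[idx[i]];
    }
}

/* scatter: indirect write, out[idx[i]] = in[i] */
void scatter(const float *in, const size_t *idx, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[idx[i]] = in[i];
    }
}
```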
4.1.4. Stream Filtering
This stream filtering operation is essentially a nonuniform reduction.
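A sequential C sketch of filtering, just to show why it is "nonuniform": the output length depends on the data and is only known once every element has been tested. The predicate `is_positive` and the name `stream_filter` are hypothetical, and this loop is not how a GPU would implement it.

```c
#include <stddef.h>

/* Hypothetical predicate deciding which elements survive. */
int is_positive(int x) {
    return x > 0;
}

/* Copy the elements accepted by `keep` into `out`, preserving order.
 * Returns the number of elements written. */
size_t stream_filter(const int *in, size_t n, int *out, int (*keep)(int)) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (keep(in[i])) {
            out[count++] = in[i];
        }
    }
    return count;
}
```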
4.1.5. Sort
Classic sorting algorithms are data-dependent and generally require scatter operations.
The main algorithms are all related to sorting networks; there is also an adaptive sort whose cost depends on how sorted the input sequence already is.
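Bitonic sort is one well-known sorting network, so here is a sketch of it in C as an example of the idea. It is my own pick for illustration, not necessarily the variant the survey discusses. The sequence of compare-exchange pairs depends only on n, never on the data, and this version assumes n is a power of two.

```c
#include <stddef.h>

/* Order a[i], a[j] ascending or descending; the basic network element. */
static void compare_exchange(int *a, size_t i, size_t j, int ascending) {
    if ((a[i] > a[j]) == ascending) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}

/* Bitonic sorting network; n must be a power of two.
 * Every pass of the inner loop touches disjoint pairs, so a pass
 * could run as one parallel step. */
void bitonic_sort(int *a, size_t n) {
    for (size_t k = 2; k <= n; k <<= 1) {           /* size of sorted runs */
        for (size_t j = k >> 1; j > 0; j >>= 1) {   /* partner distance */
            for (size_t i = 0; i < n; ++i) {
                size_t partner = i ^ j;
                if (partner > i) {
                    compare_exchange(a, i, partner, (i & k) == 0);
                }
            }
        }
    }
}
```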
4.1.6. Search
4.2. Data Structures

posted @ 2008-02-09 13:14 ZelluX | Views (362) | Comments (0)

Windows - Fixing missing sound in QQ and in web Flash video

In the registry, under
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Drivers32
create a new string value
named wavemapper
with the data msacm32.drv

posted @ 2008-02-09 01:00 ZelluX | Views (4997) | Comments (14)

GP-GPU Reading Notes (2)

2.4 GPU Program Flow Control
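The latest GPUs support several forms of branching, but because of their highly parallel nature, branches have to be used with care.
2.4.1 Hardware Mechanisms for Flow Control
Three main implementations:
Predication: not a true data-dependent branch; both sides are evaluated and the condition selects which result is kept
MIMD branching
SIMD branching: all active elements execute the same instruction at any given time, so the branch taken should be the same across pixels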
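To make the predication idea concrete, here is a tiny C sketch (function names are mine). The second version computes both candidate results and then selects one, which is roughly what predicated hardware does instead of jumping. A C compiler is free to compile either form either way, so this only illustrates the concept.

```c
/* Ordinary branch: only one side is executed. */
float abs_branch(float x) {
    if (x < 0.0f) {
        return -x;
    }
    return x;
}

/* Predication-style: both sides are computed, the predicate picks one. */
float abs_predicated(float x) {
    float if_negative = -x;      /* result of the "then" side */
    float if_positive = x;       /* result of the "else" side */
    int predicate = (x < 0.0f);
    return predicate ? if_negative : if_positive;
}
```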
2.4.2 Moving Branching Up The Pipeline
2.4.2.1 Static Branch Resolution
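Static analysis is used to keep branches out of inner loops. The example given solves a partial differential equation on a discrete spatial grid; I did not fully follow it, but roughly speaking the loop is split into two parts.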
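My guess at what "splitting the loop" means, as a C sketch on a 1D grid. The averaging stencil and the fixed boundary cells are assumptions made for the example, not the survey's actual PDE. The first version tests for the boundary on every cell; the second handles boundary and interior separately, so the interior loop has no branch.

```c
#include <stddef.h>

/* One smoothing step with a per-cell boundary test inside the loop. */
void step_branchy(const float *u, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (i == 0 || i == n - 1) {
            out[i] = u[i];                          /* boundary: held fixed */
        } else {
            out[i] = 0.5f * (u[i - 1] + u[i + 1]);  /* interior: average neighbours */
        }
    }
}

/* Same result with the branch resolved statically:
 * boundary cells first, then a branch-free interior loop. */
void step_split(const float *u, float *out, size_t n) {
    if (n == 0) return;
    out[0] = u[0];
    out[n - 1] = u[n - 1];
    for (size_t i = 1; i + 1 < n; ++i) {
        out[i] = 0.5f * (u[i - 1] + u[i + 1]);
    }
}
```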
2.4.2.2 Pre-computation
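Sometimes the result of a branch stays constant for a period of time or across several loop iterations. In that case it only needs to be recomputed when the result is known to change.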
2.4.2.3 Z-Cull
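Modern GPUs have a range of techniques for avoiding work on pixels that will never be seen, and Z-cull is one of them. Put simply, Z-cull discards pixels that fail the depth test (that is, are covered along the Z axis) outright. In a fluid simulation, marking land-locked obstacle cells with a Z depth of 0 lets the computation for those cells be skipped.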
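A rough CPU analogy in C of the fluid example. The name `update_cells`, the halving "update" and the use of depth 0 as the obstacle marker are placeholders I chose for the sketch. On a real GPU the rejection happens in hardware before the fragment program runs, whereas here it is just an early `continue`.

```c
#include <stddef.h>

/* Update only the cells that are not masked out.
 * depth[i] == 0 marks a land-locked obstacle cell (assumed convention).
 * Returns how many cells were actually processed. */
size_t update_cells(float *value, const float *depth, size_t n) {
    size_t processed = 0;
    for (size_t i = 0; i < n; ++i) {
        if (depth[i] == 0.0f) {
            continue;            /* culled: no work is done for this cell */
        }
        value[i] *= 0.5f;        /* stand-in for the expensive per-cell update */
        ++processed;
    }
    return processed;
}
```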
2.4.2.4 Data-Dependent Looping With Occlusion Queries
Another technique for avoiding work on pixels that are not visible.

3 Programming Systems
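GPU architectures evolve very quickly, so profiling and tuning have to be handled by the GPU vendors.
3.1 High-level Shading Languages
Cg, HLSL: close to the underlying hardware
OpenGL Shading Language: has some features that do not map directly to hardware, such as integer support
Sh, Ashli, ...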
3.2 GPGPU Languages and Libraries
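The languages above all require the programmer to write code from the viewpoint of geometric primitives. The systems below try to abstract out some GPGPU functionality and hide the underlying GPU implementation.
Brook: something I worked with a few weeks ago
Scout, Glift: never heard of either...
3.3 Debugging Tools
GPU debugging support is quite limited. A debugger has to be able to show debug information for many pixels at a single point in time. One printf-style approach is to render the values directly to the screen (yikes, for GPGPU programs wouldn't that just garble the display? >,<).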

posted @ 2008-02-08 16:05 ZelluX | Views (530) | Comments (0)

GP-GPU Reading Notes (1)

The lab's winter-break assignment =_=
No.1
A Survey of General-Purpose Computation on Graphics Hardware
on EUROGRAPHICS 2005

1. Why GP-GPU?
1.1 Powerful and Inexpensive
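High memory bandwidth: Nvidia GeForce 6800 Ultra - 35.2 GB/sec
Strong compute capability: ATI X800 XT - 63 GFLOPS; Intel Pentium 4 SSE unit (3.7 GHz) - 14.8 GFLOPS
Cutting-edge process technology: the most recently announced GPU (as of the survey's publication) contains 300 million transistors and is built on a 0.11 micron process
Rapid growth: the GeForce 6800 has twice the throughput of the 5900. GPU compute capability typically grows by an average of 1.7x (pixels/second) and 2.3x (vertices/second) per year, whereas by Moore's law the corresponding CPU figure is about 1.4x per year. Roughly speaking, GPU performance doubles every six months.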

1.2 Flexible and Programmable

1.3 Limitations and Difficulties
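The GPU's compute power comes from its highly specialized architecture, so many applications are a poor fit for it. Word processing, for example, is dominated by memory communication and is hard to parallelize.
Today's GPUs also lack some basic computing features, such as integer arithmetic. Many support only 32-bit floating point (apparently the recent R670 instruction set can handle the double type), which rules out a lot of scientific computing on the GPU.
Moreover, even for problems that suit these characteristics, actually using the GPU raises plenty of issues. The GPU programming model is very different, and efficient GPU programming is not just a matter of learning one more high-level language. Tapping the GPU's compute power today requires the programmer to know both the relevant scientific computing and computer graphics. Even so, the performance gains are very tempting.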

1.4 GPGPU Today
http://gpgpu.org
Some GPGPU applications include
Dense and sparse matrix multiplication   numerical computing
Multigrid and conjugate-gradient solvers for systems of partial differential equations   numerical computing
Ray tracing   image processing
Photon mapping   image processing
Fluid mechanics solvers   physical simulation
Datamining operations   databases / data mining

2. Overview of Programmable Graphics Hardware
2.1 Overview of the Graphics Pipeline
Today's GPUs all use an architecture called the graphics pipeline. The pipeline is divided into stages, and in hardware each stage is implemented in a task-parallel machine organization.

2.2 Programmable Hardware
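GPU vendors turned the fixed-function pipeline into a more flexible programmable pipeline, mainly in the geometry stage and the fragment stage. The original fixed operations are replaced by user-defined vertex programs and fragment programs.
In general, these programmable stages read in a limited number of vectors of four 32-bit floats and output a limited number of vectors of four 32-bit floats. Each programmable stage can access constant registers and can also read and write its own registers.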

2.3 Introduction to the GPU Programming Model
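A typical GPGPU program uses the fragment processor as its compute engine. The usual structure is:
a. The programmer identifies the parallel parts of the application. The application is divided into independent parallelizable sections, each treated as a kernel and implemented as a fragment program. The input and output of each kernel are one or more data arrays, stored as textures in GPU memory. In stream terms, the data in these textures make up streams, and every element of a stream is processed by the kernel separately.
b. Before a kernel is invoked, the range of the computation has to be set; the programmer can pass vertex data to the GPU for this. Note that GPUs are somewhat limited in performance when processing one-dimensional arrays.
c. The rasterizer generates a fragment for every pixel.
d. Every fragment is processed by the same active kernel program. The fragment program can read from arbitrary global memory but can write only to the frame-buffer location determined by the rasterizer. I have not fully understood this part yet.
e. The output of each fragment is a value or a vector of values. It can serve as the final program result or be stored as a texture for later computation; complex applications usually need several passes through the pipeline (multipass).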
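Steps c to e can be mimicked on the CPU, which helped me with step d. In this C sketch (all names are hypothetical) `run_pass` plays the rasterizer: it calls the kernel once per pixel and decides where the result is written. The kernel may read anywhere in the input texture (gather), but it only returns a value and never chooses its own output address.

```c
#include <stddef.h>

/* A "fragment program": may read any texel of tex, returns one value. */
typedef float (*fragment_kernel)(const float *tex, size_t w, size_t h,
                                 size_t x, size_t y);

/* One rendering pass: one kernel invocation per output pixel.
 * The write location (x, y) is fixed by this loop, not by the kernel. */
void run_pass(const float *tex_in, float *tex_out, size_t w, size_t h,
              fragment_kernel kernel) {
    for (size_t y = 0; y < h; ++y) {
        for (size_t x = 0; x < w; ++x) {
            tex_out[y * w + x] = kernel(tex_in, w, h, x, y);
        }
    }
}

/* Example kernel: horizontal 3-tap average with clamped edges. */
float blur_h(const float *tex, size_t w, size_t h, size_t x, size_t y) {
    size_t left = (x > 0) ? x - 1 : x;
    size_t right = (x + 1 < w) ? x + 1 : x;
    (void)h;  /* height is unused by this particular kernel */
    return (tex[y * w + left] + tex[y * w + x] + tex[y * w + right]) / 3.0f;
}
```

Multipass then just means calling `run_pass` again with the previous output as the new input.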

posted @ 2008-02-07 16:31 ZelluX | Views (776) | Comments (1)

Kernel Reading Notes (1) - Kernel Modules

include/linux/module.h
struct module:

struct module
{
    // Used to check that a module object passed in from user space is a valid structure
    unsigned long size_of_struct;   /* == sizeof(module) */
    struct module *next;

    // Points to the module's name; in kernel space the memory for name is
    // usually allocated right after the module{} structure
    const char *name;

    // Space for the module{} structure itself plus the memory allocated right
    // after it for the structure's use
    // size = sizeof(struct module) + sizeof(misc data)
    unsigned long size;

    // Module reference counter; I have not figured out what pad is for yet
    // (judging by the comment below, it presumably keeps the union long-sized)
    // atomic_t on i386: typedef struct { volatile int counter; } atomic_t;
    union
    {
        atomic_t usecount;
        long pad;
    } uc;                           /* Needs to keep its size - so says rth */

    // Current module state: initialized / running / deleted / visited, etc.
    unsigned long flags;            /* AUTOCLEAN et al */

    // Number of kernel symbols the module defines
    unsigned nsyms;
    // Number of entries in deps; used when walking module dependencies
    unsigned ndeps;

    // Symbol table
    struct module_symbol *syms;
    // Array recording the other modules this module depends on
    struct module_ref *deps;
    // Records of the other modules that reference this module
    struct module_ref *refs;

    // Function pointers called when the module is initialized and removed
    int (*init)(void);
    void (*cleanup)(void);

    // Start and end of the module's exception table
    const struct exception_table_entry *ex_table_start;
    const struct exception_table_entry *ex_table_end;

#ifdef __alpha__
    unsigned long gp;
#endif

    /* Members past this point are extensions to the basic
       module support and are optional. Use mod_member_present()
       to examine them. */

    // These two pointers keep some per-module information so that its
    // configuration is available when the module is unloaded and loaded again
    const struct module_persist *persist_start;
    const struct module_persist *persist_end;

    int (*can_unload)(void);
    int runsize;                    /* In modutils, not currently used */
    const char *kallsyms_start;     /* All symbols for kernel debugging */
    const char *kallsyms_end;
    const char *archdata_start;     /* arch specific data for module */
    const char *archdata_end;
    const char *kernel_data;        /* Reserved for kernel internal use */
};

struct module_symbol:
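Holds a kernel symbol from the object code. When a module file is read and loaded, the symbol information it contains is read in through this structure.

struct module_symbol
{
    unsigned long value;    // entry address
    const char *name;       // kernel symbol name
};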

struct module_ref:
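Note the asymmetry between how dep and ref are recorded here; it can probably be viewed as a linked list of refs.
Each entry of the deps array in module{} is the module_ref{} for one of the modules it depends on.

struct module_ref
{
    struct module *dep;     /* "parent" pointer */
    struct module *ref;     /* "child" pointer */
    struct module_ref *next_ref;
};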
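To convince myself of the dep/ref asymmetry I wrote a user-space mock in C. The structs `mod` and `mod_ref` are cut-down stand-ins, not the kernel definitions, and `link_deps` is only my understanding of how the 2.4 module loader threads each deps entry onto the refs list of the module it depends on; treat it as an assumption rather than the kernel's exact code.

```c
#include <stddef.h>

struct mod;

/* Simplified stand-in for struct module_ref. */
struct mod_ref {
    struct mod *dep;            /* module being depended on ("parent") */
    struct mod *ref;            /* module holding the dependency ("child") */
    struct mod_ref *next_ref;   /* next entry on dep's refs list */
};

/* Simplified stand-in for struct module. */
struct mod {
    const char *name;
    unsigned ndeps;
    struct mod_ref *deps;       /* array of ndeps entries, owned by this module */
    struct mod_ref *refs;       /* list head: entries of modules that use us */
};

/* Push every deps[] entry of m onto the refs list of its target module. */
void link_deps(struct mod *m) {
    for (unsigned i = 0; i < m->ndeps; ++i) {
        struct mod_ref *r = &m->deps[i];
        r->ref = m;
        r->next_ref = r->dep->refs;
        r->dep->refs = r;
    }
}

/* Count the modules that currently reference m by walking its refs list. */
unsigned count_users(const struct mod *m) {
    unsigned n = 0;
    for (const struct mod_ref *r = m->refs; r != NULL; r = r->next_ref) {
        ++n;
    }
    return n;
}
```

So deps is an array stored with the dependent module, while refs is a linked list built out of other modules' deps entries, which is why the two look asymmetric.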

struct kernel_sym:
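The structure used by sys_get_kernel_syms(). That function copies kernel symbols into kernel_sym{} structures in user space, so that module information can be held in user mode.

struct kernel_sym
{
    unsigned long value;    // kernel symbol address
    char name[60];          /* should have been 64-sizeof(long); oh well */
};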

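I am skipping some of the functions in module.c for now; the book covers them in detail.

Loading and unloading modules
What insmod does:
Read the module name from the command line and locate the file containing its code
Compute the memory required
Invoke the create_module() system call, passing the new module's name and size
Use QM_MODULES to get the names of all modules already linked in
Use QM_SYMBOLS to get the kernel symbol table and the symbol tables of the modules already linked into the kernel
Use this information to relocate the code in the module file
Allocate memory in user space and copy the relevant information into it
Call sys_init_module(), passing the address of the user-space memory area built above
Free the user-space memory and exit

What rmmod does:
Use QM_MODULES and QM_REFS to get the list of linked modules and their dependencies
Call delete_module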

posted @ 2008-02-07 11:22 ZelluX | Views (574) | Comments (3)

Minesweeper is NP-complete

http://web.mat.bham.ac.uk/R.W.Kaye/minesw/ordmsw.htm

So it actually is NP-complete...

posted @ 2008-02-06 19:40 ZelluX | Views (474) | Comments (0)

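Links to Linux Kernel Resources

If a link has gone dead, googling the article title should still turn up a repost on another site easily.

Interrupt handling:
Interrupt in Linux
A very good Chinese-language resource

Kernel scheduling:
Inside the Linux scheduler
Covers the O(1) algorithm built on the expired/active arrays, the scheduling algorithm that most books on the 2.6 kernel describe (2008-02-06)

Multiprocessing with the Completely Fair Scheduler
The CFS adopted in the latest 2.6.23; I have not figured it out yet (2008-02-06)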

http://www.ibm.com/developerworks/cn/linux/l-cn-scheduler/index.html
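Linux 調度器發展簡述 (A brief history of the Linux scheduler) (2008-02-13)

Kernel modules:
2.6 內核中的模塊注入 (Module injection in the 2.6 kernel) (2008-02-17)
http://www.linuxforum.net/forum/showflat.php?Cat=&Board=security&Number=536404&page=0&view=collapsed&sb=5&o=31&fpart

System calls:
Linux 2.6 新增的 vsyscall 系統服務調用機制 (The vsyscall system service call mechanism added in Linux 2.6) (2008-02-18)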
http://blog.csdn.net/wishfly/archive/2005/01/23/264435.aspx

Linux on-the-fly kernel patching without LKM (2008-02-19)
http://doc.bughunter.net/rootkit-backdoor/kernel-patching.html

Memory management:
http://linux-mm.org/LinuxMM
Linux-mm.org is a wiki for documenting how memory management works and for coordinating new memory management development projects. (2008-02-21)

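Concurrency and synchronization:
http://hi.baidu.com/charleswen/blog/item/61f3e40ebc26dcce7acbe1c8.html
Linux內核中的同步和互斥分析報告 (An analysis of synchronization and mutual exclusion in the Linux kernel) (2008-02-21)

http://www-128.ibm.com./developerworks/cn/linux/kernel/sync/index.html
Linux 2.4.x內核同步機制 (Synchronization mechanisms in the Linux 2.4.x kernel) (2008-02-22)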

Big Picture:
http://www.linuxdriver.co.il/kernel_map
Interactive Linux kernel map (2008-02-16)
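Turns the call relationships between kernel functions into a zoomable map; clicking a function name jumps to the corresponding code on lxr.

Programming references:
http://www.jegerlehner.ch/intel/
Intel Assembler CodeTable 80x86 (2008-02-21)

Related sites: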
http://kernelnewbies.org
Linux Kernel Newbies

http://bbs4.newsmth.net/bbsdoc.php?board=KernelTech
The KernelTech board on the SMTH (newsmth) BBS

http://www.phrack.org
Phrack is an underground ezine made by and for hackers.
Has plenty of kernel-related hacking material