posted @ 2007-12-19 20:02 ZelluX Reads(326) | Comments(2)
The CAL sample programs use many sample instructions; here is a brief introduction found via Google:
Antialiasing
Although shrinking the pixels makes an image finer and reduces aliasing to some degree, as long as the pixels are large enough to be told apart from one another, aliasing is unavoidable. Antialiasing generally works by taking multiple sample points (note: "points", not "pixels"; the difference will become clear below).
I. Theory and methods:
1. Oversampling (supersampling):
(1) Method:
First, render the scene at a higher resolution than your display (front buffer): if the current front/back buffer resolution is 800×600, render the scene to a 1600×1200 render target (a texture) first.
Then derive the low-resolution result from the high-resolution render target: take the average color of each 2×2 block of pixels as the final color of the corresponding output pixel.
(2) Advantage: noticeably reduces the distortion caused by aliasing.
(3) Disadvantages: it requires a larger buffer, and filling that buffer costs performance; sampling several pixels per output pixel also slows things down. Because of these drawbacks, D3D does not use this antialiasing method.
2. Multisampling:
(1) Method:
The pixel's color is still sampled only once, but N sample points are taken within each pixel (how many depends on the sample pattern); the pixel's final color = the pixel's original color × (number of sample points covered by the polygon) / (total number of sample points).
(2) Advantages: it reduces aliasing artifacts without increasing the number of color samples, and unlike oversampling it does not require a larger back buffer.
(3) Disadvantage: normally a polygon determines a pixel's color only when it covers the pixel's center point (in the pixel pipeline this typically means addressing the appropriate texture color and modulating it with the color output by the vertex pipeline). With multisampling, however, if a polygon covers some of the sample points but not the pixel's center, the pixel's color is still determined by that polygon. Texture addressing can then go wrong, which for texture atlases produces a different kind of artifact: wrong colors along polygon edges!
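The coverage-weighted formula above can be written out as a small sketch. The bitmask representation of coverage, the sample count, and the function name are illustrative assumptions, not any particular API:

```c
/* Resolve rule quoted above:
 * final = original_color * covered_samples / total_samples.
 * coverage_mask has one bit per sample point (an assumed encoding). */
float msaa_resolve(float pixel_color, unsigned coverage_mask, int total_samples)
{
    int covered = 0;
    for (int i = 0; i < total_samples; i++)   /* count sample points the  */
        if (coverage_mask & (1u << i))        /* polygon actually covers  */
            covered++;
    return pixel_color * (float)covered / (float)total_samples;
}
```

With a 4-sample pattern, a polygon covering two sample points contributes half of the pixel's color, which is exactly how the edge gradient that hides the jaggies arises.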
3. Centroid Sampling:
(1) Method:
To fix the texture-addressing errors that multisampling causes with texture atlases, the "pixel's original color" is no longer taken at the pixel's center but at the centroid of the sample points covered by the polygon. This guarantees that the location sampled always lies inside the polygon (in other words, the texture address never falls outside it).
(2) How to use it:
① Any pixel shader input with the COLOR semantic automatically uses centroid sampling;
② Manually append the _centroid modifier to the semantic of a pixel shader input, for example:
float4 TexturePointCentroidPS( float4 TexCoord : TEXCOORD0_centroid ) : COLOR0
{
    return tex2D( PointSampler, TexCoord );
}
(3) Note:
Centroid sampling is mainly meant for multisampling with texture atlases; when a whole texture maps to a single polygon mesh, centroid sampling can itself introduce errors!
posted @ 2007-12-14 13:42 ZelluX Reads(440) | Comments(0)
With CC's help I finally understood this program.
The key is the section Generic Cache Memory Organization on p. 488; I had read it before, but it left no impression.
A cache is made up of a number of (2^s) sets of lines, each line holding a block-size chunk of memory.
So when B[k][j] is accessed, the whole stretch from B[k][j] to B[k][j + bsize - 1] gets cached;
after bsize such accesses, the block from B[k][k] to B[k + bsize - 1][k + bsize - 1] is cached,
and the multiplications that follow run much faster.
posted @ 2007-12-06 13:17 ZelluX Reads(884) | Comments(0)



Help on built-in function sorted in module __builtin__:
sorted(...)
sorted(iterable, cmp=None, key=None, reverse=False) --> new sorted list
posted @ 2007-12-04 18:48 ZelluX Reads(1098) | Comments(0)
Summary: Original citation
From: td@alice.UUCP (Tom Duff)
Organization: AT&T Bell Laboratories, Murray Hill NJ
Date: 29 Aug 88 20:33:51 GMT
Message-ID: <8144@alice.UUCP>
I normally do not read comp.lang.c, but Jim McKie told me that ``Duff's device'' had come up in comp.lang.c again. I have lost the version that was sent to netnews in May 1984, but I have reproduced below the note in which I originally proposed the device. (If anybody has a copy of the netnews version, I would gratefully receive a copy at research!td or td@research.att.com.)
To clear up a few points: I was at Lucasfilm when I invented the device.
Here then, is the original document describing Duff's device:
From research!ucbvax!dagobah!td Sun Nov 13 07:35:46 1983
Received: by ucbvax.ARPA (4.16/4.13) id AA18997; Sun, 13 Nov 83 07:35:46 pst
Received: by dagobah.LFL (4.6/4.6b) id AA01034; Thu, 10 Nov 83 17:57:56 PST
Date: Thu, 10 Nov 83 17:57:56 PST
From: ucbvax!dagobah!td (Tom Duff)
Message-Id: <8311110157.AA01034@dagobah.LFL>
To: ucbvax!decvax!hcr!rrg, ucbvax!ihnp4!hcr!rrg, ucbvax!research!dmr, ucbvax!research!rob

Consider the following routine, abstracted from code which copies an array of shorts into the Programmed IO data register of an Evans & Sutherland Picture System II:

send(to, from, count)
register short *to, *from;
register count;
{
    do
        *to = *from++;
    while (--count > 0);
}

The VAX C compiler compiles the loop into 2 instructions (a movw and a sobleq, I think.) As it turns out, this loop was the bottleneck in a real-time animation playback program which ran too slowly by about 50%. The standard way to get more speed out of something like this is to unwind the loop a few times, decreasing the number of sobleqs. When you do that, you wind up with a leftover partial loop. I usually handle this in C with a switch that indexes a list of copies of the original loop body. Of course, if I were writing assembly language code, I'd just jump into the middle of the unwound loop to deal with the leftovers. Thinking about this yesterday, the following implementation occurred to me:

send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}

(Obviously, this fails if the count is zero.) It amazes me that after 10 years of writing C there are still little corners that I haven't explored fully. (Actually, I have another revolting way to use switches to implement interrupt driven state machines but it's too horrid to go into.)
Many people (even bwk?) have said that the worst feature of C is that switches don't break automatically before each case label. This code forms some sort of argument in that debate, but I'm not sure whether it's for or against.
yrs trly
Tom

>>... Dollars to doughnuts this
>>was written on a RISC machine.
>Nope. Bell Labs Research uses VAXen and 68Ks, mostly.
posted @ 2007-11-29 16:02 ZelluX Reads(510) | Comments(0)
The key tool here is actually Google's gprof2dot:
http://google-gprof2dot.googlecode.com/
It has four output styles; presumably other information can also be set when generating the dot file, such as the time spent in each node, since that figure matters more for profiling.




posted @ 2007-11-27 20:04 ZelluX Reads(537) | Comments(0)
http://www.aygfsteel.com/Files/zellux/ORC-PACT02-tutorial.rar
Also, the second edition of the Dragon Book apparently covers IPA optimization and call graphs at length. Grinding through it.
posted @ 2007-11-27 15:24 ZelluX Reads(289) | Comments(0)
Overview of the Open64 Compiler Infrastructure
VI.4. Interprocedural Analysis
Interprocedural Analysis (IPA) is performed in the following phases of Open64:
• Inliner phase
• IPA local summary phase
• IPA analysis phase
• IPA optimization phase
• IPA miscellaneous
By default the IPA does the function inlining in the inliner facility. The local summary phase is done in the IPL module and the analysis phase and optimization phase in the ipa-link module.
During the analysis phase, it does the following:
• IPA_Padding Analysis (common blocks Padding/Split Analysis)
• Construction of the Callgraph
Then it does space and multigot partitioning of the Callgraph. The partitioning algorithm takes into account whether it is partitioning to solve the space problem or the multigot problem.
During the optimization phase the following phases are performed:
• IPA Global Variable Optimization
• IPA Dead function elimination
• IPA Interprocedural Alias Analysis
• IPA Cloning Analysis (it propagates information about formal parameters used as symbolic terms in array section summaries; this information is later used to trigger cloning)
• IPA Interprocedural Constant propagation
• IPA Array_Section Analysis
• IPA Inlining Analysis
• Array section summaries for the Dependence Analyzer of the Loop Nest Optimizer.
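As a rough illustration of what the constant-propagation and cloning phases listed above buy (this is plain illustrative C, not Open64 code, and every name in it is made up):

```c
/* A callee whose second parameter is a constant at every call site. */
static int scale_by(int x, int scale) { return x * scale; }

/* All call sites pass the constant 2 ... */
int twice(int x)       { return scale_by(x, 2); }
int quadruple(int x)   { return scale_by(scale_by(x, 2), 2); }

/* ... so interprocedural constant propagation can prove scale == 2
 * everywhere, and cloning can specialize the callee accordingly: */
int scale_by_2(int x)  { return x * 2; }
```

Because the analysis crosses function (and, with IPA at link time, compilation-unit) boundaries, the specialized clone can be generated even when the call sites live in different source files.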
posted @ 2007-11-26 12:53 ZelluX Reads(387) | Comments(1)
I suddenly have to work on a related compiler-optimization project, so here is some IPA material from sites abroad, since reaching them from the education network is inconvenient.
GCC wiki:
Analysis and optimizations that work on more than one procedure at a time. This is usually done by walking the Strongly Connected Components of the call graph and performing some analysis and optimization across some set of procedures (be it the whole program, or just a subset) at once.
GCC has had a callgraph for a few versions now (since GCC 3.4 in the FSF releases), but the procedures didn't have control flow graphs (CFGs) built. The tree-profiling-branch in GCC CVS now has a CFG for every procedure built and accessible from the callgraph, as well as a basic IPA pass manager. It also contains in-progress interprocedural optimizations and analyses: interprocedural constant propagation (with cloning for specialization) and interprocedural type escape analysis.
IBM的XL Fortran V10.1 for Linux:
Benefits of interprocedural analysis (IPA)
Interprocedural Analysis (IPA) can analyze and optimize your application as a whole, rather than on a file-by-file basis. Run during the link step of an application build, the entire application, including linked libraries, is available for interprocedural analysis. This whole program analysis opens your application to a powerful set of transformations available only when more than one file or compilation unit is accessible. IPA optimizations are also effective on mixed language applications.
Figure 2. IPA at the link step
The following are some of the link-time transformations that IPA can use to restructure and optimize your application:
- Inlining between compilation units
- Complex data flow analyses across subprogram calls to eliminate parameters or propagate constants directly into called subprograms.
- Improving parameter usage analysis, or replacing external subprogram calls to system libraries with more efficient inline code.
- Restructuring data structures to maximize access locality.
In order to maximize IPA link-time optimization, you must use IPA at both the compile and link step. Objects you do not compile with IPA can only provide minimal information to the optimizer, and receive minimal benefit. However when IPA is active on the compile step, the resulting object file contains program information that IPA can read during the link step. The program information is invisible to the system linker, and you can still use the object file and link without invoking IPA. The IPA optimizations use hidden information to reconstruct the original compilation and can completely analyze the subprograms the object contains in the context of their actual usage in your application.
During the link step, IPA restructures your application, partitioning it into distinct logical code units. After IPA optimizations are complete, IPA applies the same low-level compilation-unit transformations as the -O2 and -O3 base optimizations levels. Following those transformations, the compiler creates one or more object files and linking occurs with the necessary libraries through the system linker.
It is important that you specify a set of compilation options as consistent as possible when compiling and linking your application. This includes all compiler options, not just -qipa suboptions. When possible, specify identical options on all compilations and repeat the same options on the IPA link step. Incompatible or conflicting options that you specify to create object files, or link-time options in conflict with compile-time options can reduce the effectiveness of IPA optimizations.
Using IPA on the compile step only
IPA can still perform transformations if you do not specify IPA on the link step. Using IPA on the compile step initiates optimizations that can improve performance for an individual object file even if you do not link the object file using IPA. The primary focus of IPA is link-step optimization, but using IPA only on the compile-step can still be beneficial to your application without incurring the costs of link-time IPA.
Figure 3. IPA at the compile step
IPA Levels and other IPA suboptions
You can control many IPA optimization functions using the -qipa option and suboptions. The most important part of the IPA optimization process is the level at which IPA optimization occurs. Default compilation does not invoke IPA. If you specify -qipa without a level, or specify -O4, IPA optimizations are at level one. If you specify -O5, IPA optimizations are at level two.
IPA Level | Behaviors |
qipa=level=0 | |
qipa=level=1 | |
qipa=level=2 | |
IPA includes many suboptions that can help you guide IPA to perform optimizations important to the particular characteristics of your application. Among the most relevant to providing information on your application are:
- lowfreq which allows you to specify a list of procedures that are likely to be called infrequently during the course of a typical program run. Performance can increase because optimization transformations will not focus on these procedures.
- partition which allows you to specify the size of the regions within the program to analyze. Larger partitions contain more procedures, which result in better interprocedural analysis but require more storage to optimize.
- threads which allows you to specify the number of parallel threads available to IPA optimizations. This can provide an increase in compilation-time performance on multi-processor systems.
- clonearch which allows you to instruct the compiler to generate duplicate subprograms with each tuned to a particular architecture.
Using IPA across the XL compiler family
The XL compiler family shares optimization technology. Object files you create using IPA on the compile step with the XL C, C++, and Fortran compilers can undergo IPA analysis during the link step. Where program analysis shows that objects were built with compatible options, such as -qnostrict, IPA can perform transformations such as inlining C functions into Fortran code, or propagating C++ constant data into C function calls.
posted @ 2007-11-25 23:04 ZelluX Reads(731) | Comments(0)
posted @ 2007-11-23 21:15 ZelluX Reads(1325) | Comments(0)