posted @ 2007-12-19 20:02 ZelluX Reads(326) | Comments(2)
The CAL sample programs use many sample instructions; here is a brief introduction found via Google:
Antialiasing
Although shrinking the pixels makes an image finer and reduces aliasing to some degree, as long as the pixels are large enough to be told apart from one another, aliasing is unavoidable. Antialiasing generally works by taking multiple sample points (note: "points", not "pixels"; the difference will become clear below).
I. Theory and methods:
1. Oversampling (supersampling):
(1) Method:
First, render the scene at a higher resolution than your display (front buffer): if the current front/back buffer resolution is 800×600, render the scene to a 1600×1200 render target (a texture) first.
Then derive the low-resolution result from the high-resolution render target: take the average color of each 2×2 block of pixels as the final color of the corresponding output pixel.
(2) Advantage: noticeably reduces the distortion caused by aliasing.
(3) Disadvantages: it requires a larger buffer, and filling that buffer costs performance; sampling several pixels per output pixel also slows things down. Because of these drawbacks, D3D does not use this antialiasing method.
2. Multisampling:
(1) Method:
The pixel's color is still sampled only once, but N sample points are taken within each pixel (how many depends on the sample pattern); the pixel's final color = the pixel's original color × (number of sample points covered by the polygon) / (total number of sample points).
(2) Advantages: it reduces aliasing artifacts without increasing the number of color samples, and unlike oversampling it does not require a larger back buffer.
(3) Disadvantage: normally a polygon determines a pixel's color only when it covers the pixel's center point (in the pixel pipeline this typically means addressing the appropriate texture color and modulating it with the color output by the vertex pipeline). With multisampling, however, if a polygon covers some of the sample points but not the pixel's center, the pixel's color is still determined by that polygon. Texture addressing can then go wrong, which for texture atlases produces a different kind of artifact: wrong colors along polygon edges!
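The coverage-weighted formula above can be written out as a small sketch. The bitmask representation of coverage, the sample count, and the function name are illustrative assumptions, not any particular API:

```c
/* Resolve rule quoted above:
 * final = original_color * covered_samples / total_samples.
 * coverage_mask has one bit per sample point (an assumed encoding). */
float msaa_resolve(float pixel_color, unsigned coverage_mask, int total_samples)
{
    int covered = 0;
    for (int i = 0; i < total_samples; i++)   /* count sample points the  */
        if (coverage_mask & (1u << i))        /* polygon actually covers  */
            covered++;
    return pixel_color * (float)covered / (float)total_samples;
}
```

With a 4-sample pattern, a polygon covering two sample points contributes half of the pixel's color, which is exactly how the edge gradient that hides the jaggies arises.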
3. Centroid Sampling:
(1) Method:
To fix the texture-addressing errors that multisampling causes with texture atlases, the "pixel's original color" is no longer taken at the pixel's center but at the centroid of the sample points covered by the polygon. This guarantees that the location sampled always lies inside the polygon (in other words, the texture address never falls outside it).
(2) How to use it:
① Any pixel shader input with the COLOR semantic automatically uses centroid sampling;
② Manually append the _centroid modifier to the semantic of a pixel shader input, for example:
float4 TexturePointCentroidPS( float4 TexCoord : TEXCOORD0_centroid ) : COLOR0
{
    return tex2D( PointSampler, TexCoord );
}
(3) Note:
Centroid sampling is mainly meant for multisampling with texture atlases; when a whole texture maps to a single polygon mesh, centroid sampling can itself introduce errors!
posted @ 2007-12-14 13:42 ZelluX Reads(440) | Comments(0)
With CC's help I finally understood this program.
The key is the section Generic Cache Memory Organization on p. 488; I had read it before, but it left no impression.
A cache is made up of a number of (2^s) sets of lines, each line holding a block-size chunk of memory.
So when B[k][j] is accessed, the whole stretch from B[k][j] to B[k][j + bsize - 1] gets cached;
after bsize such accesses, the block from B[k][k] to B[k + bsize - 1][k + bsize - 1] is cached,
and the multiplications that follow run much faster.
posted @ 2007-12-06 13:17 ZelluX Reads(884) | Comments(0)



Help on built-in function sorted in module __builtin__:
sorted(...)
sorted(iterable, cmp=None, key=None, reverse=False) --> new sorted list
posted @ 2007-12-04 18:48 ZelluX Reads(1098) | Comments(0)
Summary: Original citation
From: td@alice.UUCP (Tom Duff)
Organization: AT&T Bell Laboratories, Murray Hill NJ
Date: 29 Aug 88 20:33:51 GMT
Message-ID: <8144@alice.UUCP>
I normally do not read comp.lang.c, but Jim McKie told me that ``Duff's device'' had come up in comp.lang.c again. I have lost the version that was sent to netnews in May 1984, but I have reproduced below the note in which I originally proposed the device. (If anybody has a copy of the netnews version, I would gratefully receive a copy at research!td or td@research.att.com.)
To clear up a few points: I was at Lucasfilm when I invented the device.
Here then, is the original document describing Duff's device:
From research!ucbvax!dagobah!td Sun Nov 13 07:35:46 1983
Received: by ucbvax.ARPA (4.16/4.13) id AA18997; Sun, 13 Nov 83 07:35:46 pst
Received: by dagobah.LFL (4.6/4.6b) id AA01034; Thu, 10 Nov 83 17:57:56 PST
Date: Thu, 10 Nov 83 17:57:56 PST
From: ucbvax!dagobah!td (Tom Duff)
Message-Id: <8311110157.AA01034@dagobah.LFL>
To: ucbvax!decvax!hcr!rrg, ucbvax!ihnp4!hcr!rrg, ucbvax!research!dmr, ucbvax!research!rob

Consider the following routine, abstracted from code which copies an array of shorts into the Programmed IO data register of an Evans & Sutherland Picture System II:

send(to, from, count)
register short *to, *from;
register count;
{
    do
        *to = *from++;
    while (--count > 0);
}

The VAX C compiler compiles the loop into 2 instructions (a movw and a sobleq, I think.) As it turns out, this loop was the bottleneck in a real-time animation playback program which ran too slowly by about 50%. The standard way to get more speed out of something like this is to unwind the loop a few times, decreasing the number of sobleqs. When you do that, you wind up with a leftover partial loop. I usually handle this in C with a switch that indexes a list of copies of the original loop body. Of course, if I were writing assembly language code, I'd just jump into the middle of the unwound loop to deal with the leftovers. Thinking about this yesterday, the following implementation occurred to me:

send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}

(Obviously, this fails if the count is zero.) It amazes me that after 10 years of writing C there are still little corners that I haven't explored fully. (Actually, I have another revolting way to use switches to implement interrupt driven state machines but it's too horrid to go into.)
Many people (even bwk?) have said that the worst feature of C is that switches don't break automatically before each case label. This code forms some sort of argument in that debate, but I'm not sure whether it's for or against.
yrs trly
Tom

>>... Dollars to doughnuts this
>>was written on a RISC machine.
>Nope. Bell Labs Research uses VAXen and 68Ks, mostly.
posted @ 2007-11-29 16:02 ZelluX Reads(510) | Comments(0)
The key tool here is actually Google's gprof2dot:
http://google-gprof2dot.googlecode.com/
It has four output styles; presumably other information can also be set when generating the dot file, such as the time spent in each node, since that figure matters more for profiling.




posted @ 2007-11-27 20:04 ZelluX Reads(537) | Comments(0)
http://www.aygfsteel.com/Files/zellux/ORC-PACT02-tutorial.rar
Also, the second edition of the Dragon Book apparently covers IPA optimization and call graphs at length. Grinding through it.
posted @ 2007-11-27 15:24 ZelluX Reads(289) | Comments(0)
Overview of the Open64 Compiler Infrastructure
VI.4. Interprocedural Analysis
Interprocedural Analysis (IPA) is performed in the following phases of Open64:
• Inliner phase
• IPA local summary phase
• IPA analysis phase
• IPA optimization phase
• IPA miscellaneous
By default the IPA does the function inlining in the inliner facility. The local summary phase is done in the IPL module and the analysis phase and optimization phase in the ipa-link module.
During the analysis phase, it does the following:
• IPA_Padding Analysis (common blocks Padding/Split Analysis)
• Construction of the Callgraph
Then it does space and multigot partitioning of the Callgraph. The partitioning algorithm takes into account whether it is partitioning to solve the space problem or the multigot problem.
During the optimization phase the following phases are performed:
• IPA Global Variable Optimization
• IPA Dead function elimination
• IPA Interprocedural Alias Analysis
• IPA Cloning Analysis (it propagates information about formal parameters used as symbolic terms in array section summaries; this information is later used to trigger cloning)
• IPA Interprocedural Constant propagation
• IPA Array_Section Analysis
• IPA Inlining Analysis
• Array section summaries for the Dependence Analyzer of the Loop Nest Optimizer.
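As a rough illustration of what the constant-propagation and cloning phases listed above buy (this is plain illustrative C, not Open64 code, and every name in it is made up):

```c
/* A callee whose second parameter is a constant at every call site. */
static int scale_by(int x, int scale) { return x * scale; }

/* All call sites pass the constant 2 ... */
int twice(int x)       { return scale_by(x, 2); }
int quadruple(int x)   { return scale_by(scale_by(x, 2), 2); }

/* ... so interprocedural constant propagation can prove scale == 2
 * everywhere, and cloning can specialize the callee accordingly: */
int scale_by_2(int x)  { return x * 2; }
```

Because the analysis crosses function (and, with IPA at link time, compilation-unit) boundaries, the specialized clone can be generated even when the call sites live in different source files.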
posted @ 2007-11-26 12:53 ZelluX Reads(387) | Comments(1)
I suddenly have to work on a related compiler-optimization project, so here is some IPA material from sites abroad, since reaching them from the education network is inconvenient.
GCC wiki:
Analysis and optimizations that work on more than one procedure at a time. This is usually done by walking the Strongly Connected Components of the call graph and performing some analysis and optimization across some set of procedures (be it the whole program, or just a subset) at once.
GCC has had a callgraph for a few versions now (since GCC 3.4 in the FSF releases), but the procedures didn't have control flow graphs (CFGs) built. The tree-profiling-branch in GCC CVS now has a CFG for every procedure built and accessible from the callgraph, as well as a basic IPA pass manager. It also contains in-progress interprocedural optimizations and analyses: interprocedural constant propagation (with cloning for specialization) and interprocedural type escape analysis.
IBM的XL Fortran V10.1 for Linux:
Benefits of interprocedural analysis (IPA)
Interprocedural Analysis (IPA) can analyze and optimize your application as a whole, rather than on a file-by-file basis. Run during the link step of an application build, the entire application, including linked libraries, is available for interprocedural analysis. This whole program analysis opens your application to a powerful set of transformations available only when more than one file or compilation unit is accessible. IPA optimizations are also effective on mixed language applications.
Figure 2. IPA at the link step
The following are some of the link-time transformations that IPA can use to restructure and optimize your application:
- Inlining between compilation units
- Complex data flow analyses across subprogram calls to eliminate parameters or propagate constants directly into called subprograms.
- Improving parameter usage analysis, or replacing external subprogram calls to system libraries with more efficient inline code.
- Restructuring data structures to maximize access locality.
In order to maximize IPA link-time optimization, you must use IPA at both the compile and link step. Objects you do not compile with IPA can only provide minimal information to the optimizer, and receive minimal benefit. However when IPA is active on the compile step, the resulting object file contains program information that IPA can read during the link step. The program information is invisible to the system linker, and you can still use the object file and link without invoking IPA. The IPA optimizations use hidden information to reconstruct the original compilation and can completely analyze the subprograms the object contains in the context of their actual usage in your application.
During the link step, IPA restructures your application, partitioning it into distinct logical code units. After IPA optimizations are complete, IPA applies the same low-level compilation-unit transformations as the -O2 and -O3 base optimizations levels. Following those transformations, the compiler creates one or more object files and linking occurs with the necessary libraries through the system linker.
It is important that you specify a set of compilation options as consistent as possible when compiling and linking your application. This includes all compiler options, not just -qipa suboptions. When possible, specify identical options on all compilations and repeat the same options on the IPA link step. Incompatible or conflicting options that you specify to create object files, or link-time options in conflict with compile-time options can reduce the effectiveness of IPA optimizations.
Using IPA on the compile step only
IPA can still perform transformations if you do not specify IPA on the link step. Using IPA on the compile step initiates optimizations that can improve performance for an individual object file even if you do not link the object file using IPA. The primary focus of IPA is link-step optimization, but using IPA only on the compile-step can still be beneficial to your application without incurring the costs of link-time IPA.
Figure 3. IPA at the compile step
IPA Levels and other IPA suboptions
You can control many IPA optimization functions using the -qipa option and suboptions. The most important part of the IPA optimization process is the level at which IPA optimization occurs. Default compilation does not invoke IPA. If you specify -qipa without a level, or specify -O4, IPA optimizations are at level one. If you specify -O5, IPA optimizations are at level two.
IPA Level | Behaviors |
qipa=level=0 | |
qipa=level=1 | |
qipa=level=2 | |
IPA includes many suboptions that can help you guide IPA to perform optimizations important to the particular characteristics of your application. Among the most relevant to providing information on your application are:
- lowfreq which allows you to specify a list of procedures that are likely to be called infrequently during the course of a typical program run. Performance can increase because optimization transformations will not focus on these procedures.
- partition which allows you to specify the size of the regions within the program to analyze. Larger partitions contain more procedures, which result in better interprocedural analysis but require more storage to optimize.
- threads which allows you to specify the number of parallel threads available to IPA optimizations. This can provide an increase in compilation-time performance on multi-processor systems.
- clonearch which allows you to instruct the compiler to generate duplicate subprograms with each tuned to a particular architecture.
Using IPA across the XL compiler family
The XL compiler family shares optimization technology. Object files you create using IPA on the compile step with the XL C, C++, and Fortran compilers can undergo IPA analysis during the link step. Where program analysis shows that objects were built with compatible options, such as -qnostrict, IPA can perform transformations such as inlining C functions into Fortran code, or propagating C++ constant data into C function calls.
posted @ 2007-11-25 23:04 ZelluX Reads(731) | Comments(0)
posted @ 2007-11-23 21:15 ZelluX Reads(1325) | Comments(0)