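In the era of software development under Moore's Law there is a very interesting phenomenon: "Andy giveth, and Bill taketh away." No matter how fast the CPU clock gets, we always find a way to use it up, and we have grown used to the performance gains that every hardware upgrade brings to our programs.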
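I remember writing a Gomoku (five-in-a-row) program in my second year of university. The algorithm predefined a set of board patterns (with priorities), then scanned the board, analyzed the position and worked out which move mattered most. Of course, playing also means blocking the opponent, which requires swapping the two sides' patterns and computing again. A program that looks only one move ahead is easily tricked by a cunning opponent, so looking several moves ahead also required recursion and backtracking. On the machines of that time, searching 3 moves ahead already took seconds. When I was packing up after graduation I found the program and tried it again: even a much deeper search now took no noticeable time.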
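I wonder whether you have had the same experience: without noticing, we have been enjoying this free lunch all along. But as Moore's Law comes to an early end, the free lunch will have to be paid for. Hardware designers are still trying hard. A Hyper-Threading CPU has an extra set of registers, so it acts as one more logical CPU. It keeps the pipeline as full as possible so that operations from several threads may run in parallel, which gives multithreaded programs a modest speedup of up to about 15%. Larger caches also benefit both single-threaded and multithreaded programs. These may buy you some time, but the real problem is that we have to change. Facing this coming revolution, are you ready?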
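Concurrency Programming != Multi-Thread Programming. Plenty of people will say that everyone knows multithreading; the real question is why and how you use threads. I once wrote an image viewer/processor similar to ACDSee, which I normally use to process my digital photos. It used a lot of threads, but mainly so that image processing would not block the UI: the CPU-intensive computations ran in background threads. The image-matrix computations themselves were never split up to run in parallel.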
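I think the real challenge of concurrent programming is the change of programming model. Programmers must have a clear picture of how their programs can be parallelized. More importantly, they must know how to implement that parallelism (including architecture, fault tolerance, real-time monitoring and so on), how to debug it and how to test it.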
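At Google, huge volumes of data must be processed within limited time every day (in fact, every Internet company runs into this problem). Every programmer therefore has to develop distributed programs, which involves distribution, scheduling, monitoring, fault tolerance and more. Google's MapReduce abstracts the distributed business logic away from these complex details, so that programmers with little or no parallel-programming experience can still develop parallel applications.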
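The two most important words in MapReduce are Map and Reduce. Anyone familiar with functional languages will find the pair Map/Reduce familiar at first sight. Functional programming calls such functions "higher-order functions", and higher-order functions are one of the power tools of functional programming. They are functions that take other functions as arguments and call them. As an analogy, think of callback functions in C, or functors in the STL. To search an STL container, you supply a functor (a comparator) that compares two elements, and the container code calls that comparator as it traverses the container.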
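Take the image-processing program mentioned earlier: most image-processing operations are computations over the image matrix, and they usually come in two kinds, mapping and reduction. An "old photo" effect, for example, usually boosts the photo's G/B values and then adds a random offset to each pixel. These operations are independent for every element of the two-dimensional matrix, so they are Map operations. An "emboss" effect must extract the image's edges, which requires computations across neighboring elements, so it is a kind of Reduce operation. A simpler example: the one-dimensional matrix (array) [0,1,2,3,4] can be mapped to [0,2,4,6,8] (multiply by 2) or to [1,2,3,4,5] (add 1). It can be reduced to 0 (product of the elements) or to 10 (sum of the elements).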
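A minimal sketch of the two kinds of operations on the one-dimensional example above. It uses JavaScript's built-in `Array.prototype.map` and `Array.prototype.reduce`; the variable names are illustrative only.

```javascript
// The one-dimensional "matrix" from the text.
const arr = [0, 1, 2, 3, 4];

// Map: each output element depends only on the corresponding input element.
const doubled = arr.map(x => x * 2); // multiply by 2
const plusOne = arr.map(x => x + 1); // add 1

// Reduce: fold all elements into a single value.
const product = arr.reduce((acc, x) => acc * x, 1); // product of the elements
const sum = arr.reduce((acc, x) => acc + x, 0);     // sum of the elements

console.log(doubled, plusOne, product, sum);
```

Because each `map` callback touches a single element, the map half could be split across processors without coordination. The reduce half needs to combine partial results.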
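When facing a complex problem, the ancients taught us to "divide and rule"; the English counterpart is "Divide and Conquer". Map/Reduce is really a divide-and-conquer process. The problem is first divided so that the resulting Map computations can run in a highly parallel way. The Map results are then reduced (grouped by some key) to produce the final result.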
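Googlers found that this was the core of the problem and that everything else was common to all such problems, so they abstracted MapReduce out as a separate layer. Google's programmers only need to care about application logic: by which keys to decompose the problem, which operations are Map operations and which are Reduce operations. The other hard problems of parallel computing, such as distribution, job scheduling, fault tolerance and inter-machine communication, are left to the Map/Reduce framework. This greatly simplifies the whole programming model.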
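Another feature of MapReduce is that Map and Reduce exchange data through intermediate files rather than through other inter-process or inter-machine channels. Input and final output live in the Google File System. According to the MapReduce paper, map output is written to the map workers' local disks, partitioned into R regions, and read remotely by the reduce workers. I think this reflects Google's consistent style: turn the complex into the simple and return to basics.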
Let us now set everything else aside and study the Map/Reduce operations themselves. Other aspects, such as fault tolerance and backup tasks, also come with classic experience and implementations, and the paper describes them in detail.
Definition of Map: "Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function."
Definition of Reduce: "The Reduce function, also written by the user, accepts an intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values. Typically just zero or one output value is produced per Reduce invocation. The intermediate values are supplied to the user's reduce function via an iterator. This allows us to handle lists of values that are too large to fit in memory."
The MapReduce paper gives the following example: counting the number of occurrences of each word in a collection of documents.
The input to Map is one document at a time, and Map emits every occurrence of every word in that document to the intermediate files.
The Map function has the signature map(String key, String value). For example, suppose we have two documents with the following contents:
A: "I love programming"
B: "I am a blogger, you are also a blogger"
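After the Map operation, the intermediate file for document B contains one (word, 1) pair per word occurrence; the list is given with the map pseudocode further below. The input to Reduce is a word together with the sequence of its occurrence counts. For the example above these are ("I", [1, 1]), ("love", [1]), ("programming", [1]), ("am", [1]), ("a", [1, 1]) and so on. Reduce then computes the total number of occurrences of each word.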
The Reduce function has the signature reduce(String key, Iterator values). The final output is ("I", 2), ("a", 2), and so on.
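To make the word-count example concrete, here is a single-process sketch that runs the map, shuffle (group-by-key) and reduce phases on documents A and B. The tokenizer, which splits on anything that is not a letter, is an assumption; the paper's pseudocode does not specify one. The function names are hypothetical.

```javascript
// Map: emit (word, 1) for every word occurrence in one document.
function mapWordCount(docName, contents) {
  const out = [];
  for (const w of contents.split(/[^A-Za-z]+/)) {
    if (w !== "") out.push([w, 1]);
  }
  return out;
}

// Reduce: sum all counts emitted for one word.
function reduceWordCount(word, counts) {
  let result = 0;
  for (const c of counts) result += c;
  return result;
}

// Shuffle: group intermediate pairs by key. In MapReduce the framework does
// this step; here it is an in-memory Map.
function runWordCount(docs) {
  const groups = new Map();
  for (const [name, text] of Object.entries(docs)) {
    for (const [k, v] of mapWordCount(name, text)) {
      if (!groups.has(k)) groups.set(k, []);
      groups.get(k).push(v);
    }
  }
  const result = {};
  for (const [k, vs] of groups) result[k] = reduceWordCount(k, vs);
  return result;
}

const counts = runWordCount({
  A: "I love programming",
  B: "I am a blogger, you are also a blogger",
});
console.log(counts);
```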
The actual execution flow is described in the execution overview of the MapReduce paper.
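In this flow, the division (Divide) happens in two steps: the input is split into M pieces, and the intermediate Map results are split into R partitions. Splitting the input is usually simple. The intermediate results are usually partitioned by hash(key) mod R, which guarantees that the same key always lands in the same partition. Users can also supply their own partition function. For URL keys, if all URLs of the same host should end up in the same partition, hash(Hostname(urlkey)) mod R can serve as the partition function.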
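A sketch of both partition functions. The paper does not prescribe a hash function, so the FNV-1a-style string hash below (applied to UTF-16 code units) is an assumption. `defaultPartition` and `hostPartition` are hypothetical names. `URL` is the standard WHATWG URL class available in browsers and Node.js.

```javascript
// 32-bit FNV-1a-style hash over UTF-16 code units (an illustrative choice).
function hash(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return h;
}

// Default partitioner: hash(key) mod R.
function defaultPartition(key, R) {
  return hash(key) % R;
}

// Host-based partitioner: hash(Hostname(urlkey)) mod R, so every URL of the
// same host is sent to the same reduce partition.
function hostPartition(urlKey, R) {
  return hash(new URL(urlKey).hostname) % R;
}

console.log(defaultPartition("the", 5), hostPartition("http://example.com/a", 8));
```

Any deterministic hash works here. The only property the framework relies on is that equal keys, or equal hostnames in the second case, always map to the same partition.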
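In the example above, each document may produce thousands of intermediate pairs like ("the", 1), and that many tiny intermediate records inevitably waste transfer bandwidth. MapReduce therefore also lets users supply a Combiner function. It usually has the same implementation as the Reduce function. The difference is that Reduce's output is the final result, whereas the Combiner's output is written to an intermediate file that becomes part of the input of a Reduce task.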
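A minimal sketch of a combiner for word count, assuming one map task's output is available as an array of [word, count] pairs. It applies the same summing logic as Reduce, but locally on the map side, so one ("the", 3) pair replaces three ("the", 1) pairs. The function name `combine` is hypothetical.

```javascript
// Pre-aggregate one map task's output before it is sent to the reducers.
function combine(pairs) {
  const local = new Map(); // keeps first-seen key order
  for (const [k, v] of pairs) local.set(k, (local.get(k) || 0) + v);
  return [...local.entries()];
}

const mapOutput = [["the", 1], ["cat", 1], ["the", 1], ["the", 1]];
const combined = combine(mapOutput);
console.log(combined); // [ [ 'the', 3 ], [ 'cat', 1 ] ]
```

Using Reduce's logic as a combiner is only valid because summation is associative and commutative. A reduce function without those properties could not be reused this way.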
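Tom White gives another very intuitive example from Nutch[2]: distributed grep. I have always felt that many Unix pipe operations, such as more, grep and cat, resemble Map operations, while sort, uniq, wc and the like correspond to Reduce operations.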
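In the MapReduce paper, distributed grep's map function emits a line if it matches a pattern, and its reduce function is the identity. The sketch below follows that description. Keying the output by file name is an assumption made for illustration, as are the names `grepMap` and `grepReduce`.

```javascript
// Map: emit [fileName, line] for every line that matches the pattern.
// Pass a RegExp without the /g flag: RegExp.prototype.test on a global regex
// keeps state in lastIndex between calls.
function grepMap(fileName, contents, pattern) {
  const out = [];
  for (const line of contents.split("\n")) {
    if (pattern.test(line)) out.push([fileName, line]);
  }
  return out;
}

// Reduce: identity, the matching lines are copied to the output unchanged.
function grepReduce(key, lines) {
  return lines;
}

console.log(grepMap("f.txt", "foo\nbar\nfood", /foo/));
```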
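Google also published the BigTable paper just a few days ago. With it, Google now has its own cluster (Google Cluster), distributed file system (GFS), distributed computing environment (MapReduce) and distributed structured storage (BigTable), plus a lock service. I can really feel that, besides Google's famous free dinners, its programmers get another free dinner: the large clusters made of huge numbers of commodity PCs. I think this is where Google's real core value lies.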
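As Microsoft veteran Joel Spolsky (you must have read his "Joel on Software") once said, the scariest thing for Microsoft is this[1]. While Microsoft is still struggling to catch up with Google on basic search features, Google is already deploying the next generation of supercomputers.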
"The very fact that Google invented MapReduce, and Microsoft didn't, says something about why Microsoft is still playing catch up trying to get basic search features to work, while Google has moved on to the next problem: building Skynet^H^H^H^H^H^H the world's largest massively parallel supercomputer. I don't think Microsoft completely understands just how far behind they are on that wave."
Note: Microsoft actually has its own solution, Dryad. The problem is that in a large company, redeploying such a low-level infrastructure is extremely hard, for technical as well as political reasons.
Note: Project Hadoop, another masterpiece by Doug Cutting, the father of Lucene, consists of the Hadoop distributed file system and a Map/Reduce implementation. With it, the Lucene/Nutch product line is now quite complete.
One day, you're browsing through your code, and you notice two big blocks that look almost exactly the same. In fact, they're exactly the same, except that one block refers to "Spaghetti" and one block refers to "Chocolate Moose." These examples happen to be in JavaScript, but even if you don't know JavaScript, you should be able to follow along. The repeated code looks wrong, of course, so you create a function: OK, it's a trivial example, but you can imagine a more substantial example. This is better code for many reasons, all of which you've heard a million times. Maintainability, Readability, Abstraction = Good! Now you notice two other blocks of code which look almost the same, except that one of them keeps calling this function called BoomBoom and the other one keeps calling this function called PutInPot. Other than that, the code is pretty much the same. Now you need a way to pass an argument to the function which itself is a function. This is an important capability, because it increases the chances that you'll be able to find common code that can be stashed away in a function. Look! We're passing in a function as an argument. Can your language do this? Wait... suppose you haven't already defined the functions PutInPot or BoomBoom. Wouldn't it be nice if you could just write them inline instead of declaring them elsewhere? Jeez, that is handy. Notice that I'm creating a function there on the fly, not even bothering to name it, just picking it up by its ears and tossing it into a function. As soon as you start thinking in terms of anonymous functions as arguments, you might notice code all over the place that, say, does something to every element of an array.
Doing something to every element of an array is pretty common, and you can write a function that does it for you: Now you can rewrite the code above as: Another common thing with arrays is to combine all the values of the array in some way.
sum and join look so similar, you might want to abstract out their essence into a generic function that combines elements of an array into a single value: Many older languages simply had no way to do this kind of stuff. Other languages let you do it, but it's hard (for example, C has function pointers, but you have to declare and define the function somewhere else). Object-oriented programming languages aren't completely convinced that you should be allowed to do anything with functions. Java required you to create a whole object with a single method called a functor if you wanted to treat a function like a first class object. Combine that with the fact that many OO languages want you to create a whole file for each class, and it gets really clunky fast. If your programming language requires you to use functors, you're not getting all the benefits of a modern programming environment. See if you can get some of your money back.
How much benefit do you really get out of writing itty bitty functions that do nothing more than iterate through an array doing something to each element? Well, let's go back to that map function. When you need to do something to every element in an array in turn, the truth is, it probably doesn't matter what order you do them in. You can run through the array forward or backwards and get the same result, right? In fact, if you have two CPUs handy, maybe you could write some code to have each CPU do half of the elements, and suddenly map is twice as fast. Or maybe, just hypothetically, you have hundreds of thousands of servers in several data centers around the world, and you have a really big array, containing, let's say, again, just hypothetically, the entire contents of the internet. Now you can run map on thousands of computers, each of which will attack a tiny part of the problem.
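A sketch of the "each CPU does part of the elements" idea. `chunkedMap` (a hypothetical name) splits the array into chunks that could each be handed to a different CPU or machine. Here the chunks run one after another, in reverse order, to show that a side-effect-free `fn` gives the same result regardless of order. Real parallelism (Web Workers, Node's worker_threads, or a cluster) is not shown. Unlike Joel's `map`, this version returns a new array instead of overwriting its input.

```javascript
// Apply fn to every element, processing the array in chunkCount chunks.
function chunkedMap(fn, a, chunkCount) {
  const size = Math.ceil(a.length / chunkCount);
  const result = new Array(a.length);
  // Process the chunks last-to-first: order is irrelevant for a pure fn.
  for (let c = chunkCount - 1; c >= 0; c--) {
    const end = Math.min(a.length, (c + 1) * size);
    for (let i = c * size; i < end; i++) {
      result[i] = fn(a[i]);
    }
  }
  return result;
}

const data = [1, 2, 3, 4, 5, 6, 7];
const viaChunks = chunkedMap(x => x * 2, data, 2);
const viaLoop = data.map(x => x * 2);
console.log(viaChunks, viaLoop);
```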
So now, for example, writing some really fast code to search the entire contents of the internet is as simple as calling the map function with a basic string searcher as an argument. The really interesting thing I want you to notice, here, is that as soon as you think of map and reduce as functions that everybody can use, and they use them, you only have to get one supergenius to write the hard code to run map and reduce on a global massively parallel array of computers, and all the old code that used to work fine when you just ran a loop still works only it's a zillion times faster which means it can be used to tackle huge problems in an instant. Lemme repeat that. By abstracting away the very concept of looping, you can implement looping any way you want, including implementing it in a way that scales nicely with extra hardware. And now you understand something I wrote a while ago where I complained about CS students who are never taught anything but Java: Without understanding functional programming, you can't invent MapReduce, the algorithm that makes Google so massively scalable. The terms Map and Reduce come from Lisp and functional programming. MapReduce is, in retrospect, obvious to anyone who remembers from their 6.001-equivalent programming class that purely functional programs have no side effects and are thus trivially parallelizable. The very fact that Google invented MapReduce, and Microsoft didn't, says something about why Microsoft is still playing catch up trying to get basic search features to work, while Google has moved on to the next problem: building Skynet^H^H^H^H^H^H the world's largest massively parallel supercomputer. I don't think Microsoft completely understands just how far behind they are on that wave. Ok. I hope you're convinced, by now, that programming languages with first-class functions let you find more opportunities for abstraction, which means your code is smaller, tighter, more reusable, and more scalable. 
Lots of Google applications use MapReduce and they all benefit whenever someone optimizes it or fixes bugs. And now I'm going to get a little bit mushy, and argue that the most productive programming environments are the ones that let you work at different levels of abstraction. Crappy old FORTRAN really didn't even let you write functions. C had function pointers, but they were ugleeeeee and not anonymous and had to be implemented somewhere else than where you were using them. Java made you use functors, which is even uglier. As Steve Yegge points out, Java is the Kingdom of Nouns.
Correction: The last time I used FORTRAN was 27 years ago. Apparently it got functions. I must have been thinking about GW-BASIC.
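Let us consider a program P, taking two arguments S and D, and producing a result R:
$\mathrm{run}\; P(S, D) = R$
A specialization of P with respect to S is a program P_S such that, for all input D,
$\mathrm{run}\; P_S(D) = \mathrm{run}\; P(S, D) = R$
Input S is called static: it is known (i.e., available) at specialization time. Input D is dynamic: it is unknown (i.e., unavailable) until run time.
Program specialization makes sense in any programming language. Consider for example the Scheme function append listed under Specialization Examples. (See below for more examples, in C.) A possible specialization of append with respect to the static argument list1 = (4 2) is the function append42. Depending on the context, S is called a specialization value or an invariant. In the general case, a specialization may exploit several invariants, whether input values or constants already present in the code of P.
The interest of function append42, as opposed to the trivial specialization triv-append42, is that computations depending only on the static input list1 = (4 2) have already been performed. More generally, specialization affects the speed and size of programs, thus offering applications to program optimization.
Note that not all program arguments have the same impact on specialization. For example, specializing append with respect to list2 = (4 2) leads to the rather unexciting function dull-append42.
Specialization is used in particular (sometimes unknowingly) to optimize critical sections of code. It is often handwritten.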
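The same idea can be shown in JavaScript with a hand-written specialization, the kind this paragraph says is often done by hand. `power(n, x)` plays the role of P with n static and x dynamic. `specializePower(n)` builds the source of P_S with the loop fully unrolled and compiles it with the standard `Function` constructor. Both names are hypothetical, and this is a toy illustration of the equation run P_S(D) = run P(S, D), not a general partial evaluator.

```javascript
// Generic program P(S, D): x to the power n, computed by a loop.
function power(n, x) {
  let r = 1;
  for (let i = 0; i < n; i++) r *= x;
  return r;
}

// Hand-written specializer for the static input n: the loop runs now, at
// specialization time, and only the multiplications remain in the result.
function specializePower(n) {
  let expr = "1";
  for (let i = 0; i < n; i++) expr += " * x";
  return new Function("x", "return " + expr + ";"); // return 1 * x * x * x; for n = 3
}

const power3 = specializePower(3);
console.log(power3(2), power(3, 2)); // 8 8
```

As with append42, all work that depends only on the static input (here, counting loop iterations) is gone from the specialized function.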
Partial evaluation (PE) is the process that automates program specialization [CD93, DRT96, JGS93]. A partial evaluator (or specializer) is a program M that takes two arguments, the source of a program P and a static (known) subset of the input S, and produces a specialized program P_S:
$\mathrm{run}\; M(P, S) = P_S$
Roughly speaking, partial evaluation can be thought of as a combination of aggressive constant folding, inlining, loop unrolling and inter-procedural constant propagation applied to all data types (including pointers, structures and arrays) instead of just scalars.
Handwritten specialization is tedious, error-prone and does not scale to large programs. Because it is automatic, specialization via partial evaluation does not have these drawbacks; it is even predictable (see below). As a result, specialization becomes an issue in engineering software: it is possible to rapidly write generic programs, which are maintainable but slow, and automatically produce fast specialized instances. Because the programmer focuses less on optimization hacks and more on reusability, partial evaluation greatly improves productivity and program safety.
Partial evaluation has been successfully applied as an optimizer in various domains such as operating systems and networking, computer graphics, numerical computation, circuit simulation, software architectures, compiling and compiler generation. It has also been used for program understanding and reengineering: given various running options, partial evaluation may split large programs into smaller ones.
An on-line partial evaluator takes as arguments the source of a program P and a static subset of the input S, performs symbolic computations on available data, and directly yields the source of a specialized program P_S. In an off-line partial evaluator, the specialization is divided into two steps. First, a binding-time analysis propagates abstract information about static and dynamic values throughout the code.
It prepares the second phase, which, given actual specialization values, produces specialized code.
On-line partial evaluators are theoretically more powerful: specialization relies on actual values, not just on the fact that values are known. On the other hand, off-line partial evaluators are faster because value propagation is "pre-compiled". Moreover, they are predictable in the sense that it is possible to assess the degree of specialization. Some partial evaluators, like Tempo, can specialize programs not only at compile time (i.e., source-to-source transformation) but also at run time (i.e., run-time code generation). Only off-line partial evaluation lends itself to run-time specialization.
Binding-time analysis (BTA) propagates the static/dynamic information throughout the program and annotates each statement and expression with a binding time. These annotations can be visualized using colors (or font effects). The blue color (bold face on black-and-white displays) represents static constructions, i.e. values that can be computed at specialization time. The red color (standard font on black-and-white displays) is for dynamic expressions, whose value cannot be precomputed knowing only the static arguments. Basically, everything in blue (bold) will disappear after specialization; only red (standard font) parts will remain. Visualizing the analysis is very important for the user to assess the amount of specialization in the code. Note that in the case of languages like C, the binding-time analysis must take into account pointer aliases and side effects.
Various resources concerning partial evaluation, including existing specializers, PE-related events and basic references, are accessible from pe_resources.php3.
map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

Intermediate output for document B:
I,1
am,1
a,1
blogger,1
you,1
are,1
also,1
a,1
blogger,1

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
from: http://xerdoc.com/blog/archives/246.html
// A trivial example:
alert("I'd like some Spaghetti!");
alert("I'd like some Chocolate Moose!");
function SwedishChef( food )
{
alert("I'd like some " + food + "!");
}
SwedishChef("Spaghetti");
SwedishChef("Chocolate Moose");
alert("get the lobster");
PutInPot("lobster");
PutInPot("water");
alert("get the chicken");
BoomBoom("chicken");
BoomBoom("coconut");
function Cook( i1, i2, f )
{
alert("get the " + i1);
f(i1);
f(i2);
}
Cook( "lobster", "water", PutInPot );
Cook( "chicken", "coconut", BoomBoom );
Cook( "lobster",
"water",
function(x) { alert("pot " + x); } );
Cook( "chicken",
"coconut",
function(x) { alert("boom " + x); } );
var a = [1,2,3];
for (var i = 0; i < a.length; i++)
{
a[i] = a[i] * 2;
}
for (var i = 0; i < a.length; i++)
{
alert(a[i]);
}
function map(fn, a)
{
for (var i = 0; i < a.length; i++)
{
a[i] = fn(a[i]);
}
}
map( function(x){return x*2;}, a );
map( alert, a );
function sum(a)
{
var s = 0;
for (var i = 0; i < a.length; i++)
s += a[i];
return s;
}
function join(a)
{
var s = "";
for (var i = 0; i < a.length; i++)
s += a[i];
return s;
}
alert(sum([1,2,3]));
alert(join(["a","b","c"]));
function reduce(fn, a, init)
{
var s = init;
for (var i = 0; i < a.length; i++)
s = fn( s, a[i] );
return s;
}
function sum(a)
{
return reduce( function(a, b){ return a + b; },
a, 0 );
}
function join(a)
{
return reduce( function(a, b){ return a + b; },
a, "" );
}
About the Author: I'm your host, Joel Spolsky, a software developer in New York City. Since 2000, I've been writing about software development, management, business, and the Internet on this site. For my day job, I run Fog Creek Software, makers of FogBugz - the smart bug tracking software with the stupid name, and Fog Creek Copilot - the easiest way to provide remote tech support over the Internet, with nothing to install or configure.
from: http://www.joelonsoftware.com/items/2006/08/01.html
Program Specialization
Specialization Examples
(define (append list1 list2)
(if (null? list1)
list2
(cons (car list1) (append (cdr list1) list2))))
Specialization of append with respect to the static argument list1 = (4 2):
(define (append42 list2)
(cons 4 (cons 2 list2)))
append42 preserves the semantics of append; more precisely, it has the same semantics as the trivial specialization triv-append42, defined as:
(define (triv-append42 list2)
(append '(4 2) list2))
Interest of Specialization
append42 runs faster than append (or, more precisely, than triv-append42) because the traversal of argument list1 has already been performed.
Specialization of append with respect to list2 = (4 2) instead:
(define (dull-append42 list1)
(if (null? list1)
'(4 2)
(cons (car list1) (dull-append42 (cdr list1)))))
Partial Evaluation
Applications of Partial Evaluation
Off-line vs. On-line Partial Evaluation
Binding-Time Analysis
As a first step, the user provides a program and specifies initial binding times, that is, which arguments (including global variables) are static (i.e., known) and which are dynamic (i.e., yet unknown). For example, the user provides the following code for a miniprintf function, and specifies that the first argument is static whereas the second is dynamic: miniprintf(S,D).
void miniprintf(char fmt[], int val[])
{
int i = 0;
while( *fmt != '\0' ) {
if( *fmt != '%' )
putchar(*fmt);
else
switch(*++fmt) {
case 'd' : putint(val[i++]); break;
case '%' : putchar('%'); break;
default : prterror(*fmt); break;
}
fmt++;
}
}
/* Binding-time annotations for miniprintf(S, D), i.e. fmt static, val dynamic:
   STATIC  (computed at specialization time): fmt, *fmt, i, the while loop,
           the if test and the switch on the directive character;
   DYNAMIC (left in the residual code): the calls putchar(...),
           putint(val[...]) and prterror(...). */
Compile-Time Specialization
When the user is satisfied with the analysis (i.e., what the user expects to specialize is indeed considered static by the BTA), actual specialization values must be provided. For example, giving "<%d|%d>" as the actual specialization value for the fmt argument of the miniprintf() function yields the following specialized code.
Many specializations can be performed, sharing the same analysis; only different specialization values need to be provided.
void miniprintf_1(int val[])
{
putchar( '<' );
putint ( val[0] );
putchar( '|' );
putint ( val[1] );
putchar( '>' );
}
Run-Time Specialization
Partial evaluators like Tempo [CHL+98,CHN+96] can also perform run-time specialization [CN96], using optimized binary code templates [NHCL97]. A dedicated run-time specializer is generated from the results of the program analysis. In the case of the miniprintf function, a run-time specializer rts_miniprintf() is generated, which can be used as in the following example.
/* Some dynamic execution context sets the variable 'format' */
spec_printf = rts_miniprintf(format); // specialize
...
(*spec_printf)(val1); // <=> miniprintf(format,val1)
(*spec_printf)(val2); // <=> miniprintf(format,val2)
rts_miniprintf() is a dedicated run-time specializer. It returns a pointer to the specialized function. Several specialized versions can also be generated and used at the same time.
Last modified: 2003-09-25. - Jocelyn.Frechot@labri.fr - http://compose.labri.fr
from: http://compose.labri.fr/documentation/pe/pe_overview.php3
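$F'(X_i) = F(X_i)/S_i \qquad (i = 1, 2, \ldots, M)$
Each individual's chance of passing on its genes is governed by its fitness. Adjusting fitness in this way therefore prevents any single individual from multiplying unchecked in the population, which preserves population diversity and creates a niche-like evolutionary environment.
Below is a genetic algorithm based on the niche concept. Its basic idea is as follows. First, compare the distance between every pair of individuals in the population. If the distance is within a preset distance L, compare the two individuals' fitness and apply a strong penalty function to the one with lower fitness, greatly reducing its fitness. Of any two individuals within distance L of each other, the worse one thus becomes even worse and is very likely to be eliminated later in the evolution. In other words, only one good individual survives within any distance L. This both maintains population diversity and keeps individuals a certain distance apart, spreading them over the whole constrained space, and so realizes a niche genetic algorithm.
The niche algorithm is described as follows:
Algorithm NicheGA
(1) Initialize the generation counter t; randomly generate an initial population P(t) of M individuals and compute the fitness F_i (i = 1, 2, ..., M) of each individual.
(2) Sort the individuals in descending order of fitness and memorize the first N individuals (N < M).
(3) Selection. Apply fitness-proportional selection to P(t) to obtain P1(t).
(4) Crossover. Apply single-point crossover to the selected set P1(t) to obtain P2(t).
(5) Mutation. Apply uniform mutation to P2(t) to obtain P3(t).
(6) Niche elimination. Merge the M individuals obtained in step (5) with the N individuals memorized in step (2) into a new population of M + N individuals. For every pair of these M + N individuals X_i and X_j, compute the distance
$\|X_i - X_j\| = \sqrt{\sum_{k} (x_{ik} - x_{jk})^2}$
where k runs over the decision variables. When $\|X_i - X_j\| < L$, compare the fitness of X_i and X_j and apply the penalty to the one with lower fitness:
$F_{\min}(X_i, X_j) = \mathrm{Penalty}$
(7) Sort these M + N individuals in descending order of their new fitness and memorize the first N individuals.
(8) Termination test. If the termination condition is not met, update the generation counter t ← t + 1, take the first M individuals of the ranking from step (7) as the next generation P(t), and go to step (3). If it is met, output the result and stop.
[Example] Global optimization of the Shubert function:
$\min f(x_1, x_2) = \left\{\sum_{i=1}^{5} i\cos[(i+1)x_1 + i]\right\}\left\{\sum_{i=1}^{5} i\cos[(i+1)x_2 + i]\right\}$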
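s.t. $-10 \le x_i \le 10 \quad (i = 1, 2)$
This function has 760 local minima, 18 of which are global minima; the objective value at the global minima is $f(x_1^*, x_2^*) = -186.731$.
When this example is solved with the niche genetic algorithm above, objective values can be converted into individual fitness with the following transformation:
$F(x_1, x_2) = \ldots$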
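l = 20 (binary string length; each variable is encoded with 10 bits)
M = 50 (population size)
T = 500 (maximum number of generations)
p_c = 0.1 (crossover probability)
p_m = 0.1 (mutation probability)
L = 0.5 (distance parameter between niches)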
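Penalty = 10^… (penalty function value)
Using these parameters, trial runs were carried out repeatedly, and every run found many global optima. The table below lists the best 18 individuals from one of the runs. From the niche point of view the algorithm gives a good result: its design ensures that only one good individual survives within each peak of the function, so each peak forms a niche.
Results of optimizing the Shubert function with the niche genetic algorithm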
No.   x1        x2        f(x1, x2)
1     5.4828    4.8581    -186.731
2     5.4830    -7.7083   -186.731
3     4.8581    5.4831    -186.731
4     4.8581    -7.0838   -186.731
5     -4.4252   -7.4983   -186.731
6     -7.0832   -7.0838   -186.731
7     5.4827    -1.4249   -186.731
8     0.8580    5.4831    -186.731
9     4.8580    -0.8009   -186.730
10    -0.8009   -7.7084   -186.730
11    -0.8009   4.8581    -186.730
12    -7.7088   -0.7999   -186.730
13    -7.7088   -7.0831   -186.730
14    -1.4256   -0.8009   -186.730
15    -0.8011   -1.4252   -186.730
16    -7.7075   5.4834    -186.730
17    -7.7088   4.8579    -186.730
18    -7.0825   -1.4249   -186.730
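A sketch of the Shubert objective and of step (6), niche elimination, working on decoded real-valued individuals. The names `shubert`, `distance` and `nicheEliminate` are hypothetical. Fitness is assumed to be "higher is better", and the objective-to-fitness transformation, whose formula is lost in the source above, is not reproduced.

```javascript
// Shubert function: product of two one-dimensional sums.
function shubert(x1, x2) {
  let s1 = 0, s2 = 0;
  for (let i = 1; i <= 5; i++) {
    s1 += i * Math.cos((i + 1) * x1 + i);
    s2 += i * Math.cos((i + 1) * x2 + i);
  }
  return s1 * s2;
}

// Euclidean distance between two decoded individuals.
function distance(a, b) {
  let s = 0;
  for (let k = 0; k < a.length; k++) s += (a[k] - b[k]) ** 2;
  return Math.sqrt(s);
}

// Step (6): for every pair closer than L, the individual with the lower
// fitness receives the penalty value (ties penalize the second individual).
function nicheEliminate(pop, fitness, L, penalty) {
  const F = fitness.slice();
  for (let i = 0; i < pop.length - 1; i++) {
    for (let j = i + 1; j < pop.length; j++) {
      if (distance(pop[i], pop[j]) < L) {
        if (F[i] < F[j]) F[i] = penalty;
        else F[j] = penalty;
      }
    }
  }
  return F;
}

console.log(shubert(5.4829, 4.8580)); // close to the global minimum -186.73
console.log(nicheEliminate([[0, 0], [0.1, 0], [5, 5]], [1, 2, 3], 0.5, 1e-30));
```

The first call evaluates a point near one of the global minima listed in row 1 of the table. The second shows the two close individuals collapsing to one survivor while the distant one is untouched.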
Next, we introduce a genetic algorithm based on the isolation niche technique.
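Basic concepts and evolutionary strategies of the isolation niche technique. Following the geographic isolation found in nature, the initial population of the genetic algorithm is divided into several subpopulations that evolve independently. How fast each subpopulation evolves, and how large it is, depend on its average fitness level. Because isolated subpopulations are independent and clearly separated, the evolution of each one can be controlled flexibly.
In nature, competition exists not only among individuals: a population as a whole also competes, and survival of the fittest applies at the population level too. In the isolation niche technique, survival of the fittest is realized by linking the size of a subpopulation to the average fitness of its individuals. The higher a subpopulation's average fitness, the larger it becomes, and the lower, the smaller. In nature, species well adapted to their environment get more chances to reproduce and their offspring keep increasing, but not without limit, or the ecological balance would be upset. In a genetic algorithm the total population size is fixed, so to preserve the diversity of species the size of some subpopulations must be limited. The largest size allowed for a subpopulation is called the maximum allowed scale, denoted S_max.
In nature some species also dwindle because they cannot adapt, until they go extinct. In the isolation niche mechanism, some subpopulations sometimes need to be protected deliberately so that they are not eliminated too early and keep some ability to evolve, which preserves diversity. A subpopulation's ability to evolve is linked to its size, so a minimum survival size must be specified for every subpopulation. This size is called the minimum live scale, denoted S_min.
If a subpopulation keeps performing worst for a specified number of generations, it should be made extinct and replaced by new solutions from the search space; this mechanism is called "the worst die". If two subpopulations become similar or identical during evolution, one of them is removed and replaced by new solutions from the search space; this strategy is called intraspecific competition. A newly created subpopulation usually cannot compete in its early stage with subpopulations that have already evolved. Without protection such new solutions would be eliminated early, which is clearly undesirable, so new solutions must be protected; this strategy is called immature protection. If a subpopulation converges to or near a local optimum, its evolution stagnates. The subpopulation should then be removed with some probability and replaced by new solutions from the search space; this strategy is called "the new superseding the old". The best-performing individuals are the most likely to evolve into the optimum and should be allowed to evolve fully. "The new superseding the old" therefore must not be applied to the best subpopulation, a rule called "the best live". It can protect the single best subpopulation or the best few.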
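A sketch of the size rule described above: a subpopulation's next size is proportional to its average fitness, in the form of formula (2) in the steps that follow, and clamped to [S_min, S_max]. `subpopSizes` is a hypothetical name. Rounding and clamping can make the total drift away from N; a complete implementation would redistribute the difference, for example with the roulette-wheel variant the steps mention.

```javascript
// Next-generation sizes: n_k = N * fbar_k / sum_j fbar_j, rounded and
// clamped to [sMin, sMax]. The total is NOT forced back to exactly N here.
function subpopSizes(avgFitness, N, sMin, sMax) {
  const total = avgFitness.reduce((a, b) => a + b, 0);
  return avgFitness.map(f => {
    const n = Math.round((N * f) / total);
    return Math.min(sMax, Math.max(sMin, n));
  });
}

console.log(subpopSizes([1, 1, 2], 40, 5, 30)); // [ 10, 10, 20 ]
console.log(subpopSizes([1, 1, 2], 40, 5, 18)); // [ 10, 10, 18 ]
```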
Steps of the genetic algorithm based on the isolation niche technique
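1) Encoding: choose an encoding suited to the specific problem, mapping the problem's solution space into the genetic algorithm's solution space.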
2) Generate the initial population: randomly generate N initial individuals.
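3) Isolate the initial population: divide the N initial individuals evenly among K subpopulations, so that each subpopulation contains N/K individuals.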
4) Compute fitness: compute the fitness of every individual in the population and save the individual with the highest fitness.
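5) Determine subpopulation sizes: a subpopulation's size is linked to its average fitness. The higher the average fitness, the more individuals it gets in the next generation, and the lower, the fewer. The number must, however, respect the maximum allowed scale and the minimum protected scale: the size n_k(t+1) of subpopulation k in generation t+1 satisfies $S_{\min} \le n_k(t+1) \le S_{\max}$.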
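One concrete method is as follows. First pre-allocate S_min individuals to every subpopulation, then distribute the remaining individuals by roulette-wheel selection according to the subpopulations' average fitness until the total reaches N. The average fitness of a subpopulation can simply be taken as
$\bar{f}_k(t) = \frac{1}{n_k(t)} \sum_{i=1}^{n_k(t)} f_{ki}(t) \qquad (1)$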
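where $\bar{f}_k(t)$ is the average fitness of subpopulation k in generation t, $f_{ki}(t)$ is the fitness of the i-th individual of subpopulation k in generation t, and $n_k(t)$ is the size of subpopulation k in generation t. The size of subpopulation k in generation t+1 is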
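$n_k(t+1) = N \cdot \frac{\bar{f}_k(t)}{\sum_{j=1}^{K} \bar{f}_j(t)} \qquad (2)$
Subpopulation sizes can also be determined with the roulette wheel according to the subpopulations' average fitness.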
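6) Protection release test: for each protected subpopulation, test whether protection should be released, and remove the protection from those that meet the release condition.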
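7) "The worst die" test: remove any unprotected subpopulation that has performed worst for several consecutive generations, and create a new subpopulation of the same size.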
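8) Intraspecific competition test: randomly pick two subpopulations and judge their similarity by some criterion. If they meet the similarity condition, remove one of them and create a new subpopulation of the same size.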
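9) "New superseding old" test: check whether any subpopulation has stagnated. If so, replace it with a new subpopulation of the same size, but keep the subpopulation that contains the best individual (the best-live mechanism).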
10) Recompute fitness: compute the fitness of the newly created subpopulations and apply immature protection to them.
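11) Subpopulation evolution: because a subpopulation's size is linked to its average performance in the population, subpopulation sizes keep changing.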
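Using the sizes determined by formula (2), select the reproducing individuals of each subpopulation and generate the next generation with the crossover and mutation operators.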
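12) Convergence test: if the convergence condition is met or the specified number of generations has been reached, stop; otherwise continue with the next generation.
Besides the algorithms above, the following niching algorithms are also commonly used:
1 Deterministic crowding
The deterministic crowding (DC) algorithm was proposed by Mahfoud. It belongs to the crowding family: offspring compete directly with their parents, on both fitness and the distance between individuals. The procedure is as follows:
Deterministic crowding (repeat for G generations)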
Repeat the following steps N/2 times:
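(1) Randomly select two parents p1 and p2, with replacement.
(2) Apply crossover and mutation to them to produce two offspring c1 and c2.
(3) If $d(p_1, c_1) + d(p_2, c_2) \le d(p_1, c_2) + d(p_2, c_1)$, then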
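if f(c1) > f(p1), replace p1 with c1, otherwise keep p1;
if f(c2) > f(p2), replace p2 with c2, otherwise keep p2.
Otherwise:
if f(c1) > f(p2), replace p2 with c1, otherwise keep p2;
if f(c2) > f(p1), replace p1 with c2, otherwise keep p1.
Here N is the population size and d(i, j) is the distance between individuals i and j.
2 Restricted tournament selection
The restricted tournament selection (RTS) algorithm was proposed by Harik. It belongs to the crowding family: an individual competes with other individuals of the population, on both fitness and the distance between individuals. The procedure is as follows: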
Restricted tournament selection (repeat for G generations)
Repeat the following steps N/2 times:
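(1) Randomly select two parents p1 and p2, with replacement.
(2) Apply crossover and mutation to them to produce two offspring c1 and c2.
(3) For each of c1 and c2, randomly select w individuals from the current population.
(4) Without loss of generality, let d1 and d2 be the individuals among those w that are closest to c1 and c2, respectively.
(5) If f(c1) > f(d1), replace d1 with c1, otherwise keep d1.
If f(c2) > f(d2), replace d2 with c2, otherwise keep d2.
3 Multi-niche crowding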
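The multi-niche crowding (MNC) algorithm was proposed by Cedeno. It belongs to the crowding family: several individuals of the population compete with one another, on both fitness and the distance between individuals. The old individuals chosen by this competition are replaced by offspring produced from new individuals. The procedure is as follows:
Multi-niche crowding (repeat for G generations)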
Repeat the following steps N/2 times:
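(1) Randomly select a parent p1, with replacement.
(2) Randomly select C_s individuals from the population as p1's mating candidates, and choose among them the individual p2 closest to p1.
(3) Apply crossover and mutation to p1 and p2 to produce two offspring c1 and c2.
(4) For each of c1 and c2, randomly select C_f groups of individuals from the current population, each group containing w individuals.
(5) From each group, select the individual closest to the corresponding offspring. This yields C_f replacement candidates for each offspring.
(6) Without loss of generality, let d1 and d2 be the individuals with the lowest fitness in the two candidate sets.
(7) Replace d1 with c1 and d2 with c2.
Cedeno also gave recommended values for C_s, w and C_f: C_f should lie in the interval [2, …], and C_s and w should be at least twice the number of global peaks the user wants to find. Step (2) of the algorithm introduces a heuristic restricted mating strategy.
4 Standard fitness sharing
The standard fitness sharing (SH) algorithm was proposed by Goldberg and Richardson. It belongs to the fitness sharing family: the niche radius must be given in advance, and all peaks of the solution space are assumed to have the same radius. The procedure is as follows:
Standard fitness sharing (repeat for G generations)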
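(1) Compute the sharing function value sh(d_ij) between the individuals of the population:
$sh(d_{ij}) = \begin{cases} 1 - (d_{ij}/\sigma_{share})^{\alpha}, & d_{ij} < \sigma_{share} \\ 0, & \text{otherwise} \end{cases}$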
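where σ_share is the peak radius given in advance, d_ij is the distance between individuals i and j, and α is a parameter controlling the shape of the sharing function, usually α = 1 (linear sharing function). The larger the sharing value between two individuals, the closer they are.
(2) Compute the niche count m_i of each individual: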
$m_i = \sum_{j=1}^{N} sh(d_{ij})$
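where N is the population size. The larger an individual's niche count, the more other individuals surround it.
(3) Compute the shared fitness f_i' of each individual: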
$f_i' = f_i / m_i$
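(4) Use the shared fitness for selection, crossover and mutation to create new individuals and form the next generation.
Assuming that the peaks of the solution space are uniformly distributed and have the same radius, Deb and Goldberg proposed a formula for computing the peak radius. They also gave a restricted mating strategy based on the peak radius. It ensures that all crossovers take place within the same species, so that offspring and parents belong to the same niche. The time complexity of the distance computations in standard fitness sharing is O(N²).
5 Clearing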
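The clearing algorithm was proposed by Petrowski. It belongs to the fitness sharing family: the niche radius σ (the important parameter) and the niche capacity κ (a secondary parameter) must be given in advance, and all peaks are assumed to have the same radius. The procedure is as follows:
Clearing (repeat for G generations)
(1) Sort the individuals in descending order of fitness.
(2) Designate the first individual as the center of the first niche.
(3) Starting from the second individual, perform the following steps in order up to the last individual: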
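(3.1) If the distance from the current individual to every designated niche center is greater than σ, the individual becomes the center of a new niche. It is a winner.
(3.2) If the distance from the current individual to some designated niche center is less than σ and that niche holds fewer than κ individuals, the individual joins that niche and the niche's count increases by 1. It is a winner.
(3.3) All other individuals are losers.
(3.4) Keep the fitness of all winners unchanged and set the fitness of all losers to 0.
(4) Use the modified fitness for selection, crossover and mutation to create new individuals and form the next generation.
The time complexity of the distance computations in clearing is O(kN), where k is the number of niches the algorithm maintains. If winners are regarded as having a niche count of one and losers an infinite niche count, clearing can also be seen as an improvement of standard fitness sharing.
6 Adaptive k-means clustering with fitness sharing
The adaptive k-means clustering with fitness sharing algorithm was proposed by Yin and Germay. It belongs to the fitness sharing family: the minimum distance d_min between niche centers and the maximum distance d_max between an individual and the center of its niche must be given in advance. Peaks of the solution space may have different radii. The procedure is as follows:
Adaptive k-means clustering with fitness sharing (repeat for G generations)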
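(1) Sort the individuals in descending order of fitness.
(2) Generate a random integer k in [1, N] (the initial number of niches).
(3) Put the first k individuals into different niches, each becoming a niche center. Ensure that all niche centers are more than d_min apart; if this condition cannot be met, merge niches, the new niche center being the centroid of all individuals in the merged niche.
(4) For each of the remaining N − k individuals, compute its distance to all current niche centers. If its distance to every center is greater than d_max, create a new niche with the individual as its center; otherwise assign the individual to the nearest niche. As needed, ensure that all niche centers remain more than d_min apart, merging niches if this condition cannot be met.
(5) When all individuals have been placed, fix the niche centers and reassign every individual to its nearest niche by the minimum-distance rule.
(6) Compute the niche count m_i of each individual: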
$m_i = n_c - n_c \left(\frac{d_{ic}}{2\, d_{\max}}\right)^{\alpha}, \quad x_i \in C_c$
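where n_c is the total number of individuals in niche c, d_ic is the distance between individual i and the center of the niche it belongs to, x_i is the i-th individual, C_c is the set of individuals of niche c, and α is a parameter controlling the shape of the function, usually α = 1.
(7) Compute the shared fitness of each individual with $f_i' = f_i / m_i$.
(8) Use the shared fitness for selection, crossover and mutation to create new individuals and form the next generation.
The time complexity of the distance computations in adaptive k-means clustering with fitness sharing is O(kN).
7 Dynamic niche sharing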
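Dynamic niche sharing was proposed by Miller and Shaw. It belongs to the fitness sharing family: the niche radius σ and the number of niches k must be given in advance. The procedure is as follows:
Dynamic niche sharing (repeat for G generations)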
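(1) Sort the individuals in descending order of fitness.
(2) Designate the first individual as the center of the first niche.
(3) Starting from the second individual, perform the following steps in order up to the last individual: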
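(3.1) If the distance between the current individual and every designated niche center is greater than σ and fewer than k niches have been designated, form a new niche with the individual as its center.
(3.2) If the distance between the current individual and every niche center is greater than σ and at least k niches have been designated, the individual becomes an independent individual.
(3.3) Otherwise, the individual belongs to the niche whose center lies within σ of it.
(4) For individuals that belong to a niche, the niche count is the number of individuals in that niche. For independent individuals, the niche count is computed with the standard sharing formula $m_i = \sum_{j} sh(d_{ij})$.
(5) Compute the shared fitness of each individual with $f_i' = f_i / m_i$.
(6) Use the shared fitness for selection, crossover and mutation to create new individuals and form the next generation.
The time complexity of the distance computations in dynamic niche sharing is O(kN).
8 Adaptive niching
Adaptive niching was proposed by Goldberg and Wang. It belongs to the fitness sharing family: the niche radius σ and the number of niches k must be given in advance. The algorithm maintains two populations, called customers and businessmen, and achieves multimodal optimization through their coevolution. The customer population plays the role of the population in other fitness sharing algorithms, while the businessman population represents the set of peaks in the search space. The number of businessmen, k, is slightly larger than the number of niches in other fitness sharing algorithms. A customer's fitness is the same as an individual's fitness in other fitness sharing algorithms; a businessman's fitness is the sum of the fitness of all its customers.
The algorithm first places the businessmen randomly in the search space; the rest of the procedure is as follows: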
Adaptive niching (repeat for G generations)
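(1) Assign each customer to the nearest businessman.
(2) Compute every customer's niche count (the number of customers of the businessman it belongs to).
(3) Compute the customers' shared fitness with the sharing formula.
(4) Use the customers' shared fitness for selection, crossover and mutation to create new individuals and form the next generation of customers.
(5) Take each businessman in turn and mutate it to produce a new businessman. If the new businessman's fitness is higher than the old one's and its distance to every other businessman is greater than σ, the new businessman replaces the old one. Otherwise, mutate again, until a replacement is produced or the number of mutations exceeds the specified maximum.
The time complexity of the distance computations in adaptive niching is O(kN).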
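Three of the rules above, sketched on one-dimensional real-valued individuals: standard fitness sharing, clearing and the deterministic crowding replacement step. To keep the sketch short, individuals are plain numbers and the distance is `Math.abs(a - b)`; both are assumptions, and any metric on the encoding would do. The function names are hypothetical, and fitness is "higher is better".

```javascript
const dist = (a, b) => Math.abs(a - b);

// Standard fitness sharing: sh(d) = 1 - (d / sigma)^alpha if d < sigma, else 0;
// m_i = sum_j sh(d_ij); shared fitness f_i' = f_i / m_i.
function sharedFitness(pop, f, sigma, alpha = 1) {
  return pop.map((xi, i) => {
    let m = 0;
    for (const xj of pop) {
      const d = dist(xi, xj);
      if (d < sigma) m += 1 - Math.pow(d / sigma, alpha);
    }
    return f[i] / m; // m >= 1, because sh(d_ii) = 1
  });
}

// Clearing with radius sigma and capacity kappa: winners keep their fitness,
// losers get 0. Individuals are visited in descending order of fitness.
function clearing(pop, f, sigma, kappa) {
  const order = pop.map((_, i) => i).sort((a, b) => f[b] - f[a]);
  const centers = []; // { x, count }
  const out = f.slice();
  for (const i of order) {
    let nearest = null;
    for (const c of centers) {
      const d = dist(pop[i], c.x);
      if (d < sigma && (nearest === null || d < dist(pop[i], nearest.x))) nearest = c;
    }
    if (nearest === null) centers.push({ x: pop[i], count: 1 }); // new niche center
    else if (nearest.count < kappa) nearest.count++;              // joins the niche
    else out[i] = 0;                                              // loser
  }
  return out;
}

// Deterministic crowding replacement: pair each child with the closer parent
// and keep the fitter of each pair. Returns [survivor in p1's slot, survivor in p2's slot].
function dcReplace(p1, p2, c1, c2, g) {
  const pick = (parent, child) => (g(child) > g(parent) ? child : parent);
  if (dist(p1, c1) + dist(p2, c2) <= dist(p1, c2) + dist(p2, c1)) {
    return [pick(p1, c1), pick(p2, c2)];
  }
  return [pick(p1, c2), pick(p2, c1)];
}

console.log(sharedFitness([0, 0.1, 5], [1, 1, 1], 1));
console.log(clearing([0, 0.2, 5, 0.3], [10, 9, 8, 7], 1, 1)); // [ 10, 0, 8, 0 ]
console.log(dcReplace(0, 10, 0.5, 9, x => x));                 // [ 0.5, 10 ]
```

Clearing assigns each individual to the nearest center within σ. That is one reasonable reading of step (3.2) when several centers qualify.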
from: http://qbwh.com/viewthread_123913.html
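In 1960, John McCarthy published a remarkable paper in which he did for programming something like what Euclid did for geometry.[1] He showed how, given a handful of simple operators and a notation for functions, you can build a whole programming language. He called this language Lisp, for "List Processing," because one of his key ideas was to use a simple data structure called a list for both code and data.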
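It's worth understanding what McCarthy discovered, not just as a landmark in the history of computers, but as a model for what programming is tending to become in our own time. It seems to me that there have been two really clean, consistent models of programming so far: the C model and the Lisp model. These two seem points of high ground, with swampy lowlands between them. As computers have grown more powerful, the new languages being developed have been moving steadily toward the Lisp model. A popular recipe for new programming languages in the past 20 years has been to take the C model of computing and add to it, piecemeal, parts taken from the Lisp model, like runtime typing and garbage collection.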
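In this article I'm going to try to explain in the simplest possible terms what McCarthy discovered. The point is not just to learn about an interesting theoretical result someone figured out forty years ago, but to show where languages are heading. The unusual thing about Lisp (in fact, its defining quality) is that it can be written in itself. To understand what McCarthy meant by this, we're going to retrace his steps, with his mathematical notation translated into running Common Lisp code.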
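To start with, we define an expression. An expression is either an atom, which is a sequence of letters (e.g. foo), or a list of zero or more expressions, separated by whitespace and enclosed by parentheses. Here are some expressions: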
foo
()
(foo)
(foo bar)
(a b (c) d)
The last expression is a list of four elements, the third of which is itself a list of one element.
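In arithmetic the expression 1 + 1 has the value 2. Valid Lisp expressions also have values. If an expression e yields a value v, we say that e returns v. Our next step is to define what kinds of expressions there can be, and what value each kind returns.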
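If an expression is a list, we call the first element the operator and the remaining elements the arguments. We are going to define seven primitive (in the sense of axioms) operators: quote, atom, eq, car, cdr, cons, and cond.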
(quote x) returns x. For readability we abbreviate (quote x) as 'x.
> (quote a)
a
> 'a
a
> (quote (a b c))
(a b c)
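(atom x) returns the atom t if the value of x is an atom or the empty list, and () otherwise. In Lisp we conventionally use the atom t to represent truth, and the empty list to represent falsity.
> (atom 'a)
t
> (atom '(a b c))
()
> (atom '())
t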
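Now that we have an operator whose argument is evaluated, we can show what quote is for. By quoting a list we protect it from evaluation. An unquoted list given as an argument to an operator like atom is treated as code: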
> (atom (atom 'a))
t
whereas a quoted list is treated as a mere list, in this case a list of two elements:
> (atom '(atom 'a))
()
This corresponds to the way we use quotes in English. Cambridge is a town in Massachusetts that contains about 90,000 people. "Cambridge" is a word that contains nine letters.
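Quote may seem a bit of a foreign concept, because few other languages have anything like it. It's closely tied to one of the most distinctive features of Lisp: code and data are made out of the same data structures, and the quote operator is the way we distinguish between them.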
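(eq x y) returns t if the values of x and y are the same atom or both the empty list, and () otherwise.
> (eq 'a 'a)
t
> (eq 'a 'b)
()
> (eq '() '())
t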
(car x) expects the value of x to be a list, and returns its first element.
> (car '(a b c))
a
(cdr x) expects the value of x to be a list, and returns everything after the first element.
> (cdr '(a b c))
(b c)
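(cons x y) expects the value of y to be a list, and returns a list containing the value of x followed by the elements of the value of y.
> (cons 'a '(b c))
(a b c)
> (cons 'a (cons 'b (cons 'c '())))
(a b c)
> (car (cons 'a '(b c)))
a
> (cdr (cons 'a '(b c)))
(b c)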
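(cond (p1 e1) ... (pn en)) is evaluated as follows. The p expressions are evaluated in order until one returns t. When one is found, the value of the corresponding e expression is returned as the value of the whole cond expression.
> (cond ((eq 'a 'b) 'first)
        ((atom 'a) 'second))
second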
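In five of our seven primitive operators, the arguments are always evaluated when an expression beginning with that operator is evaluated.[2] We will call an operator of that type a function.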
((lambda (...
) e)
...
)
则称?em>函数调用.它的D如?每一个表辑ּ先求?然后e再求??i>e的求DE中,每个出现?i>e中的
的值是相应?img height="28" alt="$a_{i}$" src="http://daiyuwen.freeshell.org/gb/rol/img7.png" width="18" align="middle" border="0" />在最q一ơ的函数调用中的?
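> ((lambda (x) (cons x '(b))) 'a)
(a b)
> ((lambda (x y) (cons x (cdr y)))
   'z
   '(a b c))
(z b c)
If an expression has as its first element an atom <i>f</i> that is not one of the primitive operators,
(f a1 ... an)
and the value of <i>f</i> is a function (lambda (p1 ... pn) e), then the value of the expression is the value of
((lambda (p1 ... pn) e) a1 ... an)
In other words, parameters can be used as operators in expressions as well as arguments: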
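> ((lambda (f) (f '(b c)))
   '(lambda (x) (cons 'a x)))
(a b c)
There is another notation for functions that enables a function to refer to itself, which gives us a convenient way to define recursive functions.3 The notation
(label f (lambda (p1 ... pn) e))
denotes a function that behaves like (lambda (p1 ... pn) e), with the additional property that any occurrence of <i>f</i> within <i>e</i> evaluates to this label expression, as if <i>f</i> were a parameter of the function.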
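Suppose we want to define a function (subst x y z), which takes an expression <i>x</i>, an atom <i>y</i>, and a list <i>z</i>, and returns a list like <i>z</i> but with each instance of <i>y</i> (at any depth of nesting) in <i>z</i> replaced by <i>x</i>.
> (subst 'm 'b '(a b (a b c) d))
(a m (a m c) d)
We can denote this function as
(label subst (lambda (x y z)
               (cond ((atom z)
                      (cond ((eq z y) x)
                            ('t z)))
                     ('t (cons (subst x y (car z))
                               (subst x y (cdr z)))))))
We will abbreviate f = (label f (lambda (p1 ... pn) e)) as
(defun f (p1 ... pn) e)
so
(defun subst (x y z)
  (cond ((atom z)
         (cond ((eq z y) x)
               ('t z)))
        ('t (cons (subst x y (car z))
                  (subst x y (cdr z))))))
Incidentally, we see here how to write a default clause in a cond expression: a clause whose first element is 't will always succeed. So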
(cond (x y) ('t z))
is equivalent to what we would write in some languages as
if x then y else z
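For readers more at home outside Lisp, the same recursion on car and cdr can be sketched in Python. This is only an illustration under stated assumptions: atoms are Python strings, lists are Python lists, and `subst` here is a hypothetical Python counterpart, not part of the article's Lisp code.

```python
def subst(x, y, z):
    """Replace every occurrence of the atom y in z, at any depth, by x.
    Atoms are Python strings; lists are Python lists."""
    if not isinstance(z, list) or z == []:   # (atom z): the empty list is also an atom
        return x if z == y else z            # (cond ((eq z y) x) ('t z))
    # ('t (cons (subst x y (car z)) (subst x y (cdr z))))
    return [subst(x, y, z[0])] + subst(x, y, z[1:])
```

For example, `subst("m", "b", ["a", "b", ["a", "b", "c"], "d"])` gives `["a", "m", ["a", "m", "c"], "d"]`, matching the REPL output above.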
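> (cadr '((a b) (c d) e))
(c d)
> (caddr '((a b) (c d) e))
e
> (cdar '((a b) (c d) e))
(b)
We also use (list e1 ... en) for (cons e1 ... (cons en '()) ... ).
> (cons 'a (cons 'b (cons 'c '())))
(a b c)
> (list 'a 'b 'c)
(a b c)
Now we define some new functions. I have put periods at the end of their names to distinguish them from the primitive functions out of which they are defined, and also to avoid clashes with existing Common Lisp functions.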
(defun null. (x)
  (eq x '()))

> (null. 'a)
()
> (null. '())
t

(defun and. (x y)
  (cond (x (cond (y 't) ('t '())))
        ('t '())))

> (and. (atom 'a) (eq 'a 'a))
t
> (and. (atom 'a) (eq 'a 'b))
()

(defun not. (x)
  (cond (x '())
        ('t 't)))

> (not. (eq 'a 'a))
()
> (not. (eq 'a 'b))
t

(defun append. (x y)
  (cond ((null. x) y)
        ('t (cons (car x) (append. (cdr x) y)))))

> (append. '(a b) '(c d))
(a b c d)
> (append. '() '(c d))
(c d)
(defun pair. (x y)
  (cond ((and. (null. x) (null. y)) '())
        ((and. (not. (atom x)) (not. (atom y)))
         (cons (list (car x) (car y))
               (pair. (cdr x) (cdr y))))))

> (pair. '(x y z) '(a b c))
((x a) (y b) (z c))
(defun assoc. (x y)
  (cond ((eq (caar y) x) (cadar y))
        ('t (assoc. x (cdr y)))))

> (assoc. 'x '((x a) (y b)))
a
> (assoc. 'x '((x new) (x a) (y b)))
new
(defun eval. (e a)
  (cond
    ((atom e) (assoc. e a))
    ((atom (car e))
     (cond
       ((eq (car e) 'quote) (cadr e))
       ((eq (car e) 'atom)  (atom   (eval. (cadr e) a)))
       ((eq (car e) 'eq)    (eq     (eval. (cadr e) a)
                                    (eval. (caddr e) a)))
       ((eq (car e) 'car)   (car    (eval. (cadr e) a)))
       ((eq (car e) 'cdr)   (cdr    (eval. (cadr e) a)))
       ((eq (car e) 'cons)  (cons   (eval. (cadr e) a)
                                    (eval. (caddr e) a)))
       ((eq (car e) 'cond)  (evcon. (cdr e) a))
       ('t (eval. (cons (assoc. (car e) a)
                        (cdr e))
                  a))))
    ((eq (caar e) 'label)
     (eval. (cons (caddar e) (cdr e))
            (cons (list (cadar e) (car e)) a)))
    ((eq (caar e) 'lambda)
     (eval. (caddar e)
            (append. (pair. (cadar e) (evlis. (cdr e) a))
                     a)))))

(defun evcon. (c a)
  (cond ((eval. (caar c) a)
         (eval. (cadar c) a))
        ('t (evcon. (cdr c) a))))

(defun evlis. (m a)
  (cond ((null. m) '())
        ('t (cons (eval.  (car m) a)
                  (evlis. (cdr m) a)))))
The definition of eval. is longer than any of the others we have seen. Let's consider how each part of it works.
eval. takes two arguments: e, the expression to be evaluated, and a, a list representing the values that atoms have been given, somewhat like the parameters in a function call. This list, of the form returned by pair., is called an <em>environment</em>. It was in order to build and search lists of this kind that we wrote pair. and assoc..
The spine of eval. is a cond expression with four clauses. How we evaluate an expression depends on what kind it is. The first clause handles atoms. If e is an atom, we look up its value in the environment:
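> (eval. 'x '((x a) (y b)))
a
The second clause is another cond, for handling expressions of the form (a ...), where a is an atom. These include all the uses of the primitive operators, and there is a clause for each one.
> (eval. '(eq 'a 'a) '())
t
> (eval. '(cons x '(b c))
         '((x a) (y b)))
(a b c)
All of these clauses (except the one for quote) call eval. to find the values of the arguments.
The last two clauses are more complicated. To evaluate a cond expression we call a subsidiary function called evcon., which works through the clauses recursively, looking for one in which the first element returns t. When it finds such a clause, it returns the value of the second element.
> (eval. '(cond ((atom x) 'atom)
                ('t 'list))
         '((x '(a b))))
list
The final part of the second clause handles calls to functions that have been passed as parameters. It works by replacing the atom with its value (which ought to be a lambda or label expression) and evaluating the resulting expression. So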
(eval. '(f '(b c))
       '((f (lambda (x) (cons 'a x)))))
turns into
(eval. '((lambda (x) (cons 'a x)) '(b c))
       '((f (lambda (x) (cons 'a x)))))
which returns (a b c).
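The last two clauses in the outer cond of eval. handle function calls whose first element is an actual lambda or label expression. A label expression is evaluated by pushing a list of the function name and the function itself onto the environment, and then calling eval. on an expression with the inner lambda expression in place of the label expression. That is,
(eval. '((label firstatom (lambda (x)
                            (cond ((atom x) x)
                                  ('t (firstatom (car x))))))
         y)
       '((y ((a b) (c d)))))
turns into
(eval. '((lambda (x)
           (cond ((atom x) x)
                 ('t (firstatom (car x)))))
         y)
       '((firstatom
          (label firstatom (lambda (x)
                             (cond ((atom x) x)
                                   ('t (firstatom (car x)))))))
         (y ((a b) (c d)))))
which eventually returns a.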
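Finally, an expression of the form ((lambda (p1 ... pn) e) a1 ... an) is evaluated by first calling evlis. to get the list of values (v1 ... vn) of the arguments a1 ... an, then adding (p1 v1) ... (pn vn) to the environment, and then evaluating <i>e</i>. So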
(eval. '((lambda (x y) (cons x (cdr y)))
         'a
         '(b c d))
       '())
turns into
(eval. '(cons x (cdr y))
       '((x a) (y (b c d))))
which finally returns (a c d).
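To make the four clauses of eval. concrete, here is a rough Python transliteration. It is a sketch, not McCarthy's or the article's code, and it rests on these assumptions: atoms are Python strings, lists are Python lists, the empty list `[]` plays the role of false, environments are lists of `[name, value]` pairs as built by pair., and `eval_`, `evcon`, `evlis` and the tiny reader `read` (which only understands symbols, parentheses and the quote mark) are hypothetical names.

```python
T = "t"  # Lisp's true; [] (the empty list) stands for false, as in the article

def atom(x):
    """(atom x): t if x is a symbol or the empty list."""
    return T if isinstance(x, str) or x == [] else []

def eq(x, y):
    """(eq x y): t if x and y are the same atom or both the empty list."""
    return T if atom(x) == T and x == y else []

def assoc(x, env):
    """Look up atom x in an environment of [name, value] pairs, like assoc."""
    for name, value in env:
        if name == x:
            return value
    raise NameError(f"unbound atom: {x}")

def evcon(clauses, env):
    """Return the value of the first cond clause whose test is not ()."""
    for test, result in clauses:
        if eval_(test, env) != []:
            return eval_(result, env)
    raise ValueError("no cond clause succeeded")

def evlis(exprs, env):
    return [eval_(x, env) for x in exprs]

def eval_(e, env):
    if atom(e) == T:                      # clause 1: atoms
        return assoc(e, env)
    op = e[0]
    if atom(op) == T:                     # clause 2: (a ...) where a is an atom
        if op == "quote":
            return e[1]
        if op == "atom":
            return atom(eval_(e[1], env))
        if op == "eq":
            return eq(eval_(e[1], env), eval_(e[2], env))
        if op == "car":
            return eval_(e[1], env)[0]
        if op == "cdr":
            return eval_(e[1], env)[1:]
        if op == "cons":
            return [eval_(e[1], env)] + eval_(e[2], env)
        if op == "cond":
            return evcon(e[1:], env)
        # a function passed as a parameter: replace the atom by its value
        return eval_([assoc(op, env)] + e[1:], env)
    if op[0] == "label":                  # clause 3: ((label f (lambda ...)) args...)
        return eval_([op[2]] + e[1:], [[op[1], op]] + env)
    if op[0] == "lambda":                 # clause 4: ((lambda params body) args...)
        bindings = [[p, v] for p, v in zip(op[1], evlis(e[1:], env))]
        return eval_(op[2], bindings + env)
    raise SyntaxError(f"cannot evaluate {e}")

def read(src):
    """Tiny reader: "(a 'b)" becomes ["a", ["quote", "b"]]."""
    tokens = src.replace("(", " ( ").replace(")", " ) ").replace("'", " ' ").split()
    def parse(i):
        if tokens[i] == "(":
            out, i = [], i + 1
            while tokens[i] != ")":
                item, i = parse(i)
                out.append(item)
            return out, i + 1
        if tokens[i] == "'":
            item, i = parse(i + 1)
            return ["quote", item], i
        return tokens[i], i + 1
    return parse(0)[0]
```

For example, `eval_(read("((lambda (x y) (cons x (cdr y))) 'a '(b c d))"), [])` should give `["a", "c", "d"]`, the same result as the last example above.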
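Now that we understand how eval works, let's step back and consider what it means. What we have here is a remarkably elegant model of computation. Using just quote, atom, eq, car, cdr, cons, and cond, we can define a function, eval., that actually implements our language, and then using that we can define any additional function we want.
There were already models of computation, of course; most notably the Turing machine. But Turing machine programs are not very edifying to read. If you want a language for describing algorithms, you might want something more abstract, and that was one of McCarthy's aims in defining Lisp.
The language McCarthy defined in 1960 was missing a lot. It has no side effects, no sequential execution (which is useful only with side effects anyway), no practical numbers,4 and it has dynamic scope. But these limitations can be remedied with surprisingly little additional code. Steele and Sussman show how to do it in a famous paper called "The Art of the Interpreter."5
If you understand McCarthy's eval, you understand more than just a stage in the history of programming languages. These ideas are still the semantic core of Lisp today. So studying McCarthy's original paper shows us, in a sense, what Lisp really is. It is not something McCarthy designed so much as something he discovered. It is not intrinsically a language for AI, for rapid prototyping, or for any other task at that level. It is what you get (or one thing you get) when you try to axiomatize computation.
Over time, the median language, meaning the language used by the median programmer, has grown steadily closer to Lisp. So by understanding eval you are understanding what will probably be the main model of computation well into the future.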
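In McCarthy's 1960 paper, false is represented by f, not by the empty list. I used the empty list to represent false so that the examples would run in Common Lisp.
I have skipped the construction of dotted pairs, because you don't need them to understand eval. I also haven't mentioned apply, though it was apply (a very early form of it, whose main purpose was to quote arguments) that McCarthy called the universal function in 1960; eval was then just a subroutine that apply called to do all the work.
I defined list and the cxrs as abbreviations because that is how McCarthy did it. In fact the cxrs could all have been defined as ordinary functions. So could list, if we modified eval, as we easily could, to let functions take any number of arguments.
McCarthy's paper has only five primitive operators. He used cond and quote but may have thought of them as part of his metalanguage. He likewise did not define the logical operators and and not, but this is less of a problem because adequate versions can be defined as functions.
In the definition of eval. we called other functions such as pair. and assoc., but any call to one of the functions we defined in terms of the primitive operators could be replaced by a call to eval.. That is,
(assoc. (car e) a)
could have been written as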
(eval. '((label assoc.
                (lambda (x y)
                  (cond ((eq (caar y) x) (cadar y))
                        ('t (assoc. x (cdr y))))))
         (car e)
         a)
        (cons (list 'e e) (cons (list 'a a) a)))
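There is a bug in McCarthy's eval. Line 16 was (the equivalent of) (evlis. (cdr e) a) instead of just (cdr e), which caused the arguments in a call to a named function to be evaluated twice. This suggests that this description of eval had not yet been implemented in IBM 704 machine language when the paper was published. It also shows how hard it is to be sure that a program, however short, is correct without running it.
I encountered one other problem in McCarthy's paper. After defining eval, he goes on to give some examples of higher-order functions, functions that take other functions as arguments. He defines maplist:
(label maplist
       (lambda (x f)
         (cond ((null x) '())
               ('t (cons (f x) (maplist (cdr x) f))))))
He then uses it to write a simple function diff for symbolic differentiation. But diff passes maplist a function that uses x as a parameter, and the reference to it is captured by the parameter x within maplist.6
It is eloquent testimony to the dangers of dynamic scope that even the very first example of higher-order Lisp functions was broken because of it. It may be that McCarthy was not fully aware of the implications of dynamic scope in 1960. Dynamic scope remained in Lisp implementations for a surprisingly long time, until Sussman and Steele developed Scheme in 1975. Lexical scope does not make the definition of eval much more complicated, but it does make compilers harder to write.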
This document was generated using the LaTeX2HTML translator Version 2K.1beta (1.48)
Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html-split=0 roots_of_lisp.tex
The translation was initiated by Dai Yuwen on 2003-10-24
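Algorithms are one of the most important cornerstones of computer science, yet some programmers in China neglect them. Seeing the bewildering variety of programming languages that companies list in their job requirements, many students develop a misconception: they think studying computing means learning programming languages, or that learning the newest languages, technologies and standards is the best path. In fact, they have been misled by these companies. Programming languages should be learned, but studying algorithms and theory matters more, because languages and development platforms change constantly, while what stays the same beneath all the change are the algorithms and theory: data structures, algorithms, compiler principles, computer architecture, relational database theory and so on. On the Kai-Fu Student Network (开复学生网), one student vividly compared these foundational courses to "inner strength" and new languages, technologies and standards to "outer techniques". Someone who chases the latest fashion all day ends up knowing only the moves, without real power, and can never become a master.
Algorithms and Me
When I transferred into the computer science department in 1980, few people were majoring in computer science. Many students from other departments laughed at us: "Do you know why only your department needs the word 'science' in its name, while there is no 'physical science department' or 'chemical science department'? Because theirs are real sciences and need no embellishment, whereas you feel insecure and are afraid you are not 'scientific', so you protest too much." In fact they had it completely wrong. People who truly understand computing (not just "coding craftsmen") all have considerable mathematical ability: they can prove things with a scientist's rigor and solve problems with an engineer's pragmatism, and the best embodiment of this way of thinking and working is the algorithm.
I remember that the Othello program I wrote as a doctoral student won the world championship. The runner-up thought I had won by luck and, unconvinced, asked how many moves per second my program searched on average. Only when he found that my software searched more than 60 times faster than his did he concede. Why could I do 60 times more work on the same machine? Because I used a new algorithm that converts an exponential function into four approximating tables, so that an approximate answer can be obtained in constant time. In this example, choosing the right algorithm was the key to winning the world championship.
I also remember that in 1988 a vice president of Bell Labs came to my university in person, because he wanted to understand why their speech recognition system was dozens of times slower than the one I had developed, and, after scaling up to a large-vocabulary system, hundreds of times slower. They had bought several supercomputers and barely got the system running, but such expensive computing resources displeased their product division, because an "expensive" technology has no future in real applications. While discussing it with them, I was surprised to find that they had implemented an O(n*m) dynamic programming algorithm as O(n*n*m). Even more surprising, they had published many papers about it, given the algorithm a special name, and nominated it at a scientific conference in the hope of winning a major award. The Bell Labs researchers were of course extremely smart, but they had all trained in mathematics, physics or electrical engineering and had never studied computer science or algorithms, which is why they made such a basic mistake. I imagine those people never again laughed at people who study computer science!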
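Algorithms in the Internet Era
Some may say, "Computers are so fast today; do algorithms still matter?" In fact, there will never be computers that are fast enough, because we will always come up with new applications. Under Moore's law, computing power grows rapidly every year while prices keep falling, but we must not forget that the amount of information to be processed grows exponentially. Every person now creates large amounts of data every day (photos, video, audio, text and so on). Ever more advanced means of recording and storage make everyone's data volume explode. Internet traffic and log volumes are also growing fast. In scientific research, as methods advance, data volumes have reached unprecedented levels. 3D graphics, massive data processing, machine learning and speech recognition all require enormous amounts of computation. In the Internet era, more and more challenges must be solved by excellent algorithms.
Here is another example from the Internet era. In web and mobile search, if you want to find a nearby coffee shop, how should the search engine handle the request? The simplest way is to find all the coffee shops in the city, compute the distance between each shop's location and you, sort them, and return the nearest results. But how should the distance be computed? Graph theory offers quite a few algorithms for that.
This may be the most intuitive approach, but it is certainly not the fastest. If a city has only a handful of coffee shops, it works fine, since there is little to compute. But if a city has many coffee shops and many users make similar searches, the load on the servers becomes much heavier. How should we optimize the algorithm in that case?
First, we can "preprocess" all the coffee shops in the city once. For example, divide the city into a number of "grid" cells, place the user into a cell according to their location, and sort by distance only the coffee shops in that cell.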
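The grid preprocessing can be sketched in Python. This is a minimal illustration under stated assumptions: shops are points in a plane, distance is straight-line distance rather than road distance, cells are uniform squares, and `GridIndex` is a hypothetical class. Because the nearest shop may sit just across a cell border, the search widens ring by ring and stops once no farther ring can contain anything closer.

```python
import math
from collections import defaultdict

class GridIndex:
    """Uniform grid: bucket points by cell once, then search rings of cells
    around the query instead of scanning every point."""

    def __init__(self, points, cell_size):
        self.cell = cell_size
        self.buckets = defaultdict(list)          # (cx, cy) -> [(name, x, y), ...]
        for name, (x, y) in points.items():
            self.buckets[self._key(x, y)].append((name, x, y))

    def _key(self, x, y):
        return (math.floor(x / self.cell), math.floor(y / self.cell))

    def nearest(self, qx, qy):
        if not self.buckets:
            return None
        cx, cy = self._key(qx, qy)
        # The farthest ring that can still contain a non-empty cell.
        max_ring = max(max(abs(kx - cx), abs(ky - cy)) for kx, ky in self.buckets)
        best, best_d = None, math.inf
        for r in range(max_ring + 1):
            # Every point in ring r is at least (r - 1) * cell away: stop early.
            if (r - 1) * self.cell > best_d:
                break
            for kx in range(cx - r, cx + r + 1):
                for ky in range(cy - r, cy + r + 1):
                    if max(abs(kx - cx), abs(ky - cy)) != r:
                        continue                  # only the border cells of ring r
                    for name, x, y in self.buckets.get((kx, ky), ()):
                        d = math.hypot(x - qx, y - qy)
                        if d < best_d:
                            best, best_d = name, d
        return best
```

The hierarchical grid described next would replace the single `cell_size` with several levels of cells, finer where shops are dense.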
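A new problem then arises: if the cells are all the same size, most results may fall into a single downtown cell, while suburban cells contain very few. In that case we should divide downtown into more cells. Going further, the cells should form a tree structure: the top level is one big cell covering the whole city, and each level below has smaller cells, which lets users search precisely; if the lowest-level cell contains too few results, the user can move up level by level to widen the search.
This algorithm works well for coffee shops, but is it general? No. Abstractly, a coffee shop is a "point"; what if we need to search for an "area"? For example, a user wants to visit a reservoir, and the reservoir has several entrances; which one is closest to the user? Here the tree structure above should become an "r-tree", because every interior node of the tree is a region with a boundary (see http://www.cs.umd.edu/~hjs/rtrees/index.html).
This example shows that application requirements vary endlessly; often we must decompose a complex problem into several simple ones and then choose suitable algorithms and data structures for each.
Parallel Algorithms: Google's Core Advantage
The examples above are small potatoes at Google! Every day Google's sites handle more than a billion searches, GMail stores 2 GB mailboxes for tens of millions of users, and Google Earth lets hundreds of thousands of users roam the whole globe at the same time while delivering the right images to each of them over the Internet. Without good algorithms, none of these applications could exist.
In these applications, even the most basic problems pose great challenges to traditional computing. For example, more than a billion users visit Google's sites and use its services every day, producing enormous logs. Because the logs grow every second, we need clever ways to process them. In interviews I have asked how to analyze and process logs, and many candidates gave answers that were logically correct but practically infeasible: with their algorithms, even tens of thousands of machines could not keep up with the rate at which data is produced.
So how does Google solve these problems?
First, in the Internet era, even the best algorithm must be able to run in a parallel computing environment. In Google's data centers we use very large parallel computers. But with traditional parallel algorithms, efficiency drops quickly as machines are added: if ten machines give a fivefold speedup, a thousand machines may give only a few dozen. No company can afford such diminishing returns. Moreover, in many parallel algorithms, a single failing node means all of the computation is wasted.
So how did Google develop parallel computing that is both efficient and fault-tolerant?
Jeff Dean, Google's most senior computer scientist, realized that the vast majority of Google's data processing can be reduced to one simple parallel algorithm: Map and Reduce (http://labs.google.com/papers/mapreduce.html). The algorithm is quite efficient for many kinds of computation, and it scales (even if a thousand machines cannot deliver a thousandfold speedup, they can at least deliver several hundredfold). Another great feature of Map and Reduce is that it can combine large numbers of cheap machines into a powerful server farm. Finally, its fault tolerance is exceptional: even if half of a server farm goes down, the farm as a whole keeps running. This brilliant insight is what gave rise to Map and Reduce. With it, Google can increase its computing capacity almost without limit and grow along with the fast-changing applications of the Internet.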
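The data flow of Map and Reduce can be illustrated with a toy, single-process Python sketch of word counting (the standard example from the MapReduce paper). It shows only the map, shuffle and reduce steps; it has none of the distribution, scheduling or fault tolerance described above, and `map_phase`, `shuffle`, `reduce_phase` and `mapreduce` are hypothetical names.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    """Map: emit an intermediate (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate values by key; a real framework does this across machines."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: merge all values that share one key."""
    return key, sum(values)

def mapreduce(documents):
    # Each map call depends only on its own document, so these could run in parallel.
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    # Each key's reduce is likewise independent of every other key's.
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())
```

Because no map call and no reduce call depends on any other, a failed machine's share can simply be rerun elsewhere, which is the intuition behind the fault tolerance claimed above.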
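Algorithms Are Not Limited to Computers and Networks
To take an example from outside computing: in high-energy physics, many experiments produce several terabytes of data per second. Because processing and storage capacity fall short, scientists have to discard most of that data unprocessed. Yet information about new elements may well be hidden in the data we have no time to process. Likewise, in every other field, algorithms can change human life. Research on human genes, for example, may yield new medical treatments thanks to algorithms. In national security, effective algorithms might prevent the next 9/11. In meteorology, algorithms can better predict coming natural disasters and save lives.
So if you view the development of computing against the backdrop of rapidly growing applications and data, you will surely find that the importance of algorithms is not diminishing but increasing.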
from: http://www.yuanma.org/data/2006/0824/article_1397.htm
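Cryptography is a major area of theoretical computer science. I had planned to write an introduction to cryptography first and then discuss the work of Professor Wang Xiaoyun, who made major contributions to breaking hash functions. But a couple of days ago Wang Xiaoyun won the Qiushi Outstanding Scientist Award and a prize of 1 million yuan, which set off another wave of media hype, and some of the reports were extremely silly and full of errors. So I will take this opportunity to set the record straight and introduce one component of cryptography, hash functions, together with Wang Xiaoyun's work on them.
Wang Xiaoyun's main work concerns breaking hash functions. At a cryptography conference in 2005 she announced that she had broken SHA-1, which shocked the world. To introduce and understand her work, let us first look at what a hash function actually is.
Simply put, a hash function transforms an input string of arbitrary length into an output string of fixed length. Informally, a hash function generates a digest of a message. The length of the output string is called the <strong>bit length</strong> of the hash function.
The most widely used hash functions today are <strong>SHA-1</strong> and <strong>MD5</strong>, mostly 128 bits or longer.
Hash functions are used widely in everyday life. Many download sites publish the MD5 checksum of each file, which can be used to check whether a download is complete. Also, WordPress, for example, stores only the MD5 hash of each password in its database, so that even the database administrator cannot learn a user's original password, which avoids privacy leaks (many people use the same password in different places).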
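Checking a download against a published MD5 checksum takes only the standard `hashlib` module. The sketch below is an illustration, not any particular site's tool; `md5_of_chunks`, `md5_of_file` and `verify_download` are hypothetical names. It hashes the file in chunks, since MD5 can be computed incrementally without loading the whole file into memory.

```python
import hashlib
from typing import Iterable

def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex MD5 digest of data supplied piece by piece."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)  # MD5 is computed incrementally over the pieces
    return h.hexdigest()

def md5_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    with open(path, "rb") as f:
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))

def verify_download(path: str, published_md5: str) -> bool:
    """True if the file's MD5 matches the checksum published by the site."""
    return md5_of_file(path) == published_md5.strip().lower()
```

For instance, `md5_of_chunks([b"a", b"bc"])` gives the same digest as hashing `b"abc"` in one piece.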
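If two input strings have the same hash value, they are called a <strong>collision</strong>. Since a hash function maps strings of arbitrary length to strings of fixed length, some output must correspond to infinitely many input strings, so collisions necessarily exist.
A "good" hash function f should satisfy the following three conditions:
Here "very hard" means there is no method faster than brute-force enumeration. For condition 3, for example, the birthday theorem says that finding such x1, x2 takes roughly 2^(n/2) attempts in theory.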
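The 2^(n/2) figure can be observed on a toy scale. The Python sketch below is an illustration only: it treats the top `bits` bits of MD5 as a small n-bit hash (`truncated_md5` and `find_collision` are hypothetical helpers) and runs the generic birthday search, hashing distinct messages until an output repeats. For 16 bits this typically takes a few hundred attempts, on the order of 2^8 rather than 2^16; the same generic search against full 128-bit MD5 would need on the order of 2^64 attempts.

```python
import hashlib

def truncated_md5(data: bytes, bits: int) -> int:
    """Toy n-bit hash: the top `bits` bits of MD5(data)."""
    digest = int.from_bytes(hashlib.md5(data).digest(), "big")
    return digest >> (128 - bits)

def find_collision(bits: int):
    """Generic birthday search: hash distinct inputs until an output repeats.
    Returns (first message, second message, number of hashes computed)."""
    seen = {}
    tries = 0
    while True:
        msg = str(tries).encode()
        tries += 1
        h = truncated_md5(msg, bits)
        if h in seen:
            return seen[h], msg, tries
        seen[h] = msg
```

By the pigeonhole principle the loop must stop within 2^bits + 1 attempts, but the birthday effect makes it stop far sooner.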
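Almost every "break" of a hash function means violating the third property above, that is, finding a collision (a hash function whose first two properties could also be broken would be far too weak, and would have been abandoned long ago). Cryptography also has the notion of a theoretical break: an algorithm that finds a collision in fewer attempts than the theoretical value.
Wang Xiaoyun's main results were collisions for MD5 and SHA-0 and a theoretical break of SHA-1: she showed that a collision for 160-bit SHA-1 can be found with only about 2^69 computations, whereas the theoretical value is 2^80. Her method for finding MD5 collisions is extremely efficient. The story goes that when she wrote a collision out at the conference, people in the audience checked it and found it was wrong: she had gotten one step of the MD5 algorithm wrong. But she immediately contacted her students, who were in China at the time, corrected the algorithm and found a new collision, and this one was correct.
At this point, those who think China's state security bureau should have kept these results secret as a weapon, or who even fantasize about using them to attack the United States, can stop fantasizing. Breaks of this formal kind have no practical effect in most cases, not to mention that the Americans abandoned MD5 long ago.
Still, claiming that such breaks have no practical significance at all would insult the intelligence of cryptographers, who would not have invented the notion of a collision for nothing. Below is a brief example of how, in particular situations, a given collision can be used to do harm (translated from Attacking Hash Functions):
Caesar wrote a letter of recommendation (letter) for his intern Alice. The same day, Alice asked Caesar to digitally sign the letter and gave him an electronic copy of it. Caesar opened the file and found it identical to the original, so he signed it.
A few months later, Caesar discovered that his secret files had been viewed without authorization. How did this happen?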
a25f7f0b 29ee0b39 68c86073 8533a4b9
事实上,Alice要求Caesar{的文?a target="_blank">letter已经被Alice做了手脚Q准地_(d)Aliceq准备了另外一个文?a target="_blank">orderQ它们的MD5码完全一致。而Caesar的数字签名还依赖于MD5法Q所以Alice用order文g替换Letter文g之后QCaesar的数字签名依然有效。那orderlAlice提供了察看秘密文件的权限?/p>
具体的实现方法可?a target="_blank">Hash Functions and the Blind Passenger Attack。我在这里简单的解释一?只是大致思\Q具体实现方式,需要对文gl构信息有所了解)Q?/p>
letter文g的内Ҏ(gu)Q?/p>
if(x1==x1) show "letter" else show "order"
The content of the order file is:
if(x2==x1) show "letter" else show "order"
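Here the strings "letter" and "order" stand for the content the two documents actually display, and x1, x2 form an MD5 collision. Because MD5 processes its input block by block, two files that differ only in the colliding blocks x1 and x2 at the same position, and are identical everywhere else, also have equal MD5 hashes; yet each file shows different content.
The method above is offered for reference and academic purposes only; the author accepts no responsibility for any consequences of actually using it.
References:
PS: I have had little contact with Professor Wang Xiaoyun, having attended her seminar only twice, but even so I could sense her rigor and patience. At a talk by a Turing Award winner, Wang Xiaoyun blurted out the Chinglish "I ask who" when asking a question. It drew laughter, but I greatly admired her courage. Perhaps only people like her can do truly great work.
PS2: Wikipedia can be accessed from within China through free_door.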
http://zhiqiang.org/blog/446.html
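See also: Wang Xiaoyun
The computation above is iterated until the difference between σ^n and σ^(n-1) is less than a threshold; if the computation does not converge, we stop after a fixed number of iterations. The third panel of Figure 3 above shows the result after 5 iterations. Table 3 lists several computation methods; the later experiments show that C works better. A is called sparce, B is called excepted, and C is called verbose.
Filtering
The result of the iteration is a kind of [multimapping], which may contain a useful subset of matches.
Three steps:
1. Filter using application-defined [constraints].
2. Filter using matching-context techniques on the bipartite graph.
3. Compare the effectiveness of the techniques (their ability to meet the user's needs).
Constraints: there are two main kinds. One is [type] constraints, for example only considering [column] matches (both sides of the match are columns). The other is cardinality constraints, i.e. every element of schema S1 must have a mapping in S2.
The stable marriage problem: n women and n men are paired so that there are no two couples (x, y) and (x0, y0) in which x prefers y0 to y and y0 prefers x to x0. The total satisfaction of a matching that is a stable marriage may be lower than that of a matching that is not!
Evaluating match quality
The basic idea of the evaluation: the fewer changes the user has to make to the matching result, the higher the match quality (changes include removing wrong pairs and adding correct ones).
n is the number of matches found, m is the number of ideal matches, and c is the number of found matches the user accepts as correct.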
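The stability condition above translates directly into a check for a "blocking pair". The Python sketch below is an illustration that only tests a given matching against the definition (it does not construct one); `is_stable` and the dictionary layout for preference lists are assumptions made for the example.

```python
def is_stable(matching, men_pref, women_pref):
    """matching maps each man to his partner; men_pref / women_pref map each
    person to a preference list, most preferred first. Returns False if some
    man x and woman y0 who are not partners would both rather be together."""
    husband = {w: m for m, w in matching.items()}
    rank = {w: {m: i for i, m in enumerate(prefs)} for w, prefs in women_pref.items()}
    for x, y in matching.items():
        for y0 in men_pref[x]:
            if y0 == y:
                break                    # every later woman is liked less than y
            x0 = husband[y0]
            if rank[y0][x] < rank[y0][x0]:
                return False             # x prefers y0 to y, and y0 prefers x to x0
    return True
```

With two men who both prefer woman a, and two women who both prefer man A, pairing A with a is stable, while pairing A with b is not, because A and a would both rather be together.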
from: http://www.cnblogs.com/anf/archive/2006/08/15/477700.html
For example, we may have the results of measurements taken by experts on some widgets. For each widget we know the value of each measurement and what was decided: to pass, scrap, or repair it. That is, each widget gives a record whose non-categorical attributes are the measurements and whose categorical attribute is the disposition of the widget.
Here is a more detailed example. We are dealing with records reporting on weather conditions for playing golf. The categorical attribute specifies whether or not to Play. The non-categorical attributes are:
ATTRIBUTE   | POSSIBLE VALUES
============+=======================
outlook     | sunny, overcast, rain
------------+-----------------------
temperature | continuous
------------+-----------------------
humidity    | continuous
------------+-----------------------
windy       | true, false
============+=======================
and the training data is:
OUTLOOK  | TEMPERATURE | HUMIDITY | WINDY | PLAY
=====================================================
sunny    |      85     |    85    | false | Don't Play
sunny    |      80     |    90    | true  | Don't Play
overcast |      83     |    78    | false | Play
rain     |      70     |    96    | false | Play
rain     |      68     |    80    | false | Play
rain     |      65     |    70    | true  | Don't Play
overcast |      64     |    65    | true  | Play
sunny    |      72     |    95    | false | Don't Play
sunny    |      69     |    70    | false | Play
rain     |      75     |    80    | false | Play
sunny    |      75     |    70    | true  | Play
overcast |      72     |    90    | true  | Play
overcast |      81     |    75    | false | Play
rain     |      71     |    80    | true  | Don't Play
Notice that in this example two of the attributes have continuous ranges, Temperature and Humidity. ID3 does not directly deal with such cases, though below we examine how it can be extended to do so. A decision tree is important not because it summarizes what we know, i.e. the training set, but because we hope it will correctly classify new cases. Thus when building classification models one should have both training data to build the model and test data to verify how well it actually works.
A simpler example from the stock market involving only discrete ranges has Profit as categorical attribute, with values {up, down}. Its non categorical attributes are:
ATTRIBUTE   | POSSIBLE VALUES
============+=======================
age         | old, midlife, new
------------+-----------------------
competition | no, yes
------------+-----------------------
type        | software, hardware
------------+-----------------------
and the training data is:
 AGE    | COMPETITION | TYPE    | PROFIT
=========================================
 old    | yes         | swr     | down
--------+-------------+---------+--------
 old    | no          | swr     | down
--------+-------------+---------+--------
 old    | no          | hwr     | down
--------+-------------+---------+--------
 mid    | yes         | swr     | down
--------+-------------+---------+--------
 mid    | yes         | hwr     | down
--------+-------------+---------+--------
 mid    | no          | hwr     | up
--------+-------------+---------+--------
 mid    | no          | swr     | up
--------+-------------+---------+--------
 new    | yes         | swr     | up
--------+-------------+---------+--------
 new    | no          | hwr     | up
--------+-------------+---------+--------
 new    | no          | swr     | up
--------+-------------+---------+--------
For a more complex example, here are files that provide records for a series of votes in Congress. The first file describes the structure of the records. The second file provides the Training Set, and the third the Test Set.
The basic ideas behind ID3 are that:
Definitions
If there are n equally probable possible messages, then the probability p of each is 1/n and the information conveyed by a message is -log(p) = log(n). [In what follows all logarithms are in base 2.] That is, if there are 16 messages, then log(16) = 4 and we need 4 bits to identify each message.
In general, if we are given a probability distribution P = (p1, p2, .., pn) then the Information conveyed by this distribution, also called the Entropy of P, is:
I(P) = -(p1*log(p1) + p2*log(p2) + .. + pn*log(pn))
For example, if P is (0.5, 0.5) then I(P) is 1, if P is (0.67, 0.33) then I(P) is 0.92, if P is (1, 0) then I(P) is 0. [Note that the more uniform the probability distribution, the greater its information.]
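The entropy formula can be checked numerically with a short Python sketch (an illustration; `info` is a hypothetical name). Logarithms are base 2, as stated above, and terms with p = 0 are skipped because p*log(p) tends to 0.

```python
import math

def info(probs):
    """I(P) = -sum(p * log2(p)) over the distribution P; zero terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

For example, `info([0.5, 0.5])` is 1.0, `info([1, 0])` is 0, and sixteen equally likely messages give 4 bits.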
If a set T of records is partitioned into disjoint exhaustive classes C1, C2, .., Ck on the basis of the value of the categorical attribute, then the information needed to identify the class of an element of T is Info(T) = I(P), where P is the probability distribution of the partition (C1, C2, .., Ck):
P = (|C1|/|T|, |C2|/|T|, ..., |Ck|/|T|)
In our golfing example, we have Info(T) = I(9/14, 5/14) = 0.94,
and in our stock market example we have Info(T) = I(5/10,5/10) = 1.0.
If we first partition T on the basis of the value of a non-categorical attribute X into sets T1, T2, .., Tn then the information needed to identify the class of an element of T becomes the weighted average of the information needed to identify the class of an element of Ti, i.e. the weighted average of Info(Ti):
Info(X,T) = Sum for i from 1 to n of (|Ti|/|T|) * Info(Ti)
In the case of our golfing example, for the attribute Outlook we have
Info(Outlook,T) = 5/14*I(2/5,3/5) + 4/14*I(4/4,0) + 5/14*I(3/5,2/5) = 0.694
Consider the quantity Gain(X,T) defined as
Gain(X,T) = Info(T) - Info(X,T)
This represents the difference between the information needed to identify an element of T and the information needed to identify an element of T after the value of attribute X has been obtained, that is, this is the gain in information due to attribute X.
In our golfing example, for the Outlook attribute the gain is:
Gain(Outlook,T) = Info(T) - Info(Outlook,T) = 0.94 - 0.694 = 0.246.
If we instead consider the attribute Windy, we find that Info(Windy,T) is 0.892 and Gain(Windy,T) is 0.048. Thus Outlook offers a greater informational gain than Windy.
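The Outlook and Windy figures can be reproduced with a short Python sketch over the Outlook, Windy and Play columns of the 14 golf records (an illustration; `info` and `gain` are hypothetical names, and "N" abbreviates Don't Play).

```python
import math
from collections import Counter

# Outlook, Windy and Play columns of the 14 golf training records, in table order.
OUTLOOK = ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
           "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"]
WINDY = [False, True, False, False, False, True, True,
         False, False, False, True, True, False, True]
PLAY = ["N", "N", "P", "P", "P", "N", "P", "N", "P", "P", "P", "P", "P", "N"]

def info(values):
    """Info of the class distribution found in a list of labels (base-2 entropy)."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def gain(attr, labels):
    """Gain(X,T) = Info(T) - Info(X,T), where Info(X,T) is the weighted average
    of Info(Ti) over the subsets Ti that share one value of X."""
    n = len(labels)
    groups = {}
    for a, y in zip(attr, labels):
        groups.setdefault(a, []).append(y)
    return info(labels) - sum(len(g) / n * info(g) for g in groups.values())
```

`gain(OUTLOOK, PLAY)` comes out at about 0.247 and `gain(WINDY, PLAY)` at about 0.048, in line with the figures above.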
We can use this notion of gain to rank attributes and to build decision trees where at each node is located the attribute with greatest gain among the attributes not yet considered in the path from the root.
The intent of this ordering is twofold:
The ID3 Algorithm
The ID3 algorithm is used to build a decision tree, given a set of non-categorical attributes C1, C2, .., Cn, the categorical attribute C, and a training set T of records.
function ID3 (R: a set of non-categorical attributes,
              C: the categorical attribute,
              S: a training set) returns a decision tree;
begin
   If S is empty, return a single node with value Failure;
   If S consists of records all with the same value for
      the categorical attribute,
      return a single node with that value;
   If R is empty, then return a single node with as value
      the most frequent of the values of the categorical attribute
      that are found in records of S; [note that then there
      will be errors, that is, records that will be improperly
      classified];
   Let D be the attribute with largest Gain(D,S)
      among attributes in R;
   Let {dj| j=1,2, .., m} be the values of attribute D;
   Let {Sj| j=1,2, .., m} be the subsets of S consisting
      respectively of records with value dj for attribute D;
   Return a tree with root labeled D and arcs labeled
      d1, d2, .., dm going respectively to the trees
        ID3(R-{D}, C, S1), ID3(R-{D}, C, S2), .., ID3(R-{D}, C, Sm);
end ID3;
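The pseudocode can be turned into a compact Python sketch and run on the stock-market example. This is an illustration, not the c4.5 software: records are dictionaries, a tree is either a leaf label or a nested `{attribute: {value: subtree}}` dictionary, ties in gain go to the attribute listed first, and `entropy`, `gain`, `id3` and `STOCK` are hypothetical names.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, attr, target):
    """Gain(D,S): entropy of S minus the weighted entropy after splitting on attr."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder

def id3(attrs, target, rows):
    """Return a leaf label or a dict {attr: {value: subtree}}."""
    if not rows:
        return "Failure"
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]                              # all records agree
    if not attrs:
        return Counter(labels).most_common(1)[0][0]   # most frequent class
    best = max(attrs, key=lambda a: gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3(rest, target, [r for r in rows if r[best] == v])
                   for v in sorted({r[best] for r in rows})}}

STOCK = [dict(zip(("age", "competition", "type", "profit"), r)) for r in [
    ("old", "yes", "swr", "down"), ("old", "no", "swr", "down"),
    ("old", "no", "hwr", "down"), ("mid", "yes", "swr", "down"),
    ("mid", "yes", "hwr", "down"), ("mid", "no", "hwr", "up"),
    ("mid", "no", "swr", "up"), ("new", "yes", "swr", "up"),
    ("new", "no", "hwr", "up"), ("new", "no", "swr", "up")]]
```

Calling `id3(["age", "competition", "type"], "profit", STOCK)` should produce the same shape as the stock-market tree below: Age at the root, with Competition deciding the mid branch.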
In the Golfing example we obtain the following decision tree:
           Outlook
          /   |   \
         /    |    \
overcast/     |sunny\rain
       /      |      \
    Play   Humidity   Windy
            /   |      |   \
           /    |      |    \
      <=75/  >75|  true|     \false
         /      |      |      \
      Play  Don'tPlay Don'tPlay Play
In the stock market case the decision tree is:
           Age
          / | \
         /  |  \
     new/   |mid\old
       /    |    \
     Up  Competition  Down
            /   \
           /     \
        no/       \yes
         /         \
       Up          Down
Here is the decision tree, just as produced by c4.5, for the voting example introduced earlier.
In building a decision tree we can deal with training sets that have records with unknown attribute values by evaluating the gain, or the gain ratio, for an attribute by considering only the records where that attribute is defined.
In using a decision tree, we can classify records that have unknown attribute values by estimating the probability of the various possible results. In our golfing example, if we are given a new record for which the outlook is sunny and the humidity is unknown, we proceed as follows:
We can deal with the case of attributes with continuous ranges as follows. Say that attribute Ci has a continuous range. We examine the values for this attribute in the training set. Say they are, in increasing order, A1, A2, .., Am. Then for each value Aj, j=1,2,..m, we partition the records into those that have Ci values up to and including Aj, and those that have values greater than Aj. For each of these partitions we compute the gain, or gain ratio, and choose the partition that maximizes the gain. Pruning of the decision tree is done by replacing a whole subtree by a leaf node. The replacement takes place if a decision rule establishes that the expected error rate in the subtree is greater than in the single leaf. For example, if the simple decision tree
is obtained with one training red success record and two training blue Failures, and then in the Test set we find three red failures and one blue success, we might consider replacing this subtree by a single Failure node. After replacement there will be only two errors in all (the red training success and the blue test success), whereas the subtree misclassifies all four test records.
Winston shows how to use Fisher's exact test to determine if the category attribute is truly dependent on a non-categorical attribute. If it is not, then the non-categorical attribute need not appear in the current path of the decision tree.
Quinlan and Breiman suggest more sophisticated pruning heuristics.
It is easy to derive a rule set from a decision tree: write a rule for each path in the decision tree from the root to a leaf. In that rule the left-hand side is easily built from the label of the nodes and the labels of the arcs.
The resulting rules set can be simplified:
Let LHS be the left hand side of a rule. Let LHS' be obtained from LHS by eliminating some of its conditions. We can certainly replace LHS by LHS' in this rule if the subsets of the training set that satisfy respectively LHS and LHS' are equal.
A rule may be eliminated by using metaconditions such as "if no other rule applies".
Using Gain Ratios
The notion of Gain introduced earlier tends to favor attributes that have a large number of values. For example, if we have an attribute D that has a distinct value for each record, then Info(D,T) is 0, thus Gain(D,T) is maximal. To compensate for this Quinlan suggests using the following ratio instead of Gain:
GainRatio(D,T) = Gain(D,T) / SplitInfo(D,T)
where SplitInfo(D,T) is the information due to the split of T on the basis of the value of the (non-categorical) attribute D. Thus SplitInfo(D,T) is
I(|T1|/|T|, |T2|/|T|, .., |Tm|/|T|)
where {T1, T2, .. Tm} is the partition of T induced by the value of D.
In the case of our golfing example SplitInfo(Outlook,T) is
-5/14*log(5/14) - 4/14*log(4/14) - 5/14*log(5/14) = 1.577
thus the GainRatio of Outlook is 0.246/1.577 = 0.156. And
SplitInfo(Windy,T) is
-6/14*log(6/14) - 8/14*log(8/14) = 6/14*1.222 + 8/14*0.807
= 0.985
thus the GainRatio of Windy is 0.048/0.985 = 0.049
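The split-information and gain-ratio numbers can be checked with a short Python sketch over the same Outlook, Windy and Play columns (an illustration; `info`, `gain`, `split_info` and `gain_ratio` are hypothetical names). SplitInfo is simply the entropy of the attribute's own values, ignoring the class.

```python
import math
from collections import Counter

OUTLOOK = ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
           "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"]
WINDY = [False, True, False, False, False, True, True,
         False, False, False, True, True, False, True]
PLAY = ["N", "N", "P", "P", "P", "N", "P", "N", "P", "P", "P", "P", "P", "N"]

def info(values):
    """Base-2 entropy of the empirical distribution of a list of values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def gain(attr, labels):
    n = len(labels)
    groups = {}
    for a, y in zip(attr, labels):
        groups.setdefault(a, []).append(y)
    return info(labels) - sum(len(g) / n * info(g) for g in groups.values())

def split_info(attr):
    """I(|T1|/|T|, ..., |Tm|/|T|): entropy of how the attribute partitions T."""
    return info(attr)

def gain_ratio(attr, labels):
    return gain(attr, labels) / split_info(attr)
```

An attribute with a distinct value for every record would have a large SplitInfo, which is exactly what pulls its ratio back down.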
You can run PAIL to see how ID3 generates the decision tree [you need to have an X-server and to allow access (xhost) from yoda.cis.temple.edu].
C4.5 Extensions
C4.5 introduces a number of extensions of the original ID3 algorithm.
We move from the Outlook root node to the Humidity node following
the arc labeled 'sunny'. At that point since we do not know
the value of Humidity we observe that if the humidity is at most 75
there are two records where one plays, and if the humidity is over
75 there are three records where one does not play. Thus one
can give as answer for the record the probabilities
(0.4, 0.6) to play or not to play.
In our Golfing example, for humidity, if T is the training set, we determine the information for each partition and find the best partition at 75. Then the range for this attribute becomes {<=75, >75}. Notice that this method involves a substantial number of computations.
Pruning Decision Trees and Deriving Rule Sets
The decision tree built using the training set, because of the way it was built, deals correctly with most of the records in the training set. In fact, in order to do so, it may become quite complex, with long and very uneven paths.
      Color
      /   \
  red/     \blue
    /       \
Success   Failure
You can run the C45 program here [you need to have an X-server and to allow access (xhost) from yoda.cis.temple.edu].
Classification Models in the Undergraduate AI Course
It is easy to find implementations of ID3. For example, a Prolog program by Shoham and a nice PAIL module.
The software for C4.5 can be obtained with Quinlan's book. A wide variety of training and test data is available, some provided by Quinlan, some at specialized sites such as the University of California at Irvine.
Student projects may involve the implementation of these algorithms. More interesting is for students to collect or find a significant data set, partition it into training and test sets, determine a decision tree, simplify it, determine the corresponding rule set, and simplify the rule set.
The study of methods to evaluate the error performance of a decision tree is probably too advanced for most undergraduate courses.
Breiman, Friedman, Olshen, Stone: Classification and Regression Trees, Wadsworth, 1984.
   A decision science perspective on decision trees.
Quinlan, J.R.: C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
   Quinlan is a very readable, thorough book, with actual usable programs that are available on the internet. Also available are a number of interesting data sets.
Quinlan, J.R.: Simplifying decision trees, International Journal of Man-Machine Studies, 27, 221-234, 1987.
Winston, P.H.: Artificial Intelligence, Third Edition, Addison-Wesley, 1992.
   Excellent introduction to ID3 and its use in building decision trees and, from them, rule sets.
ingargiola@cis.temple.edu
from: http://www.cis.temple.edu/~ingargio/cis587/readings/id3-c45.html
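There are m (m ≥ 3) balls, labeled q1, q2, ..., qm. Exactly one of them is bad, and its weight differs from that of the others. Using a balance without weights, and letting n be the number of weighings: what is the smallest n that guarantees finding the bad ball and telling whether it is lighter or heavier than a good ball?
First, let us see how many weighings theory requires. Each weighing has 3 possible outcomes: the left pan is lighter, the pans balance, or the right pan is lighter. The bad ball, on the other hand, has 2m possibilities: q1 light, q1 heavy, q2 light, q2 heavy, ..., qm light, qm heavy. By Shannon's information theory, the entropy of the problem (measured in base 3) is the number of weighings required, and since n is an integer, $n \ge \lceil \log_3 (2m) \rceil$.
But theory is only theory, and it often fails when applied directly to real life. Take a very simple case: with 4 balls, the formula says 2 weighings are enough. Yet try as you may (I certainly could not), you will not find a way to do it in two weighings.
So is the theory wrong? Well, I would not dare doubt Shannon; I only dare doubt myself. Let us see where we went wrong. With 4 balls, the first weighing has only two possible plans. Plan 1: q1 on the left pan, q2 on the right. If the pans do not balance (by symmetry we only analyze the case where the left side is lighter; the same goes below), the remaining possibilities are q1 light and q2 heavy, and one more weighing finds the bad ball. If they balance, the remaining possibilities are q3 light, q3 heavy, q4 light and q4 heavy, four in all, and applying Shannon's theorem again, we would still need 2 more weighings. So plan 1 is rejected. Plan 2: q1 and q2 on the left pan, q3 and q4 on the right. Now the pans can never balance, and after weighing the possibilities are q1 light, q2 light, q3 heavy and q4 heavy, again four. By the same reasoning, plan 2 is rejected too.
Getting stuck on a case as simple as 4 balls is hard to accept, so let us sum up the lesson. Generalizing the analysis above: for the weighing process to succeed, the number h of possible outcomes remaining before the k-th-to-last weighing must satisfy $h \le 3^k$.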
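This conclusion does not hand us the answer, but it helps us design each weighing, especially the first. Suppose we plan for n weighings and put x balls on each pan in the first weighing. Then it is a necessary condition for solving the problem that both of the following inequalities hold:
$2(m-2x) \le 3^{n-1}$ (if the pans balance: the bad ball is one of the m - 2x balls off the scale, either light or heavy)
$2x \le 3^{n-1}$ (if they do not balance: the bad ball is one of the x balls on the lighter pan and light, or one of the x balls on the heavier pan and heavy)
Rearranging these two inequalities slightly gives: $\frac{2m - 3^{n-1}}{4} \le x \le \frac{3^{n-1}}{2}$
Since x is an integer, $3^{n-1}$ is odd and 2m is even, this is equivalent to: $\frac{2m - 3^{n-1} + 1}{4} \le x \le \frac{3^{n-1} - 1}{2}$
Clearly, for fixed n, the larger m is, the narrower the range of x; once x can only take the value $\frac{3^{n-1} - 1}{2}$, increasing m any further makes it impossible to find the bad ball in n weighings. This gives the range of m for a given n: $m \le \frac{3^n - 3}{2}$. Notice anything? The maximum of m is now exactly 1 less than our initial result (which, since $3^n$ is odd, allowed up to $\frac{3^n - 1}{2}$). This also agrees with the 4-ball case above.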
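For small cases the bound can be checked by brute force. The Python sketch below is an illustration, not the author's proof: it treats the state of knowledge as a set of hypotheses (ball, +1 heavy / -1 light), tries every weighing with equally many balls on each pan, and asks whether every outcome can still be resolved in the remaining weighings. `outcome`, `solvable` and `can_solve` are hypothetical names, and the search is exponential, so it is only practical for a handful of balls.

```python
from itertools import combinations

def outcome(hyp, left, right):
    """Result of one weighing if hyp = (ball, +1 heavy / -1 light) is true:
    +1 left pan heavier, -1 right pan heavier, 0 balanced."""
    ball, s = hyp
    if ball in left:
        return s
    if ball in right:
        return -s
    return 0

def solvable(hyps, balls, n):
    """Can n more weighings always pin down the bad ball AND its direction?"""
    if len(hyps) <= 1:
        return True
    if n == 0 or len(hyps) > 3 ** n:   # the information bound h <= 3^k
        return False
    for k in range(1, len(balls) // 2 + 1):
        for left in combinations(balls, k):
            rest = [b for b in balls if b not in left]
            for right in combinations(rest, k):
                parts = {}
                for h in hyps:
                    parts.setdefault(outcome(h, left, right), []).append(h)
                if all(solvable(p, balls, n - 1) for p in parts.values()):
                    return True
    return False

def can_solve(m, n):
    balls = list(range(m))
    return solvable([(b, s) for b in balls for s in (+1, -1)], balls, n)
```

Consistent with the discussion above, `can_solve(3, 2)` is True while `can_solve(4, 2)` is False, even though 2·4 = 8 ≤ 3^2.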
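But after all this analysis, we have only proved that when m is outside this range, n weighings cannot guarantee finding the bad ball. When m is inside the range, can we always find it? The answer is yes, but proving it directly is a bit difficult, so let us first look at two simpler propositions.
Proposition 1: There are two groups of balls, A and B, containing a and b balls respectively, with $0 \le b - a \le 1$. Exactly one of these balls is bad: if it is in group A it is lighter than a normal ball, and if it is in group B it is heavier. There is also one ball known to be good. Using a balance without weights, if $a + b \le 3^n$, then there is a weighing plan that finds the bad ball in at most n weighings (at which point we certainly also know whether it is lighter or heavier than a good ball).
The proof is by mathematical induction:
① When n = 1, the possible values of (a, b) are (0, 1), (1, 1) and (1, 2); since there is also a known good ball, it is easy to check that the proposition holds.
② Assume the proposition holds for n = k.
③ Let n = k + 1. Split groups A and B each as evenly as possible into three subgroups, called A1, A2, A3 and B1, B2, B3. Without loss of generality, assume the subgroups within each group are numbered from fewest to most balls, and that A1 has a1 balls. The first weighing for each possible combination of subgroup sizes is given below. Note that in the two cases highlighted in blue, <img alt="" hspace="0" src="http://blog.vckbase.com/images/vckbase_com/localvar/701/o_ball-08.gif" border="0" /> must hold; otherwise the premise of the proposition would be violated.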
A1 | A2 | A3 | B1 | B2 | B3 | First weighing
a1 | a1 | a1 | a1 | a1 | a1 | A1, B1 on the left pan; A2, B2 on the right pan
a1 | a1 | a1 | a1 | a1 | a1+1 | A1, B1 on the left pan; A2, B2 on the right pan
a1 | a1 | a1+1 | a1 | a1 | a1+1 | A1, B3 on the left pan; A3, B1 on the right pan
a1 | a1 | a1+1 | a1 | a1+1 | a1+1 | A1, B2 on the left pan; A2, B3 on the right pan
a1 | a1+1 | a1+1 | a1 | a1+1 | a1+1 | A2, B2 on the left pan; A3, B3 on the right pan
a1 | a1+1 | a1+1 | a1+1 | a1+1 | a1+1 | A2, B2 on the left pan; A3, B3 on the right pan
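Clearly, whatever the outcome, after the first weighing the problem reduces to the case n = k. So Proposition 1 is true.
We showed earlier that when $m = \frac{3^n - 1}{2}$, n weighings cannot guarantee finding the bad ball and telling whether it is lighter or heavier. But if we also have one ball known to be good, the answer changes: n weighings are then enough (Proposition 2). Again we use induction.
① When n = 2, m = 4; checking directly shows the proposition holds.
② Assume the proposition holds for n = k.
③ Let n = k + 1. Split the balls as evenly as possible into three groups, containing $\frac{3^k - 1}{2}$, $\frac{3^k - 1}{2}$ and $\frac{3^k + 1}{2}$ balls. In the first weighing, put the first group together with the good ball on the left pan and the third group on the right pan. If the pans balance, the problem reduces to the case n = k; if not, it reduces to Proposition 1 (with $a + b = 3^k$). Either way the proposition holds.
With these two results in hand, the original problem becomes easy; once more we use mathematical induction. Because the cases with m < 5 are somewhat special (consider having only one or two balls), they cannot serve as the base of the induction, so we start from n = 3, that is, m = 5.
① When n = 3, m is between 5 and 12 (the case m = 13 has already been ruled out), and checking each case shows the proposition holds.
② Assume the proposition holds for n = k.
③ Let n = k + 1. Find an x that satisfies the inequality above and put x balls on each pan. If the pans balance, the problem reduces to the case n = k or to Proposition 2; if not, it reduces to Proposition 1. The proposition holds.
To sum up, the complete answer to the ball-weighing problem is: when the number of balls satisfies $3 \le m \le \frac{3^n - 3}{2}$, n weighings are enough to find the bad ball and tell whether it is lighter or heavier than a good ball; when $m = \frac{3^n - 1}{2}$, n weighings can only guarantee finding the bad ball, not telling whether it is lighter or heavier. Finding that out may take one more weighing, but if one extra good ball is available, that weighing can again be saved.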
from: http://blog.vckbase.com/localvar/archive/2005/07/17/9717.aspx
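I. The Concept of Semaphores
1. Type definition of a semaphore
Each semaphore must record at least two pieces of information: its value and the queue of processes waiting on it. Its type is defined as follows (in Pascal-like notation):
semaphore = record
    value: integer;
    queue: ^PCB;
end;
Here PCB is the process control block, the data structure the operating system creates for each process.
When s.value >= 0, s.queue is empty;
when s.value < 0, the absolute value of s.value is the number of processes waiting in s.queue.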
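2. The P and V primitives
Two primitive operations can be applied to a semaphore variable, the p operation and the v operation, defined as follows:
procedure p(var s:semaphore);
{
    s.value=s.value-1;
    if (s.value<0) asleep(s.queue);
}
procedure v(var s:semaphore);
{
    s.value=s.value+1;
    if (s.value<=0) wakeup(s.queue);
}
They use two standard procedures:
asleep(s.queue): puts the PCB of the process executing this operation at the tail of s.queue, and the process enters the waiting state.
wakeup(s.queue): wakes up the process at the head of s.queue and inserts it into the ready queue.
A semaphore whose s.value starts at 1 can be used to implement mutual exclusion between processes.
The p and v operations are program segments that cannot be interrupted, and are called primitives. If the semaphore is regarded as a shared variable, the p and v operations are its critical sections: several processes must not execute them at the same time, which is usually guaranteed by hardware. A semaphore can be given an initial value only once; after that, only p or v operations may be applied to it.
This also shows that the semaphore mechanism requires shared memory and cannot be used in a distributed operating system, which is its biggest weakness.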
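The record {value, queue} and the p/v definitions can be mirrored almost line for line in Python. This is a sketch under stated assumptions, not an operating-system implementation: threads stand in for processes, one `threading.Event` per blocked thread stands in for its PCB in the queue, and a `threading.Lock` stands in for the hardware guarantee that p and v are indivisible. `PVSemaphore` and `demo_mutex` are hypothetical names.

```python
import threading
from collections import deque

class PVSemaphore:
    """Semaphore kept as the record {value, queue} from the text.
    value >= 0: nobody waits; value < 0: -value threads wait in queue."""

    def __init__(self, value: int = 1):
        self.value = value
        self.queue = deque()           # one Event per blocked thread (its "PCB")
        self._lock = threading.Lock()  # makes p and v indivisible

    def p(self):
        with self._lock:
            self.value -= 1
            if self.value >= 0:
                return
            waiter = threading.Event()
            self.queue.append(waiter)  # asleep(s.queue): join the tail of the queue
        waiter.wait()                  # block until some v() wakes us

    def v(self):
        with self._lock:
            self.value += 1
            if self.value <= 0:
                self.queue.popleft().set()  # wakeup(s.queue): release the head

def demo_mutex(threads: int = 4, rounds: int = 200):
    """Initial value 1 gives mutual exclusion; returns observed statistics."""
    sem = PVSemaphore(1)
    stats = {"count": 0, "inside": 0, "max_inside": 0}

    def worker():
        for _ in range(rounds):
            sem.p()                    # enter the critical section
            stats["inside"] += 1
            stats["max_inside"] = max(stats["max_inside"], stats["inside"])
            stats["count"] += 1
            stats["inside"] -= 1
            sem.v()                    # leave the critical section

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return stats["count"], stats["max_inside"], sem.value, len(sem.queue)
```

`demo_mutex()` should report 800 completed entries, never more than one thread inside at a time, and the semaphore back at value 1 with an empty queue.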
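II. Examples
1. The producer-consumer problem (with a buffer)
Problem description:
A warehouse can hold K items. Each time a producer makes an item, it puts it into the warehouse; when the warehouse is full, production stops. Each time a consumer takes an item out of the warehouse and consumes it; when the warehouse is empty, consumption stops.
Solution:
Processes: Producer, the producer process; Consumer, the consumer process.
Shared data structures:
buffer: array [0..k-1] of integer;
in, out: 0..k-1;          // in: the first empty buffer slot; out: the first full slot
s1, s2, mutex: semaphore; // s1: buffer not full; s2: buffer not empty; mutex: protects the critical section
initialization: s1=k, s2=0, mutex=1
producer (producer process):
Item_Type item;
{
    while (true)
    {
        produce(&item);
        p(s1);
        p(mutex);
        buffer[in]:=item;
        in:=(in+1) mod k;
        v(mutex);
        v(s2);
    }
}
consumer (consumer process):
Item_Type item;
{
    while (true)
    {
        p(s2);
        p(mutex);
        item:=buffer[out];
        out:=(out+1) mod k;
        v(mutex);
        v(s1);
        consume(&item);
    }
}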
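The same three-semaphore scheme can be sketched with Python's `threading.Semaphore` (`acquire` plays the role of p, `release` of v). To keep the sketch terminating, it assumes one producer and one consumer that each handle a fixed number of items instead of looping forever; `run_producer_consumer` is a hypothetical name.

```python
import threading

def run_producer_consumer(k: int, n_items: int):
    """Bounded buffer of size k guarded by s1 (empty slots), s2 (full slots)
    and mutex, exactly as in the pseudocode; returns the consumed items."""
    buffer = [None] * k
    pos = {"in": 0, "out": 0}
    s1 = threading.Semaphore(k)      # counts empty slots, initially k
    s2 = threading.Semaphore(0)      # counts full slots, initially 0
    mutex = threading.Semaphore(1)   # protects buffer, in and out
    consumed = []

    def producer():
        for item in range(n_items):  # produce(&item)
            s1.acquire()             # p(s1): wait for an empty slot
            mutex.acquire()          # p(mutex)
            buffer[pos["in"]] = item
            pos["in"] = (pos["in"] + 1) % k
            mutex.release()          # v(mutex)
            s2.release()             # v(s2): one more full slot

    def consumer():
        for _ in range(n_items):
            s2.acquire()             # p(s2): wait for a full slot
            mutex.acquire()
            item = buffer[pos["out"]]
            pos["out"] = (pos["out"] + 1) % k
            mutex.release()
            s1.release()             # v(s1): one more empty slot
            consumed.append(item)    # consume(&item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```

With a single producer and a single consumer, the circular buffer delivers items in the order they were produced, even when the buffer is much smaller than the number of items.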
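2. The first readers-writers problem
Problem description:
Several readers and several writers read and write the same blackboard. Any number of readers may read the blackboard at the same time, but only one writer may use it at any moment, and readers and writers may not use it at the same time. Different rules about who has priority divide the readers-writers problem into several classes. The first class gives readers higher priority: a writer may use the blackboard only when there are no readers.
Solution:
Processes: writer, the writer process; reader, the reader process.
Shared data structures:
read_account: integer;     // number of active readers, initially 0
r_w, mutex: semaphore;     // r_w: controls who uses the blackboard; mutex: protects read_account; both initially 1
reader (reader process):
{
    while (true)
    {
        p(mutex);
        read_account++;
        if (read_account=1) p(r_w);   // the first reader locks writers out
        v(mutex);
        read();
        p(mutex);
        read_account--;
        if (read_account=0) v(r_w);   // the last reader lets writers in
        v(mutex);
    }
}
writer (writer process):
{
    while (true)
    {
        p(r_w);                       // a writer needs the blackboard to itself
        write();
        v(r_w);
    }
}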
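A Python sketch of the readers-preference scheme makes it easy to check that a writer never overlaps with another writer or with any reader. It is an illustration under stated assumptions: threads stand in for processes, each reader and writer performs a fixed number of rounds instead of looping forever, short sleeps stand in for read() and write(), and the `active` counters are instrumentation added only for the check. `run_readers_writers` is a hypothetical name.

```python
import threading
import time

def run_readers_writers(readers: int = 4, writers: int = 2, rounds: int = 10):
    """Returns (number of overlap violations observed, final read_count)."""
    read_count = 0
    mutex = threading.Semaphore(1)   # protects read_count
    r_w = threading.Semaphore(1)     # held by one writer, or by the readers as a group
    check = threading.Lock()         # instrumentation only
    active = {"readers": 0, "writers": 0, "violations": 0}

    def enter(kind):
        with check:
            active[kind] += 1
            if active["writers"] > 1 or (active["writers"] and active["readers"]):
                active["violations"] += 1

    def leave(kind):
        with check:
            active[kind] -= 1

    def reader():
        nonlocal read_count
        for _ in range(rounds):
            mutex.acquire()
            read_count += 1
            if read_count == 1:
                r_w.acquire()        # the first reader locks writers out
            mutex.release()
            enter("readers")
            time.sleep(0.001)        # read()
            leave("readers")
            mutex.acquire()
            read_count -= 1
            if read_count == 0:
                r_w.release()        # the last reader lets writers in
            mutex.release()

    def writer():
        for _ in range(rounds):
            r_w.acquire()            # p(r_w)
            enter("writers")
            time.sleep(0.001)        # write()
            leave("writers")
            r_w.release()            # v(r_w)

    pool = [threading.Thread(target=reader) for _ in range(readers)]
    pool += [threading.Thread(target=writer) for _ in range(writers)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return active["violations"], read_count
```

Note that `r_w` may be released by a different reader than the one that acquired it, which is why it must be a semaphore rather than an owned lock.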
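3. The dining philosophers problem
Problem: Five philosophers live in a room; their life consists of thinking and eating. In the room is a round table with a plate of macaroni in the middle (assume the supply is unlimited). Around the table are five chairs, one for each philosopher. There is one fork between each pair of neighboring philosophers, and a philosopher must use both the left and the right fork at the same time to eat.
Solution:
    Process: philosopher
    Shared data and procedures:
        state: array [0..4] of (think, hungry, eat);   (* initially all think *)
        ph: array [0..4] of semaphore;   (* one semaphore per philosopher, initially 0 *)
        mutex: semaphore;                (* protects the critical section, initially 1 *)
    procedure test(i: 0..4);
        {
            (* (i+4) mod 5 is the left neighbor, written so the index never goes negative *)
            if ((state[i] = hungry) and (state[(i+1) mod 5] <> eat) and (state[(i+4) mod 5] <> eat)) {
                state[i] = eat;
                v(ph[i]);
            }
        }
    philosopher(i: 0..4):
        {
            while (true) {
                think();
                p(mutex);
                state[i] = hungry;
                test(i);
                v(mutex);
                p(ph[i]);
                eat();
                p(mutex);
                state[i] = think;
                test((i+4) mod 5);
                test((i+1) mod 5);
                v(mutex);
            }
        }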
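A Python sketch of the state/test solution, with `threading.Semaphore` standing in for p/v. The left neighbor is computed as `(i + N - 1) % N`, which stays non-negative even in languages where `%` can return a negative number (Python's does not). `Table`, `run_demo`, the meal counter and the neighbor check are a hypothetical harness added for testing.

```python
import threading

N = 5
THINK, HUNGRY, EAT = "think", "hungry", "eat"


class Table:
    def __init__(self):
        self.state = [THINK] * N
        self.ph = [threading.Semaphore(0) for _ in range(N)]  # one per philosopher, initially 0
        self.mutex = threading.Semaphore(1)                    # protects state, initially 1
        self.meals = [0] * N                                   # harness only
        self.violations = 0                                    # harness only

    @staticmethod
    def left(i: int) -> int:
        return (i + N - 1) % N

    @staticmethod
    def right(i: int) -> int:
        return (i + 1) % N

    def test(self, i: int) -> None:
        """Grant both forks to i if i is hungry and no neighbor eats (caller holds mutex)."""
        if (self.state[i] == HUNGRY
                and self.state[self.left(i)] != EAT
                and self.state[self.right(i)] != EAT):
            self.state[i] = EAT
            self.ph[i].release()                               # v(ph[i])

    def philosopher(self, i: int, times: int) -> None:
        for _ in range(times):
            # think() would go here
            self.mutex.acquire()
            self.state[i] = HUNGRY
            self.test(i)
            self.mutex.release()
            self.ph[i].acquire()                               # p(ph[i]): wait for both forks

            # eat(): the harness checks, under mutex, that no neighbor is eating
            self.mutex.acquire()
            if EAT in (self.state[self.left(i)], self.state[self.right(i)]):
                self.violations += 1
            self.meals[i] += 1
            self.mutex.release()

            self.mutex.acquire()
            self.state[i] = THINK
            self.test(self.left(i))                            # maybe let the neighbors eat
            self.test(self.right(i))
            self.mutex.release()


def run_demo(times: int = 50):
    table = Table()
    threads = [threading.Thread(target=table.philosopher, args=(i, times)) for i in range(N)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.meals, table.violations
```

A philosopher never grabs forks directly: it only becomes `EAT` inside `test`, under `mutex`, which is why two neighbors can never eat at once and no circular wait on forks can form.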
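What computation is, and the types of computation
In the popular mind, computation means first of all the addition, subtraction, multiplication and division of numbers, and after that solving equations, differentiating and integrating functions, and so on; people who know a little more are aware that computation in essence also includes proving and deriving theorems. "Computation" is a mathematical notion everyone has heard of, yet few people can really say what its essence is. In fact, it was only in the 1930s, through the work of mathematicians such as Gödel (K. Gödel, 1906-1978), Church (A. Church, 1903-1995) and Turing (A. M. Turing, 1912-1954), that the essence of computation became clear, together with such fundamental questions as what is computable and what is not.</FONT>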
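Abstractly speaking, a computation transforms one symbol string f into another symbol string g. For example, transforming the string 12+3 into 15 is an addition. If f is the string $x^2$ and g is the string 2x, the computation from f to g is differentiation. Proving a theorem works the same way: let f denote a set of axioms and inference rules and let g be a theorem; then a sequence of transformations from f to g is a proof of g. From this point of view, translation is also computation: if f is an English sentence and g a Chinese sentence with the same meaning, going from f to g translates English into Chinese. What do these transformations have in common, and why are they all called computation? Each starts from a known symbol string, changes it step by step, and after a finite number of steps arrives at a string that satisfies a condition fixed in advance.</FONT>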
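To make "step by step, in finitely many steps" concrete, here is a small string-rewriting sketch in the style of a Markov algorithm (an illustrative choice, not the article's): ordered rules are applied at their leftmost match until a terminal rule fires or no rule applies. The rule set `ADD_RULES` is hypothetical and works on unary numerals, so `111+11=` (3 + 2) is rewritten into `11111` (5), with every intermediate string recorded.

```python
from typing import List, Tuple

Rule = Tuple[str, str, bool]  # (pattern, replacement, is_terminal)

# Hypothetical rule set for unary addition: "111+11=" stands for 3 + 2.
ADD_RULES: List[Rule] = [
    ("+1", "1+", False),  # move one stroke from the right operand to the left one
    ("+=", "", True),     # right operand used up: erase the markers and stop
]


def rewrite(s: str, rules: List[Rule], max_steps: int = 10_000) -> List[str]:
    """Apply the first rule that matches, at its leftmost match, one step at a time.

    Returns the whole trace, so each intermediate symbol string is recorded.
    """
    trace = [s]
    for _ in range(max_steps):
        for pattern, replacement, terminal in rules:
            pos = s.find(pattern)
            if pos != -1:
                s = s[:pos] + replacement + s[pos + len(pattern):]
                trace.append(s)
                if terminal:
                    return trace
                break
        else:
            return trace  # no rule applies: the computation halts
    raise RuntimeError("no result within max_steps")
```

`rewrite("111+11=", ADD_RULES)` yields `["111+11=", "1111+1=", "11111+=", "11111"]`: three purely mechanical symbol changes lead from the question to the answer.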
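By type, computation falls into two broad classes: numerical computation and symbolic derivation. Numerical computation covers the four arithmetic operations on real numbers and functions, powers, roots, solving equations and so on. Symbolic derivation covers proving identities and inequalities of algebraic and other functions, proving geometric propositions and so on. Whether numerical or symbolic, however, the two are essentially equivalent: they are closely related, can be converted into each other, and share the same computational essence. As mathematics develops, new types of computation may yet appear.</FONT>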
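The essence of computation and the Church-Turing thesis
To answer what computation is and what computability is, people took the approach of building models of computation. From the 1930s into the 1940s, mathematical logicians proposed four models in turn: general recursive functions, λ-computable functions, Turing machines, and the systems of Post (E. L. Post, 1897-1954). The four models approach the process of computation (or of proof) from entirely different angles and look very different on the surface, but they are in fact equivalent: they have exactly the same computational power. On this basis the now-famous Church-Turing thesis took shape: every computable function is a general recursive function (equivalently, a Turing-machine-computable function, and so on). This fixed the mathematical meaning of computation and computability. What follows is a brief introduction, mainly to general recursive functions.</FONT>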
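Gödel first introduced the notion of primitive recursive functions in 1931. A primitive recursive function is one obtained from the initial functions by finitely many applications of substitution and the primitive recursion scheme. The initial functions are the following three:</FONT>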
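(1) The zero function O(x) = 0 (its value is always zero);</FONT>
(2) The projection functions I_{n,i}(x1, x2, ..., xn) = xi (1 ≤ i ≤ n) (the value equals that of the i-th argument);</FONT>
(3) The successor function S(x) = x + 1 (its value is the immediate successor of x).</FONT>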
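Substitution and the primitive recursion scheme are operators for building new functions.</FONT>
Substitution (also called superposition or composition) is the simplest and most important operator. In its general form, it builds from an m-ary function f and m n-ary functions g1, g2, ..., gm the new function f(g1(x1, x2, ..., xn), g2(x1, x2, ..., xn), ..., gm(x1, x2, ..., xn)).</FONT>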
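The primitive recursion scheme has the general form
$$f(u_1,\dots,u_n,0)=g(u_1,\dots,u_n),\qquad f(u_1,\dots,u_n,x+1)=h\bigl(u_1,\dots,u_n,x,f(u_1,\dots,u_n,x)\bigr),$$
abbreviated (writing $u$ for $u_1,\dots,u_n$) as
$$f(u,0)=g(u),\qquad f(u,x+1)=h\bigl(u,x,f(u,x)\bigr).$$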
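Its characteristic is that the general value f(u, x) of the new function cannot be computed directly from the known functions g and h; one can only compute f(u, 0), f(u, 1), f(u, 2), ... in turn. But by computing in turn, any value f(u, x) can always be obtained. In other words, as long as g and h are defined and computable, the new function f is also defined and computable.</FONT>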
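The three initial functions, substitution and the primitive recursion scheme translate almost directly into Python. This is an illustrative sketch (the names `proj`, `compose`, `prim_rec`, `add` and `mul` are ours, not the article's); note that `prim_rec` computes f(u, 0), f(u, 1), ..., f(u, x) in turn, exactly as described above.

```python
from typing import Callable

Fn = Callable[..., int]


def zero(x: int) -> int:
    """O(x) = 0"""
    return 0


def succ(x: int) -> int:
    """S(x) = x + 1"""
    return x + 1


def proj(n: int, i: int) -> Fn:
    """I_{n,i}(x1, ..., xn) = xi, with 1 <= i <= n"""
    def p(*xs: int) -> int:
        if len(xs) != n:
            raise TypeError(f"expected {n} arguments, got {len(xs)}")
        return xs[i - 1]
    return p


def compose(f: Fn, *gs: Fn) -> Fn:
    """Substitution: x -> f(g1(x), ..., gm(x))"""
    return lambda *xs: f(*(g(*xs) for g in gs))


def prim_rec(g: Fn, h: Fn) -> Fn:
    """Primitive recursion: f(u, 0) = g(u), f(u, x + 1) = h(u, x, f(u, x))"""
    def f(*args: int) -> int:
        *u, x = args
        value = g(*u)              # f(u, 0)
        for k in range(x):         # f(u, 1), f(u, 2), ..., f(u, x), in turn
            value = h(*u, k, value)
        return value
    return f


# add(u, 0) = u;  add(u, x + 1) = S(add(u, x))
add = prim_rec(proj(1, 1), compose(succ, proj(3, 3)))
# mul(u, 0) = 0;  mul(u, x + 1) = add(mul(u, x), u)
mul = prim_rec(zero, compose(add, proj(3, 3), proj(3, 1)))
```

For example, `add(3, 4)` is 7 and `mul(3, 4)` is 12. Multiplication is built only from addition, which is built only from the successor function, so both are primitive recursive.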
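Prompted by a hint in a letter from Herbrand (J. Herbrand, 1908-1931), Gödel introduced the notion of general recursive functions in 1934. After refinement and clarification by Kleene (S. C. Kleene, 1909-1994), the definition now in general use emerged. A general recursive function is an everywhere-defined function obtained from the initial functions by finitely many applications of substitution, the primitive recursion scheme and the μ-operator. The μ-operator here is the operator that forms inverse functions, also called the root-finding operator: μy[f(u, y) = 0] is the least y for which f(u, y) = 0.</FONT>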
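A sketch of the μ-operator in Python (the names `mu`, `exact_sqrt` and `isqrt` are illustrative). It searches y = 0, 1, 2, ... without any bound, so the functions it yields can be partial. Using it to invert y ↦ y·y shows why it is called the operator that forms inverse functions.

```python
from typing import Callable


def mu(f: Callable[..., int]) -> Callable[..., int]:
    """mu(f)(*u) is the least y with f(*u, y) == 0.

    The search is unbounded, as in the definition, so if no such y exists
    the call never returns: the result is in general a partial function.
    """
    def g(*u: int) -> int:
        y = 0
        while f(*u, y) != 0:
            y += 1
        return y
    return g


# Inverse of y -> y*y: defined only on perfect squares (a partial function).
exact_sqrt = mu(lambda x, y: abs(y * y - x))

# Floor square root: the least y with (y+1)^2 > x always exists (a total function).
isqrt = mu(lambda x, y: 0 if (y + 1) * (y + 1) > x else 1)
```

`exact_sqrt(49)` returns 7, while `exact_sqrt(2)` would search forever. `isqrt` is everywhere defined, which is the kind of μ-application the definition of general recursive functions permits.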
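General recursive functions defined this way are undoubtedly broader than primitive recursive functions. Still, one may ask: do the functions so defined include all intuitively computable functions? And if there are computable functions beyond them, how should those be defined? While these questions were being puzzled over, Church and Kleene proposed another class of computable functions, the λ-computable functions. Shortly afterwards, Church and Kleene each proved that the λ-computable functions are exactly the general recursive functions, that is, the two classes are equivalent. On the strength of this evidence, in 1936 Church published a thesis he had been developing for two years, the famous Church's thesis: every effectively computable function is a general recursive function.</FONT>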
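λ-computable functions can be glimpsed through Church numerals, in which the number n is the function that applies f n times. A Python sketch follows; lambdas are bound to names deliberately, to mirror λ-notation, and all names are illustrative.

```python
# Church numerals: n is the function f -> (x -> f(f(...f(x)...))), with f applied n times.
# Lambdas are bound to names on purpose, to mirror λ-notation.
ZERO = lambda f: lambda x: x
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))
ADD = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
MUL = lambda m: lambda n: lambda f: m(n(f))


def church(k: int):
    """Convert a Python int into a Church numeral."""
    n = ZERO
    for _ in range(k):
        n = SUCC(n)
    return n


def unchurch(n) -> int:
    """Read a Church numeral back by applying 'add one' to 0."""
    return n(lambda k: k + 1)(0)
```

`unchurch(ADD(church(3))(church(4)))` is 7 and `unchurch(MUL(church(3))(church(4)))` is 12: arithmetic expressed purely through function application.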
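At the same time, Turing defined another class of computable functions, the Turing-machine-computable functions, and put forward the famous Turing thesis: every effectively computable function is computable by a Turing machine. The Turing machine is a model of computation, an idealized machine, proposed by Turing; it can be regarded as the most general and most highly abstracted form of both human and machine computation. A year later Turing further proved that the Turing-machine-computable functions coincide with the λ-definable functions, and hence also with the general recursive functions. Thus three outwardly different classes of computable functions are in essence one class, and Church's thesis and Turing's thesis are one and the same; together they are now called the Church-Turing thesis: the intuitively effectively computable functions are exactly the general recursive functions, the λ-definable functions and the Turing-machine-computable functions.</FONT>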
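A minimal Turing machine simulator in Python, with a hypothetical example machine that adds 1 to a binary number. The rule table maps (state, symbol read) to (next state, symbol to write, head move); the tape is a dictionary, so it can grow in both directions. `run` and `INCREMENT` are illustrative names.

```python
from typing import Dict, Tuple

# (state, symbol read) -> (next state, symbol to write, head move)
Rules = Dict[Tuple[str, str], Tuple[str, str, str]]

# Hypothetical example machine: add 1 to a binary number; the head starts on its leftmost digit.
INCREMENT: Rules = {
    ("scan", "0"): ("scan", "0", "R"),    # walk right to the end of the number
    ("scan", "1"): ("scan", "1", "R"),
    ("scan", "_"): ("carry", "_", "L"),   # past the end: step back and start adding
    ("carry", "1"): ("carry", "0", "L"),  # 1 + carry = 0, carry moves left
    ("carry", "0"): ("halt", "1", "N"),   # 0 + carry = 1, done
    ("carry", "_"): ("halt", "1", "N"),   # carry past the left end: new leading 1
}


def run(rules: Rules, tape: str, state: str = "scan", blank: str = "_",
        max_steps: int = 10_000) -> str:
    """Run until the 'halt' state and return the non-blank part of the tape."""
    cells = dict(enumerate(tape))         # sparse tape, unbounded in both directions
    head = 0
    steps = 0
    while state != "halt":
        if steps == max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        state, symbol, move = rules[(state, cells.get(head, blank))]
        cells[head] = symbol
        head += {"L": -1, "R": 1, "N": 0}[move]
        steps += 1
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)
```

`run(INCREMENT, "1011")` returns `"1100"` and `run(INCREMENT, "111")` returns `"1000"`. The machine does nothing but read one cell, write one cell and move one step, yet chaining such steps is enough for any effective computation.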
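The Church-Turing thesis marked an unprecedented level in humanity's understanding of computable functions and of the essence of computation; it is a brilliant milestone in the history of mathematics.</FONT>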
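General recursive functions are rather abstract, so here is a more intuitive explanation. Anything that can be computed, even by mental arithmetic, can have its computation written down, step by step. A computation process is the whole process of starting from initial or known symbols, changing (transforming) the symbols step by step until a symbol satisfying a prescribed condition is obtained, and then deriving from it, by a fixed method, the desired result, that is, the value of the function. Functions that can be computed this way are generally called functions computable in finitely many steps. It has been proved that every function computable in finitely many steps from some initial symbols is a recursive function. Hence "being able to be written down" captures the essential requirement of computability, or recursiveness, and the nature of general recursive functions becomes quite intuitive.</FONT>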
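The formulation and acceptance of the Church-Turing thesis are of great theoretical and practical significance for mathematics and computer science. As Professor Mo Shaokui, a Chinese expert in mathematical logic, put it, once we have this thesis we can assert that certain problems cannot be effectively solved or effectively decided. For computer science, the significance of the Church-Turing thesis is that it characterizes precisely what a computer is, or what a computer can compute: computers can compute only general recursive functions, and functions outside that class cannot be computed by any computer.</FONT>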
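DNA computing: the emergence of a new mode of computation</FONT>
In November 1994 the American computer scientist Adleman (L. Adleman) published the theory of the DNA computer in the US journal Science, and successfully used DNA computation to solve an instance of the directed Hamiltonian path problem. The DNA computer grew out of the observation of a similarity between biology and mathematics: (1) the extraordinarily complex structures of living organisms result from simple operations (copying, cutting and joining) performed on initial information encoded in DNA sequences; (2) the result of a computable function f(ω) can be obtained by applying a sequence of basic, simple functions to ω.</FONT>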
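Adleman's experiment is commonly summarized as generate-and-filter: form many random paths, keep those that start and end at the required vertices, keep those with exactly n vertices, keep those containing every vertex, and answer yes if anything remains. The Python sketch below mimics these filters in silico. It enumerates all walks instead of letting strands join at random, and the example graph (`VERTICES`, `EDGES`) is hypothetical, not Adleman's instance.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]
Path = Tuple[int, ...]


def all_walks(vertices: Sequence[int], edges: Sequence[Edge], n: int) -> List[Path]:
    """Every directed walk through exactly n vertices (the 'soup' of strands).

    In the lab, strands join at random; this sketch enumerates them instead.
    """
    adj: Dict[int, List[int]] = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    soup: List[Path] = [(v,) for v in vertices]
    for _ in range(n - 1):
        soup = [w + (nxt,) for w in soup for nxt in adj.get(w[-1], [])]
    return soup


def hamiltonian_paths(vertices: Sequence[int], edges: Sequence[Edge],
                      v_in: int, v_out: int) -> List[Path]:
    n = len(vertices)
    soup = all_walks(vertices, edges, n)                         # only length-n strands
    soup = [w for w in soup if w[0] == v_in and w[-1] == v_out]  # right start and end
    soup = [w for w in soup if set(w) == set(vertices)]          # every vertex present
    return soup                                                  # non-empty means "yes"


# Hypothetical 4-vertex example graph (not Adleman's instance)
VERTICES = [0, 1, 2, 3]
EDGES = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
```

`hamiltonian_paths(VERTICES, EDGES, 0, 3)` returns `[(0, 1, 2, 3)]`, and asking for a path from 0 to 2 returns an empty list. The number of candidate walks grows exponentially with n; this is roughly the work the DNA approach hands over to massively parallel chemistry.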
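Adleman realized not only that the two processes are similar, but also that biological processes could be used to simulate mathematical ones. More precisely, DNA strands can represent information and enzymes can simulate simple computations. The reasons are as follows. First, DNA is made of units called nucleotides, which differ according to the chemical group, or base, attached to them. There are four bases, adenine, guanine, cytosine and thymine, denoted A, G, C and T. A single strand of DNA can be regarded as a string over the symbols A, G, C and T. Mathematically, this means information can be encoded with the four-letter alphabet Σ = {A, G, C, T} (electronic computers use only the two digits 0 and 1). Second, simple operations on DNA sequences need the help of enzymes, and different enzymes play different roles. Four kinds of enzyme are involved: restriction endonucleases, whose main function is to cut double-stranded DNA containing a restriction site; DNA ligase, which mainly joins the end of one DNA strand to another strand; DNA polymerase, whose functions include copying DNA and promoting DNA synthesis; and exonucleases, which can selectively destroy double-stranded or single-stranded DNA molecules. DNA computation is carried out through the cooperation of these four enzymes.</FONT>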
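Encoding information over the four-letter alphabet {A, G, C, T} can be sketched as two bits per base. The particular bit-to-base mapping in `BASES` below is an arbitrary assumption for illustration, and `encode`/`decode` are hypothetical helper names.

```python
BASES = "ACGT"  # hypothetical code: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def encode(data: bytes) -> str:
    """Write each byte as four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))


def decode(strand: str) -> bytes:
    """Inverse of encode; raises ValueError on a bad length or an unknown base."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    digits = [BASES.index(base) for base in strand]
    return bytes(
        (digits[i] << 6) | (digits[i + 1] << 4) | (digits[i + 2] << 2) | digits[i + 3]
        for i in range(0, len(digits), 4)
    )
```

With this mapping the byte `0b00011011` becomes `"ACGT"`, and the letter `A` (0x41) becomes `"CAAC"`. A four-symbol alphabet carries twice as many bits per symbol as a binary one.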
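So far, however, the problems DNA computing can handle are only a few specific problems solved with molecular techniques, each a one-off experiment. DNA computing has no fixed procedure yet: the variety of problems leads to a variety of molecular-biology techniques, and each problem needs its own experimental design. This raises two fundamental questions (which Adleman himself was the first to recognize): (1) Which problems can a DNA computer solve? More precisely, is the DNA computer complete, that is, can every (Turing-)computable function be computed by manipulating DNA? (2) Can a programmable DNA computer be designed? That is, is there a universal DNA system (model) analogous to the universal model of electronic computers, the Turing machine? Research on both questions is under way. In the author's view, this resembles the stage of research on idealized computers in the 1930s and 1940s, before electronic computers appeared. Many DNA computing models have been proposed, each with its own strengths, but no generally accepted "Turing machine" of DNA computing has yet appeared. Relatively speaking, one DNA computer model, called the "splicing system", has been fairly successful.</FONT>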
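With the splicing system as a mathematical model of the DNA computer, the questions of completeness and universality raised above can be answered. As noted earlier, the Church-Turing thesis characterizes the computing power of any real computer: every computable function is a function computable by a Turing machine (a general recursive function). It has been proved that splicing systems are computationally complete, that is, every computable function can be computed by a splicing system, and vice versa. This answers which problems a DNA computer can solve: all Turing-computable problems. Whether a programmable computer based on splicing exists has also been answered in the affirmative: for every given alphabet T there is a splicing system, with finite sets of axioms and rules, that is universal for the class of systems with terminal alphabet T. In other words, a universal, programmable DNA computer based on splicing exists in theory. The only biological operations such computers use are synthesis, splicing (cutting and ligation) and extraction.</FONT>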
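The basic splicing operation can be sketched as follows, using the commonly cited formulation (not spelled out in the article): a rule (u1, u2, u3, u4) cuts x = x1 u1 u2 x2 between u1 and u2, cuts y = y1 u3 u4 y2 between u3 and u4, and joins the pieces into z = x1 u1 u4 y2. The names are illustrative, and `splice_round` computes only one round of the closure that a splicing system would iterate from its axioms.

```python
from typing import Iterable, Set, Tuple

SpliceRule = Tuple[str, str, str, str]  # (u1, u2, u3, u4), often written u1#u2$u3#u4


def splice(x: str, y: str, rule: SpliceRule) -> Set[str]:
    """All z = x1 u1 u4 y2 where x = x1 u1 u2 x2 and y = y1 u3 u4 y2."""
    u1, u2, u3, u4 = rule
    cuts_x = [i + len(u1) for i in range(len(x) + 1) if x.startswith(u1 + u2, i)]
    cuts_y = [j + len(u3) for j in range(len(y) + 1) if y.startswith(u3 + u4, j)]
    return {x[:cx] + y[cy:] for cx in cuts_x for cy in cuts_y}


def splice_round(language: Set[str], rules: Iterable[SpliceRule]) -> Set[str]:
    """The language plus every string one splicing step can produce from it."""
    rules = list(rules)
    result = set(language)
    for x in language:
        for y in language:
            for rule in rules:
                result |= splice(x, y, rule)
    return result
```

With the rule `("T", "CGA", "T", "CGA")`, splicing `"xxTCGAyy"` with `"zzTCGAww"` gives `{"xxTCGAww"}`: the left end of one strand is joined to the right end of the other at a shared site, the cut-and-ligate step described above.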
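The theory of the DNA computer signals a major change in the mode of computation. Of course, DNA computers are far from the only source of such change: optical computers, quantum computers, protein computers and other new computing models keep appearing, and they have changed the established modes of computation in unprecedented ways.</FONT>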
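Modes of computation and their evolution</FONT>
Put simply, a mode of computation is the way symbol transformations are carried out, especially the most basic operations. In a broader sense it also includes the carrier of the symbols, or their outward form, that is, how information is represented or expressed. For example, the rod calculus of ancient China was a mode of computation represented by a set of bamboo rods; the later abacus arithmetic was represented by the abacus and its beads; and the still later written arithmetic was represented by written symbols. This sequence of changes shows the diversity of modes of computation and their continual evolution. Compared with the machine computation that came later, all of these can be classed as "manual computation", characterized by manipulating symbols by hand to carry out their transformation.</FONT>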
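The truly revolutionary mode of computation, however, arrived only with the electronic computer. The history of machine computation can be traced back to 1641, when the 18-year-old French mathematician Pascal, inspired by mechanical clocks (gears can count too), built an eight-digit gear-driven adding machine, bringing human computation and computing technology into a new stage. After several hundred years of hard work, the world's first electronic computer was completed in 1945. From then on, humanity entered a completely new era of computing technology.</FONT>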
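From Pascal's early gear machine to today's most advanced electronic computers, computers have gone through four major periods of development, and computing technology has advanced enormously. In this stage, computation appears as a physical, mechanical operating process. Symbols are no longer represented by bamboo rods, abacus beads or letters, but by gears, by electric current, by voltage and so on. Yet whether manual or mechanical, the mode of computation, that is, the basic operating action, is a physical transformation of symbols (built specifically from basic actions such as "add" and "subtract"). The difference is that the former is manual and slow, while the latter is automatic and extremely fast.</FONT>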
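The DNA computation that has now appeared involves an even more fundamental change: computation is no longer a physical transformation of symbols but a chemical one, no longer physical "adding" and "subtracting" but chemical cutting and pasting, inserting and deleting. This mode of computation will thoroughly change the nature of computer hardware and the basic way computers operate, and its significance will be far-reaching. When he proposed the DNA computer, Adleman already believed that the ideas it embodies could make the mode of computation evolve.</FONT>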
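The theoretical appearance of the quantum computer opens further possibilities for the evolution of computation. The theoretical model of the electronic computer is the classical universal Turing machine, a deterministic Turing machine; the theoretical model of the quantum computer, the quantum Turing machine, is probabilistic in character. More intuitively: a conventional computer represents the binary digits 0 and 1 by the "on" and "off" states of tiny transistors on a chip, and processes and stores data that way. Each such element holds only one value, either 0 or 1, and many of them must be chained together to carry out a complex operation. This linear mode of computation follows ordinary physical principles and has obvious limits. A quantum computer, by contrast, operates at the level of atomic behavior, beyond the bounds of molecular physics. According to quantum theory, an atom can be in two different positions at the same moment and spin in two opposite directions at once; this is called quantum superposition. Once disturbed from outside, the atom in this indefinite state immediately settles into a definite one. This seemingly paradoxical state contradicts the everyday world we know, but used to represent information it can take on endless variations in an instant while keeping its underlying identity. When many atoms in quantum states are entangled, the superposition of the qubits lets them carry out "parallel computation" all at once, which gives the machine extremely high computing speed. Electronic linear computation is like ten thousand snails queuing to cross a single-log bridge; quantum parallel computation is like ten thousand birds taking off into the sky at once.</FONT>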
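A classical simulation makes the superposition point concrete, as an illustration only: n qubits are described by 2^n amplitudes, and applying a Hadamard gate to every qubit of |00...0⟩ gives all 2^n basis states equal amplitude 1/√(2^n). The function names are illustrative, and the demo shows superposition only, not entanglement.

```python
import math
from typing import List


def hadamard(state: List[complex], qubit: int) -> List[complex]:
    """Apply a Hadamard gate to one qubit of a state vector of length 2**n."""
    h = 1 / math.sqrt(2)
    bit = 1 << qubit
    out = list(state)
    for idx in range(len(state)):
        if (idx & bit) == 0:                 # handle each (qubit = 0, qubit = 1) pair once
            a, b = state[idx], state[idx | bit]
            out[idx] = h * (a + b)
            out[idx | bit] = h * (a - b)
    return out


def uniform_superposition(n: int) -> List[complex]:
    """Start from |00...0> and apply a Hadamard gate to every qubit."""
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j
    for q in range(n):
        state = hadamard(state, q)
    return state
```

`uniform_superposition(3)` holds 8 amplitudes, each 1/√8, and their squared magnitudes sum to 1. The 2^n-length list is also why simulating a quantum computer classically is expensive. A measurement, however, returns only one basis state, so quantum algorithms must use interference to turn such a superposition into a useful answer.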
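The significance of the evolution of modes of computation</FONT>
The continuing evolution of modes of computation has important theoretical and practical significance; in the author's view it shows at least two things. First, the mode of computation is a historical outcome, not a logical necessity of the nature of computation. L. Kari of Canada has observed: "DNA computing is a completely new way of looking at computation. Perhaps this is how nature does mathematics: not by adding and subtracting, but by cutting and pasting, by inserting and deleting. Just as we count in base ten because we have ten fingers, perhaps the basic operations of our current computing are merely an accident of human history. Just as people have adopted other bases for counting, perhaps it is now time to consider other modes of computation." The author finds this remark very illuminating. Indeed, looking back over the history of human modes of computation and computing technology, it is not hard to see that the mode of computation is a historical outcome and not a logical necessity of the nature of computation.</FONT>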
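That is, what makes computation computation is its fundamental recursiveness, the fact that it is a symbol-string transformation that proceeds step by step. How the symbol transformation is carried out, and what carries the symbols or what outward form they take, are not essential; they are all historical outcomes, always in a process of change and evolution. Symbol transformations under different representations have different modes of operation, and even under the same representation there can be different modes: physical or chemical, classical or quantum, deterministic or probabilistic. Here the unity of the essence of computation and the diversity of its modes are displayed with great clarity. The author believes that the appearance of DNA computers, quantum computers and the like has opened a window onto future modes of computation, and that as science and technology advance, the diversity of modes of computation will find new expressions.</FONT>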
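Second, the historicity and diversity of modes of computation in turn reflect the logical necessity and unity of the nature of computation. The essence of computation revealed by the Church-Turing thesis is very general: it covers different forms of computation, such as numerical computation and theorem derivation, and computation by different "calculators", such as electronic computers. Remember that the theory of computability, founded on the Church-Turing thesis, was put forward in the 1930s, before the electronic computer was born; it was not obtained by summarizing and abstracting from electronic computers, yet it captures their computational essence precisely. Even today's most advanced electronic computer is in essence a Turing machine; put differently, every function a computer can compute is a general recursive function. It is now further recognized that DNA computers and quantum computers, still at the laboratory stage, are in essence also a kind of Turing computation. This shows that different forms of computation, and computation by different "calculators", are identical in their computational essence: recursive computation, or Turing computation.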
Reposted from: http://cfc.nankai.edu.cn/readings/lijie.htm