          Sequence Points in C/C++

          Posted on 2008-05-16 10:42 by ZelluX, category: C/C++
          Original author: NetMD (C++), board: CPlusPlus
          Title: [FAQ] Sequence Points in C/C++
          Source: Shuimu Community (水木社区) BBS (Wed Feb 7 01:13:41 2007), on-site post


          0. What are side effects

          C99 defines them as follows:
          Accessing a volatile object, modifying an object, modifying a file, or
          calling a function that does any of those operations are all side effects,
          which are changes in the state of the execution environment.

          C++2003 defines them as follows:
          Accessing an object designated by a volatile lvalue, modifying an object,
          calling a library I/O function, or calling a function that does any of
          those operations are all side effects, which are changes in the state of
          the execution environment.

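          As these quotations show, C99 and C++2003 define side effects in essentially
          the same way. A program can be viewed as a state machine: at any moment its
          state consists of the contents of all its objects and all its files (standard
          input and output are files too), and a side effect causes a state transition.

          Once a variable is declared with a volatile-qualified type, its value may be
          changed by events outside the program. A value read from it is valid only at the
          moment of the read; if the value is needed again later it must be read again
          rather than reusing the earlier one. For this reason reading a
          volatile-qualified variable, not just writing it, counts as a side effect.

          Note: the program state is generally not considered to include the contents of
          CPU registers, unless a register represents a variable. For example: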
          void foo() {
            register int i = 0;  // i is placed directly in a register; this article
                                 // calls such a variable a register variable.
                                 // Note: register is only a hint, so i is not
                                 // necessarily kept in a register, and an auto variable
                                 // without the register keyword may be kept in one too.
                                 // For this example, assume i really is in a register.
            i = 1;  // The register's contents change, which changes the program
                    // state, so this statement has a side effect.
            i + 1;  // Compilers usually warn here: "warning: expression has no effect".
                    // If the CPU executes this statement it certainly changes some
                    // register, but the program state does not change: apart from
                    // the register that represents i, the program state contains no
                    // other register, so this statement has no side effect.
          }
          In particular, both C99 and C++2003 state that an expression with no effect
          need not be evaluated:
          An actual implementation need not evaluate part of an expression if it
          can deduce that its value is not used and that no needed side effects
          are produced (including any caused by calling a function or accessing
          a volatile object).


          1. What are sequence points

          C99 and C++2003 define sequence points identically:
          At certain specified points in the execution sequence called sequence
          points, all side effects of previous evaluations shall be complete and
          no side effects of subsequent evaluations shall have taken place.

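          In other words, sequence points are specially designated positions. At each
          one, every side effect of the evaluations before it must be complete, and no
          side effect of the evaluations after it may have started.

          For example, both C and C++ specify a sequence point at the end of every
          full-expression: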
          extern int i, j;
          i = 0;
          j = i;
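          In the code above, i = 0 and j = i are each a full-expression, and the ; marks
          the end of the expression, so there is a sequence point at each ;. By the
          definition, at the sequence point between i = 0 and j = i the evaluation and
          the side effect of i = 0 must be finished (0 has been written to i), and no
          side effect of j = i may have started. The side effect of j = i stores the
          value of i in j, and the side effect of i = 0 stores 0 in i; if the store to i
          happened after j = i, j would receive the old value of i, which is clearly
          wrong.

          From the definitions of sequence points and side effects it follows that at a
          sequence point every action that can affect the program state has completed.
          Does that mean the program state at a sequence point is always determinate?
          Not necessarily; it depends on how the code is written. However, if the state
          of the program at a sequence point cannot be determined, the standard makes
          the behavior of that program undefined. Section 3 explains this.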


          2. Relative order of expression evaluation and side effects

          Both C99 and C++2003 state:
          Except where noted, the order of evaluation of operands of individual
          operators and subexpressions of individual expressions, and the order
          in which side effects take place, is unspecified.

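          That is, in general both languages leave the order in which operands are
          evaluated, and the order in which side effects take place, unspecified. Why
          not define these orders precisely? Because C and C++ pursue efficiency above
          almost everything else, and leaving the order open gives the compiler more
          room to optimize. For example:
          extern int *p;
          extern int i;
          *p = i++;  // (1)
          By the rule above, in expression (1) the compiler decides whether *p or i++ is
          evaluated first, and it decides the order of the two side effects (the store
          to *p and the increment of i). Even the value computation of the subexpression
          i++ (the initial value of i) and its side effect (adding 1 to i) need not
          happen together: the compiler may store the initial value of i (the value of
          i++) into *p first and increment i afterwards, splitting the evaluation of i++
          into two non-adjacent steps. Compilers usually do exactly this, because
          evaluating i++ as part of *p = i++ differs from evaluating it alone. For a
          standalone i++ the usual sequence (ignoring an inc instruction) is: load i
          into some register A (skipped if i is a register variable), add 1 to A, write
          A back to the address of i. For *p = i++, fully evaluating i++ first requires
          the old value of i to be kept, which takes an extra register B and an extra
          instruction. If instead the value loaded into A is stored to *p first and i is
          incremented afterwards, one register is enough. This matters on many
          platforms, because registers are usually scarce, especially when someone
          writes a statement such as
          extern int i, j, k, x;
          x = (i++) + (j++) + (k++);
          Here the compiler can compute the value of (i++) + (j++) + (k++) first, then
          add 1 to each of i, j and k, and finally write i, j, k and x back to memory,
          which is more efficient than completing the full ++ semantics one operand at
          a time.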
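The difference between an unspecified order and a determinate result can be made visible with function calls that record when they run. The following is a minimal sketch, not part of the original post; left(), right() and trace are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers: each call appends its name to `trace` (a side effect)
// and returns a value.
static std::string trace;

int left()  { trace += 'L'; return 1; }
int right() { trace += 'R'; return 2; }

// The sum is always 3, but whether the trace reads "LR" or "RL" is
// unspecified: the operands of + may be evaluated in either order.
int sum_with_trace() {
    trace.clear();
    int sum = left() + right();
    assert(trace.size() == 2);  // both operands were evaluated, in some order
    return sum;
}

// *p = i++ : the value of i++ is the old i. The increment may happen before
// or after the store, but both are complete at the end of the statement.
// Assumes p does not point to i.
int store_then_increment(int* p, int& i) {
    *p = i++;
    return *p;
}
```

Code that only depends on the value 3 is portable; code that depends on trace being "LR" is not.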


          3. Restrictions that sequence points place on side effects

          Both C99 and C++2003 contain a rule similar to the following:
          Between the previous and next sequence point a scalar object shall
          have its stored value modified at most once by the evaluation of an
          expression. Furthermore, the prior value shall be accessed only to
          determine the value to be stored. The requirements of this paragraph
          shall be met for each allowable ordering of the subexpressions of a
          full expression; otherwise the behavior is undefined.

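          That is, between two adjacent sequence points a scalar object may be modified
          at most once, and if it is modified, any read of it between those two sequence
          points must serve only to determine the new value (for example i++ reads i
          because the new value is the old value plus 1). The standard further requires
          that every allowable order of evaluation satisfy this condition; otherwise the
          behavior is undefined.

          Sequence points restrict side effects in this way precisely because the C/C++
          standards do not specify the order of subexpression evaluation and side
          effects. For example: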
          extern int i, a[];
          extern int foo(int, int);
          i = ++i + 1;  // Both modifications of i must be written back to the object,
                        // and the final value of i depends on which write-back happens
                        // last. If the assignment is written last, i is its old value
                        // plus 2; if ++i is written last, i is its old value plus 1.
                        // The behavior of this expression is therefore undefined.
          a[i++] = i;  // If the left side of = is evaluated first and the side effect
                       // of i++ has completed, the right side is the old value of i
                       // plus 1; if the side effect of i++ completes last, the right
                       // side is the old value of i. The result is indeterminate, so
                       // the behavior of this expression is undefined.
          foo(foo(0, i++), i++);  // The standard does not specify the order in which
                                  // function arguments are evaluated, but it does put
                                  // a sequence point after all arguments are evaluated
                                  // and before the function body is entered. So there
                                  // are two ways to execute this expression. One
                                  // evaluates the outer call's i++ first, then the
                                  // arguments of foo(0, i++), and only then enters
                                  // foo(0, i++), which is where the sequence point is:
                                  // i is modified twice between two adjacent sequence
                                  // points, which is undefined.
                                  // The other evaluates foo(0, i++) first. That call
                                  // contains a sequence point, so the second i++ is
                                  // evaluated after a new sequence point and i is not
                                  // modified twice between adjacent sequence points.
                                  // But as noted above, every allowable execution
                                  // order must satisfy the rule for the behavior to
                                  // be defined, so this code is still undefined.
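Each of the three undefined expressions above can be rewritten so that every modification of i sits in its own full-expression. The sketch below shows one possible intent for each; the original post does not say which result the author of such code would have wanted, so the chosen meanings and the body of foo() are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the article's foo(int, int).
int foo(int a, int b) { return a * 10 + b; }

// One reading of `i = ++i + 1;`: add 2 to i with a single modification.
void add_two(int& i) { i += 2; }

// One reading of `a[i++] = i;`: the old index receives the old value.
void store_old_then_advance(int a[], int& i) {
    assert(i >= 0);  // the index must be valid for a[]
    a[i] = i;        // i is only read here
    ++i;             // and modified in a separate full-expression
}

// One reading of `foo(foo(0, i++), i++);`: the inner argument is taken first.
int nested_call(int& i) {
    int first = i++;   // each increment ends at its own sequence point
    int second = i++;
    return foo(foo(0, first), second);
}
```

Naming the intermediate values also fixes the argument order, which the original expression left to the compiler.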

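          Earlier I said that the program state at a sequence point is not necessarily
          determinate. The reason is that several side effects may occur between two
          adjacent sequence points in an unspecified order. If more than one of them
          modifies the same object, as in the example i = ++i + 1;, the result depends
          on the order in which they happen. Likewise, if an expression both modifies an
          object and reads it, and the read is not used to determine the new value, the
          relative order of the read and the modification also leaves the program state
          not uniquely determined.
          Fortunately, the rule that "between two adjacent sequence points an object may
          be modified at most once, and if it is modified it may be read between those
          sequence points only to determine its new value" guarantees that a conforming
          program has a determinate state at every sequence point.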

          Note: for UDTs (user-defined types), operators can be overloaded, and function
          call semantics then supply additional sequence points. Some expressions that
          are undefined behavior for built-in types may therefore be well defined for
          UDTs. For example:
          i = i++;  // If i has a built-in type, this expression modifies i twice
                    // between two adjacent sequence points: undefined.
                    // If i is a UDT, the expression may be
                    // i.operator=(i.operator++(int)). There is a sequence point after
                    // the function arguments are evaluated, so i is not modified twice
                    // between two adjacent sequence points: OK.
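To make the UDT case concrete, here is a minimal sketch with a hypothetical Counter class (the name and its members are not from the original post). Because both operators are function calls, the increment finishes inside operator++ before operator= starts.

```cpp
#include <cassert>

// Hypothetical user-defined type. Every operator below is a function call,
// so its argument is fully evaluated before the function body runs.
struct Counter {
    int value;

    explicit Counter(int v) : value(v) {}
    Counter(const Counter& other) : value(other.value) {}

    // Postfix ++: increments the object and returns a copy of the old state.
    Counter operator++(int) {
        Counter old(*this);
        ++value;
        return old;
    }

    Counter& operator=(const Counter& other) {
        value = other.value;
        return *this;
    }
};

// c = c++ means c.operator=(c.operator++(0)): the increment completes inside
// operator++ before operator= runs, so c ends up holding its old value.
int assign_post_increment(int start) {
    Counter c(start);
    c = c++;
    assert(c.value == start);
    return c.value;
}
```

The result is well defined but probably not what anyone intends, which is one more reason to avoid the construct.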

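          It follows that the familiar printf("%d, %d", i++, i++) is simply wrong, and
          questions of this kind are meaningless as written-test or interview questions.
          The same problem affects cout << i++ << i++. If overload resolution selects
          the member operator<<, the expression is equivalent to
          (cout.operator<<(i++)).operator<<(i++); otherwise it is equivalent to
          operator<<(operator<<(cout, i++), i++). If i has a built-in type, this has the
          same problem as foo(foo(0, i++), i++) and is undefined behavior, because some
          execution order modifies i twice between two adjacent sequence points. If i is
          a UDT the expression is well defined, like i = i++, but it is still not
          recommended: the order in which function arguments are evaluated is
          unspecified, so which i++ is computed first cannot be predicted. That remains
          a portability problem, and the construct should be avoided.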
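A portable way to print two successive increments is to give each one its own statement. This is a small sketch under the assumption that left-to-right order was intended; format_two_increments is a hypothetical helper, and it formats into a string rather than printing so that the result can be inspected.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats two successive post-increments of i in a defined left-to-right
// order by giving each increment its own full-expression.
std::string format_two_increments(int& i) {
    int first = i++;
    int second = i++;
    char buf[64];
    int written = std::snprintf(buf, sizeof buf, "%d, %d", first, second);
    assert(written > 0 && written < static_cast<int>(sizeof buf));
    return std::string(buf);
}
```

The same pattern applies to the stream version: compute the two values first, then write cout << first << second.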


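          4. Compiler optimization across sequence points

          From the discussion above, within a single expression the following are
          allowed for a given variable i:
          A. No read and one write, for example
               i = 0;
          B. One or more reads and one write, where every read is used only to determine
             the new value, for example
               i = i + 1;  // one read, one write
               i = i & (i - 1);  // two reads, one write (thanks to puke for this example)
          C. No write and one or more reads, for example
               j = i & (i - 1);

          In cases B and C the compiler has some freedom to optimize: it may read the
          variable once and then use that value several times.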
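Cases B and C can be seen in a small, self-contained form. In this sketch the function names are hypothetical; the expression i & (i - 1) is the one from the text, and it clears the lowest set bit of i.

```cpp
#include <cassert>

// Case B: i is read twice and written once, and both reads only feed the
// new value. i & (i - 1) clears the lowest set bit of i.
void clear_lowest_set_bit(unsigned& i) {
    assert(i != 0);  // there is no set bit to clear otherwise
    i = i & (i - 1);
}

// Case C: i is only read, so any number of reads is fine.
bool is_power_of_two(unsigned i) {
    return i != 0 && (i & (i - 1)) == 0;
}
```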

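          When the variable has a volatile-qualified type, however, no definitive answer
          has been found yet about what the compiler may do. ctrlz holds that reading
          the same volatile-qualified object more than once between two adjacent
          sequence points is still undefined behavior, because the read has a side
          effect and that side effect is equivalent to modifying the object. RoachCock's
          view is that such reads should be legal but must not be optimized into a
          single read. A code fragment that is very common in embedded development: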
          extern volatile int i;
          if (i != i) {  // detect whether i changed within a very short time
            // ...
          }
          If i != i were optimized into a single read, the result would always be false,
          so RoachCock concludes that the compiler must not reduce reads of a
          volatile-qualified variable to one. ctrlz instead considers this code itself
          incorrect and would rewrite it as
          int j = i;
          if (j != i) {  // separate the reads of the volatile-qualified variable with a
                         // sequence point
            // ...
          }
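The rewrite suggested by ctrlz can be wrapped in a helper so that each read of the volatile object is a separate full-expression. This is only a sketch; changed_between_reads and reg are hypothetical names, and reg stands in for a hardware register or any other externally modified location.

```cpp
#include <cassert>

// Reads a volatile object twice, in separate full-expressions, and reports
// whether the two values differ.
bool changed_between_reads(volatile int& reg) {
    int first = reg;    // first read, ends at a sequence point
    int second = reg;   // second read, a distinct access
    return first != second;
}

// Self-check with a local object that nothing else modifies.
void check_stable_value() {
    volatile int reg = 3;
    assert(!changed_between_reads(reg));
}
```

Both participants in the discussion above would accept this form, since no two reads of the volatile object share a pair of adjacent sequence points.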

          Whether multiple reads of a volatile-qualified variable between two adjacent
          sequence points are legal, and how they may be optimized, is still unsettled
          (either way, such code is best avoided for volatile-qualified types). What is
          certain is that a volatile-qualified variable must be read again after a
          sequence point has been crossed; volatile exists to stop the compiler from
          optimizing too aggressively across sequence points. Multiple reads of a
          non-volatile-qualified variable across sequence points, by contrast, may be
          optimized into a single read: until some statement or function modifies the
          variable, the compiler may assume it does not change, because the current
          C/C++ abstract machine model is single-threaded. For example:
          bool flag = true;
          void foo() {
            while (flag) {  // (2)
              // ...
            }
          }
          If the compiler detects that no statement in foo() (including the functions
          foo() calls) modifies flag, it may optimize (2) to read flag once on entry to
          foo() instead of once per iteration, and this cross-sequence-point
          optimization can easily produce an infinite loop. Yet such code is common in
          multithreaded programs: foo() never modifies flag, but a function running in
          another thread may clear it to end the loop. To avoid errors caused by this
          optimization, flag should be declared volatile bool. C++2003 describes
          volatile as follows:
          [Note: volatile is a hint to the implementation to avoid aggressive
          optimization involving the object because the value of the object
          might be changed by means undetectable by an implementation. See 1.9
          for detailed semantics. In general, the semantics of volatile are
          intended to be the same in C++ as they are in C. ]
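A flag that changes "by means undetectable by an implementation" can be demonstrated deterministically with a signal handler. Note that this sketch swaps the second thread of the text for a signal handler, which is a different mechanism chosen only so the example is reproducible; keep_running, stop_handler and run_until_stopped are hypothetical names.

```cpp
#include <cassert>
#include <csignal>

// volatile std::sig_atomic_t is the type the standard library designates for
// objects shared with a signal handler.
static volatile std::sig_atomic_t keep_running = 1;

extern "C" void stop_handler(int) { keep_running = 0; }

// Loops until the handler clears the flag and returns the iteration count.
// For the demonstration the signal is raised from inside the loop.
int run_until_stopped(int raise_after) {
    assert(raise_after > 0);  // otherwise the signal is never raised
    keep_running = 1;
    std::signal(SIGINT, stop_handler);
    int iterations = 0;
    while (keep_running) {          // the flag is re-read on every iteration
        ++iterations;
        if (iterations == raise_after) {
            std::raise(SIGINT);     // the handler runs before raise returns
        }
    }
    std::signal(SIGINT, SIG_DFL);   // restore the default disposition
    return iterations;
}
```

Nothing in the loop body assigns to keep_running directly, which is exactly the situation in which a non-volatile flag could be read only once.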


          5. Sequence points defined by C99

          - The call to a function, after the arguments have been evaluated.
          - The end of the first operand of the following operators:
               logical AND && ;
               logical OR || ;
               conditional ? ;
               comma , .
          - The end of a full declarator:
               declarators;
          - The end of a full expression:
               an initializer;
               the expression in an expression statement;
               the controlling expression of a selection statement (if or switch);
               the controlling expression of a while or do statement;
               each of the expressions of a for statement;
               the expression in a return statement.
          - Immediately before a library function returns.
          - After the actions associated with each formatted input/output function
            conversion specifier.
          - Immediately before and immediately after each call to a comparison
            function, and also between any call to a comparison function and any
            movement of the objects passed as arguments to that call.
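The sequence points after the first operand of &&, the comma operator and ?: are what make the following expressions well defined even though each one modifies and reads i. This is a minimal sketch with hypothetical function names.

```cpp
#include <cassert>

// && : the left operand, including its increment, is complete before the
// right operand is evaluated.
bool both_steps(int& i) {
    return (i++ == 0) && (i++ == 1);
}

// comma: i has already been incremented when the right operand reads it.
int increment_then_double(int& i) {
    return (i++, i * 2);
}

// ?: : the side effect of the condition is complete before either branch runs.
int next_or_negative(int& i) {
    return (i++ >= 0) ? i : -i;
}

// Self-check of the three cases.
void check_operator_sequence_points() {
    int a = 0;
    assert(both_steps(a) && a == 2);
    int b = 3;
    assert(increment_then_double(b) == 8 && b == 4);
    int c = 5;
    assert(next_or_negative(c) == 6);
}
```

Replacing && with & or the comma with + in these functions would remove the sequence point and bring back undefined behavior.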


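          6. Sequence points defined by C++2003

          Every sequence point defined by C99 is also a sequence point in C++2003.
          In addition, C99 only specifies a sequence point immediately before a library
          function returns and says nothing about the return of ordinary functions,
          whereas C++2003 explicitly places a sequence point at function-entry and at
          function-exit, so there is also a sequence point after the copying of a
          function's return value.

          Note in particular that operator||, operator&& and operator, can be
          overloaded. When they are, they use function call semantics and do not provide
          the sequence points specified for the built-in operators; there is only the
          one sequence point after all function arguments have been evaluated. Function
          call semantics also do not give || and && their short-circuit behavior. These
          differences can easily cause errors that are hard to spot, so overloading
          these operators is generally discouraged.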
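The loss of short-circuit evaluation is easy to observe by counting how many operands get evaluated. The sketch below uses a hypothetical Flag type with an overloaded operator&&; none of these names come from the original post.

```cpp
#include <cassert>

static int evaluations = 0;

// Hypothetical wrapper type with an overloaded operator&&.
struct Flag {
    bool on;
};

bool operator&&(const Flag& a, const Flag& b) { return a.on && b.on; }

// Each helper counts how many times it is evaluated.
Flag make_flag(bool v) { ++evaluations; Flag f = { v }; return f; }
bool make_bool(bool v) { ++evaluations; return v; }

// Overloaded &&: an ordinary function call, so both arguments are evaluated.
int evaluations_overloaded() {
    evaluations = 0;
    bool r = make_flag(false) && make_flag(true);
    (void)r;
    return evaluations;
}

// Built-in &&: the right operand is skipped when the left one is false.
int evaluations_builtin() {
    evaluations = 0;
    bool r = make_bool(false) && make_bool(true);
    (void)r;
    return evaluations;
}
```

With the built-in operator a guard such as p && p->ok() is safe; with an overloaded one the right operand would be evaluated even when the left is false.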


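          7. How two lvalue-related changes in C++2003 affect sequence points

          In C the result of an assignment operator is not an lvalue; C++2003 makes it
          an lvalue. It is not yet clear what this change is meant to achieve for
          built-in types, but it turns a good deal of legal C code into undefined
          behavior in current C++. For example:
          extern int i;
          extern int j;
          i = j = 1;
          Because the result of (j = 1) is an lvalue and it is used as the right operand
          of the assignment to i, an lvalue-to-rvalue conversion is required, and that
          conversion denotes a read. So i = j = 1 first assigns 1 to j and then reads j
          to assign its value to i. This behavior is undefined, because the standard
          says a read between two adjacent sequence points may only be used to determine
          the new value of the modified object and must not follow the modification.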
          Because C++2003 makes the result of an assignment operator an lvalue, the
          following code, which is invalid in C99, compiles in C++2003:
          extern int i;
          (i += 1) += 2;
          Under the C++2003 rules its behavior is clearly undefined: it modifies i twice
          between two adjacent sequence points.

          The same problem affects the prefix ++ and -- operators on built-in types.
          C++2003 changes their result from an rvalue to an lvalue, which makes even the
          following code undefined behavior:
          extern int i;
          extern int j;
          i = ++j;
          Again, an lvalue used as the right operand of an assignment operator needs an
          lvalue-to-rvalue conversion, which causes a read that takes place after the
          object has been modified.
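Under the strict reading of C++2003 given in this section, the three expressions can be sidestepped by splitting them into separate full-expressions, so that every read follows a sequence point. The sketch below shows one way to do that; the function names are hypothetical.

```cpp
#include <cassert>

// i = j = 1, written as two full-expressions.
void assign_both(int& i, int& j) {
    j = 1;
    i = 1;
}

// (i += 1) += 2, written so that i is modified once per full-expression.
void add_one_then_two(int& i) {
    i += 1;
    i += 2;
}

// i = ++j, written so that the read of j follows a sequence point.
void increment_then_copy(int& i, int& j) {
    ++j;
    i = j;
}

// Self-check of the three rewrites.
void check_split_assignments() {
    int i = 0, j = 0;
    assign_both(i, j);
    assert(i == 1 && j == 1);
    add_one_then_two(i);
    assert(i == 4);
    increment_then_copy(i, j);
    assert(i == 2 && j == 2);
}
```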

          This change to C++ was clearly not thought through, and it turned many
          idiomatic C constructs into undefined behavior. In 1999 Andrew Koenig
          therefore submitted a proposal to the C++ standards committee asking for a new
          sequence point for assignment operators, but at the time of writing the
          committee has not reached agreement on it. Koenig's proposal is appended below
          for anyone with the time and interest; skipping it loses nothing :-)


          222. Sequence points and lvalue-returning operators
          Section: 5 expr     Status: drafting     Submitter: Andrew Koenig     Date: 20 Dec 1999

          I believe that the committee has neglected to take into account one of the differences between C and C++ when defining sequence points. As an example, consider

              (a += b) += c;

          where a, b, and c all have type int. I believe that this expression has undefined behavior, even though it is well-formed. It is not well-formed in C, because += returns an rvalue there. The reason for the undefined behavior is that it modifies the value of `a' twice between sequence points.

          Expressions such as this one are sometimes genuinely useful. Of course, we could write this particular example as

              a += b; a += c;

          but what about

              void scale(double* p, int n, double x, double y) {
                  for (int i = 0; i < n; ++i) {
                      (p[i] *= x) += y;
                  }
              }

          All of the potential rewrites involve multiply-evaluating p[i] or unobvious circumlocations like creating references to the array element.
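For reference, the rewrite that binds a reference to the array element might look like the following sketch. It is not part of the quoted issue text; scale_ref is a hypothetical name, and the body is one possible form of the rewrite the paragraph above alludes to.

```cpp
#include <cassert>

// Hypothetical variant of scale(): binds a reference to each element so that
// p[i] is evaluated once and each assignment is its own full-expression.
void scale_ref(double* p, int n, double x, double y) {
    assert(n == 0 || p != 0);
    for (int i = 0; i < n; ++i) {
        double& elem = p[i];
        elem *= x;
        elem += y;
    }
}
```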

          One way to deal with this issue would be to include built-in operators in the rule that puts a sequence point between evaluating a function's arguments and evaluating the function itself. However, that might be overkill: I see no reason to require that in

              x[i++] = y;

          the contents of `i' must be incremented before the assignment.

          A less stringent alternative might be to say that when a built-in operator yields an lvalue, the implementation shall not subsequently change the value of that object as a consequence of that operator.

          I find it hard to imagine an implementation that does not do this already. Am I wrong? Is there any implementation out there that does not `do the right thing' already for (a += b) += c?

          5.17 expr.ass paragraph 1 says,

              The result of the assignment operation is the value stored in the left operand after the assignment has taken place; the result is an lvalue.

          What is the normative effect of the words "after the assignment has taken place"? I think that phrase ought to mean that in addition to whatever constraints the rules about sequence points might impose on the implementation, assignment operators on built-in types have the additional constraint that they must store the left-hand side's new value before returning a reference to that object as their result.

          One could argue that as the C++ standard currently stands, the effect of x = y = 0; is undefined. The reason is that it both fetches and stores the value of y, and does not fetch the value of y in order to compute its new value.

          I'm suggesting that the phrase "after the assignment has taken place" should be read as constraining the implementation to set y to 0 before yielding the value of y as the result of the subexpression y = 0.

          Note that this suggestion is different from asking that there be a sequence point after evaluation of an assignment. In particular, I am not suggesting that an order constraint be imposed on any side effects other than the assignment itself.

          Francis Glassborow:

          My understanding is that for a single variable:

          1) Multiple read accesses without a write are OK
          2) A single read access followed by a single write (of a value dependant on the read, so that the read MUST happen first) is OK
          3) A write followed by an actual read is undefined behaviour
          4) Multiple writes have undefined behaviour

          It is the 3) that is often ignored because in practice the compiler hardly ever codes for the read because it already has that value but in complicated evaluations with a shortage of registers, that is not always the case. Without getting too close to the hardware, I think we both know that a read too close to a write can be problematical on some hardware.

          So, in x = y = 0;, the implementation must NOT fetch a value from y, instead it has to "know" what that value will be (easy because it has just computed that in order to know what it must, at some time, store in y). From this I deduce that computing the lvalue (to know where to store) and the rvalue to know what is stored are two entirely independent actions that can occur in any order commensurate with the overall requirements that both operands for an operator be evaluated before the operator is.

          Erwin Unruh:

          C distinguishes between the resulting value of an assignment and putting the value in store. So in C a compiler might implement the statement x=y=0; either as x=0;y=0; or as y=0;x=0; In C the statement (x += 5) += 7; is not allowed because the first += yields an rvalue which is not allowed as left operand to +=. So in C an assignment is not a sequence of write/read because the result is not really "read".

          In C++ we decided to make the result of assignment an lvalue. In this case we do not have the option to specify the "value" of the result. That is just the variable itself (or its address in a different view). So in C++, strictly speaking, the statement x=y=0; must be implemented as y=0;x=y; which makes a big difference if y is declared volatile.

          Furthermore, I think undefined behaviour should not be the result of a single mentioning of a variable within an expression. So the statement (x +=5) += 7; should NOT have undefined behaviour.

          In my view the semantics could be:

          - if the result of an assignment is used as an rvalue, its value is that of the variable after assignment. The actual store takes place before the next sequence point, but may be before the value is used. This is consistent with C usage.
          - if the result of an assignment is used as an lvalue to store another value, then the new value will be stored in the variable before the next sequence point. It is unspecified whether the first assigned value is stored intermediately.
          - if the result of an assignment is used as an lvalue to take an address, that address is given (it doesn't change). The actual store of the new value takes place before the next sequence point.

          Jerry Schwarz:

          My recollection is different from Erwin's. I am confident that the intention when we decided to make assignments lvalues was not to change the semantics of evaluation of assignments. The semantics was supposed to remain the same as C's.

          Erwin seems to assume that because assignments are lvalues, an assignment's value must be determined by a read of the location. But that was definitely not our intention. As he notes this has a significant impact on the semantics of assignment to a volatile variable. If Erwin's interpretation were correct we would have no way to write a volatile variable without also reading it.

          Lawrence Crowl:

          For x=y=0, lvalue semantics implies an lvalue to rvalue conversion on the result of y=0, which in turn implies a read. If y is volatile, lvalue semantics implies both a read and a write on y.

          The standard apparently doesn't state whether there is a value dependence of the lvalue result on the completion of the assignment. Such a statement in the standard would solve the non-volatile C compatibility issue, and would be consistent with a user-implemented operator=.

          Another possible approach is to state that primitive assignment operators have two results, an lvalue and a corresponding "after-store" rvalue. The rvalue result would be used when an rvalue is required, while the lvalue result would be used when an lvalue is required. However, this semantics is unsupportable for user-defined assignment operators, or at least inconsistent with all implementations that I know of. I would not enjoy trying to write such two-faced semantics.

          Erwin Unruh:

          The intent was for assignments to behave the same as in C. Unfortunately the change of the result to lvalue did not keep that. An "lvalue of type int" has no "int" value! So there is a difference between intent and the standard's wording.

          So we have one of several choices:

          - live with the incompatibility (and the problems it has for volatile variables)
          - make the result of assignment an rvalue (only builtin-assignment, maybe only for builtin types), which makes some presently valid programs invalid
          - introduce "two-face semantics" for builtin assignments, and clarify the sequence problematics
          - make a special rule for assignment to a volatile lvalue of builtin type

          I think the last one has the least impact on existing programs, but it is an ugly solution.

          Andrew Koenig:

          Whatever we may have intended, I do not think that there is any clean way of making

              volatile int v;
              int i;

              i = v = 42;

          have the same semantics in C++ as it does in C. Like it or not, the subexpression v = 42 has the type ``reference to volatile int,'' so if this statement has any meaning at all, the meaning must be to store 42 in v and then fetch the value of v to assign it to i.

          Indeed, if v is volatile, I cannot imagine a conscientious programmer writing a statement such as this one. Instead, I would expect to see

              v = 42;
              i = v;

          if the intent is to store 42 in v and then fetch the (possibly changed) value of v, or

              v = 42;
              i = 42;

          if the intent is to store 42 in both v and i.

          What I do want is to ensure that expressions such as ``i = v = 42'' have well-defined semantics, as well as expressions such as (i = v) = 42 or, more realistically, (i += v) += 42 .

          I wonder if the following resolution is sufficient:

          Append to 5.17 expr.ass paragraph 1:

              There is a sequence point between assigning the new value to the left operand and yielding the result of the assignment expression.

          I believe that this proposal achieves my desired effect of not constraining when j is incremented in x[j++] = y, because I don't think there is a constraint on the relative order of incrementing j and executing the assignment. However, I do think it allows expressions such as (i += v) += 42, although with different semantics from C if v is volatile.

          Notes on 10/01 meeting:

          There was agreement that adding a sequence point is probably the right solution.

          Notes from the 4/02 meeting:

          The working group reaffirmed the sequence-point solution, but we will look for any counter-examples where efficiency would be harmed.

          For drafting, we note that ++x is defined in 5.3.2 expr.pre.incr as equivalent to x+=1 and is therefore affected by this change. x++ is not affected. Also, we should update any list of all sequence points.

          Notes from October 2004 meeting:

          Discussion centered around whether a sequence point “between assigning the new value to the left operand and yielding the result of the expression” would require completion of all side effects of the operand expressions before the value of the assignment expression was used in another expression. The consensus opinion was that it would, that this is the definition of a sequence point. Jason Merrill pointed out that adding a sequence point after the assignment is essentially the same as rewriting

              b += a

          as

              b += a, b

          Clark Nelson expressed a desire for something like a “weak” sequence point that would force the assignment to occur but that would leave the side effects of the operands unconstrained. In support of this position, he cited the following expression:

              j = (i = j++)

          With the proposed addition of a full sequence point after the assignment to i, the net effect is no change to j. However, both g++ and MSVC++ behave differently: if the previous value of j is 5, the value of the expression is 5 but j gets the value 6.

          Clark Nelson will investigate alternative approaches and report back to the working group.
