emu in blogjava

            BlogJava :: 首頁 :: 新隨筆 :: 聯(lián)系 :: 聚合  :: 管理 ::
            171 隨筆 :: 103 文章 :: 1052 評論 :: 2 Trackbacks
          <2025年7月>
          293012345
          6789101112
          13141516171819
          20212223242526
          272829303112
          3456789

          公告

          常用鏈接

          留言簿(92)

          隨筆分類(20)

          隨筆檔案(171)

          文章分類(89)

          文章檔案(103)

          相冊

          收藏夾(46)

          友情連接

          收藏

          搜索

          積分與排名

          最新評論

          閱讀排行榜

          評論排行榜

          Problem Statement
          ????
          A cancer screening protocol is given data about a cell and gives the cell a score. The protocol also specifies a threshold value. The protocol classifies cells with scores less than the threshold as non-cancerous, and cells with scores greater than or equal to the threshold as cancerous.
          The effectiveness of a protocol is calculated by applying it to data about known cells from a test bank. The "squared_error" for a cell is 0 if the protocol successfully classifies it. If a cell is misclassified, its squared_error is calculated as the square of the difference between the threshold and the cell's score. The "mean_squared_error" for the protocol is the average of the squared_errors from all the cells.
          We want to choose a threshold that will minimize the mean_squared_error for our protocol. Create a class MSQErr that contains a method minErr that is given a int[] scores giving the scores calculated by the protocol for the test cells, and a string cancerous that indicates for each test cell whether it is cancerous ('C') or non-cancerous ('N'). The method chooses an optimal threshold and returns the resulting minimum mean_squared_error.
          The i-th character of cancerous corresponds to the i-th element of scores.
          Definition
          ????
          Class:
          MSQErr
          Method:
          minErr
          Parameters:
          int[], string
          Returns:
          double
          Method signature:
          double minErr(int[] score, string cancerous)
          (be sure your method is public)
          ????

          Notes
          -
          The returned value must be accurate to within a relative or absolute value of 1E-9
          Constraints
          -
          cancerous will contain between 2 and 50 characters inclusive.
          -
          The number of elements in scores will equal the number of characters in cancerous.
          -
          Each character in cancerous will be 'N' or 'C'.
          -
          Each value in scores will be between -1,000 and 1,000 inclusive.
          Examples
          0)

          ????
          {3,3,1,8}
          "NNNC"
          Returns: 0.0
          By choosing a threshold of 5, this protocol classifies all of the test cells correctly.
          1)

          ????
          {5,2,3,6}
          "CCNC"
          Returns: 0.125
          By choosing the threshold at 2.5, the cells with scores of 2 and 3 will both be misclassified. This will result in a mean_squared_error of (0 + (2.5-2)^2 + (3-2.5)^2 + 0)/4 = 0.125. This is the only threshold that will give such a low mean_squared_error.
          2)

          ????
          {5,2,3,6,2}
          "CCNCN"
          Returns: 0.1
          The same threshold is best, but now the sum of the squared errors is divided by 5 instead of 4.
          3)

          ????
          {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}
          "NNNCNNNCNNNCNCCCCCCC"
          Returns: 2.34

           
          This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2003, TopCoder, Inc. All rights reserved.

          posted on 2005-08-16 09:28 emu 閱讀(1264) 評論(3)  編輯  收藏 所屬分類: google編程大賽模擬題及入圍賽真題

          評論

          # emu的解法 2005-08-16 09:39 emu
          我估計均方差是一條類似拋物線的曲線(有一個頂點或者一個平的頂部),用二分法收斂到曲線的頂點上。但是由于計算機(jī)二進(jìn)制計算精度限制的原因,實際收斂到的地方可能和頂點有微小的偏差(比如0.1,0.2,0.3,0.4用二進(jìn)制表示都是循環(huán)小數(shù),因此用二分法如果不做特別處理就收斂不到這些點上)。


          public class MSQErr {
          public static void main(String[] args) {
          MSQErr err = new MSQErr();
          assertEquals(err.minErr(new int[] {3, 3, 1, 8}, "NNNC"), 0);
          assertEquals(err.minErr(new int[] {5, 2, 3, 6}, "CCNC"), 0.125);
          assertEquals(err.minErr(new int[] {5, 2, 3, 6, 2}, "CCNCN"), 0.1);
          assertEquals(err.minErr(new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
          12, 13, 14, 15, 16, 17, 18, 19, 20},
          "NNNCNNNCNNNCNCCCCCCC"), 2.34);
          System.out.println("all pass");
          }

          private static void assertEquals(double a, double b) {
          if (a != b)
          throw new RuntimeException("assert failed!! Expected " + b +
          " but got " + a);
          }

          public double minErr(int[] score, String cancerous) {
          double max = 0, min = 1000;
          for (int i = 0; i < score.length; i++) {
          int s = score[ i ];
          if (s > max)
          max = s;
          if (s < min)
          min = s;
          }
          double mid = (max + min) / 2;
          double d1 = 1, d2 = 2;
          while (d1 != d2) {
          d1 = getErr(score, cancerous, mid);
          d2 = getErr(score, cancerous, mid + (max - min) / 1000000000);
          if (d1 > d2)
          min = mid;
          else
          max = mid;
          mid = (max + min) / 2;
          }
          return Math.floor(d1 * 1000000000) / 1000000000;
          }

          public double getErr(int[] score, String cancerous, double threshold) {
          double result = 0;
          for (int i = 0; i < score.length; i++) {
          double d = score[ i ] - threshold;
          char c = cancerous.charAt(i);
          if (d > 0 && c == 'N' || d < 0 && c == 'C') {
          result += d * d;
          }
          }
          return result / score.length;
          }
          }



          對它的正確性沒有充分驗證過,因為
          d2 = getErr(score, cancerous, mid + (max - min) / 1000000000);
          我只是想知道向x軸正方向走一小步看看拋物線的走向是向上還是向下。如果這一小步恰好跨過了拋物線頂點的話對方向的判斷有可能就是錯的,不過這樣的事件發(fā)生的概率實在是太小了。
          參數(shù)驗證沒有做,如果給錯誤參數(shù)的話,有可能拋異常(cancerous串太短的話),也有可能忽略錯誤的數(shù)據(jù)(cancerous太長或者不完全由N和C構(gòu)成)。題目中沒有明確的提出應(yīng)該怎么處理這樣的數(shù)據(jù),由它去吧。
            回復(fù)  更多評論
            

          # 秋水無恨的javascript解法 2005-08-16 09:42 emu
          用了一些高深的數(shù)學(xué)原理,不過對“th+=e/n;”能否把閘值修正到曲線線的定點,我有一點懷疑。

          <SCRIPT LANGUAGE="JavaScript">
          <!--
          function minErr(s,c)
          {
          for(var th=2;th<50;++th)
          {
          var e = 0,n=0;
          for(var i=0;i<s.length;++i)
          {
          if(c.charAt(i )!=(s[i]>=th?"C":"N"))
          {
          ++n;
          e+=(s[i]-th);
          }
          }
          if(e<=0)
          {
          th+=e/n;
          break;
          }
          }
          e = 0;
          for(var i=0;i<s.length;++i)
          {
          if(c.charAt(i )!=(s[i]>=th?"C":"N"))
          {
          e+=(s[i]-th)*(s[i]-th);
          }
          }
          return e/s.length;
          }

          //-->
          </SCRIPT>
            回復(fù)  更多評論
            

          # re: MSQErr(賽前模擬題) 2006-07-23 19:10 清風(fēng)劍
          用求導(dǎo)的方法取得拋物線頂點可行嗎?  回復(fù)  更多評論
            

          主站蜘蛛池模板: 习水县| 顺义区| 普兰店市| 壤塘县| 克拉玛依市| 城固县| 金寨县| 南充市| 北安市| 四会市| 莫力| 前郭尔| 东乌| 通州区| 英吉沙县| 城市| 河北省| 景宁| 历史| 普定县| 邵东县| 鹰潭市| 门源| 公安县| 肇源县| 波密县| 义乌市| 隆林| 浦北县| 滦平县| 万山特区| 梅河口市| 大庆市| 郓城县| 汉阴县| 灵台县| 九台市| 锡林郭勒盟| 东兰县| 华坪县| 织金县|