隨筆 - 312, 文章 - 14, 評論 - 1393, 引用 - 0
          數(shù)據(jù)加載中……

          《程序員》第9期智慧擂臺題目——高頻詞匯提取

          本文為原創(chuàng),如需轉(zhuǎn)載,請注明作者和出處,謝謝!

              面對浩瀚的信息海洋,找到想要的資源有時真的是不容易。在大量文字中搜索高頻詞匯是信息搜索和數(shù)據(jù)壓縮的共通課題。
          這次智慧擂臺請大家在一個比較龐大的英文文本中找出M個數(shù)量最多的短語(由N個單詞組成)。統(tǒng)一處理相同的文本文件,該文本只包含英文單詞、空格和回行符,比較誰的程序效率最高。

            程序輸入:M,N,文本文件路徑(M不超過20,N不超過8)
          程序輸出:高頻短語及其數(shù)量清單

          擂臺規(guī)則:提交符合以上要求的可執(zhí)行程序,語言招式不限,點到為止;
          我們將在統(tǒng)一的環(huán)境下對每位選手的作品進行公平的測試,
          比較出綜合用時最少的程序。

          源程序


          import java.io.*;
          import java.util.*;

          class tt
          {
              
          public String phrase;
              
          public int count;
          }

          public class searchphrase
          {

              
          private static LinkedHashMap phrase = new LinkedHashMap();
              
          static tt[] max_phrase;

              
          private static Vector SeparateString(String s)
              {
                  Vector v 
          = new Vector();
                  String temp 
          = "";
                  
          for (int i = 0; i < s.length(); i++)
                  {
                      
          if (s.charAt(i) != ' ')
                      {
                          temp 
          += s.charAt(i);
                      }
                      
          else
                      {
                          
          if (temp != "")
                              v.add(temp);
                          temp 
          = "";
                      }
                  }
                  
          if (temp != "")
                      v.add(temp);
                  
          return v;
              }

              
          private static void swap(int pos, int count, String phrase)
              {
                  
          int i;
                  
          if (max_phrase[pos - 1].count < count)
                  {
                      
          for (i = pos - 1; i > 0; i--)
                      {
                          
          if (max_phrase[i - 1].count > max_phrase[i].count)
                              
          break;
                      }
                      max_phrase[pos].count 
          = max_phrase[i].count;
                      max_phrase[pos].phrase 
          = max_phrase[i].phrase;
                      max_phrase[i].count 
          = count;
                      max_phrase[i].phrase 
          = phrase;
                  }

              }

              
          private static void adjust_max(int count, String phrase)
              {
                  
          int i, j;
                  
          if (count <= max_phrase[max_phrase.length - 1].count)
                      
          return;
                  
          for (i = max_phrase.length - 1; i >= 0; i--)
                  {
                      
          if (max_phrase[i].phrase.equals(phrase))
                      {
                          max_phrase[i].count 
          = count;
                          
          if (i > 0)
                          {
                              swap(i, count, phrase);
                          }
                          
          return;
                      }
                  }
                  max_phrase[max_phrase.length 
          - 1].count = count;
                  max_phrase[max_phrase.length 
          - 1].phrase = phrase;
                  
          if (i > 0)
                  {
                      swap(max_phrase.length 
          - 1, count, phrase);
                  }
              }

              
          private static void js(Vector v, int n)
              {
                  String s;
                  
          for (int i = 0; i < v.size() - n + 1; i++)
                  {
                      s 
          = "";
                      
          for (int j = i; j < i + n; j++)
                      {
                          s 
          += v.get(j) + " ";
                      }
                      
          int count = 1;
                      
          if (phrase.containsKey(s.hashCode()))
                      {
                          count 
          = Integer.parseInt(phrase.get(s.hashCode()).toString());
                          count
          ++;
                      }
                      phrase.put(s.hashCode(), count);
                      adjust_max(count, s);
                  }
              }

              
          public static void main(String[] args)
              {
                  
          try
                  {
                      
          long t;
                      
          int m, n;
                      String path;
                      m 
          = Integer.parseInt(args[0]);
                      n 
          = Integer.parseInt(args[1]);
                      path 
          = args[2];
                      max_phrase 
          = new tt[m];
                      
          for (int i = 0; i < m; i++)
                      {
                          max_phrase[i] 
          = new tt();
                          max_phrase[i].count 
          = 0;
                          max_phrase[i].phrase 
          = "";
                      }
                      t 
          = (new java.util.Date()).getTime();
                      java.io.FileReader fr 
          = new java.io.FileReader(path);
                      java.io.BufferedReader br 
          = new BufferedReader(fr);
                      String s;

                      Vector v 
          = null;

                      
          while ((s = br.readLine()) != null)
                      {
                          v 
          = SeparateString(s);
                          js(v, n);
                      }
                      
          for (int i = 0; i < m; i++)
                      {
                          System.out.println(max_phrase[i].phrase);
                          System.out.println(max_phrase[i].count);
                          System.out.println();
                      }
                      t 
          = (new java.util.Date()).getTime() - t;
                      System.out.print(t);
                      System.out.println(
          " ms");
                  }
                  
          catch (Exception e)
                  {
                      System.out.println(e.getMessage());
                  }

              }

          }

          測試結(jié)果1:m = 20 n = 8 

          under games played won drawn lost goals for
          71

          tabulated under games played won drawn lost goals
          70

          games played won drawn lost goals for against
          70

          May Xinhua Following are the results from the
          69

          played won drawn lost goals for against and
          59

          won drawn lost goals for against and points
          59

          Jan Xinhua Following are the results from the
          48

          Chinas economic efficiency indicators of the sector of
          39

          The industrial statistics include all stateowned enterprises and
          39

          industrial statistics include all stateowned enterprises and the
          39

          statistics include all stateowned enterprises and the nonstateowned
          39

          include all stateowned enterprises and the nonstateowned ones
          39

          all stateowned enterprises and the nonstateowned ones with
          39

          stateowned enterprises and the nonstateowned ones with annual
          39

          enterprises and the nonstateowned ones with annual sales
          39

          and the nonstateowned ones with annual sales income
          39

          Xinhua Chinas economic efficiency indicators of the sector
          39

          the nonstateowned ones with annual sales income over
          39

          nonstateowned ones with annual sales income over million
          39

          up percent over the same period last year
          35

          13594 ms


          測試結(jié)果2  m = 10 n = 5

          Xinhua Following are the results
          295

          May Xinhua Following are the
          209

          Following are the results from
          183

          are the results from the
          176

          April Xinhua Following are the
          141

          Jan Xinhua Following are the
          122

          billion yuan billion US dollars
          120

          won drawn lost goals for
          88

          played won drawn lost goals
          88

          Dec Xinhua Following are the
          87

          12437 ms

           
          以上源程序是采用的是最簡單的方法,誰有更好,效率更高的方法,請跟貼!!





          Android開發(fā)完全講義(第2版)(本書版權(quán)已輸出到臺灣)

          http://product.dangdang.com/product.aspx?product_id=22741502



          Android高薪之路:Android程序員面試寶典 http://book.360buy.com/10970314.html


          新浪微博:http://t.sina.com.cn/androidguy   昵稱:李寧_Lining

          posted on 2008-05-10 09:37 銀河使者 閱讀(1673) 評論(1)  編輯  收藏 所屬分類: javaalgorithm 原創(chuàng)

          評論

          # re: 《程序員》第9期智慧擂臺題目——高頻詞匯提取  回復  更多評論   

          恩,程序不錯,應該是最簡單的了
          2008-09-01 11:51 | andyluohao
          主站蜘蛛池模板: 通江县| 柏乡县| 静乐县| 龙州县| 曲阳县| 桃园县| 绥阳县| 宿州市| 远安县| 搜索| 乐陵市| 黔西县| 高陵县| 天镇县| 东宁县| 二连浩特市| 澄城县| 杭锦后旗| 大洼县| 万年县| 新巴尔虎右旗| 额尔古纳市| 东乡县| 杭锦后旗| 容城县| 柞水县| 桐柏县| 武宁县| 拉萨市| 安乡县| 伊金霍洛旗| 香港 | 尉犁县| 七台河市| 吴旗县| 安义县| 织金县| 囊谦县| 天台县| 镇平县| 盐亭县|