小明思考

          Just a software engineer

          The Strange MySQL latin1 Encoding

          Posted on 2012-02-24 14:54 by 小明  Category: Development Log

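          MySQL's latin1 is not the same as standard latin1 (ISO-8859-1), nor is it exactly cp1252. Compared with ISO-8859-1 it adds the characters in 0x80-0x9f; compared with cp1252 it adds 0x81, 0x8d, 0x8f, 0x90 and 0x9d, five characters in total.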

           

          http://dev.mysql.com/doc/refman/5.0/en/charset-we-sets.html

          latin1 is the default character set. MySQL's latin1 is the same as the Windows cp1252 character set. This means it is the same as the official ISO 8859-1 or IANA (Internet Assigned Numbers Authority) latin1, except that IANA latin1 treats the code points between 0x80 and 0x9f as “undefined,” whereas cp1252, and therefore MySQL's latin1, assign characters for those positions. For example, 0x80 is the Euro sign. For the “undefined” entries in cp1252, MySQL translates 0x81 to Unicode 0x0081, 0x8d to 0x008d, 0x8f to 0x008f, 0x90 to 0x0090, and 0x9d to 0x009d.

          As a result, in Java, converting with the standard iso-8859-1 or cp1252 charset can produce garbled text, for example:
          s.getBytes("iso-8859-1") or s.getBytes("cp1252");
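To make the mismatch concrete, here is a minimal sketch that encodes two sample characters with both charsets. The class name `Latin1Mismatch` and the helper `firstByte` are made up for illustration. It assumes the JDK's windows-1252 charset leaves the five positions listed above unmapped, so it prints whatever the running JVM actually produces rather than asserting a value for that case.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1Mismatch {
    // Hypothetical helper: encode a one-character string and return
    // its first byte as an unsigned value.
    static int firstByte(String s, Charset cs) {
        return s.getBytes(cs)[0] & 0xFF;
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String euro = "\u20ac";   // Euro sign, byte 0x80 in cp1252
        String c81 = "\u0081";    // what MySQL latin1 returns for byte 0x81

        // ISO-8859-1 has no Euro sign, so the encoder substitutes '?'.
        System.out.printf("euro   via iso-8859-1: 0x%02x%n",
                firstByte(euro, StandardCharsets.ISO_8859_1));
        System.out.printf("euro   via cp1252:     0x%02x%n",
                firstByte(euro, cp1252));

        // ISO-8859-1 maps U+0081 straight to 0x81. For cp1252 the
        // position is assumed to be unmapped, so just print the result.
        System.out.printf("U+0081 via iso-8859-1: 0x%02x%n",
                firstByte(c81, StandardCharsets.ISO_8859_1));
        System.out.printf("U+0081 via cp1252:     0x%02x%n",
                firstByte(c81, cp1252));
    }
}
```

Neither charset alone gets both characters right, which is why a mixed mapping is needed.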

          I wrote a bit of code to work around this problem:
          private String convertCharset(String s) {
              if (s != null) {
                  try {
                      int length = s.length();
                      byte[] buffer = new byte[length];
                      // 0x81 to Unicode 0x0081, 0x8d to 0x008d, 0x8f to 0x008f, 0x90 to 0x0090, and 0x9d to 0x009d.
                      for (int i = 0; i < length; ++i) {
                          char c = s.charAt(i);
                          if (c == 0x0081) {
                              buffer[i] = (byte) 0x81;
                          } else if (c == 0x008d) {
                              buffer[i] = (byte) 0x8d;
                          } else if (c == 0x008f) {
                              buffer[i] = (byte) 0x8f;
                          } else if (c == 0x0090) {
                              buffer[i] = (byte) 0x90;
                          } else if (c == 0x009d) {
                              buffer[i] = (byte) 0x9d;
                          } else {
                              buffer[i] = Character.toString(c).getBytes("cp1252")[0];
                          }
                      }
                      String result = new String(buffer, "utf-8");
                      return result;
                  } catch (UnsupportedEncodingException e) {
                      logger.error("charset convert error", e);
                  }
              }
              return null;
          }
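A self-contained round trip may help show the method in use. The sketch below is only an illustration: the class `MysqlLatin1RoundTrip` and the helpers `readAsMysqlLatin1`, `toMysqlLatin1Byte` and `recoverUtf8` are hypothetical names, and `readAsMysqlLatin1` merely simulates what a latin1 connection would hand back for UTF-8 bytes, following the mapping quoted from the MySQL manual. The sample text is U+4E01 followed by the Euro sign, whose UTF-8 bytes are E4 B8 81 and E2 82 AC, so it exercises both the 0x81 special case and an ordinary cp1252 character (0x82).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MysqlLatin1RoundTrip {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // True for the five bytes cp1252 leaves undefined and MySQL
    // maps to the Unicode code point with the same value.
    static boolean isSpecial(int v) {
        return v == 0x81 || v == 0x8d || v == 0x8f || v == 0x90 || v == 0x9d;
    }

    // Simulation (assumption): decode raw bytes the way MySQL's latin1 does.
    static String readAsMysqlLatin1(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            int v = b & 0xFF;
            if (isSpecial(v)) {
                sb.append((char) v);
            } else {
                sb.append(new String(new byte[] {b}, CP1252));
            }
        }
        return sb.toString();
    }

    // Map one char back to the byte MySQL's latin1 would have stored.
    static byte toMysqlLatin1Byte(char c) {
        if (isSpecial(c)) {
            return (byte) c;
        }
        return String.valueOf(c).getBytes(CP1252)[0];
    }

    // Same idea as convertCharset: rebuild the bytes, then decode as UTF-8.
    static String recoverUtf8(String s) {
        byte[] buffer = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            buffer[i] = toMysqlLatin1Byte(s.charAt(i));
        }
        return new String(buffer, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e01\u20ac";
        String mangled = readAsMysqlLatin1(original.getBytes(StandardCharsets.UTF_8));
        String recovered = recoverUtf8(mangled);
        System.out.println("mangled length: " + mangled.length());
        System.out.println("round trip ok:  " + original.equals(recovered));
    }
}
```

Using `Charset` objects instead of charset names avoids the checked `UnsupportedEncodingException`, so this variant has no error path that returns null.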