Material I found online about UTF-8 and GBK conversion

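Chinese character encoding is always a headache when programming. While working overtime I ran into a case where UTF-8 had to be converted to GBK:

1. Question: why can't new String(str.getBytes("UTF-8"), "GBK") convert UTF-8 to GBK?

2. I thought of a rather twisted way to do the conversion :)

URLDecoder.decode(URLEncoder.encode(str, "gbk"), "gbk"),

where str is a UTF-8 String; the result is converted to GBK. Heh, quite interesting. (Strictly speaking, a Java String carries no encoding of its own, so this round trip returns an equal String for any text that GBK can represent; a charset only comes into play when the String is turned into bytes.)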
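To make the question in item 1 concrete, here is a minimal sketch of the difference between transcoding bytes and re-labelling them. The class and method names (Utf8ToGbkDemo, utf8BytesToGbkBytes, misdecode) are made up for illustration, and the sketch assumes the running JRE ships the GBK charset, which standard JDK builds do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Utf8ToGbkDemo {
    // Assumption: the GBK charset is available in this JRE.
    static final Charset GBK = Charset.forName("GBK");

    // Transcoding: decode with the source charset, then encode with the target charset.
    static byte[] utf8BytesToGbkBytes(byte[] utf8) {
        String text = new String(utf8, StandardCharsets.UTF_8);
        return text.getBytes(GBK);
    }

    // The pattern from the question: the UTF-8 bytes are kept as they are
    // and merely read back as if they were GBK, which yields different characters.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), GBK);
    }

    public static void main(String[] args) {
        String s = "\u4e2d\u56fd"; // the two characters of the word for "China"
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] gbk = utf8BytesToGbkBytes(utf8);
        System.out.println(utf8.length + " UTF-8 bytes -> " + gbk.length + " GBK bytes");
        System.out.println("misdecode keeps the text intact: " + misdecode(s).equals(s));
    }
}
```

The literals use \u escapes so that the example does not depend on the encoding of the source file itself.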

**************************************************************************

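In a URL, multi-byte characters are converted to the application/x-www-form-urlencoded MIME format, so writing your own conversion routine will not help. Use the URLDecoder class to decode that format as UTF-8 first; the resulting String can then be encoded as GBK:

String decoded = URLDecoder.decode(s, "utf-8");   // percent-encoded text -> Unicode String
byte[] gbkBytes = decoded.getBytes("GBK");        // Unicode String -> GBK bytes
System.out.println("反對" + decoded);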
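A small sketch of the decoding step described above, using only java.net.URLDecoder and String.getBytes. The helper names (UrlDecodeDemo, decodeForm, encodeAs) and the sample input are assumptions made for illustration; the checked UnsupportedEncodingException is wrapped so the helpers are easy to call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class UrlDecodeDemo {
    // Decodes application/x-www-form-urlencoded text using the named charset.
    static String decodeForm(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    // Encodes a String into bytes of the named charset.
    static byte[] encodeAs(String text, String charsetName) {
        try {
            return text.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String s = "%E4%B8%AD%E5%9B%BD";          // two Chinese characters, percent-encoded as UTF-8
        String text = decodeForm(s, "UTF-8");     // now an ordinary Unicode String
        byte[] gbk = encodeAs(text, "GBK");       // GBK bytes, ready to be written out
        System.out.println(text.length() + " chars, " + gbk.length + " GBK bytes");
    }
}
```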

**************************************************************************************************

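Converting a UTF-8 version to a GBK version

Audience: people who migrated from a UTF-8 product such as MOLYX, or who installed the UTF-8 version and regret it
Difficulty: easy
Tutorial by: 夢遲教程 (jiaocheng.org)

The method is as follows:

I built a forum for 客服之家 (kefu.net.cn), initially on molyx2.5. After a while I found I was not comfortable with it and decided to switch software. On the official DZ forum I found a converter to dz4, and the conversion went very smoothly. However, the conversion requires the UTF-8 version of the software, while many plugins and styles are in GBK and therefore could not be used. I remembered that when writing the pw4-to-molyx2.5 converter the database first had to be converted from GBK to UTF-8, so I wondered whether converting the UTF-8 data back to GBK and importing it into a GBK forum would work. I tried it and it also went smoothly. Enough talk; let's get to work.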

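Step 1. Export the DZ forum's UTF-8 database from the admin panel, download it to your local machine, and save it.

Step 2. Download the convertz tool. It can be downloaded from this site:

http://www.jiaocheng.org/soft/convertz802.zip

Step 3. Convert the format with convertz

1. Unzip convertz and run the ConvertZ.exe inside it.

2. Click the File button and follow the animated demonstration.

Step 4. Do a fresh install of the GBK edition of the DZ forum software and upload the converted file to the backup directory, for example the forumdata folder.

Step 5. Log in to the new system, import the converted database into the new forum from the database section of the admin panel, update the cache, and you are done.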

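Demo:

Original UTF-8 version

http://msvip.com.cn

After conversion to GBK

http://kefu.net.cn

After conversion there are basically no errors. A few characters are garbled, but nothing serious. Ask me if anything is unclear. Thanks for your support.

Original work by 夢遲教程 (jiaocheng.org); please credit the source when reposting.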

***********************************************************************************************

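Converting GBK Chinese characters to UTF-8 Chinese characters

Source: CSDN   Posted by: 新書城收集整理   Published: 2006-8-8   Views: 93

Recently I was writing a program that needed to convert text in various encodings into one unified encoding (for example, converting GBK-encoded Chinese characters to UTF-8-encoded ones). Most online articles about handling text in different encodings deal with fixing garbled Chinese characters. What I needed was something like the conversion functions in UltraEdit.

At first I tried things like
    new String(str.getBytes("GBK"), "UTF-8");
For encoding conversion, none of these approaches is correct. They are useful for repairing garbled display, but they do not correctly map a GBK character to the UTF-8 character with the same meaning.

As we all know, inside the JVM all strings are converted to Unicode for processing. When we read content from a GBK-encoded text file and write it to another, UTF-8-encoded text file, no garbling occurs. This suggests that the streams in Java IO can handle encoding conversion well. For convenience, the utilities provided by the Apache Commons-IO project can be used to write the code. Pass the same charset to IOUtils.toString: the one-argument overload decodes with the platform default charset, which garbles the text wherever that default is not UTF-8.
    /* gbkString is a String whose text came from GBK-encoded data */
    String utf8String = IOUtils.toString(IOUtils.toInputStream(gbkString, "UTF-8"), "UTF-8");
The bytes in the intermediate stream are UTF-8 encoded; utf8String itself, like every Java String, holds Unicode characters.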
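The observation about reading a GBK file and writing a UTF-8 file can be sketched with the JDK alone, without Commons IO. This is only an illustration under stated assumptions: the class and method names (StreamTranscode, transcode, gbkToUtf8) are invented here, and the JRE is assumed to provide the GBK charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StreamTranscode {
    // Reads characters decoded with 'from' and writes them encoded with 'to'.
    static void transcode(InputStream in, Charset from, OutputStream out, Charset to) throws IOException {
        Reader reader = new InputStreamReader(in, from);
        Writer writer = new OutputStreamWriter(out, to);
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush(); // push any buffered characters through the encoder
    }

    // Convenience wrapper for in-memory data; assumes the GBK charset is available.
    static byte[] gbkToUtf8(byte[] gbk) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            transcode(new ByteArrayInputStream(gbk), Charset.forName("GBK"), out, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

For files, the same transcode method could be called with a FileInputStream and a FileOutputStream. Because the Reader decodes across buffer boundaries, multi-byte characters are not split.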

Appendix: the relevant code in org.apache.commons.io.IOUtils is as follows:
    /**
     * Convert the specified string to an input stream, encoded as bytes
     * using the specified character encoding.
     * <p>
     * Character encoding names can be found at
     * <a>IANA</a>.
     *
     * @param input the string to convert
     * @param encoding the encoding to use, null means platform default
     * @throws IOException if the encoding is invalid
     * @return an input stream
     * @since Commons IO 1.1
     */
    public static InputStream toInputStream(String input, String encoding) throws IOException {
        byte[] bytes = encoding != null ? input.getBytes(encoding) : input.getBytes();
        return new ByteArrayInputStream(bytes);
    }

**************************************************************************************************************

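Solving the Chinese character problem in Tomcat

In Tomcat 5 I found that the approach I had used for Tomcat 4 does not work for requests submitted directly through the URL. After searching online I finally found the most complete solution: there is no need to convert in every place, and both GET and POST work correctly. I wrote it up and am posting it here in the hope that people with the same problem will not have to suffer as I did :-)

Problem description:

1. For data submitted by a form, request.getParameter("xxx") returns garbled text or ??
2. For a GET request made directly through a URL such as http://localhost/a.jsp?name=中國, request.getParameter("name") on the server returns garbled text. Setting up a Filter as in Tomcat 4 does not help, and neither does request.setCharacterEncoding("GBK").

Cause:

1. Tomcat's J2EE implementation processes the parameters of form submissions (POST) using the default ISO-8859-1.
2. Tomcat handles the query string of GET requests differently from POST requests (and differently from Tomcat 4), so setCharacterEncoding("gbk") has no effect.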
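The first cause can be reproduced outside a servlet container. The sketch below simulates GBK request bytes being decoded as ISO-8859-1 and then shows the per-parameter repair, the kind of conversion "in every place" that the filter approach is meant to avoid. The class and method names are hypothetical, and the GBK charset is assumed to be available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Latin1Recovery {
    // Simulates a container that decodes the raw request bytes as ISO-8859-1.
    static String decodeAsLatin1(byte[] rawRequestBytes) {
        return new String(rawRequestBytes, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps every byte value to exactly one char, so encoding the garbled
    // String back to ISO-8859-1 restores the original bytes, which can then be
    // decoded with the charset the browser really used.
    static String recover(String garbled, Charset actual) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");  // assumption: GBK is available
        String original = "\u4e2d\u56fd";      // two Chinese characters
        String garbled = decodeAsLatin1(original.getBytes(gbk));
        System.out.println("garbled equals original: " + garbled.equals(original));
        System.out.println("recovered equals original: " + recover(garbled, gbk).equals(original));
    }
}
```

This repair only works because ISO-8859-1 decoding is lossless; it does not help once bytes have been replaced by "?" during an earlier conversion.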

Solution:

First, add a page directive that declares the charset to every JSP file, as the test page at the end of this section does with <%@ page contentType="text/html;charset=gb2312"%>.

1. Implement a Filter that sets the request character set to GBK. (Tomcat's webapps/servlets-examples directory contains a complete example; see its web.xml and the configuration of SetCharacterEncodingFilter.)

1) Copy %TOMCAT install directory%/webapps/servlets-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.class into the WEB-INF/classes/filters directory of your webapp. If there is no filters directory, create one.
2) Add the following lines to your web.xml:

    <filter>
        <filter-name>Set Character Encoding</filter-name>
        <filter-class>filters.SetCharacterEncodingFilter</filter-class>
        <init-param>
            <param-name>encoding</param-name>
            <param-value>GBK</param-value>
        </init-param>
    </filter>
    <filter-mapping>
        <filter-name>Set Character Encoding</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

3) Done.

2. Solution for GET requests

1) Open Tomcat's server.xml, find the <Connector> block, and add the following attribute: URIEncoding="GBK"

The complete element should look like this:

<Connector port="80" maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" redirectPort="8443" acceptCount="100"
               debug="0" connectionTimeout="20000"
               disableUploadTimeout="true"
               URIEncoding="GBK"/>

2) Restart Tomcat and everything works.

Run the following JSP page (tcnchar.jsp) to test whether it worked:

<%@ page contentType="text/html;charset=gb2312"%>
<%@ page import="java.util.*"%>
<%
    String q = request.getParameter("q");
    q = q == null ? "沒有值" : q;
%>
<HTML>
<HEAD>
<TITLE>新聞列表顯示</TITLE>
<META http-equiv=Content-Type content="text/html; charset=gb2312">
<META http-equiv=pragma content=no-cache>
</HEAD>
<body>你提交了：<%=q%><br>
<form action="tcnchar.jsp" method="post">
輸入中文:<input type="text" name="q"><input type="submit" value="確定"><br>
<a href="tcnchar.jsp?q=中國">通過get方式提交</a>
</form>
</BODY>
</HTML>

*******************************************************************************************

http://jakarta.apache.org/commons/httpclient/methods/post.html

*********************************************************************************************

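A question about converting UTF-8 to GBK, thanks

Goal: download content from a web server, delivered as UTF-8, and output it converted to GBK

Problem: only some Chinese characters can be converted to GBK; those that cannot are output as "?". For example, "我" converts correctly but "道" does not. Please help me work out what the problem is.

How I approached the problem:

   1. First I studied the UTF-8 encoding scheme, converted the string read into the buffer to byte data, and printed it in hexadecimal. The encodings of the characters that fail to convert were abnormal. For example, the correct UTF-8 encoding of "载" is E8BDBD, but the output was E8BD3F. I therefore suspected that the data was already wrong when it left the web server, but that made no sense because the browser displayed it correctly.
   2. I captured the packets with a sniffer and analyzed them; the encoding inside the packets was fine too... Could the problem lie in how Java reads the data stream?
   3. I changed the way the stream is read, from reading Strings to reading bytes (much more cumbersome than reading Strings), and printed the result... everything was fine.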
[code]
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class FetchUtf8 {
    public static void main(String[] args) {
        String urlstring = "http://**.com";

        try {
            URL url = new URL(urlstring);
            URLConnection conn = url.openConnection();
            InputStream in = conn.getInputStream();
            byte[] tempbuff = new byte[100];                          // temporary chunk buffer
            ByteArrayOutputStream buff = new ByteArrayOutputStream(); // grows as needed, unlike a fixed-size array
            int rbyte;                                                // number of bytes read per call

            // Collect all raw bytes first. Decoding each chunk separately
            // can split a multi-byte character at a chunk boundary.
            while ((rbyte = in.read(tempbuff)) != -1) {
                buff.write(tempbuff, 0, rbyte);
            }
            in.close();

            // Decode the complete byte array as UTF-8 in one step.
            String output = new String(buff.toByteArray(), "UTF-8");

            System.out.println(output);
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
[/code]
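One way this kind of damage can arise is decoding the stream chunk by chunk, so that a three-byte UTF-8 character is cut in half at a buffer boundary. The sketch below reproduces that effect in memory. It is an illustration of the general pitfall, not a diagnosis of the exact code used in the question, and the class and method names are invented.

```java
import java.nio.charset.StandardCharsets;

final class ChunkBoundaryDemo {
    // Decodes every chunk on its own. A character whose bytes straddle
    // a chunk boundary cannot be decoded and is replaced.
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Decodes the complete byte array in one step.
    static String decodeWhole(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u4e0b\u8f7d"; // two characters, three UTF-8 bytes each
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("whole array intact: " + decodeWhole(data).equals(text));
        System.out.println("4-byte chunks intact: " + decodePerChunk(data, 4).equals(text));
    }
}
```

With a chunk size of 4 the second character is split after its first byte, so the per-chunk result contains replacement characters instead of the original text.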

**********************************************************************************


A method that builds a Unicode string from a UTF-8 byte array.

The code below converts UTF-8 to Unicode.

import java.io.IOException;
import java.io.UTFDataFormatException;

// The class name Utf8Reader is illustrative; the original shows only the method.
public final class Utf8Reader {

    /**
     * Builds a Unicode string from a UTF-8 encoded byte array.
     * @param      data  the UTF-8 encoded byte array
     * @return     the Unicode string
     * @exception  IOException             if an I/O error occurs
     * @exception  UTFDataFormatException  if the byte array is not UTF-8
     */
    public final static String readUTF(byte[] data) throws IOException {
        int utflen = data.length;
        StringBuilder str = new StringBuilder(utflen);
        byte bytearr[] = data;
        int c, char2, char3;
        int count = 0;

        while (count < utflen) {
            c = (int) bytearr[count] & 0xff;
            switch (c >> 4) {
                case 0:
                case 1:
                case 2:
                case 3:
                case 4:
                case 5:
                case 6:
                case 7:
                    /* 0xxxxxxx */
                    count++;
                    str.append((char) c);
                    break;
                case 12:
                case 13:
                    /* 110x xxxx   10xx xxxx */
                    count += 2;
                    if (count > utflen) {
                        throw new UTFDataFormatException(
                            "UTF Data Format Exception");
                    }
                    char2 = (int) bytearr[count - 1];
                    if ((char2 & 0xC0) != 0x80) {
                        throw new UTFDataFormatException();
                    }
                    str.append((char) (((c & 0x1F) << 6) | (char2 & 0x3F)));
                    break;
                case 14:
                    /* 1110 xxxx  10xx xxxx  10xx xxxx */
                    count += 3;
                    if (count > utflen) {
                        throw new UTFDataFormatException(
                            "UTF Data Format Exception");
                    }
                    char2 = (int) bytearr[count - 2];
                    char3 = (int) bytearr[count - 1];
                    if (((char2 & 0xC0) != 0x80) || ((char3 & 0xC0) != 0x80)) {
                        throw new UTFDataFormatException();
                    }
                    str.append((char) (((c & 0x0F) << 12)
                        | ((char2 & 0x3F) << 6) | ((char3 & 0x3F) << 0)));
                    break;
                default:
                    /* 10xx xxxx,  1111 xxxx: stray continuation bytes and 4-byte sequences are rejected */
                    throw new UTFDataFormatException(
                        "UTF Data Format Exception");
            }
        }
        // The number of chars produced may be less than utflen
        return str.toString();
    }
}
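For comparison, the JDK's own decoder can do the same job, including the strict rejection of malformed input that readUTF implements by hand. The following is a sketch rather than a drop-in replacement: the names StrictUtf8, decodeStrict and decodeOrNull are invented, and unlike readUTF the JDK decoder also accepts four-byte sequences.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8 {
    // Decodes UTF-8 and throws instead of silently substituting a replacement character.
    static String decodeStrict(byte[] data) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data))
                .toString();
    }

    // Convenience variant: returns null when the bytes are not valid UTF-8.
    static String decodeOrNull(byte[] data) {
        try {
            return decodeStrict(data);
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] good = {(byte) 0xE8, (byte) 0xBD, (byte) 0xBD}; // a valid three-byte sequence
        byte[] bad = {(byte) 0xE8, (byte) 0xBD, (byte) 0x3F};  // third byte is not a continuation byte
        System.out.println("good decodes: " + (decodeOrNull(good) != null));
        System.out.println("bad decodes: " + (decodeOrNull(bad) != null));
    }
}
```

The two sample arrays are the E8BDBD and E8BD3F byte sequences quoted in the forum question earlier in this post.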

************************************************************************************

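GB/BIG5/UTF-8 batch file encoding converter
September 12th, 2006

Yesterday I needed to change a GB-encoded web application to UTF-8. The whole application involves more than 300 ASP and HTML files... So I searched online for software that can batch-convert GB files to UTF-8. After much searching, most of what I found were VBS, JS, or PHP scripts that can only convert encodings on the fly inside a web page, not tools for converting large numbers of files.

Because time was short, I fell back on the most primitive method: opening the ASP files one by one in Windows Notepad and using "Save As..." to turn them into UTF-8. It was unbearably tedious... In the end it drove me crazy, so I went looking for software again and gave it everything I had!

I finally found this excellent GB/BIG5/UTF-8 batch file encoding converter. Having used it, I find it really quite good, so here is my recommendation!

The software is tiny, only 25 KB. I hope it helps web developers and other people who edit web pages.

Download: http://beebee.com.cn/jinnylife/wp-content/rar/gb2utf8.rar
Unzip password: http://beebee.com.cn/jinnylife/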
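For readers who would rather script the batch job than download a tool, the same task can be sketched in a few lines of Java. This is not how the recommended program works internally; it is an independent sketch with invented names (BatchTranscode, transcode, convertTree). It rewrites files in place, so it assumes a backup exists and that every matching file really is in the source charset.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class BatchTranscode {
    // Re-encodes one file's content: decode with 'from', encode with 'to'.
    static byte[] transcode(byte[] content, Charset from, Charset to) {
        return new String(content, from).getBytes(to);
    }

    // Rewrites, in place, every regular file under 'root' whose name ends with 'suffix'.
    // Returns the number of files converted. Running it twice on the same tree would
    // corrupt the files, because the second run would read UTF-8 data as 'from'.
    static int convertTree(Path root, String suffix, Charset from, Charset to) {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(suffix))
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (Path p : files) {
            try {
                Files.write(p, transcode(Files.readAllBytes(p), from, to));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return files.size();
    }
}
```

A call might look like BatchTranscode.convertTree(Paths.get("site"), ".asp", Charset.forName("GBK"), StandardCharsets.UTF_8), where "site" is a placeholder directory. Charset declarations inside the pages themselves, such as meta tags, still have to be updated separately.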

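posted on 2006-09-27 09:30 SIMONE Views(2567) Comments(1)

FeedBack:

# re: Material I found online about UTF-8 and GBK conversion

2009-07-13 13:35 | 阿陽

Blogger, I have a question and hope you will be kind enough to help.
Here it is: after reading your post I added commons-io-1.4.jar to my project and, just as your post describes (GBK to UTF-8), put String utf8String = IOUtils.toString(IOUtils.toInputStream(gbkString, "UTF-8")); into my controller. Strangely, garbled text like "用戶: una 變更的內容已保存??" then appeared on the page. Could you tell me why?
My environment: JDK1.4, tomcat4, Intellij5.1.2
I hope you can reply soon. Many thanks!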
