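Below is an example that downloads a web page and prints all of its HTML to the console, decoding the bytes with the charset support built into String.
One problem is that, when processing a page, you cannot know in advance which encoding it actually uses.
I expected urlcon.getContentEncoding() to return it, but for the CSDN page that call returns nothing useful. In fact, getContentEncoding() reports the Content-Encoding header (a transfer coding such as gzip), not the character set. When the server does declare a charset, it appears as the charset parameter of the Content-Type header, which urlcon.getContentType() returns.
My guess is that CSDN does not send the charset in that header and declares it only inside the page itself, for example in a Meta tag.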
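Before the full example, here is a side note on the encoding question. The following is a minimal sketch of how the charset could be pulled out of a Content-Type value such as the one returned by getContentType(). The class name CharsetSniffer, the method charsetFromContentType, and the fallback parameter are all hypothetical names chosen for illustration; they are not part of the JDK or of the original example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class CharsetSniffer {
    /**
     * Returns the charset named in a Content-Type value such as
     * "text/html; charset=utf-8", or the given fallback if the header
     * is missing, has no charset parameter, or names an unknown charset.
     */
    public static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Drop the "charset=" prefix and any optional quotes around the value.
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: fall back.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.ISO_8859_1; // assumed default, pick your own
        System.out.println(charsetFromContentType("text/html; charset=utf-8", fallback)); // prints UTF-8
        System.out.println(charsetFromContentType("text/html", fallback));                // prints ISO-8859-1
    }
}
```

With something like this, the result of charsetFromContentType(urlcon.getContentType(), fallback) could be passed to the String constructor instead of a hard-coded "UTF-8". Which fallback is sensible depends on the sites you fetch.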


package mynet;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Date;

public class URLDemo {
    public static void main(String[] args) {
        System.out.println("Starting");

        HttpURLConnection urlcon = null;
        try {
            URL url = new URL("http://www.csdn.net");
            // Use the public java.net class, not the internal sun.net.* one.
            urlcon = (HttpURLConnection) url.openConnection();
            System.out.println("the date is :" + new Date(urlcon.getDate()));
            System.out.println("content_type :" + urlcon.getContentType());

            // Read until end of stream. available() only reports the bytes that
            // can be read without blocking, so it is not a reliable end-of-data
            // test, and no sleep is needed: read() blocks until data arrives.
            try (InputStream in = urlcon.getInputStream()) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] b = new byte[8192];
                int n;
                while ((n = in.read(b)) != -1) {
                    buffer.write(b, 0, n);
                }
                // Decode once, after the whole page has arrived, so multi-byte
                // characters are never split across chunks. UTF-8 is assumed.
                String webpage = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
                System.out.println(webpage);
            }
        } catch (IOException e) {
            // MalformedURLException is a subclass of IOException.
            System.out.println("" + e);
        } finally {
            if (urlcon != null) {
                urlcon.disconnect();
            }
        }
    }
}
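
If the HTTP headers carry no charset, the page may still declare one in a Meta tag, as guessed above. Here is a rough sketch of how the first bytes of a page could be scanned for such a declaration. The class MetaCharsetSniffer, the method charsetFromMeta, and the regular expression are illustrative assumptions. This is not a full HTML parser, and it will not work for pages encoded in UTF-16 or other encodings that are not ASCII-compatible.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaCharsetSniffer {
    // Matches both <meta charset="gbk"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the charset name declared in a meta tag, or null if none is found. */
    public static String charsetFromMeta(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so this decoding never
        // fails and keeps the ASCII markup intact whatever the real encoding is.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        byte[] sample = ("<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=gb2312\"></head>")
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(charsetFromMeta(sample)); // prints gb2312
    }
}
```

In the downloader above, the bytes collected in buffer could be passed to charsetFromMeta before decoding. Any name it returns should still be checked with Charset.isSupported before use.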