Fisherman's Wharf


          While using Java's HttpURLConnection to download web pages, I found that requests to Google's site were rejected by Google.

          try
          {
              url = new URL(urlStr);
              httpConn = (HttpURLConnection) url.openConnection();
              HttpURLConnection.setFollowRedirects(true);
              // Note: no request headers are set at this point.

              // logger.info(httpConn.getResponseMessage());
              in = httpConn.getInputStream();
              out = new FileOutputStream(new File(outPath));

              // Copy the response body to the file one byte at a time.
              chByte = in.read();
              while (chByte != -1)
              {
                  out.write(chByte);
                  chByte = in.read();
              }
          }
          catch (MalformedURLException e)
          {
              e.printStackTrace();
          }
          catch (IOException e)
          {
              e.printStackTrace();
          }



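          After some research and digging through references, I found that the code above is missing some necessary request information. Setting the request properties in more detail fixes it: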

          httpConn.setRequestMethod("GET");
          httpConn.setRequestProperty("User-Agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");
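          To make the effect of these two lines concrete, here is a minimal sketch that prepares a connection and reads back the header that will be sent. The class name `UserAgentDemo` and the helper `openWithUserAgent` are illustrative names, not part of the original code, and `http://example.com/` is only a placeholder URL. Nothing is sent until the response is read, so the property can be inspected first.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class UserAgentDemo
{
    // The browser-like User-Agent string used in this post.
    static final String IE6_USER_AGENT =
            "Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)";

    // Hypothetical helper: prepares (but does not yet send) a GET request
    // that carries the given User-Agent header.
    static HttpURLConnection openWithUserAgent(String urlStr, String userAgent)
            throws IOException
    {
        URL url = new URL(urlStr);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    public static void main(String[] args) throws IOException
    {
        HttpURLConnection conn =
                openWithUserAgent("http://example.com/", IE6_USER_AGENT);
        // Reading the property back does not open a network connection.
        System.out.println(conn.getRequestProperty("User-Agent"));
        System.out.println(conn.getRequestMethod());
    }
}
```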

          The complete code is as follows:
          import java.io.File;
          import java.io.FileOutputStream;
          import java.io.IOException;
          import java.io.InputStream;
          import java.net.HttpURLConnection;
          import java.net.MalformedURLException;
          import java.net.URL;

          // The enclosing class is omitted, as in the original post.
          public static void DownLoadPages(String urlStr, String outPath)
          {
              int chByte = 0;
              URL url = null;
              HttpURLConnection httpConn = null;
              InputStream in = null;
              FileOutputStream out = null;

              try
              {
                  url = new URL(urlStr);
                  httpConn = (HttpURLConnection) url.openConnection();
                  HttpURLConnection.setFollowRedirects(true);
                  // Identify the request as a GET coming from a browser.
                  httpConn.setRequestMethod("GET");
                  httpConn.setRequestProperty("User-Agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");

                  // logger.info(httpConn.getResponseMessage());
                  in = httpConn.getInputStream();
                  out = new FileOutputStream(new File(outPath));

                  // Copy the response body to the file one byte at a time.
                  chByte = in.read();
                  while (chByte != -1)
                  {
                      out.write(chByte);
                      chByte = in.read();
                  }
              }
              catch (MalformedURLException e)
              {
                  e.printStackTrace();
              }
              catch (IOException e)
              {
                  e.printStackTrace();
              }
              finally
              {
                  // Any of these may still be null if an exception was thrown
                  // before it was assigned, so check before closing.
                  try
                  {
                      if (out != null) out.close();
                      if (in != null) in.close();
                  }
                  catch (IOException ex)
                  {
                      ex.printStackTrace();
                  }
                  if (httpConn != null) httpConn.disconnect();
              }
          }
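          The byte-by-byte loop above works, but it can be slow for large pages. Below is a minimal sketch of a buffered alternative, assuming Java 7 or later for try-with-resources. `BufferedDownload`, `copy`, and `download` are illustrative names, and the 8 KB buffer size is an arbitrary choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class BufferedDownload
{
    // Copies everything from in to out in 8 KB chunks; returns bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException
    {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1)
        {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Sends the same request headers as DownLoadPages, but lets
    // try-with-resources close both streams automatically.
    static void download(String urlStr, String outPath) throws IOException
    {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(urlStr).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent",
                "Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");
        try (InputStream in = conn.getInputStream();
             OutputStream out = new FileOutputStream(outPath))
        {
            copy(in, out);
        }
        finally
        {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException
    {
        // Exercise copy() without touching the network.
        byte[] data = "hello".getBytes("UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), out);
        System.out.println(copied + " bytes copied"); // prints "5 bytes copied"
    }
}
```

          Unlike DownLoadPages, this version lets IOException propagate to the caller instead of printing the stack trace.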

          There is also a second way to access Google's site: use HttpClient, a tool from Apache, to imitate a browser.

          // Requires Apache Commons HttpClient 3.x (org.apache.commons.httpclient)
          // and NekoHTML (org.cyberneko.html.parsers.DOMParser).
          Document document = null;
          HttpClient httpClient = new HttpClient();

          GetMethod getMethod = new GetMethod(url);
          getMethod.setFollowRedirects(true);
          try
          {
              int statusCode = httpClient.executeMethod(getMethod);

              if (statusCode == HttpStatus.SC_OK)
              {
                  InputStream in = getMethod.getResponseBodyAsStream();
                  InputSource is = new InputSource(in);

                  // NekoHTML converts the fetched page into a DOM tree.
                  DOMParser domParser = new DOMParser();
                  domParser.parse(is);
                  document = domParser.getDocument();

                  System.out.println(getMethod.getURI());
              }
          }
          finally
          {
              // Release the connection once the body has been parsed.
              getMethod.releaseConnection();
          }
          return document;

          I recommend the first approach: HttpURLConnection is more lightweight, and it is also faster than the second approach, HttpClient.


          Comments

          # re: A solution for simulating IE in Java to access websites

          2006-12-11 16:08 by Fisher
          Reposting some code that uses HttpURLConnection to simulate an IE form login to a website:


          On simulating an IE form login to a website in Java

          // url, cookie and data (the URL-encoded form body) are defined elsewhere.
          HttpURLConnection urlConn=(HttpURLConnection)(new URL(url).openConnection());
          urlConn.addRequestProperty("Cookie",cookie);
          urlConn.setRequestMethod("POST");
          urlConn.setRequestProperty("User-Agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");
          // setFollowRedirects is static and global; this sets it for this connection only.
          urlConn.setInstanceFollowRedirects(true);
          urlConn.setDoOutput(true); // we need to write data to the server
          urlConn.setDoInput(true); // and read the response
          urlConn.setUseCaches(false); // always fetch the latest data from the server
          urlConn.setAllowUserInteraction(false);
          urlConn.setRequestProperty("Content-Type","application/x-www-form-urlencoded");
          urlConn.setRequestProperty("Content-Language","en-US" );
          // writeBytes below emits one byte per char, so the char count is the byte count.
          urlConn.setRequestProperty("Content-Length", ""+data.length());

          DataOutputStream outStream = new DataOutputStream(urlConn.getOutputStream());
          outStream.writeBytes(data);
          outStream.flush();
          outStream.close();

          // Keep the cookie the server hands back so later requests can reuse it.
          cookie=urlConn.getHeaderField("Set-Cookie");
          BufferedReader br=new BufferedReader(new InputStreamReader(urlConn.getInputStream(),"gb2312"));
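          The snippet above leaves `data` and `cookie` undefined. As a rough sketch of how they might be prepared: the class `FormLoginHelper`, both method names, and the field names `username` and `password` are assumptions for illustration, since the real field names depend on the target site's login form.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormLoginHelper
{
    // Builds an application/x-www-form-urlencoded body such as "a=1&b=2".
    static String encodeForm(Map<String, String> fields)
    {
        StringBuilder sb = new StringBuilder();
        try
        {
            for (Map.Entry<String, String> e : fields.entrySet())
            {
                if (sb.length() > 0) sb.append('&');
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"));
                sb.append('=');
                sb.append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        }
        catch (UnsupportedEncodingException ex)
        {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    // Reduces a Set-Cookie header value to the "name=value" pair that
    // would be sent back in a Cookie request header.
    static String cookiePair(String setCookieValue)
    {
        if (setCookieValue == null) return null;
        int semicolon = setCookieValue.indexOf(';');
        String pair = semicolon < 0
                ? setCookieValue
                : setCookieValue.substring(0, semicolon);
        return pair.trim();
    }

    public static void main(String[] args)
    {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("username", "fisher");
        fields.put("password", "p&ss word");
        System.out.println(encodeForm(fields));
        // prints "username=fisher&password=p%26ss+word"
        System.out.println(cookiePair("JSESSIONID=abc123; Path=/; HttpOnly"));
        // prints "JSESSIONID=abc123"
    }
}
```

          The charset passed to URLEncoder should match what the site expects. UTF-8 is used here only as an example; the snippet above reads the response as gb2312, so the target site may expect that encoding instead.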


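          # re: A solution for simulating IE in Java to access websites

          2007-04-09 17:03 by dongle
          Great post, it solved a big problem for me.

          # Does this really solve the problem?

          2007-05-31 22:12 by Rachel
          I wrote a program that extracts web page content. It batch-visits the detail pages under this site and scrapes them (http://cn.made-in-china.com)

          It ran fine in testing.
          After a few dozen requests, every response came back empty.
          Later it stopped working altogether.
          The code that accesses the URL is almost identical to yours.
          This is the result I get back: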


          <p>Due to network security, your access to Made-in-China.com
          has been temporarily denied.</p>
          <p>In order to provide you with safe and stable web services,
          we have to prevent abuse of Made-in-China.com by implementing
          additional security measures. We hope you understand
          and cooperate with us.</p>

          # re: A solution for simulating IE in Java to access websites

          2007-05-31 22:35 by Rachel
          In the end
          I unplugged the router,
          plugged it back in,
          and parsing worked again :-)

          # re: A solution for simulating IE in Java to access websites

          2007-07-04 16:51 by smalltiger
          Thank you so much for this article! It helped me a great deal. I would like to be friends; if that is all right, please add me on QQ: 109030035 or MSN: 109030035@qq.com

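          # re: A solution for simulating IE in Java to access websites

          2008-02-25 10:00 by Fisher
          I haven't worked with Java in a long time, and I didn't expect so many people to read my post, haha.
          I'm glad it helped the poster above.

          Recently I found that there are open-source components called web crawlers, which should work better than my approach.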

          # re: A solution for simulating IE in Java to access websites

          2008-02-26 22:51 by qyxxpd.com
          @Rachel
          I wrote a program that extracts web page content. It batch-visits the detail pages under this site and scrapes them (http://cn.made-in-china.com)

          Actually, just call .MainWebFetcher.DownLoadPages("http://cn.made-in-china.com/", "C://tmp//test.txt");

          Adding a / after http://cn.made-in-china.com is all it takes.

          # re: A solution for simulating IE in Java to access websites

          2008-05-08 09:43 by abyer
          How does this handle username and password authentication?

          # re: A solution for simulating IE in Java to access websites

          2011-03-09 09:27 by whs
          Is this really simulating IE? Aren't you simulating Firefox? Even the title is wrong.

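          # re: A solution for simulating IE in Java to access websites

          2011-08-26 11:57 by noname
          Fool, don't assume it is Firefox just because you see Mozilla/4.0. If you only half understand it, go study properly instead of embarrassing yourself in public. @whs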