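This article is not about how to use a search engine; it is about how to make a program use search engines to collect URLs. What is that good for? Quite a lot. People online are constantly hawking URL databases: software submission URLs, e-mail addresses, forum URLs, industry URLs. Where do these URLs come from? They cannot have been collected by hand; they were fetched by programs that query search engines. If you need URL data of some particular kind, follow along and we will work through it together. It is very simple.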
This article uses the Java language, with the Google and Baidu search engines as its targets.
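We will rely on two of the search rules that Google and Baidu support: keyword search and inurl search. An inurl search matches keywords that appear in the URL itself. For example, the URL http://www.xxx.com/post.asp contains the keyword post.asp, and the rule to type into the search engine is inurl:post.asp. This is the key to collecting URLs, because many URLs carry specific information in their own text. The URLs of software submission pages, for instance, often contain words such as publish, submit, or tuijian, as in http://www.xxx.com/publish.asp, and URLs like that are mostly pages for posting information. Combine this with keywords that the page body itself is likely to contain, and the search engine returns the results we want. The program then fetches those results, analyzes the HTML, discards the useless information, and writes the useful URLs to a file or a database, where other applications or people can use them.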
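To make the rule concrete, here is a minimal sketch of composing such a query from ordinary keywords plus an inurl: term. The class name QueryBuilder and its build method are hypothetical helpers introduced only for illustration; they are not part of any search engine API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: joins page keywords with an inurl: restriction.
final class QueryBuilder {
    // keywords: words expected in the page text; urlPart: text expected in the URL itself.
    static String build(List<String> keywords, String urlPart) {
        StringBuilder query = new StringBuilder();
        for (String keyword : keywords) {
            query.append(keyword.trim()).append(' ');
        }
        return query.append("inurl:").append(urlPart.trim()).toString();
    }

    public static void main(String[] args) {
        // Prints: software release inurl:publish.asp
        System.out.println(build(Arrays.asList("software", "release"), "publish.asp"));
    }
}
```

The resulting string is what you would type into the search box; it still has to be URL-encoded before it is placed in a request.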
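Step one is to fetch the search results with a program. Take Baidu first. Suppose we want to find software submission pages, using the keywords "软件发布 版本 inurl:publish.asp" (that is, "software release", "version", and the inurl rule). Open Baidu, type in the keywords, and submit. The address bar then shows http://www.baidu.com/s?ie=gb2312&bs=%C8%ED%BC%FE%B7%A2%B2%BC+%C8%ED%BC%FE%B0%E6%B1%BE+inurl%3Apublish.asp&sr=&z=&cl=3&f=8&wd=%C8%ED%BC%FE%B7%A2%B2%BC+%B0%E6%B1%BE+inurl%3Apublish.asp&ct=0 . The Chinese keywords have all been percent-encoded, but that does not matter; in the program we can use the Chinese text directly. Multiple keywords are joined with a + sign. After dropping the parameters we do not need, the address can be reduced to http://www.baidu.com/s?lm=0&si=&rn=20&ie=gb2312&ct=0&wd=软件发布+版本+inurl%3Apublish%2Easp&pn=0&cl=0 , where rn is the number of results shown per page, wd is the keywords to search for, and pn is the index of the first result to show. This pn is the variable our program loops over to fetch the results, advancing by 20 on each pass. We use a Java program to simulate this search process; the key classes are java.net.HttpURLConnection and java.net.URL. First we write a class that submits the search. The key code is as follows: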
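import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

class Search {
    public URL url;
    public HttpURLConnection http;
    public InputStream urlstream;
    ......
    for (int i = 0; i < 100; i++)
    {
        ......
        try {
            // The scheme is required: without "http://" the URL constructor throws MalformedURLException.
            // beginrecord (declared in the elided code) is the pn offset, advancing by 20 per pass.
            url = new URL("http://www.baidu.com/s?lm=0&si=&rn=20&ie=gb2312&ct=0&wd=软件发布+版本+inurl%3Apublish%2Easp&pn=" + beginrecord + "&cl=0");
        } catch (Exception ef) {}
        try {
            http = (HttpURLConnection) url.openConnection();
            http.connect();
            urlstream = http.getInputStream();
        } catch (Exception ef) {}
        BufferedReader l_reader = new BufferedReader(new InputStreamReader(urlstream));
        try {
            // Read the response line by line and accumulate it into one string.
            while ((currentLine = l_reader.readLine()) != null) {
                totalstring += currentLine;
            }
        } catch (IOException ex3) {}
        ....
        // The results of this search are now in totalstring as HTML, ready to be analyzed in the next step.
    }
}

Now take Google, which is slightly different: Google performs some checks on the browser, and the encoding differs too. The URL is http://www.google.com/search?q=软件发布+版本+inurl:publish.asp&hl=zh-CN&lr=&newwindow=1&start=0&sa=N&ie=UTF-8 , where the encoding must be given as ie=UTF-8 and start is the index of the first record to show. Note that Google also checks the browser: if the browser does not meet its requirements, it returns an error code. So when simulating a browser submission we need one extra line of code that sets the User-Agent request property of the http object to a common browser, such as Mozilla/4.0. The code is as follows: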
try {
    http = (HttpURLConnection) url.openConnection();
    http.setRequestProperty("User-Agent", "Mozilla/4.0");
    http.connect();
    urlstream = http.getInputStream();
} catch (Exception ef) {}
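The two Google-specific points above, the start offset and the browser-like User-Agent, can be sketched together as follows. This is only an illustration under stated assumptions: GoogleRequest, searchUrl, and prepare are hypothetical names, and the URL keeps just the q, hl, start, and ie parameters from the address quoted above, on the assumption that the others are optional.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;

// Hypothetical helper; parameter names are taken from the URL quoted in the article.
final class GoogleRequest {
    // Builds the search URL for results beginning at record `start`.
    static String searchUrl(String query, int start) {
        try {
            // URLEncoder turns spaces into '+' and percent-encodes ':' and non-ASCII text.
            String q = URLEncoder.encode(query, "UTF-8");
            return "http://www.google.com/search?q=" + q
                    + "&hl=zh-CN&start=" + start + "&ie=UTF-8";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Prepares (but does not send) a request that identifies itself as a browser.
    static HttpURLConnection prepare(String url) {
        try {
            HttpURLConnection http =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            http.setRequestProperty("User-Agent", "Mozilla/4.0");
            return http;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String url = searchUrl("software release inurl:publish.asp", 20);
        System.out.println(url);
        System.out.println(prepare(url).getRequestProperty("User-Agent"));
    }
}
```

No network traffic happens until connect() or getInputStream() is called on the returned connection, so the URL and headers can be inspected before anything is sent.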
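Step two is to analyze the HTML we fetched, extract the useful URLs, and write them to a file or a database. The search engines mix other URLs into the HTML, such as cached-page and similar-page links, and we need to strip those out. The key to stripping them is to find the pattern they follow. In Baidu, the cached-page links and the other useless addresses all contain the keyword baidu; in Google, the useless URLs contain the keywords google and cache. We filter out the useless URLs based on these keywords. To analyze strings in Java, a natural choice is the java.util.StringTokenizer class, which splits a string on specified delimiters, together with java.util.regex.Pattern and java.util.regex.Matcher for matching strings. The key code is as follows: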
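import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompareStr {
    // Returns true if tostring contains a match for the pattern oristring, ignoring case.
    public boolean comparestring(String oristring, String tostring)
    {
        Pattern p = null; // the regular expression
        Matcher m = null; // the string being examined
        boolean b;
        p = Pattern.compile(oristring, Pattern.CASE_INSENSITIVE);
        m = p.matcher(tostring);
        b = m.find();
        return b;
    }
}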
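One subtlety when reusing comparestring: its first argument is compiled as a regular expression, so the dot in a keyword like "google.com" matches any character. If the keywords are meant literally, a variant using Pattern.quote might look like the sketch below. KeywordFilter and containsIgnoreCase are hypothetical names, not part of the original program.

```java
import java.util.regex.Pattern;

// Hypothetical variant of CompareStr that treats the keyword as literal text.
final class KeywordFilter {
    static boolean containsIgnoreCase(String keyword, String text) {
        // Pattern.quote escapes regex metacharacters such as '.'.
        Pattern p = Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // Prints true: the literal keyword occurs, ignoring case.
        System.out.println(containsIgnoreCase("google.com", "http://www.GOOGLE.com/search"));
        // Prints false: 'X' is not a literal dot.
        System.out.println(containsIgnoreCase("google.com", "http://googleXcom.example/"));
    }
}
```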
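import java.util.StringTokenizer;

class AnalyUrl {
    ......
    // Split on spaces and angle brackets only, so that a token such as
    // href="http://www.xxx.com/publish.asp" keeps its surrounding quotes.
    StringTokenizer token = new StringTokenizer(totalstring, " <>");
    String firstword;
    CompareStr compstr = new CompareStr();
    String dsturl = null;
    while (token.hasMoreTokens())
    {
        firstword = token.nextToken();
        if (!compstr.comparestring("google.com", firstword) && !compstr.comparestring("cache", firstword))
        {
            // Only href="..." tokens carry a URL; length > 7 leaves at least one character between the quotes.
            if (firstword.startsWith("href=\"") && firstword.endsWith("\"") && firstword.length() > 7)
            {
                // Drop the leading href=" (6 characters) and the closing quote.
                dsturl = firstword.substring(6, firstword.length() - 1);
                WriteUrl(dsturl); // URL extracted successfully; record it to the file
            }
        }
    }
}

With the programs above we can collect the URLs we want. We could also write another application that analyzes the collected URLs further and extracts whatever information we need; the principle is the same, so I will not belabor it here. One final note: Google will not return more than 1000 results for a search. Past 1000 it simply shows the message "Sorry, Google does not serve more than 1000 results for any query." Baidu likewise stops after several hundred results. So when searching, add as many keywords as you can in order to narrow the result set.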
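As an alternative to tokenizing in step two, the extraction and filtering could also be sketched with a single regular expression over href attributes. This is an illustration rather than the original program's method: LinkExtractor and extract are hypothetical names, the pattern assumes quoted href values, and duplicates are dropped, which the original code does not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical alternative to the tokenizer approach: pull quoted href values with a regex.
final class LinkExtractor {
    // Matches href="http://..." or href='https://...' and captures the URL.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"'](https?://[^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns distinct URLs in page order, skipping any that contain a noise keyword.
    static List<String> extract(String html, List<String> noiseKeywords) {
        Set<String> urls = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String url = m.group(1);
            String lower = url.toLowerCase(Locale.ROOT);
            boolean noisy = false;
            for (String keyword : noiseKeywords) {
                if (lower.contains(keyword.toLowerCase(Locale.ROOT))) {
                    noisy = true;
                    break;
                }
            }
            if (!noisy) {
                urls.add(url);
            }
        }
        return new ArrayList<>(urls);
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.xxx.com/publish.asp\">A</a>"
                + "<a href=\"http://www.google.com/search?q=cache:abc\">Cached</a>"
                + "<a href=\"http://www.xxx.com/publish.asp\">A again</a>";
        // Prints: [http://www.xxx.com/publish.asp]
        System.out.println(extract(html, Arrays.asList("google.com", "cache")));
    }
}
```

For Baidu results the noise list would presumably contain baidu instead, following the rule described in step two.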