A blog of technology and life.
http://www.aygfsteel.com/pyguru/category/470.html

Add RSS feeds to your Web site with Perl XML::RSS
http://www.aygfsteel.com/pyguru/archive/2005/02/17/1268.html
Posted by pyguru, Wed, 16 Feb 2005 19:04:00 GMT

Guest Contributor, TechRepublic
December 22, 2004
URL: http://www.builderau.com.au/architect/webservices/0,39024590,39171461,00.htm


Take advantage of the XML::RSS CPAN package, which is specifically designed to read and parse RSS feeds.

You've probably already heard of RSS, the XML-based format which allows Web sites to publish and syndicate the latest content on their site to all interested parties. RSS is a boon to the lazy Webmaster, because (s)he no longer has to manually update his or her Web site with new content.

Instead, all a Webmaster has to do is plug in an RSS client, point it to the appropriate Web sites, and sit back and let the site "update itself" with news, weather forecasts, stock market data, and software alerts. You've already seen, in previous articles, how you can use the ASP.NET platform to manually parse an RSS feed and extract information from it by searching for the appropriate elements. But I'm a UNIX guy, and I have something that's even better than ASP.NET. It's called Perl.

Installing XML::RSS
RSS parsing in Perl is usually handled by the XML::RSS CPAN package. Unlike ASP.NET, which comes with a generic XML parser and expects you to manually write RSS-parsing code, the XML::RSS package is specifically designed to read and parse RSS feeds. When you give XML::RSS an RSS feed, it converts the various <item>s in the feed into array elements, and exposes numerous methods and properties to access the data in the feed. XML::RSS currently supports versions 0.9, 0.91, and 1.0 of RSS.

Written entirely in Perl, XML::RSS isn't included with Perl by default, and you must install it from CPAN. Detailed installation instructions are provided in the download archive, but by far the simplest way to install it is to use the CPAN shell, as follows:

shell> perl -MCPAN -e shell
cpan> install XML::RSS

If you use the CPAN shell, dependencies will be automatically downloaded for you (unless you told the shell not to download dependent modules). If you manually download and install the module, you may need to download and install the XML::Parser module before XML::RSS can be installed. The examples in this tutorial also need the LWP::Simple package, so you should download and install that one too if you don't already have it.

Basic usage
For our example, we'll assume that you're interested in displaying the latest geek news from Slashdot on your site. The URL for Slashdot's RSS feed is http://www.slashdot.org/index.rss. The script in Listing A retrieves this feed, parses it, and turns it into a human-readable HTML page using XML::RSS:

Listing A

#!/usr/bin/perl

# import packages
use XML::RSS;
use LWP::Simple;

# initialize object
$rss = new XML::RSS();

# get RSS data
$raw = get('http://www.slashdot.org/index.rss');

# parse RSS feed
$rss->parse($raw);

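# print HTML header and page
# (the markup is a minimal reconstruction; the original tags were lost in this copy)
print "Content-Type: text/html\n\n";
print "<html><head><title>" . $rss->channel('title') . "</title></head>";
print "<body>";
print "<h2>" . $rss->channel('title') . "</h2>";

# print titles and URLs of news items
foreach my $item (@{$rss->{'items'}}) {
    $title = $item->{'title'};
    $url = $item->{'link'};
    print "<a href=\"$url\">$title</a><p>";
}

# print footers
print "</body></html>";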

Place the script in your Web server's cgi-bin/ directory. Remember to make it executable, and then browse to it using your Web browser. After a short wait for the RSS file to download, you should see something like Figure A.

Figure A: Slashdot RSS feed

How does the script in Listing A work? Well, the first task is to get the RSS feed from the remote system to the local one. This is accomplished with the LWP::Simple package, which simulates an HTTP client and opens up a network connection to the remote site to retrieve the RSS data. An XML::RSS object is created, and this raw data is then passed to it for processing.

The various elements of the RSS feed are converted into Perl structures, and a foreach() loop is used to iterate over the array of items. Each item contains properties representing the item name, URL and description; these properties are used to dynamically build a readable list of news items. Each time Slashdot updates its RSS feed, the list of items displayed by the script above will change automatically, with no manual intervention required.
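
The item loop described above drops feed text straight into the page. Below is a minimal sketch of that step with HTML escaping added. It assumes each item is a hash reference with title and link keys, as in Listing A. The helper names escape_html and item_to_html are made up for this example and are not part of XML::RSS.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attribute values.
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Turn one feed item (a hash ref with 'title' and 'link') into a linked line.
sub item_to_html {
    my ($item) = @_;
    my $title = escape_html($item->{'title'});
    my $url   = escape_html($item->{'link'});
    return "<a href=\"$url\">$title</a><br>\n";
}

# Example with a hand-written item; in Listing A the items would come from
# @{$rss->{'items'}} after $rss->parse($raw).
print item_to_html({ title => 'Perl & RSS', link => 'http://example.com/?a=1&b=2' });
```

Escaping matters here because the titles and links come from a remote site, and a stray "<" or "&" in a headline would otherwise end up in your page as markup.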

The script in Listing A will work with other RSS feeds as well: simply alter the URL passed to LWP::Simple's get() function, and watch the list of items displayed by the script change.


Here are some RSS feeds to get you started

Tip: Notice that the RSS channel name (and description) can be obtained with the object's channel() method, which accepts any one of three arguments (title, description or link) and returns the corresponding channel value.


Adding multiple sources and optimising performance
So that takes care of adding a feed to your Web site. But hey, why limit yourself to one when you can have many? Listing B, a revision of Listing A, sets up an array containing the names of many different RSS feeds, and iterates over the array to produce a page containing multiple channels of information.

Listing B

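#!/usr/bin/perl

# import packages
use XML::RSS;

# local copies of the feeds to display
# (file names are placeholders; use whatever your download script saves)
@files = ('slashdot.rss', 'freshmeat.rdf');

# print HTML header and page
# (the markup is a minimal reconstruction; the original tags were lost in this copy)
print "Content-Type: text/html\n\n";
print "<html><head><title>RSS feeds</title></head>";
print "<body>";

# parse each local RSS file and print its channel
foreach my $file (@files) {
    # initialize a fresh object for every feed
    $rss = new XML::RSS();
    $rss->parsefile($file);

    print "<h2>" . $rss->channel('title') . "</h2>";

    # print titles and URLs of news items
    foreach my $item (@{$rss->{'items'}}) {
        $title = $item->{'title'};
        $url = $item->{'link'};
        print "<a href=\"$url\">$title</a><p>";
    }
}

# print footers
print "</body></html>";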

Figure B shows you what it looks like.

Figure B: Several RSS feeds

You'll notice, if you're sharp-eyed, that Listing B uses the parsefile() method to read a local version of the RSS file, instead of using LWP to retrieve it from the remote site. This revision results in improved performance, because it does away with the need to generate an internal request for the RSS data source every time the script is executed. Fetching the RSS file on each script run not only causes things to go slow (because of the time taken to fetch the RSS file), but it's also inefficient; it's unlikely that the source RSS file will change on a minute-by-minute basis, and by fetching the same data over and over again, you're simply wasting bandwidth. A better solution is to retrieve the RSS data source once, save it to a local file, and use that local file to generate your page.

Depending on how often the source file gets updated, you can write a simple shell script to download a fresh copy of the file on a regular basis.

Here's an example of such a script:

#!/bin/bash
/bin/wget http://www.freshmeat.net/backend/fm.rdf -O freshmeat.rdf

This script uses the wget utility (included with most Linux distributions) to download and save the RSS file to disk. Add this to your system crontab, and set it to run on an hourly or daily basis.
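
As a variation on the cron approach, the CGI script itself can check how old the local copy is before deciding whether a download is due. The sketch below is only an illustration: the one-hour limit is an assumption, the file name is borrowed from the wget example, and is_stale is a hypothetical helper rather than part of XML::RSS or LWP.

```perl
use strict;
use warnings;

# True if $file is missing or was last modified more than $max_age seconds ago.
sub is_stale {
    my ($file, $max_age) = @_;
    my @st = stat $file;
    return 1 unless @st;             # stat() returns an empty list if the file is missing
    my $mtime = $st[9];              # element 9 of stat() is the modification time
    return (time - $mtime) > $max_age ? 1 : 0;
}

my $file = 'freshmeat.rdf';
if (is_stale($file, 3600)) {
    print "$file is missing or older than an hour; time to download it again\n";
} else {
    print "$file is fresh enough; parse it with parsefile()\n";
}
```

If you prefer to stay inside Perl for the download as well, LWP::Simple's mirror($url, $file) function fetches a URL into a file and skips the transfer when the remote copy has not changed.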

If you find performance unacceptably low even after using local copies of RSS files, you can take things a step further, by generating a static HTML snapshot from the script above, and sending that to clients instead. To do this, comment out the line printing the "Content-Type" header in the script above and then run the script from the console, redirecting the output to an HTML file. Here's how:

$ ./rss.cgi > static.html

Now, simply serve this HTML file to your users. Since the file is a static file and not a script, no server-side processing takes place before the server transmits it to the client. You can run the command-line above from your crontab to regenerate the HTML file on a regular basis. Performance with a static file should be noticeably better than with a Perl script.
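
One caveat with redirecting straight into static.html is that the shell empties the file before the script has produced any output, so a visitor arriving mid-run may get an empty or partial page. Here is a minimal sketch of one way around that, writing to a temporary file and renaming it into place. The name write_atomically and the .tmp suffix are choices made for this example only.

```perl
use strict;
use warnings;

# Write $content to a temporary file next to $path, then rename it over $path.
sub write_atomically {
    my ($path, $content) = @_;
    my $tmp = "$path.tmp.$$";    # $$ is the process ID, to keep concurrent runs apart
    open(my $out, '>', $tmp) or die "Cannot open $tmp: $!\n";
    print {$out} $content    or die "Cannot write $tmp: $!\n";
    close($out)              or die "Cannot close $tmp: $!\n";
    rename($tmp, $path)      or die "Cannot rename $tmp to $path: $!\n";
    return 1;
}

# In practice $content would be the HTML the CGI script builds.
write_atomically('static.html', "<html><body>generated page goes here</body></html>\n");
```

On Unix-like systems a rename within one filesystem replaces the old file in a single step, so readers see either the previous page or the new one.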

Looks easy? What are you waiting for—get out there and start hooking your site up to your favorite RSS news feeds.



pyguru 2005-02-17 03:04
Lilina: Building a Personal Portal with an RSS Aggregator (Write once, publish anywhere)
http://www.aygfsteel.com/pyguru/archive/2005/02/17/1267.html
Posted by pyguru, Wed, 16 Feb 2005 19:00:00 GMT

While collecting RSS parsing tools recently, I came across MagPieRSS and Lilina, which is built on top of it. Lilina's main features:

1. Web-based RSS management: adding and deleting feeds, OPML export, a back-end RSS cache (to avoid putting too much load on the source servers), and a ScriptLet, a JavaScript bookmarklet for instant subscription similar to the one Del.icio.us offers.

2. Front-end publishing: I changed my home page to use Lilina to publish the blogs of a few friends I read regularly, which also saves a lot of work updating my own pages. Requirements: PHP 4.3 with mbstring and iconv.

(Screenshot: lilina.png)

Open-source software keeps getting better at i18n. PHP 4.3.x built with '--enable-mbstring' '--with-iconv' handles RSS published in UTF-8 and in other Chinese character sets reasonably well at the same time. Thanks are due to Steve for the transcoding and XML hacking work he did on MagPieRSS in PHP. So far, Add to My Yahoo! still cannot handle RSS feeds in UTF-8 very well.

I remember Wen Xin introducing the idea of a personal portal at a CNBlog seminar early this year. As RSS matures within CMS technology, more and more services let individual users build a portal to suit their own needs, which fits the decentralizing trend of the Internet. With Add to My Yahoo!, for example, users can easily subscribe to news from many more sources. Imagine aggregating and publishing your own del.icio.us bookmarks, flickr photos, and Yahoo! news through one such RSS aggregator. How fast would that spread?

Just as software development achieves "write once, run anywhere" through an intermediate platform or virtual machine, information publishing achieves "write once, publish anywhere" through RSS/XML as the intermediate layer.

Installing Lilina requires PHP 4.3 or later with iconv and mbstring support. Check that your PHP build includes these modules: '--enable-mbstring' '--with-iconv'

The server must also be able to send outbound RPC requests to external servers, which 51.NET does not support. PowWeb's hosting seems quite good; many packages are installed by default:

iconv
iconv support enabled
iconv implementation unknown
iconv library version unknown

Directive Local Value Master Value
iconv.input_encoding ISO-8859-1 ISO-8859-1
iconv.internal_encoding ISO-8859-1 ISO-8859-1
iconv.output_encoding ISO-8859-1 ISO-8859-1

mbstring
Multibyte Support enabled
Japanese support enabled
Simplified chinese support enabled
Traditional chinese support enabled
Korean support enabled
Russian support enabled
Multibyte (japanese) regex support enabled

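Unpack the installation package (the downloaded file has a .gz extension but is really a .tgz, so rename it first) and upload it to the appropriate directory on the server. Note: the cache directory and the current directory must be writable. Then set the parameters in conf.php and you are ready to go.

He Dong's suggestions to me:
1) In the right-hand column, the first section, sources, would look better with an image, like the hobby and links sections.
2) The pile of search boxes is a bit messy; keep one and move the rest to a secondary page.
3) Turn the contact information and the CC notice each into a single line or an image in the right-hand column; the details can go on a secondary page, because I doubt many people read that text closely.
4) If possible, translate Lilina's header links into Chinese.

Some planned improvements:
1. Trim overly long summaries, which could be done by looking for a paragraph tag ("<p>") in the text;
2. Grouping: output the RSS feeds in groups.

Changing the default display: by default Lilina shows articles published in the last day (3600*24 seconds in the code below). To use a different period, find this line:
$TIMERANGE = ( $_REQUEST['hours'] ? $_REQUEST['hours']*3600 : 3600*24 ) ;

and change it.

RSS is a lightweight protocol that can pull together all of your own resources: wiki, blog, mail. From now on, wherever you write, anything with an RSS interface can be re-aggregated and republished in some way, which greatly improves the efficiency of personal knowledge management and of publishing and distribution.

My earlier understanding of RSS was very shallow: isn't it just a DTD? Only after really getting into the parsers did I see how important namespaces are. A good protocol should be like this: not that there is nothing left to add, but that there is certainly nothing left to take away, and actually achieving that is very, very hard.

I will also try the Java parsers and extend this to the WebLucene project; there are more open-source RSS parser resources for Java as well.

I also found two Perl packages for parsing RSS: XML::RSS::Parser::Lite and XML::RSS::Parser.

Sample code for XML::RSS::Parser::Lite: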

#!/usr/bin/perl -w
# $Id$
# XML::RSS::Parser::Lite sample

use strict;
use XML::RSS::Parser::Lite;
use LWP::Simple;


my $xml = get("http://www.klogs.org/index.xml");
my $rp = new XML::RSS::Parser::Lite;
$rp->parse($xml);

# print blog header
print "<a href=\"".$rp->get('url')."\">" . $rp->get('title') . " - " . $rp->get('description') . "</a>\n";

# convert item to <li>
print "<ul>";
for (my $i = 0; $i < $rp->count(); $i++) {
my $it = $rp->get($i);
print "<li><a href=\"" . $it->get('url') . "\">" . $it->get('title') . "</a></li>\n";
}
print "</ul>";

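Installation:
Requires SOAP-Lite.

Pros:
Simple to use; supports fetching remote feeds.

Cons:
Supports only three fields (title, url, description); no date field.

I plan to use it for a simple RSS fetch-and-sync service in which everyone can publish the RSS feeds they subscribe to.


Sample code for XML::RSS::Parser:
#!/usr/bin/perl -w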
# $Id$
# XML::RSS::Parser sample with Iconv charset convert

use strict;
use XML::RSS::Parser;
use Text::Iconv;
my $converter = Text::Iconv->new("utf-8", "gbk");


my $p = new XML::RSS::Parser;
my $feed = $p->parsefile('index.xml');

# output some values
my $title = XML::RSS::Parser->ns_qualify('title',$feed->rss_namespace_uri);
# may cause error this line: print $feed->channel->children($title)->value."\n";
print "item count: ".$feed->item_count()."\n\n";
foreach my $i ( $feed->items ) {
map { print $_->name.": ".$converter->convert($_->value)."\n" } $i->children;
print "\n";
}

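Pros:
Outputs the data field by field directly and offers a lower-level interface.

Cons:
Cannot parse a remote RSS feed directly; the feed has to be downloaded first.

2004-12-14:
From a Trackback on cnblog I learned about the Planet RSS aggregator.

Installing Planet: after unpacking, run this directly in the directory: python planet.py examples/config.ini. The output for the default sample feeds then appears in the output directory as index.html, along with opml.xml, rss.xml and other outputs (a nice touch).

I tried it with a few RSS feeds. The UTF-8 ones were fine, but the GBK ones all came out garbled. The only code in planetlib.py that deals with XML character sets is the following; it looks as if everything that is not UTF-8 is treated as iso8859_1: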
try:
    data = unicode(data, "utf8").encode("utf8")
    logging.debug("Encoding: UTF-8")
except UnicodeError:
    try:
        data = unicode(data, "iso8859_1").encode("utf8")
        logging.debug("Encoding: ISO-8859-1")
    except UnicodeError:
        data = unicode(data, "ascii", "replace").encode("utf8")
        logging.warn("Feed wasn't in UTF-8 or ISO-8859-1, replaced " +
                     "all non-ASCII characters.")

I have been learning Python's Unicode handling recently. It feels like a very concise language, with a good try ... except mechanism and logging.
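
For comparison, here is how the same fallback idea might look in Perl with GBK tried before ISO-8859-1. This is only a sketch using the core Encode module: decode_feed_bytes is a hypothetical helper, the candidate list (UTF-8, then GBK) is an assumption about the feeds being read, and it is not how Planet or MagPieRSS work.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Try each candidate encoding strictly, in order; return the decoded text
# and the name of the encoding that worked.
sub decode_feed_bytes {
    my ($bytes, @encodings) = @_;
    @encodings = ('UTF-8', 'GBK') unless @encodings;
    for my $enc (@encodings) {
        my $copy = $bytes;    # decode() with a CHECK value may modify its input
        my $text = eval { decode($enc, $copy, Encode::FB_CROAK) };
        return ($text, $enc) if defined $text;
    }
    # Last resort: ISO-8859-1 maps every byte to a character, so it never fails.
    return (decode('ISO-8859-1', $bytes), 'ISO-8859-1');
}

my $gbk_bytes = "\xD6\xD0\xCE\xC4";    # two Chinese characters encoded as GBK
my ($text, $used) = decode_feed_bytes($gbk_bytes);
print "decoded as $used, ", length($text), " characters\n";
```

The order matters: ISO-8859-1 accepts any byte sequence, so it has to come last, which is exactly why the Planet code above never reaches a GBK-aware branch. Trial decoding is still a heuristic; when the feed's XML declaration names an encoding, that is the better source of truth.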

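Doubts about MagPieRSS performance:
The main performance difference between Planet and MagPieRSS lies in the caching mechanism. On using caching to speed up Web services, see: Cacheable CMS design.

As you can see, Lilina's cache works by walking the RSS files in the cache directory on every request; if a cached file has expired, it also has to request the RSS source on the fly. So it cannot support many RSS subscriptions on the back end or heavy concurrent access on the front end, since both cause a lot of I/O.

Planet is a back-end script that periodically aggregates the subscribed RSS feeds and writes the result out as static files.

In fact, if you put a wget script in front of MagPieRSS that periodically dumps the output of index.php to index.html, and have every visit hit the cached index.html first, isn't that the same as Planet regenerating a static index.html every hour?

So on virtual hosts that do not let you set up your own server-side scripts, the Planet approach cannot be used.

For more on parsing GBK XML in PHP, see:
An analysis of parsing UTF-8 and GBK RSS in MagPieRSS

2004-12-19
As Isaac Mao said at the SocialBrain 2005 discussion: a blog is a 'Window', and could also be a 'Bridge'. A blog is the "window" of a person or organization onto the outside world, and RSS makes it easy to combine those windows into "bridges" that connect them to each other. With this intermediate publishing layer, blogs go from single-point publishing to P2P self-propagation, and the importance of RSS for spreading information on the network becomes ever clearer.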

Posted by chedong at December 11, 2004 12:34 AM
Last Modified at December 19, 2004 04:40 PM

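Comments

How do I change the default display to 7 days of news? Thanks.

Posted by: honren at December 12, 2004 10:20 PM

I have been using lilina for a while now.
http://news.yanfeng.org
I changed the UI slightly.
It would be great if you could improve it.

Posted by: mulberry at December 13, 2004 09:24 AM

Comrade Che, haven't you noticed that your home page has become slower since you started using lilina? Give it up, or at least don't use it as the home page; lilina is not technically mature yet.

Posted by: kalen at December 16, 2004 10:33 AM

You could consider using drupal.

Posted by: shunz at December 28, 2004 06:46 PM

You could try the one I made: http://blog.terac.com

It fetches the blogs periodically, takes the latest entries from each, sorts and aggregates them, generates static XML, and formats it with XSL for display.

Posted by: andy at January 6, 2005 12:53 PM

Comrade Chedong, this is not a good idea :P
The RSS is already on the Web. Aggregating it on your own page not only lowers the quality of your home page but also confuses search engines, producing the very effect you have denounced: "portal sites damaging the enthusiasm of original authors". Better not to aggregate!



pyguru 2005-02-17 03:00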
Using RSS News Feeds with Perl
http://www.aygfsteel.com/pyguru/archive/2005/02/17/1266.html
Posted by pyguru, Wed, 16 Feb 2005 18:59:00 GMT

Abstract

The Rich Site Summary (RSS) format, previously known as the RDF Site Summary, has quietly become the dominant format for distributing news headlines on the Web.

In this Mother of Perl tutorial, we will write a short Perl script (less than 100 lines) that retrieves an XML RSS file from the Web or local file system and converts it to HTML. Using a Server Side Include (SSI) or similar method, you can easily add news headlines from any number of sources to your Web site.

History

Where did RSS come from you ask? Netscape invented the RSS format for "channels" on Netscape Netcenter (http://my.netscape.com). It was released to the public in March of 1999. The first non-Netscape Web site to incorporate the new format was Scripting News, a popular technology news site run by Dave Winer, president of Userland Software (think Frontier). Interestingly enough, Scripting News had been using its own XML format, scriptingNews, since December of 1997.

In May of 1999, Dave Winer released a new version of the scriptingNews XML format, which added new content-rich elements. Netscape followed suit by adopting most of the new scriptingNews elements into RSS 0.91, which was released in July of 1999.

Userland Software also rolled out their own flavor of my.netscape.com. If you haven't already guessed, it's available at http://my.userland.com.

As far as I know, RSS is the most widely used XML format on the Web today. RSS headlines are available for many popular news sites like Slashdot, Forbes, and CNET News.com, and the list is growing daily.

In a time when "stickiness" is a good thing, displaying news headlines on your Web site can really help give it the extra "oomph" that will encourage users to return. After all, users can read your president's bio only so many times.

Required Modules

For rss2html.pl to work on your system, you should have a recent version of Perl installed, 5.003 or better. 5.005 is recommended. You will also need the XML::Parser and XML::RSS modules installed.

To install the modules on a *nix system, type:
perl -MCPAN -e "install XML::Parser"
perl -MCPAN -e "install XML::RSS"

If you're using a win32 machine (Win95/98/NT), you should have a recent installation of ActiveState Perl. If you don't have a recent version, visit http://www.activestate.com.

To install XML::Parser on a win32 machine type:
ppm install XML-Parser

To install XML::RSS on a win32 machine (you must have a C compiler and nmake):

Next, we'll examine the RSS format in more detail.

rss2html.pl Get the source
This script converts an RSS file on the Web or local file system to HTML.

RSS 0.9

The first public version of RSS, 0.9, includes basic headline information. Below is an example RSS file for Freshmeat.net, a popular news site for Linux software:

<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://my.netscape.com/rdf/simple/0.9/">

<channel>
<title>freshmeat.net</title>
<link>http://freshmeat.net</link>
<description>the one-stop-shop for all your Linux softwar needs</description>
</channel>

<image>
<title>freshmeat.net</title>
<url>http://freshmeat.net/images/fm.mini.jpg</url>
<link>http://freshmeat.net</link>
</image>

<item>
<title>Geheimnis 0.59</title>
<link>http://freshmeat.net/news/1999/06/21/930004162.html</link>
</item>

<item>
<title>Firewall Manager 1.3 PRO</title>
<link>http://freshmeat.net/news/1999/06/21/930004148.html</link>
</item>

<textinput>
<title>quick finder</title>
<description>Use the text input below to search the fresh
meat application database</description>
<name>query</name>
<link>http://core.freshmeat.net/search.php3</link>
</textinput>

</rdf:RDF>

The first major element is channel which contains the following elements:

  • title - the title of the channel
  • link - the link to the channel Web site
  • description - short description of the channel

An RSS channel may also contain an image element as in the example above which contains the following elements:

  • title - the text describing the image
  • url - the URL of the image
  • link - the URL that the image is linked to

The item element contains the real channel content which is comprised of a title and a link element. An RSS file may contain up to 15 items.

An RSS 0.9 file may optionally contain a textinput element, which allows users to type a string into an HTML text input field and submit it via the HTTP GET method to the URL specified in the link element.

Next, we will examine RSS 0.91 which was released by Netscape in July of 1999.

RSS 0.91

The latest version of RSS added a few new elements. Below is a sample RSS file from XML.com, an excellent XML resource site:

<?xml version="1.0"?>

<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">

<channel>
<title>XML News and Features from XML.com</title>
<description>XML.com features a rich mix of information and services for the XML community.</description>
<language>en-us</language>
<link>http://xml.com/pub</link>
<copyright>Copyright 1999, O'Reilly and Associates and Seybold Publications</copyright>
<managingEditor>dale@xml.com (Dale Dougherty)</managingEditor>
<webMaster>peter@xml.com (Peter Wiggin)</webMaster>

<image>
<title>XML News and Features from XML.com</title>
<url>http://xml.com/universal/images/xml_tiny.gif</url>
<link>http://xml.com/pub</link>
<width>88</width>
<height>31</height>
</image>

<item>
<title>Issue: XML Data Servers</title>
<link>http://xml.com/pub?wwwrrr_rss</link>
<description>Although not everyone agrees that XML should become a full-fledged data-management discipline, object-database vendors are busy repositioning their object-database products as XML data servers. Jon Udell looks at one of these, Object Design's eXcelon and finds it a solid product.</description>
</item>

<item>
<title>O'Reilly Labs Review: Object Design's eXcelon 1.1</title>
<link>http://xml.com/pub/1999/08/excelon/index.html?wwwrrr_rss</link>
<description>Jon Udell takes a look at eXcelon, Object Design's XML data servers, and explains its user interface and general approach to XML. </description>
</item>

<item>
<title>Report from Montreal</title>
<link>http://xml.com/pub/1999/08/excelon/montreal.html?wwwrrr_rss</link>
<description>Lisa Rein reports from MetaStructures 99 and XML Developers' Day.</description>
</item>

<item>
<title>Reviews: Bluestone Software's XML Suite: Promising App, Rough Around the Edges</title>
<link>http://xml.com/pub/1999/08/bluestone/index.html?wwwrrr_rss</link>
<description>Our reviewer tested Bluestone's XML Suite (XML Server and Visual XML) on the Windows NT platform, simulating a two-way exchange of business information between a book publisher and book stores. The results were encouraging (with a few caveats).</description>
</item>

<item>
<title>Interviews: CBL: Ecommerce Componentry</title>
<link>http://xml.com/pub/1999/08/glushko/glushko.html?wwwrrr_rss</link>
<description>In this audio interview, Bob Glushko of Commerce One talks about the Common Business Library (CBL) as a set of building blocks for XML document types and schemas used in ecommerce.</description>
</item>

<item>
<title>Backends Sharing Data</title>
<link>http://xml.com/pub/1999/08/rpc/index.html?wwwrrr_rss</link>
<description>What if you could script remote procedure calls between web sites as easily as you can between programs? Edd Dumbill shows how it can be done in PHP.</description>
</item>

<item>
<title>Back Issue: XML Suite</title>
<link>http://xml.com/pub/1999/08/18/index.html?wwwrrr_rss</link>
<description> Barry Nance runs Bluestone's XML Suite through the paces. The tools show promise for passing data between databases and XML. But there are still a few kinks to be worked out.</description>
</item>

<item>
<title>Back Issue: XML-RPC</title>
<link>http://xml.com/pub/1999/08/11/index.html?wwwrrr_rss</link>
<description>A major promise of XML is its ability to pass data simply from one place to another, regardless of platform. In this issue, Edd Dumbill shows how to use XML-RPC in PHP to pass data from a web site to a PDA.</description>
</item>

<item>
<title>News: InDelv XML/XSL Client Version 0.4.</title>
<link>http://xml.com/pub/coverpage/newspage.html#ni1999-08-27-a?wwwrrr_rss</link>
<description> A posting from Rob Brown reports on the public availability of the new InDelv XML Client version 0.4. This version represent an upgrade to InDelv's previously released XML Browser, but "it has been renamed as a 'Client' to reflect the fact that it now contains both an XML/XSL browser and an XML/XSL editor. The browser is available free for all uses. The editor comes packaged with the browser as a demo, which can later be upgraded to a full commercial version. This is a 100% Java appl...
</description>
</item>

<item>
<title>News: OpenJade Development Team Releases OpenJade 1.3pre1 (Beta).</title>
<link>http://xml.com/pub/coverpage/newspage.html#ni1999-08-27-g?wwwrrr_rss</link>
<description> A recent posting from Avi Kivity and the OpenJade Development Team announced the release of OpenJade 1.3pre1 (Beta). "OpenJade is the DSSSL user community's open source implementation of DSSSL, Document Style Semantics and Specification Language, an ISO standard for rendering SGML and XML documents. OpenJade is based on James Clark's widely used Jade. OpenJade 1.3pre1 is a more complete implementation of the DSSSL standard, and introduces many new features, including (1) Implementat...
</description>
</item>

<item>
<title>News: IBM XML Parser Update: XML4C2 Version 2.3.1 Released.</title>
<link>http://xml.com/pub/coverpage/newspage.html#ni1999-08-27-b?wwwrrr_rss</link>
<description> Dean Roddey posted an announcement for the update of XML4C. IBM's XML for C++ parser (XML4C) "is a validating XML parser written in a portable subset of C++. XML4C makes it easy to give an application the ability to read and write XML data. Its two shared libraries provide classes for parsing, generating, manipulating, and validating XML documents. XML4C is faithful to the XML 1.0 Recommendation and associated standards (DOM 1.0, SAX 1.0). Source code, samples and API documentation ...
</description>
</item>

<item>
<title>News: Platform for Privacy Preferences (P3P) Specification Working Draft.</title>
<link>http://xml.com/pub/coverpage/newspage.html#ni1999-08-27-h?wwwrrr_rss</link>
<description> As part of the W3C P3P Activity, a fifth public working draft of the Platform for Privacy Preferences (P3P) Specification has been published for review by W3C members. The working draft "describes the Platform for Privacy Preferences (P3P). P3P enables Web sites to express their privacy practices and enables users to exercise preferences over those practices. P3P compliant products will allow users to be informed of site practices (in both machine and human readable formats), to deleg...
</description>
</item>

<item>
<title>News: Extended XLink with XSLT.</title>
<link>http://xml.com/pub/coverpage/newspage.html#ni1999-08-27-c?wwwrrr_rss</link>
<description> Nikita Ogievetsky (President, Cogitech, Inc.) posted an announcement for the availability of slides from the Metastructures '99 presentation "HTML Form Templates with XML. All in One and One for All. XSLT template library for WEB applications." The paper describes building XSLT template library for web applications. The goal was to "demonstrate data processing on the web made easy with XSL transformations: Generate a data maintenance web with data-structure controlled by XML, scree...
</description>
</item>

<item>
<title>News: HyBrick Web Site Reopens.</title>
<link>http://xml.com/pub/coverpage/newspage.html#ni1999-08-27-d?wwwrrr_rss</link>
<description> A posting from Toshimitsu Suzuki (Fujitsu Laboratories Ltd.) to the XLXP-DEV mailing list recently announced the reopening of the HyBrick Web site. 'HyBrick' is "an advanced SGML/XML browser developed by Fujitsu Laboratories, the research arm of Fujitsu. HyBrick is based on an architecture that supports advanced linking and formatting capabilities. HyBrick includes a DSSSL renderer and XLink/XPointer engine running on top of James Clark's SP and Jade. HyBrick supports: (1) Both v...
</description>
</item>

<item>
<title>News: Extended DocBook Synopses Version 1.0.</title>
<link>http://xml.com/pub/coverpage/newspage.html#ni1999-08-27-e?wwwrrr_rss</link>
<description> Norman Walsh has posted an announcement for a preliminary release of 'Extended DocBook Synopses'. Extended DocBook Synopses is a customization layer that extends DocBook, "adding a function synopsis element, ClassSynopsis for modern, mostly object-oriented, programming languages such as Java, C++, Perl, and IDL." DocBook is an SGML [and XML] DTD maintained by the DocBook Technical Committee of OASIS that particularly well suited to books and papers about computer hardware and softwar...
</description>
</item>

</channel>
</rss>

Notice that there are more descriptive elements for the channel, image, and item elements. These are referred to as "fat elements" because they contain a more detailed description of each channel item.

The XML::RSS Module

Now that you've had a chance to glance at two RSS examples, it's time to introduce the XML::RSS module. XML::RSS is a subclass of XML::Parser, a Perl module maintained by Clark Cooper that utilizes James Clark's Expat C library. XML::RSS was developed to simplify the task of manipulating and parsing RSS files. A deep understanding of XML is not a prerequisite for using XML::RSS since the XML details are hidden inside the class interface.

While XML::RSS is capable of creating RSS files, we will be focusing on parsing existing RSS files in this column. You can read more about the capabilities of XML::RSS in the module's documentation or by typing:
perldoc XML::RSS

The Code

Well, let's look at the code shall we? Lines 16-17 load the XML::RSS and LWP::Simple modules. We've already talked about XML::RSS in brief, but what does LWP::Simple do? Good question! The answer is simple (puns intended). It's a procedural interface for interacting with a Web server. It's also the little cousin of LWP::UserAgent, a fuller object oriented interface. We'll be using one of the library's subroutines later in the code to fetch an RSS file from the Web.

In lines 20-21 we initialize two variables that we're going to use later.

Line 25 starts the main code body. The first thing we do is verify that the user typed exactly one command-line parameter. This parameter is then assigned to the $arg variable in line 28.

Next we create a new instance of the XML::RSS class and assign the reference to the $rss variable on line 31.

Now we must determine whether the command-line parameter the user entered is an HTTP URL or a file on the local file system (lines 34-46). On line 34, we use a regular expression to look for the characters http:.

If the command-line argument starts with these characters, we can safely assume that the user intends to retrieve an RSS file from a Web server. On line 35 we pass the argument to the get() function, which is a part of LWP::Simple, and assign the results to the $content variable. On line 36 we call die() if $content is empty. If this happens, it means there was an error retrieving the RSS file. If the RSS file was downloaded successfully, $rss->parse($content) is called which parses the RSS file and stores the results in the object's internal structure (line 38).

If the command-line argument does not contain the http: characters, we assume the argument is a file instead of a URL on lines 41-46. The first thing we do is assign the value of $arg to the $file variable and test for the existence of the file (lines 42-43).

Then we call $rss->parsefile($file) (line 45), which parses the RSS file and stores the results in the object's internal structure. The parsefile() method parses a file, whereas the parse() method parses the string that's passed to it.
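The URL-or-file decision described above can also be sketched outside Perl. The following Python sketch is an illustration only, not the rss2html.pl source: the function names and the sample feed are hypothetical, and it uses Python's standard library rather than XML::RSS and LWP::Simple.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET


def is_http_url(arg):
    """Return True when the argument starts with 'http:' (the check described above)."""
    return re.match(r"http:", arg) is not None


def load_feed_bytes(arg):
    """Fetch the feed over HTTP, or read it from the local file system."""
    if is_http_url(arg):
        with urllib.request.urlopen(arg) as response:
            return response.read()
    with open(arg, "rb") as handle:
        return handle.read()


def parse_items(feed):
    """Return (title, link) pairs for every <item>, ignoring XML namespaces."""
    root = ET.fromstring(feed)
    items = []
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] != "item":
            continue
        fields = {child.tag.rsplit("}", 1)[-1]: (child.text or "").strip()
                  for child in element}
        items.append((fields.get("title", ""), fields.get("link", "")))
    return items


# Hypothetical sample feed, used so the sketch runs without network access.
SAMPLE_FEED = """<rss version="0.91"><channel><title>Example</title>
<item><title>First</title><link>http://example.com/1</link></item>
<item><title>Second</title><link>http://example.com/2</link></item>
</channel></rss>"""

print(parse_items(SAMPLE_FEED))
```

As in the Perl script, the fetch step and the parse step are kept separate, so the same parsing code serves both remote and local feeds.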

Lastly, we call the print_html subroutine on line 49, which converts the RSS object into nicely formatted HTML.

print_html

As you examine this subroutine, you will begin to understand the internal structure of the XML::RSS object. The critical portion of the subroutine is contained on lines 76-79. In this foreach loop, we iterate over each of the RSS items.
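The core idea, looping over the feed's items and emitting one HTML list entry per item, might look like the following Python sketch. It is an analogue under assumptions, not the print_html subroutine itself: items_to_html is a hypothetical name, and items are assumed to be (title, link) pairs.

```python
from html import escape


def items_to_html(channel_title, items):
    """Render (title, link) pairs as an HTML heading plus a bulleted list."""
    lines = ["<h3>%s</h3>" % escape(channel_title), "<ul>"]
    for title, link in items:
        # Escape both values so feed content cannot break the markup.
        lines.append('  <li><a href="%s">%s</a></li>'
                     % (escape(link, quote=True), escape(title)))
    lines.append("</ul>")
    return "\n".join(lines)


print(items_to_html("Example", [("First", "http://example.com/1")]))
```

Escaping the title and link matters here because feed content comes from a third party and is pasted straight into your own page.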

Next, let's take a look at rss2html.pl in action.

rss2html.pl in Action

I've added the following cron jobs that run once per hour on the Webreference server (Scheduler is the NT counterpart):

rss2html.pl http://slashdot.org/slashdot.rdf > slashdot.html
rss2html.pl http://freshmeat.net/backend/fm.rdf > freshmeat.html
rss2html.pl http://www.linuxtoday.com/backend/my-netscape.rdf > linuxtoday.html
rss2html.pl http://www.xml.com/xml/news.rdf > xmlnews.html
rss2html.pl http://www.perlxml.com/rdf/moperl.rdf > mop.html

The commands above fetch the RSS files off the Web and convert them to HTML. Using Server-Side Includes (SSI), I've included the results below:

Slashdot:

  • WiMax Technology Could Blanket the US?

  • Microsoft Anti-Spyware to Be Free of Charge
  • ACM to Honor TCP/IP Creators with Turing Award
  • New Rules Proposed on Electronic Evidence
  • Intel From Behind the Curtain
  • Kyoto Protocol Comes Into Force
  • Cory Doctorow's 'I, Robot' Posted
  • Straczynski Offers To Re-Boot Star Trek
  • Building The MareNostrum COTS Supercomputer

freshmeat.net announcements (Global):

  • Zolera SOAP Infrastructure 1.7 (Default branch)
  • XBible 3.0 (Default branch)
  • PDFdirectory 0.2.04 (Default branch)
  • XC-AST 0.7.0 (Default branch)
  • Imagero Reader 1.73 (Default branch)
  • GNU ccAudio2 0.4.0 (Testing branch)
  • quisp 1.27 (Default branch)
  • shsql 1.27 (Default branch)
  • samhain 2.0.4 (Default branch)
  • CANDIDv2 2.40 (Default branch)
  • ADV: Dialing for Dollars
  • libferris 1.1.46 (Default branch)
  • FUDforum 2.6.10 (Stable branch)
  • HORRORss 1.0 (Default branch)
  • Roxen WebServer 4.0.325-release 4 (Default branch)
  • Configuration File Library 1.0 (Default branch)
  • Goggles 0.7.11 (Default branch)
  • Pluto DCE library 2.0.0.9 (Default branch)
  • Pluto Bi-Directional Comm library 2.0.0.9 (Default branch)
  • zen Platform 2.0.4 (Default branch)
  • ADV: Gimme Shelter
  • MIME Email message class 2005.02.15 (Default branch)
  • ELF statifier 1.6.3 (Default branch)
  • SekHost 1.2 (Default branch)
  • ulogd 1.21 (Default branch)
  • Journaled Files LIBrary 0.1.0-0.0.0 (Default branch)
  • FastTemplate.php3 1.2.0 (Default branch)
  • iptables 1.3.0 (Default branch)
  • Very Simple Control Protocol Daemon 0.1.4 (Default branch)
  • C Parameters 0.9.0 (Default branch)
  • ADV: Dialing for Dollars
  • eXtreme Project Management Tool 0.7beta1 (Development branch)
  • gccc 1.099 (Default branch)
  • Magellan Metasearch 1.00-RC3 (Default branch)
  • CAN Abstraction Layer 0.1.4 (Default branch)
  • TreeLine 0.11.1 (Default branch)
  • GNOME Sensors Applet 0.6.1 (Default branch)
  • iODBC Driver Manager and SDK 3.52.2 (Default branch)
  • DISLIN 8.3 (Default branch)
  • Pluto Home 2.0.0.9 (Default branch)
  • ADV: Dialing for Dollars
  • Expense Report Software 1.07 (Default branch)
  • Yzis M3 (Default branch)
  • Q Light Controller 2.4.1 (Default branch)
  • Menc 0.3 (Default branch)
  • Another File Integrity Checker 2.7-0 (Default branch)
  • BibShelf 1.4.0-1 (Default branch)
  • Eleven 1.0 (Default branch)
  • Linice 2.5 (Default branch)
  • JDirt 1.3 (Default branch)
  • ADV: Dialing for Dollars
  • Nazghul 0.4.0 (Default branch)
  • Rush 2005 0.4.10 (Default branch)
  • Monesa 0.24.1 (Stable branch)
  • Persist.NET 0.9.1 beta (Default branch)
  • Roundup 0.8 (Default branch)
  • Aquarium Web Application Framework 2.0 (Default branch)
  • sn9c102 Video Grabber 1.7.0 (Default branch)
  • GRAVEMAN 0.3.8 (Default branch)
  • viewurpmi 0.2 (Default branch)
  • ADV: Dialing for Dollars
  • NuFW 1.0-rc1 (Stable branch)
  • OpenSceneGraph Editor 0.6.0 (Default branch)
  • HPGS - HPGl Script 0.6.0 (Default branch)
  • lustre 1.4.1-rc1 (Default branch)
  • IBM HeapAnalyzer 1.3.3 (Default branch)
  • CANDIDv2 2.3.6 (Default branch)
  • NetSPoC 2.5 (Default branch)
  • Metal Mech 0.0.3 (Default branch)
  • radmind 1.5.0 (Default branch)
  • ADV: Dialing for Dollars
  • iPodBackup 1.4 (Default branch)
  • db4o 4.3 (Mono branch)
  • web2ldap 0.15.9 (Default branch)
  • Mantissa 5.6 (Default branch)
  • Drone IRC Bot 1.2 (Default branch)
  • NoFuss POS 0.06 (Default branch)
  • xlog 1.1 (Stable branch)
  • ActiveBPEL 1.0.7 (Default branch)
  • Java Embedded Python 1.1 (Default branch)
  • ADV: Dialing for Dollars
  • Neveredit 0.8 (Default branch)
  • The friendly interactive shell 1.1 (Default branch)
  • Webmatic 2.0.3 (Default branch)
  • JTMOS Operating System Build 7700 (Default branch)
  • BIRD 1.0.10 (Default branch)
  • Tune in 2 Me 050215 (Default branch)
  • HMSCalc 3.0 (Default branch)
  • Information Currency Web Services 0.0.4 (Default branch)
  • Nitro + Og 0.10.0 (Default branch)
  • ADV: Dialing for Dollars
  • Just For Fun Network Management System 0.8.0 (Stable branch)
  • rxvt-unicode 5.1 (Default branch)
  • PHPEmaillist 0.3 (Default branch)
  • ulogd-php 1.0 (Default branch)
  • mod_access_rbl2 1.0 (Default branch)
  • 5lack10.1 0.8 (Default branch)
  • profusemail 0.9.1 (Default branch)

Linux Today:

  • LWN.net: FSF Announces New Executive Director
  • LinuxPlanet: Novell Takes Enterprise Security Focus
  • CNET News: HP: Don't Like Software Patents? Learn to Deal
  • internetnews.com: CA Chief: Innovate, Cooperate
  • Boston Herald: Linux Show Plans BCEC Move

XML.com:

  • Features: Very Dynamic Web Interfaces
  • Features: Comparing CSS and XSL: A Reply from Norm Walsh
  • Features: Top 10 XForms Engines
  • Features: An Introduction to TMAPI
  • XML Tourist: The Silent Soundtrack
  • Transforming XML: The XPath 2.0 Data Model
  • Features: SIMILE: Practical Metadata for the Semantic Web
  • Features: Hacking Open Office
  • Features: Formal Taxonomies for the U.S. Government
  • Features: Reviewing the Architecture of the World Wide Web
  • Features: Printing XML: Why CSS Is Better than XSL
  • Python and XML: Introducing the Amara XML Toolkit
  • Features: Introducing Comega
  • Features: SAML 2: The Building Blocks of Federated Identity
  • The Restful Web: Amazon's Simple Queue Service

    Copyright 2004, O'Reilly Media, Inc.

    Conclusion

    Well, we've shown in this column that Perl can really pack a wallop in a short amount of code. With rss2html.pl, anyone can automatically add a news feed to their Web site.

    For more information on RSS, you might try visiting the following sites:

    rss2html.pl Get the source
    This script converts an RSS file on the Web or local file system to HTML.



    Posted by pyguru, 2005-02-17 02:59
    ]]>
    The Python Web services developer: RSS for Python
    http://www.aygfsteel.com/pyguru/archive/2005/02/17/1265.html
    pyguru, Wed, 16 Feb 2005 18:48:00 GMT

    Content syndication for the Web

    Level: Introductory


    Mike Olson (mike.olson@fourthought.com), Principal Consultant, Fourthought, Inc.
    Uche Ogbuji (uche.ogbuji@fourthought.com), Principal Consultant, Fourthought, Inc.

    13 Nov 2002

    RSS is one of the most successful XML services ever. Despite its chaotic roots, it has become the community standard for exchanging content information across Web sites. Python is an excellent tool for RSS processing, and Mike Olson and Uche Ogbuji introduce a couple of modules available for this purpose.

    RSS is an abbreviation with several expansions: "RDF Site Summary," "Really Simple Syndication," "Rich Site Summary," and perhaps others. Behind this confusion of names is an astonishing amount of politics for such a mundane technological area. RSS is a simple XML format for distributing summaries of content on Web sites. It can be used to share all sorts of information including, but not limited to, news flashes, Web site updates, event calendars, software updates, featured content collections, and items on Web-based auctions.

    RSS was created by Netscape in 1999 to allow content to be gathered from many sources into the Netcenter portal (which is now defunct). The UserLand community of Web enthusiasts became early supporters of RSS, and it soon became a very popular format. The popularity led to strains over how to improve RSS to make it even more broadly useful. This strain led to a fork in RSS development. One group chose an approach based on RDF, in order to take advantage of the great number of RDF tools and modules, and another chose a more stripped-down approach. The former is called RSS 1.0, and the latter RSS 0.91. Just last month the battle flared up again with a new version of the non-RDF variant of RSS, which its creators are calling "RSS 2.0."

    RSS 0.91 and 1.0 are very popular, and used in numerous portals and Web logs. In fact, the blogging community is a great user of RSS, and RSS lies behind some of the most impressive networks of XML exchange in existence. These networks have grown organically, and are really the most successful networks of XML services in existence. RSS is an XML service by virtue of being an exchange of XML information over an Internet protocol (the vast majority of RSS exchange is a simple HTTP GET of RSS documents). In this article, we introduce just a few of the many Python tools available for working with RSS. We don't provide a technical introduction to RSS, because you can find this in so many other articles (see Resources). We recommend first that you gain a basic familiarity with RSS, and that you understand XML. Understanding RDF is not required.

    [We consider RSS an 'XML service' rather than a 'Web service' due to the use of XML descriptions but the lack of use of WSDL. -- Editors]

    RSS.py
    Mark Nottingham's RSS.py is a Python library for RSS processing. It is very complete and well-written. It requires Python 2.2 and PyXML 0.7.1. Installation is easy; just download the Python file from Mark's home page and copy it to somewhere in your PYTHONPATH.

    Most users of RSS.py need only concern themselves with two classes it provides: CollectionChannel and TrackingChannel. The latter seems the more useful of the two. TrackingChannel is a data structure that contains all the RSS data indexed by the key of each item. CollectionChannel is a similar data structure, but organized more as RSS documents themselves are, with the top-level channel information pointing to the item details using hash values for the URLs. You will probably use the utility namespace declarations in the RSS.ns structure. Listing 1 is a simple script that downloads and parses an RSS feed for Python news, and prints out all the information from the various items in a simple listing.



    from RSS import ns, CollectionChannel, TrackingChannel

    # Create a tracking channel, which is a data structure that
    # indexes RSS data by item URL
    tc = TrackingChannel()

    # Returns the RSSParser instance used, which can usually be ignored
    tc.parse("http://www.python.org/channews.rdf")

    RSS10_TITLE = (ns.rss10, 'title')
    RSS10_DESC = (ns.rss10, 'description')

    # You can also use tc.keys()
    items = tc.listItems()
    for item in items:
        # Each item is a (url, order_index) tuple
        url = item[0]
        print "RSS Item:", url
        # Get all the data for the item as a Python dictionary
        item_data = tc.getItem(item)
        print "Title:", item_data.get(RSS10_TITLE, "(none)")
        print "Description:", item_data.get(RSS10_DESC, "(none)")



    We start by creating a TrackingChannel instance, and then populate it with data parsed from the RSS feed at http://www.python.org/channews.rdf. RSS.py uses tuples as the property names for RSS data. This may seem an unusual approach to those not used to XML processing techniques, but it is actually a very useful way of being very precise about what was in the original RSS file. In effect, an RSS 0.91 title element is not considered to be equivalent to an RSS 1.0 one. There is enough data for the application to ignore this distinction, if it likes, by ignoring the namespace portion of each tuple; but the basic API is wedded to the syntax of the original RSS file, so that this information is not lost. In the code, we use this property data to gather all the items from the news feed for display. Notice that we are careful not to assume which properties any particular item might have. We retrieve properties using the safe form as seen in the code below.



    print "Title:", item_data.get(RSS10_TITLE, "(none)")

    This form provides a default value if the property is not found, unlike the direct dictionary lookup below, which raises a KeyError when the property is missing.



    print "Title:", item_data[RSS10_TITLE]

    This precaution is necessary because you never know which elements are used in an RSS feed. Listing 2 shows the output from Listing 1.
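To make the tuple-keyed layout concrete, here is a small Python 3 sketch that uses a plain dictionary in place of RSS.py's item data. The sample values and the get_property helper are hypothetical; only the (namespace, local name) key shape and the RSS 1.0 namespace URI come from the format itself.

```python
RSS10_NS = "http://purl.org/rss/1.0/"

# Hypothetical item data keyed by (namespace, local name) tuples.
item_data = {
    (RSS10_NS, "title"): "Python 2.2.2b1",
    (RSS10_NS, "link"): "http://www.python.org/2.2.2/",
}


def get_property(data, local_name, default="(none)"):
    """Look up a property by local name only, ignoring the namespace part."""
    for (namespace, name), value in data.items():
        if name == local_name:
            return value
    return default


# Exact lookup: the namespace must match, and a default guards missing keys.
print(item_data.get((RSS10_NS, "description"), "(none)"))
# Namespace-agnostic lookup: matches a title from any RSS version.
print(get_property(item_data, "title"))
```

The exact lookup keeps RSS 0.91 and RSS 1.0 elements distinct, while the helper shows how an application could deliberately ignore that distinction.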



    $ python listing1.py
    RSS Item: http://www.python.org/2.2.2/
    Title: Python 2.2.2b1
    Description: (none)
    RSS Item: http://sf.net/projects/spambayes/
    Title: spambayes project
    Description: (none)
    RSS Item: http://www.mems-exchange.org/software/scgi/
    Title: scgi 0.5
    Description: (none)
    RSS Item: http://roundup.sourceforge.net/
    Title: Roundup 0.4.4
    Description: (none)
    RSS Item: http://www.pygame.org/
    Title: Pygame 1.5.3
    Description: (none)
    RSS Item: http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/
    Title: Pyrex 0.4.4.1
    Description: (none)
    RSS Item: http://www.tundraware.com/Software/hb/
    Title: hb 1.88
    Description: (none)
    RSS Item: http://www.tundraware.com/Software/abck/
    Title: abck 2.2
    Description: (none)
    RSS Item: http://www.terra.es/personal7/inigoserna/lfm/
    Title: lfm 0.9
    Description: (none)
    RSS Item: http://www.tundraware.com/Software/waccess/
    Title: waccess 2.0
    Description: (none)
    RSS Item: http://www.krause-software.de/jinsitu/
    Title: JinSitu 0.3
    Description: (none)
    RSS Item: http://www.alobbs.com/pykyra/
    Title: PyKyra 0.1.0
    Description: (none)
    RSS Item: http://www.havenrock.com/developer/treewidgets/index.html
    Title: TreeWidgets 1.0a1
    Description: (none)
    RSS Item: http://civil.sf.net/
    Title: Civil 0.80
    Description: (none)
    RSS Item: http://www.stackless.com/
    Title: Stackless Python Beta
    Description: (none)

    Of course, you would expect somewhat different output because the news items will have changed by the time you try it. The RSS.py channel objects also provide methods for adding and modifying RSS information. You can write the result back to RSS 1.0 format using the output() method. Try this out by writing back out the information parsed in Listing 1. Kick off the script in interactive mode by running: python -i listing1.py. At the resulting Python prompt, run the following example.



    >>> result = tc.output(items)
    >>> print result

    The result is an RSS 1.0 document printed out. You must have RSS.py, version 0.42 or more recent for this to work. There is a bug in the output() method in earlier versions.

    rssparser.py
    Mark Pilgrim offers another module for RSS file parsing. It doesn't provide all the features and options that RSS.py does, but it does offer a very liberal parser, which deals well with all the confusing diversity in the world of RSS. To quote from the rssparser.py page:

    You see, most RSS feeds suck. Invalid characters, unescaped ampersands (Blogger feeds), invalid entities (Radio feeds), unescaped and invalid HTML (The Register's feed most days). Or just a bastardized mix of RSS 0.9x elements with RSS 1.0 elements (Movable Type feeds).
    Then there are feeds, like Aaron's feed, which are too bleeding edge. He puts an excerpt in the description element but puts the full text in the content:encoded element (as CDATA). This is valid RSS 1.0, but nobody actually uses it (except Aaron), few news aggregators support it, and many parsers choke on it. Other parsers are confused by the new elements (guid) in RSS 0.94 (see Dave Winer's feed for an example). And then there's Jon Udell's feed, with the fullitem element that he just sort of made up.

    It's funny to consider this in the light of the fact that XML and Web services are supposed to increase interoperability. Anyway, rssparser.py is designed to deal with all the madness.

    Installing rssparser.py is also very easy. You download the Python file (see Resources), rename it from "rssparser.py.txt" to "rssparser.py", and copy it to your PYTHONPATH. I also suggest getting the optional timeoutsocket module, which improves the timeout behavior of socket operations in Python and thus makes fetching RSS feeds less likely to stall the application thread in case of error.

    Listing 3 is a script that is the equivalent of Listing 1, but using rssparser.py, rather than RSS.py.



    import rssparser
    # Parse the data, returns a tuple: (data for channels, data for items)
    channel, items = rssparser.parse("http://www.python.org/channews.rdf")

    for item in items:
        # Each item is a dictionary mapping properties to values
        print "RSS Item:", item.get('link', "(none)")
        print "Title:", item.get('title', "(none)")
        print "Description:", item.get('description', "(none)")



    As you can see, the code is much simpler. The trade-off between RSS.py and rssparser.py is largely that the former has more features, and maintains more syntactic information from the RSS feed. The latter is simpler, and a more forgiving parser (the RSS.py parser only accepts well-formed XML).
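The strictness of a well-formed-only parser is easy to demonstrate. The sketch below uses Python 3's standard xml.etree.ElementTree as a stand-in for a strict XML parser (it is not the parser inside RSS.py), and the broken feed is a made-up example of the unescaped-ampersand problem quoted earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment with an unescaped ampersand in the title.
BROKEN_FEED = "<rss><channel><item><title>Tom & Jerry</title></item></channel></rss>"


def is_well_formed(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(BROKEN_FEED))
print(is_well_formed(BROKEN_FEED.replace("&", "&amp;")))
```

A liberal parser has to recover from input like the first feed; a strict one simply refuses it, which is the trade-off described above.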

    The output should be the same as in Listing 2.

    Conclusion
    There are many Python tools for RSS, and we don't have space to cover them all. Aaron Swartz's page of RSS tools is a good place to start looking if you want to explore other modules out there. RSS is easy to work with in Python, because of all the great modules available for it. The modules hide all the chaos brought about by the history and popularity of RSS. If your XML services needs mostly involve the exchange of descriptive information for Web sites, we highly recommend using the most successful XML service technology in employment.

    Next month, we will explain how to use e-mail packages for Python for writing Web services over SMTP.

    Resources

    About the authors
    Mike Olson is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management applications. Fourthought develops 4Suite, an open source platform for XML middleware. You can contact Mr. Olson at mike.olson@fourthought.com.


    Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management applications. Fourthought develops 4Suite, an open source platform for XML middleware. Mr. Ogbuji is a Computer Engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at uche.ogbuji@fourthought.com.



    Posted by pyguru, 2005-02-17 02:48
    ]]>
    "universal" RSS feed parserhttp://www.aygfsteel.com/pyguru/archive/2005/02/17/1264.htmlpygurupyguruWed, 16 Feb 2005 18:40:00 GMThttp://www.aygfsteel.com/pyguru/archive/2005/02/17/1264.htmlhttp://www.aygfsteel.com/pyguru/comments/1264.htmlhttp://www.aygfsteel.com/pyguru/archive/2005/02/17/1264.html#Feedback0http://www.aygfsteel.com/pyguru/comments/commentRss/1264.htmlhttp://www.aygfsteel.com/pyguru/services/trackbacks/1264.htmlFeed Parser

    This is a "universal" feed parser, suitable for reading syndicated feeds as produced by weblogs, news sites, wikis, and many other types of sites. It handles Atom feeds, CDF, and the nine different versions of RSS.

    This project is now hosted at SourceForge. Please check there for updates. This page contains old news and is no longer updated. (2004-06-21)



    Posted by pyguru, 2005-02-17 02:40
    ]]>
    How to replace a string in multiple files?
    http://www.aygfsteel.com/pyguru/archive/2005/02/16/1243.html
    pyguru, Tue, 15 Feb 2005 18:18:00 GMT

    Posted by pyguru, 2005-02-16 02:18
    ]]>
    CVS Quick Reference for Common Commands (CVS的常用命令速查手册)
    http://www.aygfsteel.com/pyguru/archive/2005/02/15/1236.html
    pyguru, Tue, 15 Feb 2005 15:39:00 GMT
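    Lansenlin (蓝森林), http://www.lslnet.com, 2002

    Author: Che Dong (车东)

    chedong@bigfoot.com

    Last updated: 2002-08-30 13:18:41

    Copyright notice: this article may be freely reproduced, provided that the original source and author are clearly credited.

    Overview: CVS is a client/server system. Multiple developers record file versions through one central version control system, which keeps everyone's files in sync.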

           CVS server (file repository)
         /     |       \
        (version synchronization)
       /       |         \
    Developer 1  Developer 2  Developer 3

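    The main contents of this article follow. Developers can mostly get by with sections 2 and 6; CVS administrators need to know more.

    1. CVS environment initialization: setting up the CVS environment (administrators)
    2. Daily use of CVS: the CVS commands used most often in everyday development (developers, administrators)
    3. CVS branch development: running a project in parallel along different schedules and goals (administrators)
    4. CVS user authentication: remote user authentication over SSH, secure and simple (administrators)
    5. CVSWEB: a web interface to CVS that greatly speeds up comparing code versions (administrators)
    6. CVS TAG: adding $Id$ to code comments to make the development process easier to track (developers)
    7. CVS vs VSS: a comparison of CVS and Visual SourceSafe

    In most systems, 20% of the features cover 80% of the needs, and CVS is no exception. The features below are the ones used most often in CVS, and they probably amount to less than 20% of all its command options. Learn the rest through real use: learn as much as you need now, and pick up more when you need it.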


    CVS Environment Initialization
    ============

    Environment setup: set CVSROOT to the path of the CVS repository.
    tcsh
    setenv CVSROOT /path/to/cvsroot
    bash
    CVSROOT=/path/to/cvsroot ; export CVSROOT

    The setting for a remote CVS server, covered later:
    CVSROOT=:ext:$USER@test.server.address#port:/path/to/cvsroot CVS_RSH=ssh; export CVSROOT CVS_RSH

    Initialization: initialize the CVS repository.
    cvs init

    Importing a project for the first time
    cvs import -m "write some comments here" project_name vendor_tag release_tag
    After this runs, all source files and directories are imported under /path/to/cvsroot/project_name.
    vendor_tag: the vendor tag
    release_tag: the release tag

    Checking out a project: export the code from the CVS repository.
    cvs checkout project_name
    cvs creates the project_name directory and exports the latest version of the source code into it. This checkout is not the same concept as check out in Visual SourceSafe: the counterpart of Visual SourceSafe's check out is cvs update, and the counterpart of check in is cvs commit.

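    Daily Use of CVS
    =============

    Note: after the first checkout, you no longer synchronize files with cvs checkout. Instead, change into the project_name directory created by cvs checkout project_name and carry out the version operations on individual files (add, modify, delete) from there.

    Synchronize files to the latest version:
    cvs update
    With no file name given, cvs synchronizes the files in all subdirectories. You can also name a specific file or directory to synchronize:
    cvs update file_name
    It is best to do this every day before starting work and before committing your own work to the CVS repository, and to make "synchronize first, modify second" a habit. Unlike Visual SourceSafe, CVS has no concept of file locking; all conflicts are resolved before commit. If someone else modifies a file and commits it to the repository while you are editing it, CVS reports the conflict and automatically marks the conflicting sections in your file:
    <<<<<<< file_name
    content in your file
    =======
    content on cvs server
    >>>>>>> revision
    You then decide which of the conflicting content to keep.
    Version conflicts usually arise when several people modify the same file, but that is a project management problem, and CVS should not be expected to solve it.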
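Before committing, it can help to check that no conflict markers are left in a file. The Python sketch below assumes the standard CVS marker layout (a line starting with `<<<<<<< `, a `=======` separator, and a closing line starting with `>>>>>>> `); the find_conflicts name, the sample text, and the revision number 1.4 are hypothetical.

```python
def find_conflicts(text):
    """Return 1-based (start_line, end_line) pairs, one per conflict block."""
    conflicts = []
    start = None
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("<<<<<<< "):
            start = number
        elif line.startswith(">>>>>>> ") and start is not None:
            conflicts.append((start, number))
            start = None
    return conflicts


# Hypothetical working file after a conflicting "cvs update".
SAMPLE = """line one
<<<<<<< file_name
content in your file
=======
content on cvs server
>>>>>>> 1.4
line two
"""

print(find_conflicts(SAMPLE))
```

A check like this only finds the markers; deciding which side of each block to keep is still up to you.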

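    Commit your changes to the CVS repository:
    cvs commit -m "write some comments here" file_name

    Note: many CVS actions only take effect once they are confirmed with cvs commit, and it is best to commit only one file at a time. Before the commit completes you must also enter a log message, which helps other developers understand why the change was made. If you run `cvs commit file_name` without -m "comments", cvs automatically opens the system's default text editor (usually vi) and asks you to enter the message.
    The quality of log messages matters: you must not only write one, you must write something meaningful so that other developers can understand the change.
    Poor messages make it hard for other developers to understand a change quickly, for example: -m "bug fixed" or even -m ""
    A good message, which can even be written in Chinese: -m "Added email address validation to the user registration process"


    Changing the log message of a revision: committing one file at a time is a good habit, but sooner or later you will forget to name a file and commit several files with the same message. The following command lets you change the message of a given revision of a given file:
    cvs admin -m 1.3:"write some comments here" file_name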

    Adding files
    After creating the new file, for example: touch new_file
    cvs add new_file
    Note: for images, Word documents, and other non-plain-text files, use the -kb option of cvs add, otherwise the file may be corrupted.
    For example: cvs add -kb new_file.gif
    Then commit the change with a log message:
    cvs ci -m "write some comments here"

    Deleting files:
    After physically deleting a source file, for example: rm file_name
    cvs rm file_name
    Then commit the change with a log message:
    cvs ci -m "write some comments here"
    The first two steps above can be combined as follows:
    cvs rm -f file_name
    cvs ci -m "why delete file"

    Note: many cvs commands have short forms: commit=>ci; update=>up; checkout=>co; remove=>rm;


    Adding a directory:
    cvs add dir_name

    Viewing the change history:
    cvs log file_name
    cvs history file_name

    Viewing the differences between two revisions of a file:
    cvs diff -r1.3 -r1.5 file_name
    Viewing the differences between the current file (possibly already modified) and the corresponding file in the repository:
    cvs diff file_name
    The cvs web interface offers a more convenient way to locate changes and compare revisions; for installation and setup, see the CVSWEB section below.

    The correct way to restore an old revision with CVS:
    If you use cvs update -r1.2 file.name
    the command puts a sticky tag "1.2" on file.name, even though all you meant to do was restore it to revision 1.2.
    The correct way to restore the revision is: cvs update -p -r1.2 file_name >file_name
    If you have set a sticky tag by accident, clear it with cvs update -A

    Moving and renaming files:
    cvs has no cvs move or cvs rename, because both operations can be done with cvs remove old_file_name followed by cvs add new_file_name.

    Deleting and moving directories:
    The easiest way is to have the administrator move or delete the corresponding directory directly under CVSROOT. (The subdirectories of a CVS project are all independent, so any of them can be moved to the top of $CVSROOT and act as a new independent project, much as any branch cut from a tree can survive on its own.) After a directory has been changed this way, ask the developers to check out the project again with cvs checkout project_name, or to synchronize with cvs update -dP.

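    CVS Branch: Parallel Development on Multiple Branches
    =============================

    Marking a release milestone: the files in a project each have their own revision numbers. When the project reaches a certain stage, you can give all files one common milestone tag, which makes it easy to check out the project at that milestone later and is also the basis for developing on multiple branches.
    cvs tag release_1_0

    Starting a new milestone:
    cvs commit -r 2
    This marks all files as entering 2.x development.

    Note: revisions in CVS need not have any direct relation to the release version of the software package. Still, keeping file revision numbers in line with the release version makes maintenance easier.

    Suppose that while developing version 2.x of the project you find a problem in 1.x, and 2.x is not yet ready to use. Create a branch release_1_0_patch from the previously tagged milestone release_1_0:
    cvs rtag -b -r release_1_0 release_1_0_patch proj_dir

    Some developers check out the release_1_0_patch branch in a separate directory and fix the urgent problems in 1.0:
    cvs checkout -r release_1_0_patch
    The other developers keep working on the project's main trunk, 2.x.

    After fixing the bugs on release_1_0_patch, tag a bug-fix release of 1.0:
    cvs tag release_1_0_patch_1

    If these fixes are also needed in 2.0, merge the changes from release_1_0_patch_1 into the current code from the 2.0 working directory:
    cvs update -j release_1_0_patch_1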

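    Remote Authentication: Accessing CVS over SSH
    ================================

    CVS's own remote authentication is cumbersome: you have to define the server and user groups, user names, passwords, and so on, and it is not secure. A better approach is to authenticate against local system accounts and transfer over SSH. Put the following in /etc/profile on each client machine:
    CVSROOT=:ext:$USER@test.server.address#port:/path/to/cvsroot CVS_RSH=ssh; export CVSROOT CVS_RSH
    Every local user on every client machine then maps to the account of the same name on the CVS server.

    If the SSH port on the CVS server is not the default 22, or the default SSH ports on the client and the CVS server differ, setting
    :ext:$USER@test.server.address#port:/path/to/cvsroot
    sometimes still does not work and fails with errors such as:
    ssh: test.server.address#port: Name or service not known
    cvs [checkout aborted]: end of file from server (consult above messages if any)

    The fix is a wrapper script that specifies the port (an alias will not work; it produces a file-not-found error).
    Create a file /usr/bin/ssh_cvs:
    #!/bin/sh
    /path/to/ssh -p 34567 "$@"
    Then: chmod +x /usr/bin/ssh_cvs
    and set CVS_RSH=ssh_cvs; export CVS_RSH

    Note: port is the SSH port of the server in question, not the port of cvs pserver.
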
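    CVSWEB: Making It Faster to Compare File Changes
    ================================

    CVSWEB is a web interface to CVS that makes it much quicker for programmers to locate changes.
    For a live example, see: http://www.freebsd.org/cgi/cvsweb.cgi

    Downloading CVSWEB: many versions with richer features and interfaces have evolved from the original CVSWEB. This is the one I personally find easiest to install and configure:
    http://www.spaghetti-code.de/software/linux/cvsweb/

    Download and unpack:
    tar zxf cvsweb.tgz
    Put the configuration file cvsweb.conf somewhere safe (for example, in the same directory as the Apache configuration),
    then edit cvsweb.cgi so that the CGI can find the configuration file:
    $config = $ENV{'CVSWEB_CONFIG'} || '/path/to/apache/conf/cvsweb.conf';

    Change to /path/to/apache/conf and edit cvsweb.conf:

    1. Set the CVSROOT path:
    %CVSROOT = (
      'Development' => '/path/to/cvsroot', #<== change to point at the local CVSROOT
      );
    2. Hide deleted files by default:
    "hideattic" => "1", #<== do not show deleted files by default
    3. cvsweb.conf also lets you customize the description in the page header; change $long_intro to the text you need.

    CVSWEB should not be opened up to all users, so it needs web user authentication.
    First generate the password file:
    /path/to/apache/bin/htpasswd -c cvsweb.passwd user

    Then edit httpd.conf and add:
    <Directory "/path/to/apache/cgi-bin/cvsweb/">
    AuthName "CVS Authorization"
    AuthType Basic
    AuthUserFile /path/to/cvsweb.passwd
    require valid-user
    </Directory>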

    CVS TAGS: who? when?
    ====================

    Putting $Id$ in a comment at the top of each program file is a good habit. cvs automatically expands it into the form file_name version time user_name, for example: cvs_card.txt,v 1.1 2002/04/05 04:24:12 chedong Exp. From this you can tell who last modified the file and when.
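An expanded $Id$ keyword is regular enough to read back programmatically. The Python sketch below assumes the usual expanded layout, `$Id: file,v revision date time author state $`, built around the example values quoted above; the pattern and the parse_id_keyword helper are illustrative, not part of CVS.

```python
import re

# Matches an expanded keyword such as:
# $Id: cvs_card.txt,v 1.1 2002/04/05 04:24:12 chedong Exp $
ID_PATTERN = re.compile(
    r"\$Id: (?P<file>\S+),v (?P<revision>[\d.]+) "
    r"(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<author>\S+) (?P<state>\S+) \$"
)


def parse_id_keyword(text):
    """Return the keyword fields as a dict, or None if $Id$ is not expanded."""
    match = ID_PATTERN.search(text)
    return match.groupdict() if match else None


header = "/* $Id: cvs_card.txt,v 1.1 2002/04/05 04:24:12 chedong Exp $ */"
info = parse_id_keyword(header)
print(info["author"], info["revision"], info["date"])
```

An unexpanded `$Id$` (for example, in a file that has never been committed) simply returns None.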
    几个常用的缺省文Ӟ(x)
    default.php
    <?php
    /*
    * Copyright (c) 2002 Company Name.
    * $Header$
    */

    ?>

    ====================================
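    Default.java: note the difference between an ordinary file-header comment, which starts with /*, and a JAVADOC comment, which starts with /**
    /*
    * Copyright (c) 2002 Company Name.
    * $Header$
    */

    package com.netease;

    import java.io.*;

    /**
    * comments here
    */
    public class Default {
        /**
        * Returns a string representation of this object.
        *
        * @return the class name
        */
        public String toString() {
            return "Default";
        }
    }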

    ====================================
    default.pl:
    #!/usr/bin/perl -w
    # Copyright (c) 2002 Company Name.
    # $Header$

    # file comments here

    use strict;

    CVS vs VSS
    ==========

    CVS has no file-locking mode; VSS locks a file for the user who checks it out, at check-out time.

    The CVS workflow is update / commit; the VSS workflow is check out / check in.

    In CVS, automatic keyword (tag) expansion is turned on by default. This brings a potential problem: a binary file that was not added with -kb may be corrupted when cvs expands keywords in it.
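One way to avoid forgetting -kb is to let a small helper choose the `cvs add` form by file extension. This is a sketch only: `cvs_add_cmd` is a hypothetical helper, the extension list is an assumption to adjust for your project, and the function merely prints the command so you can review it before running it.

```shell
#!/bin/sh
# Sketch: print the cvs add command to use for a file.
# cvs_add_cmd is a hypothetical helper; the extension list is an assumption.
cvs_add_cmd() {
    case $1 in
        # Treat common binary formats as binary: -kb disables keyword expansion.
        *.gif|*.jpg|*.png|*.zip|*.jar|*.pdf)
            printf 'cvs add -kb %s\n' "$1" ;;
        # Everything else is added as text, with keyword expansion left on.
        *)
            printf 'cvs add %s\n' "$1" ;;
    esac
}
```

For a file that was already added without the flag, `cvs admin -kb FILE` followed by a fresh update is the usual repair, though the working copy should be checked afterwards.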

    In Visual SourceSafe this feature is called Keyword Expansion and is turned off by default. You enable it under Options and specify the file types to scan for keywords: *.txt,*.java,*.html...

    Keywords common to both Visual SourceSafe and CVS:
    $Header$
    $Author$
    $Date$
    $Revision$

    Use the common keywords wherever possible so that code can be tracked easily in both CVS and VSS.

     

    Related resources:</p>

    CVS home:
    http://www.cvshome.org

    CVS FAQ:
    http://www.loria.fr/~molli/cvs-index.html

    Related sites:
    http://directory.google.com/Top/Computers/Software/Configuration_Management/Tools/Concurrent_Versions_System/

    Free CVS book:
    http://cvsbook.red-bean.com/

    CVS command quick-reference card:
    http://www.refcards.com/about/cvs.html


    Excerpted from: http://www.chedong.com/tech/cvs_card.html



    pyguru 2005-02-15 23:39
    ]]>
    WEB/APPLICATION/DATABASE server: hardware and DB configuration
    http://www.aygfsteel.com/pyguru/archive/2005/02/15/1194.html
    pyguru, Mon, 14 Feb 2005 18:09:00 GMT

    :  I want to set up a WEB/APPLICATION/DATABASE server and plan to use LINUX. I expect roughly
    :  ?000 users online at the same time (CONCURRENT TRANSACTIONs might reach 200). I have only used
    :  REDHAT LINUX as a general development platform and have never run it as a server for a large
    :  number of users. I plan to build the machine myself: ? PROCESSOR,
    :  2.8G XEON, ?G-4G of MEMORY, ?20G - 300G of disk (SATA or SCSI). My questions:
    :  1. Is this hardware configuration adequate?
    :  2. Which LINUX should I use? REDHAT, FREEBSD, SUSE, something else?
    :  3. Which DB should I use, POSTGRESQL or MYSQL? MYSQL now supports TRANSACTIONs too, but POSTGRESQL
    :  seems to have many more features that are close to ORACLE. I have never used that DB, though.
    :  4. For the APPLICATION SERVER I plan to use TOMCAT5.0 + JDK1.5. I know that TOMCAT used to be unable
    :  to handle a large number of users; I do not know whether that is still true.
    :  5. Any other suggestions?
    :
    :  Many thanks.
    :
     
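    It mainly depends on how complex these transactions are. In general the hardware should be fine, but if there is
    a lot of varchar, blob and similar data, it becomes doubtful.

    As for the OS, I recommend a commercial Linux. We mostly use RHAS; SuSE is also good. Since you will be running
    Java and the like, do not use FreeBSD. The advantage of a commercial Linux is that you do not have to worry much about software upgrades and maintenance.

    For the DB, commercial Oracle or DB2 perform much better if you can use them. If you need to save money, I would still suggest
    MySQL, but tune it carefully, and when designing the business logic keep round trips to the DB to a minimum.
    Another drawback of MySQL is that it does not support stored procedures, but you can put that business
    logic outside the database.

    The application server may be the biggest problem. Tomcat is basically a lightweight Web/Servlet
    server. With a large number of users it will struggle, because it lacks some supporting features. You also have a large number of transactions,
    and Tomcat itself has no persistence support. If you want to implement transactions at this tier,
    you will probably have to install another container, such as an EJB container, or use something like Hibernate. The more data you cache at this tier,
    the less you depend on MySQL. Small sets of system data can be preloaded into this tier
    using some technique, which greatly reduces interaction with the database.

    J2SE 5.0 reportedly performs better, but I think using it is too aggressive; it is not stable enough. Without transactions
    that would not be a problem. Also, only Tomcat 5.5 and later can run on J2SE 5.0.
    For a server, BEA's JRockit VM is a good choice.

    If this is an ordinary service with at most one or two hundred users online at the same time, a reasonably well-configured PC is enough.
    With more users, the key is plenty of memory: the more, the better.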

    pyguru 2005-02-15 02:09
    ]]>
    Virtual hosts in Apache
    http://www.aygfsteel.com/pyguru/archive/2005/02/14/1157.html
    pyguru, Sun, 13 Feb 2005 18:56:00 GMT

    Virtual hosts in Apache are configured in Vhosts.conf; the settings below come from a real setup. The document root may not allow directory indexes, so put an index.html in each one to test the virtual host.

    ===============================================

    # Listen for virtual host requests on all IP addresses
    NameVirtualHost *:80

    <VirtualHost *:80>
    DocumentRoot /var/www/html/web
    ServerName web.mydomain.com

    # Other directives here
    </VirtualHost>

    <VirtualHost *:80>
    DocumentRoot /var/www/html/news
    ServerName news.mydomain.com

    # Other directives here
    </VirtualHost>

    <VirtualHost *:80>
    DocumentRoot /var/www/html/photo
    ServerName photo.mydomain.com

    # Other directives here
    </VirtualHost>
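Since the three host entries differ only in the site name, the configuration can be generated rather than typed by hand. The sketch below is one possible way to do that: `gen_vhost` and `gen_vhosts` are hypothetical helper names, and the `/var/www/html` base directory and `mydomain.com` domain are simply taken from the example above.

```shell
#!/bin/sh
# Sketch: generate name-based virtual host blocks for a list of site names.
# gen_vhost and gen_vhosts are hypothetical helpers, not Apache tools.

# gen_vhost NAME DOMAIN [BASE_DIR]: print one VirtualHost block.
gen_vhost() {
    vh_name=$1
    vh_domain=$2
    vh_base=${3:-/var/www/html}   # assumption: same base dir as the example
    cat <<EOF
<VirtualHost *:80>
    DocumentRoot $vh_base/$vh_name
    ServerName $vh_name.$vh_domain
</VirtualHost>

EOF
}

# gen_vhosts DOMAIN NAME...: print the NameVirtualHost line and one block per name.
gen_vhosts() {
    domain=$1
    shift
    printf 'NameVirtualHost *:80\n\n'
    for site in "$@"; do
        gen_vhost "$site" "$domain"
    done
}
```

For example, `gen_vhosts mydomain.com web news photo > Vhosts.conf` would write three blocks like the ones in this post; running `apachectl configtest` before restarting Apache is a sensible check.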


    pyguru 2005-02-14 02:56
    ]]>
    map linux network drive in Windows
    http://www.aygfsteel.com/pyguru/archive/2005/02/14/1156.html
    pyguru, Sun, 13 Feb 2005 18:41:00 GMT

    First create a local Unix account:


      useradd -r myaccount

      This command creates a Unix user named myaccount. (On most Linux systems the -r flag makes it a system account; omit -r if you want an ordinary user.)

      Then create a Samba user based on it:


      smbadduser myaccount:mysmbact

      or:


      smbpasswd -a myaccount

    The Samba password is separate from the Unix account password.

    Note: whenever you change the Samba configuration file, you must restart Samba with /etc/init.d/samba restart (Debian).

    Then, in Windows, map the network drive using the user name and the Samba password.

    pyguru 2005-02-14 02:41
    ]]>
    The Apache Web Server
    http://www.aygfsteel.com/pyguru/archive/2005/02/14/1155.html
    pyguru, Sun, 13 Feb 2005 18:35:00 GMT

    pyguru 2005-02-14 02:35
    ]]>