
Add RSS feeds to your Web site with Perl XML::RSS

pyguru — Wed, 16 Feb 2005 19:04:00 GMT

Guest Contributor, TechRepublic
December 22, 2004
URL: http://www.builderau.com.au/architect/webservices/0,39024590,39171461,00.htm

Take advantage of the XML::RSS CPAN package, which is specifically designed to read and parse RSS feeds.

You've probably already heard of RSS, the XML-based format which allows Web sites to publish and syndicate the latest content on their site to all interested parties. RSS is a boon to the lazy Webmaster, because (s)he no longer has to manually update his or her Web site with new content.

Instead, all a Webmaster has to do is plug in an RSS client, point it to the appropriate Web sites, and sit back and let the site "update itself" with news, weather forecasts, stock market data, and software alerts. You've already seen, in previous articles, how you can use the ASP.NET platform to manually parse an RSS feed and extract information from it by searching for the appropriate elements. But I'm a UNIX guy, and I have something that's even better than ASP.NET. It's called Perl.

Installing XML::RSS
RSS parsing in Perl is usually handled by the XML::RSS CPAN package. Unlike ASP.NET, which comes with a generic XML parser and expects you to manually write RSS-parsing code, the XML::RSS package is specifically designed to read and parse RSS feeds. When you give XML::RSS an RSS feed, it converts the various <item> elements in the feed into array elements, and exposes numerous methods and properties to access the data in the feed. XML::RSS currently supports versions 0.9, 0.91, and 1.0 of RSS.

Written entirely in Perl, XML::RSS isn't included with Perl by default, and you must install it from CPAN. Detailed installation instructions are provided in the download archive, but by far the simplest way to install it is to use the CPAN shell, as follows:

shell> perl -MCPAN -e shell
cpan> install XML::RSS

If you use the CPAN shell, dependencies will be automatically downloaded for you (unless you told the shell not to download dependent modules). If you manually download and install the module, you may need to download and install the XML::Parser module before XML::RSS can be installed. The examples in this tutorial also need the LWP::Simple package, so you should download and install that one too if you don't already have it.
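Before running the listings, it can be useful to confirm that the three modules actually load. The following is a minimal sketch, not part of the original article; module_available is a hypothetical helper name, and the check relies only on Perl's built-in require.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns true if the named module can be loaded.
sub module_available {
    my ($module) = @_;
    # Only accept plain module names before turning them into a file path.
    return 0 unless $module =~ /\A\w+(?:::\w+)*\z/;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Report on the modules this tutorial depends on.
for my $module (qw(XML::Parser XML::RSS LWP::Simple)) {
    printf "%-12s %s\n", $module,
        module_available($module) ? 'installed' : 'MISSING';
}
```

Any module reported as MISSING can then be installed from the CPAN shell as shown above.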

Basic usage
For our example, we'll assume that you're interested in displaying the latest geek news from Slashdot on your site. Slashdot's RSS feed is located at http://www.slashdot.org/index.rss. The script in Listing A retrieves this feed, parses it, and turns it into a human-readable HTML page using XML::RSS:

Listing A

#!/usr/bin/perl

# import packages
use strict;
use warnings;
use XML::RSS;
use LWP::Simple;

# initialize object
my $rss = XML::RSS->new();

# get RSS data
my $raw = get('http://www.slashdot.org/index.rss');
die "Could not retrieve RSS feed\n" unless defined $raw;

# parse RSS feed
$rss->parse($raw);

# print HTML header and page
# (the markup here is a minimal stand-in; adjust it to taste)
print "Content-Type: text/html\n\n";
print "<html><head><title>RSS feed</title></head>";
print "<body>";
print "<h2>" . $rss->channel('title') . "</h2>";
print "<ul>";

# print titles and URLs of news items
foreach my $item (@{$rss->{'items'}})
{
        my $title = $item->{'title'};
        my $url = $item->{'link'};
        print "<li><a href=\"$url\">$title</a></li>";
}

# print footers
print "</ul>";
print "</body></html>";

Place the script in your Web server's cgi-bin/ directory. Remember to make it executable, and then browse to it using your Web browser. After a short wait for the RSS file to download, you should see something like Figure A.

Figure A

Slashdot RSS feed

How does the script in Listing A work? Well, the first task is to get the RSS feed from the remote system to the local one. This is accomplished with the LWP::Simple package, which simulates an HTTP client and opens up a network connection to the remote site to retrieve the RSS data. An XML::RSS object is created, and this raw data is then passed to it for processing.

The various elements of the RSS feed are converted into Perl structures, and a foreach() loop is used to iterate over the array of items. Each item contains properties representing the item name, URL and description; these properties are used to dynamically build a readable list of news items. Each time Slashdot updates its RSS feed, the list of items displayed by the script above will change automatically, with no manual intervention required.
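Because item titles and links come from a remote site, it is worth escaping them before they are written into your page. The sketch below is an addition to the article, not part of Listing A; escape_html and item_to_html are hypothetical helper names, and the sample hash merely stands in for one element of the items array.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build one list entry from an item hash with 'title' and 'link' keys.
sub item_to_html {
    my ($item) = @_;
    return sprintf '<li><a href="%s">%s</a></li>',
        escape_html($item->{link}), escape_html($item->{title});
}

# Sample data standing in for one element of @{ $rss->{items} }.
my $item = {
    title => 'Tom & Jerry <live>',
    link  => 'http://example.com/?a=1&b=2',
};
print item_to_html($item), "\n";
```

Inside the foreach() loop, the print statement could call item_to_html($item) instead of interpolating the raw values.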

The script in Listing A will work with other RSS feeds as well. Simply alter the URL passed to LWP::Simple's get() function, and watch as the list of items displayed by the script changes.

Here are some RSS feeds to get you started

Builder AU
Thinkgeek
CNET
Syndic8
Local weather forecasts

Tip: Notice that the RSS channel name (and description) can be obtained with the object's channel() method, which accepts any one of three arguments (title, description or link) and returns the corresponding channel value.

Adding multiple sources and optimising performance
So that takes care of adding a feed to your Web site. But hey, why limit yourself to one when you can have many? Listing B, a revision of Listing A, sets up an array containing the names of several RSS feeds, and iterates over the array to produce a page containing multiple channels of information.

Listing B

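#!/usr/bin/perl

# import packages
use strict;
use warnings;
use XML::RSS;

# local copies of the RSS feeds (example file names)
my @files = ('slashdot.rss', 'freshmeat.rdf');

# print HTML header
# (the markup here is a minimal stand-in; adjust it to taste)
print "Content-Type: text/html\n\n";
print "<html><head><title>RSS feeds</title></head>";
print "<body>";

# parse each local file and print its channel
foreach my $file (@files)
{
        # use a fresh object for each feed
        my $rss = XML::RSS->new();
        $rss->parsefile($file);

        print "<h2>" . $rss->channel('title') . "</h2>";
        print "<ul>";

        # print titles and URLs of news items
        foreach my $item (@{$rss->{'items'}})
        {
                my $title = $item->{'title'};
                my $url = $item->{'link'};
                print "<li><a href=\"$url\">$title</a></li>";
        }
        print "</ul>";
}

# print footers
print "</body></html>";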

Figure B shows you what it looks like.

Figure B

Several RSS feeds

You'll notice, if you're sharp-eyed, that Listing B uses the parsefile() method to read a local version of the RSS file, instead of using LWP to retrieve it from the remote site. This revision results in improved performance, because it does away with the need to generate an internal request for the RSS data source every time the script is executed. Fetching the RSS file on each script run not only causes things to go slow (because of the time taken to fetch the RSS file), but it's also inefficient; it's unlikely that the source RSS file will change on a minute-by-minute basis, and by fetching the same data over and over again, you're simply wasting bandwidth. A better solution is to retrieve the RSS data source once, save it to a local file, and use that local file to generate your page.

Depending on how often the source file gets updated, you can write a simple shell script to download a fresh copy of the file on a regular basis.

Here's an example of such a script:

#!/bin/bash
/bin/wget http://www.freshmeat.net/backend/fm.rdf -O freshmeat.rdf

This script uses the wget utility (included with most Linux distributions) to download and save the RSS file to disk. Add this to your system crontab, and set it to run on an hourly or daily basis.
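If cron is not available, the age of the local copy can also be checked from Perl before deciding whether a new download is needed. This is a rough sketch under that assumption; is_stale is a hypothetical helper, and the one-hour threshold is only an example value.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $file is missing or older than $max_age seconds.
sub is_stale {
    my ($file, $max_age) = @_;
    my @st = stat $file;
    return 1 unless @st;          # no file yet: treat it as stale
    my $age = time - $st[9];      # element 9 of stat() is the modification time
    return $age > $max_age ? 1 : 0;
}

my $cache = 'freshmeat.rdf';      # file name used by the wget example
if (is_stale($cache, 3600)) {
    print "$cache is missing or more than an hour old; fetch a new copy\n";
} else {
    print "$cache is fresh; parse it with parsefile()\n";
}
```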

If you find performance unacceptably low even after using local copies of RSS files, you can take things a step further, by generating a static HTML snapshot from the script above, and sending that to clients instead. To do this, comment out the line printing the "Content-Type" header in the script above and then run the script from the console, redirecting the output to an HTML file. Here's how:

$ ./rss.cgi > static.html

Now, simply serve this HTML file to your users. Since the file is a static file and not a script, no server-side processing takes place before the server transmits it to the client. You can run the command-line above from your crontab to regenerate the HTML file on a regular basis. Performance with a static file should be noticeably better than with a Perl script.
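One caveat with shell redirection is that a visitor may request the page while it is still being rewritten. A possible refinement, sketched here as an assumption rather than as part of the original recipe, is to write the snapshot to a temporary name and rename it into place; write_atomically is a hypothetical helper.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: write $content to "$path.tmp", then rename it over $path.
sub write_atomically {
    my ($path, $content) = @_;
    my $tmp = "$path.tmp";
    open my $out, '>', $tmp or die "Cannot write $tmp: $!";
    print {$out} $content;
    close $out or die "Cannot close $tmp: $!";
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
    return $path;
}

# The temp directory keeps this sketch safe to run; in practice the target
# would be the static.html file inside your Web server's document root.
my $target = File::Spec->catfile(File::Spec->tmpdir, 'static.html');
write_atomically($target, "<html><body><p>snapshot</p></body></html>\n");
print "wrote $target\n";
```

On most UNIX file systems a rename within one directory replaces the old file in a single step, so readers see either the old page or the new one.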

Looks easy? What are you waiting for—get out there and start hooking your site up to your favorite RSS news feeds.

Posted by pyguru, 2005-02-17 03:04

Lilina: building a personal portal with an RSS aggregator (Write once, publish anywhere)

pyguru — Wed, 16 Feb 2005 19:00:00 GMT


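While searching for RSS parsing tools recently, I found MagPieRSS and Lilina, which is built on top of it. Lilina's main features:

1. Web-based RSS management: adding and deleting feeds, OPML export, a back-end RSS caching mechanism (to avoid putting too much load on the source servers), and a ScriptLet: a JavaScript bookmarklet for instant subscription, similar to the del.icio.us bookmarklet.

2. Front-end publishing: I switched my home page to Lilina so that it publishes the weblogs of a few friends I read regularly, which also saves a lot of work updating my own pages. Requires PHP 4.3 with mbstring and iconv.

Open-source software keeps getting better at i18n: PHP 4.3.x built with '--enable-mbstring' '--with-iconv' copes fairly well with RSS published in UTF-8 and in other Chinese character sets at the same time. Thanks are due to Steve for his character-conversion work in PHP on MagPieRSS and for his XML hacking. So far, at least, Add to My Yahoo! still does not handle UTF-8 RSS subscriptions well.

I remember that early this year Wen Xin introduced the concept of the personal portal at a CNBlog seminar. As RSS matures within CMS technology, more and more services let individual users build a portal to suit their own needs, which fits the decentralizing trend of the Internet. With Add to My Yahoo!, for example, users can easily subscribe to news from many more sources. Imagine aggregating and republishing your del.icio.us bookmarks, your flickr photo collection and Yahoo! news through an RSS aggregator like this one. How fast would that spread information?

Just as software development achieves "write once, run anywhere" through an intermediate platform or virtual machine, information publishing achieves "write once, publish anywhere" through the intermediate layer of RSS/XML.

Installing Lilina requires PHP 4.3 or later with support for the iconv and mbstring functions; please confirm that your PHP build includes '--with-iconv'.

The server must also be able to send RPC requests to external servers, which 51.NET does not support. PowWeb's service seems very good, with many packages installed by default: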

iconv
iconv support enabled
iconv implementation unknown
iconv library version unknown

Directive Local Value Master Value
iconv.input_encoding ISO-8859-1 ISO-8859-1
iconv.internal_encoding ISO-8859-1 ISO-8859-1
iconv.output_encoding ISO-8859-1 ISO-8859-1

mbstring
Multibyte Support enabled
Japanese support enabled
Simplified chinese support enabled
Traditional chinese support enabled
Korean support enabled
Russian support enabled
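Multibyte (japanese) regex support enabled

Unpack the installation package (the downloaded file has a .gz extension but is really a .tgz, so it needs to be renamed) and upload it to the appropriate directory on the server. Note that the cache directory and the current directory must be writable. Then set the parameters in conf.php and you can start using it.

He Dong's suggestions to me:
1. In the right-hand column, the first item, sources, should get an image like hobby and the friends' links have.
2. The pile of search boxes looks a bit messy; keep just one and move the rest to a second-level page.
3. Turn the contact details and the CC notice into one line or one image each in the right-hand column; the details can go on a second-level page, because I doubt many people read that text closely.
4. If possible, translate Lilina's header links into Chinese.

Some planned improvements:
1. Trim overly long summaries, which can be done by searching for a particular occurrence of a delimiter string in the text.
2. Grouping: output the RSS feeds in groups.

Changing the default display window: by default Lilina shows only recently published articles. To use a different period, find this line:
$TIMERANGE = ( $_REQUEST['hours'] ? $_REQUEST['hours']*3600 : 3600*24 ) ;

and change it.

RSS is a lightweight protocol that can pull together all of your resources: wiki, blog, e-mail. From now on, wherever you write, anything with an RSS interface can be re-aggregated and republished in some way, which greatly improves the efficiency of personal knowledge management and of publishing and distribution.

My earlier understanding of RSS was very shallow: isn't it just a DTD? Only when I really looked into the parsers did I appreciate the importance of namespaces. A good protocol should be like this: not that there is nothing left to add, but that there is certainly nothing left to take away, and achieving that is very, very hard.

I will also try the related Java parsers and extend them into the WebLucene project; see the further resources on open-source RSS parsers for Java.

I also found two Perl packages for parsing RSS: XML::RSS::Parser::Lite and XML::RSS::Parser.

Sample code for XML::RSS::Parser::Lite: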

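#!/usr/bin/perl -w
# $Id$
# XML::RSS::Parser::Lite sample

use strict;
use XML::RSS::Parser::Lite;
use LWP::Simple;

my $xml = get("http://www.klogs.org/index.xml");
die "Could not retrieve feed\n" unless defined $xml;
my $rp = new XML::RSS::Parser::Lite;
$rp->parse($xml);

# print blog header
print "<a href=\"" . $rp->get('url') . "\">" . $rp->get('title') . " - " . $rp->get('description') . "</a>\n";

# convert each item to a list entry
print "<ul>\n";
for (my $i = 0; $i < $rp->count(); $i++) {
    my $it = $rp->get($i);
    print "<li><a href=\"" . $it->get('url') . "\">" . $it->get('title') . "</a></li>\n";
}
print "</ul>\n";

Installation: requires SOAP-Lite.

Pros: simple to use; supports remote fetching.

Cons: supports only the three fields title, url and description; there is no support for date or time fields.

I plan to use it to design a simple RSS fetch-and-sync service, so that everyone can publish the RSS feeds they subscribe to.

Sample code for XML::RSS::Parser:
#!/usr/bin/perl -w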
# $Id$
# XML::RSS::Parser sample with Iconv charset convert

use strict;
use XML::RSS::Parser;
use Text::Iconv;
my $converter = Text::Iconv->new("utf-8", "gbk");

my $p = new XML::RSS::Parser;
my $feed = $p->parsefile('index.xml');

# output some values
my $title = XML::RSS::Parser->ns_qualify('title',$feed->rss_namespace_uri);
# may cause error this line: print $feed->channel->children($title)->value."\n";
print "item count: ".$feed->item_count()."\n\n";
foreach my $i ( $feed->items ) {
map { print $_->name.": ".$converter->convert($_->value)."\n" } $i->children;
print "\n";
}

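Pros: outputs the data directly field by field; provides a lower-level interface.

Cons: cannot parse a remote RSS feed directly; the feed has to be downloaded first.

2004-12-14:
I learned about the Planet RSS aggregator from a trackback on CNBlog.

Installing Planet: after unpacking, run this directly in the directory: python planet.py examples/config.ini. The output for the default sample feeds, index.html, then appears in the output directory, along with opml.xml, rss.xml and other outputs (a nice touch).

I tried it with a few RSS feeds. The UTF-8 ones were fine, but the GBK ones all came out garbled. The only code in planetlib.py that deals with XML character sets is the following; it seems that everything that is not UTF-8 gets treated as iso8859_1: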
try:
    data = unicode(data, "utf8").encode("utf8")
    logging.debug("Encoding: UTF-8")
except UnicodeError:
    try:
        data = unicode(data, "iso8859_1").encode("utf8")
        logging.debug("Encoding: ISO-8859-1")
    except UnicodeError:
        data = unicode(data, "ascii", "replace").encode("utf8")
        logging.warn("Feed wasn't in UTF-8 or ISO-8859-1, replaced " +
                     "all non-ASCII characters.")
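The same fallback idea, with a GBK attempt added before the ISO-8859-1 catch-all, might look like this in Perl. This is only a sketch using the core Encode module, not code from Planet or Lilina; to_utf8 is a hypothetical helper, and the order of the candidate encodings is an assumption.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: try strict decodes in order and return the UTF-8 bytes
# plus the name of the encoding that worked.
sub to_utf8 {
    my ($bytes) = @_;
    for my $name ('UTF-8', 'GBK') {
        my $copy = $bytes;    # decode() may modify its input when CHECK is set
        my $text = eval { decode($name, $copy, Encode::FB_CROAK) };
        return (encode('UTF-8', $text), $name) if defined $text;
    }
    # ISO-8859-1 maps every byte, so this last step always succeeds.
    return (encode('UTF-8', decode('ISO-8859-1', $bytes)), 'ISO-8859-1');
}

my $gbk_bytes = "\xD6\xD0\xCE\xC4";    # the word for "Chinese", encoded as GBK
my ($utf8, $guess) = to_utf8($gbk_bytes);
print "decoded as $guess\n";
```

Guessing by trial decoding is a heuristic; when a feed declares its encoding in the XML prolog, honouring that declaration is the more reliable route.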

I have recently been studying Python's unicode handling; it feels like a very concise language, with a good try ... catch mechanism and logging.

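Doubts about MagPieRSS performance:
The main performance difference between Planet and MagPieRSS lies in the caching mechanism. On using caching to speed up Web services, see: cacheable CMS design.

As you can see, Lilina's caching mechanism walks the RSS files in the cache directory on every request, and if a cache file has expired it also has to request the RSS source dynamically. It therefore cannot support too many RSS subscriptions on the back end or heavy concurrent access on the front end (both cause a lot of I/O).

Planet is a back-end script that periodically aggregates the subscribed RSS feeds into one file and outputs it as a static file.

In fact, if you put a wget script in front of MagPieRSS that periodically saves the output of index.php as index.html, and have every visit go to the cached index.html first, isn't that the same as Planet generating a static index.html cache every hour?

So on virtual hosts that do not allow you to set up your own server-side scripts, Planet simply cannot run.

For more on handling GBK in XML parsing with PHP, see:
Analysis of RSS parsing for UTF-8 and GBK in MagPieRSS

2004-12-19
As Isaac Mao said at the SocialBrain 2005 discussion: Blog is a 'Window', also could be a 'Bridge'. A blog is the outward-facing "window" of a person or organization, and RSS makes it easier to combine those windows into "bridges". With such an intermediate publishing layer, blogs move from single-point publishing to P2P self-service distribution, and the importance of RSS for spreading information on the network is becoming ever clearer.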

Posted by chedong at December 11, 2004 12:34 AM Edit
Last Modified at December 19, 2004 04:40 PM



Comments

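How can I change the default of showing 7 days of news? Thanks.

Posted by: honren at December 12, 2004 10:20 PM

I have been using lilina for a while now.
http://news.yanfeng.org
I changed the UI slightly.
It would be great if you could improve it.

Posted by: mulberry at December 13, 2004 09:24 AM

Comrade Che, haven't you noticed that your home page has become slower to load since you started using lilina? Give it up, or at least don't use it as your home page; lilina's technology is still immature.

Posted by: kalen at December 16, 2004 10:33 AM

You could consider using drupal.

Posted by: shunz at December 28, 2004 06:46 PM

You could try the one I built: http://blog.terac.com

It fetches blogs on a schedule, takes the latest few entries from each, sorts and aggregates them, generates static XML, and formats it for display with XSL...

Posted by: andy at January 6, 2005 12:53 PM

Comrade Chedong, this is not a good idea :P
The RSS is already on the Web. Aggregating it on your own page not only lowers the quality of your home page but also confuses search engines, producing exactly the effect you have denounced: "portal sites harming the enthusiasm of creators". Better not to aggregate!

Posted by pyguru, 2005-02-17 03:00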

Using RSS News Feeds with Perl

pyguru — Wed, 16 Feb 2005 18:59:00 GMT

Abstract

The Rich Site Summary (RSS) format, previously known as the RDF Site Summary, has quietly become the dominant format for distributing news headlines on the Web.

In this Mother of Perl tutorial, we will write a short Perl script (less than 100 lines) that retrieves an XML RSS file from the Web or local file system and converts it to HTML. Using a Server Side Include (SSI) or similar method, you can easily add news headlines from any number of sources to your Web site.

History

Where did RSS come from, you ask? Netscape invented the RSS format for "channels" on Netscape Netcenter (http://my.netscape.com). It was released to the public in March of 1999. The first non-Netscape Web site to incorporate the new format was Scripting News, a popular technology news site run by Dave Winer, president of Userland Software (think Frontier). Interestingly enough, Scripting News had been using its own XML format, scriptingNews, since December of 1997.

In May of 1999, Dave Winer released a new version of the scriptingNews XML format, which added new content-rich elements. Netscape followed suit by adopting most of the new scriptingNews elements into RSS 0.91, which was released in July of 1999.

Userland Software also rolled out their own flavor of my.netscape.com. If you haven't already guessed, it's available at http://my.userland.com.

As far as I know, RSS is the most widely used XML format on the Web today. RSS headlines are available for many popular news sites like Slashdot, Forbes, and CNET News.com, and the list is growing daily.

In a time when "stickiness" is a good thing, displaying news headlines on your Web site can really help give it the extra "oomph" that will encourage users to return. After all, users can read your president's bio only so many times.

Required Modules

For rss2html.pl to work on your system, you should have a recent version of Perl installed, 5.003 or better. 5.005 is recommended. You will also need the XML::Parser and XML::RSS modules installed.

To install the modules on a *nix system, type:
perl -MCPAN -e "install XML::Parser"
perl -MCPAN -e "install XML::RSS"

If you're using a win32 machine (Win95/98/NT), you should have a recent installation of ActiveState Perl. If you don't have a recent version, visit http://www.activestate.com.

To install XML::Parser on a win32 machine type:
ppm install XML-Parser

To install XML::RSS on a win32 machine (you must have a C compiler and nmake):

Download the module from: http://search.cpan.org/dist/XML-RSS/
Uncompress the zip file and cd to the XML-RSS-0.5 directory
type: perl Makefile.PL
type: nmake
type: nmake install

Next, we'll examine the RSS format in more detail.

rss2html.pl
This script converts an RSS file on the Web or local file system to HTML.

RSS 0.9

The first public version of RSS, 0.9, includes basic headline information. Below is an example RSS file for Freshmeat.net, a popular news site for Linux software:


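<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://my.netscape.com/rdf/simple/0.9/">

  <channel>
    <title>freshmeat.net</title>
    <link>http://freshmeat.net</link>
    <description>the one-stop-shop for all your Linux software needs</description>
  </channel>

  <image>
    <title>freshmeat.net</title>
    <url>http://freshmeat.net/images/fm.mini.jpg</url>
    <link>http://freshmeat.net</link>
  </image>

  <item>
    <title>Geheimnis 0.59</title>
    <link>http://freshmeat.net/news/1999/06/21/930004162.html</link>
  </item>

  <item>
    <title>Firewall Manager 1.3 PRO</title>
    <link>http://freshmeat.net/news/1999/06/21/930004148.html</link>
  </item>

  <textinput>
    <title>quick finder</title>
    <description>Use the text input below to search the fresh meat application database</description>
    <name>query</name>
    <link>http://core.freshmeat.net/search.php3</link>
  </textinput>

</rdf:RDF>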

The first major element is channel which contains the following elements:

title - the title of the channel
link - the link to the channel Web site
description - short description of the channel

An RSS channel may also contain an image element as in the example above which contains the following elements:

title - the text describing the image
url - the URL of the image
link - the URL that the image is linked to

The item element contains the real channel content which is comprised of a title and a link element. An RSS file may contain up to 15 items.

An RSS 0.9 file may optionally contain a textinput element, which lets users type a string into an HTML text input field and submit it via the HTTP GET method to the URL specified in the link element.

Next, we will examine RSS 0.91 which was released by Netscape in July of 1999.

RSS 0.91

The latest version of RSS added a few new elements. Below is a sample RSS file from XML.com, an excellent XML resource site:

<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>XML News and Features from XML.com</title>
    <description>XML.com features a rich mix of information and services for the XML community.</description>
    <language>en-us</language>
    <link>http://xml.com/pub</link>
    <copyright>Copyright 1999, O'Reilly and Associates and Seybold Publications</copyright>
    <managingEditor>dale@xml.com (Dale Dougherty)</managingEditor>
    <webMaster>peter@xml.com (Peter Wiggin)</webMaster>

    <image>
      <title>XML News and Features from XML.com</title>
      <url>http://xml.com/universal/images/xml_tiny.gif</url>
      <link>http://xml.com/pub</link>
      <width>88</width>
      <height>31</height>
    </image>

    <item>
      <title>Issue: XML Data Servers</title>
      <link>http://xml.com/pub?wwwrrr_rss</link>
      <description>Although not everyone agrees that XML should become a full-fledged data-management discipline, object-database vendors are busy repositioning their object-database products as XML data servers. Jon Udell looks at one of these, Object Design's eXcelon and finds it a solid product.</description>
    </item>

    <item>
      <title>O'Reilly Labs Review: Object Design's eXcelon 1.1</title>
      <link>http://xml.com/pub/1999/08/excelon/index.html?wwwrrr_rss</link>
      <description>Jon Udell takes a look at eXcelon, Object Design's XML data servers, and explains its user interface and general approach to XML.</description>
    </item>

    <item>
      <title>Report from Montreal</title>
      <link>http://xml.com/pub/1999/08/excelon/montreal.html?wwwrrr_rss</link>
      <description>Lisa Rein reports from MetaStructures 99 and XML Developers' Day.</description>
    </item>

    <item>
      <title>Reviews: Bluestone Software's XML Suite: Promising App, Rough Around the Edges</title>
      <link>http://xml.com/pub/1999/08/bluestone/index.html?wwwrrr_rss</link>
      <description>Our reviewer tested Bluestone's XML Suite (XML Server and Visual XML) on the Windows NT platform, simulating a two-way exchange of business information between a book publisher and book stores. The results were encouraging (with a few caveats).</description>
    </item>

    <item>
      <title>Interviews: CBL: Ecommerce Componentry</title>
      <link>http://xml.com/pub/1999/08/glushko/glushko.html?wwwrrr_rss</link>
      <description>In this audio interview, Bob Glushko of Commerce One talks about the Common Business Library (CBL) as a set of building blocks for XML document types and schemas used in ecommerce.</description>
    </item>

    <item>
      <title>Backends Sharing Data</title>
      <link>http://xml.com/pub/1999/08/rpc/index.html?wwwrrr_rss</link>
      <description>What if you could script remote procedure calls between web sites as easily as you can between programs? Edd Dumbill shows how it can be done in PHP.</description>
    </item>

    <item>
      <title>Back Issue: XML Suite</title>
      <link>http://xml.com/pub/1999/08/18/index.html?wwwrrr_rss</link>
      <description>Barry Nance runs Bluestone's XML Suite through the paces. The tools show promise for passing data between databases and XML. But there are still a few kinks to be worked out.</description>
    </item>

    <item>
      <title>Back Issue: XML-RPC</title>
      <link>http://xml.com/pub/1999/08/11/index.html?wwwrrr_rss</link>
      <description>A major promise of XML is its ability to pass data simply from one place to another, regardless of platform. In this issue, Edd Dumbill shows how to use XML-RPC in PHP to pass data from a web site to a PDA.</description>
    </item>

    <item>
      <title>News: InDelv XML/XSL Client Version 0.4.</title>
      <link>http://xml.com/pub/coverpage/newspage.html#ni1999-08-27-a?wwwrrr_rss</link>
      <description>A posting from Rob Brown reports on the public availability of the new InDelv XML Client version 0.4. This version represent an upgrade to InDelv's previously released XML Browser, but "it has been renamed as a 'Client' to reflect the fact that it now contains both an XML/XSL browser and an XML/XSL editor. The browser is available free for all uses. The editor comes packaged with the browser as a demo, which can later be upgraded to a full commercial version. This is a 100% Java appl...</description>
    </item>

    <item>
      <title>News: OpenJade Development Team Releases OpenJade 1.3pre1 (Beta).</title>
      <link>http://xml.com/pub/coverpage/newspage.html#ni1999-08-27-g?wwwrrr_rss</link>
      <description>A recent posting from Avi Kivity and the OpenJade Development Team announced the release of OpenJade 1.3pre1 (Beta). "OpenJade is the DSSSL user community's open source implementation of DSSSL, Document Style Semantics and Specification Language, an ISO standard for rendering SGML and XML documents. OpenJade is based on James Clark's widely used Jade. OpenJade 1.3pre1 is a more complete implementation of the DSSSL standard, and introduces many new features, including (1) Implementat...</description>
    </item>

    <item>
      <title>News: IBM XML Parser Update: XML4C2 Version 2.3.1 Released.</title>
      <link>http://xml.com/pub/coverpage/newspage.html#ni1999-08-27-b?wwwrrr_rss</link>
      <description>Dean Roddey posted an announcement for the update of XML4C. IBM's XML for C++ parser (XML4C) "is a validating XML parser written in a portable subset of C++. XML4C makes it easy to give an application the ability to read and write XML data. Its two shared libraries provide classes for parsing, generating, manipulating, and validating XML documents. XML4C is faithful to the XML 1.0 Recommendation and associated standards (DOM 1.0, SAX 1.0). Source code, samples and API documentation ...</description>
    </item>

    <item>
      <title>News: Platform for Privacy Preferences (P3P) Specification Working Draft.</title>
      <link>http://xml.com/pub/coverpage/newspage.html#ni1999-08-27-h?wwwrrr_rss</link>
      <description>As part of the W3C P3P Activity, a fifth public working draft of the Platform for Privacy Preferences (P3P) Specification has been published for review by W3C members. The working draft "describes the Platform for Privacy Preferences (P3P). P3P enables Web sites to express their privacy practices and enables users to exercise preferences over those practices. P3P compliant products will allow users to be informed of site practices (in both machine and human readable formats), to deleg...</description>
    </item>

    <item>
      <title>News: Extended XLink with XSLT.</title>
      <link>http://xml.com/pub/coverpage/newspage.html#ni1999-08-27-c?wwwrrr_rss</link>
      <description>Nikita Ogievetsky (President, Cogitech, Inc.) posted an announcement for the availability of slides from the Metastructures '99 presentation "HTML Form Templates with XML. All in One and One for All. XSLT template library for WEB applications." The paper describes building XSLT template library for web applications. The goal was to "demonstrate data processing on the web made easy with XSL transformations: Generate a data maintenance web with data-structure controlled by XML, scree...</description>
    </item>

    <item>
      <title>News: HyBrick Web Site Reopens.</title>
      <link>http://xml.com/pub/coverpage/newspage.html#ni1999-08-27-d?wwwrrr_rss</link>
      <description>A posting from Toshimitsu Suzuki (Fujitsu Laboratories Ltd.) to the XLXP-DEV mailing list recently announced the reopening of the HyBrick Web site. 'HyBrick' is "an advanced SGML/XML browser developed by Fujitsu Laboratories, the research arm of Fujitsu. HyBrick is based on an architecture that supports advanced linking and formatting capabilities. HyBrick includes a DSSSL renderer and XLink/XPointer engine running on top of James Clark's SP and Jade. HyBrick supports: (1) Both v...</description>
    </item>

    <item>
      <title>News: Extended DocBook Synopses Version 1.0.</title>
      <link>http://xml.com/pub/coverpage/newspage.html#ni1999-08-27-e?wwwrrr_rss</link>
      <description>Norman Walsh has posted an announcement for a preliminary release of 'Extended DocBook Synopses'. Extended DocBook Synopses is a customization layer that extends DocBook, "adding a function synopsis element, ClassSynopsis for modern, mostly object-oriented, programming languages such as Java, C++, Perl, and IDL." DocBook is an SGML [and XML] DTD maintained by the DocBook Technical Committee of OASIS that particularly well suited to books and papers about computer hardware and softwar...</description>
    </item>
  </channel>
</rss>

Notice that there are more descriptive elements for the channel, image, and item elements. These are referred to as "fat elements" because they contain a more detailed description of each channel item.

The XML::RSS Module

Now that you've had a chance to glance at two RSS examples, it's time to introduce the XML::RSS module. XML::RSS is a subclass of XML::Parser, a Perl module maintained by Clark Cooper that utilizes James Clark's Expat C library. XML::RSS was developed to simplify the task of manipulating and parsing RSS files. A deep understanding of XML is not a prerequisite for using XML::RSS since the XML details are hidden inside the class interface.

While XML::RSS is capable of creating RSS files, we will be focusing on parsing existing RSS files in this column. You can read more about the capabilities of XML::RSS in the module's documentation or by typing:
perldoc XML::RSS

The Code

Well, let's look at the code shall we? Lines 16-17 load the XML::RSS and LWP::Simple modules. We've already talked about XML::RSS in brief, but what does LWP::Simple do? Good question! The answer is simple (puns intended). It's a procedural interface for interacting with a Web server. It's also the little cousin of LWP::UserAgent, a fuller object oriented interface. We'll be using one of the library's subroutines later in the code to fetch an RSS file from the Web.

In lines 20-21 we initialize two variables that we're going to use later.

Line 25 starts the main code body. The first thing we do is verify that the user typed exactly one command-line parameter. This parameter is then assigned to the $arg variable in line 28.

Next we create a new instance of the XML::RSS class and assign the reference to the $rss variable on line 31.

Now we must determine whether the command-line parameter the user entered is an HTTP URL or a file on the local file system (lines 34-46). On line 34, we use a regular expression to look for the characters http:.

If the command-line argument starts with these characters, we can safely assume that the user intends to retrieve an RSS file from a Web server. On line 35 we pass the argument to the get() function, which is a part of LWP::Simple, and assign the results to the $content variable. On line 36 we call die() if $content is empty. If this happens, it means there was an error retrieving the RSS file. If the RSS file was downloaded successfully, $rss->parse($content) is called which parses the RSS file and stores the results in the object's internal structure (line 38).

If the command-line argument does not contain the http: characters, we assume the argument is a file instead of a URL on lines 41-46. The first thing we do is assign the value of $arg to the $file variable and test for the existence of the file (lines 42-43).

Then we call $rss->parsefile($file) (line 45), which parses the RSS file and stores the results in the object's internal structure. The parsefile() method parses a file, whereas the parse() method parses the string that's passed to it.
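The decision between the two code paths can be illustrated in isolation. The sketch below is not the rss2html.pl source; classify_source is a hypothetical helper, and anchoring the pattern at the start of the argument and ignoring case are assumptions on my part.

```perl
use strict;
use warnings;

# Hypothetical helper: decide how a command-line argument should be loaded.
sub classify_source {
    my ($arg) = @_;
    return 'url'  if $arg =~ /^http:/i;   # would go to get() and then parse()
    return 'file' if -e $arg;             # would go to parsefile()
    return 'missing';                     # neither a URL nor an existing file
}

for my $arg ('http://freshmeat.net/backend/fm.rdf', 'no-such-file.rdf') {
    printf "%-40s => %s\n", $arg, classify_source($arg);
}
```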

Lastly, we call the print_html subroutine on line 49, which converts the RSS object into nicely formatted HTML.

print_html

As you examine this subroutine, you will begin to understand the internal structure of the XML::RSS object. The critical portion of the subroutine is contained on lines 76-79. In this foreach loop, we iterate over each of the RSS items.
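As a rough illustration of that structure, here is a simplified routine in the spirit of print_html. It is a sketch, not the code from rss2html.pl: feed_to_html is a hypothetical name, the markup is my own choice, and a plain hash stands in for the parsed object. It reads the same keys a parsed XML::RSS object exposes (channel, items).

```perl
use strict;
use warnings;

# Sketch of a print_html-style routine. It only reads {channel}{title},
# {channel}{link} and {items}, so a plain hash works as stand-in data.
sub feed_to_html {
    my ($rss) = @_;
    my $html = sprintf qq{<h3><a href="%s">%s</a></h3>\n<ul>\n},
        $rss->{channel}{link}, $rss->{channel}{title};
    foreach my $item (@{ $rss->{items} }) {
        # skip items that lack a title or a link
        next unless defined $item->{title} && defined $item->{link};
        $html .= qq{<li><a href="$item->{link}">$item->{title}</a></li>\n};
    }
    return $html . "</ul>\n";
}

# Stand-in data shaped like the freshmeat example shown earlier.
my $feed = {
    channel => { title => 'freshmeat.net', link => 'http://freshmeat.net' },
    items   => [
        { title => 'Geheimnis 0.59',
          link  => 'http://freshmeat.net/news/1999/06/21/930004162.html' },
    ],
};
print feed_to_html($feed);
```

The values are inserted without HTML escaping to keep the sketch short; a production script would escape them first.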

Next, let's take a look at rss2html.pl in action.

rss2html.pl in Action

I've added the following cron jobs that run once per hour on the Webreference server (Scheduler is the NT counterpart):

rss2html.pl http://slashdot.org/slashdot.rdf > slashdot.html
rss2html.pl http://freshmeat.net/backend/fm.rdf > freshmeat.html
rss2html.pl http://www.linuxtoday.com/backend/my-netscape.rdf > linuxtoday.html
rss2html.pl http://www.xml.com/xml/news.rdf > xmlnews.html
rss2html.pl http://www.perlxml.com/rdf/moperl.rdf > mop.html

The commands above fetch the RSS files off the Web and convert them to HTML. Using Server-Side Includes (SSI), I've included the results below:

Slashdot:

WiMax Technology Could Blanket the US?

Microsoft Anti-Spyware to Be Free of Charge

ACM to Honor TCP/IP Creators with Turing Award

New Rules Proposed on Electronic Evidence

Intel From Behind the Curtain

Kyoto Protocol Comes Into Force

Cory Doctorow's 'I, Robot' Posted

Straczynski Offers To Re-Boot Star Trek

Building The MareNostrum COTS Supercomputer

freshmeat.net announcements (Global)

Zolera SOAP Infrastructure 1.7 (Default branch)

XBible 3.0 (Default branch)

PDFdirectory 0.2.04 (Default branch)

XC-AST 0.7.0 (Default branch)

Imagero Reader 1.73 (Default branch)

GNU ccAudio2 0.4.0 (Testing branch)

quisp 1.27 (Default branch)

shsql 1.27 (Default branch)

samhain 2.0.4 (Default branch)

CANDIDv2 2.40 (Default branch)

ADV: Dialing for Dollars

libferris 1.1.46 (Default branch)

FUDforum 2.6.10 (Stable branch)

HORRORss 1.0 (Default branch)

Roxen WebServer 4.0.325-release 4 (Default branch)

Configuration File Library 1.0 (Default branch)

Goggles 0.7.11 (Default branch)

Pluto DCE library 2.0.0.9 (Default branch)

Pluto Bi-Directional Comm library 2.0.0.9 (Default branch)

zen Platform 2.0.4 (Default branch)

ADV: Gimme Shelter

MIME Email message class 2005.02.15 (Default branch)

ELF statifier 1.6.3 (Default branch)

SekHost 1.2 (Default branch)

ulogd 1.21 (Default branch)

Journaled Files LIBrary 0.1.0-0.0.0 (Default branch)

FastTemplate.php3 1.2.0 (Default branch)

iptables 1.3.0 (Default branch)

Very Simple Control Protocol Daemon 0.1.4 (Default branch)

C Parameters 0.9.0 (Default branch)

ADV: Dialing for Dollars

eXtreme Project Management Tool 0.7beta1 (Development branch)

gccc 1.099 (Default branch)

Magellan Metasearch 1.00-RC3 (Default branch)

CAN Abstraction Layer 0.1.4 (Default branch)

TreeLine 0.11.1 (Default branch)

GNOME Sensors Applet 0.6.1 (Default branch)

iODBC Driver Manager and SDK 3.52.2 (Default branch)

DISLIN 8.3 (Default branch)

Pluto Home 2.0.0.9 (Default branch)

ADV: Dialing for Dollars

Expense Report Software 1.07 (Default branch)

Yzis M3 (Default branch)

Q Light Controller 2.4.1 (Default branch)

Menc 0.3 (Default branch)

Another File Integrity Checker 2.7-0 (Default branch)

BibShelf 1.4.0-1 (Default branch)

Eleven 1.0 (Default branch)

Linice 2.5 (Default branch)

JDirt 1.3 (Default branch)

ADV: Dialing for Dollars

Nazghul 0.4.0 (Default branch)

Rush 2005 0.4.10 (Default branch)

Monesa 0.24.1 (Stable branch)

Persist.NET 0.9.1 beta (Default branch)

Roundup 0.8 (Default branch)

Aquarium Web Application Framework 2.0 (Default branch)

sn9c102 Video Grabber 1.7.0 (Default branch)

GRAVEMAN 0.3.8 (Default branch)

viewurpmi 0.2 (Default branch)

ADV: Dialing for Dollars

NuFW 1.0-rc1 (Stable branch)

OpenSceneGraph Editor 0.6.0 (Default branch)

HPGS - HPGl Script 0.6.0 (Default branch)

lustre 1.4.1-rc1 (Default branch)

IBM HeapAnalyzer 1.3.3 (Default branch)

CANDIDv2 2.3.6 (Default branch)

NetSPoC 2.5 (Default branch)

Metal Mech 0.0.3 (Default branch)

radmind 1.5.0 (Default branch)

ADV: Dialing for Dollars

iPodBackup 1.4 (Default branch)

db4o 4.3 (Mono branch)

web2ldap 0.15.9 (Default branch)

Mantissa 5.6 (Default branch)

Drone IRC Bot 1.2 (Default branch)

NoFuss POS 0.06 (Default branch)

xlog 1.1 (Stable branch)

ActiveBPEL 1.0.7 (Default branch)

Java Embedded Python 1.1 (Default branch)

ADV: Dialing for Dollars

Neveredit 0.8 (Default branch)

The friendly interactive shell 1.1 (Default branch)

Webmatic 2.0.3 (Default branch)

JTMOS Operating System Build 7700 (Default branch)

BIRD 1.0.10 (Default branch)

Tune in 2 Me 050215 (Default branch)

HMSCalc 3.0 (Default branch)

Information Currency Web Services 0.0.4 (Default branch)

Nitro + Og 0.10.0 (Default branch)

ADV: Dialing for Dollars

Just For Fun Network Management System 0.8.0 (Stable branch)

rxvt-unicode 5.1 (Default branch)

PHPEmaillist 0.3 (Default branch)

ulogd-php 1.0 (Default branch)

mod_access_rbl2 1.0 (Default branch)

5lack10.1 0.8 (Default branch)

profusemail 0.9.1 (Default branch)

Linux Today

LWN.net: FSF Announces New Executive Director

LinuxPlanet: Novell Takes Enterprise Security Focus

CNET News: HP: Don't Like Software Patents? Learn to Deal

internetnews.com: CA Chief: Innovate, Cooperate

Boston Herald: Linux Show Plans BCEC Move

XML.com

Features: Very Dynamic Web Interfaces

Features: Comparing CSS and XSL: A Reply from Norm Walsh

Features: Top 10 XForms Engines

Features: An Introduction to TMAPI

XML Tourist: The Silent Soundtrack

Transforming XML: The XPath 2.0 Data Model

Features: SIMILE: Practical Metadata for the Semantic Web

Features: Hacking Open Office

Features: Formal Taxonomies for the U.S. Government

Features: Reviewing the Architecture of the World Wide Web

Features: Printing XML: Why CSS Is Better than XSL

Python and XML: Introducing the Amara XML Toolkit

Features: Introducing Comega

Features: SAML 2: The Building Blocks of Federated Identity

The Restful Web: Amazon's Simple Queue Service

Conclusion

Well, we've shown in this column that Perl can really pack a wallop in a short amount of code. With rss2html.pl, anyone can automatically add a news feed to their Web site.

For more information on RSS, you might try visiting the following sites:

http://my.userland.com
http://www.scripting.com
http://www.perlxml.com

rss2html.pl	Get the source
This script converts an RSS file on the Web or local file system to HTML.

pyguru 2005-02-17 02:59

The Python Web services developer: RSS for Python

pyguru — Wed, 16 Feb 2005 18:48:00 GMT

Content syndication for the Web

Level: Introductory

Mike Olson (mike.olson@fourthought.com), Principal Consultant, Fourthought, Inc.
Uche Ogbuji (uche.ogbuji@fourthought.com), Principal Consultant, Fourthought, Inc.

13 Nov 2002

RSS is one of the most successful XML services ever. Despite its chaotic roots, it has become the community standard for exchanging content information across Web sites. Python is an excellent tool for RSS processing, and Mike Olson and Uche Ogbuji introduce a couple of modules available for this purpose.

RSS is an abbreviation with several expansions: "RDF Site Summary," "Really Simple Syndication," "Rich Site Summary," and perhaps others. Behind this confusion of names is an astonishing amount of politics for such a mundane technological area. RSS is a simple XML format for distributing summaries of content on Web sites. It can be used to share all sorts of information including, but not limited to, news flashes, Web site updates, event calendars, software updates, featured content collections, and items on Web-based auctions.

RSS was created by Netscape in 1999 to allow content to be gathered from many sources into the Netcenter portal (which is now defunct). The UserLand community of Web enthusiasts became early supporters of RSS, and it soon became a very popular format. The popularity led to strains over how to improve RSS to make it even more broadly useful. This strain led to a fork in RSS development. One group chose an approach based on RDF, in order to take advantage of the great number of RDF tools and modules, and another chose a more stripped-down approach. The former is called RSS 1.0, and the latter RSS 0.91. Just last month the battle flared up again with a new version of the non-RDF variant of RSS, which its creators are calling "RSS 2.0."

RSS 0.91 and 1.0 are very popular, and used in numerous portals and Web logs. In fact, the blogging community is a great user of RSS, and RSS lies behind some of the most impressive networks of XML exchange in existence. These networks have grown organically, and are really the most successful networks of XML services in existence. RSS is an XML service by virtue of being an exchange of XML information over an Internet protocol (the vast majority of RSS exchange is simple HTTP GET of RSS documents). In this article, we introduce just a few of the many Python tools available for working with RSS. We don't provide a technical introduction to RSS, because you can find this in so many other articles (see Resources). We recommend first that you gain a basic familiarity with RSS, and that you understand XML. Understanding RDF is not required.

[We consider RSS an 'XML service' rather than a 'Web service' due to the use of XML descriptions but the lack of use of WSDL. -- Editors]

RSS.py
Mark Nottingham's RSS.py is a Python library for RSS processing. It is very complete and well-written. It requires Python 2.2 and PyXML 0.7.1. Installation is easy; just download the Python file from Mark's home page and copy it to somewhere in your PYTHONPATH.

Most users of RSS.py need only concern themselves with two classes it provides: CollectionChannel and TrackingChannel. The latter seems the more useful of the two. TrackingChannel is a data structure that contains all the RSS data indexed by the key of each item. CollectionChannel is a similar data structure, but organized more as RSS documents themselves are, with the top-level channel information pointing to the item details using hash values for the URLs. You will probably use the utility namespace declarations in the RSS.ns structure. Listing 1 is a simple script that downloads and parses an RSS feed for Python news, and prints out all the information from the various items in a simple listing.


  
from RSS import ns, CollectionChannel, TrackingChannel

#Create a tracking channel, which is a data structure that
#Indexes RSS data by item URL
tc = TrackingChannel()

#Returns the RSSParser instance used, which can usually be ignored
tc.parse("http://www.python.org/channews.rdf")

RSS10_TITLE = (ns.rss10, 'title')
RSS10_DESC = (ns.rss10, 'description')

#You can also use tc.keys()
items = tc.listItems()
for item in items:
    #Each item is a (url, order_index) tuple
    url = item[0]
    print "RSS Item:", url
    #Get all the data for the item as a Python dictionary
    item_data = tc.getItem(item)
    print "Title:", item_data.get(RSS10_TITLE, "(none)")
    print "Description:", item_data.get(RSS10_DESC, "(none)")

We start by creating a TrackingChannel instance, and then populate it with data parsed from the RSS feed at http://www.python.org/channews.rdf. RSS.py uses tuples as the property names for RSS data. This may seem an unusual approach to those not used to XML processing techniques, but it is actually a very useful way of being very precise about what was in the original RSS file. In effect, an RSS 0.91 title element is not considered to be equivalent to an RSS 1.0 one. There is enough data for the application to ignore this distinction, if it likes, by ignoring the namespace portion of each tuple; but the basic API is wedded to the syntax of the original RSS file, so that this information is not lost. In the code, we use this property data to gather all the items from the news feed for display. Notice that we are careful not to assume which properties any particular item might have. We retrieve properties using the safe form as seen in the code below.



    print "Title:", item_data.get(RSS10_TITLE, "(none)")

This form provides a default value if the property is not found. The following form, by contrast, raises a KeyError when the property is missing.



    print "Title:", item_data[RSS10_TITLE]

This precaution is necessary because you never know what elements are used in an RSS feed. Listing 2 shows the output from Listing 1.



$ python listing1.py 
RSS Item: http://www.python.org/2.2.2/
Title: Python 2.2.2b1
Description: (none)
RSS Item: http://sf.net/projects/spambayes/
Title: spambayes project
Description: (none)
RSS Item: http://www.mems-exchange.org/software/scgi/
Title: scgi 0.5
Description: (none)
RSS Item: http://roundup.sourceforge.net/
Title: Roundup 0.4.4
Description: (none)
RSS Item: http://www.pygame.org/
Title: Pygame 1.5.3
Description: (none)
RSS Item: http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/
Title: Pyrex 0.4.4.1
Description: (none)
RSS Item: http://www.tundraware.com/Software/hb/
Title: hb 1.88
Description: (none)
RSS Item: http://www.tundraware.com/Software/abck/
Title: abck 2.2
Description: (none)
RSS Item: http://www.terra.es/personal7/inigoserna/lfm/
Title: lfm 0.9
Description: (none)
RSS Item: http://www.tundraware.com/Software/waccess/
Title: waccess 2.0
Description: (none)
RSS Item: http://www.krause-software.de/jinsitu/
Title: JinSitu 0.3
Description: (none)
RSS Item: http://www.alobbs.com/pykyra/
Title: PyKyra 0.1.0
Description: (none)
RSS Item: http://www.havenrock.com/developer/treewidgets/index.html
Title: TreeWidgets 1.0a1
Description: (none)
RSS Item: http://civil.sf.net/
Title: Civil 0.80
Description: (none)
RSS Item: http://www.stackless.com/
Title: Stackless Python Beta
Description: (none)

Of course, you would expect somewhat different output because the news items will have changed by the time you try it. The RSS.py channel objects also provide methods for adding and modifying RSS information. You can write the result back to RSS 1.0 format using the output() method. Try this out by writing back out the information parsed in Listing 1. Kick off the script in interactive mode by running python -i listing1.py. At the resulting Python prompt, run the following example.



>>> result = tc.output(items)
>>> print result

The result is an RSS 1.0 document printed out. You must have RSS.py, version 0.42 or more recent for this to work. There is a bug in the output() method in earlier versions.
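To make the tuple-keyed properties discussed earlier more concrete, here is a small self-contained sketch that needs neither RSS.py nor network access. It models an item as a plain dictionary keyed by (namespace, name) tuples and shows how an application could ignore the namespace part when it does not care about the RSS version. The helper get_by_local_name and the sample data are hypothetical; only the RSS 1.0 namespace URI is real. It is written for Python 3, unlike the listings in the article.

```python
RSS10_NS = "http://purl.org/rss/1.0/"  # the RSS 1.0 namespace URI


def get_by_local_name(item_data, local_name, default="(none)"):
    """Look up a property by element name, ignoring the namespace in each key."""
    for key, value in item_data.items():
        namespace, name = key  # every key is a (namespace, name) tuple
        if name == local_name:
            return value
    return default


# Hand-made stand-in for the dictionary a channel object returns for one item.
item_data = {
    (RSS10_NS, "title"): "Python 2.2.2b1",
    (None, "link"): "http://www.python.org/2.2.2/",
}

if __name__ == "__main__":
    print("Title: %s" % get_by_local_name(item_data, "title"))
    print("Description: %s" % get_by_local_name(item_data, "description"))
```

The exact-match lookup with .get() stays strict about which RSS dialect a property came from, while the helper trades that precision for convenience.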

rssparser.py
Mark Pilgrim offers another module for RSS file parsing. It doesn't provide all the features and options that RSS.py does, but it does offer a very liberal parser, which deals well with all the confusing diversity in the world of RSS. To quote from the rssparser.py page:

You see, most RSS feeds suck. Invalid characters, unescaped ampersands (Blogger feeds), invalid entities (Radio feeds), unescaped and invalid HTML (The Register's feed most days). Or just a bastardized mix of RSS 0.9x elements with RSS 1.0 elements (Movable Type feeds).

Then there are feeds, like Aaron's feed, which are too bleeding edge. He puts an excerpt in the description element but puts the full text in the content:encoded element (as CDATA). This is valid RSS 1.0, but nobody actually uses it (except Aaron), few news aggregators support it, and many parsers choke on it. Other parsers are confused by the new elements (guid) in RSS 0.94 (see Dave Winer's feed for an example). And then there's Jon Udell's feed, with the fullitem element that he just sort of made up.

It's funny to consider this in the light of the fact that XML and Web services are supposed to increase interoperability. Anyway, rssparser.py is designed to deal with all the madness.

Installing rssparser.py is also very easy. You download the Python file (see Resources), rename it from "rssparser.py.txt" to "rssparser.py", and copy it to your PYTHONPATH. I also suggest getting the optional timeoutsocket module, which improves the timeout behavior of socket operations in Python and thus makes fetching an RSS feed less likely to stall the application thread in case of error.

Listing 3 is a script that is the equivalent of Listing 1, but using rssparser.py, rather than RSS.py.


  
import rssparser
#Parse the data, returns a tuple: (data for channels, data for items)
channel, items = rssparser.parse("http://www.python.org/channews.rdf")

for item in items:
    #Each item is a dictionary mapping properties to values
    print "RSS Item:", item.get('link', "(none)")
    print "Title:", item.get('title', "(none)")
    print "Description:", item.get('description', "(none)")

As you can see, the code is much simpler. The trade-off between RSS.py and rssparser.py is largely that the former has more features, and maintains more syntactic information from the RSS feed. The latter is simpler, and a more forgiving parser (the RSS.py parser only accepts well-formed XML).

The output should be the same as in Listing 2.
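The remark that a strict parser only accepts well-formed XML is easy to see with the standard library alone. The sketch below uses Python 3's xml.etree.ElementTree as a stand-in strict parser (it is not what RSS.py uses internally) and feeds it one item with an escaped ampersand and one with the kind of unescaped ampersand the rssparser.py page complains about. The helper name and the two sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if a strict XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


GOOD_ITEM = "<item><title>Tom &amp; Jerry</title></item>"
BAD_ITEM = "<item><title>Tom & Jerry</title></item>"  # unescaped ampersand

if __name__ == "__main__":
    print(is_well_formed(GOOD_ITEM))
    print(is_well_formed(BAD_ITEM))
```

A liberal parser has to cope with the second kind of input somehow, which is where the extra forgiveness of rssparser.py comes from.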

Conclusion
There are many Python tools for RSS, and we don't have space to cover them all. Aaron Swartz's page of RSS tools is a good place to start looking if you want to explore other modules out there. RSS is easy to work with in Python, because of all the great modules available for it. The modules hide all the chaos brought about by the history and popularity of RSS. If your XML services needs mostly involve the exchange of descriptive information for Web sites, we highly recommend using RSS, the most successful XML service technology in use.

Next month, we will explain how to use e-mail packages for Python for writing Web services over SMTP.

Resources

Check out the previous installments of The Python Web services developer columns.
There are several resources on RSS in IBM developerWorks.
- An introduction to RSS news feeds, by James Lewin, is older, but a good place to start. It covers RSS 0.91 and 1.0, and Perl interfaces. (developerWorks, November 2000)
- Grab headlines from a remote RDF file, by Nicholas Chase, shows some XSLT and JSP code for processing RSS 0.91 and 1.0. (developerWorks, April 2002)
XML.com also has several articles on RSS. Read RSS: Lightweight Web Syndication, by Rael Dornfest, for a good general introduction. In Building a Semantic Web Site, Eric van der Vlist provides a great technical introduction based on very practical examples. RSS Modularization, by Leigh Dodds, follows some very interesting conversation at a crucial juncture in RSS development.
Mark Nottingham is the author of RSS.py, and has a lot of other handy stuff on his home page, including an excellent RSS Tutorial for Content Publishers and Webmasters.
Mark Pilgrim is the author of rssparser.py, an "ultra liberal" RSS parser. The code is available as a text download. If you install it, I also recommend getting timeoutsocket.py.
Fredrik Lundh, the author of xmlrpclib.py and soaplib.py, is working on The EffNews Project: Building an RSS Newsreader, a python project for creating a GUI front end for reading news from RSS feeds.
Peerkat is a resource aggregator written in Python that allows people to use RSS to manage the Web content they follow.
Aaron Swartz maintains a list of RSS tools for all languages and platforms.

About the authors
Mike Olson is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management applications. Fourthought develops 4Suite, an open source platform for XML middleware. You can contact Mr. Olson at mike.olson@fourthought.com.

Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management applications. Fourthought develops 4Suite, an open source platform for XML middleware. Mr. Ogbuji is a Computer Engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at uche.ogbuji@fourthought.com.

pyguru 2005-02-17 02:48

"universal" RSS feed parser

pyguru — Wed, 16 Feb 2005 18:40:00 GMT

Feed Parser

This is a "universal" feed parser, suitable for reading syndicated feeds as produced by weblogs, news sites, wikis, and many other types of sites. It handles Atom feeds, CDF, and the nine different versions of RSS.

This project is now hosted at SourceForge. Please check there for updates. This page contains old news and is no longer updated. (2004-06-21)

pyguru 2005-02-17 02:40

How to replace a string in multiple files?

pyguru — Tue, 15 Feb 2005 18:18:00 GMT

perl -pi -e 's/str1/str2/g' urfiles
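The same in-place edit can be sketched in Python 3 with the standard fileinput module. Note one deliberate difference: the Perl one-liner treats str1 as a regular expression, while this sketch does a plain literal replacement. The function name replace_in_files is made up for illustration.

```python
import fileinput


def replace_in_files(paths, old, new):
    """Replace every literal occurrence of old with new, editing files in place."""
    if not paths:
        return  # an empty list would make fileinput fall back to sys.argv/stdin
    # With inplace=True, whatever is printed is written back into the current file.
    with fileinput.input(files=paths, inplace=True) as stream:
        for line in stream:
            print(line.replace(old, new), end="")


if __name__ == "__main__":
    import sys
    # Usage: python replace.py str1 str2 file1 [file2 ...]
    if len(sys.argv) > 3:
        replace_in_files(sys.argv[3:], sys.argv[1], sys.argv[2])
```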

pyguru 2005-02-16 02:18

CVS common commands quick reference

pyguru — Tue, 15 Feb 2005 15:39:00 GMT

CVS common commands quick reference

Lansenlin (蓝森林) http://www.lslnet.com 2002, 11:08

Author: Che Dong (车东)

chedong@bigfoot.com

Last updated: 2002-08-30 13:18:41

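Overview: CVS is a client/server system. Multiple developers record file versions through one central version control system, which keeps everyone's files in sync.

       CVS server (file repository)
      /           |           \
        (version synchronization)
    /             |             \
Developer     Developer     Developer

The main topics of this article are listed below. Developers can mostly read just items 2 and 6; CVS administrators need to know more.

1. CVS environment initialization: setting up the CVS environment (administrators)
2. Daily use of CVS: the CVS commands used most often in day-to-day development (developers, administrators)
3. Branch development with CVS: running a project concurrently toward different schedules and goals (administrators)
4. CVS user authentication: remote user authentication over SSH, secure and simple (administrators)
5. CVSWEB: a Web interface to CVS that makes comparing code versions far more efficient (administrators)
6. CVS TAG: put $Id$ in code comments to make the development process easy to track (developers)
7. CVS vs VSS: a comparison of CVS and Visual SourceSafe

Twenty percent of a system's features often meet 80% of the need, and CVS is no exception. What follows covers the most commonly used CVS features, probably less than 20% of all its commands and options. Pick up the rest through real use: learn as much as you need, and learn more when you come to need it.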

CVS environment initialization
============

Environment setup: point CVSROOT at the path of the CVS repository
tcsh
setenv CVSROOT /path/to/cvsroot
bash
CVSROOT=/path/to/cvsroot ; export CVSROOT

The setting for a remote CVS server, covered later, is:
CVSROOT=:ext:$USER@test.server.address#port:/path/to/cvsroot CVS_RSH=ssh; export CVSROOT CVS_RSH

Initialization: initialize the CVS repository.
cvs init

First import of a project
cvs import -m "write some comments here" project_name vendor_tag release_tag
After this runs, all source files and directories are imported under /path/to/cvsroot/project_name.
vendor_tag: vendor tag
release_tag: release tag

Checking out a project: export the code from the CVS repository
cvs checkout project_name
cvs creates the project_name directory and exports the latest version of the source code into it. This checkout is not the same concept as check out in Visual SourceSafe: the counterpart of Visual SourceSafe's check out is cvs update, and the counterpart of check in is cvs commit.

Daily use of CVS
=============

Note: after the first checkout, you no longer sync files with cvs checkout. Instead, change into the project_name directory created by cvs checkout project_name and carry out version synchronization (add, modify, delete) on specific files there.

Sync files to the latest version:
cvs update
If you give no file name, cvs syncs the files in all subdirectories. You can also name a specific file or directory to sync:
cvs update file_name
It is best to do this every day before you start work, and again before you commit your own work to the CVS repository, so that "sync first, then modify" becomes a habit. Unlike Visual SourceSafe, CVS has no concept of file locking, and all conflicts are resolved before commit. If someone else modifies a file and commits it to the CVS repository while you are editing it, CVS tells you that the file has a conflict and automatically marks the conflicting part like this:
<<<<<<< file_name
content in your file
=======
content on cvs server
>>>>>>> 1.4
You then decide which of the conflicting content to keep. (Here 1.4 stands for whichever revision is in the repository.)
Version conflicts usually arise when several people modify the same file, but this is a project management problem, and you should not expect CVS to solve it.
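Before committing, it can help to check that no conflict markers are left in a file. The Python 3 sketch below assumes the standard merge-marker layout that CVS writes (a line starting with seven "<" characters, a "=======" separator, and a line starting with seven ">" characters). The function name find_conflicts is made up for illustration.

```python
def find_conflicts(lines):
    """Return (start, end) 1-based line numbers for each conflict block."""
    conflicts = []
    start = None
    for number, line in enumerate(lines, 1):
        if line.startswith("<<<<<<< "):
            start = number  # opening marker, followed by your local content
        elif line.startswith(">>>>>>> ") and start is not None:
            conflicts.append((start, number))  # closing marker ends the block
            start = None
    return conflicts


if __name__ == "__main__":
    import sys
    # Usage: python find_conflicts.py file_name
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for begin, end in find_conflicts(handle.readlines()):
                print("conflict in lines %d-%d" % (begin, end))
```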

Commit your changes to the CVS repository:
cvs commit -m "write some comments here" file_name

Note: many CVS actions are finalized through cvs commit, and it is best to commit only one file at a time. Before the commit goes through, you also have to write a change comment that helps other developers understand why the change was made. If you leave out -m "comments" and run `cvs commit file_name` directly, cvs automatically launches the system's default text editor (usually vi) and asks you to enter a comment.
The quality of comments matters a great deal. You must not only write one, you must write something meaningful that other developers can readily understand.
A poor comment makes it hard for other developers to grasp the change quickly, for example: -m "bug fixed" or even -m ""
A good comment (it may even be written in Chinese): -m "Added Email address validation to the user registration process"

Changing the comment on a revision: committing only one file at a time to the CVS repository is a good habit, but sooner or later you will forget to give a file name and commit several files with the same comment. The following command lets you change the comment on a given revision of a given file:
cvs admin -m 1.3:"write some comments here" file_name

Adding a file
After creating the new file, for example: touch new_file
cvs add new_file
Note: for images, Word documents, and other items that are not plain text, use the -kb option of cvs add; otherwise the file may be corrupted.
For example: cvs add -kb new_file.gif
Then commit the change with a comment:
cvs ci -m "write some comments here"

Deleting a file:
After physically deleting the source file, for example: rm file_name
cvs rm file_name
Then commit the change with a comment:
cvs ci -m "write some comments here"
The first two steps above can be combined as:
cvs rm -f file_name
cvs ci -m "why delete file"

Note: many cvs commands have short forms: commit=>ci; update=>up; checkout=>co; remove=>rm;

Adding a directory:
cvs add dir_name

Viewing the change history:
cvs log file_name
cvs history file_name

Viewing the differences between two revisions of a file:
cvs diff -r1.3 -r1.5 file_name
Viewing the differences between the current file (which you may have modified) and the corresponding file in the repository:
cvs diff file_name
The cvs Web interface offers a more convenient way to locate file changes and compare revisions. For installation and setup, see the CVSWEB section later on.

The correct way to restore an old version with CVS:
If you run cvs update -r1.2 file.name
the command puts a sticky tag, "1.2", on file.name, even though all you meant to do was restore it to revision 1.2.
The correct way to restore a revision is: cvs update -p -r1.2 file_name >file_name
If you have already set a sticky tag by accident, clear it with cvs update -A

Moving or renaming a file
cvs has no cvs move or cvs rename, because both operations are done by running cvs remove old_file_name first and then cvs add new_file_name.
Deleting or moving a directory:
The most convenient way is to have the administrator move or delete the corresponding directory directly inside CVSROOT. (The subdirectories of a CVS project are all independent, and any of them can serve as a new independent project once moved directly under $CVSROOT. It is like a tree: cut off any branch and it can survive on its own.) After a directory has been changed this way, developers must check the project out again with cvs checkout project_name or sync with cvs update -dP.
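CVS Branch: developing several branches of a project in parallel
=============================

Marking a release milestone: the files in a project all carry different revision numbers. Once the project reaches a certain stage, you can give all files one common milestone tag. This makes it easy to check out the project by that milestone later, and it is also the basis for developing several branches of the project.
cvs tag release_1_0

Starting a new milestone:
cvs commit -r 2 marks all files as entering 2.x development

Note: revisions in CVS need not have any direct relationship to the release version of the software package, but having all files use revision numbers consistent with the release version makes maintenance easier.

Suppose that while developing version 2.x of the project you find a problem in 1.x, and 2.x is not yet fit for use. You then create a branch, release_1_0_patch, from the milestone tagged earlier, release_1_0:
cvs rtag -b -r release_1_0 release_1_0_patch proj_dir

Some people check out the release_1_0_patch branch in a separate directory to fix the urgent problems in 1.0,
cvs checkout -r release_1_0_patch proj_dir
while everyone else continues developing on the project's main trunk, 2.x.
After the errors are fixed on release_1_0_patch, tag a bug-fix release of 1.0:
cvs tag release_1_0_patch_1

If the 2.0 developers decide that these fixes are needed in 2.0 as well, they can merge the changes in release_1_0_patch_1 into the current code from the 2.0 development directory:
cvs update -j release_1_0_patch_1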

Remote authentication for CVS: accessing CVS remotely over SSH
================================

CVS's own remote authentication is cumbersome: you have to define the server, user groups, and user names, set passwords, and so on, and it is not secure. A better approach is to authenticate against the system's local accounts and transfer over SSH. Put the following in /etc/profile on the client machines:
CVSROOT=:ext:$USER@test.server.address#port:/path/to/cvsroot CVS_RSH=ssh; export CVSROOT CVS_RSH
Every local user on every client machine then maps to the account of the same name on the CVS server.

If the SSH port of the CVS server is not the default 22, or the default SSH ports of the client and the CVS server differ, then sometimes even this setting:
:ext:$USER@test.server.address#port:/path/to/cvsroot

still does not work, and you get error messages such as:
ssh: test.server.address#port: Name or service not known
cvs [checkout aborted]: end of file from server (consult above messages if any)

The fix is to write a script that specifies the port (an alias will not work; it produces a file-not-found error):
Create the file /usr/bin/ssh_cvs:
#!/usr/bin/sh
/path/to/ssh -p 34567 "$@"
Then: chmod +x /usr/bin/ssh_cvs
and set CVS_RSH=ssh_cvs; export CVS_RSH

Note: port refers to the SSH port of the server in question, not the cvs pserver port.
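CVSWEB: making it faster for programmers to compare file changes
================================

CVSWEB is the Web interface to CVS. It makes locating changes much more efficient for programmers.
For a live example, see: http://www.freebsd.org/cgi/cvsweb.cgi

Downloading CVSWEB: many versions with richer features and interfaces have evolved from the original CVSWEB. This is the one I personally find easiest to install and configure:
http://www.spaghetti-code.de/software/linux/cvsweb/

Download and unpack:
tar zxf cvsweb.tgz
Put the configuration file cvsweb.conf somewhere safe (for example, in the same directory as the Apache configuration),
then edit cvsweb.cgi so that the CGI can find the configuration file:
$config = $ENV{'CVSWEB_CONFIG'} || '/path/to/apache/conf/cvsweb.conf';

Change to /path/to/apache/conf and edit cvsweb.conf:

Set the CVSROOT path:
%CVSROOT = (
'Development' => '/path/to/cvsroot', #<== point this at the local CVSROOT
);
Hide deleted files by default:
"hideattic" => "1",#<== do not show deleted files by default
cvsweb.conf also lets you customize the descriptive text in the page header; change $long_intro to the text you need.

CVSWEB must not be opened up to all users, so use Web user authentication:
First generate the password file:
/path/to/apache/bin/htpasswd -c cvsweb.passwd user

Edit httpd.conf and add:

AuthName "CVS Authorization"
AuthType Basic
AuthUserFile /path/to/cvsweb.passwd
require valid-user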

CVS TAGS: who? when?
====================

Putting $Id$ in the comment at the top of each program file is a good habit. cvs automatically expands it into the form file_name version time user_name, for example: cvs_card.txt,v 1.1 2002/04/05 04:24:12 chedong Exp. From this you can tell who last modified the file and when.
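An expanded keyword can also be read back by a script, for example to show the revision in an "about" box. The Python 3 sketch below assumes the expanded string has the shape shown in the example above (file name with a ",v" suffix, revision, date, time, author, state). The pattern and the function name parse_id are illustrative and not part of CVS.

```python
import re

# Matches the body of an expanded $Id$ keyword, e.g.
# "cvs_card.txt,v 1.1 2002/04/05 04:24:12 chedong Exp"
ID_PATTERN = re.compile(
    r"(?P<file>\S+),v\s+(?P<revision>[\d.]+)\s+"
    r"(?P<date>\d{4}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<author>\S+)\s+(?P<state>\S+)"
)


def parse_id(text):
    """Return the fields of an expanded $Id$ keyword as a dict, or None."""
    match = ID_PATTERN.search(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    info = parse_id("$Id: cvs_card.txt,v 1.1 2002/04/05 04:24:12 chedong Exp $")
    print("%s was last changed by %s on %s" % (
        info["file"], info["author"], info["date"]))
```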
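A few commonly used default file templates:
default.php
<?php
/*
* Copyright (c) 2002 Company Name.
* $Header$
*/

?>

====================================
Default.java: note the difference between an ordinary file-header comment, which starts with /*, and a JAVADOC comment, which starts with /**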
/*
* Copyright (c) 2002 Company Name.
* $Header$
*/

package com.netease;

import java.io.*;

/**
* comments here
*/
public class Default {
    /**
    *
    * @param
    * @return
    */
    public String toString() {
        return super.toString();
    }
}

====================================
default.pl:
#!/usr/bin/perl -w
# Copyright (c) 2002 Company Name.
# $Header$

# file comments here

use strict;

CVS vs VSS
===========

CVS has no file-locking mode. VSS, on check out, also records that the file has been checked out and locked.

CVS uses update and commit; VSS uses check out and check in.

In CVS, automatic keyword updating is on by default. This brings a potential problem: a binary file added without -kb may be corrupted when cvs updates its keywords.

In Visual SourceSafe this feature is called keyword expansion. It is off by default and must be enabled through the options, where you also specify the file types to scan for keywords: *.txt,*.java,*.html...

The tags common to both Visual SourceSafe and CVS are:
$Header$
$Author$
$Date$
$Revision$

Use the common keywords wherever possible so that code can be tracked easily in both CVS and VSS.

Related resources:
CVS HOME:
http://www.cvshome.org

CVS FAQ:
http://www.loria.fr/~molli/cvs-index.html

Related sites:
http://directory.google.com/Top/Computers/Software/Configuration_Management/Tools/Concurrent_Versions_System/

Free CVS book:
http://cvsbook.red-bean.com/

CVS quick reference card:
http://www.refcards.com/about/cvs.html

Excerpted from: http://www.chedong.com/tech/cvs_card.html

pyguru 2005-02-15 23:39

Hardware and DB configuration for a WEB/APPLICATION/DATABASE server

pyguru — Mon, 14 Feb 2005 18:09:00 GMT
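: I want to set up a WEB/APPLICATION/DATABASE server and plan to use LINUX. I expect
: users in the thousands to be online at once (CONCURRENT TRANSACTIONS reaching 200
: would be enough). I have only used REDHAT LINUX as an ordinary development platform
: and have never run it as a server for a large user base. I plan to build the machine
: myself: 2.8G XEON PROCESSOR, up to 4G of MEMORY, up to 300G of disk (SATA or SCSI).
: My questions are:
: 1. Is this hardware configuration adequate?
: 2. Which LINUX should I use? REDHAT, FREEBSD, SUSE, or something else?
: 3. Which DB should I use, POSTGRESQL or MYSQL? MYSQL now supports TRANSACTIONS too,
: but POSTGRESQL seems to have many more features that are close to ORACLE. I have
: never used that DB, though.
: 4. For the APPLICATION SERVER I plan to use TOMCAT5.0 + JDK1.5. I used to hear that
: TOMCAT cannot support a large user base; I do not know whether that is still true.
: 5. Any other suggestions?
:
: Many thanks.
: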

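It depends mainly on how complex these transactions are. In general it should be fine, but with a lot of varchar, blob, and similar data it gets dicey.

As for the OS, I recommend a commercial Linux. We mostly use RHAS, and SuSE is good too. Since you will be running Java and the like, do not use FreeBSD. The advantage of a commercial Linux is that you do not have to worry much about software upgrades and maintenance.

As for the DB, if you can use commercial Oracle or DB2, performance will be much better. If you need to save money, I still suggest MySQL, but tune it well, and when designing the Business Logic keep interaction with the DB to a minimum. Another drawback of MySQL is that it does not support Stored Procedures, but you can move that Business Logic outside the database.

The Application Server is probably the biggest problem. Tomcat is basically a lightweight Web/Servlet Server, and because it lacks some supporting features, a large user base will be difficult. Also, you have a large number of Transactions, and Tomcat itself has no Persistence support. If you want to implement transactions at this layer, you will probably have to add another container, such as EJB, or something like Hibernate. The more data you cache at this layer, the less you depend on MySQL. Small amounts of system data can be preloaded into this layer, which greatly reduces interaction with the database.

J2SE 5.0 is said to perform better, but I think using it is too aggressive; it is not Stable enough. Without transactions it would not be a problem. Also, only Tomcat 5.5 and later can run on J2SE 5.0. For servers, BEA's JRockit VM is good.

If it is an ordinary service with at most one or two hundred people online at the same time, a PC with a slightly better configuration will do. If there are many users, the most critical thing is plenty of memory, the more the better.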

pyguru 2005-02-15 02:09

Virtual hosts in Apache

pyguru — Sun, 13 Feb 2005 18:56:00 GMT

Virtual hosts in Apache
In Vhosts.conf, the following is the real setting. The docroot may not allow directory indexes, so put an index.html in each docroot to test the virtual host.

===============================================

# Listen for virtual host requests on all IP addresses
NameVirtualHost *:80

<VirtualHost *:80>
DocumentRoot /var/www/html/web
ServerName web.mydomain.com

# Other directives here
</VirtualHost>

<VirtualHost *:80>
DocumentRoot /var/www/html/news
ServerName news.mydomain.com

# Other directives here
</VirtualHost>

<VirtualHost *:80>
DocumentRoot /var/www/html/photo
ServerName photo.mydomain.com

# Other directives here
</VirtualHost>

pyguru 2005-02-14 02:56

map linux network drive in Windows

pyguru — Sun, 13 Feb 2005 18:41:00 GMT

First create a local Unix account:

  useradd -r myaccount

This command creates a Unix user named myaccount (with -r, useradd creates it as a system account).

Then create a Samba user based on it:

  smbadduser myaccount:mysmbact

or:

  smbpasswd -a myaccount

The password in Samba is not related to the Unix account password.

Note: whenever you change the Samba configuration file, you must restart Samba with /etc/init.d/samba restart (Debian).
Then in Windows, use the username and the Samba password to map the network drive.

pyguru 2005-02-14 02:41

The Apache Web Server

pyguru — Sun, 13 Feb 2005 18:35:00 GMT
     Summary: The Apache Web Server. In This Chapter: Chapter 20, The Apache Web Server Downlo...

pyguru 2005-02-14 02:35