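What is SAX
The standard way to read and manipulate XML files is DOM (the Document Object Model). Unfortunately, this approach requires reading the entire file and storing it in a tree structure, which makes it inefficient, slow, and heavy on resources.
One alternative is the Simple API for XML, or SAX. SAX lets you process a document while it is being read, so you do not have to wait for the whole document to be stored before acting on it.
SAX was developed by members of the XML-DEV mailing list, and the Java version is maintained by David Megginson. Their goal was to provide a more natural way of working with XML, one that avoids the overhead that comes with DOM.
The result is an event-based API. The parser sends events, such as the start or end of an element, to an event handler that processes the information. The application itself can then deal with the data. The original document stays unchanged, but SAX provides the means to manipulate the data, which can then be directed to another process or document.
There is no official standard for SAX; neither the World Wide Web Consortium (W3C) nor any other official body maintains it. Within the XML community, however, it is a de facto standard.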
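How SAX processing works
SAX analyzes an XML stream as it goes by, much like an old-fashioned ticker tape. Consider the following XML snippet:
[XML listing not preserved in the source]
In general, a SAX processor analyzing this code generates the following events:
[Event listing not preserved in the source]
The SAX API lets developers capture these events and act on them.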
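To make the idea of an event stream concrete, here is a minimal sketch that parses a small made-up snippet and records each callback the parser issues. The class name `SaxEventTrace` and the element names `samples`, `server`, and `monitor` are assumptions chosen for illustration; only the JAXP and SAX calls themselves are standard.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records SAX callbacks in the order they arrive.
final class SaxEventTrace {
    static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { events.add("start document"); }
            @Override public void endDocument() { events.add("end document"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                events.add("start element (" + qName + ")");
            }
            @Override public void endElement(String uri, String localName, String qName) {
                events.add("end element (" + qName + ")");
            }
            @Override public void characters(char[] ch, int start, int length) {
                // A parser may deliver one run of text in several chunks.
                events.add("characters (" + new String(ch, start, length) + ")");
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document, not taken from the original tutorial.
        String xml = "<samples><server>UNIX</server><monitor>color</monitor></samples>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

The events come out in document order, which is why a handler can act on each piece of data as soon as it is read rather than after the whole file has been loaded.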
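SAX processing involves the following steps, each covered in the sections below: creating the parser, setting a content handler on it, and parsing an InputSource so that each event is sent to the handler.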
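Choosing between SAX and DOM
Whether to choose DOM or SAX depends on several factors.
It is important to remember that SAX and DOM are not mutually exclusive. You can use DOM to create a SAX stream of events, and you can use SAX to create a DOM tree. In fact, most parsers actually use SAX to build their DOM trees!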
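Creating the parser with JAXP
Next, let's look at how the JAXP SAX parser works.
First, declare XMLReader xmlReader. Then use SAXParserFactory to create a SAXParser. It is the SAXParser that gives you the XMLReader.
[Code listing not preserved in the source]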
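The steps just described can be sketched roughly as follows. The class and method names (`ReaderSetupSketch`, `createReader`) are assumptions for illustration, and turning on namespace awareness is an optional choice made here, not something the text requires.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical helper showing the factory -> parser -> reader chain.
final class ReaderSetupSketch {
    static XMLReader createReader() throws Exception {
        // The factory hides which parser implementation is actually used.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // assumption: optional setting

        // The SAXParser is the JAXP wrapper; it hands out the XMLReader.
        SAXParser parser = factory.newSAXParser();
        XMLReader xmlReader = parser.getXMLReader();
        return xmlReader;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(createReader().getClass().getName());
    }
}
```

Going through the factory keeps the application independent of any particular parser vendor.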
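Setting the content handler
Once the parser has been created, SurveyReader needs to be set as the content handler so that it receives the events.
The setContentHandler() method of xmlReader does this.
[Code listing not preserved in the source]
Of course, this is not the only option for a content handler.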
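As a minimal sketch of registering a handler, the code below uses a stand-in handler class rather than the tutorial's SurveyReader, whose source is not shown here. `ContentHandlerSketch`, `ElementCounter`, and `readerWith` are hypothetical names; `setContentHandler()` and `getContentHandler()` are the standard XMLReader methods.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class ContentHandlerSketch {
    // Stand-in for an application handler: counts the elements it is told about.
    static final class ElementCounter extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    // Creates a reader and registers the given handler to receive its events.
    static XMLReader readerWith(ContentHandler handler) throws Exception {
        XMLReader xmlReader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        xmlReader.setContentHandler(handler);
        return xmlReader;
    }

    public static void main(String[] args) throws Exception {
        ElementCounter counter = new ElementCounter();
        XMLReader reader = readerWith(counter);
        System.out.println(reader.getContentHandler() == counter);
    }
}
```

Any object implementing ContentHandler can be registered this way; extending DefaultHandler is simply a convenient way to override only the callbacks of interest.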
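Parsing the InputSource
To actually parse the file, you need an InputSource. This SAX class wraps whatever data is to be processed, so you do not have to worry about where it comes from.
Now you are ready to parse the file. The application passes the file, wrapped in an InputSource, to parse(), and then the application carries on.
[Code listing not preserved in the source]
You can compile and run the program, but at this point nothing should appear to happen, because the application has not yet defined any event handling.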
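A rough sketch of the parse step follows. It wraps in-memory text in an InputSource (a file or URL would be wrapped the same way) and, unlike the bare program described above, overrides one event method so that the parse has a visible effect. `ParseSketch` and `countElements` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class ParseSketch {
    // Parses whatever the InputSource wraps and counts the elements seen.
    static int countElements(InputSource source) throws Exception {
        final int[] count = {0};
        XMLReader xmlReader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        xmlReader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        // parse() returns only after the whole input has been processed.
        xmlReader.parse(source);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Made-up input; the parser does not care that it is a string.
        InputSource source = new InputSource(new StringReader("<a><b/><c/></a>"));
        System.out.println(countElements(source));
    }
}
```

Without the overridden `startElement`, the same call to `parse()` would complete silently, which matches the behavior described in the text.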
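Creating an ErrorHandler
Of course, there is always a chance that the data has problems when you try to parse it. In that case, it helps to have a handler for errors as well as for content.
You can create an error handler just as you created the content handler. Normally you would create it as a separate ErrorHandler instance, but to keep this example simple, error handling is included right in SurveyResults. This dual use is possible because the class extends DefaultHandler rather than implementing ContentHandler directly, and DefaultHandler supplies default implementations of both the ContentHandler and the ErrorHandler methods.
There are three events to watch for: warnings, errors, and fatal errors.
[Code listing not preserved in the source]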
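The three error events can be sketched as below. This version uses a separate ErrorHandler instance rather than folding the methods into the content handler, purely to keep the example short; `ErrorHandlerSketch` and `problems` are hypothetical names, and the malformed input is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

final class ErrorHandlerSketch {
    // Parses the text and returns a description of every problem reported.
    static List<String> problems(String xml) throws Exception {
        final List<String> seen = new ArrayList<String>();
        XMLReader xmlReader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        xmlReader.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                seen.add("warning: " + e.getMessage());
            }
            public void error(SAXParseException e) {
                seen.add("error: " + e.getMessage());
            }
            public void fatalError(SAXParseException e) throws SAXException {
                seen.add("fatal: " + e.getMessage());
                throw e; // a fatal error ends the parse
            }
        });
        try {
            xmlReader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Already recorded by fatalError(); nothing more to do here.
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // Not well-formed: <b> is never closed.
        for (String problem : problems("<a><b></a>")) {
            System.out.println(problem);
        }
    }
}
```

Well-formedness violations arrive as fatal errors; warnings and recoverable errors are typically reported only when features such as validation are enabled.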
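SAX events
The following events are commonly used; they are all defined in the HandlerBase class of the org.xml.sax package.
[Event list not preserved in the source]
There are more SAX events as well:
[Event list not preserved in the source]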
Reposted from: http://www-128.ibm.com/developerworks/cn/xml/x-cert/part8/
This tutorial is a simple introduction to DOM. There is a more in-depth and technical coverage of the DOM API in the "Complete DOM" tutorial.
As a Java programmer, once you have some XML documents, you will probably need to access the information contained in them from your Java code. Since an XML document is just a text file, you could write your own text file reader that interprets the information in the XML document in a way your code can use. Writing such an XML document reader is very time consuming, and the same code would have to be written over and over again for every program that needs access to the information in XML documents. The W3C recognized this and defined a standard way to create these "XML document readers", or XML parsers. The good news is that you don't have to write these parsers yourself; free parsers are provided by many companies, including Sun, Datachannel and IBM. These parsers are written in Java too; your Java programs must use XML parsers written in Java.
A DOM XML parser is a Java program that converts your XML documents into a Java object model. Once you have parsed an XML document, it exists in the memory of your Java Virtual Machine as a set of objects. When you need to access or modify information stored in the XML document, you don't have to manipulate the XML document file directly; instead, you access and modify the information through these objects in memory. So the DOM XML parser creates a Java document object representation of your XML document file.
The parser also performs some simple text processing as it creates an object model of your XML document; it expands all entities used in the file and it compares the structure of the information in the XML document to a DTD (if one is used). Once the simple processing is successful, the parser creates a document object representation of your XML document file. In order to access and modify the information in this document object, you need a reference object (of some class type) in order to call methods on the document object. The W3C has also created a set of Java interfaces called the Document Object Model API, which allow you to access and modify information in a document object created by the XML parser. So instead of accessing this document object with a reference of its implementation class type, the W3C expects you to access the document object with the standard DOM Java interfaces. The document class must implement these DOM interfaces.
DOM is just a set of Java interfaces that have been defined by the W3C; no implementations of these interfaces are provided. The XML parser writers must provide the implementations for the DOM interfaces themselves.
Since XML parsers may be written for any platform and language, the W3C has not provided an implementation of the DOM interfaces, and it leaves it up to the XML parser implementor to provide a good one. Another advantage of this approach is that your XML documents and the Java programs that you write are not dependent on any particular parser.
In order for your Java programs to access the information in XML documents, the parser must read the XML files from the first layer. The parser processes the file by checking the information contained in it for validity (by using a DTD if one is used) and expanding all the entities used in the file. Then this (processed) XML document is converted into an XML document object in memory by the XML parser. This document object contains a tree of nodes that contain the data and structure of the information contained in the XML document. This tree of nodes can be accessed and modified using the DOM API.
DOM is similar to the Swing component models, like TableModel, ListModel and TreeModel. These models are simply interfaces which must be implemented by classes that contain the actual data. For example, if you wish to display information in a 2D array to a JTable, you must create a class that contains the 2D array and then have this class implement the TableModel interface. So the TableModel does not contain the data, it merely allows the JTable to access the underlying 2D data; the data exists in an instance (object) of the class that implements the model. The DOM does not contain the data, but merely allows you to access your underlying XML data from the Java programs that you write. The document object contains the data, the class (of which this document object is an instance of) also implements the appropriate DOM interfaces. The parser has the responsibility of providing the implementation classes for the interfaces and also a way to instantiate them.
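The Swing analogy can be sketched as follows. This is only an illustration: the class name `ArrayTableModel` is hypothetical, and it extends `AbstractTableModel` (which implements `TableModel` and supplies the listener plumbing) instead of implementing the interface from scratch.

```java
import javax.swing.table.AbstractTableModel;

// Hypothetical model class: the 2D array holds the data, while the
// TableModel methods only expose it to whoever asks (such as a JTable).
final class ArrayTableModel extends AbstractTableModel {
    private final Object[][] cells;

    ArrayTableModel(Object[][] cells) {
        this.cells = cells;
    }

    @Override
    public int getRowCount() {
        return cells.length;
    }

    @Override
    public int getColumnCount() {
        return cells.length == 0 ? 0 : cells[0].length;
    }

    @Override
    public Object getValueAt(int row, int column) {
        return cells[row][column];
    }

    public static void main(String[] args) {
        ArrayTableModel model = new ArrayTableModel(new Object[][] {
            {"name", "email"},
            {"Ann", "ann@example.com"} // made-up sample data
        });
        System.out.println(model.getValueAt(1, 0));
    }
}
```

A `JTable` built with `new JTable(model)` would read the array only through these three methods, just as a Java program reads a parsed XML document only through the DOM interfaces.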
So the parser creates some objects, which are instances of some classes that implement certain interfaces defined in the DOM API. These objects are instantiated when the Parser reads an XML document. Now, instead of accessing these objects directly, you rely on using some Java interfaces (defined in the DOM API by the W3C) to access and manipulate these objects. These objects effectively contain the information that is in your XML document, but they only allow you to access and modify this information using the Java interfaces defined in the DOM API. So the Document Object Model is very similar to the Swing models (like TreeModel, ListModel and TableModel) which is just a set of interfaces without the implementation. This entire process is illustrated in Figure 1.
The good news about DOM and XML Parsers is that you don't have to implement the DOM API or write the parser. There are many companies that provide XML Parsers for different languages and they all implement the DOM interfaces.
The DOM API allows hierarchical access to the information in an XML document. An XML Parser converts the information in an XML document into a tree of nodes after parsing the document. DOM allows you to access this tree of nodes using the Java interfaces defined in the DOM API. The entire XML document, no matter how simple or how complicated, is converted into a tree of nodes, and all the nodes start from one root node, which is called a document object, hence the name Document Object Model. Once a document object tree has been created (by the XML parser, or your own code), you can access elements in that tree and you can also modify, delete and create leaves and branches by using the interfaces in the API. Figure 2 shows how information in an XML document can be represented as a tree of nodes.
In the document object tree, everything is a node. A node may have other nodes inside it and the node itself can hold information, like its tag-name and value and its child nodes (if any). This hierarchical organization of information is similar to a file system, where files and folders are organized hierarchically, a folder may have files in it or other folders, and everything is descended from one root folder. Similarly, a document object is descended from one node, and it may have other nodes inside it. The document object itself is a node.
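A short sketch may help show that everything in the tree, including the document itself, is reachable as a Node. It assumes the JAXP `DocumentBuilderFactory` is used to obtain a parser; the class name `DomOutlineSketch` and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DomOutlineSketch {
    // Appends one indented line per node, walking the tree depth first.
    static void outline(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName()).append('\n');
        for (Node child = node.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            outline(child, depth + 1, out);
        }
    }

    // Parses the text and returns the outline of the resulting tree.
    static String outline(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        outline(doc, 0, out); // the Document is itself the root Node
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(outline("<person><name>Ann</name></person>"));
    }
}
```

The walk uses only methods of the Node interface, so it works unchanged whether the node at hand is the document, an element, or a piece of text.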
The DOM API defines a minimal set of (language and platform independent) interfaces for accessing and manipulating the content and structure of information stored in XML documents, in a hierarchical manner.
The interfaces for DOM are required (by the W3C) to exist in the org.w3c.dom package. Programs that use DOM need the following import statement: import org.w3c.dom.*; . The implementations of the Java interfaces in this package are provided by the XML parser that you choose to use. The DOM interfaces are also shipped with the XML parser implementation that you use, and you have to put the implementation class files (or jar file) in your CLASSPATH.
However, the code to instantiate DOM objects is dependent on the parser that you are using. Remember, DOM is just a set of interfaces, and interfaces can't be instantiated in Java; only the classes that implement these interfaces may be instantiated. With the parsers used in this book, it only takes one line of XML parser specific code to instantiate a DOM object (using some specific XML parser). Once the XML parser specific object has been instantiated (that implements the appropriate DOM interfaces), you should access these objects only through the DOM interfaces. So, except for the one line of XML parser specific code, all your other code can be completely standards based and portable. If you use the factory pattern to instantiate the objects that implement DOM interfaces, then your classes will be completely portable and won't even have that one line of parser specific code embedded in them.
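One way to apply the factory pattern mentioned here is the JAXP `DocumentBuilderFactory`, sketched below. This is an assumption about which factory to use, not the parser-specific code the paragraph refers to; `DomFactorySketch` and the sample XML are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DomFactorySketch {
    // Returns a Document typed by the DOM interface only; the concrete
    // class behind it belongs to whichever parser the factory selected.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<person><name>Ann</name></person>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Because no parser class is named anywhere in the code, swapping the underlying parser requires no source changes.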
An XML document object created by the parser (after reading an XML file) contains a tree of Node objects (i.e., instances of some class that implements the Node interface). In DOM, everything is a Node. The other interfaces are provided to make things more object oriented, but you can manipulate all your information in DOM by just using the Node interface. Figure 3 shows the inheritance relationships between some of the important interfaces.
The root Node object of the document tree is also a Document object; Document is a subinterface of Node. Every DOM object must have a root. Another important interface is the Element interface (which is also a subinterface of Node); the Element interface can be used to access the elements in a DOM Document object tree.
The Node interface encapsulates access to a lot of information about a node; some of this information is listed in Table 1.
Figure 4 illustrates a sample XML Document object tree and describes all the Node objects that are contained in it.
You can find out if a Node has children or not by using the hasChildNodes() method. Node is a simple interface and you have to make sure that you follow the rules in Table 2 when using it.
If a Node has children, then it may or may not have a value. In Figure 4, the person element has children Nodes and a value. The email and name Nodes don't have any children and have values. If a Node has no children, it might also contain no value (like an element with an EMPTY data storage declaration).
The getNodeType() method returns the type of a Node; the type is a constant (a short) that identifies the kind of Node, for example Node.ELEMENT_NODE identifies a Node as an Element. You can use the getNodeValue() method to get the textual data stored in a Node. In the W3C DOM this returns the text for Text, Comment and Attr nodes; for an Element it returns null, and the element's text is held in its child Text nodes.
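Here is a minimal sketch of dispatching on getNodeType() and reading getNodeValue(). The names `NodeTypeSketch`, `describe`, and `root`, and the sample XML, are made up for illustration; the parser is obtained through JAXP as an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NodeTypeSketch {
    // Describes a node using only its type constant, name, and value.
    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                return "document";
            case Node.ELEMENT_NODE:
                return "element " + node.getNodeName();
            case Node.TEXT_NODE:
                return "text \"" + node.getNodeValue() + "\"";
            default:
                return "other (type " + node.getNodeType() + ")";
        }
    }

    // Parses the text and returns its root element.
    static Element root(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element email = root("<email>a@b.c</email>");
        System.out.println(describe(email));
        System.out.println(describe(email.getFirstChild()));
    }
}
```

Note that in this sketch the text is read from the element's child Text node, since the element's own getNodeValue() is null.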
The Node interface has methods that allow the traversal of a Node tree. The getChildNodes() method is useful for getting all the child Nodes of a Node. This method returns the Nodes (if any exist) in an object that is a container for Node objects; this object implements the NodeList interface. NodeList is an ordered, index-based list of Nodes that you read with getLength() and item(int). Figure 5 illustrates how NodeList can be used.
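A small sketch of stepping through a NodeList by index follows. `NodeListSketch` and `childElementNames` are hypothetical names, the sample XML is made up, and JAXP is assumed for obtaining the parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NodeListSketch {
    // Collects the names of the direct child elements of a node.
    static List<String> childElementNames(Node parent) {
        List<String> names = new ArrayList<String>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip text, comments, and other non-element children.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    // Parses the text and lists the child elements of its root element.
    static List<String> childElementNames(String xml) throws Exception {
        Node root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return childElementNames(root);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childElementNames(
                "<person><name>Ann</name><email>a@b.c</email></person>"));
    }
}
```

The type check matters in practice because whitespace between tags also shows up in the NodeList as Text nodes.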
In the "Complete DOM" tutorial, I will cover the DOM API in more depth and explain what all the interfaces do and have plenty of code examples. This tutorial is a simple introduction to the idea of DOM.
Reposted from: http://www.developerlife.com/domintro/default.htm