Abstract
The word “ontology” has become popular within the AI community. An ontology is usually viewed as a high-level description consisting of concepts that organize the upper parts of a knowledge base. However, the meaning of the term “ontology” tends to be somewhat vague, as the term is used in different ways. In this paper we attempt to clarify the meaning of ontology, including the philosophical views, and show why ontologies are useful and important. We give an overview of ontology structures in several particular systems. A field proposed within ontological efforts, “ontological engineering”, is also described. Several particular uses of ontologies are discussed. These include systems and ideas to support knowledge base sharing and reuse, both for computers and humans; ontology-based communication in multi-agent systems; applications of ontologies for natural language processing; applications in document search and enrichment of knowledge bases, both particularly for the World Wide Web environment; and construction of educational systems, particularly intelligent tutoring systems.
1 Introduction
The word “ontology” has become popular within the AI community. An ontology is usually viewed as a high-level description consisting of concepts that organize the upper parts of a knowledge base. However, the meaning of the term “ontology” tends to be somewhat vague, as the term is used in different ways. In this paper we attempt to clarify the meaning of ontology and show why ontologies are useful and important. We discuss several particular uses of ontologies, such as knowledge base reuse, knowledge sharing, communication in multi-agent systems, WWW applications, natural language processing, and intelligent tutoring systems.
1.1 Motivation
In the history of AI research, two types of research have been identified [31, 8]: form-oriented research (mechanism theories) and content-oriented research (content theories). The former deals with logic and knowledge representation, while the latter deals with the content of knowledge. The former has clearly dominated AI research to date. Recently, however, content-oriented research has been attracting more attention, because solving many real-world problems, such as knowledge reuse, facilitation of agent communication, media integration through understanding, and large-scale knowledge bases, requires not only advanced theories or reasoning methods but also sophisticated treatment of the content of knowledge.
Formal theories such as predicate logic provide us with a powerful tool to guarantee sound reasoning and thinking. They even enable us to discuss the limits of our reasoning in a principled way. However, they cannot answer questions such as what knowledge we should have for solving given problems, what knowledge is at all, what properties a specific piece of knowledge has, and so on. Sometimes the AI community gets excited by mechanisms such as neural nets, fuzzy logic, genetic algorithms, constraint propagation, etc. These mechanisms are proposed as the “secret” of making intelligent machines. At other times, it is realized that, however wonderful the mechanism, it cannot do much without a good content theory of the domain on which it is to work. Moreover, we often recognize that once a good content theory is available, many different mechanisms might be used equally well to implement effective systems, all using essentially the same content.
The importance of content-oriented research is being recognized more and more nowadays. Unfortunately, it seems that there are currently no widely recognized sophisticated methodologies for content-oriented research. Until recent years, the major results were limited to the development of knowledge bases.
The reasons for this can be [31]:
• content-oriented research tends to be ad hoc
• there is no methodology that enables research results to be accumulated
It is necessary to overcome these difficulties in content-oriented research. Ontologies are proposed for that purpose. Ontology engineering, as proposed in e.g. [31], is a research methodology which gives us the design rationale of a knowledge base, a kernel conceptualization of the world of interest, and strict definitions of the basic meanings of basic concepts, together with sophisticated theories and technologies enabling the accumulation of knowledge which is indispensable for modeling the real world.
Interest in ontologies has also grown as researchers and system developers have become more interested in reusing or sharing knowledge across systems. Currently, one key impediment to sharing knowledge is that different systems use different concepts and terms for describing domains. These differences make it difficult to take knowledge out of one system and use it in another. If we could develop ontologies that could be used as the basis for multiple systems, they would share a common terminology that would facilitate sharing and reuse. Developing such reusable ontologies is an important goal of ontology research. Similarly, if we could develop tools that would support merging ontologies and translating between them, sharing would be possible even between systems based on different ontologies.
1.2 Philosophical View
The term ontology was taken from philosophy. According to Webster’s Dictionary an ontology is
• a branch of metaphysics relating to the nature and relations of being
• a particular theory about the nature of being or the kinds of existence
Ontology (the “science of being”) is a word, like metaphysics, that is used in many different senses. It is sometimes considered to be identical to metaphysics, but we prefer to use it in a more specific sense, as that part of metaphysics that specifies the most fundamental categories of existence, the elementary substances or structures out of which the world is made. Ontology will thus analyze the most general and abstract concepts or distinctions that underlie every more specific description of any phenomenon in the world, e.g. time, space, matter, process, cause and effect, system.
Recently, the term “ontology” has been taken up by researchers in Artificial Intelligence, who use it to designate the building blocks out of which models of the world are made.
An agent (e.g. an autonomous robot) using a particular model will only be able to perceive that part of the world that its ontology is able to represent. In this sense, only the things in its ontology can exist for that agent. In that way, an ontology becomes the basic level of a knowledge representation scheme. An example is a set of link types for a semantic network representation which is based on a set of “ontological” distinctions: changing versus invariant, and general versus specific.
2 What is an Ontology?
The term “ontology” is used in many different ways. In this section we discuss what an ontology is, based on several definitions that are currently in use.
2.1 Common Definitions
The most widespread definitions of ontology are given below.
1. Ontology is a term in philosophy and its meaning is “theory of existence”.
2. Ontology is an explicit specification of conceptualization [21].
3. Ontology is a theory of vocabulary or concepts used for building artificial systems [31].
4. Ontology is a body of knowledge describing some domain (e.g. a common sense knowledge domain in CYC [45]).
Definition 1 is radically different from all the others (including the additional ones discussed below). We will briefly discuss some implications of its meaning for the definition of “ontology” for AI purposes. The second definition is generally proposed as a definition of what an ontology is for the AI community. It may be classified as “syntactic”, but its precise meaning depends on the understanding of the terms “specification” and “conceptualization”. The third definition is a proposal for a definition within the knowledge engineering community. The fourth definition differs from the previous two: it views the ontology as an inner body of knowledge, not as a way to describe the knowledge.
Although these definitions are compact, they are not sufficient for an in-depth understanding of what an ontology is. We will try to give more comprehensive definitions and insights.
2.1.1 Ontology as a Philosophical Term
Following [24] we will use the convention that the uppercase initial letter “O” distinguishes “Ontology” as a philosophical discipline from other usages of this term. Ontology is a branch of philosophy that deals with the nature and the organization of reality. It tries to answer questions like “what is existence”, “what properties can explain the existence”, etc. Aristotle defined Ontology as the science of being as such. Unlike the special sciences, each of which investigates a class of beings and their determinations, Ontology regards “all the species qua being and the attributes that belong to it qua being” (Aristotle, Metaphysics, IV, 1). In this sense Ontology tries to answer the question “what is being?” or, in a meaningful reformulation, “what are the features common to all beings?”.
This is what is today called “General Ontology”, in contrast with various Special or Regional Ontologies (e.g. Biological, Social). From this, Formal Ontology is defined as an area that has to determine the conditions of the possibility of the object in general and the individualization of the requirements that every object’s constitution has to satisfy. According to [24], Formal Ontology can be defined as the systematic, formal, axiomatic development of the logic of all forms and modes of being.
From this, Formal Ontology is not concerned so much with the existence of certain objects, but rather with the rigorous description of their forms of being, i.e. their structural features. In practice, Formal Ontology can be understood as the theory of the distinctions which can be applied independently of the state of the world, i.e. the distinctions:
• among the entities of the world (physical objects, events, regions...)
• among the meta-level categories used to model the world (concept, property, quality, state, role, part...)
In this sense, Formal Ontology, as a discipline, may be relevant to both Knowledge Representation and Knowledge Acquisition [24].
2.1.2 Ontology as a Specification of Conceptualization
The second definition of ontology mentioned above, explicit specification of conceptualization, is briefly described in [20]. The definition comes from work [22], where the ontology is used in the context of knowledge sharing. According to Thomas Gruber, explicit specification of conceptualization means that an ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents. This definition is consistent with the usage of ontology as a set of concept definitions, but more general. In this sense, ontology is important for the purpose of enabling knowledge sharing and reuse. An ontology is in this context a specification used for making ontological commitments. Practically, an ontological commitment is an agreement to use a vocabulary (i.e. ask queries and make assertions) in a way that is consistent (but not complete) with respect to the theory specified by an ontology. Agents are then built that commit to ontologies, and ontologies are designed so that the knowledge can be shared with and among these agents.
The body of knowledge is based on a conceptualization: the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them. A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose. Every knowledge base, knowledge-based system, or knowledge-level agent is committed to some conceptualization, explicitly or implicitly.
For these systems, what “exists” is that which can be represented. When the knowledge of a domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse. This set of objects, and the describable relationships among them, are reflected in the representational vocabulary with which a knowledge-based program represents knowledge. Thus, in the context of AI, we can describe the ontology of a program by defining a set of representational terms. In such an ontology, definitions associate the names of entities in the universe of discourse (e.g. classes, relations, functions, or other objects) with human-readable text describing what the names mean, and formal axioms that constrain the interpretation and well-formed use of these terms. Formally it can be said that an ontology is a statement of a logical theory [20].
Ontologies are often equated with taxonomic hierarchies of classes without class definitions and the subsumption relation. Ontologies need not be limited to these forms. Ontologies are also not limited to conservative definitions, that is, definitions in the traditional logic sense that only introduce terminology and do not add any knowledge about the world. To specify a conceptualization, one needs to state axioms that do constrain the possible interpretations for the defined terms.
Pragmatically, a common ontology defines the vocabulary with which queries and assertions are exchanged among agents. The agents sharing a vocabulary need not share a knowledge base. An agent that commits to an ontology is not required to answer all queries that can be formulated in the shared vocabulary. In short, a commitment to a common ontology is a guarantee of consistency, but not completeness, with respect to queries and assertions using the vocabulary defined in the ontology.
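The pragmatic reading of an ontological commitment can be illustrated with a small sketch. It is only a toy model under stated assumptions: the `Agent` class, the blocks-world terms, and the convention that `None` means “no answer” are all hypothetical and do not come from [20] or [22]. The sketch shows two agents that share a vocabulary but not a knowledge base, and that may decline to answer a well-formed query.

```python
from typing import Optional


class Agent:
    """Toy agent that commits to a shared vocabulary (hypothetical model)."""

    def __init__(self, vocabulary, known_facts):
        self.vocabulary = frozenset(vocabulary)
        self.known_facts = set(known_facts)
        # Commitment: every assertion the agent holds uses only shared terms.
        for fact in self.known_facts:
            self._check_terms(fact)

    def _check_terms(self, fact):
        unknown = set(fact) - self.vocabulary
        if unknown:
            raise ValueError(f"terms outside the shared vocabulary: {sorted(unknown)}")

    def ask(self, fact) -> Optional[bool]:
        """Return True if the fact is known, None if the agent has no answer.

        The query must be well formed with respect to the vocabulary, but the
        agent is not required to answer it (consistency, not completeness).
        """
        self._check_terms(fact)
        return True if fact in self.known_facts else None


# Two agents share a vocabulary but hold different knowledge bases.
shared_vocabulary = {"on", "block", "table", "floor"}
agent_a = Agent(shared_vocabulary, {("on", "block", "table")})
agent_b = Agent(shared_vocabulary, {("on", "table", "floor")})
```

Here `agent_b.ask(("on", "block", "table"))` returns `None` rather than an error, because the query is expressed in the shared vocabulary even though `agent_b` cannot answer it, whereas a query using a term such as `"under"` is rejected by both agents.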
2.1.3 Ontology as a Representational Vocabulary
The third definition of ontology proposed above says that it is in fact a representational vocabulary [8, 31]. The vocabulary can be specialized to some domain or subject matter. More precisely, it is not the vocabulary as such that qualifies as an ontology, but the conceptualization that the terms in the vocabulary are intended to capture. Thus, translating the terms in an ontology from one language to another, for example from Czech to English, does not change the ontology conceptually.
In engineering design, one might discuss the ontology of an electronic devices domain, which might include vocabulary that describes conceptual elements (transistors, operational amplifiers, and voltages) and the relations between these elements (operational amplifiers are a type-of electronic device, and transistors are component-of operational amplifiers). Identifying such a vocabulary and the underlying conceptualization generally requires careful analysis of the kinds of objects and relations that can exist in the domain.
The term ontology is sometimes used to refer to a body of knowledge describing some domain (see below), typically a common sense knowledge domain, using a representational vocabulary. For example, CYC [45] often refers to its knowledge representation of some area of knowledge as its ontology. In other words, the representation vocabulary provides a set of terms with which one can describe the facts in some domain, while the body of knowledge using that vocabulary is a collection of facts about a domain. However, this distinction is not as clear as it might first appear. In the electronic-device example, that transistor is a component-of operational amplifier or that the latter is a type-of electronic device is just as much a fact about its domain as a CYC fact about some aspect of space, time or numbers. The distinction is that the former emphasizes the use of ontology as a set of terms for representing specific facts in an instance of the domain, while the latter emphasizes the view of ontology as a general set of facts to be shared.
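The distinction between a vocabulary and a body of facts that uses it can be made concrete with a minimal sketch of the electronic-device example. The triple representation, the helper names, and the Czech term mapping below are illustrative assumptions, not part of the cited work. The sketch also shows that renaming terms leaves the relational structure intact.

```python
TYPE_OF = "type-of"
COMPONENT_OF = "component-of"

# Representational vocabulary: the terms available for describing the domain.
vocabulary = {"transistor", "operational amplifier", "electronic device", "voltage"}

# Body of knowledge: facts stated with that vocabulary, as (subject, relation, object).
facts = {
    ("operational amplifier", TYPE_OF, "electronic device"),
    ("transistor", COMPONENT_OF, "operational amplifier"),
}


def uses_only_vocabulary(facts, vocabulary):
    """Check that every subject and object in the facts is a vocabulary term."""
    return all(s in vocabulary and o in vocabulary for s, _, o in facts)


def rename(facts, mapping):
    """Translate terms via a mapping; relations and structure are left untouched."""
    return {(mapping.get(s, s), r, mapping.get(o, o)) for s, r, o in facts}


# Illustrative English-to-Czech term mapping (an assumption made for this sketch).
en_to_cs = {
    "transistor": "tranzistor",
    "operational amplifier": "operační zesilovač",
    "electronic device": "elektronické zařízení",
}
cs_to_en = {cs: en for en, cs in en_to_cs.items()}

translated = rename(facts, en_to_cs)
```

Translating back with `rename(translated, cs_to_en)` recovers the original facts, which is one way to read the claim that translation does not change the ontology conceptually: only the labels differ, not the relations among the concepts.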
2.1.4 Ontology as a Body of Knowledge
Sometimes, ontology is defined as a body of knowledge describing some domain, typically a common sense knowledge domain, using a representation vocabulary as described above. In this case, an ontology is not only the vocabulary, but the whole “upper” knowledge base (including the vocabulary that is used to describe this knowledge base). The typical example of this usage is the CYC project (http://www.cyc.com/, [45]), which defines its knowledge base as an ontology for any other knowledge-based system. CYC is the name of a very large, multi-contextual knowledge base and inference engine. The development of CYC started during the early 1980s, headed by Douglas Lenat. CYC is an attempt to do symbolic AI on a massive scale. It is neither based on numerical methods such as statistical probabilities, nor is it based on neural networks or fuzzy logic. All of the knowledge in CYC is represented declaratively in the form of logical assertions. CYC contains over 400,000 significant assertions [45], which include simple statements of fact, rules about what conclusions to draw if certain statements of fact are satisfied (true), and rules about how to reason with certain types of facts and rules. New conclusions are derived by the inference engine using deductive reasoning. The CYC team does not believe there is any shortcut toward being intelligent or creating an artificial intelligence based agent. Addressing the need for a large body of knowledge with content and context may only be done by manually organizing and collating information.
This knowledge includes heuristic, rule-of-thumb problem-solving strategies, as well as facts that can only be known to a machine if it is told. Much of the useful common sense knowledge needed for life is prescientific and has therefore not been analyzed in detail. Thus a large part of the work of the CYC project is to formalize common relationships and fill in the gaps between the highly systematized knowledge used by specialists. It is necessary to divide such a large knowledge base into smaller pieces to enable reasoning in reasonable time. Because of this, the CYC knowledge base uses a special context space [29] that is divided by 12 dimensions into smaller pieces (contexts) that have something in common and can be used to reason about a specific problem in that context. It is possible to “lift” an assertion from one context to another when the problem requires it. The CYC common sense knowledge can be used as the body of a knowledge base for any knowledge-intensive system. In this sense, this body of knowledge can be viewed as an ontology of the knowledge base of the system.
2.2 Other Ontology Definitions
As we can see from the above discussion, the exact definition of ontology is not obvious; however, the definitions have much in common. In addition to the above definitions there are many other proposals for ontology definitions. Some other definitions collected from [24] are:
1. informal conceptual system
2. formal semantic account
3. representation of a conceptual system via a logical theory
(a) characterized by specific formal properties
(b) characterized only by its specific purposes
4. vocabulary used by a logical theory
5. (meta-level) specification of a logical theory
Definitions 1 and 2 conceive an ontology as a conceptual “semantic” entity, either formal or informal, while according to interpretations 3, 4 and 5 an ontology is a specific “syntactic” object. According to interpretation 1, an ontology is the conceptual system which may be assumed to underlie a particular knowledge base. Under interpretation 2, instead, the ontology that underlies a knowledge base is expressed in terms of suitable formal structures at the semantic level. In both cases, we may say that “the ontology of knowledge base A is different from that of knowledge base B”. Under interpretation 3, an ontology is nothing other than a logical theory.
The issue is whether such a theory needs to have particular formal properties in order to be an ontology or, rather, whether it is the intended purpose which lets us consider a logical theory as an ontology. The latter position can be supported by arguing that an ontology is an annotated and indexed set of assertions about something: “leaving off the annotations and indexing, this is a collection of assertions: what in logic is called a theory” (Pat Hayes, statement in [24]). According to interpretation 4, an ontology is not viewed as a logical theory, but just as the vocabulary used by a logical theory. Such an interpretation collapses into 3(a) if an ontology is thought of as a specification of a vocabulary consisting of a set of logical definitions. We may anticipate that Gruber’s interpretation (specification of conceptualization) collapses into 3(a) as well when a conceptualization is intended as a vocabulary. Finally, under interpretation 5, an ontology is seen as a specification of a logical theory in the sense that it specifies the “architectural components” (or primitives) used within a particular domain theory.
3 Ontology Structure
From the overview above we can see that an ontology can be perceived in basically two ways. The first is an ontology as a representational vocabulary, where the conceptual structure of terms should remain unchanged during translation. The other, which is discussed in this section, is an ontology as the body of knowledge describing a domain, in particular a common sense domain.
An ontology can be divided in several ways. We will describe some of the proposals here. Particularly interesting is the so-called “upper ontology”, which is intended to serve as the upper part of the ontology of practically all knowledge-based systems. Some of the divisions presented here are intended to be merged to form an upper ontology standard in the IEEE Standard Upper Ontology Study Group [39]. The pages linked from [39] contain many other examples that could be used as some kind of an upper ontology.
(Figure 1)
Figure 1: How ontologies differ in their analyses of the most general concepts [8]
It is interesting that many authors agree that the upper class of the ontology is “thing”; however, even at the second level they do not agree on the separation, as can be seen in Figure 1. The initiative [39] tries to unify these views.
3.1 CYC
The ontology of CYC is based on several terms that form the fundamental vocabulary of the CYC knowledge base. The universal set is #$Thing (see Figure 1). It is the set of everything. Every CYC constant in the knowledge base is a member of this collection. In the prefix notation of the language CycL [10], we express that fact as (#$isa CONST #$Thing). Thus, too, every collection in the knowledge base is a subset of the collection #$Thing. In CycL, that fact is expressed as (#$genls COL #$Thing).
The set #$Thing has some subsets, such as PathGeneric, Intangible, Individual, SimpleSegmentOfPath, PathSimple, MathematicalOrComputationalThing, IntangibleIndividual, Product, TemporalThing, SpatialThing, Situation, EdgeOnObject, FlowPath, ComputationalObject, Microtheory, plus about 1500 more public subsets and about 13600 unpublished subsets.
#$Individual is the collection of all things that are not sets or collections. Thus, #$Individual includes (among other things) physical objects, temporal subabstractions of physical objects, numbers, relations, and groups (#$Group). An element of #$Individual may have parts or a structure (including parts that are discontinuous), but no instance of #$Individual can have elements or subsets.
#$Collection is the collection of all CYC collections. CYC collections are natural kinds or classes, as opposed to mathematical sets. Their elements have some common attribute(s). Each CYC collection is like a set in so far as it may have elements, subsets, and supersets, and may not have parts or spatial or temporal properties. Sets, however, differ from collections in that a mathematical set may be an arbitrary set of things which have nothing in common (#$Set-Mathematical). In contrast, the elements of a collection will all have in common some feature(s), some ‘intensional’ qualities. In addition, two instances of #$Collection can be co-extensional (i.e. have all the same elements) without being identical, whereas if two arbitrary sets had the same elements, they would be considered equal.
#$Individual and #$Collection are disjoint collections. No CYC constant can be an instance of both.
#$Predicate is the set of all CYC predicates. Each element of #$Predicate is a truth-functional relationship in CYC which takes some number of arguments. Each of those arguments must be of some particular type. Informally, one can think of elements of #$Predicate as functions that always return either true or false. More formally, when an element of #$Predicate is applied to the legal number and type of arguments, an expression is formed which is a well-formed formula (wff) in CycL. Such expressions are called atomic formulas if they contain variables, or ground atomic formulas (gaf) if they contain no variables.
#$isa:<#$ReifiableTerm> <#$Collection> expresses the ISA relationship. (#$isa EL COL) means that EL is an element of the collection COL. CYC knows that #$isa distributes over #$genls. That is, if one asserts (#$isa EL COL) and (#$genls COL SUPER), CYC will infer that (#$isa EL SUPER). Therefore, in practice one only manually asserts a small fraction of the #$isa assertions; the vast majority are inferred automatically by CYC.
#$genls:<#$Collection> <#$Collection> expresses a similar relationship for collections (generalization). (#$genls COL SUPER) means that SUPER is one of the supersets of COL. Both arguments must be elements of #$Collection. Again, as with #$isa, CYC knows that #$genls is transitive; therefore, in practice one only manually asserts a small fraction of the #$genls assertions, since the rest are inferred automatically.
More details about the structure of the CYC ontology and about how the CYC knowledge base is constructed can be found at http://www.cyc.com.
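The two inference rules described above, transitivity of #$genls and the distribution of #$isa over #$genls, can be sketched in a few lines. This is only an illustration under stated assumptions: the `TinyKB` class and the constants `Rover`, `Dog`, `Mammal`, and `Thing` are hypothetical, and the code is neither CycL nor the CYC inference engine.

```python
from collections import defaultdict


class TinyKB:
    """Toy store for isa / genls assertions (hypothetical, not the CYC engine)."""

    def __init__(self):
        self.genls = defaultdict(set)  # collection -> directly asserted supersets
        self.isa = defaultdict(set)    # term -> directly asserted collections

    def assert_genls(self, col, sup):
        self.genls[col].add(sup)

    def assert_isa(self, term, col):
        self.isa[term].add(col)

    def all_genls(self, col):
        """Supersets of col reachable through the transitive genls relation."""
        seen, stack = set(), [col]
        while stack:
            for sup in self.genls.get(stack.pop(), ()):
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        return seen

    def all_isa(self, term):
        """Collections the term belongs to: asserted ones plus all their supersets."""
        result = set()
        for col in self.isa.get(term, ()):
            result.add(col)
            result |= self.all_genls(col)
        return result


kb = TinyKB()
kb.assert_genls("Dog", "Mammal")    # (genls Dog Mammal)
kb.assert_genls("Mammal", "Thing")  # (genls Mammal Thing)
kb.assert_isa("Rover", "Dog")       # (isa Rover Dog)
```

Only three assertions are stated by hand, yet `kb.all_isa("Rover")` also contains `"Mammal"` and `"Thing"`, which mirrors the remark that only a small fraction of the assertions needs to be entered manually.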
3.2 Russell & Norvig’s General Ontology
Yet another view of general ontology structure is presented in Russell & Norvig’s book [38]. Every category of their ontology (see Figure 2) is discussed in detail using example axioms. An example of this ontology in KIF [18] can be found at http://ltsc.ieee.org/suo/ontologies/Russell-Norvig.txt.
(Figure 2)
Figure 2: Russell & Norvig’s general ontology structure [38]
3.3 Ontology Engineering
Ontology engineering is a field in artificial intelligence or computer science that is concerned with ontology creation and usage. The report [31], which proposes and comments on this field, declares that the ultimate purpose of ontology engineering should be “to provide a basis of building models of all things in which computer science is interested”.
3.3.1 Structure of Usage
According to [31], ontologies can be divided into the following subcategories from the point of view of knowledge reuse and ontology engineering. This is a classification of ontologies by their usage rather than a division of one general ontology. Some examples are included.
• Workplace ontology
  This is an ontology for the workplace which affects task characteristics by specifying several boundary conditions which characterize and justify problem solving behaviour in the workplace. Workplace and task ontologies collectively specify the context in which domain knowledge is intended and used during the problem solving.
  Examples from circuit troubleshooting: fidelity, efficiency, precision, high reliability.
• Task ontology
  Task ontology is a system of vocabulary for describing the problem solving structure of all existing tasks in a domain-independent way. It does not cover the control structure. It covers the components or primitives of unit inferences that take place while tasks are performed. Task knowledge in turn specifies domain knowledge by giving roles to each object and to the relations between them.
  Examples from scheduling tasks: schedule recipient, schedule resource, goal, constraint, availability, load, select, assign, classify, remove, relax, add.
• Domain ontology
  Domain ontology can be either task dependent or task independent. Task-independent ontology usually relates to activities of objects.
  – Task-dependent ontology
    A task structure requires not all the domain knowledge but some specific domain knowledge in a certain specific organization. This special type of domain knowledge can be called task-domain ontology because it depends on the task.
    Examples from job-shop scheduling: job, order, line, due date, machine availability, tardiness, load, cost.
  – Task-independent ontology
    ∗ Activity-related ontology
      · Object ontology. This ontology covers the structure, behaviour and function of the object. Examples from circuit boards: component, connection, line, chip, pin, gate, bus, state, role.
      · Activity ontology. Examples from enterprise ontology: use, consume, produce, release, state, resource, commit, enable, complete, disable.
    ∗ Activity-independent ontology
      · Field ontology. This ontology is related to theories and principles which govern the domain. It contains primitive concepts appearing in the theories and relations, formulas, and units constituting the theories and principles.
      · Units ontology. Examples: mole, kilogram, meter, ampere, radian.
      · Engineering mathematics ontology. Examples: linear algebra, physical quantity, physical dimension, unit of measure, scalar quantity, physical components.
• General or Common ontology
  Examples: things, events, time, space, causality or behaviour, function etc.
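As an illustration, the usage-based classification above can be held as a small nested structure and queried. The following Python sketch is only one possible encoding: the names `ONTOLOGY_TYPES`, `path_to` and `leaves` are hypothetical, and the nesting is our reading of the list in [31] as reproduced here, not something specified by the report itself.

```python
# Nested dicts: key = category name, value = its subcategories.
# The nesting depth of each entry is an assumption based on the list above.
ONTOLOGY_TYPES = {
    "Workplace ontology": {},
    "Task ontology": {},
    "Domain ontology": {
        "Task-dependent ontology": {},
        "Task-independent ontology": {
            "Activity-related ontology": {
                "Object ontology": {},
                "Activity ontology": {},
            },
            "Activity-independent ontology": {
                "Field ontology": {},
                "Units ontology": {},
                "Engineering mathematics ontology": {},
            },
        },
    },
    "General or Common ontology": {},
}


def path_to(name, tree=ONTOLOGY_TYPES):
    """Return the chain of categories from the top level down to name, or None."""
    for key, children in tree.items():
        if key == name:
            return [key]
        below = path_to(name, children)
        if below is not None:
            return [key] + below
    return None


def leaves(tree=ONTOLOGY_TYPES):
    """Return the categories that have no subcategories, in listing order."""
    found = []
    for key, children in tree.items():
        found.extend(leaves(children) if children else [key])
    return found


units_path = path_to("Units ontology")
leaf_types = leaves()
```

A lookup such as `path_to("Units ontology")` makes explicit that a units ontology is classified as an activity-independent, task-independent domain ontology.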
3.3.2 Ontology Engineering Subfields
Ontologies can also be divided from the point of view of ontology engineering as a field. The subjects which should be covered by ontology engineering are listed in [31]. They include basic issues in philosophy, knowledge representation, ontology design, standardization, EDI, reuse and sharing of knowledge, media integration, etc., which are essential topics for future knowledge engineering. Of course, they should be constantly refined through further development of ontology engineering.
• Basic Subfield
  – Philosophy (Ontology, Meta-mathematics)
    Ontology, which philosophers have discussed since Aristotle, is discussed, as well as logic and meta-mathematics.
  – Scientific philosophy
    Ontology is investigated from the point of view of physics, e.g., time, space, process, causality, etc.
  – Knowledge representation
    Basic issues of knowledge representation, especially the representation of ontological stuff, are discussed.
• Subfield of Ontology Design
  – General (Common) ontology
    General ontologies such as time, space, process, causality, part/whole relation, etc. are designed. Both in-depth investigation of the meaning of every concept and relation and formal representation of ontologies are discussed.
  – Domain ontologies
    Various ontologies in, say, Plant, Electricity, Enterprise, etc. are designed.
• Subfield of Common Sense Knowledge
  – Parallel to general ontology design, common sense knowledge is investigated and collected, and knowledge bases of common sense are built.
• Subfield of Standardization
  – EDI (Electronic Data Interchange) and data element specification
    Standardization of primitive data elements which should be shared among people to enable fully automatic EDI.
  – Basic semantic repository
    Standardization of primitive semantic elements which should be shared among people to enable knowledge sharing.
  – Conceptual schema modeling facility (CSMF)
  – Components for qualitative modeling
    Standardization of functional components such as pipe, valve, pump, boiler, register, battery, etc. for qualitative model building.
• Subfield of Data or Knowledge Interchange
  – Translation of ontology
    Methodologies for translating one ontology into another are developed.
  – Database transformation
    Transformation of data in a database into another database with a different conceptual schema.
  – Knowledge base transformation
    Transformation of a knowledge base into another built on a different ontology.
• Subfield of Knowledge Reuse
  – Task ontology
    Design of ontology for describing and modeling human ways of problem solving.
  – T-domain ontology
    Task-dependent domain ontology is designed under some specific task context.
  – Methodology for knowledge reuse
    Development of methodologies for knowledge reuse using the above two ontologies.
• Subfield of Knowledge Sharing
  – Communication protocol
    Development of communication protocols between agents which can behave cooperatively under a specified goal.
  – Cooperative task ontology
    Task ontology design for cooperative communication.
• Subfield of Media Integration
  – Media ontology
    Ontologies of the structural aspects of documents, images, movies, etc. are designed.
  – Common ontologies of content of the media
    Ontologies common to all media, such as those of human behavior, story, etc., are designed.
  – Media integration
    A meaning representation language for media is developed, and media are integrated through understanding of media representation.
• Subfield of Ontology Design Methodology
  – Methodology
  – Support environment
• Subfield of Ontology Evaluation
  – The ontologies designed are evaluated on real-world problems by forming a consortium.