Generation of Heterogeneous Semantic Annotations of XML Pages: a Multi-viewpoints Approach

The semantic annotations presented in current research are based on consensual descriptions of domain knowledge which are used to generate a consensual interpretation of the resource content. Sometimes specialists in a domain want to enrich this interpretation with specific interpretations based on their specialties, consistent with the interpretations of other specialists. However, extant work cannot enrich consensual annotations with heterogeneous annotations. We propose an approach that constructs a consensual (global) annotation and then enriches the consensual annotation with heterogeneous (local) annotations, each of which is a specification of the global annotation. The heterogeneous annotations are consistent with the global interpretations and with one another. What is unique in our approach is that one element in a resource can be annotated with different specific concepts, based on different viewpoints. Our approach utilizes a multi-viewpoints ontology [1], with heterogeneity at the local level and consensus at the global level.


Introduction
To equip resources with explicit meanings which can be interpreted by software, it is necessary to associate these resources with semantic annotations [2]. In the semantic web, semantic annotation is defined as 'an approach to link ontologies to the original information sources' [3]. An ontology provides consensual descriptions of the elements of a domain [4,5]. Semantic annotations are based on these consensual descriptions to generate a consensual interpretation of the resource content.
Generally, in any given domain, there are different specialists. All of these specialists share a common description of the domain knowledge (a consensual description). In addition, each specialist has a specific description according to his or her specialty (his or her local viewpoint).
Current work such as [6,7,8,9,10,11,46] constructs only a consensual annotation of a resource, using a single ontology. Other work that constructs only a consensual semantic annotation of a resource, such as [12,13,14,15], uses multiple ontologies, intending to obtain a 'bigger' ontology with a larger set of concepts. Trichet et al. [16] have developed a platform that allows users to utilize one or more ontologies to generate their own annotations for the same resource.
However, this approach constructs only heterogeneous annotations, without a common, consensual annotation. In contrast to these works, we are interested in an annotation system with the following characteristics:
− it should include a consensual (global) annotation;
− the consensual annotation can be enriched by many specific local annotations (heterogeneous annotations) which are mutually consistent;
− the local annotations can be enriched while maintaining consistency between them; and
− a local semantic annotation of a page's content is generated for each different viewpoint.
However, we cannot accomplish these goals using a classical ontology (a mono-viewpoint ontology), because classical ontologies can generate only consensual annotations. Moreover, multi-ontologies allow only the construction of heterogeneous annotations, without a consensual annotation, so that it is necessary to reason about the alignment of disjoint ontologies. To accomplish our goals, we therefore exploit the multi-viewpoints ontology developed in [1]. A multi-viewpoints ontology combines heterogeneity at the local level with consensus at the global level. The knowledge (concepts and roles) common to all viewpoints is described in the global representation. The local knowledge for each viewpoint is described in a local representation, and the local representations are linked by bridge rules.
In this work, we search for the page elements that can be annotated according to the global viewpoint, in order to generate a global (consensual) annotation. Next, for each viewpoint, we search the page for the local attributes of the global concepts in order to identify the local concepts in the page that specify (are subsumed by) the global concepts. In addition, we will focus on the annotation of the DOMXML tree using a multi-viewpoints ontology. Each node in the DOMXML tree can be annotated (labeled) by an ontological element. We build a labeled DOMXML subtree for each viewpoint that contains the relevant page elements as seen from this viewpoint. Thus, a single node can be annotated by different specific concepts according to different viewpoints.
The remainder of this paper is organized as follows. Section 2 gives some terminology and definitions. Then, in section 3, we show how to link each DOMXML node to the appropriate ontological element. In section 4, we explain our proposed approach. In section 5, we present the formalization and implementation. Section 6 presents a discussion, and section 7 concludes this paper and discusses future work.

The viewpoint approach: an overview
For a given domain of knowledge, several criteria can be used to observe an object. These different perceptions of the world are called viewpoints or perspectives. In computer science, most data modeling systems do not deal with the variety of perceptions related to the same universe of discourse; they develop tools to create a single model for a single vision of the observed world. The viewpoint approach is opposed to this monolithic approach and makes it possible to model the same reality according to different points of view.
Several interpretations of the viewpoint notion are possible. One of the first references to viewpoints was proposed by Minsky [17]: viewpoints correspond to different perceptions of an object with respect to the observer's position. The second interpretation is knowledge-based: viewpoints correspond to the different ways of translating knowledge with respect to the social position, know-how and competence of an expert. In this interpretation, a viewpoint includes the context and the perception of a person or group of persons [18].
The viewpoint mechanism has been integrated into various contexts and used to solve different problems. In the following, we identify the main objectives in integrating viewpoints into computer systems. Note that there is no single use of this concept that includes all of these objectives.
− The viewpoint as a means of providing multiple descriptions of an entity: the viewpoint concept results naturally from the multiple views of the objects of a specific study. A real-world entity can have many behavioral contexts and many states, from which the notion of multiple descriptions has been derived. In this case, the viewpoint is defined as conferring several partial descriptions on the same universe of discourse, each of which describes it from a given viewpoint.
− The viewpoint as a means of mastering system complexity: several research works are based on the viewpoint concept with the principal objective of explicitly taking the complexity of the system into account. The study is then carried out by dividing the system into partial descriptions according to different and complementary aspects.
− The viewpoint as an approach for the modeling and distributed development of systems: many authors state that the modeling of complex systems cannot be handled with the same techniques as those used for simple systems. Different works suggest a distributed development approach based on the viewpoint notion. Hence, every development process can be represented by correlated viewpoints.

Multi-viewpoints ontology
Hemam et al. [18] and Hemam [1] investigated the problem of representing a domain ontology which takes into account the notion of a viewpoint. This type of ontology, which they call a Multi-Viewpoints Ontology (MvpOnt), is a multiple description of a single universe of discourse according to various viewpoints. The ontology is defined as a 4-tuple O = (CG, RG, VP, M), where CG is a set of global concepts, RG is a set of global roles, VP is a set of viewpoints and M is a set of bridge rules.
A viewpoint, which is a partial description of a universe of discourse from a particular perspective, is defined as a triple VPK = (CL, RL, AL), where CL is a set of local concepts, RL is a set of local roles and AL is a set of local individuals. Some fundamental definitions follow:

• A global concept models a generic family in the real world and groups a set of individuals. Each global concept can be described according to different viewpoints.
• A local concept is a concept that is seen and described locally, according to a particular viewpoint.
• A global role is a relationship between two local concepts that are defined in two different perspectives.
• A local role is a relationship between two local concepts defined in the same viewpoint.
• A bridge rule represents consensual relationships between two local concepts or two local roles represented in two different viewpoints.
Under a particular viewpoint, each particular individual is attached to only one local concept. This is motivated by the argument that an individual belonging to two unrelated local concepts can only do so under different viewpoints [45]. This organization is very useful since, from a particular viewpoint, one can classify an individual with regard to only the relevant properties [43]. As a consequence, the set of concepts to be considered will be small, whereas considering the entire lattice of possible structures would confuse the user.
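The two tuples defined above can be pictured as plain data structures. The following is only a minimal sketch, not the representation used in [1]: the class names, field names and the example viewpoint "Finance" with its concepts are assumptions introduced for illustration. The `attach` method enforces the constraint that, under one viewpoint, an individual is attached to only one local concept.

```python
from dataclasses import dataclass, field


@dataclass
class Viewpoint:
    """VPk = (CL, RL, AL): local concepts, local roles, local individuals."""
    name: str
    local_concepts: set = field(default_factory=set)
    local_roles: set = field(default_factory=set)
    # AL is stored as individual -> local concept, so an individual can be
    # attached to at most one local concept under this viewpoint.
    local_individuals: dict = field(default_factory=dict)

    def attach(self, individual, concept):
        if concept not in self.local_concepts:
            raise ValueError(f"{concept} is not a local concept of {self.name}")
        current = self.local_individuals.get(individual)
        if current is not None and current != concept:
            raise ValueError(f"{individual} is already attached to {current}")
        self.local_individuals[individual] = concept


@dataclass
class MvpOnt:
    """O = (CG, RG, VP, M)."""
    global_concepts: set = field(default_factory=set)
    global_roles: set = field(default_factory=set)
    viewpoints: dict = field(default_factory=dict)    # name -> Viewpoint
    bridge_rules: list = field(default_factory=list)  # M


# Hypothetical example data (not taken from the paper).
finance = Viewpoint("Finance", local_concepts={"CheapApartment", "LuxuryApartment"})
onto = MvpOnt(global_concepts={"Apartment"}, viewpoints={"Finance": finance})
finance.attach("apt1", "CheapApartment")
```

Storing AL as a mapping rather than a set is a design choice of this sketch; it makes the "one local concept per individual and viewpoint" rule cheap to check.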

Linking DOMXML nodes to ontological elements
Apparo and Pixley [19] noted that DOM is a specification of the W3C (World Wide Web Consortium) that defines the hierarchical structure of an XML document as a tree in which:
• each XML tag is represented by an element node with the same name as the XML tag,
• the nesting of tags is represented by edges,
• the value of an XML tag is represented by a text node, and
• the attribute of an XML tag is represented by an attribute node.
DOM has several methods for manipulating XML files, such as:
• removeChild, which deletes a target node, and
• hasChildNodes, which returns true if the node has children, and false otherwise.
In the following figure, we present an example of an XML representation. This representation shows a set of apartments, each of which is defined by tenant, location and rent.
A DOMXML tree is composed of several levels. The first level contains the element nodes which are parents of text nodes. The second level contains the element nodes which are parents of at least one element node at the first level. And so on: level Li contains the element nodes which are parents of at least one element node at level Li-1.
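The level definition above can be sketched with Python's standard `xml.dom.minidom` module (the paper's own implementation uses PHP). The sample document is a well-formed variant of the apartments example in which the inner tag is named `Apartment` and the attribute `Address`; both names, as well as the helper functions, are assumptions made for this illustration.

```python
from xml.dom import minidom

# Well-formed variant of the apartments example (tag names are assumptions).
XML = """<Apartments>
  <Apartment Address="bel air street Constantine city, 25000">
    <Tenant>Meriam</Tenant>
    <Location>Downtown</Location>
    <Rent>500.00</Rent>
  </Apartment>
</Apartments>"""


def element_children(node):
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def has_text(node):
    # Whitespace-only text nodes (indentation) are not counted as values.
    return any(c.nodeType == c.TEXT_NODE and c.data.strip()
               for c in node.childNodes)


def levels_of(node, table):
    """Return the set of levels of `node` and record its tag name in `table`."""
    levels = set()
    if has_text(node):
        levels.add(1)                 # parent of a text node: level 1
    for child in element_children(node):
        for lvl in levels_of(child, table):
            levels.add(lvl + 1)       # parent of a level L(i-1) element: level Li
    for lvl in levels:
        table.setdefault(lvl, []).append(node.tagName)
    return levels


doc = minidom.parseString(XML)
table = {}
levels_of(doc.documentElement, table)
```

Under the definition as stated, an element with both text and element children can belong to more than one level, which is why the sketch returns a set of levels rather than a single number.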
In the following, we will describe how to annotate each node with an element defined in the multi-viewpoints ontology. First, however, we show how to link a node of a DOMXML tree to the appropriate ontological element, which is represented in the OWL language.
We compare the node in DOMXML with the OWL representation of the ontological element at the structural and terminological levels. At the terminological level, we compare the name of the node with the name of the ontological element (as a string comparison). We use one of the metrics which are defined in the literature [20,21,22,23]. If the names are technical words, we search for an identity between the names or between their synonyms. We use technical dictionaries, such as UMLS [24] for the biomedical domain, to search for synonyms of technical words. If the names aren't technical words, we search for a similarity by searching for synonyms of the ontological element in WordNet [25].
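As an illustration of the terminological comparison, the sketch below combines an identity test, a synonym lookup and a string-similarity metric. It is not the metric chosen in the paper: `difflib.SequenceMatcher` stands in for the metrics of [20,21,22,23], the small `SYNONYMS` table is a hypothetical stand-in for WordNet or UMLS lookups, and the 0.8 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table standing in for WordNet / UMLS lookups.
SYNONYMS = {
    "rent": {"rental", "lease"},
    "tenant": {"renter", "occupant"},
}


def normalize(name):
    return name.replace("_", " ").replace("-", " ").strip().lower()


def name_similarity(a, b):
    """String similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def terminological_match(node_name, concept_name, threshold=0.8):
    n, c = normalize(node_name), normalize(concept_name)
    if n == c:
        return True                                   # identical names
    if n in SYNONYMS.get(c, set()) or c in SYNONYMS.get(n, set()):
        return True                                   # synonym of the other name
    return name_similarity(n, c) >= threshold         # near-identical spelling
```

For instance, `terminological_match("Renter", "Tenant")` succeeds through the synonym table, while a misspelled node name such as `Adress` still matches `Address` through the similarity metric.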
At the structural level, Table 1 compares the DOMXML structure of the node with the OWL structure of the ontological element. There are several approaches to mapping an XML structure to OWL. Ferdinand et al. [26] and Yahia et al. [27] generate an ontology from XML based on mapping rules. Bohring and Auer [44] proposed mapping rules to map XML tags to OWL. These mapping rules show how to link a node to the appropriate ontological element, according to the structure of this node. In the DOMXML tree, each element node can have one of the following structures:

Presentation of the proposed approach
The annotation of the DOMXML tree is done by exploiting a multi-viewpoints ontology and consists of the following three phases.
XML representation of the apartments example (excerpt):
<Apartments>
  <Apartments Adress="bel air street Constantine city, 25000">
    <Tenant> Meriam </Tenant>
    <Location> Downtown </Location>
    <Rent> 500.00 </Rent>
  </Apartments>

Phase 1: Construction of the original DOMXML tree
The first phase consists of building the DOMXML tree for the resource page. We make n copies of this DOMXML tree, where n is the number of the viewpoints seen in the domain, plus one copy for the global annotation.

Phase 2: Generation of the global annotation
In this phase, we will build the annotated DOMXML subtree from the copy of the original DOMXML tree. We do this in two steps:

Step 1: Generation of the annotation
In this step, we annotate each node with a global ontological element, using the method described in the previous section. The distinctive characteristic of the multi-viewpoints ontology is that a global concept is defined by a set of global attributes and a set of local attributes, and a subset of the global attributes comprises the key attributes for this concept. So the annotation of an element node by a global concept depends on the annotation of some of the nodes attached to this element node (its text node, its attribute nodes and its children) by the key attributes of this concept.
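The dependence on key attributes can be sketched as follows. This is a simplified illustration under several assumptions: the `GLOBAL_KEYS` table and its contents are hypothetical, plain name equality replaces the terminological and structural comparison, and only attribute nodes and child elements are inspected (text nodes are ignored).

```python
from xml.dom import minidom

XML = """<Apartments>
  <Apartment Address="bel air street Constantine city, 25000">
    <Tenant>Meriam</Tenant><Rent>500.00</Rent>
  </Apartment>
  <Apartment><Tenant>Unknown</Tenant></Apartment>
</Apartments>"""

# Hypothetical global concepts with their key attributes.
GLOBAL_KEYS = {"Apartment": {"Address", "Rent"}}


def attached_names(element):
    """Names of the attribute nodes and child elements attached to `element`."""
    names = set(element.attributes.keys())
    names.update(c.tagName for c in element.childNodes
                 if c.nodeType == c.ELEMENT_NODE)
    return names


def annotate_global(element, annotations):
    """Annotate children first, then the element itself (bottom-up)."""
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            annotate_global(child, annotations)
    key_attrs = GLOBAL_KEYS.get(element.tagName)
    # The concept is assigned only if every key attribute is found.
    if key_attrs is not None and key_attrs <= attached_names(element):
        annotations.append((element, element.tagName))
    return annotations


root = minidom.parseString(XML).documentElement
annotations = annotate_global(root, [])
```

Here only the first `Apartment` element is annotated; the second one lacks the key attributes `Address` and `Rent` and therefore stays without a global annotation.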

Step 2: Deletion of irrelevant nodes
A node is relevant to the global viewpoint if we can find its annotation; otherwise the node is irrelevant.
Irrelevant nodes are removed if they are attribute nodes, text nodes or element nodes which do not have annotated children or annotated text nodes.
Irrelevant nodes which are element nodes with annotated children or annotated text nodes are kept in order to retain relevant nodes, because there are relationships between these children in the page. However, there are no roles between the ontological elements which annotate these children.
At the end of this phase, we have an annotated (labeled) DOMXML subtree in accordance with the global viewpoint.
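The deletion step can be sketched with the `removeChild` method mentioned earlier. The sketch makes some simplifying assumptions: annotation is modeled by a hypothetical set of annotated tag names, attribute nodes are not handled, and an irrelevant element is kept whenever any of its descendants is kept (a slight generalization of "has annotated children").

```python
from xml.dom import minidom

XML = """<Apartments>
  <Apartment>
    <Tenant>Meriam</Tenant>
    <Comment>call after 6 pm</Comment>
  </Apartment>
  <Footer><Note>page 1</Note></Footer>
</Apartments>"""

# Hypothetical result of the annotation step: tags that received a label.
ANNOTATED = {"Apartment", "Tenant"}


def prune(element, is_annotated):
    """Remove irrelevant nodes below `element`; return True if it is kept."""
    keeps_child = False
    for child in list(element.childNodes):   # copy: the list is modified below
        if child.nodeType == child.ELEMENT_NODE:
            if prune(child, is_annotated):
                keeps_child = True
            else:
                element.removeChild(child)
        elif not is_annotated(element):
            # Text of a non-annotated element is irrelevant.
            element.removeChild(child)
    # Irrelevant elements survive only as connectors between kept nodes.
    return is_annotated(element) or keeps_child


doc = minidom.parseString(XML)
root = doc.documentElement
root_kept = prune(root, lambda e: e.tagName in ANNOTATED)
remaining = [e.tagName for e in root.getElementsByTagName("*")]
```

In this example the non-annotated root `Apartments` is kept because it connects annotated nodes, while `Comment` and the whole `Footer` branch are removed.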

Phase 3: Generation of the local annotations
In this phase, for each viewpoint VPi, we build a subtree DOMVPi from a copy of the original DOMXML tree and the subtree labeled according to the global viewpoint. DOMVPi will be a DOMXML subtree labeled according to the viewpoint VPi. Initially, each DOMVPi is a copy of the original DOMXML tree. We generate the annotations level by level in each subtree DOMVPi. We begin with Lj as the first level and then move up one level at a time until we reach the root. This procedure involves the following steps:

Step 1: Generation of the annotations
Some nodes are already annotated in the global viewpoint, and we save the global annotations of these nodes. We annotate the other nodes with local ontological attributes, according to viewpoint VPi, by using the method described in Table 1.
Then, for each node Nk annotated by a global concept Gk, we see whether we can find a local concept specifying (subsumed by) this global concept which can be used to annotate Nk according to VPi. Note that the node Nk has some children which have been annotated (in phase 2) by global attributes (and other children which have been annotated in phase 3 with local attributes according to VPi). The children which are annotated by local attributes help in finding the local concepts which specify the concept Gk.
If we can't find a specific local concept, the global annotation of Nk will not be changed. Thus, each node at level Lj of the subtree DOMVPi is annotated either by a global ontological element or by a local ontological element according to VPi.
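The search for a local concept that specifies a global concept can be sketched as below. The `LOCAL_CONCEPTS` table, the viewpoint names and the attribute names are hypothetical, and the rule "keep the global annotation when zero or several local concepts qualify" is an assumption of this sketch rather than a statement of the paper's method.

```python
# Hypothetical local concepts per viewpoint:
# local concept -> (global concept it specifies, required local attributes).
LOCAL_CONCEPTS = {
    "Finance": {"SubsidizedApartment": ("Apartment", {"Subsidy"})},
    "Size": {"FamilyApartment": ("Apartment", {"Rooms", "Surface"})},
}


def specialize(global_concept, found_local_attrs, viewpoint):
    """Return the local concept annotating a node under `viewpoint`,
    or the unchanged global concept if no unique local concept is found."""
    candidates = [
        name
        for name, (parent, attrs) in LOCAL_CONCEPTS.get(viewpoint, {}).items()
        if parent == global_concept and attrs <= found_local_attrs
    ]
    if len(candidates) == 1:
        return candidates[0]
    return global_concept


# The same node, whose children carry these local attributes, is labeled
# differently depending on the viewpoint.
found = {"Subsidy", "Rooms"}
labels = {vp: specialize("Apartment", found, vp) for vp in ("Finance", "Size")}
```

With these example data, the node becomes a `SubsidizedApartment` under the Finance viewpoint and keeps its global annotation `Apartment` under the Size viewpoint, since `Surface` was not found among its children.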

Step 2: Propagation of the annotations
In the multi-viewpoints ontology, the local knowledge of each viewpoint is described in the local representation. The local representations are linked by bridge rules. Bridge rules allow communication to be established between the local representations of the local knowledge. The bridge rules can link two concepts of different viewpoints. This allows new local individuals (instances of concepts) to be inferred in different viewpoints, as well as the detection of contradictions at the level of individuals. Annotating an element on a resource page with an ontological concept means creating an individual instance of this concept.
In this section, we explain how to apply the bridge rules to the annotations obtained at level Lj of each subtree DOMVPi. This allows local annotations to be propagated between all the subtrees at level Lj. Propagation of the annotations will allow the inference of new local annotations and the detection of contradictions in all the subtrees at level Lj. For this, we generate a rule-based system composed of a knowledge base. The latter is composed of facts and production rules (inference rules):
Construction of the facts. The local annotations which have been obtained allow creating instances of some ontological concepts. These local annotations will be transformed to facts.
As an example, if the node NE is annotated by the concept C according to VPi, we can create an instance of C from NE as follows:
is_instance (NE, VPi: C)
Construction of the production rules. The production rules will be built from the bridge rules which are defined in the multi-viewpoints ontology. A bridge rule is a statement of one of the three following forms:
− An inclusion bridge rule, noted X:vpi→Y:vpj, means that an individual which is an instance of the source concept X under the vpi is also an instance of the target concept Y under the vpj.
− A bi-directional inclusion bridge rule, noted X:vpi↔Y:vpj, means that the sets of possible extensions of the two local concepts under different viewpoints are equal.
− A bi-directional exclusion bridge rule, noted X:vpi→⊥Y:vpj, means that the two local concepts X and Y cannot be at the same time representations of the same individual.
Each of the previous types of bridge rules will be transformed to a production rule as follows:
− An inclusion bridge rule will be transformed to a production rule that has:
Premise: X is an individual instance of the concept C1 according to the viewpoint VP1, and
Conclusion: X is an individual instance of the concept C2 according to the viewpoint VP2.
if is_instance (X, VP1:C1) then is_instance (X, VP2:C2)
− A bi-directional inclusion bridge rule will be transformed to two production rules. The first rule has Premise: X is an individual instance of the
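The propagation mechanism can be illustrated by a small forward-chaining sketch. Facts are encoded as tuples (individual, viewpoint, concept), mirroring is_instance (NE, VPi: C). The encoding of bridge rules as ("inclusion" | "equivalence" | "exclusion", source, target) and the treatment of an exclusion rule as a contradiction check are assumptions of this sketch, as are all function names.

```python
def to_production_rules(bridge_rules):
    """Split bridge rules into inclusion productions and exclusion checks."""
    inclusions, exclusions = [], []
    for kind, src, tgt in bridge_rules:
        if kind == "inclusion":
            inclusions.append((src, tgt))
        elif kind == "equivalence":          # bi-directional inclusion: two rules
            inclusions.append((src, tgt))
            inclusions.append((tgt, src))
        elif kind == "exclusion":
            exclusions.append((src, tgt))
        else:
            raise ValueError(f"unknown bridge rule kind: {kind}")
    return inclusions, exclusions


def propagate(facts, bridge_rules):
    """Apply the productions until no new fact appears, then report
    individuals that violate an exclusion rule."""
    inclusions, exclusions = to_production_rules(bridge_rules)
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for individual, vp, concept in list(facts):
            for src, tgt in inclusions:
                new_fact = (individual, *tgt)
                if src == (vp, concept) and new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    contradictions = {
        (individual, a, b)
        for a, b in exclusions
        for individual, vp, concept in facts
        if (vp, concept) == a and (individual, *b) in facts
    }
    return facts, contradictions


rules = [
    ("inclusion", ("VP1", "C1"), ("VP2", "C2")),
    ("equivalence", ("VP2", "C2"), ("VP3", "C3")),
    ("exclusion", ("VP1", "C1"), ("VP3", "C4")),
]
facts, contradictions = propagate({("NE", "VP1", "C1")}, rules)
```

Starting from the single fact that NE is an instance of C1 under VP1, the sketch infers that NE is also an instance of C2 under VP2 and of C3 under VP3; adding the fact ("NE", "VP3", "C4") to the input would make the exclusion rule report a contradiction.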

Formalisation and implementation
The implementation of the proposed approach consists of a system composed of a set of modules, each of which implements a phase or a step of the approach. To represent the multi-viewpoints ontology, we used MVP-OWL, proposed in [28] as an extension of OWL, so the multi-viewpoints ontology is saved in an MVP-OWL file. Both the OWL file and the MVP-OWL file have an XML/RDF structure, so they can be manipulated as XML files or RDF files. The DOMXML model (Document Object Model) allows the contents of an XML file to be accessed and manipulated in PHP, so the XML structure of the MVP-OWL file can be exploited through the DOMXML model in PHP. The resulting multi-viewpoints annotation is saved in RDF files.
The new constructs, used to represent multi-viewpoints ontological knowledge, are described in the following:
• vpowl:Viewpoint is used to represent viewpoints.
• vpowl:GlobalClass is used to represent global concepts.
• vpowl:LocalClass is used to represent local concepts.
• vpowl:GlobalProperty is used to represent global roles.
• vpowl:LocalProperty is used to represent local roles.
• vpowl:underViewpoint is used to specify the viewpoint of the local concept.
• vpowl:onViewpoint is used to specify the viewpoint of the local role and local attribute.
• vpowl:belongtoViewpoint is used to specify the viewpoint of the local individual.
• vpowl:InclusionBridge, vpowl:EquivalenceBridge and vpowl:ExclusionBridge are used to represent bridge rules.
These constructs are represented as subclasses of OWL classes. For example, vpowl:GlobalClass is represented as a subclass of owl:Class, as shown in Fig 2. In addition, the production rules can be formalized as SWRL rules [43]. These rules are expressed in terms of the multi-viewpoint ontology concepts (i.e. classes, properties, individuals) and description logic, using a stamping technique. Table 2 shows some examples.

Results and discussion
This section evaluates our proposed process, situates our work in the literature and compares it with existing work.
Semi-structured documents represent the hierarchical structure of data categories. Several approaches in this direction have been developed, for example, Alani et al. [34], Hignette et al. [35], Thiam [36] and Zhang et al. [11]. These approaches exploit the structure of the resource page. However, they are based on a classical ontology and cannot take different viewpoints into consideration.
The work on multi-viewpoints is divided between two directions. The first direction represents multiple viewpoints by using disjoint ontologies. Each viewpoint is represented by an ontology, and the ontologies are linked by bridge rules or context mappings to establish interconnections between them. These authors are interested in the mechanisms which allow linking available ontologies. The work in this direction includes Borgida and Serafini [37], the model C-language OWL [38], and the MPV model [39]. This approach requires the alignment and matching of ontologies.
Some work has proposed semantic annotation based on multi-ontologies. The aim of using multi-ontologies is to obtain a bigger ontology that contains a large set of concepts. This work includes, for example, Bhogal et al. [40], Gómez-Berbís et al. [13], Mena et al. [14] and Wang et al. [15]. These authors use one ontology to annotate some page elements, and they use another ontology to annotate other page elements because the first one is insufficient. Thus, the aim is not to generate a semantic annotation for each viewpoint, but rather to generate one semantic annotation that groups together all the knowledge of the different domains. In contrast, in our work, we aim to generate an annotation for each viewpoint with its own interpretation of the resource content based on its own knowledge.
Trichet, Aimé and Thovex [16] presented a platform that allows the generation of semantic annotations by taking multiple viewpoints and multiple users into consideration. The different viewpoints are defined through the use of multiple ontologies. These ontologies can be related to connected or unconnected domains. Users can use one or more ontologies to generate their annotations according to their own angle. In this work, one local annotation is manually generated for each user (viewpoint). This approach allows generating only heterogeneous annotations, without a consensual annotation.
In our previous work [42], we proposed an approach to multi-viewpoints semantic annotation based on annotation classes. An annotation class aims to link a page element to an ontological element according to a given viewpoint. The annotation class is manually generated by experts in the domain. The multi-viewpoints semantic annotation is the multi-instantiation of these classes. This approach requires unifying the structures of resource pages in the same domain. The modification of the structures is based on modifying the annotation processes.
Our current work differs from our earlier work in exploiting all the notions (global concepts, local concepts, subsumption between global and local concepts, …) which are defined in the multi-viewpoints ontology, in order to generate the consensual annotation and the heterogeneous annotations that specify the consensual annotation (i.e., the generation of each local annotation is not independent of the global annotation).
Additionally, our current work differs from our previous work in its use of the bridge rules that are defined in the multi-viewpoints ontology to develop a set of rules.
It should be noted that our approach improves execution time compared with the execution time consumed when a classical ontology is exploited. In fact, the local annotations according to the different viewpoints involved can be generated in parallel (simultaneously).
According to Figure 3, most of the time is consumed during the local annotations phase because, generally, the local hierarchies of the different viewpoints are longer than the global hierarchy in the multi-viewpoints ontology. In addition, the time taken by the propagation of annotations using bridge rules is negligible compared with the local annotations phase. On the other hand, the time of the global annotation phase is negligible compared with the second and third phases.
To evaluate our current approach, we applied it in several domains, including real estate agency, library, and medical domains. We had experts in the domains manually generate multi-viewpoints semantic annotations, and we compared the results of our approach with the experts' results, calculating the error rate in order to evaluate our results. The error rate is the number of erroneous annotations (compared to the experts' results) divided by the total number of annotations. We summarise our results in Table 3. It can be seen that the average error rates are very low. We therefore conclude that we have obtained good results with our approach.
In Table 4, we present a comparison of the average execution time between our approach and the existing works that exploit a classical ontology.
We also calculated the number of specific annotations that enriched the consensual annotation and the number of inferred local annotations in order to show the advantages of our approach with respect to its inference capacity. Table 5 summarises the results.
Table 5: Evaluation of the capacity of the proposed approach in the enrichment of annotations
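The error rate defined above can be computed as in the following sketch. The dictionaries mapping node identifiers to concepts and the example values are hypothetical; they do not reproduce the data behind Table 3.

```python
def error_rate(generated, expert):
    """Number of erroneous annotations divided by the total number of
    generated annotations; an annotation is erroneous when it differs
    from the expert's annotation of the same node."""
    if not generated:
        raise ValueError("no annotations to evaluate")
    errors = sum(1 for node, concept in generated.items()
                 if expert.get(node) != concept)
    return errors / len(generated)


# Hypothetical example: one of four generated annotations is wrong.
generated = {"n1": "Apartment", "n2": "Tenant", "n3": "Rent", "n4": "Studio"}
expert = {"n1": "Apartment", "n2": "Tenant", "n3": "Rent", "n4": "Apartment"}
rate = error_rate(generated, expert)
```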

Conclusion
We have proposed an approach to multi-viewpoints semantic annotation based on the DOMXML structure of a resource page and the multi-viewpoints ontology. This approach allows:
1. generating a consensual (global and common) annotation,
2. enriching the local annotations via an inference mechanism using a rule-based system,
3. generating a local annotation for each viewpoint via labeled substructures,
4. minimizing the number of comparisons between the page elements and the ontological elements by exploiting the structure of each page element and the rules mapping XML to OWL,
5. letting one page element be annotated by different specific concepts according to different viewpoints, and
6. letting a page element that doesn't have a local annotation according to a given viewpoint keep its global annotation.
For future work, we plan to introduce the notion of fuzziness in the search for the attributes of concepts in the page. We also propose developing an approach that allows the enrichment of a multi-viewpoints ontology by multi-viewpoints semantic annotation of resources, as well as an approach to reasoning on the multi-viewpoints annotations which allows selecting the appropriate results in a search engine according to a particular user viewpoint. Finally, we hope that our work on multi-viewpoints annotation can foster ideas for future investigations.