The Iris Dataset Revisited – a Partial Ordering Study

The well-known Iris data set has been studied applying partial ordering methodology. Previous studies applying supervised learning, e.g., neural networks (NN) and support-vector machines (SVM), perfectly distinguish between the three Iris subgroups, i.e., Iris Setosa, Iris Versicolour and Iris Virginica, in contrast to, e.g., K-means clustering, which separates the full Iris data set into only two clusters. In the present study, partial ordering methodology further discloses the difference between the classification methods. The partial ordering results appear to be in perfect agreement with the results of the K-means clustering, which means that the clear separation into the three Iris subsets obtained by NN and SVM is recognized neither by clustering nor by partial ordering methodology.


Introduction
One of the most frequently used test cases in machine learning studies is the Iris dataset [1,2]. This dataset includes 150 entries comprising 3 x 50 entries for three classes of iris plant, i.e., Iris Setosa (i-set), Iris Versicolour (i-ver) and Iris Virginica (i-vir), respectively. The plants are characterized by four indicators, i.e., Sepal length (SepalL), Sepal width (SepalW), Petal length (PetalL) and Petal width (PetalW), all in cm.
We find that supervised learning methods, such as neural networks and SVM, nicely classify the 3 classes Iris Setosa (i-set), Iris Versicolour (i-ver) and Iris Virginica (i-vir). Using 60% randomly chosen entries as training set, a neural network with one hidden layer with 3 nodes leads to only one misclassification among the remaining 40% of the entries serving as validation set. A similar result was obtained using an SVM approach with a radial kernel. Here we find two misclassifications of i-vir being classified as i-ver. In the case of K-means clustering a somewhat less clear picture develops.
In a recent chemistry-oriented study we investigated the potential use of partial ordering methodology as a tool for the classification of alkyl anilines [3]. In the present study we apply partial order methodology to examine how far the supervised classification of the three types of Iris plants can be recovered.
The mathematical theory of partial orders seems to have started in the late 19th century (cf. [4]); however, the main development toward an independent mathematical discipline, with methodological components from combinatorics, algebra and graph theory, can be attributed to the work of Birkhoff [5] and Hasse [6]. To our knowledge there were only few applications of the theory of partial order, e.g., in statistics [7][8][9], concepts of phase transfers [10] and early electronics [11]. In chemistry, important and theoretically attractive applications were found by Ruch [12]. Nevertheless, these concepts did not become popular, despite their theoretical beauty. With the publications of Randic [13] and Klein [14], and after the pioneering work of Halfon and Reggiani [15], the mathematical theory of partial order became a useful tool in environmental sciences. The background of this development is that environmental systems are complex, and decisions about environmental issues have to be based on a set of indicators describing the state of the environmental system. However, decisions based on a set of indicators are difficult, which led to the use of multicriteria decision aids (MCDA). Well-known MCDA methods are PROMETHEE [16], Electre [17] and partial order concepts [18]. Today partial order theory is being developed further, mainly in the field of multi-indicator systems, and has recently been applied in the social sciences too, e.g., [19]. The latest methodological development is mainly focused on how to include data uncertainty [20].

Data
The data for the current study are the well-known Iris dataset [1], comprising 3 x 50 entries for three classes of iris plant, i.e., Iris Setosa (i-set), Iris Versicolour (i-ver) and Iris Virginica (i-vir), respectively. Thus, the Iris dataset comprises 150 entries in total.

Basic concepts of partial order
Let X be a set of objects, labeled xi (i = 1,…,n), which can be, for example, chemical compounds. To define an order relation among them, the relation "≤" has to obey the following order axioms:
• reflexivity: each object can be compared with itself, i.e., x ≤ x
• antisymmetry: if x ≤ y and y ≤ x, then x = y
• transitivity: if x ≤ y and y ≤ z, then x ≤ z
A special realization of an order relation is given by eq. 1. Equation 1 expresses a mapping from object x to its representation by a tuple q with m components, as well as the order relation among objects defined by the simultaneous evaluation of all components of the tuple q.
• x → q(x) = (q1(x),…,qm(x)): the set of objects, X, is mapped onto the set of tuples {q} by assigning to each object x its tuple, based on the values of the considered indicators, i.e., a selection of certain properties of x
• x ≤ y :⇔ qj(x) ≤ qj(y) for all j = 1,…,m (1)
By eq. 1 the object set X is equipped with a partial order relation; such a set is called a partially ordered set, in brief poset, often denoted as (X, ≤). Two objects x and y are comparable if x ≤ y or y ≤ x. If neither x ≤ y nor y ≤ x, x is said to be incomparable with y, denoted as x ǁ y. The presence of an order relation can be described by the zeta matrix, whose entry ζ(x, y) equals 1 if x ≤ y and 0 otherwise. From transitivity it follows that in the case of x ≤ z and z ≤ y the relation x ≤ y can be deduced from the premise, i.e., a more economical description of the partial order can be given by the cover relation: y covers x if x < y and there is no element z for which x < z and z < y. Both relations, i.e., the order relation and the cover relation, can be expressed by adjacency-like matrices.
By application of the cover relation a graph is constructed. Based on the three axioms of partial order, this graph
• is directed (due to the order relation),
• is triangle-free (due to the cover relation), and
• does not contain cycles (due to antisymmetry).
By convention, originally introduced by Halfon and Reggiani [15], the graph is drawn
• with x located below y whenever x ≤ y,
• with a presentation as symmetric as possible, and
• with the objects arranged in levels.
For a detailed explanation see [18]. For examples, see sect. 3.
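The order relation of eq. 1, the zeta matrix and the cover relation can be illustrated with a minimal Python sketch. It is not the PyHasse implementation; the function names and the four hypothetical objects with two indicators are assumptions made purely for illustration, and objects with identical tuples are simply treated as equivalent.

```python
from itertools import product


def leq(qx, qy):
    """Eq. 1: qx <= qy iff every component of qx is <= the matching component of qy."""
    return all(a <= b for a, b in zip(qx, qy))


def lt(qx, qy):
    """Strict order: qx <= qy and the two tuples differ."""
    return leq(qx, qy) and qx != qy


def zeta_matrix(profiles):
    """zeta[i][j] = 1 if object i <= object j, else 0."""
    n = len(profiles)
    return [[1 if leq(profiles[i], profiles[j]) else 0 for j in range(n)]
            for i in range(n)]


def cover_matrix(profiles):
    """cover[i][j] = 1 if object j covers object i (i < j, no object strictly between)."""
    n = len(profiles)
    cover = [[0] * n for _ in range(n)]
    for i, j in product(range(n), repeat=2):
        if lt(profiles[i], profiles[j]) and not any(
            lt(profiles[i], profiles[k]) and lt(profiles[k], profiles[j])
            for k in range(n)
        ):
            cover[i][j] = 1
    return cover


# Hypothetical objects a, b, c, d described by two indicators each.
profiles = [(1, 1), (2, 1), (1, 2), (2, 2)]
zeta = zeta_matrix(profiles)
cover = cover_matrix(profiles)
```

In this toy example a = (1, 1) lies below all other objects, b and c are incomparable, and d covers b and c but not a, because b (or c) lies strictly between a and d. The cover matrix therefore contains exactly the edges that would be drawn in the Hasse diagram.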

Levels
A poset (X, ≤) can be partitioned into a family of subsets Xi ⸦ X:
(X, ≤) = ⊕ (Xi, ≤), i.e., Xi, Xj ⸦ X and Xi ∩ Xj = ∅ for i ≠ j (2)
The symbol ⊕ is a shorthand notation for the union of mutually non-intersecting subsets, whereby any pair x, y with x ∈ Xi and y ∈ Xj (with i ≠ j) implies x ǁ y.
The family of sets Xi can be ordered, i.e., Xi1 < Xi2 :⇔ there is an element in Xi1 which is covered by an element in Xi2. The sets Xi obeying the above relations are called levels.
The dissection of X into levels, i.e., into subsets obeying not only the order-theoretical characterization given above but also eq. 2, allows a geometrical representation by a so-called Hasse diagram, which can be seen as a rectangle filled first with the bottom level, then with the next level, and so on until the top level is reached. Importantly, the elements within each level can be permuted (this is possible because there is no order relation among the elements of a level), so that supervised subsets can be given specific locations within any level. If, for example, a Hasse diagram has four levels, then its representation by levels may look like that in Fig. 1.
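One way to obtain such a level structure can be sketched as follows. The sketch assumes that the level of an object is the length of the longest chain of objects strictly below it, which guarantees that no two objects of the same level are comparable; this is only one possible construction, and PyHasse may assign levels differently. The function names and the five hypothetical objects are assumptions for illustration.

```python
from functools import lru_cache


def lt(qx, qy):
    """Strict product order: every component of qx is <= that of qy, and qx != qy."""
    return all(a <= b for a, b in zip(qx, qy)) and qx != qy


def levels(profiles):
    """Group object indices into levels, listed from the bottom level to the top level."""
    n = len(profiles)

    @lru_cache(maxsize=None)
    def height(j):
        # Length of the longest chain strictly below object j (0 for minimal objects).
        below = [height(i) for i in range(n) if lt(profiles[i], profiles[j])]
        return 1 + max(below) if below else 0

    grouped = {}
    for j in range(n):
        grouped.setdefault(height(j), []).append(j)
    return [grouped[h] for h in sorted(grouped)]


# Hypothetical objects described by two indicators each.
profiles = [(1, 1), (2, 1), (1, 2), (2, 2), (3, 0)]
level_sets = levels(profiles)
```

Here the objects 0 and 4 form the bottom level, objects 1 and 2 the middle level and object 3 the top level. Within each level the order of the indices is arbitrary, which corresponds to the freedom to permute elements within a level.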

Dominance and separability
A partially ordered set may also be partitioned in another manner, i.e., not following the level construction but following a supervised classification, e.g., through an aggregation process [21]. Let X be partitioned into a family of sets Xi, i = 1,…,r, where the sets Xi are externally defined. Then the natural question is how far the family of sets {Xi} can be partially ordered. This question was analyzed in depth by Restrepo and Bruggemann [22]. Here, however, we follow a different concept.
Assume that the family of subsets Xi is defined by some cluster analysis (K-means, hierarchical models). What can at best be expected from an analysis by partial order? To answer this question, the partial order concepts of linear sum and disjoint union of sets must be defined. Let Xi, Xj be subsets of X. Then:
• Xi and Xj form a linear sum, denoted as Xi ⊕≤ Xj (in order to differentiate this symbol from that in eq. 2, we add "≤" as a subscript), when x ≤ y for all x ∈ Xi and all y ∈ Xj.
• Xi and Xj form a complete disjoint union of sets, denoted as Xi ∪≤ Xj (in order to differentiate this symbol from the union symbol in set theory, we add "≤" as a subscript), when x ǁ y for all x ∈ Xi and all y ∈ Xj.
A pretty clear classification by partial order theory can be obtained when the poset is either a linear sum, cf. [21], or a complete disjoint union of sets [23]. Such a classification due to partial order concepts can be visualized as shown in Fig. 2. In Fig. 2A, x ≥ y for any x ∈ Xik and y ∈ Xik+1. A partially ordered set representable by Fig. 2A is called a linear sum. In contrast to the linear sum construction, Fig. 2B shows the disjoint union of the sets Xik. Here the following is valid: x ǁ y for all x ∈ Xik and y ∈ Xik+s.
Structures like those shown in Fig. 2 cannot be expected within a real data set. The question is how far approximations corresponding to the two archetypes of Fig. 2 can be found. To this end two matrices, the dominance matrix and the separability matrix, are introduced.
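A minimal sketch of how such matrices may be computed is given below. It assumes that the entry Dom(i, j) is the fraction of pairs (x, y), x ∈ Xi and y ∈ Xj, with x > y, and that Sep(i, j) is the fraction of pairs with x ǁ y; the normalisation by |Xi|·|Xj| and the treatment of identical tuples (counted neither as dominance nor as incomparability) are assumptions of this sketch and may differ from the PyHasse module ddssimpl. The function names and the data are hypothetical.

```python
def leq(qx, qy):
    """Product order of eq. 1 on indicator tuples."""
    return all(a <= b for a, b in zip(qx, qy))


def dominance(A, B):
    """Fraction of pairs (x, y), x in A and y in B, with x strictly above y."""
    hits = sum(1 for x in A for y in B if leq(y, x) and x != y)
    return hits / (len(A) * len(B))


def separability(A, B):
    """Fraction of pairs (x, y), x in A and y in B, that are incomparable."""
    hits = sum(1 for x in A for y in B if not leq(x, y) and not leq(y, x))
    return hits / (len(A) * len(B))


# Hypothetical subsets, each object described by two indicators.
low = [(1, 1), (2, 1)]
high = [(3, 3), (4, 2)]
side = [(0, 5), (0, 6)]

dom_high_low = dominance(high, low)      # 1.0: archetype of Fig. 2A (linear sum)
dom_low_high = dominance(low, high)      # 0.0
sep_low_side = separability(low, side)   # 1.0: archetype of Fig. 2B (disjoint union)
sep_low_high = separability(low, high)   # 0.0
```

Under these assumptions a dominance value of 1 corresponds to a linear sum and a separability value of 1 to a complete disjoint union; values in between indicate how closely a real data set approximates either archetype.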

Software
The applied software is PyHasse, programmed in the interpreted language Python and named in honor of Helmut Hasse, who was one of the main mathematicians investigating partial order. The complete PyHasse software package is available from Dr. Bruggemann (brg_home@web.de). A limited version can be accessed at www.pyhasse.org (for further details, see [24,25]).

K-means clustering
K-means clustering and hierarchical clustering (HCA) apparently lead to less clear-cut pictures. Thus, we find that K-means clustering virtually separates i-set from i-vir and i-ver, whereas the separation of i-vir and i-ver is significantly less pronounced (Fig. 3A). This is in agreement with the plots shown in [2]. A further analysis including the i-ver and i-vir sets leads to some separation between the two sets, although a significant overlap is seen (Fig. 3B).
The explanation of the somewhat surprising K-means result can be found in the data shown in Table 1. Table 1 discloses the i-set/i-ver, i-set/i-vir and i-ver/i-vir ratios for the 50 samples in the three Iris sets set1, set2 and set3. It is immediately noted that in the cases of i-set/i-ver and i-set/i-vir the ratios are significantly different from 1, whereas in the case of i-ver/i-vir the ratio values are in most cases rather close to 1, explaining the lack of separation between Iris Versicolour and Iris Virginica as displayed in the K-means clustering.
A further discussion of K-means clustering, which is a well-established method, is not in the focus of the present paper.

The Hasse diagram – visual inspection
The Hasse diagram of the complete Iris dataset is found in Fig. 4.
Inspecting Fig. 4, a substructure that mimics the three Iris families can obviously not be recognized. There is no clear separation in the sense of eq. 4 that can be detected just visually.
The tools outlined in sect. 2.4 may be helpful to find a structure in the Hasse diagram when the classification into the three Iris subsets is used. Hence, we sharpen our question to: Given the classification into the three Iris subsets, what posetic relations among these three subsets can be found? As visual techniques fail, numerical devices such as the dominance and separability matrices are necessary.

Dominances and separabilities
Applying the appropriate PyHasse modules, mainly ddssimpl and the new module ddssimpl_batch (for ddssimpl, cf. [22]), the dominance and separability matrices for the three subsets were calculated. The separability matrix shows a clear separation of set1 from set2 and set3, respectively. However, between set2 and set3 a considerable overlap can be noted, expressed by the relatively low value of the off-diagonal element Sep(2,3) = 0.368, which does not justify a separation in the sense of Fig. 3B. Correspondingly, the dominance matrix shows a value > 0.5 for the entry Dom(3,2). This result is in perfect agreement with the K-means results discussed above.

Separability matrix as a means to visualize the classification
Taking just the two sets set1 (i-set) and set2 (i-ver), the Hasse diagram is shown in Fig. 5.
In contrast to the pretty clear separation between set1 and set2 (Fig. 5A) on the one hand, and set1 and set3 (Fig. 5B) on the other hand, the Hasse diagram based on set2 and set3 only shows a structure which could be visualized schematically as shown in Fig. 6.
The blue part is located below the orange part. Hence, there are many order relations between x ∈ set3 and y ∈ set2 where x > y in the order-theoretical sense. This explains pictorially that dom(3,2) >> dom(2,3) (vide supra). As the blue part is located on the left side and the orange part on the right side of the schematic representation of a Hasse diagram, there are also many relations with x ǁ y. Order theory does not support a separation between set2 and set3. An enhancement with respect to dominance relations may be given by the two little rectangles in the middle of the scheme (Fig. 6), which we call a "nose". Note that from a geometrical point of view, the elements of the "nose" could be arranged so that formally the two triangles are not perturbed. However, the "nose" indicates the count of elements which lead to the irregular structure shown in Fig. 6. The real Hasse diagram of set2 and set3 is shown in Fig. 7, where the above schematic structure (Fig. 6) is easily recognized.
It remains to discuss two points: 1. What is the effect of the elements within the "nose" (section 3.4)? 2. What can be said about an internal partitioning of the sets i-set, i-ver and i-vir (section 3.5)?

Effect of the elements of the "nose"
In this section we investigate how far elements of a specific geometric configuration, here the "nose", can influence the values of the dominance and separability matrices. It is to be clarified whether or not such a geometrical configuration destabilizes the conclusions based on the two concepts, i.e., dominance and separability. There will be 7 runs: (a) keeping i-vir constant and eliminating, one after another, the elements 27, 28 and 37 to study the effect of the "nose" elements of i-ver, and (b) keeping i-ver constant and eliminating, one after another, the elements 12, 14, 22 and 27.
The results are shown in Table 2. All in all, the entries of the dominance and separability matrices are only slightly changed. The elements of the "nose" do not contribute much to the general dominance behavior, i.e., the presence of the "nose" does not change the impression that i-vir dominates the set i-ver to some degree. The separability values are reduced in comparison to the standard, showing that the elements of the "nose" contribute somewhat to the incomparabilities between the elements of i-vir and i-ver. When all elements of the "nose" are eliminated (9th row in Table 2), the dominance of i-vir over i-ver is slightly enhanced, that of i-ver over i-vir is reduced, and the separation is reduced. The comparison with the effect of the elements of the top level, namely i-vir10, i-vir18, i-vir19, i-vir32 and i-vir36, shows that the position of the elements to be eliminated within the level-order system, as schematically shown in Fig. 6, does not play a strong role.
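Elimination runs of this kind can be sketched in a few lines of Python. The sketch assumes that each run removes a single element from one set while the other set is kept constant, and that dominance and separability are the fractions of strictly ordered and of incomparable pairs, respectively; both points are assumptions of this illustration, and the data, the function names and the element indices are hypothetical rather than taken from Table 2.

```python
def leq(qx, qy):
    """Product order of eq. 1 on indicator tuples."""
    return all(a <= b for a, b in zip(qx, qy))


def dominance(A, B):
    """Fraction of pairs (x, y), x in A and y in B, with x strictly above y."""
    return sum(1 for x in A for y in B if leq(y, x) and x != y) / (len(A) * len(B))


def separability(A, B):
    """Fraction of pairs (x, y), x in A and y in B, that are incomparable."""
    return sum(1 for x in A for y in B
               if not leq(x, y) and not leq(y, x)) / (len(A) * len(B))


def summary(A, B):
    """Return (Dom(A, B), Dom(B, A), Sep(A, B)) for two subsets."""
    return dominance(A, B), dominance(B, A), separability(A, B)


def elimination_runs(A, B, candidates):
    """For each candidate index k, drop A[k], keep B constant and recompute the summary."""
    return {k: summary([x for i, x in enumerate(A) if i != k], B) for k in candidates}


# Hypothetical upper and lower subsets with two indicators per object.
upper = [(3, 3), (4, 4), (2, 5)]
lower = [(1, 1), (2, 2), (3, 1)]

baseline = summary(upper, lower)
runs = elimination_runs(upper, lower, candidates=[0, 2])
```

In this toy case the element (2, 5) is the only source of incomparability, so removing it raises the dominance of the upper over the lower set to 1 and lowers the separability to 0, whereas removing (3, 3) changes both values only slightly. Comparing each run with the baseline in this way mirrors the comparison with the standard row of Table 2.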

Internal structures
So far it has been demonstrated that posetic relations can roughly, but not convincingly, verify the supervised classification. A further question remains from a partial-order point of view: do internal separations occur? Are there subsets of set1, set2 and set3, respectively, that may be identified by partial order theory?
We answer these questions by an analysis of the i-set (set1). In Fig. 8 the Hasse diagram of set1 is shown.
By simple visual inspection it is clear that there are two subsets which dominate each other to a striking degree. This situation can be schematically illustrated by Fig. 9.
In order not to rely solely on the visual impression, the dominance and separability approach (cf. sect. 3.2) is brought into play.

Dominances and separabilities
The (relative) dominance matrix was calculated. Its values imply that there is a vague dominance of subset1 over subset2 (which is a singleton comprising only i-set8) and a slightly clearer dominance of subset1 over subset3. It is further disclosed that i-set8 is located higher than all elements of subset3, i.e., it completely dominates subset3. Here it becomes clear that the incomparability among elements of subset1 and subset2 is the main result, whereas the role of incomparabilities among elements of subset1 and those of subset3 is relatively low. Thus, a model for a classification of the i-set series is the Hasse diagram found in Fig. 10, obtained using the PyHasse module ddssimpl1.py.
It can be seen that, due to the values of the dominance and separability matrices, the differentiation between subset1 and subset3 is more pronounced than a differentiation within subset1.
Just by a simple visual inspection of Fig. 7 it is clear that a differentiation between i-vir and i-ver does not appear meaningful.

Conclusions and outlook
The results of partial order theory suggest a pictorial representation as shown in Fig. 11. The result summarized in Fig. 11 is in perfect agreement with the results of the K-means clustering (cf. Fig. 1). Hence, the clear separation into the three sets disclosed through supervised learning by NN and SVM can be verified neither by clustering nor by partial ordering methodology.
The slight dominance of set3 over set2 is not indicated; it is discussed in more detail in Sect. 3. Correspondingly, the internal structure within set1 is not represented.
Section 3.4 shows that in the case of so many elements in each set (i-set, i-ver, i-vir), the effect of elements in the "nose" seems to play a minor role. These elements contribute slightly to incomparabilities, but do not change the dominance values. The reason is that these elements are mainly "in-between" elements. Elements above the "nose" (i-vir) are still connected with elements below the "nose" (i-ver), independently of whether elements in the "nose" are eliminated or not. Section 3.5 shows, based on visual inspection of the corresponding Hasse diagrams, that at most the i-set could be further partitioned, although a complete dominance cannot be obtained.
The interplay between ranking studies and classification is emphasized. Hence, researchers interested in ranking may additionally be interested in classification. The present study suggests that partial ordering may be helpful also in this second aspect, which can be convenient for any user. However, more experience and work are needed to elucidate how, e.g., NN/SVM and partial order methodology can supplement each other. Thus, in order to further elucidate the use of partial ordering for classification studies, it appears appropriate to investigate the stability of the dominance and separability matrices with respect to data uncertainty.
The application of dominance and separability matrices implies a further critical point: The central point in many classification algorithms is the metric information inherent in the data matrix. By adoption of the concept of dominance and separability matrices, however, metric information is lost.
A possible way to recover the metric information to some extent would be to consider not just the poset of observed flowers, but to embed it into a larger poset built upon the set of all profiles (combinations of values) obtained by discretizing the input variables in a sufficiently fine way (see, for example, the papers by Fattore and Maggino [26] and Fattore [19]). Based on this construction, the mutual ranking probability matrix of the observed profiles might lead to a better separation of classes. Thus, instead of the dominance and separability matrices, a matrix of mutual ranking probabilities could be applied to decide whether or not a linear sum or a complete disjoint union could be stated. However, enlarging the poset unequivocally leads to more difficult computations, and the poset of all profiles loses the structural information that is evident if an object-oriented poset is applied, as in the case studied here. Therefore, we postpone this kind of analysis to a further publication.