My modeling guidelines

A customer's question gave me the idea of writing down the influences on my modeling strategies and my own attitudes and guidelines.

These are my rules based on my current experience at the end of 2023. They are not universally valid but document the status of my development.

As few concepts and relations as possible

For me, this is one of the main advantages over the relational models that I have come to know in practice. Many column names in relational tables repeat concepts that also occur in other tables, but with a different name, thus suggesting different meanings. This makes it more difficult to learn the structure and apply it precisely.

Because I develop RDF models from the predicates, I automatically make sure that a predicate is only represented once. And the predicates correspond to the column names in SQL tables or relations.

The basic motivation for this rule lies in the limited memory of humans (or of me?): the fewer concepts and predicates an object is modeled with, the easier and faster it is to understand and correctly apply such a model.

As many classes and relations as necessary

A foreseeable addition :)

RDF offers something like namespaces that I can use to map the identically named but different concepts that appear in every project in my experience.

My standard example is the term "customer": management, accounting, sales and support usually have different definitions for "customer". This is often clearest in the contrast between support and accounting: for support, the "customer" is anyone who opens a support ticket, while for accounting, the "customer" is often the person who orders the payment.

These are obviously very different concepts (people or parts of the organization), and I have heard many an argument about what "really" constitutes a customer.

In RDF, for example, I can solve this by creating my own ontologies for the different areas and then working with terms such as accounting:customer and support:customer.

If this is how it is written on the flip chart, I get no contradictions in a modeling workshop, and the model is easier for those present to understand and agree with.
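A minimal sketch of the idea in plain Python, without any RDF library: the two prefixes expand the same local name "customer" into two different identifiers. The base IRIs under example.org and the helper expand are assumptions made up for this illustration.

```python
# Hypothetical base IRIs for the two area ontologies (not from any real project).
PREFIXES = {
    "accounting": "https://example.org/accounting#",
    "support": "https://example.org/support#",
}


def expand(prefixed_name, prefixes):
    """Expand a prefixed name such as 'support:customer' into a full IRI."""
    prefix, local_name = prefixed_name.split(":", 1)
    return prefixes[prefix] + local_name


accounting_customer = expand("accounting:customer", PREFIXES)
support_customer = expand("support:customer", PREFIXES)

# Same local name, two distinct identifiers, hence two distinct concepts.
print(accounting_customer)  # https://example.org/accounting#customer
print(support_customer)     # https://example.org/support#customer
print(accounting_customer != support_customer)  # True
```

Because the two terms are different identifiers, each area can attach its own definition without contradicting the other.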

Performance

Of course, there are also the hopefully few concepts or predicates that are only created for performance in a special triplestore, or to make the SPARQL queries more readable.

These depend very much on the subject area and on the triplestore or system architecture used. There can hardly be any general rules here; the decision depends on the situation.

Relations have the same importance as classes

I have just seen another interface that puts classes in the foreground. This can tempt you to create separate relations for individual classes, e.g. :label for one and :name for another. Avoiding this is, in my view, one of the advantages of RDF and OWL that should always be used: defining the relations just as thoroughly and precisely as the classes, so that everyone knows exactly what is meant when :label or :name is used. You could use both relations and, depending on the use case, perhaps see :label as a relation that points to a printable representation and :name as a string with the label. What matters is to fix this once and for all in the ontology, and not to name it differently for different classes.

Therefore, when modeling, I always start with equally simple lists of classes and relations. And my goal is to keep both lists as precise and complete, but also as short as possible.

Event semantics

Donald Davidson was the first to work out in "The Logical Form of Action Sentences" from 1967 that verbs are translated into formal logic as events or occurrences, or in other words that actions are objects, or in the language of RDF that actions are subjects and not predicates.

In the RDF community, this is rarely discussed explicitly, but often applied. In the W3C's "Organization Ontology", for example, the "Membership n-ary relationship" is introduced in section 2.2, without reference to Davidson or event semantics.

In my view, this treatment of membership as a node or subject instead of a link or predicate is presented here as something special instead of a normal case, e.g. by first introducing the simplified representation as a predicate org:headOf, and also later pointing out that the complete representation as a subject via org:Membership "[...] can be a little less convenient to query and explore [....]".

The "thing" org:Membership is also sufficient to express a sentence such as "Mr. Meier started the job on x.x.xxxx", because the start and end times are attached to the org:Membership. In many cases it therefore covers all three perspectives: the status description "is an employee of" as well as the verbs "has joined" (hire) and "has resigned" (resign). Depending on the use case, of course, separate events are needed for hiring and terminating in order to record things like reasons.

My view is rather that the representation of an action as an event, as a node instead of a link, is the normal case, and any representation as a simple link drops a lot of information. For me, the representation as a link is then either another application of my first principle (as few concepts as possible) or a performance decision that partly depends on the triplestore used. Only if it is absolutely certain that no further information will be needed in the foreseeable future can a simple link be used instead of a node, and this should be examined individually in each case. My first version of a new ontology always models verbs as events/things/nodes.
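To make the difference between node and link concrete, here is a small sketch in plain Python, with triples as (subject, predicate, object) tuples instead of a real triplestore. The org: and time: terms come from the W3C Organization Ontology and OWL-Time; all ex: names, the date and the helper functions are invented for this illustration.

```python
# Triples as plain (subject, predicate, object) tuples; no RDF library needed.
# The ex: names and the date are made up for illustration.
GRAPH = {
    ("ex:membership1", "rdf:type", "org:Membership"),
    ("ex:membership1", "org:member", "ex:meier"),
    ("ex:membership1", "org:organization", "ex:acme"),
    ("ex:membership1", "org:memberDuring", "ex:interval1"),
    ("ex:interval1", "time:hasBeginning", "ex:start1"),
    ("ex:start1", "time:inXSDDate", "2020-03-01"),
}


def objects(graph, subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def start_dates(graph, person):
    """Follow membership -> memberDuring -> hasBeginning -> inXSDDate."""
    dates = set()
    memberships = {s for s, p, o in graph if p == "org:member" and o == person}
    for membership in memberships:
        for interval in objects(graph, membership, "org:memberDuring"):
            for instant in objects(graph, interval, "time:hasBeginning"):
                dates |= objects(graph, instant, "time:inXSDDate")
    return dates


print(start_dates(GRAPH, "ex:meier"))  # {'2020-03-01'}
```

A single link such as ("ex:meier", "org:memberOf", "ex:acme") would have no place to hang the start date, which is the information that gets lost with the simple representation.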

Since the simple relation can also be generated from the more complex representation as a node, e.g. using SPARQL, it may be possible to keep both representations in the graph. Here, however, you must always define what the simple relation means: often it means the current situation, i.e. that "right now" X is employed by Y.

Then you must check whether the simplified link must be deleted every time there is a change in the employment relationship, i.e. in org:Membership. In other words, you must check whether the time interval that is attached to the org:Membership via org:memberDuring was closed with a time:hasEnd and thus the membership was ended.
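The derivation of the simple link can be sketched as follows, again in plain Python with tuples as triples rather than SPARQL. It assumes that "current" means "the interval has no time:hasEnd", and that a membership without any interval also counts as current; both are choices for this sketch, and the ex: names and the function current_member_of are hypothetical.

```python
GRAPH = {
    # Ended membership: its interval has a time:hasEnd.
    ("ex:m1", "org:member", "ex:meier"),
    ("ex:m1", "org:organization", "ex:oldcorp"),
    ("ex:m1", "org:memberDuring", "ex:i1"),
    ("ex:i1", "time:hasEnd", "ex:end1"),
    # Open membership: no time:hasEnd on the interval.
    ("ex:m2", "org:member", "ex:meier"),
    ("ex:m2", "org:organization", "ex:acme"),
    ("ex:m2", "org:memberDuring", "ex:i2"),
}


def objects(graph, subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def current_member_of(graph):
    """Derive org:memberOf links for memberships whose interval is not closed."""
    derived = set()
    for membership, predicate, person in graph:
        if predicate != "org:member":
            continue
        intervals = objects(graph, membership, "org:memberDuring")
        if any(objects(graph, interval, "time:hasEnd") for interval in intervals):
            continue  # membership has ended, so no simple link
        for organization in objects(graph, membership, "org:organization"):
            derived.add((person, "org:memberOf", organization))
    return derived


print(current_member_of(GRAPH))  # {('ex:meier', 'org:memberOf', 'ex:acme')}
```

Recomputing the derived links after every change to a membership avoids having to delete stale links by hand.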

The term "reification" is also often used here, but this is problematic in the RDF environment, as RDF has a syntax form called "reification" that expresses something completely different. That's why I prefer not to use this term, but instead speak of event semantics.

I personally have all such classes inherit from a class for "event". In my view, this helps as documentation for people who were not present during the development of this ontology.

Relationships with more than 2 participants

These cannot be modeled directly in RDF. Here you have to use a class that represents the multivalued relation. An example is :between, which has to be modeled as :Between.

Similarly to verbs that become classes, I document this by modeling such classes as subclasses of something like :Relation.
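As a rough illustration in plain Python, a three-place "between" becomes one node of class :Between with one link per participant. Only :Between and :Relation come from the text above; the role names :located, :firstBound and :secondBound, the place names and the helper between_relation are assumptions for this sketch.

```python
def between_relation(node, located, first_bound, second_bound):
    """Return the triples for one :Between node with three participants."""
    return [
        (node, "rdf:type", ":Between"),
        (node, ":located", located),           # hypothetical role name
        (node, ":firstBound", first_bound),    # hypothetical role name
        (node, ":secondBound", second_bound),  # hypothetical role name
    ]


# Documenting the class as a relation, analogous to events for verbs.
SCHEMA = [(":Between", "rdfs:subClassOf", ":Relation")]

triples = SCHEMA + between_relation(
    ":between1", ":Kassel", ":Frankfurt", ":Hannover"
)

# All participants hang off the same node, one triple per role.
participants = [o for s, p, o in triples if s == ":between1" and p != "rdf:type"]
print(participants)  # [':Kassel', ':Frankfurt', ':Hannover']
```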

Mapping reality, not syntactic features

I noticed this again when I watched a video from Semantic Arts, Inc about Temporal Relations in their top-level ontology gist.

Here, the speaker uses relations such as ":isSubjectOfRel" and ":isObjectOfRel", i.e. syntactic features, for exactly the kind of models explained above (actions or events), as in his example ":_Ownership_JKL :isSubjectOfRel :_Person_Joe; :isObjectOfRel :_Vehicle_TeslaXYZ."

In my opinion, a syntactic designation is used here that is simply wrong. "Joe owns a TeslaXYZ" versus "The TeslaXYZ is owned by Joe" shows, apart from the weird wording in the second example, why subject and object are not good names for the relations here: both are easily interchangeable.

Conceptual semantics

I see conceptual relations as more appropriate here: :agent or :actor instead of :subject for the agent, and if necessary also :initiator to indicate who initiated an action. Then definitely :object as the object to which the action refers, because the :patient (the sufferer) used by John Sowa or SUMO, for example, was rather unclear or strange to most stakeholders. Then there can be many more attributes: :recipient, if an object is passed or forwarded in an action; :material or :resource, if something is consumed in or by the action (like oxygen when breathing); and so on.

I am certainly influenced here by early approaches, such as Robert Schank's Conceptual Dependency Theory or John Sowa's Conceptual Graphs as well as the Suggested Upper Merged Ontology (SUMO, more on this later).

Conceptual Dependency demonstrates very nicely both what is feasible and how far you can take simplification in the sense of "few concepts for many words" and how counterproductive this attempt is if you take it too far. Good work has been done here, and even today I remember discussions about relations and basic actions from this area and use this to find the simplest possible concepts and relations or, as shown above, to reuse them directly.

Today's pragmatic task of creating a large enterprise knowledge graph basically requires exactly what Schank tried to do back then: as few, precise concepts as possible with which as much as possible can be expressed, whether seen from the perspective of management or employees in production, suppliers or customers.

Ontologies I take inspiration from

gist - the "minimalist upper ontology for the enterprise" by Semantic Arts, Inc is a well-considered, relatively small, and well-documented ontology with a business focus. There are a number of videos on YouTube in which you can follow the development and learn a lot about the structure and further development of an ontology.

Another source of ideas is the Suggested Upper Merged Ontology (SUMO)

It is written not in RDF but in the more expressive SUO-KIF (Knowledge Interchange Format), and it contains many useful concepts, e.g. case relations such as the :agent and :initiator used above and additional ones such as :instrument or :resource.
SUMO and its concepts are the result of a long coordination process within the framework of the Standard Upper Ontology Working Group of the IEEE. Many dedicated experts have worked on this, and it's worth checking out, even if not every feature is transferable to RDF.

Literature

A few books that I can recommend to anyone in the field of knowledge graphs:

Michael Uschold: Demystifying OWL for the Enterprise, Basel, 2018
The best introduction to the Web Ontology Language (OWL) that I know of.

Panos Alexopoulos: Semantic Modeling for Data, Sebastopol, CA:O'Reilly, 2020
Exposes pitfalls and dilemmas in designing an ontology. Read!

Adam Pease: Ontology: A Practical Guide, Angwin:Articulate Software Press, 2011
Documents the application and modeling of an ontology using the example of SUMO.