A Tale of Two Schemas:
SHACL vs. OWL

Victor Charpenay

Outline

  1. A Schema for What?
  2. A Schema for Validation
  3. A Schema for Inference

A Schema for What?

Relational databases always rely on a schema to guarantee consistency.

A relational schema defines:

  • relation attributes
  • functional dependencies
  • integrity constraints

CREATE TABLE Person(
  taxID VARCHAR(25) NOT NULL,
  familyName VARCHAR(50) NOT NULL,
  givenName VARCHAR(50) NOT NULL,
  birthDate DATE,
  PRIMARY KEY (taxID)
);

INSERT INTO Person
VALUES
(NULL, 'Ketchum', 'Ash', NULL);

Error: taxID cannot be NULL.

Object-oriented models also rely on a schema to verify the executability of programs at compile time.

An object-oriented schema defines:

  • a class hierarchy
  • frames (attributes, methods)

class Person extends Thing {
  final String taxID;
  final String familyName;
  final String givenName;
  final Date birthDate;

  public Person(String taxID) { … }
}

Error: familyName is never assigned.

In contrast, schema.org is an RDF schema that cannot generate errors.

Yet, schema.org also defines:

  • a class hierarchy
  • properties (~attributes)

@prefix rdfs: <http://www.w3.org/…> .
@prefix : <http://schema.org/> .

:Person rdfs:subClassOf :Thing .
# (skipping taxID, etc...)
:birthDate
  :domainIncludes :Person ;
  :rangeIncludes :Date .

Schema.org borrows concepts from RDF Schema (RDFS), a W3C standard.

<ash>
  a :Person;
  # taxID: none
  :familyName "Ketchum" ;
  :givenName "Ash" .

OK.

To ensure wide adoption, schema.org imposes no constraint on the structure of RDF triples that use its classes and properties.

Semantic Web practitioners favor the terms 'vocabulary' or 'ontology' over 'schema' to refer to such a model.

For example, states identify citizens with tax IDs, but not all persons are citizens of some state, least of all fictional characters.

Some real persons also have multiple tax IDs.

It has been a design choice to keep schema.org generic, i.e.:

  • easy to extend but
  • hard to validate

What would a validation schema for RDF look like?

A Schema for Validation

The Shapes Constraint Language (SHACL) was designed to declare constraints on classes and properties.

ex:PersonShape
  a sh:NodeShape ;
  sh:targetClass :Person ;
  sh:property [
    sh:path :taxID ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] .

<ash>
  a :Person;
  # taxID: none
  :familyName "Ketchum" ;
  :givenName "Ash" .

Error: taxID has fewer than one value.

Try the example on the SHACL playground.
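
The cardinality check behind the error above can be sketched in plain Python. This is only an illustration of what sh:minCount and sh:maxCount mean, not how a SHACL engine is implemented: the tuple-based triple set and the helper name cardinality_violations are assumptions made for this example.

```python
# Triples are (subject, predicate, object) tuples mirroring the slide's example.
RDF_TYPE = "rdf:type"

data = {
    ("ash", RDF_TYPE, ":Person"),
    ("ash", ":familyName", "Ketchum"),
    ("ash", ":givenName", "Ash"),
}

def cardinality_violations(triples, target_class, path, min_count, max_count):
    """Return (node, path, count) for focus nodes whose value count is out of bounds."""
    # Focus nodes: everything explicitly typed with the target class.
    focus_nodes = {s for (s, p, o) in triples if p == RDF_TYPE and o == target_class}
    violations = []
    for node in sorted(focus_nodes):
        # Value nodes: objects reached from the focus node through the path.
        values = {o for (s, p, o) in triples if s == node and p == path}
        if not (min_count <= len(values) <= max_count):
            violations.append((node, path, len(values)))
    return violations

print(cardinality_violations(data, ":Person", ":taxID", 1, 1))
```

With the data above, the only reported node is ash, which has zero values for :taxID.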

On the Semantic Web, everything is in RDF, even SHACL validation reports.

[]
  a sh:ValidationResult ;
  sh:resultSeverity sh:Violation ;
  sh:focusNode <ash> ;
  sh:resultPath :taxID .

A SHACL schema is composed of node shapes.

A node shape applies to focus nodes, which are either:

  • a single target node or
  • instances of a target class

ex:PersonShape
  a sh:NodeShape ;
  sh:targetClass :Person ;
  sh:property [
    sh:path :taxID ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] .

A node shape can include one or more property shapes.

A property shape defines a path from the focus node to value nodes.

ex:PersonShape
  a sh:NodeShape ;
  sh:targetClass :Person ;
  sh:property [
    sh:path :taxID ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] .
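
How focus nodes and value nodes relate can be sketched as follows. The dictionary-based shape and the helper names focus_nodes and value_nodes are hypothetical, and the sketch only handles a path made of a single predicate and ignores subclass reasoning, which real SHACL class targets take into account.

```python
RDF_TYPE = "rdf:type"

def focus_nodes(triples, shape):
    """Collect focus nodes from an explicit target node and/or a target class."""
    nodes = set()
    if "targetNode" in shape:
        nodes.add(shape["targetNode"])
    if "targetClass" in shape:
        # Simplification: only nodes explicitly typed with the target class.
        nodes |= {s for (s, p, o) in triples
                  if p == RDF_TYPE and o == shape["targetClass"]}
    return nodes

def value_nodes(triples, focus, path):
    """Follow a one-step path (a single predicate) from a focus node."""
    return {o for (s, p, o) in triples if s == focus and p == path}

data = {
    ("ash", RDF_TYPE, ":Person"),
    ("ash", ":familyName", "Ketchum"),
    ("ash", ":givenName", "Ash"),
}
person_shape = {"targetClass": ":Person", "path": ":taxID"}

for node in focus_nodes(data, person_shape):
    # Constraints such as sh:minCount are then evaluated on this set of values.
    print(node, value_nodes(data, node, person_shape["path"]))
```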

A property shape also defines constraints that the value nodes must satisfy.

ex:PersonShape
  a sh:NodeShape ;
  sh:targetClass :Person ;
  sh:property [
    sh:path :taxID ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] .

The SHACL standard includes numerous built-in constraints.

Main built-in SHACL constraints

  Value type    sh:class, sh:datatype
  Cardinality   sh:minCount, sh:maxCount
  Number        sh:minExclusive, sh:maxExclusive, sh:minInclusive, sh:maxInclusive
  String        sh:minLength, sh:maxLength, sh:pattern
  Combination   sh:or, sh:and, sh:not
  Recursivity   sh:node

ex:PersonShape
  a sh:NodeShape ;
  sh:targetClass :Person ;
  sh:property [
    sh:path :birthDate ;
    sh:datatype xsd:date
  ] .
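
A datatype constraint compares the datatype of each value node with the expected one. The sketch below assumes literals are modelled as (lexical form, datatype) pairs and uses hypothetical values; it does not check that the lexical form is well-formed, which a SHACL engine would also do.

```python
XSD_STRING = "xsd:string"
XSD_DATE = "xsd:date"

def literal(lexical, datatype=XSD_STRING):
    """Model a literal as a pair; in RDF 1.1 an untyped literal is an xsd:string."""
    return (lexical, datatype)

def datatype_violations(values, expected):
    """Return the value nodes whose datatype differs from the expected one."""
    return [value for value in values if value[1] != expected]

# Hypothetical value nodes: the same lexical form, with and without a datatype.
values = [literal("2000-01-01", XSD_DATE), literal("2000-01-01")]
print(datatype_violations(values, XSD_DATE))
```

Only the untyped literal is reported, even though both literals look like dates.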

SHACL was introduced in 2017.

How could Semantic Web developers validate their RDF graphs before 2017?

SPARQL can also be considered a constraint language.

SELECT ?person WHERE {
  ?person a :Person .
  FILTER NOT EXISTS {
    # ~sh:minCount 1
    ?person :taxID ?id .
  }
}

SELECT ?person WHERE {
  ?person
    a :Person ;
    :birthDate ?bdate .
  FILTER (
    # ~sh:datatype xsd:date (selects violations)
    datatype(?bdate) != xsd:date
  )
}

SHACL and SPARQL can be combined to provide an expressive validation language.

The integration, along with many features of SHACL, is illustrated in a single example.

SHACL shapes must be defined in RDF, which might be tedious to write.

The main benefit is that RDF class and property declarations can embed SHACL shapes.

ex:Citizen
  a rdfs:Class, sh:NodeShape ;
  rdfs:subClassOf :Person ;
  sh:property [
    sh:path :taxID ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] .

If a class is also a node shape, it implicitly targets instances of itself.

In contrast to SHACL, the Shape Expressions (ShEx) language chose a distinct, more succinct syntax.

:PersonShape {
  :taxID xsd:string ;
  :familyName xsd:string ;
  :givenName xsd:string ;
  :birthDate xsd:date
}

A Schema for Inference

SHACL is a valuable part of the Semantic Web technology stack.

However...

<junichi-masuda>
  a :Person;
  :familyName "Masuda" ;
  :givenName "Junichi" ;
  :birthDate "1968-01-12" .

Do these statements about Junichi Masuda satisfy the constraint that a birth date must be a date?

In many cases, obvious statements are not asserted.

A schema can help infer obvious statements from asserted ones.

A schema used for inference is better called an ontology.

<junichi-masuda>
  :birthDate "1968-01-12"^^xsd:date .

… is a necessary fact for the shape to be validated. It can be considered true.

Still, many statements are unknown with respect to a schema.

One cannot assert everything that is true with a finite vocabulary.

The designer of a vocabulary must define its frame:

  • what is within the frame is known for certain
  • what is outside the frame is unknown

The two possible treatments for unknown statements are:

  • the Closed World assumption
    what is not stated is false
  • the Open World assumption
    what is not stated is undefined

Constraint satisfaction is typically done under the Closed World assumption.

Whatever is valid in a closed world remains valid in an open world.

Inference is typically done under the Open World assumption.

Whatever is inferred in an open world remains inferred in a closed world.
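
The difference between the two assumptions can be sketched with a lookup over a set of triples. The function names holds_closed and holds_open are hypothetical, and None stands for "unknown" in this illustration.

```python
RDF_TYPE = "rdf:type"

data = {
    ("ash", RDF_TYPE, ":Person"),
    ("ash", ":familyName", "Ketchum"),
}

def holds_closed(triples, statement):
    """Closed World: a statement that is not stated is false."""
    return statement in triples

def holds_open(triples, statement):
    """Open World: a statement that is not stated is unknown (None), not false."""
    return True if statement in triples else None

stated = ("ash", RDF_TYPE, ":Person")
not_stated = ("ash", RDF_TYPE, "ex:Citizen")

print(holds_closed(data, stated), holds_open(data, stated))
print(holds_closed(data, not_stated), holds_open(data, not_stated))
```

Both functions agree on what is stated. They differ only on what is missing: the closed-world check rejects it, while the open-world check leaves the question open.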

RDF Schema (RDFS) is a minimal language to define vocabularies.

RDFS includes:

  • sub-class relations,
  • sub-property relations,
  • property domain definitions and
  • property range definitions
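
These four constructs each give rise to a simple inference rule. The sketch below applies them until nothing new can be derived; it is a naive fixpoint over tuples, not a full RDFS reasoner, and it does not distinguish literals from other nodes. The ex:taxID domain declaration and the node named someone are assumptions made for the example.

```python
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
TYPE = "rdf:type"

def rdfs_closure(triples):
    """Apply sub-class, sub-property, domain and range rules up to a fixpoint."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                if p2 == DOMAIN and s2 == p:
                    new.add((s, TYPE, o2))      # subject belongs to the domain
                elif p2 == RANGE and s2 == p:
                    new.add((o, TYPE, o2))      # object belongs to the range
                elif p2 == SUBPROP and s2 == p:
                    new.add((s, o2, o))         # statement holds for the super-property
                elif p2 == SUBCLASS and p == TYPE and s2 == o:
                    new.add((s, TYPE, o2))      # instance belongs to the super-class
        if new <= inferred:
            return inferred
        inferred |= new

graph = {
    ("ex:Citizen", SUBCLASS, ":Person"),
    (":Person", SUBCLASS, ":Thing"),
    ("ex:taxID", DOMAIN, "ex:Citizen"),   # assumed declaration, for illustration
    ("someone", "ex:taxID", "123"),       # hypothetical data
}

for triple in sorted(rdfs_closure(graph) - graph):
    print(triple)
```

From a single tax ID statement, the rules infer that someone is a citizen, a person and a thing, none of which was asserted.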

The Web Ontology Language (OWL) is an overly complex language to define ontologies.

OWL includes:

  • all of RDFS
  • class combinations
    union, intersection, complement
  • property characteristics
    transitivity, symmetry, inverse, …
  • property chains
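
Property characteristics also translate into inference rules. The sketch below covers only transitivity and symmetry, passed as plain parameters rather than read from OWL declarations; the properties ex:ancestorOf and ex:knows and the nodes a, b, c are hypothetical.

```python
def apply_characteristics(triples, transitive=(), symmetric=()):
    """Derive statements entailed by transitive and symmetric properties."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            if p in symmetric:
                new.add((o, p, s))
            if p in transitive:
                # Chain s -p-> o with every o -p-> o2 already known.
                new |= {(s, p, o2) for (s2, p2, o2) in inferred
                        if p2 == p and s2 == o}
        if new <= inferred:
            return inferred
        inferred |= new

graph = {
    ("a", "ex:ancestorOf", "b"),
    ("b", "ex:ancestorOf", "c"),
    ("a", "ex:knows", "b"),
}
result = apply_characteristics(
    graph, transitive={"ex:ancestorOf"}, symmetric={"ex:knows"})

for triple in sorted(result - graph):
    print(triple)
```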

OWL also includes so-called restrictions, including:

  • existential restrictions
  • universal restrictions
  • cardinality constraints

For an introduction to OWL, see the OWL quick reference document.