Relational databases always rely on a schema to guarantee consistency.
A relational schema defines:
CREATE TABLE Person(
taxID VARCHAR(25) NOT NULL,
familyName VARCHAR(50) NOT NULL,
givenName VARCHAR(50) NOT NULL,
birthDate DATE,
PRIMARY KEY (taxID)
);
INSERT INTO Person
VALUES
(NULL, 'Ketchum', 'Ash', NULL);
Error: taxID cannot be NULL.
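The rejection above can be reproduced with a small sketch. It assumes SQLite (through Python's standard `sqlite3` module) as the database engine, which the slides do not specify; the helper `try_insert` and the second tax ID value are hypothetical and only serve the illustration.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Person(
        taxID VARCHAR(25) NOT NULL,
        familyName VARCHAR(50) NOT NULL,
        givenName VARCHAR(50) NOT NULL,
        birthDate DATE,
        PRIMARY KEY (taxID)
    )
    """
)


def try_insert(row):
    """Return None if the row is accepted, else the engine's error message."""
    try:
        conn.execute("INSERT INTO Person VALUES (?, ?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


# Same row as in the slide: the schema rejects the missing tax ID.
rejected = try_insert((None, "Ketchum", "Ash", None))
# Hypothetical tax ID, only to show that a complete row is accepted.
accepted = try_insert(("hypothetical-id-001", "Ketchum", "Ash", None))
```

Here `rejected` holds the engine's error message, while `accepted` is `None` because the second row satisfies the schema.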
Object-oriented models also rely on a schema to verify the executability of programs at compile time.
An object-oriented schema defines:
class Person extends Thing {
final String taxID;
final String familyName;
final String givenName;
final Date birthDate;
public Person(String taxID) { … }
}
Error: familyName is never assigned.
In contrast, schema.org is an RDF schema that cannot generate errors.
Yet, schema.org also defines:
@prefix rdfs: <http://www.w3.org/…> .
@prefix : <http://schema.org/> .
:Person rdfs:subClassOf :Thing .
# (skipping taxID, etc...)
:birthDate
:domainIncludes :Person ;
:rangeIncludes :Date .
Schema.org borrows concepts from RDF Schema (RDFS), a W3C standard.
<ash>
a :Person;
# taxID: none
:familyName "Ketchum" ;
:givenName "Ash" .
OK.
To ensure wide adoption, schema.org imposes no constraint on the structure of RDF triples that use its classes and properties.
Semantic Web practitioners favor the terms 'vocabulary' or 'ontology' over 'schema' to refer to such a model.
For example, states identify citizens with tax IDs but not all persons are citizens of some state, especially not fictional characters.
Some real persons also have multiple tax IDs.
It has been a design choice to keep schema.org generic, i.e.:
What would a validation schema for RDF look like?
The Shapes Constraint Language (SHACL) has been designed to declare constraints on classes and properties.
ex:PersonShape
a sh:NodeShape ;
sh:targetClass :Person ;
sh:property [
sh:path :taxID ;
sh:minCount 1 ;
sh:maxCount 1
] .
<ash>
a :Person;
# taxID: none
:familyName "Ketchum" ;
:givenName "Ash" .
Error: taxID has fewer than one value.
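To make the cardinality check concrete, here is a minimal sketch of what a validator does for `sh:minCount` and `sh:maxCount`. It is not a SHACL implementation: triples are modelled as plain Python tuples, the function name `check_cardinality` is hypothetical, and only direct `rdf:type` statements are used to find focus nodes.

```python
# Triples are modelled as plain (subject, predicate, object) tuples.
# The strings are stand-ins for the IRIs and literals of the example.
DATA = {
    ("ash", "rdf:type", ":Person"),
    ("ash", ":familyName", "Ketchum"),
    ("ash", ":givenName", "Ash"),
}


def check_cardinality(triples, target_class, path, min_count=0, max_count=None):
    """Return (focus_node, message) pairs for every cardinality violation."""
    # Focus nodes: every node directly typed with the target class.
    focus_nodes = sorted(
        s for (s, p, o) in triples if p == "rdf:type" and o == target_class
    )
    violations = []
    for node in focus_nodes:
        # Value nodes: everything reachable from the focus node via the path.
        values = {o for (s, p, o) in triples if s == node and p == path}
        if len(values) < min_count:
            violations.append((node, f"{path} has fewer than {min_count} value(s)"))
        if max_count is not None and len(values) > max_count:
            violations.append((node, f"{path} has more than {max_count} value(s)"))
    return violations


violations = check_cardinality(DATA, ":Person", ":taxID", min_count=1, max_count=1)
```

With the data above, `violations` contains a single entry for `ash`. Adding a `:taxID` triple for `ash` empties the list.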
Try the example on the SHACL playground.
On the Semantic Web, everything is in RDF, even SHACL validation reports.
[]
a sh:ValidationResult ;
sh:resultSeverity sh:Violation ;
sh:focusNode <ash> ;
sh:resultPath :taxID .
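As a rough illustration of a report being ordinary RDF, the sketch below turns one violation into the four triples shown above. The function `result_triples` and the blank node labels are hypothetical; a real validator would also attach further properties not covered here.

```python
from itertools import count

# Counter used to mint fresh blank node labels (hypothetical naming scheme).
_blank_ids = count(1)


def result_triples(focus_node, path):
    """Describe one violation as triples about a fresh blank node."""
    node = f"_:result{next(_blank_ids)}"
    return [
        (node, "rdf:type", "sh:ValidationResult"),
        (node, "sh:resultSeverity", "sh:Violation"),
        (node, "sh:focusNode", focus_node),
        (node, "sh:resultPath", path),
    ]


report = result_triples("ash", ":taxID")
```

Because the report is a set of triples, it can be stored and queried like any other RDF data.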
A SHACL schema is composed of node shapes.
A node shape applies to focus nodes, which are either:
ex:PersonShape
a sh:NodeShape ;
sh:targetClass :Person ;
sh:property [
sh:path :taxID ;
sh:minCount 1 ;
sh:maxCount 1
] .
A node shape is composed of one or more property shapes.
A property shape defines a path from the focus node to value nodes.
ex:PersonShape
a sh:NodeShape ;
sh:targetClass :Person ;
sh:property [
sh:path :taxID ;
sh:minCount 1 ;
sh:maxCount 1 ;
] .
A property shape also defines constraints that must apply to all value nodes.
ex:PersonShape
a sh:NodeShape ;
sh:targetClass :Person ;
sh:property [
sh:path :taxID ;
sh:minCount 1 ;
sh:maxCount 1
] .
The SHACL standard includes numerous built-in constraints.
Category | Built-in constraints
Value type | sh:class, sh:datatype
Cardinality | sh:minCount, sh:maxCount
Number | sh:minExclusive, sh:maxExclusive, sh:minInclusive, sh:maxInclusive
String | sh:minLength, sh:maxLength, sh:pattern
Combination | sh:or, sh:and, sh:not
Recursivity | sh:node
ex:PersonShape
a sh:NodeShape ;
sh:targetClass :Person ;
sh:property [
sh:path :birthDate ;
sh:datatype xsd:date
] .
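The datatype constraint can be sketched as follows. The sketch assumes literals are modelled as (lexical form, datatype) pairs and approximates the `xsd:date` lexical check: it accepts only `YYYY-MM-DD` and ignores the optional time zone and negative years that XML Schema allows. The function name `conforms_to_date` is hypothetical.

```python
import re
from datetime import date

XSD_DATE = "xsd:date"
XSD_STRING = "xsd:string"


def conforms_to_date(literal):
    """Check a (lexical form, datatype) pair against sh:datatype xsd:date."""
    lexical, datatype = literal
    # The declared datatype must be exactly xsd:date.
    if datatype != XSD_DATE:
        return False
    # The lexical form must also be well formed (simplified to YYYY-MM-DD).
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", lexical)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


typed_ok = conforms_to_date(("1968-01-12", XSD_DATE))
plain_string = conforms_to_date(("1968-01-12", XSD_STRING))
bad_lexical = conforms_to_date(("1968-13-40", XSD_DATE))
```

Note that a plain string with a date-like lexical form does not conform: the check looks at the declared datatype, not only at the characters.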
SHACL was introduced in 2017.
How could Semantic Web developers validate their RDF graphs before 2017?
SPARQL can also be used as a constraint language.
SELECT ?person WHERE {
?person a :Person .
FILTER NOT EXISTS {
# ~sh:minCount 1
?person :taxID ?id .
}
}
SELECT ?person WHERE {
?person
a :Person ;
:birthDate ?bdate .
FILTER (
# ~sh:datatype xsd:date
datatype(?bdate) = xsd:date
)
}
SHACL and SPARQL can be combined to provide an expressive validation language.
The integration, along with many features of SHACL, is illustrated in a single example.
SHACL shapes must be defined in RDF, which might be tedious to write.
The main benefit is that RDF class and property declarations can embed SHACL shapes.
ex:Citizen
a rdfs:Class, sh:NodeShape ;
rdfs:subClassOf :Person;
sh:property [
sh:path :taxID ;
sh:minCount 1 ;
sh:maxCount 1
] .
If a class is also a node shape, it implicitly targets instances of itself.
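The selection of focus nodes for a class target can be sketched as below. The sketch assumes triples are plain tuples and that the `rdfs:subClassOf` statements sit in the same set as the data; the node `misty` and the function names are hypothetical. Instances of subclasses are included, which is how SHACL class targets are understood here.

```python
def subclasses_of(triples, cls):
    """Return cls plus every class that reaches it through rdfs:subClassOf."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in triples:
            if p == "rdfs:subClassOf" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def focus_nodes(triples, target_class):
    """Nodes typed with the target class or with one of its subclasses."""
    classes = subclasses_of(triples, target_class)
    return {s for (s, p, o) in triples if p == "rdf:type" and o in classes}


GRAPH = {
    ("ex:Citizen", "rdfs:subClassOf", ":Person"),
    ("ash", "rdf:type", ":Person"),
    ("misty", "rdf:type", "ex:Citizen"),  # hypothetical second node
}

citizen_targets = focus_nodes(GRAPH, "ex:Citizen")
person_targets = focus_nodes(GRAPH, ":Person")
```

A shape attached to `ex:Citizen` only reaches `misty`, whereas a shape targeting `:Person` reaches both nodes.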
In contrast to SHACL, the Shape Expressions (ShEx) language made the choice of a distinct, more succinct syntax.
:PersonShape {
:taxID xsd:string ;
:familyName xsd:string ;
:givenName xsd:string ;
:birthDate xsd:date
}
SHACL is a valuable part of the Semantic Web technology stack.
However...
<junichi-masuda>
a :Person;
:familyName "Masuda" ;
:givenName "Junichi" ;
:birthDate "1968-01-12" .
Do these statements about Junichi Masuda validate the constraint that a birth date must be a date?
In many cases, obvious statements are not asserted.
A schema can help infer obvious statements from asserted ones.
A schema used for inference should rather be called an ontology.
<junichi-masuda>
:birthDate "1968-01-12"^^xsd:date .
… is a necessary fact for the shape to be validated. It can be considered true.
Still, many statements are unknown with respect to a schema.
One cannot assert all that is true with a finite vocabulary.
The designer of a vocabulary must define its frame:
The two possible treatments for unknown statements are:
Constraint satisfaction is typically done under the Closed World assumption.
Whatever is valid in a closed world remains valid in an open world.
Inference is typically done under the Open World assumption.
Whatever is inferred in an open world remains inferred in a closed world.
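The two treatments can be contrasted in a small sketch. It assumes triples are plain tuples and uses a hypothetical variation of the running example in which Ash is declared an `ex:Citizen`. The inference step only adds statements and never concludes anything from absence; the constraint step reads absence as a violation. Function names are illustrative.

```python
def infer_types(triples):
    """Open-world step: add rdf:type triples entailed by rdfs:subClassOf."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triple can be derived
        changed = False
        for (s, p, o) in list(inferred):
            if p != "rdf:type":
                continue
            for (c, q, d) in list(inferred):
                entailed = (s, "rdf:type", d)
                if q == "rdfs:subClassOf" and c == o and entailed not in inferred:
                    inferred.add(entailed)
                    changed = True
    return inferred


def missing_values(triples, cls, path):
    """Closed-world step: report instances of cls with no value for path."""
    return sorted(
        s
        for (s, p, o) in triples
        if p == "rdf:type"
        and o == cls
        and not any(s2 == s and p2 == path for (s2, p2, o2) in triples)
    )


ASSERTED = {
    ("ex:Citizen", "rdfs:subClassOf", ":Person"),
    ("ash", "rdf:type", "ex:Citizen"),  # hypothetical variation of the example
    ("ash", ":familyName", "Ketchum"),
}

closure = infer_types(ASSERTED)
unvalidated = missing_values(closure, ":Person", ":taxID")
```

After inference, `ash` is also a `:Person`. The closed-world check then reports the missing tax ID, although under the open-world reading the tax ID is merely unknown.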
RDF Schema (RDFS) is a minimal language to define vocabularies.
RDFS includes:
The Web Ontology Language (OWL) is an overly complex language to define ontologies.
OWL includes:
OWL also includes so-called restrictions, including:
For an introduction to OWL, see the OWL quick reference document.