How many of the graphs people load into triplestores are actually closed-world problems wearing open-world infrastructure?
The open-world assumption
RDF’s open-world assumption says that the absence of a statement doesn’t mean the statement is false. It means you don’t know yet. This is correct for federated knowledge: hospital A doesn’t know what hospital B knows, and the graph should accommodate both without contradiction.
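The distinction can be sketched in a few lines of Python. The triples and predicate names below are invented for illustration; the point is only how each assumption treats an absent statement.

```python
# A toy set of triples. Names are hypothetical, not a real dataset.
triples = {
    ("hospitalA", "treats", "patient1"),
    ("hospitalA", "knowsAllergy", "patient1"),
}

def closed_world(triple):
    # Closed-world: anything not stated is false.
    return triple in triples

def open_world(triple):
    # Open-world: a stated triple is true; an absent one is unknown, not false.
    return True if triple in triples else None

q = ("hospitalB", "knowsAllergy", "patient1")
print(closed_world(q))  # False -- "hospital B has no allergy record"
print(open_world(q))    # None  -- "we don't know what hospital B knows"
```

Under the open-world reading, hospital B's silence is just silence; only the closed-world reading turns it into a denial.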
The entire W3C linked data stack is built on this assumption. Triplestores accept any valid triple. SPARQL queries run against incomplete graphs. SHACL validates what’s present without assuming completeness. The architecture is designed for a world where knowledge is distributed, incomplete, and evolving.
For a university course catalog, this is overkill.
Graphs you already have
A course catalog declares every course and every prerequisite before the semester starts. The full graph exists at authoring time. Nothing is discovered at runtime. The catalog changes once a year, and between publications the graph is frozen.
The same is true for:
- Project plans. Every task, dependency, and milestone is declared during planning. The graph is static until the next planning cycle.
- Infrastructure topology. Every server, service, and dependency is declared in a manifest. Discovery happens when you author, not when you query.
- Build pipelines. Every stage, every job, every dependency is declared in a YAML file. The pipeline graph is complete before the first job runs.
- Supply chain BOMs. Every supplier, component, and dependency is known when you build the bill of materials.
- Compliance frameworks. Every requirement, control, and obligation is published before the first audit.
These graphs share three properties: every node is known at authoring time, every edge is known at authoring time, and nothing changes between authoring and consumption.
They are closed-world problems.
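A hypothetical course catalog makes the three properties concrete. Because the whole graph exists at authoring time, global checks like cycle detection can run before anything consumes it; this sketch uses Python's standard-library `graphlib`, and the course codes are invented.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical catalog: every course and prerequisite is declared up front.
# The full graph exists before the first consumer sees it.
prereqs = {
    "CS101": [],
    "CS201": ["CS101"],
    "CS301": ["CS201"],
    "CS310": ["CS201"],
}

# The graph is complete, so whole-graph validation runs at authoring time.
ts = TopologicalSorter(prereqs)
try:
    order = list(ts.static_order())  # raises CycleError on circular prerequisites
except CycleError as err:
    raise SystemExit(f"invalid catalog: {err}")

print(order)  # a valid registration order, e.g. CS101 before CS201
```

No store, no query engine: the check is an ordinary function call over a graph that is already whole.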
The cost of the mismatch
If you want W3C linked data for a project plan, the standard path is: author triples, load them into a triplestore, run SPARQL to compute topology, run SHACL to validate constraints, serialize the results as JSON-LD. Five coordinated tools to process a graph you had in your hands before the first one started.
This is the open-world tax: infrastructure designed for incomplete, federated, evolving knowledge applied to a graph that is none of those things.
The tax isn’t just operational. It’s cognitive. You learn RDF syntax, SPARQL query patterns, SHACL shape authoring, triplestore configuration, and serialization options before you produce your first conformant output. Each piece exists for a reason. The reasons don’t apply to your problem.
What closed-world actually needs
A closed-world graph needs three things: a way to declare resources with types and relationships, a way to enforce constraints at authoring time so invalid states can’t be represented, and a way to project the validated graph into W3C vocabularies.
That’s it. No store, because the graph doesn’t persist between authoring and consumption. No query engine, because the full graph is available to the evaluation step. No separate validator, because the constraints are structural.
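Here is one way those three needs might look in plain Python, as a sketch rather than a prescription: dataclasses for typed resources, constructor-time and whole-graph checks for constraints, and a projection function for the JSON-LD output. The `ex:` prefix and course codes are illustrative, not a published vocabulary.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Course:
    """A typed resource. Constraints run at construction time,
    so an invalid course can never exist in the graph."""
    code: str
    prereqs: tuple = ()

    def __post_init__(self):
        if not self.code:
            raise ValueError("course needs a code")
        if self.code in self.prereqs:
            raise ValueError(f"{self.code} cannot require itself")

def check_closed(courses):
    """Whole-graph constraint: every prerequisite must be a declared course."""
    declared = {c.code for c in courses}
    for c in courses:
        missing = set(c.prereqs) - declared
        if missing:
            raise ValueError(f"{c.code} requires undeclared courses: {missing}")

def to_jsonld(courses):
    # Projection: the validated graph becomes plain JSON-LD.
    # The @context terms are illustrative, not a real vocabulary.
    return {
        "@context": {"prereq": {"@id": "ex:prereq", "@type": "@id"}},
        "@graph": [
            {"@id": c.code, "@type": "ex:Course", "prereq": list(c.prereqs)}
            for c in courses
        ],
    }

catalog = [Course("CS101"), Course("CS201", prereqs=("CS101",))]
check_closed(catalog)
doc = json.dumps(to_jsonld(catalog), indent=2)
```

Everything happens in one process: declare, validate, project. The only artifact that leaves is the JSON-LD.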
The W3C vocabularies are not the problem. SHACL, SKOS, DCAT, and PROV-O are well-designed vocabularies. The problem is that producing them currently requires an architecture built for a harder problem than most people have.
The bridge nobody talks about
The discussion always frames closed-world and open-world as a choice. Pick your architecture, live with the trade-offs. But the output of a closed-world system is standard JSON-LD. Any triplestore can import it. Any SPARQL endpoint can query it. Any SHACL processor can validate it further.
So validate locally, then publish globally.
Author your infrastructure manifest or course catalog in a closed-world constraint system. Every resource is typed. Every edge is validated. Every constraint is enforced at compile time. The graph cannot contain invalid states. Then export standard JSON-LD and load it into the open-world stack alongside graphs from other organizations that did the same thing.
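The published artifact might look like the following, with an invented `ex:` namespace standing in for a real one. It is ordinary JSON-LD: nothing about it reveals that the constraints were enforced upstream, and any triplestore can import it as-is.

```json
{
  "@context": {
    "ex": "https://example.org/catalog#",
    "prereq": { "@id": "ex:prereq", "@type": "@id" }
  },
  "@graph": [
    { "@id": "ex:CS101", "@type": "ex:Course" },
    { "@id": "ex:CS201", "@type": "ex:Course", "prereq": "ex:CS101" }
  ]
}
```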
The closed-world constraints don’t disappear when the data enters a triplestore. They already ran. The data that arrives is pre-validated. The triplestore doesn’t need to re-check constraints that were enforced structurally at authoring time. It just needs to federate.
This is how compilation has always worked. You don’t choose between type-checked code and running it in a dynamic environment. You type-check first, then deploy. The type system ensures local correctness. The runtime handles distribution. Nobody argues that static typing and dynamic dispatch are competing paradigms. They’re sequential.
The same applies here. Closed-world validation and open-world federation are not competing architectures. They’re phases. Validate the graph you have. Publish it for the world you don’t control.
The question
None of this applies to graphs that are born federated, where no single party knows the full shape. Those need the open-world stack from the start.
But course catalogs aren’t born federated. Infrastructure manifests aren’t evolving ontologies. Build pipelines aren’t cross-organizational data sharing exercises. They’re authored, validated, and published. The open world receives them. It doesn’t need to re-derive them.
How much of the linked data in production today is a closed-world graph that skipped the validation phase because the tooling assumed open-world from the start?