The Semantic Web

An Experiment

Click this link to search for walnut asparagus tomatoes, or for some other ingredients of interest.

Look at the top results. What information was Google able to extract from the recipe pages from different sources? How did it do that?

Go to one of those results and copy its URL. Enter that URL on this page and wait until it finishes; this may take a minute. The warning "The URL could not be rendered. Some markup may be missing" is usually OK.

Click on the Recipe box on the right. What information is there? Where did it come from?

On the original recipe page, use your browser to view the page source. In that source, search for "ld+json".
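If you would rather do that search with a program, a minimal sketch using only Python's standard library might look like this. It assumes the structured data sits in `<script type="application/ld+json">` elements, as the "ld+json" search suggests. The class and function names (`JsonLdExtractor`, `extract_jsonld`) and the sample page are hypothetical. The sketch is not how Google does it.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed JSON of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        # Script contents arrive here as raw text, not parsed as HTML.
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._chunks)))
            self._in_jsonld = False


def extract_jsonld(html):
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# A made-up page fragment standing in for a real recipe page's source.
page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe",
 "name": "Walnut Asparagus Salad",
 "recipeIngredient": ["asparagus", "walnuts", "tomatoes"]}
</script>
</head><body><h1>Walnut Asparagus Salad</h1></body></html>"""

for block in extract_jsonld(page):
    print(block["@type"], "-", block["name"])  # Recipe - Walnut Asparagus Salad
```

Real pages sometimes put a list, or an object with an "@graph" array, at the top level of the script. A fuller tool would walk those structures too.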

Schema.org

An ontology for web content, created and maintained by Google, Microsoft, and Yahoo (later joined by Yandex) to improve search engine results.

The hierarchy on one page
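What the hierarchy buys a search engine is inheritance: anything typed as a Recipe is also a CreativeWork and a Thing. The sketch below shows this with a tiny, hand-copied slice of the schema.org type hierarchy. It is not the full ontology, and the function names `ancestors` and `is_a` are hypothetical. Some schema.org types have more than one parent (LocalBusiness is both an Organization and a Place), so each type maps to a list of parents.

```python
# A small slice of the schema.org type hierarchy: type -> list of parent types.
PARENTS = {
    "Thing": [],
    "CreativeWork": ["Thing"],
    "HowTo": ["CreativeWork"],
    "Recipe": ["HowTo"],
    "Organization": ["Thing"],
    "Place": ["Thing"],
    "LocalBusiness": ["Organization", "Place"],
    "FoodEstablishment": ["LocalBusiness"],
    "Restaurant": ["FoodEstablishment"],
    "Event": ["Thing"],
    "MusicEvent": ["Event"],
}


def ancestors(type_name):
    """Return every supertype of type_name, nearest first (breadth-first)."""
    result = []
    queue = list(PARENTS[type_name])
    while queue:
        parent = queue.pop(0)
        if parent not in result:  # skip types reached by a second path
            result.append(parent)
            queue.extend(PARENTS[parent])
    return result


def is_a(type_name, super_type):
    """True if type_name is super_type or one of its subtypes."""
    return type_name == super_type or super_type in ancestors(type_name)


print(ancestors("Recipe"))  # ['HowTo', 'CreativeWork', 'Thing']
```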

JSON-LD

JSON-LD is JSON with links. It is a standardized way to embed semantic content in machine-usable form on a web page, using a script element of type "application/ld+json".

JSON-LD home page

JSON-LD web site usage

JSON-LD + schema.org

JSON-LD provides the syntax. schema.org provides the semantics.
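As an illustration of that division of labor, here is a minimal sketch that builds JSON-LD for a page. The "@context" and "@type" keys are JSON-LD syntax. The property names (name, recipeIngredient, totalTime, recipeYield) are schema.org Recipe properties. The recipe itself and the helper `jsonld_script` are made up for this example.

```python
import json

# Hypothetical recipe data described with schema.org vocabulary.
recipe = {
    "@context": "https://schema.org",  # JSON-LD: which vocabulary the keys come from
    "@type": "Recipe",                 # schema.org type
    "name": "Walnut Asparagus Salad",
    "recipeIngredient": ["1 lb asparagus", "1/2 cup walnuts", "2 tomatoes"],
    "totalTime": "PT30M",              # ISO 8601 duration: 30 minutes
    "recipeYield": "4 servings",
}


def jsonld_script(data):
    """Serialize data as JSON-LD inside a script element for an HTML page."""
    body = json.dumps(data, indent=2)
    # Escape "</" as "<\/" (still valid JSON) so the data cannot close the script early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


print(jsonld_script(recipe))
```

The resulting element can go in the page's head. Browsers do not display it, but crawlers can read it from the same URL that people visit.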

The ontology is large, but its coverage is still very narrow. To take one simple example, there is no "animal" ontology. It covers only what search engines care about: shopping, menus, music events, travel, and so on.

Still, the technology is available for anyone to embed semantic content in a web page, so a single URL can serve both human and machine visitors.

Knowledge bases on the web

RDF and SPARQL

RDF triples are the norm. SPARQL is the standard query language. This query returns the name of everyone that someone knows:

SELECT ?name WHERE { 
  ?x <http://xmlns.com/foaf/0.1/knows> ?y .
  ?y <http://xmlns.com/foaf/0.1/name> ?name .
}

The same query in our simple Lisp system:

(graph-search '((?x knows ?y) (?y name ?name)))
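Both the SPARQL query and the Lisp call describe a conjunction of triple patterns. Answering one means finding variable bindings that satisfy every pattern at once. Below is a minimal Python sketch of that idea, using backtracking. It is not the course's Lisp implementation: the name `graph_search` only mirrors `graph-search`, and its behavior here is an assumption. The triples are made up.

```python
# A small in-memory triple store; the facts are made up.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "name", "Bob"),
    ("carol", "name", "Carol"),
    ("dave", "name", "Dave"),
]


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Match one (s, p, o) pattern against a triple; return extended bindings or None."""
    new = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in new and new[p_term] != t_term:
                return None  # variable already bound to something else
            new[p_term] = t_term
        elif p_term != t_term:
            return None  # constant does not match
    return new


def graph_search(patterns, triples, bindings=None):
    """Yield every set of bindings that satisfies all patterns."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = match_pattern(first, triple, bindings)
        if extended is not None:
            # Bindings from the first pattern constrain the remaining ones.
            yield from graph_search(rest, triples, extended)


results = graph_search([("?x", "knows", "?y"), ("?y", "name", "?name")], TRIPLES)
print(sorted(b["?name"] for b in results))  # ['Bob', 'Carol']
```

Dave has a name but nobody knows him, so the shared variable ?y keeps him out of the answer.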

DBpedia and Wikidata

YAGO

Home page

SPARQL access
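Public knowledge bases usually offer SPARQL access through an HTTP endpoint. Under the SPARQL 1.1 Protocol, a client can send the query as the `query` parameter of a GET request. Servers typically choose the result format from the Accept header, for example application/sparql-results+json. The sketch below builds such a URL and parses a response in the standard SPARQL JSON results format. The endpoint URL is a placeholder and the response is hand-written, so no network access happens. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode

QUERY = """SELECT ?name WHERE {
  ?x <http://xmlns.com/foaf/0.1/knows> ?y .
  ?y <http://xmlns.com/foaf/0.1/name> ?name .
}"""


def sparql_get_url(endpoint, query):
    """Build a query-via-GET URL in the style of the SPARQL 1.1 Protocol."""
    return endpoint + "?" + urlencode({"query": query})


def parse_bindings(results_json):
    """Turn a SPARQL JSON results document into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    rows = []
    for binding in doc["results"]["bindings"]:
        # Each bound variable maps to {"type": ..., "value": ...}; unbound ones are absent.
        rows.append({var: term["value"] for var, term in binding.items()})
    return rows


# Placeholder endpoint; substitute the real endpoint of the knowledge base.
url = sparql_get_url("https://example.org/sparql", QUERY)

# A hand-written response in the SPARQL JSON results format.
sample_response = """{
  "head": {"vars": ["name"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "value": "Bob"}},
    {"name": {"type": "literal", "value": "Carol"}}
  ]}
}"""

print(url)
print(parse_bindings(sample_response))  # [{'name': 'Bob'}, {'name': 'Carol'}]
```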

"YAGO2s is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames. Currently, YAGO2s has knowledge of more than 10 million entities (like persons, organizations, cities, etc.) and contains more than 120 million facts about these entities."

Later versions of YAGO take a different approach: "The facts come from Wikidata, and the predicates have been mapped manually to the predicates of schema.org. Facts whose predicates could not be mapped were omitted."

The classes file, limited to concepts with English Wikipedia articles, is 62MB uncompressed, and the associated facts file is 415GB compressed.
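Files this large cannot simply be loaded into memory. They have to be streamed. Below is a minimal sketch that reads a gzipped dump one line at a time and counts how often each predicate appears. It assumes the dump is in N-Triples format (one triple per line, ending in " ."), which is common for RDF dumps. The function names and the three sample triples are hypothetical. The line parser is deliberately simple: it does not decode escapes or validate anything.

```python
import gzip
import io
from collections import Counter


def iter_ntriples(lines):
    """Yield (subject, predicate, object) from simple N-Triples lines.

    Subjects and predicates never contain spaces in N-Triples, so splitting
    twice leaves the whole object term (spaces inside literals included) intact.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        subj, pred, rest = line.split(None, 2)
        if rest.endswith("."):
            rest = rest[:-1].rstrip()  # drop only the final statement terminator
        yield subj, pred, rest


def count_predicates(gz_file):
    """Stream a gzipped triples file without decompressing it all into memory."""
    counts = Counter()
    with gzip.open(gz_file, "rt", encoding="utf-8") as f:
        for _, pred, _ in iter_ntriples(f):
            counts[pred] += 1
    return counts


# Made-up sample data, compressed in memory to stand in for a real dump file.
sample = b"""<http://ex.org/alice> <http://schema.org/knows> <http://ex.org/bob> .
<http://ex.org/bob> <http://schema.org/name> "Bob" .
<http://ex.org/alice> <http://schema.org/name> "Alice Smith"@en .
"""
print(count_predicates(io.BytesIO(gzip.compress(sample))).most_common())
```

With a real dump, `count_predicates` would be given the file's path instead. `gzip.open` accepts either a path or an open binary file object.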