These exercises develop some useful utility functions for working with different aspects of the Semantic Web.
Exercises
SW-1: Name conversion
Function names: camelize, hyphenate
Test cases: microdata-exs-tests.lisp.
A common problem in many languages is mapping name strings from camelCase to hyphenated form or back. For example, in JavaScript, names are camelCase, but CSS is hyphenated, so the CSS attribute font-size has to be written as fontSize in JavaScript.
For us, this occurs when reading Semantic Web names, which use camelCase, e.g., JobPosting, but we want a Lisp symbol like job-posting.
Define (camelize string [capitalize]) to take a hyphenated string and return the camelCase equivalent. If the optional second argument is given and is true, then the first letter should be capitalized.
Define (hyphenate string [case]) to take a camelCase string and return the hyphenated equivalent. If the optional second argument is :upper (the default), then the result string should be all upper case. If it is :lower, it should be all lower case. Any other value is an error.
Only insert a hyphen when the case changes. Something like "getURL" should become "GET-URL", not "GET-U-R-L".
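The rules above imply results like the ones in the table below. This is only a sketch for checking your own definitions: the names *name-conversion-examples* and check-name-conversions are hypothetical, the expected strings are one reading of the rules above, and the official test cases in microdata-exs-tests.lisp take precedence.

```lisp
;; Hypothetical examples derived from the rules above; the official
;; tests in microdata-exs-tests.lisp are authoritative.
(defparameter *name-conversion-examples*
  '(((camelize "job-posting")        "jobPosting")
    ((camelize "job-posting" t)      "JobPosting")
    ((hyphenate "jobPosting")        "JOB-POSTING")
    ((hyphenate "JobPosting" :lower) "job-posting")
    ((hyphenate "getURL")            "GET-URL")))

;; Hypothetical helper: evaluates each form and collects the examples
;; whose result differs from the expected string.
;; Call it only after you have defined CAMELIZE and HYPHENATE.
(defun check-name-conversions ()
  (loop for (form expected) in *name-conversion-examples*
        unless (equal (eval form) expected)
          collect (list form expected)))
```

Once both functions are defined, (check-name-conversions) returns NIL if every example matches, and otherwise a list of the examples that failed.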
SW-2: Microdata Reader
This exercise has been retired in favor of JSON-LD.
SW-3: JSON-LD reader
Function name: (read-json-ld url-string)
Submit this in the Code Critic under SWP 1: Semantic Web Personal 1
Your somewhat open-ended job in this exercise is to write a function that can take a URL, get the JSON-LD stored on that page, if any, and return a list of the entities defined in the JSON-LD as a list of frames, using the format for a frame used in class:
(instance-id (abstractions*) attributes*)
There are no test cases, but your code should handle at least the fairly complex example at https://www.foodnetwork.com/recipes/food-network-kitchen/honey-mustard-dressing-recipe-2011614
What to read
- Use JSON-LD to add schema.org data to your website -- a short clear example of how JSON-LD is added to a web page to describe what the page is about
- schema.org -- the end of every concept and property documentation page has examples in various formats, including JSON-LD
- Steal our JSON-LD -- a website to help web sites generate valid JSON-LD for various types of pages
The point of JSON-LD is that semantic data can be embedded in a web page in an easy to retrieve location, using the standardized schema.org ontology, in a relatively easy to parse JSON structure. Everything uses standard web technologies, supported by many programming languages.
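To make "embedded in a web page" concrete: JSON-LD normally sits inside a script element whose type is application/ld+json. The sketch below holds a made-up, minimal page in a Lisp string. The variable name *sample-page* and the page content are illustrative assumptions, not taken from any real site.

```lisp
;; A made-up minimal page, for illustration only.
(defparameter *sample-page*
  "<html><head><title>Honey Mustard Dressing</title>
<script type=\"application/ld+json\">
{\"@context\": \"https://schema.org\",
 \"@type\": \"Recipe\",
 \"name\": \"Honey Mustard Dressing\"}
</script>
</head><body><h1>Honey Mustard Dressing</h1></body></html>")

;; The type attribute is the marker to look for. SEARCH returns the
;; position of the first match, or NIL if the marker is absent.
(search "application/ld+json" *sample-page*)
```

The raw-text search is only there to show where the data lives. A real reader would more likely find the script element in the parsed HTML and pass its text to a JSON parser.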
Libraries
For this exercise, you will use several Common Lisp libraries that can be loaded with Quicklisp:
- Drakma -- a library for fetching web data given URLs
- CL-JSON -- a library for parsing JSON text into a corresponding Lisp list structure
- CL-HTML-PARSE -- a library for parsing HTML text into a corresponding Lisp list structure
Lisp implementation-specific notes
LispWorks: You need to tell LispWorks to allow international characters in strings, since they occur often on web pages. You do that by executing:
(lw:set-default-character-element-type 'cl:character)
This needs to be re-done every time you start Lisp. If you want, you can put it in your init file.
To install these three libraries:
(ql:quickload "drakma")
(ql:quickload "cl-json")
#-allegro (ql:quickload "cl-html-parse")
The #-allegro tells the Lisp reader to skip the expression that follows when running in Allegro. The HTML library is already in Allegro and does not need to be installed.
With these libraries installed,
- (flexi-streams:octets-to-string (drakma:http-request "url-string" :force-binary t)) returns a string with the text in the file retrieved with the given URL
- (net.html.parser:parse-html string) takes a string with HTML and returns a nested Lisp syntax that's easy to process with normal Lisp functions.
- (json:decode-json-from-string string) takes a string with JSON (or JSON-LD) and returns a nested Lisp syntax that's easy to process with normal Lisp functions.
Test all of these functions by hand before going any further to make sure they work properly. If not, post to Campuswire. Include what Lisp and operating system you are using, and the exact input and output.
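One way to try the fetching and JSON calls by hand is sketched below. The names fetch-page-text, *sample-json-ld*, and decode-sample are hypothetical, the JSON fragment is made up, and decoding the page as UTF-8 is an assumption that the calls listed above do not make.

```lisp
;; Assumes Quicklisp is installed. FLEXI-STREAMS is requested explicitly
;; here, although Drakma normally pulls it in as a dependency.
(ql:quickload '("drakma" "flexi-streams" "cl-json"))

;; Hypothetical helper name. Decoding as UTF-8 is an assumption that
;; fits most modern pages; some pages may need a different format.
(defun fetch-page-text (url-string)
  (flexi-streams:octets-to-string
   (drakma:http-request url-string :force-binary t)
   :external-format :utf-8))

;; A made-up JSON-LD fragment for trying the JSON decoder offline.
(defparameter *sample-json-ld*
  "{\"@type\": \"Recipe\",
    \"name\": \"Honey Mustard Dressing\",
    \"recipeIngredient\": [\"honey\", \"mustard\"]}")

;; Decode the fragment with CL-JSON's default settings.
(defun decode-sample ()
  (json:decode-json-from-string *sample-json-ld*))
```

With CL-JSON's default settings, (decode-sample) should return an association list keyed by keywords, with camelCase names converted to hyphenated Lisp style, so recipeIngredient should appear as :recipe-ingredient and the JSON array as a Lisp list. Confirm this in your own Lisp before relying on it.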
Testing and submitting
Test your code on several recipes at Food Network. Look at recipes on other sites as well.
Test your code on a non-recipe site. According to current search engine statistics, over 40% of all websites include JSON-LD! How do you find them? One clue is when a web search returns a rich snippet. For example, if you search for music concerts and get a list of labeled events with critical details, then the source pages for those events probably have JSON-LD. Also see JSON-LD notable examples.
Submit all the functions you defined to do this task. Include the URLs you tested it on. Note issues and problems that you could and could not solve.