These exercises develop some useful utility functions for working with different aspects of the Semantic Web.
Exercises
SW-1: Name conversion
Function names: camelize, hyphenate
Test cases: microdata-exs-tests.lisp.
A common problem in many languages is converting names between camelCase and hyphenated form. For example, JavaScript names are camelCase but CSS names are hyphenated, so the CSS property font-size has to be written as fontSize in JavaScript.
For us, this happens when we read Semantic Web names, which use camelCase (e.g., JobPosting), but want Lisp symbols, which are hyphenated (e.g., job-posting).
Define (camelize string [capitalize]) to take a hyphenated string and return the camelCase equivalent. If the optional second argument is given and is true, then the first letter should be capitalized.
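One possible sketch of camelize is shown below. It walks the string once, drops each hyphen, and upper-cases the character that follows it. It assumes (this is a choice, not part of the spec) that every other letter should be lower-cased, so Lisp-style upper-case names such as "JOB-POSTING" convert the same way as "job-posting".

```lisp
(defun camelize (string &optional capitalize)
  "Convert a hyphenated STRING such as \"font-size\" to camelCase (\"fontSize\").
If CAPITALIZE is true, also upper-case the first letter (\"FontSize\").
Assumption: letters not following a hyphen are lower-cased."
  (with-output-to-string (out)
    ;; UPCASE-NEXT starts true only when the caller asked for a capital first letter.
    (let ((upcase-next capitalize))
      (loop for ch across string
            do (cond ((char= ch #\-)
                      ;; Drop the hyphen and capitalize whatever comes next.
                      (setf upcase-next t))
                     (upcase-next
                      (write-char (char-upcase ch) out)
                      (setf upcase-next nil))
                     (t
                      (write-char (char-downcase ch) out)))))))
```

For example, (camelize "font-size") should give "fontSize" and (camelize "job-posting" t) should give "JobPosting".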
Define (hyphenate string [case]) to take a camelCase string and return the hyphenated equivalent. If the optional second argument is :upper (the default), the result string should be all upper case. If it is :lower, it should be all lower case. Any other value is an error.
Only insert a hyphen when the case changes. Something like "getURL" should become "GET-URL" not "GET-U-R-L".
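A minimal sketch of hyphenate follows. It interprets "the case changes" as a lower-case letter followed by an upper-case letter, which handles "getURL" and "JobPosting". That interpretation is an assumption: a name that starts with an acronym, such as "URLParser", would come out as "URLPARSER", and you may want a smarter rule. ECASE signals an error for any case argument other than :upper or :lower.

```lisp
(defun hyphenate (string &optional (case :upper))
  "Convert a camelCase STRING to a hyphenated one, e.g. \"getURL\" -> \"GET-URL\".
CASE must be :UPPER (the default) or :LOWER; anything else is an error."
  (let ((result
          (with-output-to-string (out)
            (loop for i from 0 below (length string)
                  for ch = (char string i)
                  ;; Insert a hyphen only at a lower-to-upper case transition,
                  ;; so runs of capitals like \"URL\" stay together.
                  do (when (and (> i 0)
                                (lower-case-p (char string (1- i)))
                                (upper-case-p ch))
                       (write-char #\- out))
                     (write-char ch out)))))
    (ecase case
      (:upper (string-upcase result))
      (:lower (string-downcase result)))))
```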
SW-2: Microdata Reader
This exercise has been retired in favor of JSON-LD.
SW-3: JSON-LD reader
Function name: (read-json-ld url-string)
Submit this in the Code Critic under SWP 1: Semantic Web Personal 1
Your somewhat open-ended job in this exercise is to write a function that can take a URL, get the JSON-LD stored on that page, if any, and return a list of the entities defined in the JSON-LD as a nested list structure. See the testing subsection below for some URLs to test on.
What to read
- Use JSON-LD to add schema.org data to your website -- a short, clear example of how JSON-LD is added to a web page to describe what the page is about
- schema.org -- every concept and property documentation page ends with examples in various formats, including JSON-LD
- Steal our JSON-LD -- a website that helps you generate valid JSON-LD for various types of pages
The point of JSON-LD is that semantic data can be embedded in a web page in an easy-to-retrieve location, using the standardized schema.org ontology, in a relatively easy-to-parse JSON structure. Everything uses standard web technologies supported by many programming languages.
Libraries
For this exercise, most of the work is done by several Common Lisp libraries that can be loaded with Quicklisp:
- Dexador -- a library for fetching web data given URLs
- CL-HTML-PARSE -- a library for parsing HTML text into a corresponding Lisp list structure
- CL-JSON -- a library for parsing JSON text into a corresponding Lisp list structure
Lisp implementation-specific notes
LispWorks: You need to tell LispWorks to allow international characters in strings, since they occur often on web pages. You do that by executing:
#+lispworks (lw:set-default-character-element-type 'cl:character)
This needs to be done every time you start Lisp, so you may want to put it in your code file or your init file. The #+lispworks prefix means "evaluate the next expression only in LispWorks".
To install and load these three libraries:
(ql:quickload "dexador")
(ql:quickload "cl-json")
#-allegro (ql:quickload "cl-html-parse")
The #-allegro prefix means "skip the next expression in Allegro". Allegro already includes the HTML parser, so the portable version should not be installed there.
With these libraries installed,
- (dex:get "url-string") returns a string with the contents of the page at the given URL.
- (net.html.parser:parse-html string) takes a string of HTML and returns a nested Lisp list structure that's easy to process with normal Lisp functions.
- (json:decode-json-from-string string) takes a string of JSON (or JSON-LD) and returns a nested Lisp list structure that's easy to process with normal Lisp functions.
Test all of these functions by hand before going any further to make sure they work properly. If not, post to Piazza. Include what Lisp and operating system you are using, and the exact input and output.
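Between parsing the HTML and decoding the JSON, you need to find the JSON-LD itself, which lives in script elements with type application/ld+json. The sketch below walks the parsed tree looking for them. It assumes parse-html represents an element as (tag child ...) or, when it has attributes, ((tag attr value ...) child ...), with tags and attribute names as keywords and the script body as a string; check that against what you see on a real page. Tag and attribute names are compared case-insensitively so the code does not depend on how the keywords are spelled. All function names here (named-p, attribute-value, json-ld-script-p, find-json-ld-strings) are hypothetical helpers, not part of any library.

```lisp
(defun named-p (symbol name)
  "True if SYMBOL (a tag or attribute keyword) has NAME, ignoring case."
  (and (symbolp symbol) (string-equal (symbol-name symbol) name)))

(defun attribute-value (attributes name)
  "Look up NAME in an attribute plist like (:type \"...\" :id \"...\")."
  (loop for (key value) on attributes by #'cddr
        when (named-p key name)
          return value))

(defun json-ld-script-p (node)
  "True if NODE looks like a parsed <script type=\"application/ld+json\"> element."
  (and (consp node)
       (consp (first node))            ; element with attributes: ((tag attrs...) ...)
       (named-p (first (first node)) "script")
       ;; EQUALP compares strings case-insensitively and is false for NIL.
       (equalp (attribute-value (rest (first node)) "type")
               "application/ld+json")))

(defun find-json-ld-strings (lhtml)
  "Return, in document order, the text of every JSON-LD script element in LHTML."
  (let ((found '()))
    (labels ((walk (node)
               (cond ((json-ld-script-p node)
                      ;; Collect the string children (the JSON text).
                      (dolist (child (rest node))
                        (when (stringp child) (push child found))))
                     ((consp node)
                      ;; Otherwise descend into every sub-list.
                      (mapc #'walk node)))))
      (walk lhtml))
    (nreverse found)))
```

Combined with the library functions above, a first attempt might look like (mapcar #'json:decode-json-from-string (find-json-ld-strings (net.html.parser:parse-html (dex:get url-string)))), giving one decoded structure per JSON-LD block on the page.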
Testing and submitting
There are no test cases, but your code should handle at least the following URLs:
- Young Detective Dee: Rise of the Sea Dragon
- NY Times: Skillet-Baked Eggs and Asparagus
- Builtin Chicago: Front-End Job Posting
- Northwestern University Symphony Orchestra
Look for a few other types of sites, such as organization home pages. In all cases, you should see list structures with JSON-LD keys such as @TYPE. For readability, you may want to use PPRINT to pretty-print the list output.
Submit all the functions you defined to do this task. Include the URLs you tested it on. Note issues and problems that you could and could not solve.