A friend and former colleague asked me (about three weeks ago; I’ve been busy!) about the bit of fuss I had been making about a schema for JSON. Specifically, he asked what schemas are useful for. I admit, it’s an interesting question, and perhaps it even shows the difference between my background, which has been 90% compiled languages like Java and C#, and his experience, which is probably around 80% scripting/late-binding languages like Perl and Ruby.
So what is a schema? Well, the dictionary entry from the New Oxford is a good start:
schema |ˈskēmə|
noun ( pl. -mata |-mətə| or -mas ) technical
a representation of a plan or theory in the form of an outline or model
...
So, a schema represents the desired form of the result; in this case, the format and structure of JSON data.
But why is this useful?
A schema allows you to know what to expect
The first major use I’ll state is probably the most important. A standard schema document has an advantage over any other way of publishing a document’s structure simply because it is a standard document. Even setting aside that it defines a consistent way to describe what a document should look like, without any schema you are limited to giving the other person nothing but examples of things that should work.[1]
Examples have their own purpose. They are great for unit tests, for example, or for letting an author see what a real document’s structure should look like. In that sense, examples let the reader build their own schema in their head.
A schema allows you to use tools to help you
My reason for working on schema right now is that my JSON4Java project is stalled. For 1.0, I wanted support for JSON bindings: being able to feed in a plain Java object (a POJO, as it were) and get out JSON text. That turned out to be pretty simple, especially in comparison to feeding in a JSON text document and getting Java out. To handle all the corner cases in that direction, I have to know more than what a class defines for me, and the best way of getting that knowledge is to create tools that work from a schema.
The second big reason for schemas is automatic validation. Based on a schema definition, a piece of software can accept or reject a document you are given. This is a great way to remove all that ugly document validation logic you had to add to your code by hand. Or rather, it means a tool handles all the corner cases you forgot. With this sort of tool you can write much more robust code, more easily.
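To make that concrete, here is a minimal sketch of schema-driven validation in Python. The schema format (a dict of field names to expected types), the POST_SCHEMA name, and the validate function are all made up for illustration; this is not any real JSON schema standard, just the smallest thing that shows the idea.

```python
import json

# Hypothetical schema format (an assumption for this sketch): each field name
# maps to the Python type that json.loads should produce for it.
POST_SCHEMA = {"url": str, "date": str, "title": str}


def validate(document, schema):
    """Return a list of problems; an empty list means the document is accepted."""
    if not isinstance(document, dict):
        return ["document is not a JSON object"]
    problems = []
    # Check every field the schema requires.
    for field, expected in schema.items():
        if field not in document:
            problems.append(f"missing field: {field}")
        elif not isinstance(document[field], expected):
            problems.append(f"field {field} should be {expected.__name__}")
    # Reject fields the schema does not know about.
    for field in document:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


good = json.loads(
    '{"url": "http://example.com/post", "date": "2008-02-21", "title": "Why schemas?"}'
)
bad = json.loads('{"url": 42, "title": "Why schemas?"}')

print(validate(good, POST_SCHEMA))
print(validate(bad, POST_SCHEMA))
```

The point is that the checking loop is written once, in the tool. Each new document type only needs a new schema, not new hand-written validation code.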
JSON, for example, doesn’t support a native “date” format, while a schema could define one. A tool could thus know to produce date objects in your language of choice while reading in a JSON document. There is simply no good way to tell “2008-02-21” the date from “2008-02-21” the string value without knowledge of the document’s form.
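A small Python sketch of that date problem follows. The schema format here (field name to a type label such as "date") and the load_with_schema function are assumptions of mine for illustration, and I am assuming dates are written as ISO-style YYYY-MM-DD text.

```python
import json
from datetime import date

# Hypothetical schema (an assumption for this sketch): it labels which
# fields are plain strings and which are dates written as YYYY-MM-DD.
SCHEMA = {"title": "string", "published": "date"}


def load_with_schema(text, schema):
    """Parse JSON text, converting fields the schema marks as dates."""
    raw = json.loads(text)
    result = {}
    for field, value in raw.items():
        if schema.get(field) == "date":
            # Only the schema tells us this string is really a date.
            result[field] = date.fromisoformat(value)
        else:
            result[field] = value
    return result


# Both fields hold the same characters; only the schema distinguishes them.
doc = load_with_schema(
    '{"title": "2008-02-21", "published": "2008-02-21"}', SCHEMA
)
print(type(doc["title"]).__name__, type(doc["published"]).__name__)
```

Without the schema, json.loads alone would hand back two identical strings, and any guess based on what the value looks like would wrongly turn the title into a date as well.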
Schemas help to establish Meta-schemas[2]
I believe, and have seen evidence, that publishing, sharing, and reusing document forms will allow an ad-hoc set of standard best practices to emerge.
This isn’t to say such a thing can’t happen without schema. However, it’s much more likely to happen once schema is in the picture. The date format above is much more likely to become the common format for representing dates in JSON once you have a way to easily choose to use it. Otherwise, you will have people who insist on using two-digit years, removing the dashes for terseness, or even counting ‘the number of days since Jan 1st, 1970’ and using an integer value. Others will represent full times in the document, only to throw the time-of-day portion away.
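As a rough illustration of what that divergence costs, here is a Python sketch with one parser per ad-hoc representation of the same day. The function names are hypothetical, and the exact text layouts (dash-separated two-digit year, compact YYYYMMDD) are my assumptions about what such formats might look like.

```python
from datetime import date, datetime, timedelta


def from_iso(text):
    """Full year with dashes, e.g. "2008-02-21"."""
    return date.fromisoformat(text)


def from_two_digit_year(text):
    """Two-digit year, e.g. "08-02-21"; the century has to be guessed."""
    return datetime.strptime(text, "%y-%m-%d").date()


def from_compact(text):
    """Dashes removed for terseness, e.g. "20080221"."""
    return datetime.strptime(text, "%Y%m%d").date()


def from_epoch_days(days):
    """Integer count of days since Jan 1st, 1970."""
    return date(1970, 1, 1) + timedelta(days=days)


# Four encodings of the same day, each needing its own reader.
results = [
    from_iso("2008-02-21"),
    from_two_digit_year("08-02-21"),
    from_compact("20080221"),
    from_epoch_days(13930),
]
print(results)
```

Every consumer has to know in advance which of these readers applies to which document. With a shared, schema-defined date format, one reader would do.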
Reinvention takes time too. It takes time both from the person who is doing the inventing and from the others who are attempting to work with it. Schema can allow for reuse, and for tools which can evolve to know how to handle data for you.
[1] And when I say schema, I am including ad-hoc sentences like “and the post tag is an object which contains a url, a date, and a title”. These may not be formal definitions and thus may be easier to write. However, their informality may leave significant gaps that make them hard to implement.
[2] OK, so I kinda invented that word, or at least a new use for that word. I am not just doing so to have a search that will bring up my blog as the first result.