Displaying articles with tag schema
Posted by david, Sat Mar 29 21:45:00 UTC 2008
Posted by david, Fri Feb 22 23:07:00 UTC 2008
A friend and former colleague asked me (about three weeks ago – I’ve been busy!) about the bit of fuss I had been making about a schema for JSON. Specifically, he asked what schemas are useful for. I admit, it’s an interesting question – and perhaps it even shows the difference between my background, which has been 90% compiled languages like Java/C#, and his experience of probably around 80% scripting/late-binding languages like Perl and Ruby.
So what is a schema? Well, the dictionary entry from the New Oxford is a good start:
schema |ˈskēmə|
noun ( pl. -mata |-mətə| or -mas ) technical
a representation of a plan or theory in the form of an outline or model
...
So, a schema represents the desired form of the result; in this case, the format and structure of JSON data.
But why is this useful?
A schema allows you to know what to expect
The first major usage I’ll state is probably the most important. A standard schema document has an advantage over any other publication of a document’s structure simply because it’s a standard document. Even ignoring that it defines a consistent way to describe what a document should look like, without any schema you are limited to giving the other person nothing but examples of things that should work.¹
Examples have their own purpose. They are great for unit tests, for example, or for letting an author see what a real document’s structure should be like. In that sense, examples let the reader build their own schema in their head.
A schema allows you to use tools to help you
My reasoning for working on schema right now is that my JSON4Java project is stalled. For 1.0, I wanted support for JSON bindings: being able to feed in a plain Java object (a POJO, as it were) and get out JSON text. That turned out to be pretty simple, especially in comparison to feeding in a JSON text document and getting Java out. For that direction, I have to know more than what a class defines for me in order to handle all the corner cases, and the best way of doing that is to create tools that work based on a schema.
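To give a feel for why the object-to-JSON direction is the easy one, here is a minimal sketch, assuming a flat POJO whose fields are only strings, numbers, booleans, or null. The class and method names are hypothetical and are not the JSON4Java API; the point is only that the runtime type of each field value is enough to pick a JSON form.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of JSON4Java.
class PojoToJsonSketch {

    // Example POJO, invented for this sketch.
    static class Post {
        String title = "Why schema?";
        int comments = 3;
        boolean published = true;
        String url = null;
    }

    // Walks the declared fields and emits one JSON member per field.
    static String toJson(Object pojo) {
        StringJoiner members = new StringJoiner(",", "{", "}");
        for (Field field : pojo.getClass().getDeclaredFields()) {
            // Skip static and compiler-generated fields.
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            try {
                field.setAccessible(true);
                members.add(quote(field.getName()) + ":" + valueToJson(field.get(pojo)));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return members.toString();
    }

    // On output, the value's runtime type tells us which JSON form to use.
    static String valueToJson(Object value) {
        if (value == null) return "null";
        if (value instanceof Number || value instanceof Boolean) return value.toString();
        return quote(value.toString());
    }

    // Escapes only backslashes and double quotes; control characters are ignored here.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(toJson(new Post()));
    }
}
```

Going the other way has no equivalent of `field.get(pojo)` to lean on: the JSON text carries no Java type information, which is where a schema earns its keep.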
The second big reason for schemas is automatic validation. Based on a schema definition, a piece of software can accept or reject a document you are being given. This is a great thing for removing all that ugly document validation logic you had to manually add to your code to handle problems. Or rather, it means a tool handles all the corner cases you forgot. With this sort of tool you can write much more robust code, more easily.
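A minimal sketch of what such a validating tool might do, assuming an already-parsed JSON object held in a `Map` and a "schema" reduced to a table of required member names and their JSON types. The `ObjectValidatorSketch` class and the post schema (a url, a date, and a title, borrowed from the footnote's example) are illustrative assumptions, not a real schema language.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical validator for illustration; a real schema would say much more.
class ObjectValidatorSketch {

    enum JsonType { STRING, NUMBER, BOOLEAN, ARRAY, OBJECT, NULL }

    // Maps a parsed Java value back to the JSON type it came from.
    static JsonType typeOf(Object value) {
        if (value == null) return JsonType.NULL;
        if (value instanceof String) return JsonType.STRING;
        if (value instanceof Number) return JsonType.NUMBER;
        if (value instanceof Boolean) return JsonType.BOOLEAN;
        if (value instanceof List) return JsonType.ARRAY;
        if (value instanceof Map) return JsonType.OBJECT;
        throw new IllegalArgumentException("Not a JSON value: " + value.getClass());
    }

    // Assumed example schema: a post object with a url, a date, and a title.
    static Map<String, JsonType> postSchema() {
        Map<String, JsonType> schema = new LinkedHashMap<>();
        schema.put("url", JsonType.STRING);
        schema.put("date", JsonType.STRING);
        schema.put("title", JsonType.STRING);
        return schema;
    }

    // Returns every problem found; an empty list means the document is accepted.
    static List<String> validate(Map<String, JsonType> schema, Map<String, Object> document) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, JsonType> rule : schema.entrySet()) {
            String name = rule.getKey();
            if (!document.containsKey(name)) {
                problems.add("missing member: " + name);
            } else if (typeOf(document.get(name)) != rule.getValue()) {
                problems.add("member " + name + " should be " + rule.getValue()
                        + " but was " + typeOf(document.get(name)));
            }
        }
        for (String name : document.keySet()) {
            if (!schema.containsKey(name)) {
                problems.add("unexpected member: " + name);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Object> post = new LinkedHashMap<>();
        post.put("url", 42);          // wrong type
        post.put("title", "Why schema?");
        post.put("extra", true);      // not in the schema
        validate(postSchema(), post).forEach(System.out::println);
    }
}
```

The application code then only has to check whether the returned list is empty, instead of repeating these checks by hand for every document shape.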
JSON, for example, doesn’t support a native “date” format, while a schema could define one. A tool could thus know to deal with date objects in your language of choice while reading in a JSON document. There is simply no good way you could know the difference between “2008-02-21” as a date and as a string value without knowledge of the document form.
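The ambiguity can be sketched as follows, assuming the document's string members have already been read into a `Map` and that the schema knowledge is reduced to a per-member hint. The `DateHintSketch` class, the `Hint` enum, and the member names `posted` and `title` are all hypothetical.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical binder for illustration: the hint table stands in for a schema.
class DateHintSketch {

    enum Hint { STRING, DATE }

    // Converts each raw JSON string according to its hint; members without a
    // hint stay plain strings, because nothing in the text says otherwise.
    static Map<String, Object> bind(Map<String, String> raw, Map<String, Hint> hints) {
        Map<String, Object> bound = new LinkedHashMap<>();
        for (Map.Entry<String, String> member : raw.entrySet()) {
            Hint hint = hints.getOrDefault(member.getKey(), Hint.STRING);
            if (hint == Hint.DATE) {
                // LocalDate.parse expects the ISO form yyyy-MM-dd.
                bound.put(member.getKey(), LocalDate.parse(member.getValue()));
            } else {
                bound.put(member.getKey(), member.getValue());
            }
        }
        return bound;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("posted", "2008-02-21");
        raw.put("title", "2008-02-21");   // same text, but meant as a string
        Map<String, Hint> hints = new LinkedHashMap<>();
        hints.put("posted", Hint.DATE);
        bind(raw, hints).forEach((name, value) ->
                System.out.println(name + " -> " + value.getClass().getSimpleName()));
    }
}
```

Both members carry identical text; only the external hint decides which one becomes a date object.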
Schemas help to establish Meta-schemas²
I believe, and have seen evidence, that publishing and sharing and reusing document forms will allow for an ad-hoc set of standard best practices to result.
This isn’t to say such a thing can’t happen without schema. However, it’s much more likely to happen once schema is in the picture. Given the date format above, it is much more likely to become the common format for representing dates in JSON once you have a way to easily choose to use it. Otherwise, you will have people who insist on using two-digit years, removing the dashes for terseness, or even doing a ‘the number of days since Jan 1st, 1970’ and using an integer value. Or representing times in the document, but throwing the time-of-day portion away.
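To make the divergence concrete, here is a small sketch of those competing encodings of one date using `java.time`. The class and method names are hypothetical; the formats are the ones listed above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Illustration only: four ways different authors might write the same date.
class DateEncodingsSketch {

    static final DateTimeFormatter TWO_DIGIT_YEAR = DateTimeFormatter.ofPattern("yy-MM-dd");

    // The dashed form used earlier in the post, e.g. 2008-02-21.
    static String dashed(LocalDate date) {
        return date.format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    // Dashes removed for terseness.
    static String compact(LocalDate date) {
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Two-digit year: the century is no longer recoverable from the text.
    static String twoDigitYear(LocalDate date) {
        return date.format(TWO_DIGIT_YEAR);
    }

    // Number of days since Jan 1st, 1970, as a JSON number.
    static long epochDays(LocalDate date) {
        return date.toEpochDay();
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2008, 2, 21);
        System.out.println(dashed(date));
        System.out.println(compact(date));
        System.out.println(twoDigitYear(date));
        System.out.println(epochDays(date));
    }
}
```

A reader has to support every one of these, or guess, unless a shared schema type settles the matter.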
Reinvention takes time too. It takes time both from the person who is doing the inventing and from the others who are attempting to work with it. Schema can allow for reuse, and for tools which can evolve to know how to handle data for you.
¹ And when I say schema, I am including ad-hoc sentences like “and the post tag is an object which contains a url, a date, and a title”. These may not be formal definitions and thus may be easier to write. However, they may leave significant gaps in the ability to implement because of their informality.
² Ok, so I kinda invented that word, or at least a new use for that word. I am not just doing so to have a search that will bring up my blog as the first result.
Posted by david, Fri Feb 01 00:18:00 UTC 2008
So I have a selfish desire for some form of JSON schema, as the types used within the system would greatly help in parsing JSON into Java objects. I’m trying to limit my current thinking to the fundamentals:
- JSON has several types of values: string, number, boolean, array, object, and the value null
- JSON schema would allow you to define which types of values, and which values themselves, are legal for a particular object, object member, or array
- Those declared legal values would themselves have schemas
- For reuse, schemas should be able to be referenced by URI, with a strong recommendation to have that URI resolvable to the authoritative schema document
- For security and denial-of-service resistance, it is strongly recommended not to resolve URIs looking for referenced schemas, at least as part of message processing.
- There are three classifications of schema types:
  - Native types
    - Types which are unable to be expressed within JSON schema themselves. This includes the core schema types, described later
  - Derived types
    - A type, based on another type either in a subclass relationship or merely as a starting point, defining restrictions on what values are considered legal.
  - Union types
    - A choice between two or more other types, such as "either a boolean or a string"
- For simplicity, derived types should only define further restrictions of their base type. They should not allow values which are considered illegal by their base type.
- Further schema can be declared dynamically within a document, although it can only specify the schema in place or restrict the document further than any existing schema does. A processor may choose to ignore the specified schema and go with preexisting validation rules.
- Union types can specify one or more schema types which overlap, in which case the first valid match ‘wins’
  - A union type of 'URI or String' would thus cause the value "http://www.blog.alkaline-solutions.com" to always be accepted by the URI rule, even if it was a ‘string’ value in the base document. If special processing occurs based on type, this needs to be taken into account
  - A union type of 'http-url or string' would thus not have a clean derived relationship from the above 'URI or String' - a value accepted before by the URI rule would now be interpreted by a completely different rule instead of being rejected.
  - For that reason, and for simplicity, union types can only be derived as a whole, with restrictions being applied to every member
- Because a schema-valid value could be one of several JSON types, it makes sense to have restrictions be specific to each kind of JSON value, rather than (for example) having a single maximum apply to numerical value, array element count, and string length
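The native, derived, and union ideas in the list above can be sketched in a few lines of Java. This is only one possible reading, under loud assumptions: a schema type is modelled as a name plus a predicate, the names `uri` and `http-url` follow the examples in the list, and the `mailto:` value is invented for the demonstration. Nothing here is a proposed schema syntax.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical model: a schema type is a name plus a test for legal values.
class SchemaTypesSketch {

    static final class SchemaType {
        final String name;
        final Predicate<Object> accepts;

        SchemaType(String name, Predicate<Object> accepts) {
            this.name = name;
            this.accepts = accepts;
        }

        // Derived type: the base rule AND an extra restriction, so a derived
        // type can only narrow what its base already allows.
        SchemaType restrict(String derivedName, Predicate<Object> restriction) {
            return new SchemaType(derivedName, accepts.and(restriction));
        }
    }

    // Native type: defined in code, not expressible in the schema itself.
    static final SchemaType STRING = new SchemaType("string", v -> v instanceof String);

    // Derived types; the casts are safe because the base rule is checked first.
    static final SchemaType URI_TYPE =
            STRING.restrict("uri", v -> isAbsoluteUri((String) v));
    static final SchemaType HTTP_URL =
            URI_TYPE.restrict("http-url", v -> ((String) v).startsWith("http://")
                    || ((String) v).startsWith("https://"));

    static final List<SchemaType> URI_OR_STRING = Arrays.asList(URI_TYPE, STRING);
    static final List<SchemaType> HTTP_URL_OR_STRING = Arrays.asList(HTTP_URL, STRING);

    static boolean isAbsoluteUri(String text) {
        try {
            return new URI(text).isAbsolute();
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Union type: members are tried in declaration order; the first match wins.
    static Optional<SchemaType> firstMatch(List<SchemaType> union, Object value) {
        for (SchemaType member : union) {
            if (member.accepts.test(value)) {
                return Optional.of(member);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String mail = "mailto:someone@example.org"; // invented example value
        System.out.println(firstMatch(URI_OR_STRING, mail).get().name);
        System.out.println(firstMatch(HTTP_URL_OR_STRING, mail).get().name);
    }
}
```

The two lookups in `main` show the problem with restricting a single union member: a value that the first union handled with its `uri` rule falls through to `string` in the second union instead of being rejected, which is why deriving a union only as a whole is the simpler rule.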
Posted by david, Wed Jan 30 18:13:00 UTC 2008
"location" : "http://blog.alkaline-solutions.com/"
is meant to be a Java URI, a Java URL, or a String? In the absence of typed containers, it becomes even harder to determine how to do this than in XML binding cases. For another example:
class Foo {
Object location;
}
For serialization, the type of the object referenced by the "location" field can be used to determine how to represent it. However, on input you have the same problem as above - and unless you radically enforce a JSON structure that makes everything except the primitives an object and forces a Java class 'type' string into every object to determine what is being specified, a round trip becomes impossible without some local specification.
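A minimal sketch of that asymmetry, assuming a hand-rolled writer and reader for the single "location" member. The helper names are hypothetical, string escaping is omitted, and the `Class` argument passed to the reader stands in for whatever local specification is available.

```java
import java.net.URI;

// Illustration only: output can inspect the runtime type, input cannot.
class RoundTripSketch {

    static class Foo {
        Object location;

        Foo(Object location) {
            this.location = location;
        }
    }

    // Serialization: a URI and a String both collapse to the same JSON string.
    // (Escaping is omitted; the example value needs none.)
    static String writeLocation(Foo foo) {
        return "\"location\" : \"" + foo.location + "\"";
    }

    // Deserialization: the JSON text alone cannot say which Java type was
    // meant, so the caller has to supply it from some local specification.
    static Foo readLocation(String text, Class<?> wanted) {
        if (wanted == URI.class) {
            return new Foo(URI.create(text));
        }
        return new Foo(text);
    }

    public static void main(String[] args) {
        String address = "http://blog.alkaline-solutions.com/";
        System.out.println(writeLocation(new Foo(URI.create(address))));
        System.out.println(writeLocation(new Foo(address)));
        System.out.println(readLocation(address, URI.class).location.getClass());
    }
}
```

Both writes in `main` produce identical text, so whatever distinguishes them on the way back in has to come from outside the document.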
So, for starters I used annotations, defaulting to using a field/'bean accessor' name and a global registry of class/interface type handlers for members. Types like Class and Date translated to simple string types in the JSON output. I still ran into some problems though - for instance, a List