

Posted by david, Fri Feb 22 23:07:00 UTC 2008

A friend and former colleague asked me (about three weeks ago – I’ve been busy!) about the bit of fuss I had been making about a schema for JSON. Specifically, he asked what schemas are useful for. I admit, it’s an interesting question – and perhaps it even shows the difference between my background, which has been 90% compiled languages like Java/C#, and his experience of probably around 80% scripting/late-binding languages like Perl and Ruby.

So what is a schema? Well, the dictionary entry from the New Oxford is a good start:

schema |ˈskēmə|
noun ( pl. -mata |-mətə| or -mas ) technical
a representation of a plan or theory in the form of an outline or model
...

So, a schema represents the desired form of the result; in this case, the format and structure of JSON data.

But why is this useful?

A schema allows you to know what to expect

The first major usage I’ll state is probably the most important. A standard schema document has an advantage over any other publication of a document’s structure simply because it’s a standard document. Even ignoring that it defines a consistent way to describe what a document should look like, without any schema you are limited to giving the other person nothing other than examples of things that should work [1].

Examples have their own purpose. They are great for unit tests, for example, or for letting an author see what a real document’s structure should be like. In this sense, examples let readers build their own schema in their heads.

A schema allows you to use tools to help you

My reasoning for working on schema right now is that my JSON4Java project is stalled. For 1.0, I wanted support for JSON bindings: being able to feed in a plain Java object (a POJO, as it were) and get out JSON text. That turned out to be pretty simple, especially in comparison to feeding in a JSON text document and getting Java out. I have to know more than what a class defines for me in order to handle all the corner cases, and the best way of doing that is to create tools that work based on a schema.

The second big reason for schemas is automatic validation. Based on a schema definition, a piece of software can accept or reject a document you are given. This is a great way to remove all that ugly document validation logic you had to manually add to your code to handle problems. Or rather, a tool now handles all the corner cases you forgot. You can write much more robust code, more easily, with this sort of tool.
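
As a rough illustration of schema-driven validation, the sketch below checks an already-parsed JSON object (held as a plain Map) against a table of required member names and expected Java types. The class name SimpleSchemaCheck and the map-based "schema" are assumptions made up for this example; they are not part of JSON4Java or of any schema standard.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a "schema" is just a map of member name -> expected Java type.
class SimpleSchemaCheck {

    // Returns a list of problems; an empty list means the document was accepted.
    static List<String> validate(Map<String, Class<?>> schema, Map<String, Object> document) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Class<?>> rule : schema.entrySet()) {
            String name = rule.getKey();
            if (!document.containsKey(name)) {
                problems.add("missing member: " + name);
                continue;
            }
            // isInstance(null) is false, so a null value is reported as a wrong type.
            if (!rule.getValue().isInstance(document.get(name))) {
                problems.add("wrong type for member: " + name);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> schema = new LinkedHashMap<>();
        schema.put("title", String.class);
        schema.put("comments", Number.class);

        Map<String, Object> document = new LinkedHashMap<>();
        document.put("title", "Sun Halo");
        document.put("comments", "zero"); // wrong type on purpose

        System.out.println(validate(schema, document));
    }
}
```

The point of the sketch is only that the checking loop is written once, in the tool, and the per-document rules live in data rather than in hand-written validation code.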

JSON, for example, doesn’t support a native “date” format, while a schema could define one. A tool could thus know to deal with date objects in your language of choice while reading in a JSON document. There is simply no good way you could know the difference between “2008-02-21” as a date and as a string value without knowledge of the document form.
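
To make the date case concrete, here is a minimal sketch in which the only "schema knowledge" is a set of member names that should be treated as dates. The class DateAwareReader and the member names are hypothetical; java.time.LocalDate is used simply because it parses the ISO form shown above.

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: the set of date member names stands in for a real schema.
class DateAwareReader {

    // Converts only those members marked as dates; everything else passes through untouched.
    static Map<String, Object> applyDateMembers(Map<String, Object> document, Set<String> dateMembers) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> member : document.entrySet()) {
            Object value = member.getValue();
            if (dateMembers.contains(member.getKey()) && value instanceof String) {
                // LocalDate.parse expects the ISO-8601 form, e.g. 2008-02-21
                value = LocalDate.parse((String) value);
            }
            result.put(member.getKey(), value);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> document = new LinkedHashMap<>();
        document.put("published", "2008-02-21");
        document.put("slug", "2008-02-21");

        Map<String, Object> typed = applyDateMembers(document, Collections.singleton("published"));
        System.out.println(typed.get("published").getClass().getSimpleName()); // LocalDate
        System.out.println(typed.get("slug").getClass().getSimpleName());      // String
    }
}
```

Both members carry the identical text; only the outside knowledge decides which one becomes a date.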

Schemas help to establish Meta-schemas [2]

I believe, and have seen evidence, that publishing, sharing, and reusing document forms will let an ad-hoc set of standard best practices emerge.

This isn’t to say such a thing can’t happen without schema. However, it’s much more likely to happen once schema is in the picture. Given the date format above, it is much more likely to become the common format for representing dates in JSON once you have a way to easily choose to use it. Otherwise, you will have people who insist on using two-digit years, removing the dashes for terseness, or even counting ‘the number of days since Jan 1st, 1970’ and using an integer value. Or representing times in the document, but throwing the time-of-day portion away.

Reinvention takes time too. It takes time both from the person who is doing the inventing and from the others who are attempting to work with it. Schema can allow for reuse, and for tools which can evolve to know how to handle data for you.

[1] And when I say schema, I am including ad-hoc sentences like “and the post tag is an object which contains a url, a date, and a title”. These may not be formal definitions and thus may be easier to write. However, they may leave significant gaps in the ability to implement because of their informality.

[2] Ok, so I kinda invented that word, or at least a new use for that word. I am not just doing so to have a search that will bring up my blog as the first result.


Posted by david, Fri Feb 01 00:18:00 UTC 2008

So I have a selfish desire for some form of JSON schema, as the types used within the system would greatly help in parsing JSON into Java objects. I’m trying to limit my current thinking to the fundamentals:

  1. JSON has several types of values: string, number, boolean, array, object, and the value null
  2. JSON schema would allow you to define which types of values, and which values themselves, are legal for a particular object, object member, or array
  3. As declared, those declared legal values would have schemas themselves
  4. For reuse, schemas should be able to be referenced by URI, with a strong recommendation to have that URI resolvable to the authoritative schema document
  5. For security and Denial-of-service resistance, it is strongly recommended to not resolve URIs looking for referenced schemas, at least as part of message processing.
  6. There are three classifications of schema types
    Native types: Types which cannot be expressed within JSON schema themselves. This includes the core schema types, described later.
    Derived types: A type based on another type, either in a subclass relationship or merely as a starting point, defining restrictions on what values are considered legal.
    Union types: A choice between two or more other types, such as "either a boolean or a string".
  7. Derived types should only define further restrictions of their base type for simplicity. They should not allow values which are considered illegal by their base type.
  8. Further schema can be declared dynamically within a document, although it can only specify the schema in place or restrict the document further than any existing schema does. A processor may choose to ignore the specified schema and go with preexisting validation rules.
  9. Union types can specify one or more schema types which overlap, in which case the first valid match ‘wins’
  10. A union type of 'URI or String' would thus cause the value "http://www.blog.alkaline-solutions.com" to always be accepted by the URI rule, even if it was a ‘string’ value in the base document. If special processing occurs based on type, this needs to be taken into account
  11. A union type of 'http-url or string' would thus not have a clean derived relationship from the above 'URI or String' - a value accepted before by the URI rule would now be interpreted by a completely different rule instead of being rejected.
  12. For that reason, and for simplicity, union types can only be derived as a whole, with restrictions being applied to every member
  13. Because a schema valid value could be one of several JSON types, it makes sense to have restrictions be specific to each kind of JSON value, rather than (for example) having maximum apply to numerical value, array element count, and string length
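
The "first valid match wins" rule for union types (items 9 and 10) can be sketched as below. This is only an illustration under my own assumptions: the SchemaType interface and the two sample types are hypothetical, and treating "URI" as "an absolute URI that java.net.URI can parse" is one possible reading, not a definition from any schema proposal.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class UnionTypeSketch {

    // Hypothetical schema type: a name plus a test for whether a string value is legal.
    interface SchemaType {
        String name();
        boolean accepts(String value);
    }

    static final SchemaType URI_TYPE = new SchemaType() {
        public String name() { return "URI"; }
        public boolean accepts(String value) {
            try {
                // Assumption: only absolute URIs (those with a scheme) match this rule.
                return new URI(value).isAbsolute();
            } catch (URISyntaxException e) {
                return false;
            }
        }
    };

    static final SchemaType STRING_TYPE = new SchemaType() {
        public String name() { return "String"; }
        public boolean accepts(String value) { return true; }
    };

    // Members are tried in declaration order; the first one that accepts the value wins.
    static Optional<SchemaType> match(List<SchemaType> union, String value) {
        for (SchemaType member : union) {
            if (member.accepts(value)) {
                return Optional.of(member);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<SchemaType> uriOrString = Arrays.asList(URI_TYPE, STRING_TYPE);
        System.out.println(match(uriOrString, "http://www.blog.alkaline-solutions.com").get().name());
        System.out.println(match(uriOrString, "hello world").get().name());
    }
}
```

Because the string rule accepts everything, reordering the union to 'String or URI' would make the URI rule unreachable, which is the kind of overlap item 9 is about.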


Posted by david, Wed Jan 30 18:13:00 UTC 2008

Working on a model to map user-defined Java objects to JSON has been quite a challenge. Some of the challenge, I'm sure, is the mapping between a section of a JSON document and the corresponding Java object: how do I know whether an object member of "location" : "http://blog.alkaline-solutions.com/" is meant to be a Java URI, a Java URL, or a String? In the absence of typed containers, it becomes even harder to determine how to do this than in XML binding cases.

For another example:

    class Foo { Object location; }

For serialization, the type of the "location" field's referenced object can be used to determine how to represent it. However, on input you have the same problem as above. Unless you radically enforce a JSON structure that makes everything except the primitives an object, and forces a Java class 'type' string into every object to determine what is being specified, a round trip becomes impossible without some local specification.

So, for starters I used annotations, defaulting to using a field/'bean accessor' name and a global registry of class/interface type handlers for members. Types like Class and Date translated to simple string types in the JSON output. I still ran into some problems though. For instance, a List member does not at compile time indicate that its members must be strings, so without further annotations, deserialization faced the 'no type' problem above.

This got me looking into a framework to base things on, which in turn got me looking at JSON Schema. I'll post more thoughts about JSON schema next.
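
As a sketch of how an annotation can supply the missing type on input, the example below marks a field with the Java type its JSON string should become. The @JsonType annotation and the bind helper are invented for this illustration and are not the annotations JSON4Java actually uses; only java.net.URI conversion is handled, to keep it short.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.net.URI;

class TypeHintSketch {

    // Hypothetical annotation: names the Java type a JSON string member should become.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface JsonType {
        Class<?> value();
    }

    static class Foo {
        @JsonType(URI.class)
        Object location;

        Object note; // no hint, so the value stays a plain String
    }

    // Assigns a JSON string value to the named field, converting it if a hint is present.
    static void bind(Object target, String member, String jsonString) throws ReflectiveOperationException {
        Field field = target.getClass().getDeclaredField(member);
        field.setAccessible(true);
        JsonType hint = field.getAnnotation(JsonType.class);
        Object value = jsonString;
        if (hint != null && hint.value() == URI.class) {
            value = URI.create(jsonString);
        }
        field.set(target, value);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Foo foo = new Foo();
        bind(foo, "location", "http://blog.alkaline-solutions.com/");
        bind(foo, "note", "http://blog.alkaline-solutions.com/");
        System.out.println(foo.location.getClass().getSimpleName()); // URI
        System.out.println(foo.note.getClass().getSimpleName());     // String
    }
}
```

Both fields are declared as Object and receive the same text; the annotation is the "local specification" that makes the round trip possible.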


Posted by david, Sun Jan 06 16:39:00 UTC 2008

So, I decided that the push-style event interface (push parser) was not ideal for what I wanted to do. Push-style interfaces like SAX are efficient, but it can be harder for people to wrap their heads around them to get the behavior they want. In particular, I want to write a JSON <-> POJO library on top of the core, and decided that it becomes a bit harder to keep track of it all in your head using an event interface.

So, I switched to a pull-style model, which I have named JSONCursor. Every time a pertinent event occurs (data encountered, or the start/end of some structured type), the parser will stop and return an ‘event’ object explaining what it found.

I also decided to raise the level of the interface quite a bit, so you now get a full String object, rather than being triggered as before for each new bit of text or escaped character and being responsible for accumulating it all yourself.

One of the fun things this provides is that I was able to make JSONCursor implement Iterator<JSONEvent>. Of course, iterators are normally over data structures and not data being processed, so Java did not design Iterator to be able to return errors. I currently wrap errors and raise RuntimeExceptions to get around this; I may decide at some point to make all of JSONException extend RuntimeException instead.
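
The wrapping approach might look roughly like the toy below. It borrows the names JSONCursor, JSONEvent and JSONException from the text, but everything else (the event kinds, the advance method, and the restriction to a flat array of true/false/null) is an assumption made to keep the sketch small; it is not the real JSON4Java code.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class CursorSketch {

    // Stand-in for the parser's checked exception type.
    static class JSONException extends Exception {
        JSONException(String message) { super(message); }
    }

    // Hypothetical event: a kind plus optional text.
    static class JSONEvent {
        final String kind;
        final String text;
        JSONEvent(String kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return text == null ? kind : kind + "(" + text + ")"; }
    }

    // Toy cursor: only understands input of the form "[true, false, null]".
    static class JSONCursor implements Iterator<JSONEvent> {
        private final String[] literals;
        private int index = -1; // -1 means START_ARRAY has not been returned yet

        JSONCursor(String flatArray) {
            String body = flatArray.trim();
            body = body.substring(1, body.length() - 1).trim(); // drop '[' and ']'
            literals = body.isEmpty() ? new String[0] : body.split("\\s*,\\s*");
        }

        @Override public boolean hasNext() { return index <= literals.length; }

        // The pull step proper; it is allowed to fail with a checked exception.
        JSONEvent advance() throws JSONException {
            int current = index++;
            if (current == -1) return new JSONEvent("START_ARRAY", null);
            if (current == literals.length) return new JSONEvent("END_ARRAY", null);
            String literal = literals[current];
            if (!literal.equals("true") && !literal.equals("false") && !literal.equals("null")) {
                throw new JSONException("unexpected literal: " + literal);
            }
            return new JSONEvent("VALUE", literal);
        }

        // Iterator.next() cannot declare checked exceptions, so they are wrapped.
        @Override public JSONEvent next() {
            if (!hasNext()) throw new NoSuchElementException();
            try {
                return advance();
            } catch (JSONException e) {
                throw new RuntimeException(e);
            }
        }
    }

    public static void main(String[] args) {
        Iterator<JSONEvent> cursor = new JSONCursor("[true, null]");
        while (cursor.hasNext()) {
            System.out.println(cursor.next());
        }
    }
}
```

Callers that care about the original error can still reach it through getCause(); making the exception unchecked in the first place would remove the need for the wrapper.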

For now, since it’s relatively easy to have a pull parser emulate a push parser, I still support a SAX-style interface as well. It was the easiest way to get things up and running and testable. I will probably drop this interface in the future just because of the additional size and confusion – or move it to another package, if there is one which would be relevant.
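
Emulating a push parser on top of a pull parser amounts to a loop that pulls each event and forwards it to a callback. In this sketch the JSONHandler interface is hypothetical, and events are reduced to plain strings ("[", "]", or a value) so that any Iterator can stand in for the cursor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class PushAdapterSketch {

    // Hypothetical SAX-style callback interface.
    interface JSONHandler {
        void startArray();
        void value(String text);
        void endArray();
    }

    // Drives the callbacks by pulling events until the source is exhausted.
    static void drive(Iterator<String> events, JSONHandler handler) {
        while (events.hasNext()) {
            String event = events.next();
            if (event.equals("[")) {
                handler.startArray();
            } else if (event.equals("]")) {
                handler.endArray();
            } else {
                handler.value(event);
            }
        }
    }

    public static void main(String[] args) {
        final List<String> log = new ArrayList<>();
        JSONHandler recorder = new JSONHandler() {
            public void startArray() { log.add("start"); }
            public void value(String text) { log.add("value:" + text); }
            public void endArray() { log.add("end"); }
        };
        drive(Arrays.asList("[", "true", "null", "]").iterator(), recorder);
        System.out.println(log);
    }
}
```

Going the other way, turning callbacks into something a caller can pull from, generally needs buffering or a second thread, which is why the pull model is the more convenient one to build on.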


Posted by david, Wed Dec 26 21:15:00 UTC 2007

JSON has relatively few elements which are geared toward specific data types. The data types are inferred from the markup directly.

The types exposed via JSON are:

  • Data Types -
    • Numbers (4, 2E-12)
    • Boolean values (true and false)
    • Unicode strings
    • The nothing value, null
  • Structure Types -
    • Arrays
    • Objects; these are also known as Dictionaries, or as Maps of String names to any defined type

This differs from XML, which really has only text content. Type in XML is external to the document, often specified by some schema document (DTD, XML Schema, and RelaxNG being the most common schema formats). The indirection in XML allows greater flexibility in how data or other content is exposed; JSON by contrast often has only one logical way for the data to be structured, which leads to the 'schema' of the JSON document being exposed.

Like XML but unlike some of the binary data encodings, JSON does not have a binary data type. If you wish to transport binary data then, as with XML, you will probably resort to base64-encoding the data for transmission.
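
A minimal sketch of the base64 route, using the java.util.Base64 class from the standard library. The class Base64InJson and the member name "data" are made up for the example, and the member name is assumed to need no JSON escaping.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64InJson {

    // Wraps raw bytes in a one-member JSON object. Standard base64 output contains
    // only letters, digits, '+', '/' and '=', none of which require JSON escaping.
    static String toJsonMember(String name, byte[] data) {
        return "{\"" + name + "\":\"" + Base64.getEncoder().encodeToString(data) + "\"}";
    }

    // Recovers the original bytes from the base64 text of a JSON string value.
    static byte[] fromBase64(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] raw = { 0, (byte) 0xFF, 0x10 };
        System.out.println(toJsonMember("data", raw));
        System.out.println(Arrays.equals(raw, fromBase64(Base64.getEncoder().encodeToString(raw))));
    }
}
```

Note that the receiver sees only a string; knowing that it holds base64 data is again something the document itself does not say.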


Posted by david, Wed Dec 26 20:51:00 UTC 2007

I have had a few people who have looked at my blog now, and the majority of them have asked me “what is JSON anyway?” There are some good places to learn more without having to resort to reading an RFC, but I’ll take a shot at explaining it in my own way here.

JSON is short for JavaScript Object Notation. You are not required to use, know, or understand JavaScript (or rather, ECMAScript) in order to leverage JSON, however. It is instead used as a markup for representing structured data. This markup just happened to grow in importance with the “Web 2.0 / AJAX” phenomenon, because so much more code was being written in browser script. The significance is that the syntax is interpretable by the scripting language directly – rather than using an XML API to load and manipulate a document into an internal data representation, you can simply do an ‘eval’ to have a new object created from the data supplied.

The competition in markup formats between JSON and XML comes up often. Both have their places – while JSON is good for representing simple structured data, XML is much better for semantic markup of text – things like HTML documents. A lot of the benefits of JSON come from it being ‘simpler’, both in use and in required body of reading to learn.

Here, however, is an example based on the Digg API (adapted from http://apidoc.digg.com/ListGalleryPhotos, but shortened for length):

In XML:
<gallery timestamp="1193358475" min_date="1190766450" total="31794" offset="0" count="3">
 <galleryphoto id="3920948" submit_date="1193358472" comments="0" 
        src="http://digg.com/users/KalimaSaraswati/gallery/3920948/t.jpg" 
        href="http://digg.com/users/KalimaSaraswati/gallery/3920948">
  <title>241_4171 Sun Halo Upper Tangent Arc.jpg</title>
  <user name="KalimaSaraswati" icon="http://digg.com/img/udl.png" 
        registered="1193358174" profileviews="5" />
 </galleryphoto>
</gallery>
In JSON:
{
  "timestamp" : 1193358478,
  "min_date"  : 1190766450,
  "total"     : "31794",
  "offset"    : 0,
  "photos"    : [
    {
      "id"          : 3920948,
      "submit_date" : 1193358472, 
      "comments"    : 0,
      "title"       : "241_4171 Sun Halo Upper Tangent Arc.jpg",
      "user"        : {
        "name"         : "KalimaSaraswati",
        "icon"         : "http://digg.com/img/udl.png",
        "registered"   : 1193358174,
        "profileviews" : 5
      },
      "src"         : "http://digg.com/users/KalimaSaraswati/gallery/3920948/t.jpg",
      "href"        : "http://digg.com/users/KalimaSaraswati/gallery/3920948" 
    } 
  ]
}


Posted by david, Tue Dec 25 19:10:00 UTC 2007

In addition to starting this blog, I just created a new project, JSON4Java. It’s a fun little tinkering project that I started somewhat by accident.

I decided to use this as an opportunity to work more with a parser generator. I chose Ragel, since there is relatively little state needed to handle the JSON format. Ragel syntax had me stumped for a few days – while writing the grammar was rather simple, embedding the actions can cause problems. In the end, I managed to tweak the thing until all of my test cases passed. Tests are absolutely vital for this sort of work!

I’m going to experiment now with adding somewhat better error reporting, and also with creating a somewhat SAX-ish interface on it to split out the generated parser from my business object for creating data. Ideally, the same interface could be used to both serialize and deserialize structures for JSON, and I’m hoping (if I get to that point) I can use Java annotations to allow mapping to more complex POJOs. I already split out the package into a json4java-core in anticipation of a json4java-pojo.

This sort of project also gives a level of freedom for experimentation and learning you just can’t get through typical commercial software development. I’ve finally got my unit testing ‘fu’ on, learning the new techniques in JUnit 4.4 (including assertThat); I’ve learned a bunch about Maven; and I’ve learned Ragel and successfully implemented a parser written in it. I’ve also started looking into newer Java features like annotations, and at things like dependency injection and XML binding frameworks for ideas on allowing people to override functionality.
