Extending JSON with key paths
It all started with a tweet:
I am not a big JSON fan, but I think JSON is a rather good way of representing an object tree in textual form. The example by Kelly Summer is a bit verbose, because the tree structure is very deep and the paths are mostly linear. This is not necessarily a problem with JSON, but rather with the design of the data representation. However, it made me think.
What if we could represent deep linear paths in JSON by defining key paths in object keys?
According to the JSON specification, an object key must be a string literal. So if a key path is represented as a string literal, the document is still valid JSON.
Here is the same JSON object with key paths:
{
  ">facets>0": {
    "date_histogram": {
      "field": "@timestamp",
      "interval": "10m"
    },
    "global": true,
    ">facet_filter>fquery>query>filtered": {
      ">query>query_string>query": "_type:\"your server\" AND status:[500 TO 599]",
      ">filter>bool>must": [
        {
          ">range>@timestamp>from": "2014-02-11T02:00:00.000Z",
          ">range>@timestamp>to": "2014-02-12T02:00:00.000Z"
        }
      ]
    }
  }
}
That’s a bit less verbose, isn’t it?
Now let’s analyse what I have done.
The root object has only one property, "facets", which also has only one child, "0". I reduce it by defining {">facets>0": {...} }.
An object key which encodes a key path starts with > and the path parts are separated by >. I chose this character because I haven’t seen anyone use it in keys before, so accidental collisions should be rare. However, which character or string you use for the key path prefix and separator could be made configurable. You could also decide not to prefix key paths at all. I would not recommend that though, because with a key path prefix there is a simple and cheap way of checking whether a given key is a key path.
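The prefix check and the splitting of a key into path parts can be sketched in a few lines of Python. This is only an illustration: the names is_key_path, split_key_path, PREFIX and SEPARATOR are hypothetical, and > is used for both the prefix and the separator simply because the examples in this post do so.

```python
from __future__ import annotations

PREFIX = ">"     # assumed default, matching the examples in this post
SEPARATOR = ">"  # assumed default, matching the examples in this post


def is_key_path(key: str, prefix: str = PREFIX) -> bool:
    """Cheap check: a key counts as a key path only if it starts with the prefix."""
    return key.startswith(prefix)


def split_key_path(key: str, prefix: str = PREFIX, separator: str = SEPARATOR) -> list[str]:
    """Strip the prefix and split the remainder into path parts."""
    if not is_key_path(key, prefix):
        return [key]  # an ordinary key is treated as a path of length one
    return key[len(prefix):].split(separator)


print(is_key_path(">facets>0"))     # True
print(split_key_path(">facets>0"))  # ['facets', '0']
print(split_key_path("global"))     # ['global']
```

Because the prefix and separator are plain parameters, a different convention such as a "#" prefix with "/" as separator only needs different arguments.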
Now let’s continue with the example.
">facets>0" has three children: "date_histogram", "global" and, again, a deep linear path, ">facet_filter>fquery>query>filtered".
">facet_filter>fquery>query>filtered" has two children, ">query>query_string>query" and ">filter>bool>must".
">filter>bool>must" is an array of deep linear objects. This time, however, I decided to write the paths out:
{
  ">range>@timestamp>from": "2014-02-11T02:00:00.000Z",
  ">range>@timestamp>to": "2014-02-12T02:00:00.000Z"
}
Rather than keeping the branching structure:
{
  ">range>@timestamp": {
    "from": "2014-02-11T02:00:00.000Z",
    "to": "2014-02-12T02:00:00.000Z"
  }
}
As a matter of fact, I could write the whole object in a very shallow way:
{
  ">facets>0>date_histogram>field": "@timestamp",
  ">facets>0>date_histogram>interval": "10m",
  ">facets>0>global": true,
  ">facets>0>facet_filter>fquery>query>filtered>query>query_string>query": "_type:\"your server\" AND status:[500 TO 599]",
  ">facets>0>facet_filter>fquery>query>filtered>filter>bool>must": [
    {
      ">range>@timestamp>from": "2014-02-11T02:00:00.000Z",
      ">range>@timestamp>to": "2014-02-12T02:00:00.000Z"
    }
  ]
}
The only place where we see depth is where a property holds an array. Theoretically, I could introduce a key path encoding for arrays as well, but I like to keep things simple 😉.
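Producing the shallow form from a regular nested object could look roughly like the Python sketch below. It is a minimal illustration under stated assumptions: flatten and _leaves are hypothetical names, > is assumed as both prefix and separator, and no original key is assumed to contain > already. Arrays are kept as arrays and each element is flattened on its own.

```python
def _leaves(obj, path):
    """Yield (path_parts, value) for every leaf below obj.

    A leaf is anything that is not a non-empty dict: scalars, lists, empty dicts.
    """
    for key, child in obj.items():
        parts = path + [key]
        if isinstance(child, dict) and child:
            yield from _leaves(child, parts)
        else:
            yield parts, child


def flatten(value, prefix=">", separator=">"):
    """Collapse nested objects into key paths; lists are flattened element by element."""
    if isinstance(value, list):
        return [flatten(item, prefix, separator) for item in value]
    if not isinstance(value, dict):
        return value
    flat = {}
    for parts, leaf in _leaves(value, []):
        # Keys that sit directly at this level stay ordinary keys.
        key = parts[0] if len(parts) == 1 else prefix + separator.join(parts)
        flat[key] = flatten(leaf, prefix, separator)
    return flat


nested = {"facets": {"0": {"date_histogram": {"field": "@timestamp", "interval": "10m"},
                           "global": True}}}
print(flatten(nested))
# {'>facets>0>date_histogram>field': '@timestamp', '>facets>0>date_histogram>interval': '10m', '>facets>0>global': True}
```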
What’s next?
Extend the JSON standard? Change all the parsers in all languages?
No! Let’s first just write a tool which can take JSON with key paths and return expanded JSON.
Who is going to take the time to write this tool? Well, yours truly already did.
Send a POST request to this URL:
https://x8eep29n1a.execute-api.eu-central-1.amazonaws.com/public
The body of the request should contain your JSON with key paths, and the service will return the expanded JSON in the response body.
For example, type this in your terminal:
curl --header "Content-Type: application/json" --request POST --data '{">a>b>c":"hello"}' https://x8eep29n1a.execute-api.eu-central-1.amazonaws.com/public
Which will do the following transformation:
{">a>b>c":"hello"} => {"a":{"b":{"c":"hello"}}}
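For experimenting locally, the same transformation can be sketched in Python. This is not the code behind the service above: expand and _merge are hypothetical names, > is assumed as both prefix and separator, and raising an error when a key path collides with a non-object value is an assumption about how conflicts should be handled. Key paths that share a prefix are merged into the same intermediate objects, which is what the examples in this post rely on.

```python
import json


def _merge(target, parts, value):
    """Store value under the nested path `parts` inside target, merging objects."""
    for part in parts[:-1]:
        node = target.setdefault(part, {})
        if not isinstance(node, dict):
            raise ValueError(f"key path runs into a non-object value at {part!r}")
        target = node
    last = parts[-1]
    if isinstance(value, dict) and isinstance(target.get(last), dict):
        # Both sides are objects: merge key by key instead of overwriting.
        for key, child in value.items():
            _merge(target[last], [key], child)
    elif last in target:
        raise ValueError(f"conflicting values for key {last!r}")
    else:
        target[last] = value


def expand(value, prefix=">", separator=">"):
    """Return a copy of value with every key path expanded into nested objects."""
    if isinstance(value, list):
        return [expand(item, prefix, separator) for item in value]
    if not isinstance(value, dict):
        return value
    result = {}
    for key, child in value.items():
        parts = key[len(prefix):].split(separator) if key.startswith(prefix) else [key]
        _merge(result, parts, expand(child, prefix, separator))
    return result


expanded = expand(json.loads('{">a>b>c":"hello"}'))
print(json.dumps(expanded, separators=(",", ":")))  # {"a":{"b":{"c":"hello"}}}
```

With this sketch, the written-out form and the branching form of the range example shown earlier expand to the same object.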
That’s it. Just try it out 😉.
If you find JSON key paths useful for your day-to-day work, let me know and we can try to build something more sustainable out of this idea.