Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes Monday through Friday.


HPR4227: Introduction to jq - part 3

Hosted by Dave Morriss on 2024-10-15

Overview

In this episode we will continue looking at basic filters. Then we will start looking at the feature that makes jq very powerful, the ability to transform JSON from one form to another. In essence we can read and parse JSON and then construct an alternative form.

More basic filters

Array/String Slice: .[<number>:<number>]

This filter allows parts of JSON arrays or strings to be extracted.

The first number is the index of the first element (or character) to extract, counting from zero. The second number is the ending index, meaning "up to but not including" that element. If the first index is omitted the slice starts at the beginning of the string or array. If the second index is omitted the slice runs to the end of the string or array.

This example shows using an array and extracting part of it:

$ x="[$(seq -s, 1 10)]"
$ echo "$x"
[1,2,3,4,5,6,7,8,9,10]
$ echo "$x" | jq -c '.[3:6]'
[4,5,6]

Here we use the seq command to generate the numbers 1-10 separated by commas in a JSON array. Feeding this to jq on its standard input with the slice request '.[3:6]' results in a sub-array from element 3 (containing value 4), up to but not including element 6 (containing 7). Note that using the '-c' option generates compact output, as we discussed in the last episode.
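The omitted-index forms described above can be tried on the same array. This is a small sketch rather than something from the show; the array is written out literally so that it does not depend on seq:

```shell
# The same ten-element array as above, written as a literal
x='[1,2,3,4,5,6,7,8,9,10]'

# First index omitted: from the start, up to but not including element 3
echo "$x" | jq -c '.[:3]'
# [1,2,3]

# Second index omitted: from element 7 to the end
echo "$x" | jq -c '.[7:]'
# [8,9,10]
```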

For a string, the idea is similar, as in:

$ echo '"Hacker Public Radio"' | jq '.[7:10]'
"Pub"

Notice that we provide the JSON string quotes inside single quotes following echo. The filter '.[7:10]' starts from element 7 (letter "P") up to but not including element 10 (letter "l").

Both of the numbers may be negative, meaning that they are offsets from the end of the array or string.

So, using '.[-7:-4]' in the array example gives the same result as '.[3:6]', as do '.[3:-4]' and '.[-7:6]'. This example uses the x variable created earlier:

$ for f in '.[-7:-4]' '.[3:6]' '.[3:-4]' '.[-7:6]'; do
> echo "$x" | jq -c  "$f"
> done
[4,5,6]
[4,5,6]
[4,5,6]
[4,5,6]

Similarly, using '.[-12:-9]' gives the same result as '.[7:10]' when used with the string.

$ echo '"Hacker Public Radio"' | jq '.[-12:-9]'
"Pub"

As a point of interest, I wrote a little Bash loop to show the positive and negative offsets of the characters in the test string, just to help me visualise them. See footnote 1 for details.
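For readers who want to try this straight away, here is one possible sketch of such a loop. It is not the loop from the footnote: the function name show_offsets is hypothetical, and it is written in plain POSIX shell using cut to pick out each character.

```shell
# Hypothetical helper (not the footnote's loop): print each character of a
# string alongside its positive and negative offsets
show_offsets() {
    s=$1
    len=${#s}
    i=0
    while [ "$i" -lt "$len" ]; do
        # cut counts characters from 1, whereas jq offsets count from 0
        ch=$(printf '%s' "$s" | cut -c "$((i + 1))")
        # Columns: positive offset, negative offset, character
        printf '%3d %4d  %s\n' "$i" "$((i - len))" "$ch"
        i=$((i + 1))
    done
}

show_offsets 'Hacker Public Radio'
```

In the output the letter "P" appears against offsets 7 and -12, which matches the two slices '.[7:10]' and '.[-12:-9]' used above.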

Finally, here is how to get the last character of the example string using positive and negative offsets:

$ echo '"Hacker Public Radio"' | jq '.[18:]'
"o"
$ echo '"Hacker Public Radio"' | jq '.[-1:]'
"o"

Array/Object Value Iterator: .[]

This filter generates values from iterating through an array or an object. It is similar to the .[index] syntax we have already seen, but it returns all of the array elements:

$ arr='["Kohinoor","plastered","downloadable"]'
$ echo "$arr" | jq '.[]'
"Kohinoor"
"plastered"
"downloadable"

The strings in the array are returned separately, not as an array. This is because this is an iterator, and its output can be linked to other filters.

It can also be used to iterate over values in an object:

$ obj='{"name": "Hacker Public Radio", "type": "Podcast"}'
$ echo "$obj" | jq '.[]'
"Hacker Public Radio"
"Podcast"

This iterator does not work on other data types, just arrays and objects.

An alternative iterator .[]? exists which ignores errors:

$ echo "true" | jq '.[]'
jq: error (at <stdin>:1): Cannot iterate over boolean (true)

Ignoring errors (the error is suppressed and nothing is output):

$ echo "true" | jq '.[]?'
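The difference between the two iterators also shows up in the exit status of jq, which matters when it is used in scripts. A minimal sketch, where the variable names plain_status and optional_status are just illustrative:

```shell
# '.[]' cannot iterate over a boolean, so jq reports an error and
# exits with a non-zero status (the message is discarded here)
plain_status=0
echo 'true' | jq '.[]' 2>/dev/null || plain_status=$?

# '.[]?' suppresses the error: nothing is printed and the status is zero
optional_status=0
echo 'true' | jq '.[]?' || optional_status=$?

echo "plain: $plain_status, optional: $optional_status"
```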

Using multiple filters

There are two operators that can be placed between filters to combine their effects: the comma (',') and the pipe ('|').

Comma operator

The comma (',') operator allows you to chain together multiple filters. As we already know, the jq program feeds the input it receives on standard input or from a file into whatever filter it is given. So far we have only seen a single filter being used.

With the comma operator the input to jq is fed to all of the filters separated by commas in left to right order. The result is a concatenation of the output of all of these filters.

For example, if we take the output from the HPR stats page which was mentioned in part 1 of this series of shows, and store it in a file called stats.json we can view two separate parts of the JSON like this:

$ curl -s https://hub.hackerpublicradio.org/stats.json -O

$ jq '.shows , .queue' stats.json
{
  "total": 4756,
  "twat": 300,
  "hpr": 4456,
  "duration": 7640311,
  "human_duration": "0 Years, 2 months, 29 days, 10 hours, 18 minutes and 31 seconds"
}
{
  "number_future_hosts": 6,
  "number_future_shows": 18,
  "unprocessed_comments": 0,
  "submitted_shows": 0,
  "shows_in_workflow": 51,
  "reserve": 20
}

This first applies the filter .shows (an object identifier-index filter, see part 2), which returns the object stored under that key. It then applies the filter .queue, which returns the object stored under the "queue" key. Both filters receive the same input.
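To see the left to right ordering without downloading anything, here is a sketch using a cut-down sample. The sample is hand-made in the shape of stats.json and keeps only one key from each object:

```shell
# Hypothetical cut-down sample in the shape of stats.json
sample='{"shows":{"total":4756},"queue":{"reserve":20}}'

# Both filters see the same input; results appear in filter order
echo "$sample" | jq -c '.shows , .queue'
# {"total":4756}
# {"reserve":20}

# Swapping the filters swaps the order of the output
echo "$sample" | jq -c '.queue , .shows'
# {"reserve":20}
# {"total":4756}
```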

Pipe operator

The pipe ('|') operator combines filters by feeding the output of the first (left-most) filter of a pair into the second (right-most) filter of a pair. This is analogous to the way the same symbol works in the Unix shell.

For example, if we extract the 'shows' object from stats.json, we can then extract the value of the 'total' key as follows:

$ jq '.shows | .total' stats.json
4756

Interestingly, chaining two object identifier-index filters gives the same output:

$ jq '.shows.total' stats.json
4756

(Note: to answer the question in the audio, the two filters shown can also be written as '.shows .total' with intervening spaces.)

We will see the pipe operator being used in many instances in upcoming episodes.
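The pipe becomes more useful when the left-hand filter is the iterator, because every value it generates is fed to the right-hand filter in turn. A small sketch, using a made-up array of objects (the names and numbers are invented for illustration):

```shell
# Hypothetical data: an array of two small objects
hosts='[{"name":"Ann","shows":3},{"name":"Bob","shows":5}]'

# The iterator produces each object; the pipe passes each one to .name
echo "$hosts" | jq '.[] | .name'
# "Ann"
# "Bob"

# An array index on the left selects just one object to pass on
echo "$hosts" | jq '.[1] | .shows'
# 5
```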

Parentheses

It is possible to use parentheses in filter expressions in a similar way to using them in arithmetic, where they group parts together and can change the normal order of operations. They can be used in other contexts too. The example is a simple arithmetic one:

$ jq '.shows.total + 2 / 2' stats.json
4757
$ jq '(.shows.total + 2) / 2' stats.json
2379
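As an example of one of the other contexts, parentheses can change how the comma and pipe operators group. The pipe has lower precedence than the comma, so without parentheses everything to the left of the pipe is treated as one unit. This sketch uses a made-up object to show the difference:

```shell
# Hypothetical data: two objects, each with a key "x"
pair='{"a":{"x":1},"b":{"x":2}}'

# Grouped as (.a , .b) | .x, so .x is applied to both objects
echo "$pair" | jq -c '.a , .b | .x'
# 1
# 2

# Parentheses limit the pipe to .b, so .a is returned whole
echo "$pair" | jq -c '.a , (.b | .x)'
# {"x":1}
# 2
```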

Examples

Finding country data #1

Here we are using a file called countries.json obtained from the GitHub project listed below. This file is around 39,000 lines long so it is not being distributed with the show. However, it's quite interesting and you are encouraged to grab a copy and experiment with it.

I will show ways in which the structure can be examined and reported with jq in a later show, but for now I will show an example of extracting data:

$ jq '.[42] | .name.common , .capital.[]' countries.json
"Switzerland"
"Bern"
  • The file contains an array of country objects; the one with index 42 is Switzerland.
  • The name of the country is in an object called "name", with the common name in a keyed field called "common", thus the filter .name.common.
  • The country object has a key called "capital" holding an array containing the name (or names) of the capital city (or cities). The filter .capital.[] obtains and displays the contents of the array.
  • Note that we used a comma operator between the filters. Because the pipe has lower precedence than the comma, both .name.common and .capital.[] receive the country object selected by .[42].
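If you do not have countries.json to hand, the same filter can be tried on a tiny hand-made array. This is only a sketch: the sample below assumes the same "name" and "capital" layout described above and is not copied from the real file. It writes the iterator as .capital[] without the extra dot, a form which I believe older jq releases also accept.

```shell
# Hand-made sample in the assumed shape of countries.json (not real file contents)
countries='[
  {"name": {"common": "Austria"}, "capital": ["Vienna"]},
  {"name": {"common": "South Africa"},
   "capital": ["Pretoria", "Bloemfontein", "Cape Town"]}
]'

# Select the second country, then output its common name and every capital
echo "$countries" | jq '.[1] | .name.common , .capital[]'
# "South Africa"
# "Pretoria"
# "Bloemfontein"
# "Cape Town"
```

A country with several capitals shows why the iterator is used here: each element of the array is output as a separate value.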

Finding country data #2

Here is another search of the countries.json file, this time looking at the languages spoken. There is an object called "languages" which contains abbreviated language names as keys and full names as the values:

$ jq '.[42] | .name.common , .capital.[] , .languages' countries.json
"Switzerland"
"Bern"
{
  "fra": "French",
  "gsw": "Swiss German",
  "ita": "Italian",
  "roh": "Romansh"
}
  • Using the filter .languages we get the whole object. Adding the iterator, as .languages.[], gives just the values:
$ jq '.[42] | .name.common , .capital.[] , .languages.[]' countries.json
"Switzerland"
"Bern"
"French"
"Swiss German"
"Italian"
"Romansh"
  • This has some shortcomings: nothing in the output says which value is the name, the capital or a language. We need the construction capabilities of jq to generate more meaningful output.

Next episode

In the next episode we will look at construction - how new JSON output data can be generated from input data.

Copyright Information

Unless otherwise stated, our shows are released under a Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.

The HPR Website Design is released to the Public Domain.