QueryPic Exploring digitised newspapers from Australia & New Zealand

About QueryPic

What is it for?

QueryPic provides a simple visualisation of a set of search results. Instead of just presenting you with a list of matches, it plots the results over time as a familiar line graph. Each point on the graph represents the number of articles in that year matching your search query. You can even combine multiple queries to compare results.

It’s simple, but it’s also surprisingly powerful. QueryPic lets you see patterns and trends that normal search interfaces hide. It helps you frame your research questions – to survey the territory and decide where to dig.

What data does it use?

QueryPic searches digitised newspapers from Australia and New Zealand published online by Trove and Papers Past. The data is accessed through APIs (Application Programming Interfaces) provided by Trove and DigitalNZ.

These are wonderfully rich resources, but if you’re going to interpret QueryPic results you really need to think about their limitations. What titles have been digitised? What sort of gaps remain? What is the quality of the text extracted by OCR? How might factors such as these affect your results?

And then there’s the question of how search actually works in the two databases. How ‘fuzzy’ are the matches? How do they handle hyphenated words? I’d suggest you become familiar with the help pages of both systems.

QueryPic lets you pursue your hunches, but what it creates are sketches, not arguments. You have to think critically about the data and how you’re accessing it.

Creating a QueryPic

Keyword searches

The easiest way to make a QueryPic is to type a word or phrase in the keywords search box. To search for a phrase, enclose it in double quotation marks. If you submit multiple keywords, only articles that contain all the words will be matched.

You can also use standard boolean operators – like AND, OR and NOT – to build more complex searches. Have a look at the help pages of Trove and DigitalNZ for more details.

What actually happens when you perform a keyword search?

As I said, it’s important to try to understand what’s going on behind the scenes when you search, so here’s the blow-by-blow description.

  1. You enter your keywords and press ‘show’.
  2. QueryPic examines the keywords string, looking for signs that it’s a complex query. These signs include:
    • double quotes
    • any of the words AND, OR, NOT
    • any of the following punctuation marks: brackets, colon, minus sign, squiggly thing (~)
  3. If any of the signs are present QueryPic leaves the keywords string alone, assuming that you know what you’re doing.
  4. If there are no signs, QueryPic treats the string as a series of keywords. It splits words at spaces and then connects them back together using the ‘AND’ operator. So if you enter ‘cat dog’, QueryPic turns it into ‘cat AND dog’. Ok, this isn’t strictly necessary as, by default, both Trove and DigitalNZ will treat keywords as if they were ‘ANDed’ together. But I thought it was good to be explicit, particularly as default behaviours can sometimes change.
  5. QueryPic sets the date range for the search at 1803–1954 for Trove and 1839–1945 for Papers Past. Yes, there is some post-1954 content in Trove, but it can cause odd results. If you want control over the date range, use the query url option.
  6. The query string is then sent off to the appropriate API. It’s important to note that, by default, both Trove and DigitalNZ apply some degree of fuzzy matching to your search terms.
  7. For each year in the date range, QueryPic extracts two values: the number of articles matching your query and the total number of articles. It then calculates the proportion of articles matching your query by just dividing one by the other.
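
The steps above can be sketched in a few lines of Python. This is only an illustration of the logic as described, not QueryPic's actual code: the function names, the exact pattern, and the case-sensitive treatment of AND, OR and NOT are all assumptions made for the example.

```python
from __future__ import annotations

import re

# Signs of a complex query, following the list above: double quotes,
# brackets, colon, minus sign, tilde, or the words AND / OR / NOT.
# Assumption: operator words only count when written in capitals.
COMPLEX_SIGNS = re.compile(r'["“”()\[\]:~-]|\b(?:AND|OR|NOT)\b')


def normalise_keywords(keywords: str) -> str:
    """Leave complex queries alone; otherwise join the words with AND."""
    if COMPLEX_SIGNS.search(keywords):
        return keywords
    return " AND ".join(keywords.split())


def proportions(matches: dict[int, int], totals: dict[int, int]) -> dict[int, float]:
    """For each year, divide the matching articles by the total articles."""
    return {
        year: (matches.get(year, 0) / total if total else 0.0)
        for year, total in totals.items()
    }
```

With this sketch, normalise_keywords("cat dog") gives "cat AND dog", while a hyphenated word such as "well-being" is left untouched because the hyphen looks like a minus sign. Years with no articles at all are given a proportion of zero rather than causing a division error.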

Using query urls

While you can build fairly complex queries using the keywords option, you’ll probably find it easier to build and test your query using the existing search interfaces to Trove and Papers Past. You’ll also be able to apply custom limits, such as date ranges.

The procedure is fairly simple. In Trove head to the ‘Advanced search’ page and start building your query. Alternatively, you can use the basic interface and filter your search by selecting facets. See the help pages for more information.

QueryPic recognises the following facets (or limits) in Trove searches:

  • date ranges (years only)
  • newspaper titles
  • article categories
  • word length
  • illustrated (yes or no)

Note that the option to search only certain parts of articles (such as the headings) is not currently supported by the Trove API, so QueryPic can’t apply it.

In DigitalNZ you can use the ‘filters’ option to apply a date range.

You can, of course, keep testing and tweaking until it looks like you’re getting the results you want (or none of the results you don’t want). Then it’s just a matter of copying the url in your browser’s address box, selecting the ‘query url’ tab in QueryPic and pasting in the url.

When you click ‘Show’, QueryPic will parse the url into its components and translate them into the form required by the APIs. As with the keyword search, it will gather data for each year and calculate proportions.
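
To give a rough idea of what parsing a url into its components involves, here is a minimal Python sketch using the standard library. The example url and the parameter names (q, dateFrom, dateTo) are hypothetical and chosen only for illustration; the real Trove and DigitalNZ urls may use different names, and QueryPic's own translation step is not shown.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlsplit


def parse_query_url(url: str) -> tuple[str, dict[str, list[str]]]:
    """Split a search url into its host and its query parameters."""
    parts = urlsplit(url)
    return parts.netloc, parse_qs(parts.query)


def year_range(params: dict[str, list[str]]) -> tuple[int, int] | None:
    """Pull out a start and end year, if both are present.

    'dateFrom' and 'dateTo' are hypothetical parameter names.
    """
    try:
        return int(params["dateFrom"][0]), int(params["dateTo"][0])
    except (KeyError, IndexError, ValueError):
        return None


# A made-up url, not a real Trove or DigitalNZ address.
host, params = parse_query_url(
    "https://search.example.org/result?q=cat+AND+dog&dateFrom=1900&dateTo=1910"
)
```

Here host identifies which service the url came from, params["q"] holds the decoded search string, and year_range(params) returns the custom date limits, or None if the url doesn't include them.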

To make using the query url option even easier, you can install a bookmarklet that connects Trove and DigitalNZ directly to QueryPic.

Using the bookmarklet

A bookmarklet is a little piece of javascript code disguised as a browser bookmark. When you click on the bookmark, the code runs.

The QueryPic bookmarklet is designed to copy a query url from Trove or DigitalNZ and feed it directly to QueryPic – no copying and pasting required!

To install the bookmarklet simply drag this link – QueryPic – to your browser’s bookmarks bar. Different browsers work slightly differently, so if this doesn’t seem to work, see ‘Installing the bookmarklet’ for more detailed instructions.

Once the bookmarklet is installed just construct your search in Trove or DigitalNZ. When you’re happy with it click on the bookmarklet. That’s it! QueryPic will open automatically and start loading data.

Copying an existing QueryPic

If you come across an interesting QueryPic that you’d like to use as a basis for your own comparison, you can use it to generate a new graph.

  1. Go to the existing QueryPic and click on the tab in the right-hand side bar corresponding to the query you’re interested in – they’ll be labelled ‘Query 1’, ‘Query 2’ etc.
  2. Click on the ‘Create new QP’ button.
  3. The ‘Create’ page will open and QueryPic will start retrieving data. Note that the contents of Trove and DigitalNZ can change as more newspapers and articles are added and the OCR text is improved. That means the new graph you create might not be exactly the same as the one you’re basing it on.
  4. Once the graph is complete, you can build a new comparison by combining it with another query.

You can also regenerate a complete QueryPic.

Combining queries

One of the most useful aspects of QueryPic is its ability to compare queries – you can add as many lines as you like to a single QueryPic.

Combining queries is easy. Once you’ve added your first query, simply add another keyword search or query url. Then repeat until you’re done.

Note that you can’t add queries to a saved QueryPic, so wait until you’ve added all your queries before saving.

Also, you can’t use the bookmarklet to add further queries, so you’ll have to resort to copying and pasting the urls.

Regenerating QueryPics

The contents of Trove and Papers Past will change over time. Additional newspapers and articles will be added, and corrections will be made to the text. A query you ran a year ago might produce a different result today. For this reason every saved QueryPic is date-stamped. You should include this date in any citation.

If you want to track changes over time you can easily regenerate a saved QueryPic. Just click on the big blue ‘Regenerate this QP’ button. The ‘Create’ page will open and QueryPic will retrieve a new dataset for all of the queries in the original graph. You can then save the new version.

Saving your QueryPic

Once you’re happy with your QueryPic you’ll want to save it. It’s easy – just click the big blue ‘Save’ button. A form will pop up and ask you for a few details. There are only two required fields:

  • your email
  • a title for your QueryPic

Note that your email will not be displayed or shared.

The optional fields are:

  • your name (or your avatar’s name)
  • your web page (it could be a personal home page or a project page)
  • a description of your QueryPic

Just fill in the fields and click on ‘Save’. Your details will be added to the database and you’ll be redirected to a freshly-minted, persistent url for your saved QueryPic. You can cite or share this url – tell the world!

Sharing and using QueryPics

Exploring saved QueryPics

Saved QueryPics are stored in a database. Just visit the explore page to browse. Limit the number of results displayed by entering a keyword in the filter box. Click on the arrows in the column headings to change the order of the results.

Sharing QueryPics

Each saved QueryPic is assigned a persistent url that can be cited or shared. You can find the url under the graph, or in your browser’s address bar.

A number of standard social network buttons have been included for easy sharing.

Previewing articles

QueryPic’s graphs give you a new perspective on your newspaper searches, but eventually you’ll want to go back to the articles themselves – to find out what’s actually lurking under each point on the graph.

To retrieve a list of the first twenty matching articles for a particular year, just click on the corresponding point on the graph. QueryPic will once again fire off a request to the API and return the articles ordered by relevance. Click on an article to open it in Trove or Papers Past. To dig deeper, click on the ‘View more in…’ button at the bottom of the list of articles to view all the matching results.

Viewing query metadata

To view a summary of a query just click on the appropriate tab in the right-hand side bar – they’re labelled ‘Query 1’, ‘Query 2’ etc.

The query summary displays some basic metadata about the query, including:

  • the source (either Australia or New Zealand)
  • the search keywords
  • the date range
  • any other limits (such as newspaper titles)

There are also two buttons. One opens up that query in either Trove or DigitalNZ. The other uses the query to generate a new QueryPic.

Features of the graph

The graphs, created by the Highcharts javascript library, include a number of clever features.

By clicking and dragging you can zoom into any section of the graph. Click the ‘Reset zoom’ button to return to the original view.

Click on a label in the legend to temporarily hide a query. This can be useful if you’re comparing multiple queries. Note that when you hide a query the vertical scale is reset to suit the remaining data. This makes it easier to study queries with fewer results.

There are two buttons in the top right-hand corner of every graph. One prints the graph, while the other lets you download the graph as an image in a choice of file formats.

Hints and tips

Fuzzy searches

Fuzzy searches expand the number of results returned by truncating or stemming your keywords to match a variety of possible endings. So a search for ‘smile’ would return ‘smiles’, ‘smiling’ and ‘smiley’.
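
To make the idea concrete, here is a deliberately crude Python sketch of stemming. It is a toy, not the method Trove or DigitalNZ actually use: the suffix list and the function names are assumptions invented for the example.

```python
from __future__ import annotations

# A toy list of endings to strip, longest first. Real search engines
# use far more careful stemming rules than this.
SUFFIXES = ("ing", "ey", "es", "e", "s", "y")


def crude_stem(word: str) -> str:
    """Strip one common ending, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def fuzzy_matches(keyword: str, candidates: list[str]) -> list[str]:
    """Return the candidates that share a stem with the keyword."""
    stem = crude_stem(keyword)
    return [word for word in candidates if crude_stem(word) == stem]
```

In this sketch ‘smile’, ‘smiles’, ‘smiling’ and ‘smiley’ all reduce to the same stem and so match one another, while ‘smith’ does not.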

This is great if you’re not exactly sure what you’re looking for – it maximises your chances of discovery. But if you want to track the occurrence of a particular word or phrase it can be rather annoying.

Both Trove and DigitalNZ include some degree of fuzziness by default. In Trove, the situation is further complicated by the way the indexing handles hyphenated words.

Fuzzy searching can be switched off in Trove by using the ‘fulltext:’ modifier, but it can take a fair bit of trial and error to find out what actually works. The Trove Forum is a useful source of guidance and tips.

Another trap is that, by default, Trove searches include user-contributed tags and comments. So if you search for ‘World War I’ you’ll get some matches from the period 1914–18 (think about it). This is because some diligent user has added the tag ‘World War I’ to a number of articles from the period. There’s no easy way of avoiding these sorts of anachronisms. Once you become aware of such cases you can explicitly exclude matching tags, but it’s not a wholly satisfactory solution.

The only answer to the problems of fuzzy searching is to experiment with your queries and remain critical of the results.

Installing the bookmarklet

  1. Make sure your browser’s bookmarks toolbar is visible.

    Chrome: View -> Always show bookmarks bar
    Firefox: View -> Toolbars -> Bookmarks toolbar
    Safari: View -> Show bookmarks bar
    IE: Tools -> Toolbars -> Favourites

  2. For Chrome, Firefox or Safari, simply click and drag this link – QueryPic – to your bookmarks bar.

    For IE:

    1. Right click on the link above.
    2. Choose ‘Add to favourites’.
    3. If a security warning pops up, click on ‘Yes’ to proceed.
    4. Select location Favourites -> Favourites bar.
    5. Click ‘Add’.
