It’s January 1, the day each year when our minds turn to newly released Cabinet records from @naagovau. But while the media focuses on the records that have been made open, I’ll be spending the day looking at those that were closed. What weren’t you allowed to see in 2020?
This will be a *slow* thread, as I gradually pull the data together and document things. But this year I’ll be sharing all the data and code through the #GLAMWorkbench, so stay tuned...
This’ll be the sixth consecutive year in which I’ve harvested all NAA files with an access status of ‘closed’ on or about 1 January. For some background and past analyses, see my @insidestorymag article from 2018: insidestory.org.au/withheld-pendi…
The code that I’m using to harvest all the ‘closed’ files is here: glam-workbench.github.io/recordsearch/#… It scrapes the data from RecordSearch, the NAA’s online database. It’s one of a number of handy RecordSearch tools in the #GLAMWorkbench.
Ok, so the harvest is done and, as of today, there are 11,140 files in the NAA’s RecordSearch database with an access status of ‘closed’. This is down from 11,867 a year ago.
A quick recap for new players…

All Commonwealth government records are supposed to be opened to the public after 20 years (30 years for Cabinet documents). However, there are exemptions laid out in Section 33 of the Archives Act. www8.austlii.edu.au/cgi-bin/viewdo…
Before being released to the public, records go through a process known as ‘access examination’ to check them against the exemption categories (these are things like national security and privacy). There’s more info about the process here: naa.gov.au/help-your-rese…
After examination, records are assigned an access status:

* Open - Yay! Most records are available for public access.
* Open with exception – Some have sensitive material removed, but are otherwise open.
* Closed – A small number are completely closed to the public.
It’s these ‘Closed’ files that I’m looking at – files that have been through the examination process and have been withheld from the public.

(Actually, as we’ll see, some of the closed files are actually part way through the examination process…)
The reasons why files have been closed are recorded in RecordSearch. Here’s the number of closed files in today’s dataset citing each reason.
Aside – multiple reasons can be cited by an individual file, so the number of cited reasons will be greater than the number of closed files.
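
For anyone playing along at home, the counting is straightforward in Pandas. A minimal sketch, assuming the harvested CSV has a ‘reasons’ column with pipe-separated values (check filenames and column names against the actual dataset on GitHub):

```python
import pandas as pd

# Load the harvested 'closed' files -- filename and column names are assumptions,
# adjust to match the CSVs in the GLAM-Workbench repository.
df = pd.read_csv("closed-20210101.csv")

# A file can cite multiple reasons, so split the field into lists and
# 'explode' so each reason gets its own row before counting.
reason_counts = (
    df["reasons"]
    .str.split("|")
    .explode()
    .str.strip()
    .value_counts()
)
print(reason_counts)
```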
The reasons starting with ’33’ refer to specific parts of Section 33 of the Archives Act. They’re also listed on the NAA website: naa.gov.au/help-your-rese…

The two most common: 33(1)(g) relates to individual privacy, while 33(1)(a) is the national security catch-all.
‘Withheld pending adv’ is not really a reason, more of a work-in-progress marker. During access examination, some files are referred back to the relevant Govt agency for advice, and are marked as ‘closed’ while the NAA waits for a response. More about that later...
Now let’s have a look at how the ‘closed’ files are distributed across series. There are 686 individual series represented in the dataset. (From memory there’s about 60,000 series in RecordSearch.)
What series have the most ‘closed’ files?

* K60 are repatriation files from DVA
* A1838 is a general correspondence series from DFAT

Can you guess why?
To find out, let’s add the reasons into that last chart.
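
That chart is essentially a cross-tabulation of series against reason. Here’s a rough sketch (same filename and column-name assumptions as the earlier snippet):

```python
import pandas as pd

df = pd.read_csv("closed-20210101.csv")  # assumed filename

# Explode the multi-valued reasons column, then cross-tabulate series against reason
exploded = df.assign(reason=df["reasons"].str.split("|")).explode("reason")
top_series = df["series"].value_counts().head(10).index
crosstab = (
    pd.crosstab(exploded["series"], exploded["reason"])
    .reindex(top_series, fill_value=0)
)
print(crosstab)
```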
From the above chart you can see that most of the K60 (DVA repatriation) files are closed under section 33(1)(g) of the Archives Act, which relates to privacy. Fair enough.
On the other hand, most of the files from A1838 (DFAT) are ‘Withheld pending adv’. The delays and backlogs in getting access to files from A1838 are well known. I wrote about them in my Inside Story article: insidestory.org.au/withheld-pendi…
You might also notice A6122. That’s a series of ASIO surveillance files. As you can see most are closed on grounds of national security (33(1)(a)). And yeah, there’s some irony in the fact that public access to some ASIO surveillance files is closed due to privacy concerns… 🧐🤐
While we’re at it, here’s the series that most frequently cite the national security exemption (33(1)(a)).

* A6122 – ASIO surveillance files
* AWM54 – WW2 records from Defence
* A1838 – DFAT correspondence files
* A1209 – PMs Dept correspondence
The next series on that list citing 33(1)(a) is interesting. The whole of C5326 is closed (42 files). Why? It's all about how to blow up bridges. recordsearch.naa.gov.au/scripts/AutoSe…
So that’s the overview… I’m going to have a bit of a break, clean up some code, and get stuff online.

Later today (or tomorrow) I’ll look at changes across all 6 annual harvests. I’ll also compare 2020 to 2019 to see what changed in the last 12 months.

Here’s a taste… 🤓
Just inserting a reminder here that if you ever want to make persistent links to things in RecordSearch (avoiding nasty session timeout errors), you can use this simple tool: recordsearch-links.glitch.me
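
If you’d rather construct the links yourself, as far as I can tell they just point at the item detail page keyed by barcode. Treat the URL template in this sketch as an assumption and check it against the tool above:

```python
def persistent_link(barcode):
    # Item detail page keyed by barcode -- this URL pattern is an assumption,
    # verify it against the recordsearch-links tool before relying on it.
    return (
        "https://recordsearch.naa.gov.au/SearchNRetrieve/Interface/"
        f"DetailsReports/ItemDetail.aspx?Barcode={barcode}"
    )

print(persistent_link(12345678))
```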
One more thing I intended to add to the overview relates to the ages of these files. There's no automatic re-assessment or time limit on 'closed' files. They stay closed until someone asks for them to be re-examined. So some are very old. Here’s the ages of files charted.
The age here is calculated by subtracting the contents end date from today’s date.
If you look at the proportions, 25% of the closed files are more than 63 years old.
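
The age calculation is just date arithmetic. A sketch, assuming the contents end date parses cleanly (RecordSearch dates can be messy, so treat the results as approximate):

```python
import pandas as pd

# Filename and column name are assumptions
df = pd.read_csv("closed-20210101.csv", parse_dates=["contents_end_date"])

# Age = harvest date minus the end date of the file's contents
harvest_date = pd.Timestamp("2021-01-01")
df["age_years"] = (harvest_date - df["contents_end_date"]).dt.days / 365.25

print(df["age_years"].describe())      # mean, quartiles, etc.
print(df["age_years"].quantile(0.75))  # a quarter of the files are older than this
```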
If we look just at the national security exemptions (33(1)(a)), we see that the mean age of closed files is 61 years, and 25% of the files are more than 71 years old. Are 70 year-old files still a risk to our security?
Thanks for all the interest in this! I'm currently examining changes in ‘closed’ files between 2019 and 2020, and I think I’ve finally got the numbers adding up! More details tomorrow...
For example, the 2019 harvest included 519 more files with the reason ‘Closed period’ than the 2020 harvest. 538 ‘closed period’ files dropped out of the harvest in 2020, and 19 were added – a difference of 519. But where did those 538 files go? Looks like…
Before I continue exploring ‘closed’ files in @naagovau, here’s some links to related stuff.

This is a conference paper I gave after my first ‘closed access’ experiments in 2016: discontents.com.au/closed-access/ It provides some useful context.
@naagovau And here’s a keynote to the @ausarchivists annual conference that covers the closed files as well as some experiments on finding & quantifying redactions in ASIO files: discontents.com.au/turning-the-in… (including some #redactionart!)
@naagovau @ausarchivists This chapter includes some analysis of how digitisation affects access to NAA records, as well as reflecting on the nature of online access to cultural heritage collections (search interfaces lie!). timsherratt.org/blog/hacking-h…
Back in the thread, @rick_carmody asked what proportion of files are closed. It’s a tricky question to answer because I don’t have all the data. The NAA says that ‘less than 0.25%’ are totally withheld: naa.gov.au/help-your-rese… But 0.25% of what exactly?
@rick_carmody The NAA does include access examination statistics in its annual reports. I started compiling them here: timsherratt.org/research-noteb… But again, it’s not always clear what is being counted.
@rick_carmody I made an attempt to reconcile the access examination statistics with my harvests of closed files, but the results were inconsistent. timsherratt.org/research-noteb…
@rick_carmody A few years ago I also attempted to harvest data about every series in the NAA. That told me there were over 10 million items described in RecordSearch. So if about 11,000 are currently closed, that’s around 0.1%.
@rick_carmody Keep in mind though that an ‘item’ could be a file with 300 pages, a bound volume, a single photograph, etc.
@rick_carmody So the number of files completely withheld from public access is very small. This is as it should be – ‘open’ is the default, closed files are ‘exemptions’. But by examining ‘closed’ files we can learn more about how the access examination system works in practice.
And this is important. Access doesn’t just happen, it’s a historical process. I talked a bit about questions of governance and policy arising from an analysis of the data in my submission to the Tune Review of the NAA: naa.gov.au/sites/default/…
And BTW, the Tune Review of the NAA was apparently delivered to the Attorney-General a year ago. There has been no public release or statement: naa.gov.au/about-us/tune-…
Before I continue, a disclaimer – the data on ‘closed’ files I’m working with is scraped from RecordSearch annually. It’s a snapshot at a point in time, not a continuous record of access decisions. So some things will be missing or inconsistent. Please keep that in mind.
If you want the harvested data, it’s here on GitHub: github.com/GLAM-Workbench… There are harvests from each January, 2016–21, each providing a snapshot of the previous year.
I’ve also shared the notebook I was using yesterday to explore the 2020 dataset. It’s very much a WIP & needs more explanation and documentation, but it might help you follow what I’m doing. I'll continue to update/improve it. nbviewer.jupyter.org/github/GLAM-Wo… (Try hovering on charts!)
Here’s another notebook that aggregates data from all 6 annual harvests to look at how things have changed over time (also WIP etc). nbviewer.jupyter.org/github/GLAM-Wo…
The notebook includes this comparison of how often reasons for closing files have been cited in each annual harvest.
If you want to focus on a particular reason, just select it from the dropdown list at the bottom of the notebook. It’ll show how often that reason has been cited in each annual harvest.
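
Under the hood, that comparison is just a pivot of reason counts by harvest year. A rough sketch, assuming the annual harvest CSVs follow a ‘closed-YYYY0101.csv’ naming pattern (adjust to whatever the files are actually called):

```python
import pandas as pd

# Load the six annual harvests and tag each with its harvest year
frames = []
for year in range(2016, 2022):
    harvest = pd.read_csv(f"closed-{year}0101.csv")  # assumed filenames
    harvest["harvest_year"] = year
    frames.append(harvest)
all_harvests = pd.concat(frames)

# One row per cited reason, then rows = reason, columns = harvest year
exploded = all_harvests.assign(
    reason=all_harvests["reasons"].str.split("|")
).explode("reason")
counts = pd.crosstab(exploded["reason"], exploded["harvest_year"])
print(counts)
```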
You’ll notice a few major changes across the annual harvests. After my initial work in 2016, the NAA cleaned things up a bit and removed the ‘Pre access recorder’ reason which was a hangover from before the Archives Act.
Similarly, the number of ‘Closed period’ entries was reduced.
They’re the main reasons why the total number of ‘closed’ files dropped significantly between the 2015 and 2016 harvests.
BTW the ‘Pre access recorder’ files weren’t automatically opened, despite being old. Their status was set to ‘Not yet examined’, meaning that researchers had to submit access requests for them and start the examination process again.
The frequencies of most of the reasons defined under the Archives Act show small variations over time. Interestingly, 33(1)(f)(i), which exempts material that might prejudice a fair trial, wasn’t used until 2018. The closed files mostly relate to the David Eastman trial.
Researchers will be pleased to note that the number of ‘Withheld pending adv’ files continues to drop, after reaching a peak in 2017. But let’s put that in some context...
As I noted earlier, ‘Withheld pending adv’ is a marker applied to files during the access examination process when the NAA refers a file to the relevant agency for advice. So the files are sort of ‘temporarily closed’, while the agency mulls things over.
This can take *years*. So while they’re not officially ‘closed’, they’re effectively closed. This is the major reason why researchers are so frustrated by the access examination process. It’s broken. Read some of the other submissions to the Tune Review: naa.gov.au/about-us/tune-…
Yes, it’s good that the total number of ‘Withheld pending adv’ files is coming down, but there’s still over 3,000 files in the backlog!
I just did some quick calculations of ‘wait time’ for files that are currently in the ‘Withheld pending adv’ category by subtracting the access decision date from the current date. The mean wait time according to this method is *5 years*.
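
For the curious, the calculation is roughly this (filename and column names are assumptions):

```python
import pandas as pd

df = pd.read_csv("closed-20210101.csv", parse_dates=["access_decision_date"])

# Just the files currently flagged 'Withheld pending adv'
wpa = df[df["reasons"].str.contains("Withheld pending adv", na=False)]

# Wait time = harvest date minus the recorded access decision date
harvest_date = pd.Timestamp("2021-01-01")
wait_years = (harvest_date - wpa["access_decision_date"]).dt.days / 365.25
print(wait_years.mean())
```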
I’m not convinced that the access decision dates are always accurate or useful, so I wouldn’t want to claim too much for this analysis. But it does indicate the scale of the problem.
I was just wondering whether some of this might be because the ‘Withheld pending adv’ marker isn’t removed when the examination is complete. But I repeated the calculation with files that have no reason other than WPA and it’s much the same.
Repeating provisos here that I’m trying to figure out how an active system works by examining a snapshot frozen in time...
Also, one of the reasons why the WPA backlog is decreasing might be changes to the Archives Act that were introduced in 2018 to discourage researchers from making large numbers of access examination requests. www8.austlii.edu.au/cgi-bin/viewdo…
Putting new constraints on researchers because the access examination system is borked seems to be the wrong approach to me. How about making the agencies accountable for the delays they create? (See my Tune Review submission for more… naa.gov.au/sites/default/…)
I think I need a break, back later with some 2019/2020 comparisons — following the fate of ‘unclosed’ files...
ok, buckle up, let’s see if we can finish this thread!
What I’m going to do now is to compare the 2019 and 2020 harvests to see what changed. This should tell us which records were 'un-closed', which records were newly closed, and which records were modified.

* ‘un-closed’ because ‘open’ is only one of the possible destinations...
For people who want the tech details – I’m merging the datasets using Pandas, then dropping duplicates (on identifier & reasons) to get rid of unchanged entries. Then I can separate out duplicates on identifier to find where reasons changed.
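
In code, that merge-and-dedupe step looks something like this sketch (filenames and column names are assumptions; the real workings are in the notebooks linked above):

```python
import pandas as pd

# Each harvest is a snapshot taken on 1 January of the following year
h2019 = pd.read_csv("closed-20200101.csv")
h2020 = pd.read_csv("closed-20210101.csv")
h2019["harvest"] = 2019
h2020["harvest"] = 2020

combined = pd.concat([h2019, h2020])

# Drop rows identical on identifier + reasons in both harvests --
# whatever is left changed in some way between the two snapshots.
changed = combined.drop_duplicates(subset=["identifier", "reasons"], keep=False)

# Identifiers appearing twice are files that stayed closed but had their reasons changed;
# the rest either dropped out of, or were added to, the 'closed' list.
reasons_changed = changed[changed.duplicated(subset=["identifier"], keep=False)]
print(len(reasons_changed))  # rows come in pairs, one per harvest
```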
I ended up with 58 entries, but remember they’re in pairs by identifier – so that means there’s 29 files whose reason for being closed changed in 2020. I’ve listed the file details, then the reasons from 2019 and 2020 to show the changes.

Here’s a gist: gist.github.com/wragge/8126560…
Some of these are straightforward – where ‘Withheld pending adv’ has changed to ’33(1)(a)’, for example, it seems that the access examination process has finished with the result that the file has been closed on national security grounds.
You might notice that some of those files closed on national security grounds document the fire control systems in Parliament House...
I’m not sure what to make of the cases where the reasons are blank in the 2020 harvest. Just a glitch?
Interesting that some personal records from John Howard (Series M880) have had their reasons for closure changed by adding '33(1)(g)’ to 'Non Cwlth-depositor’. If they’re not Cwlth records, then the 33(1)(g) exemption shouldn’t apply, should it?
Perhaps it’s just a belts & braces approach given the reassessment of what constitutes a personal record forced by the Palace Letters case.
Ok, now let’s look at files that dropped out of the closed files harvest in 2020. I’m starting with my Pandas dataframe of differences, then dropping out duplicate barcodes (which we’ve just looked at), and limiting to 2019.
Then I’m looking each of these files up in RecordSearch to see what’s happened to them. I’m using the barcode in the first instance, but sometimes these change (eg when an access copy is made). If I can’t find the barcode, I search on the series & control symbol.
If I can find the file, I add its current access status, access decision date, and barcode (if changed) to my dataset. This means I can see what’s happened to them!

I’ve saved the results as a CSV file if you’d like to explore: github.com/GLAM-Workbench…
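
The lookup loop is roughly shaped like this. Note that `current_details` here is a hypothetical stand-in for whatever RecordSearch client or scraper you use, so this is a skeleton rather than working harvest code:

```python
import pandas as pd

def current_details(barcode, series, control_symbol):
    """Hypothetical helper: plug in a real RecordSearch lookup here.

    It should try the barcode first, fall back to series + control symbol,
    and return a dict like {"access_status": ..., "access_decision_date": ...,
    "barcode": ...}, or None if the item can't be found.
    """
    return None  # placeholder -- no real lookup happens in this sketch

dropped = pd.read_csv("dropped-2019-2020.csv")  # assumed filename & columns

results = []
for row in dropped.itertuples():
    found = current_details(row.identifier, row.series, row.control_symbol)
    if found:
        results.append({
            "identifier": row.identifier,
            "current_status": found["access_status"],
            "current_decision_date": found["access_decision_date"],
            "current_barcode": found["barcode"],
        })

pd.DataFrame(results).to_csv("closed-dropped-outcomes.csv", index=False)
```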
First of all, let’s look at the reasons these files were closed in the first place. You’ll see most were either ‘Withheld pending adv’ or ‘Closed period’.
To save you adding those numbers up, there are 1,293 files in the dataset. But where did they all go?

Let’s look at their current access status to find out. How many successfully fought their way to freedom?
If we convert that to percentages:

* 56% are now Open 🎉🍾
* 38% are now Open With Exceptions 🤨
* 4% are now Not Yet Examined 😯

A few I couldn’t find in RecordSearch, and 4 were moved to a new barcode but stayed closed.
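
(If you’re reproducing this, the percentages are just a normalised value count over the outcomes CSV — column name assumed:)

```python
import pandas as pd

outcomes = pd.read_csv("closed-dropped-outcomes.csv")  # the CSV shared above
pct = (outcomes["current_status"].value_counts(normalize=True) * 100).round(1)
print(pct)
```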
Of course, to complete these results we should add in those we looked at previously that stayed closed, but had their reasons changed from ‘Withheld pending adv’ to something else.
I’m not sure how a file goes from being ‘Closed' to ‘Not Yet Examined’. Looking at these it seems they were mostly ‘Withheld pending adv’, so maybe the request for access was withdrawn…??
Let’s now focus just on files that were ‘Withheld pending adv’ – what were their outcomes? There’s 623 of them, and their current status is…
Or as percentages:

* 64% are Open With Exceptions
* 29% are Open
* 7% are Not Yet Examined

One wasn’t found and one stayed closed (plus those we found above).
Remember that these files have been picked up in my ‘closed’ harvest because they were flagged during the access examination process as needing advice from agencies. They were seen as somehow ‘risky’. This is, I suppose, reflected in the fact that 64% have redactions or removals.
But conversely, we might point to the fact that almost a third ended up completely Open. Does this mean the risk assessment needs to be recalibrated? If fewer files were sent to agencies for advice, there would be fewer delays in the access examination process. Win/win.
How long did it take for these files to be access examined? If we subtract the original access decision date from the current access decision date, we should find out.
Hmm, in some cases the current date is earlier than the original date. Seems like these are mostly cases where the barcodes have also changed… In these cases I’ll subtract the original date from 1 January 2021 (the harvest date).
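
Here’s a sketch of that duration calculation, including the fallback (filename and column names are assumptions):

```python
import pandas as pd

df = pd.read_csv(
    "closed-dropped-outcomes.csv",                                   # assumed filename
    parse_dates=["access_decision_date", "current_decision_date"],   # assumed columns
)

harvest_date = pd.Timestamp("2021-01-01")
wait = df["current_decision_date"] - df["access_decision_date"]

# Where the current decision date is earlier than the original (mostly files
# moved to a new barcode), fall back to harvest date minus the original date.
wait = wait.where(wait > pd.Timedelta(0), harvest_date - df["access_decision_date"])

print((wait.dt.days / 365.25).mean())
```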
Here’s the breakdown…

There was a mean waiting period of 3.8 years!
The file that had to wait longest to be Open (with exception) was 'United Kingdom Joint Intelligence Committee Reports - General - File No 2 [3.5cm]’. It took 23 years.

recordsearch.naa.gov.au/scripts/AutoSe…
Again – I’m taking the access decision dates at face value, it’s possible this could be misleading, but I suspect the overall picture is fairly accurate.
I’m running out of steam, but before we look at the files that were newly closed last year, let’s celebrate a few who made it into the open. Go well brave records!

There’s 'Philippines - Relations with Pakistan’: recordsearch.naa.gov.au/scripts/AutoSe…
Also 'Fighter effectiveness - studies by ARL’: recordsearch.naa.gov.au/scripts/AutoSe…
And 'France - Disarmament - Nuclear Weapons Testing’: recordsearch.naa.gov.au/scripts/AutoSe…
For the full list of outcomes see the CSV file I shared earlier: github.com/GLAM-Workbench…
I’m spent, and you’ve probably had your fill of archives nerdery, so let’s have a quick peek at what was newly closed in 2020.
There are 566 files in my harvest that were given the access status of ‘closed’ in 2020. Here’s a CSV file with their details: github.com/GLAM-Workbench…
The most common series is A1838, and the most common reason is ‘Withheld Pending Adv’. It’s a familiar story…
To protect our national security in 2020, you were not allowed to see…
I’m off to have some dinner and watch Dr Who. Thanks for all the interest in this long thread.

In coming days, I'll clean up the notebooks, add more documentation and visualisations, and share it all through the #GLAMWorkbench: glam-workbench.github.io
If you’re interested in the sorts of things I do – exploring data from GLAM collections (Galleries, Libraries, Archives & Museums) – follow me here, or keep an eye on my updates feed: updates.timsherratt.org
You can also support me on Patreon if you want to: patreon.com/timsherratt It helps me pay some of the cloud hosting costs for my various projects...
