Thomas A. Fine 🇺🇸 @thomasafine
I'm continuing to work on Kavanaugh's calendar that his lawyers submitted into evidence. The longer I look the more questions I have. I have no smoking guns right now, but I want to go through some of the many issues that are troubling me.
First, background. PDF files are "container" file formats. They contain a collection of objects, arranged in a hierarchy, along with instructions on how to arrange those objects for display. The objects include images (e.g. JPG files), text, fonts, and vector graphics.
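To make the "container" idea concrete, here is a minimal sketch that scans raw PDF bytes for object headers and roughly labels each one. It is an illustration only: `list_objects`, `KINDS`, and the `SAMPLE` bytes are hypothetical names and data, not anything taken from the calendar file, and a naive scan like this will miss objects stored inside compressed object streams and can be fooled by binary stream data.

```python
import re

# Header of an indirect object, e.g. b"12 0 obj".
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")

# Rough classification rules, checked in order; the first match wins.
KINDS = [
    ("image", re.compile(rb"/Subtype\s*/Image\b")),
    ("font", re.compile(rb"/Type\s*/Font\b")),
    ("page", re.compile(rb"/Type\s*/Page\b")),
    ("metadata", re.compile(rb"/Type\s*/Metadata\b")),
]


def list_objects(pdf_bytes):
    """Return (object number, byte offset, kind) for each object header found."""
    results = []
    for header in OBJ_HEADER.finditer(pdf_bytes):
        end = pdf_bytes.find(b"endobj", header.end())
        body = pdf_bytes[header.end(): end if end != -1 else len(pdf_bytes)]
        kind = next((name for name, rule in KINDS if rule.search(body)), "other")
        results.append((int(header.group(1)), header.start(), kind))
    return results


# Hypothetical toy file: one page, one image, one font.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Page /Contents 4 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /XObject /Subtype /Image /Width 8 /Height 8 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Font /Subtype /Type3 >>\nendobj\n"
)

for number, offset, kind in list_objects(SAMPLE):
    print(number, offset, kind)
```

A simple scan would show one image object per page; a file like the one described here would show several images plus font objects on each page.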
Most scanned documents are stone simple. Each page contains one object and that object is an image of the entire page. This is not the format of Kavanaugh's calendar. His pages are each a collection of background images stitched together, along with text/graphics overlays.
This is why many people were suspicious from the start. However there are several software packages that convert images into searchable or editable documents. It's been suggested that this was done innocently, which remains possible.
The scanner in this case is allegedly a RICOH C4503 MP printer/scanner/copier, based on metadata embedded in the PDF. Based on its documentation this machine has an optional OCR unit which can embed text information.
As far as I can tell, this does not convert graphics to text; it simply tags text locations with their text values. RICOH has a different product, ESA transformer, which can make text editable (and would convert graphics into text overlays).
But as far as I can tell, ESA transformer is not available on the 4503 (or any xx03 series) multifunction copiers. It seems unlikely (but not proven) that the copier could generate a PDF file of this kind.
It's important to mention that both Creator and Producer in the PDF metadata say "RICOH C4503 MP". This means that the scanner must have been used to create a PDF (rather than individual images, which were assembled into a PDF in other software).
It is possible that a RICOH-produced PDF file was read into other software and modified. This is bolstered by the fact that the PDF metadata has different Created and Modified timestamps, more than three hours apart.
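For anyone who wants to check a gap like this themselves, here is a minimal sketch that parses PDF-style date strings (the `D:YYYYMMDDHHmmSS` form with an optional UTC offset) and subtracts them. The two sample values are made up for illustration and are NOT the timestamps from the calendar file; `parse_pdf_date` is a hypothetical helper, and treating a missing timezone as UTC is an assumption.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSS followed by an optional Z or +HH'mm' / -HH'mm' offset.
PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+\-])(?:(\d{2})'?(\d{2})?'?)?)?"
)


def parse_pdf_date(value):
    """Parse a PDF date string into a timezone-aware datetime."""
    m = PDF_DATE.match(value)
    if not m:
        raise ValueError("not a PDF date string: %r" % value)
    year = int(m.group(1))
    month = int(m.group(2) or 1)
    day = int(m.group(3) or 1)
    hour, minute, second = (int(m.group(i) or 0) for i in (4, 5, 6))
    # Assumption: a missing offset is treated as UTC.
    offset = timedelta(hours=int(m.group(8) or 0), minutes=int(m.group(9) or 0))
    if m.group(7) == "-":
        offset = -offset
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone(offset))


# Hypothetical values, for illustration only.
created = parse_pdf_date("D:20180101090000-04'00'")
modified = parse_pdf_date("D:20180101121500-04'00'")
gap = modified - created
print("gap between Created and Modified:", gap)
```

A non-zero gap only shows that something wrote the file again after it was created; it says nothing about what was changed.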
This could still be innocent. They could have scanned in the entire calendar and decided to only submit four months, so they read it into a product like Adobe Acrobat, and removed the unwanted pages.
In that case, though, I would expect the Producer metadata to be changed to reflect the software used. I can't verify that this happens for all software though. Some software definitely does this, and I would expect it to be standard.
It's also troubling that the metadata section of this PDF file, a plaintext XML stream of 3153 characters, includes over 2000 spaces with intermittent carriage returns at the end of it.
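Counting that padding is easy to reproduce. The sketch below measures how much trailing whitespace sits between the end of the XML content and the closing `<?xpacket end=...?>` marker of an XMP packet. `xmp_padding` and the sample packet are hypothetical and built only for illustration. One caveat worth checking against the XMP specification: as far as I understand it, the spec itself describes optional whitespace padding in writable packets so they can be edited in place, so the count alone does not distinguish an innocent writer from a manual edit.

```python
def xmp_padding(packet):
    """Return (bytes before padding, padding bytes) for an XMP packet."""
    end = packet.rfind(b"<?xpacket end=")
    body = packet if end == -1 else packet[:end]
    # Spaces, tabs, carriage returns and newlines all count as padding.
    stripped = body.rstrip(b" \t\r\n")
    return len(stripped), len(body) - len(stripped)


# Hypothetical packet: a short XML body, then 20 lines of 100 spaces + newline.
HEAD = b'<?xpacket begin="" id="example"?><x:xmpmeta>example</x:xmpmeta>'
PADDING = (b" " * 100 + b"\n") * 20
PACKET = HEAD + PADDING + b'<?xpacket end="w"?>'

content_len, padding_len = xmp_padding(PACKET)
print("content bytes:", content_len, "padding bytes:", padding_len)
```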
It's troubling because if you wanted to "hack" a PDF file to change the metadata, you'd run into a problem: every stream object in the file (including the metadata) has its length encoded, and many PDF files also include an index called an xref which tracks the location of every object in the file.
So to change the metadata "correctly" you'd have to change the length of that object, and potentially correct the xref of all the objects that come after it. And there's more than one way to encode xrefs. And PDF files can be hierarchical. It's an editing nightmare.
So the simple way to hack the metadata is to replace a whole bunch of stuff with spaces, which keeps every length and offset exactly where it was.
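Here is a toy illustration of why padding with spaces is the lazy route. It builds a tiny PDF-like byte string with a metadata stream followed by another object, then compares byte offsets after two kinds of edit. Everything here (`build`, `object_offsets`, the metadata strings) is hypothetical and simplified; a real file would also have an xref table and trailer that record these offsets.

```python
import re

OBJ_HEADER = re.compile(rb"(\d+) 0 obj")


def object_offsets(data):
    """Map object number -> byte offset of its 'N 0 obj' header."""
    return {int(m.group(1)): m.start() for m in OBJ_HEADER.finditer(data)}


def build(metadata):
    """Assemble a toy file: a metadata stream (object 1), then object 2."""
    obj1 = b"1 0 obj\n<< /Length %d >>\nstream\n%s\nendstream\nendobj\n" % (
        len(metadata),
        metadata,
    )
    obj2 = b"2 0 obj\n<< /Type /Page >>\nendobj\n"
    return b"%PDF-1.4\n" + obj1 + obj2


# Hypothetical metadata values, for illustration only.
ORIGINAL = b"<Producer>Original Tool</Producer>"
SHORTER = b"<Producer>X</Producer>"
# Same edit, but padded with spaces back to the original length.
PADDED = SHORTER.ljust(len(ORIGINAL))

print("original:", object_offsets(build(ORIGINAL)))
print("shorter: ", object_offsets(build(SHORTER)))  # object 2 moves
print("padded:  ", object_offsets(build(PADDED)))   # nothing moves
```

With the shorter replacement, object 2 lands at a new offset and both the `/Length` value and any xref entry for it would need fixing. With the padded replacement, nothing else in the file has to change.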
However I again want to emphasize this is not a smoking gun, as there could be real software out there that does this. For some effed up reason.
Another troubling bit is the way in which the software that converted handwriting to text was very arbitrary in its decisions about what was and was not text, and when to replace it. Here's one example from July 28th.
Look at that image as large as possible. The "Dr Dellatorre 2:00" text has been completely wiped from the graphical image and replaced with a graphical or font object. But in that same space, the "Go to judge's" text was left mostly untouched, except in four letter loops.
Here's the text that was extracted from this region, according to cut-n-paste:
ue • 28
te-


� a,:•
So the top text, which was 100% replaced graphically, was converted into about the same number of random characters as the bottom text, which was barely touched at all. So that's pretty odd.

Still, I cannot rule out software failing in this way, completely innocently.
I can pull layers out of the PDF file (this is an ongoing effort). Here's the background image. This is the data as it was scanned, and after some of the text was erased and converted into graphical overlays.
As you can see, the graphics replacement doesn't always go well. It tries to fill in an area with the background color, and in the "Go to judge's" text, where it detected four loop shapes, it decided the background color was blue. The white parts you see in the PDF are overlays there.
But the top portion ("Dr. Dellatore 2:00") was completely 100% erased from the image. Not a trace of that text remains here.

But here's where it gets really weird...
If you look really closely at the background image, you can see a blurry shape near the upper left, and an even more blurry shape in the upper right. These correspond to the "M" and the "P" on the following page. Bleed-through on scans is totally normal, yes?
But here's where a miracle occurred. A bad algorithm that uses very simplistic methods for background replacement, and gets it wrong sometimes, was able to remove every trace of the Dellatore text, while preserving bleed-through from another page right under that text ("M").
I still absolutely cannot rule out the notion that software could manage this small miracle, and still be generally bad at background replacement. It's not impossible. But it's REALLY. FREAKING. UNLIKELY.
I am still searching this file for a smoking gun. One place I'm hoping to find it is in the fonts that were created to replace the handwriting. It's possible that if the file was edited, text objects were removed, but the associated font would not be reduced.
That's what I'm working on. These are only some of the issues. I wanted to get this status out to let people know that the issue is not settled. Text conversion may not fully explain what's happening here.
Nefarious intent has not been ruled out. More work is required to explain the many oddities in this file. cc: @MickWest
CORRECTION: as you might have noticed in the image, the Creator and Producer in the metadata are "RICOH MP C4503". Got the order wrong a few times above.
Going to toss out another issue while I'm thinking about it. The manner in which the background images have been sliced apart, and then stitched back together makes no sense to me. The cuts don't always match text changes. And there are resolution changes in different cuts.
Here's an example. It's a portion of May 9th. Why cut this in this way? There's no text in the removed corner, and the "1 Year" was not converted at all. Also this is much higher resolution than the July 28th background clip above.
To be clear, I clipped the July 28th image down from a larger extracted image to use as an example but did not alter the resolution. The weird L-shaped piece from May 9th is exactly as it was extracted from the PDF file.
Here's the largest section of the May background exactly as extracted, which is also a different resolution than the L-shaped clip. I can't imagine why any automated conversion would cause these resolution changes (but it is still possible).
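For reference, "resolution" of an image inside a PDF can be computed from its pixel dimensions and the size it is drawn at on the page, since PDF's default unit is 1/72 inch. The sketch below is just that arithmetic; `effective_dpi` is a hypothetical helper and the numbers are invented for illustration, not measurements from the calendar file.

```python
def effective_dpi(pixel_width, pixel_height, placed_width_pt, placed_height_pt):
    """Effective (horizontal, vertical) DPI of an image drawn on a PDF page.

    The placed size is in PDF points; 72 points = 1 inch by default.
    """
    return (
        pixel_width * 72.0 / placed_width_pt,
        pixel_height * 72.0 / placed_height_pt,
    )


# Hypothetical tiles, for illustration only.
full_page = effective_dpi(1700, 2200, 612, 792)  # letter-size page
small_tile = effective_dpi(850, 300, 204, 72)    # small clip on the same page

print("full page tile:", full_page)
print("small tile:    ", small_tile)
```

If tiles cut from a single scan come out at different effective DPI, some tiles were resampled after scanning, which is the oddity described above.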
(I can't guarantee that Twitter won't screw with the resolution of any of these things on upload.)