🔧 It's a long weekend, I only found out on Thursday, and I have no plans.

That means one thing: hack weekend 😃

Gonna try live tweeting my progress in this thread.
Hopefully it increases my accountability and forces me to ship.
#HackWeekend
(this may be rambling garbage, you've been warned)
💡 The idea: iMessage extension that replies to the last message in your conversation with text generated by GPT-2 AKA AI that replies to your texts.

I know literally nothing about iOS development
I have worked with GPT-2 before, but never with Core ML or the other model formats Swift supports.
GPT-2 is a state-of-the-art text model from OpenAI; I'll be leaning heavily on the weights / code they released: d4mucfpksywv.cloudfront.net/better-languag…
openai.com/blog/better-la…
Rough plan: convert a pre-trained GPT-2 to the iOS-friendly Core ML format, read the last incoming message in the conversation when the extension is activated, use that text as the input to the Core ML model instance, send a message with the generated text, and find a way to expose model config for temperature and seed.
(I actually started on this last night during the Bucks vs. Raptors game, so I'm gonna backfill my progress)
So far:
Decided to get the iOS stuff going first since it has more unknowns
Set up Xcode, did all the Apple cert stuff / set up a dev account, got the iOS simulator working, started an iMessage extension project in Xcode
Created a placeholder iMessage icon (I'm terrible at design, will come back to it later)
Struggled with the iOS ViewController APIs. Very different from any of the web dev stuff I've done, but I think I'm starting to get it. The Apple developer docs are actually really good.
Tangent: so far Swift is really nice. `guard` statements and `if let` are great for handling nil values. Take notes, Go. Also like the optional `?` syntax.
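For anyone who hasn't written Swift, a quick toy sketch of those nil-handling features (not code from the app, just an illustration):

```swift
// Toy example of the nil-handling features mentioned above.
func describe(message: String?) -> String {
    // `guard let` unwraps or bails out early.
    guard let message = message, !message.isEmpty else {
        return "no message"
    }
    // `if let` unwraps for a scoped block.
    if let first = message.first {
        print("starts with \(first)")
    }
    // Optional chaining with `?` short-circuits to nil.
    let firstWordLength = message.split(separator: " ").first?.count ?? 0
    return "\(message) (first word is \(firstWordLength) chars)"
}
```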
First setback: I assumed iMessage extensions could read all messages in a conversation; the docs were a bit misleading on this. Turns out you can only read messages generated by your extension (I know nothing about iOS, lmk if I'm wrong).

This does make me feel better about using iMessage, though.
Workaround: a text field where the user enters the input to the model, with a suggestion to copy-paste the last message in their convo 🤷‍♂️ If anyone has better ideas, let me know.
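Roughly what that looks like in code. This is a hedged sketch based on my reading of the docs: `selectedMessage` only ever gives you messages this extension created, and the "stash text in the message URL" fallback flow is hypothetical.

```swift
import Messages

// Sketch of the read side. selectedMessage is only ever a message this
// extension created, so regular incoming texts can't be read directly.
extension MSMessagesAppViewController {
    func promptText(from typedText: String?) -> String? {
        if let selected = activeConversation?.selectedMessage,
           let stashed = selected.url?.absoluteString {
            // Text this extension previously stashed in the message URL.
            return stashed
        }
        // Otherwise fall back to whatever the user typed / pasted.
        return typedText
    }
}
```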
Dug up some code I'd written to wrap GPT-2 for a side project that never saw the light of day; looks like it still works. Still need to figure out the best way to convert to Core ML, but I'm not starting at 0, so that's nice.
Thought of a name and bought a domain! aireply.app (nothing there yet)
(this is roughly when I stopped for the night)
Alright, back to code. Saturday morning, started working on the UI in Xcode.
Ran into a weird issue where all my UI components would appear on top of each other when I ran the app in the simulator. Turned out I needed a stack view, Auto Layout, and constraints.
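For reference, the programmatic version of that fix looks something like this (the subviews are made up):

```swift
import UIKit

// Sketch of the fix: a vertical stack view pinned to the view with
// Auto Layout constraints, so the subviews stop overlapping.
func layout(in view: UIView, label: UILabel, textField: UITextField, button: UIButton) {
    let stack = UIStackView(arrangedSubviews: [label, textField, button])
    stack.axis = .vertical
    stack.spacing = 12
    stack.translatesAutoresizingMaskIntoConstraints = false
    view.addSubview(stack)

    NSLayoutConstraint.activate([
        stack.topAnchor.constraint(equalTo: view.safeAreaLayoutGuide.topAnchor, constant: 16),
        stack.leadingAnchor.constraint(equalTo: view.leadingAnchor, constant: 16),
        stack.trailingAnchor.constraint(equalTo: view.trailingAnchor, constant: -16),
    ])
}
```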
Started hooking up the UI to code. It was relatively straightforward to just follow a random tutorial, but delegates, referencing outlets, all the stuff in this menu seems crazy complicated.
Thankfully I don't think I'll need to touch it again, I have most of the iOS stuff done!

Can:
grab the text entered and do stuff with it on button press, send messages, and even show an alert if the text field is empty
Very simple stuff so far. Pretty much everything happens in this function.
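The actual function was in a screenshot; roughly it's something like this (the `inputField` outlet name and the class wiring are hypothetical):

```swift
import Messages
import UIKit

class MessagesViewController: MSMessagesAppViewController {
    @IBOutlet weak var inputField: UITextField!   // hypothetical outlet name

    @IBAction func sendTapped(_ sender: UIButton) {
        // Alert if the text field is empty.
        guard let text = inputField.text, !text.isEmpty else {
            let alert = UIAlertController(title: "Nothing to send",
                                          message: "Type (or paste) some text first.",
                                          preferredStyle: .alert)
            alert.addAction(UIAlertAction(title: "OK", style: .default))
            present(alert, animated: true)
            return
        }
        // For now just insert the text into the conversation; eventually this
        // is where the generated reply goes.
        activeConversation?.insertText(text) { error in
            if let error = error {
                print("insertText failed: \(error)")
            }
        }
    }
}
```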
So far it's just been climbing the iOS learning curve.
(If you're tracking the "Erik's life in real time" portion of this for some reason, this is the point when I went to the gym and shopped for clothes.)
But yeah, all caught up on progress! Will live(ish) tweet progress from now on.
Next up: the make-or-break piece of this, seeing if I can get GPT-2 to work with Core ML.
There are a bunch of more subtle questions here:

1. Is dynamic text input length possible with how Core ML handles graph execution? If not, what's the best UX for forcing a sequence length? Padding, probably.
...
2. What do meta parameters look like in this world? Seed, temperature, and top_k have a large effect on model output; it would be cool to let people play with these settings.
3. Does this cook the processor on an iPhone? It takes ~20s to run on a 2015 MBP. Model size is also an issue: GPT-2 is big even at 117 million params (I think ~500 MB on disk).
4. How easy is swapping out models with Core ML? It might be possible to fine-tune GPT-2 on datasets from specific characters / shows. "Reply as {Joe Rogan | Michael Scott | Jon Snow}" seems awesome if possible.
Alright, baby steps: let's convert a simpler model to Core ML and get more familiar with what the format allows. Let's do the hello world (train MNIST in TF/Keras, convert to Core ML, get it running in the simulator).
Interesting: Core ML has an NLP component: developer.apple.com/documentation/… GPT-2 does its own encoding, but this might be useful if I need to port parts into Swift.
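If that's the NaturalLanguage framework, here's roughly what its tokenizer gives you. Worth noting it's plain word tokenization, not GPT-2's byte-pair encoding, so at best it's a building block:

```swift
import NaturalLanguage

// Word-level tokenization with Apple's Natural Language framework.
// GPT-2 uses byte-pair encoding, so this is not a drop-in replacement.
let text = "Hack weekend: teaching my phone to text back."
let tokenizer = NLTokenizer(unit: .word)
tokenizer.string = text
let tokens = tokenizer.tokens(for: text.startIndex..<text.endIndex)
    .map { String(text[$0]) }
print(tokens) // ["Hack", "weekend", "teaching", "my", ...]
```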
Ah yes, the perfect time for my internet to go down
We back, shout out to mobile hotspots and cheap data
One day libraries will upgrade to Python 3.7 and the world will be a better place. Until then:
Tangent: it's really annoying that Python requires packages to explicitly target minor version bumps. My guess would be that <5% of packages break on minor version bumps (Python allows breaking changes in minors 😬), and it unnecessarily prevents people from upgrading to the latest.
Btw if anyone has random phrases they want me to run GPT-2 on while I have it up, send them my way.

Results are usually pretty fun
Core ML quantization looks v cool, less worried about model size
heartbeat.fritz.ai/reducing-corem…
Alright, so I should definitely use a high top_k and a shorter output length. Temperature is more of a subjective thing.
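For anyone wondering what those knobs actually do, a rough, language-agnostic sketch of temperature + top_k sampling, written in Swift since that's where a sampler would eventually have to live:

```swift
import Foundation

// Rough sketch: sample a token id from raw logits with temperature + top_k.
// Higher temperature flattens the distribution; top_k keeps only the
// k most likely tokens before sampling.
func sampleToken(logits: [Double], temperature: Double, topK: Int) -> Int {
    // Indices of the k largest logits.
    let topIndices = Array(logits.indices.sorted { logits[$0] > logits[$1] }.prefix(topK))

    // Softmax over the kept logits, scaled by temperature.
    let scaled = topIndices.map { logits[$0] / temperature }
    let maxLogit = scaled.max() ?? 0
    let exps = scaled.map { exp($0 - maxLogit) }
    let total = exps.reduce(0, +)
    let probs = exps.map { $0 / total }

    // Draw from the resulting categorical distribution.
    var r = Double.random(in: 0..<1)
    for (i, p) in probs.enumerated() {
        r -= p
        if r <= 0 { return topIndices[i] }
    }
    return topIndices.last ?? 0
}
```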
Pro tip: Shift + Tab on a function in Jupyter gives you the docstring.
Alright so here's the plan, going with option 1:
I have a trained TF version of MNIST as a set of ckpt files that replicates the format OpenAI released. The plan is to go .ckpt -> .pb (the TF protobuf binary format needed by the tfcoreml conversion tool) -> .mlmodel (the file type used by Core ML).

Then load the .mlmodel file into the app.
🤞 If that goes smoothly I should be able to convert the GPT-2 weights to Core ML; then it's just porting the vocab encoding to Swift.
If it doesn't work I might look at TF Lite and see if that makes things any easier.
FWIW, here's the state of the two models:
Success! God, PyTorch >>>>>> TensorFlow. So many serialization formats in TF.
In TF land I know of the ckpt family (not a single file; it generates multiple files, and how many it builds also changes depending on whether you're using TF or TF Keras, and on the TF version). There's also .h5, .pb, .pbtxt, and I might be forgetting some. They all do the same thing.
In PyTorch there are two ways: one specific to PyTorch, with the option to ignore data about model structure (.pth): pytorch.org/docs/stable/no…
And then there's first-class ONNX support (an open standard for model serde): pytorch.org/docs/stable/on…
Alright, there's definitely something borked with my model weight checkpoint. The TF Core ML converter can't infer any layer shapes. For now I'm gonna try exporting Keras directly to Core ML. It won't work for GPT-2, but at least it moves me into Core ML land.
K.
This tweet makes a lot more sense if you see the massive heap of errors produced by coremltools.
Pro tip: TF Keras is not supported by coremltools. Also, importing keras after using TF Keras in a notebook session will cause issues. Also, when in doubt, restart your kernel
Okay, now have a Core ML model to play with, a path to converting GPT-2 weights that might work, a good enough for now UI, and a domain.

Major things left: get familiar with the Core ML API, convert GPT-2, port the vocab code from Python to Swift, integrate it into the app

Time for a break
Shout out to @fritzlabs, this post was helpful for my sanity in making sure I wasn't misusing coremltools. I've read other stuff on their blog, it's great, check it out: heartbeat.fritz.ai/using-coremlto…

Back to the break
Definitely Coors o'clock sharp
Back at it, gonna see if I can come up with a better icon. Design is not my strong suit, feedback plz
What are the colors again? Asking for a friend
Left, center, right? Keep trying?
Breakfast of champions, back at it
Some more icon iterations, shoutout to @ata_aman @HeyJakeC @_michaelreiter and @tbtstl for the design feedback. Probably gonna go with one of these two
@ata_aman @HeyJakeC @_michaelreiter @tbtstl Core ML's quantization seems really impressive. Need to read more about tradeoff between number of bits / algorithm and performance but if I can actually reduce model size by 10x that opens up a bunch of possibilities
@ata_aman @HeyJakeC @_michaelreiter @tbtstl Alright, have my dummy model in Xcode and have it scoring fake data.

The fact that this seems to be the best way to create a vector in Core ML is worrying. It seems better if you're feeding images directly, but still, super limiting.

Hopefully Swift for TF can bridge the gap...
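For context, the element-by-element dance looks roughly like this (the model file and feature names are hypothetical):

```swift
import CoreML

// Build a 1x784 Float32 input by filling an MLMultiArray element by element,
// then score it with the compiled dummy model.
// "mnist_dummy" and the "image" feature name are hypothetical.
func scoreFakeData(pixels: [Float]) throws -> MLFeatureProvider {
    // Assumes pixels.count == 784.
    let input = try MLMultiArray(shape: [1, 784], dataType: .float32)
    for (i, value) in pixels.enumerated() {
        input[i] = NSNumber(value: value)
    }

    let modelURL = Bundle.main.url(forResource: "mnist_dummy", withExtension: "mlmodelc")!
    let model = try MLModel(contentsOf: modelURL)

    let features = try MLDictionaryFeatureProvider(dictionary: ["image": input])
    return try model.prediction(from: features)
}
```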
But anyway, there seems to be *a* path to porting the GPT-2 encoder code to Swift; it's just gonna be a bit verbose 😅
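Something like the sketch below, except with real byte-level BPE instead of this greedy longest-match stand-in (encoder.json is the vocab file from the GPT-2 release; everything else here is hypothetical):

```swift
import Foundation

// Very rough sketch of the "port the encoder" work: load the vocab that
// ships with GPT-2 (encoder.json) and map text to token ids.
// NOTE: real GPT-2 encoding is byte-level BPE with a merges file; this
// greedy longest-match lookup is only a stand-in to show the shape of it.
struct ToyEncoder {
    let vocab: [String: Int]

    init(encoderJSON url: URL) throws {
        let data = try Data(contentsOf: url)
        vocab = try JSONDecoder().decode([String: Int].self, from: data)
    }

    func encode(_ text: String) -> [Int] {
        var ids: [Int] = []
        var remaining = text[...]
        while !remaining.isEmpty {
            // Take the longest prefix that exists in the vocab (fallback: 1 char).
            var end = remaining.endIndex
            while end > remaining.index(after: remaining.startIndex),
                  vocab[String(remaining[..<end])] == nil {
                end = remaining.index(before: end)
            }
            ids.append(vocab[String(remaining[..<end])] ?? 0) // 0 = placeholder for unknown
            remaining = remaining[end...]
        }
        return ids
    }
}
```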
Break time, heading to the gym
Naps are underrated. Time to convert GPT-2 from TF to Core ML.
Core ML was never my friend
But yeah, this really limits which language models can be deployed with Core ML. This is the crux of how GPT-2 produces tokens that are then decoded back to text.

I guess I could unroll the while loop for a fixed length, but I'm pretty sure I'd have no shot using the provided weights.
Next is seeing if TF Lite or ONNX -> some other format supports cycles, but it's not looking good 😬
I'm now convinced that the hardest question in computer science is "given a TF ckpt folder, find the output node of the graph"
Alright, back at it. Gonna give PyTorch / ONNX like 45 mins of effort, and if that doesn't work, switch to a server.
Apparently OpenAI released a larger weight set at some point (~3x the model parameters) and, you know what, it's better. Who woulda thought 🤷‍♂️
"Huh, lets see what the input and outputs of this model look like so I can tell the graph generator how to do the thing"

TensorFlow: literally half an hour of internet research and literally nothing better than "search the list of all TF graph nodes for things they might be".
PyTorch:
Alright, so I have an idea of how to do model inference on device: split the model into smaller, easier-to-serialize parts and write the code to connect those parts in Swift.
Core ML's arrays do not look fun, but it should be possible. There might be better abstractions...
...that make dealing with the arrays on device easier. Can I compile iOS apps using the Swift for TensorFlow toolchain? (Yes, it's in the compiler, not just a lib you can install, something something optimization.) Guess I'll find out another time.
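The shape of that idea, very roughly. Everything here is hypothetical: `runModel` stands in for whatever the Core ML calls to the split-up pieces end up looking like, and the greedy argmax is a placeholder for real sampling:

```swift
// Hypothetical sketch of "control flow in Swift, NN chunks in Core ML":
// Core ML can't express GPT-2's sampling while-loop, so run the exported
// network once per step from Swift and grow the token sequence ourselves.
func generate(prompt: [Int],
              steps: Int,
              runModel: ([Int]) throws -> [Double]) rethrows -> [Int] {
    var tokens = prompt
    for _ in 0..<steps {
        // `runModel` wraps the Core ML call(s): tokens in, next-token logits out.
        let logits = try runModel(tokens)
        // Greedy argmax to keep the sketch self-contained; in practice this is
        // where temperature / top_k sampling would go.
        let next = logits.indices.max(by: { logits[$0] < logits[$1] }) ?? 0
        tokens.append(next)
    }
    return tokens
}
```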
So gonna pivot to having the iMessage extension make a POST request to a server I stand up. Not as fun, but there's no way I get all that done tonight, and it'd be cool to have a demo.
Definitely want to come back to on-device later though. Seems possible, just challenging.
Alright, the model + server works; time to add the POST to the iOS app.
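The client side of that is pretty simple; sketch below, with a made-up endpoint and JSON shape:

```swift
import Foundation

// POST the prompt to the generation server and hand back the reply.
// The endpoint URL and JSON fields here are made up for the sketch.
func fetchReply(for prompt: String, completion: @escaping (String?) -> Void) {
    guard let url = URL(string: "https://example.com/generate") else {
        completion(nil)
        return
    }
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try? JSONSerialization.data(withJSONObject: ["prompt": prompt])

    URLSession.shared.dataTask(with: request) { data, _, error in
        guard error == nil,
              let data = data,
              let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
              let reply = json["reply"] as? String else {
            completion(nil)
            return
        }
        completion(reply)
    }.resume()
}
```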
Also my god is GPT-2 scary. Input above, generated below.
Done! That there is a functioning prototype.

Still a lot of things to add (clean up generated output, smart output length, user-defined temp / top_k, on-device gen), but I'm happy with the result. Will write a summary / reflection.

This was fun! Definitely will do #HackWeekend again 🔥
Alright, reflections and learnings from #HackWeekend, a thread:
Swift is a great language.
Quite like the way it feels; never felt "limited" by the language. Can definitely feel some legacy stuff from Objective-C in iOS dev, NSNumber (really NS*) seems a bit unwieldy, but overall...
...iOS dev felt pretty seamless. I'm sure there are gotchas, but I was pleasantly surprised.
A few things were non-obvious to me, like how to do programmatic UI updates. I'm guessing view components can be created / modified by ID? Didn't look too hard, but it's obviously a solved problem.
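For what it's worth, my best guess at the answer is: keep a reference and mutate it, or look views up by tag. Sketch below, not verified best practice:

```swift
import UIKit

// Two common ways to update UI from code (as far as I can tell):
// 1) keep a reference and mutate it, 2) look the view up by its tag.
class ExampleViewController: UIViewController {
    let statusLabel = UILabel()

    override func viewDidLoad() {
        super.viewDidLoad()
        statusLabel.frame = CGRect(x: 20, y: 60, width: 280, height: 24)
        statusLabel.tag = 42
        view.addSubview(statusLabel)
    }

    func showStatus(_ text: String) {
        // 1) via the stored reference
        statusLabel.text = text

        // 2) via the tag, if all you have is the view hierarchy
        if let label = view.viewWithTag(42) as? UILabel {
            label.text = text
        }
    }
}
```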
This project showed me yet again that raw TF is a nightmare to work with if you didn't author the original code.
There are so many ways to do everything, manually dealing with graph sessions sucks, serde is fractured, and some things play nice with Keras while others don't.
It's definitely a good call for TF to adopt the Keras API and make it a first-class citizen, and TF 2.0 and eager execution look to be a huge improvement, but man, TF 1.13 can be rough, and there's enough code written in it that it's not going anywhere any time soon.
It was far easier to use PyTorch with @huggingface's implementation of GPT-2, convert the newly released weights, and then poke around than to use the original TF-based code / weights. PyTorch is definitely not perfect, but I think overall it has the tradeoffs I like best.
Core ML is really simple to use for models that look like this: static shapes, an easy-to-define forward pass, easy* to serialize, no control flow or logic.
It is not well suited for anything that has conditionals / is dynamic / has cycles that can't be unrolled. Core ML uses MLMultiArray as the input to models, and it doesn't have support for math ops. You can likely do all the math with MPSMatrix and then convert, but I couldn't find any examples.
Or I should say, it wasn't obvious how to do math ops and I couldn't find any examples or docs.
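One route I'd guess at, totally untested, and it's Accelerate / vDSP rather than the MPSMatrix path mentioned above: pull the floats out of the MLMultiArray and do the matrix math there.

```swift
import Accelerate
import CoreML

// Untested sketch: treat a Float32 MLMultiArray as a flat Float buffer and
// do a matrix multiply with Accelerate's vDSP (an alternative to MPSMatrix).
// Assumes .float32 data and contiguous storage. C (m x n) = A (m x k) * B (k x n).
func matmul(_ a: MLMultiArray, _ b: MLMultiArray, m: Int, k: Int, n: Int) -> [Float] {
    let aPtr = a.dataPointer.bindMemory(to: Float.self, capacity: m * k)
    let bPtr = b.dataPointer.bindMemory(to: Float.self, capacity: k * n)
    var c = [Float](repeating: 0, count: m * n)
    vDSP_mmul(aPtr, 1, bPtr, 1, &c, 1, vDSP_Length(m), vDSP_Length(n), vDSP_Length(k))
    return c
}
```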
More complex "models" are really becoming programs where parts of the program are neural networks, called in various ways in between sections of control flow. AlphaGo Zero and GPT-2 both fall into this category.
At this point you likely need to re-write almost all of the program code to make it usable on mobile, which is a lot of effort and realistically means models on device will remain static unless there's a very specific product need that makes it worth the effort.
It isn't just "let's serialize AlphaGo Zero and run it on Core ML"; you're re-implementing MCTS from scratch and then using Core ML to make calls to the encoder head you struggled to convert.
This may be an obvious point ("of course you need to port the code between devices").
But I don't think that's the way it necessarily needs to be, and this is where I think Swift for TensorFlow is going to blow everything out of the water.
ONNX, Core ML, TF Lite: all those standards are great for serializing models as static graphs, but for dynamic graphs or complex control flow you need a compiler.
Swift for TF is the only thing I've seen that is solving this problem and has a path to working on device, in prototyping, and on a server, all with the same code or only minor modifications.
There's a world in which AlphaGo Zero is developed in Swift (native code for control / MCTS, TF for the NNs), cross-compiled to iOS, and running on your phone on day 1.

It's really powerful to be able to minimize the time between the cutting edge and consumer use.
It's very possible I'm wrong about Core ML and there's some secret lib somewhere that makes it more usable; if so, let me know, I'd love to check it out.
Alright I'm done ✌️