Before the iPhone existed, I worked on a few games for what were called "feature phones": Doom RPG 1 & 2, Orcs & Elves 1 & 2, and Wolfenstein RPG.
Qualcomm's native-code BREW platform had better versions, but I haven't seen any emulators or archives for it, so they may be lost at this point. The J2ME (Java mobile) versions are still floating around and can be emulated.
My son wanted to get O&E2 running, so we set out on a little adventure.
Kemulator ran the game, but the audio was glitchy and it hung after you died in the game. Well, we are programmers; we should be able to fix it. Unlike most emulator projects, Kemulator turned out to be closed-source abandonware, so we moved over to freej2me, which is a live GitHub project.
The hang didn't happen, but the audio was even worse. The missing sound effects were a simple bug fix -- MIDI sounds weren't seeking to the start on replays. We will submit a patch. Still, everything was glitchy with audio underruns.
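The fix boils down to rewinding a finished clip before playing it again. The thread doesn't show freej2me's player code or the actual patch, so the sketch below uses a hypothetical `ClipPlayer` class (not the J2ME `Player` API and not freej2me source) that models a clip as nothing more than a tick position and a length, and rewinds to tick 0 when `start()` is called on a clip that already ran to its end.

```java
/**
 * Hypothetical, simplified stand-in for an emulator's MIDI clip player.
 * It only tracks playback position; no audio is produced.
 */
class ClipPlayer {
    private final long lengthTicks; // total length of the clip
    private long positionTicks;     // current playback position
    private boolean playing;

    ClipPlayer(long lengthTicks) {
        this.lengthTicks = lengthTicks;
    }

    /** Starts playback, rewinding first if the previous play reached the end. */
    void start() {
        if (positionTicks >= lengthTicks) {
            // Without this rewind, a replayed clip "starts" at its end,
            // finishes immediately, and the sound effect is never heard.
            positionTicks = 0;
        }
        playing = true;
    }

    /** Advances playback by some ticks (stands in for the audio thread). */
    void advance(long ticks) {
        if (!playing) return;
        positionTicks = Math.min(lengthTicks, positionTicks + ticks);
        if (positionTicks == lengthTicks) {
            playing = false; // clip finished
        }
    }

    long getPosition() { return positionTicks; }
    boolean isPlaying() { return playing; }

    public static void main(String[] args) {
        ClipPlayer sfx = new ClipPlayer(480);
        sfx.start();
        sfx.advance(1000);                      // play through to the end
        System.out.println(sfx.isPlaying());   // false
        sfx.start();                            // replay the same clip
        System.out.println(sfx.getPosition()); // 0
        System.out.println(sfx.isPlaying());   // true
    }
}
```

With the real `javax.sound.midi` API, the analogous step would be calling `Sequencer.setTickPosition(0)` before `start()`; whether the freej2me patch does exactly that isn't stated in the thread.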
We noticed that the emulator was taking an absurd amount of CPU, despite the game being built for <100 MHz mobile CPUs. We spent a frustrating afternoon exploring Java profiling tools, but finally Flight Recorder and JDK Mission Control pointed out the root cause: explicitly invoked garbage collection. A vague memory bubbled up of having to call System.gc() every frame to avoid problems on some mobile phones.
We couldn't change the game's source, but the JVM has a handy option, -XX:+DisableExplicitGC, that fixed everything right up.
This is an interesting case where an operation is >10x slower on a modern computer. A GC sweep on a phone with 128k of heap is a very different thing from one on a desktop with a multi-GB heap.
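To see the effect on your own machine, here is a rough sketch (not the profiling setup from the thread) that mimics a game loop calling `System.gc()` once per frame and reports the average time spent in that call. The `liveObjects` knob is an arbitrary stand-in for how much live data the collector has to trace, and the flag check only sees `-XX:+DisableExplicitGC` when it was passed on the `java` command line. Actual timings depend on the machine, heap size, and collector, so none are claimed here.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Sketch: measure the per-frame cost of an explicit System.gc() call. */
class ExplicitGcCost {

    /** True if -XX:+DisableExplicitGC appears among the JVM's input arguments. */
    static boolean explicitGcDisabled() {
        return ManagementFactory.getRuntimeMXBean()
                .getInputArguments()
                .contains("-XX:+DisableExplicitGC");
    }

    /**
     * Runs `frames` fake frames, each calling System.gc() while `liveObjects`
     * small arrays stay reachable, and returns average milliseconds per call.
     */
    static double averageGcMillis(int liveObjects, int frames) {
        List<byte[]> live = new ArrayList<>(liveObjects);
        for (int i = 0; i < liveObjects; i++) {
            live.add(new byte[64]); // retained data the collector must trace
        }
        long totalNanos = 0;
        for (int f = 0; f < frames; f++) {
            long t0 = System.nanoTime();
            System.gc(); // ignored by HotSpot when -XX:+DisableExplicitGC is set
            totalNanos += System.nanoTime() - t0;
        }
        // Touch `live` after the loop so it stays reachable during timing.
        if (live.size() != liveObjects) throw new IllegalStateException();
        return totalNanos / 1e6 / frames;
    }

    public static void main(String[] args) {
        System.out.println("DisableExplicitGC: " + explicitGcDisabled());
        System.out.printf("small live set: %.3f ms/frame%n", averageGcMillis(1_000, 20));
        System.out.printf("large live set: %.3f ms/frame%n", averageGcMillis(500_000, 20));
    }
}
```

Running it once as `java ExplicitGcCost` and once as `java -XX:+DisableExplicitGC ExplicitGcCost` should show the explicit calls dropping to near zero cost in the second run, which is the same switch that rescued the emulator without touching the game's bytecode.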
Some old writing about the early cell phone work:…