Marcus Mengs (@mame82)

12 Jan, 62 tweets, 18 min read

Okay, doing my first baby steps with r2frida (which combines the power of @radareorg and @fridadotre).

Gonna share my progress in this thread (live, so keep calm).

The goal: Runtime inspection of data sent out by TikTok !!before!! it gets encrypted

1/many

First of all, we do not start from zero. I got some prior knowledge from past reversing attempts and want to share some important facts.

TikTok's (log data) encryption is accomplished by a native library. The Android Java code just serves as a proxy to the native function

The decompiled code for the respective native JNI function of an older TikTok version looks something like this, but in this example I use the most current TT version (no static analysis done, yet)

In case you never reversed native libraries which were built to interface with the Android Java layer via JNI, I highly suggest the entry level introduction on the topic by @maddiestone

Before we start, I want to pinpoint some important aspects (which are also covered by Maddie's videos).

1) Unlike raw C functions, JNI functions like the one showcased above receive pointers to complex Java objects.

E.g. a function receiving a String on the Java layer...

... would receive a pointer to a 'jstring' on the native layer (not a zero-terminated C-String).

In order to retrieve a C-String, to go on working with it in the native code, some translation functionality is required. This functionality is provided by the ...

JNI (Java Native Interface). The JNI environment is passed in to JNI functions as the first parameter.

If you look at the example screenshot again, you see exactly this. Functions provided by the 'env' pointer are used to parse the Java function arguments (e.g. a 'jbyteArray') ...

Once the raw data is converted to a more C-ish form, it gets passed to an inner function, 'ss_encrypt' in my example. The inner function, in this case, is a pure C function and thus receives only C-style parameters (also no 'env' parameter, so it would not be able to access JNI)

A 2nd important aspect on JNI libraries, covered by @maddiestone

2) There are two ways to expose JNI methods from a native library:

a) export them with the proper naming convention, so that JNI can recognize them on library load
b) use the JNI function 'RegisterNatives'...

... to register the JNI functions once the library gets loaded.

The second method of registering methods is well suited for obfuscated code, as the methods neither have to follow the naming convention, nor do they have to be exported.

As you might expect, TikTok uses the 'RegisterNatives' approach. The screenshot below shows log output from a custom tool, which monitors JNI methods registered by instrumented Android apps (TikTok's encryption method in the example)

If you would decompile the Java part of the TikTok apk, the encryption functionality (on an older version) would look something like this:

The Java method 'm18421a' receives a Java 'byte[]' and a Java 'int' as parameters and returns a 'byte[]', again.

Internally, this data is forwarded to the native JNI method 'ttEncrypt'.

The important aspect about this, is that the native 'ttEncrypt' JNI method has to accept those exact parameter types and thus has to "register" with a proper method signature.

We already saw this signature in a previous screenshot

The last line from the screenshot above shows 3 things the native code has to provide for each method when calling 'RegisterNatives':

1) the call address of the native function implementation (0x7d70d1d5 in example)
2) The function name (ttEncrypt)
...

3) The method signature, which is '([BI)[B' in this case and translates to:

'(' start of parameters
'[B' byte[]
'I' int
')' end of parameters
'[B' byte[] (return value)
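To make the signature notation a bit more tangible, here is a minimal Python sketch that decodes a JNI method descriptor into Java type names. The helper names ('parse_jni_signature', '_parse_type') are made up for illustration and are not part of r2frida, Frida or the custom tool from the screenshot.

```python
# Minimal JNI method-descriptor decoder (illustrative helper, hypothetical names).
PRIMITIVES = {
    "Z": "boolean", "B": "byte", "C": "char", "S": "short",
    "I": "int", "J": "long", "F": "float", "D": "double", "V": "void",
}

def _parse_type(sig, pos):
    """Decode one type starting at sig[pos]; return (java_type, next_pos)."""
    dims = 0
    while sig[pos] == "[":          # each '[' adds one array dimension
        dims += 1
        pos += 1
    if sig[pos] == "L":             # object type: L<class/name>;
        end = sig.index(";", pos)
        name = sig[pos + 1:end].replace("/", ".")
        pos = end + 1
    else:                           # single-letter primitive type
        name = PRIMITIVES[sig[pos]]
        pos += 1
    return name + "[]" * dims, pos

def parse_jni_signature(sig):
    """Return (parameter_types, return_type) for a descriptor like '([BI)[B'."""
    if not sig.startswith("("):
        raise ValueError("descriptor must start with '('")
    pos, params = 1, []
    while sig[pos] != ")":
        java_type, pos = _parse_type(sig, pos)
        params.append(java_type)
    return_type, _ = _parse_type(sig, pos + 1)
    return params, return_type

print(parse_jni_signature("([BI)[B"))
```

For '([BI)[B' this yields the parameters 'byte[]' and 'int' and the return type 'byte[]', matching the breakdown above.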

So we keep this in mind: Even if the native library does not export the encryption method, it has to store the 1) function address, 2) name and 3) signature in a data structure, in order to provide it to 'RegisterNatives' once the library gets loaded by JNI

We now know almost everything we need, in order to head over to r2frida, except some important facts on my test setup:

- the app is inspected on a physical device, running Android 9
- the device uses a !!32bit!! ARM application core

As I am new to r2frida, chances are high that things could be achieved in easier ways.

Now to get started, I already have the latest @fridadotre server running on my USB connected Android device and 'frida-ls-devices' shows it being ready-for-action

So let's spawn a fresh TikTok instance with r2frida, using the following command

The command 'launch'es the process on the attached Frida 'usb' device with the device ID '4c197256'; the process's package name is, of course, 'com.zhiliaoapp.musically'

Instead of 'launch', two other options could be used:

- 'attach' (would attach to an already running process, given by name or PID)
- 'spawn' (like 'launch', but the process would not be resumed automatically after attaching)

So for the warm up, let us use the Frida functionality which allows enumerating loaded Java classes. This nicely combines with the r2 syntax (concatenation of single-letter commands, '~' for grep)

Important: commands targeting the r2frida plugin have to be prefixed with '\'

The r2frida command to list classes is '\ic' (note the backslash prefix). The unfiltered result would be a bit overwhelming ...

... so we grep for classes including the term "crypt"

The '\ic <classname>' command lists the !Java! methods of the respective class
The signature of the static method 'EncryptorUtil.a' should look familiar to us (if you read the first tweets). It represents the Java layer of the encryption method and is called 'a' in this version

Note: The information above would be enough to intercept the method from the Java layer (e.g. with @fridadotre or Xposed), in order to inspect the call arguments (the byte[] parameter represents the plain data before encryption)

... but we are here for the native layer and to inspect data at runtime, right?

So let's search the whole address space for our native method name 'ttEncrypt'

Note: If you'd use r2's ascii search nothing would happen, you have to use the '\' prefix to search with r2frida

The search ends with two hits:

The screenshot below shows, that the attempt to print a hexdump from the address of the first hit fails with r2, while r2frida (backslash prefix) works.

Reason: The memory region was not populated when r2 was started (encryption library was loaded after process launch)

I solved this issue like this:
1) Quit r2
2) Open r2 with r2frida, again, but this time **attach** to the already running process

et voila ... the memory offset is mapped and dumpable with 'px' (without backslash prefix)

Note: The last step is not necessary for a data hexdump, as you could still use '\px', but it turned out to be useful when it comes to printing the disassembly of "late loaded" code regions. This is because I sometimes struggled with '\pd', but 'pd' worked (+ various r2 views)

Having a closer look at the first hit of our string search for 'ttEncrypt', we notice that it is directly followed by a C-string with our method signature.

So chances are high that this data is part of the structure which gets handed in to 'RegisterNatives'

Reminder: in order to register the 'ttEncrypt' method to JNI, the 'RegisterNatives' function requires a structure containing
- method name (C-string)
- method signature (C-string)
- method pointer (native pointer)

So the next step would be to search the process memory space for cross references to the address of this method name string (0x8448b74c). As I haven't applied any auto analysis, I use a simple hex search for this (in my case the byte order of the address has to be reversed ...

... to account for the architecture endianness).
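As a quick sanity check for the byte reversal, this small Python sketch derives the little-endian hex pattern for a 32-bit address. The function name 'le32_hex_pattern' is just an illustrative choice, and the sketch assumes a little-endian 32-bit target like my test device.

```python
import struct

def le32_hex_pattern(address):
    """Return the hex string of a 32-bit address as it is stored in little-endian memory."""
    # "<I" = little-endian, unsigned 32-bit integer
    return struct.pack("<I", address).hex()

# Address of the 'ttEncrypt' method name string from the search hit
print(le32_hex_pattern(0x8448b74c))  # prints 4cb74884
```

The resulting string is what a raw hex search has to look for, instead of the address as it is written down.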

The result is promising: Only one hit, for a search across the whole address space:

Printing the first 12 bytes from this XREF offset reveals 3 pointers again (reversed endianness):

- 0x8448b74c (expected, method name pointer)
- 0x8448b756 (ptr to signature string, yay)
- 0x8448b1d5 (likely pointer to JNI method implementation)

So the layout of the 3 pointers from above speaks a clear language. Very likely this is the data struct passed in to 'RegisterNatives' and thus 0x8448b1d5 points to the native implementation of 'ttEncrypt'
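The three pointers match the layout of 'JNINativeMethod' from jni.h (name, signature, fnPtr). Here is a rough Python sketch of how such a 12-byte entry could be decoded on a 32-bit little-endian target. The raw bytes are reconstructed from the addresses listed above (not copied from the hexdump), and 'NativeMethod' / 'parse_native_method32' are hypothetical names.

```python
import struct
from collections import namedtuple

# Mirrors the three fields of JNINativeMethod, each a 32-bit pointer on arm32
NativeMethod = namedtuple("NativeMethod", "name_ptr signature_ptr fn_ptr")

def parse_native_method32(raw):
    """Decode one 12-byte entry: three little-endian 32-bit pointers."""
    return NativeMethod(*struct.unpack("<III", raw[:12]))

# Bytes reconstructed from the pointer values in the thread (assumption)
raw = bytes.fromhex("4cb74884" "56b74884" "d5b14884")
entry = parse_native_method32(raw)
print([hex(p) for p in entry])  # prints ['0x8448b74c', '0x8448b756', '0x8448b1d5']
```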

Sorry, before going on I have to insert a small excursus on addressing/instruction sets on arm 32 (specific to my test setup). Anyways it is crucial:

Arm 32 supports two instruction sets, "ARM mode" (32bit) and "Thumb mode" (16bit), which can be used interchangeably

In order to distinguish if a function call target (branch) should be interpreted as ARM or THUMB, the least significant bit (LSB) of the function address is taken into account

For ARM the LSB is 0 (even address)
For THUMB the LSB is 1 (odd address)

The actual instruction ALWAYS resides on an even address.

This means the function address 0x8448b1d5 holds code in THUMB mode (16bit), while the first instruction resides at 0x8448b1d4
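The LSB rule can be written down in a few lines of Python. This is only a sketch of the address arithmetic (the function names are made up), not something r2 or Frida exposes under these names.

```python
def decode_branch_target(address):
    """Split a branch target into (mode, address of the first instruction)."""
    mode = "thumb" if address & 1 else "arm"   # LSB selects the instruction set
    return mode, address & ~1                  # instructions sit on even addresses

def thumb_call_address(instruction_address):
    """Set the LSB to get the address to use when calling or hooking THUMB code."""
    return instruction_address | 1

mode, first_insn = decode_branch_target(0x8448b1d5)
print(mode, hex(first_insn))               # prints thumb 0x8448b1d4
print(hex(thumb_call_address(0x8448b1d4)))  # prints 0x8448b1d5
```

So the odd address is the one to hand to anything that calls or hooks the function, while the even one is where the disassembly has to start.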

(sorry if it gets a bit complicated, will be clear in a second)

So if we print the disassembly at the (assumed) 'ttEncrypt' address, things look a bit weird

... this is because instructions are interpreted in ARM mode (32bit).

Let's fix this:

... looks much better, still the first instruction is off-by-one 🤣

No seriously, as explained, on arm32 we have to disassemble at [THUMB mode address - 1] = 0x8448b1d4

Nice, this looks like a proper function stub (note how the callee stores reg values on the stack, before moving on).

Now to get a feeling for how often this function is called, let's use 'r2frida' power to trace it.

Important: The thumb address has to be used here!!!

The resulting output of the '\dt' command, which places the trace hook also indicates that the function address maps to an offset in 'libEncryptor.so' ... let us call this a "nice confirmation"

Some actions in the TikTok app ... trace logs for ttEncrypt-calls arrive

... cigarette break ... stay tuned (if the app crashes meanwhile, I'll start from scratch)

Let's remove the trace hook for now, with '\dt-*'

Remember my screenshot of a decompiled 'ttEncrypt' function from an older TT version? We traced the corresponding function.

Trying to runtime-parse the function parameters, which represent Java object instances would be insane (maybe impossible)

... luckily, at least the old implementation, internally called a method 'ss_encrypt' which received a c-style byte array pointer and an integer representing the length as first two parameters.
It would be way easier to runtime-inspect these

Let's take a closer look at the disassembly of our assumed 'ttEncrypt' function, by seeking to its offset with 's 0x8448b1d5' and switching to a more suitable r2 view with the uppercase 'V' command (press 'p' till the view changes to disassembly)

The view from above allows scrolling through the function's code with cursor keys. Most important: calls to other code parts (branches) are printed bold and suffixed with [1], [2] ...

Hitting [alt+1] moves us straight to the marked branch offset:

The code above does not look like a legit inner function (we do not care for alignment and inspect the next branch).

Hitting 'u' returns us to the parent function, followed by [alt+2] which brings us into the 2nd branch

The 2nd branch at 0x84483aa4 looks better (proper function stub). We could easily drift back to the static analysis world, to find further evidence for it being the inner 'ss_encrypt' function. But hey, we are working with instrumentation, so let us just inspect the calls

Remember: While we disassembled at 0x84483aa4, the code is THUMB mode. Thus the proper tracing address would be 0x84483aa5 (LSB set to 1), unless you like crashes (restarting here would not be funny, 'cause thanks to ASLR all function offsets would differ)

In contrast to our first tracing attempt, we use the beautiful command for formatted tracing, which allows us to print out function parameters for each call in a predefined format.

Command syntax:

The screenshot below shows how I placed my trace hook. The 'pppp' means that the first 4 function parameters should be printed as hex values (pointers) for each call.

Ultimately 2 calls get logged

Those call traces look very promising. We expect the inner function to receive a pointer to a native byte array as 1st argument and an integer representing the array length as second argument. The logged call traces totally match this assumption.

So let's slightly modify this:

With the formatted trace modification 'pi' from above, the first argument gets printed as hex pointer, the second as decimal integer.

This makes it easy to hexdump the content of this buffer (as long as it is kept alive)

But nooo ... what's this. The encryption input data is not human readable. No worries, pay attention to the first 3 magic bytes ... looks gzipped, doesn't it?!

Luckily r2 has an "unzip print" command 'prg'.
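If you dump such a buffer to disk instead, the same check and decompression can be sketched in Python. The sample payload below is a made-up placeholder, not data captured from TikTok, and the helper names are hypothetical.

```python
import gzip

# gzip header: ID1=0x1f, ID2=0x8b, CM=0x08 (deflate)
GZIP_MAGIC = b"\x1f\x8b\x08"

def looks_gzipped(buf):
    """True if the buffer starts with the 3 gzip magic bytes."""
    return buf[:3] == GZIP_MAGIC

def gunzip_if_needed(buf):
    """Decompress gzip data; return anything else unchanged."""
    return gzip.decompress(buf) if looks_gzipped(buf) else buf

# Placeholder standing in for a dumped encryption input buffer (assumption)
sample = gzip.compress(b'{"hypothetical": "payload"}')
print(looks_gzipped(sample))
print(gunzip_if_needed(sample))
```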

Here is an attempt for another buffer (next trace event):

That's it folks, we real-time inspected what data gets encrypted by TikTok, before the app is even able to send it to the internet.

Reader's task: Dump the resulting cipher bytes and find them in intercepted web traffic

Credz to @as0ler @oleavr @trufae @enovella_ and all other devs/contributors to great FOSS projects, which allow doing things like this ❤️❤️
