Marcus Mengs
12 Jan, 62 tweets, 18 min read
Okay, doing my first baby steps with r2frida (which combines the power of @radareorg and @fridadotre).

Gonna share my progress in this thread (live, so keep calm).

The goal: Runtime inspection of data sent out by TikTok !!before!! it gets encrypted

1/many
First of all, we do not start from zero. I got some prior knowledge from past reversing attempts and want to share some important facts.

TikTok's (log data) encryption is accomplished by a native library. The Android Java code just serves as a proxy for the native function
The decompiled code of the respective native JNI function from an older TikTok version looks something like this. In this thread, however, I use the most current TT version (no static analysis done yet)
In case you never reversed native libraries which were built to interface with the Android Java layer via JNI, I highly suggest the entry level introduction on the topic by @maddiestone

Before we start, I want to pinpoint some important aspects (which are also covered by Maddie's videos).

1) Unlike raw C functions, JNI functions like the one showcased above receive pointers to complex Java objects.

E.g. a function receiving a String on the Java layer...
... would receive a pointer to a 'jstring' on the native layer (not a zero-terminated C-String).

In order to retrieve a C-String, to go on working with it in the native code, some translation functionality is required. This functionality is provided by the ...
JNI (Java Native Interface). The JNI environment is passed in to JNI functions as first parameter.

If you look at the example screenshot again, you see exactly this. Functions provided by the 'env' pointer are used to parse the Java function arguments (e.g. jByteArrays) ...
Once the raw data is converted to a more C-ish form, it gets passed to an inner function, 'ss_encrypt' in my example. The inner function, in this case, is a pure C function and thus receives only C-style parameters (also no 'env' parameter, so it would not be able to access JNI)
A 2nd important aspect on JNI libraries, covered by @maddiestone

2) There are two ways to expose JNI methods from a native library:

a) export them with the proper naming convention, so that JNI can resolve them on library load
b) use the JNI functionality 'RegisterNatives'...
... to register the JNI functions once the library gets loaded.

The second method of registering methods is well suited for obfuscated code, as the methods neither have to follow the naming convention, nor do they have to be exported.
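To make option a) a bit more tangible, here is a minimal Python sketch of the JNI naming convention for exported native methods ('Java_' + mangled class name + '_' + mangled method name). The class name 'com.example.EncryptorUtil' is a made-up placeholder, not TikTok's real package path, and overloaded methods (which get an extra signature suffix) are left out.

```python
def jni_export_name(class_name: str, method_name: str) -> str:
    """Build the symbol name JNI looks up for an exported (non-registered) native method."""

    def mangle(part: str) -> str:
        out = []
        for ch in part:
            if ch in "./":
                out.append("_")            # package separator
            elif ch == "_":
                out.append("_1")           # literal underscore
            elif ch == ";":
                out.append("_2")
            elif ch == "[":
                out.append("_3")
            elif ch.isascii() and ch.isalnum():
                out.append(ch)
            else:
                # Any other character is escaped as _0xxxx (sketch: BMP characters only)
                out.append("_0%04x" % ord(ch))
        return "".join(out)

    return "Java_" + mangle(class_name) + "_" + mangle(method_name)


# Hypothetical class name, used only for illustration
symbol = jni_export_name("com.example.EncryptorUtil", "ttEncrypt")
print(symbol)
```

A library using 'RegisterNatives' does not need any symbol of this form, which is why a plain look at the export table would not reveal the encryption method.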
As you might expect, TikTok uses the 'RegisterNatives' approach. The screenshot below shows log output from a custom tool, which monitors JNI methods registered by instrumented Android apps (TikTok's encryption method in the example)
If you decompile the Java part of the TikTok apk, the encryption functionality (on an older version) looks something like this:
The Java method 'm18421a' receives a Java 'byte[]' and a Java 'int' as parameters and returns a 'byte[]', again.

Internally, this data is forwarded to the native JNI method 'ttEncrypt'.
The important point is that the native 'ttEncrypt' JNI method has to accept those exact parameter types and thus has to "register" with a proper method signature.

We already saw this signature in a previous screenshot
The last line from the screenshot above shows 3 things the native code has to provide for each method when calling 'RegisterNatives':

1) The call address of the native function implementation (0x7d70d1d5 in the example)
2) The function name (ttEncrypt)
3) The method signature, which is '([BI)[B' in this case and translates to:

'(' start of parameters
'[B' byte[]
'I' int
')' end of parameters
'[B' byte[] (return value)
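The translation above can be automated. The following is a small Python sketch of a decoder for JNI method signatures; the function names are my own and it only covers the standard descriptor grammar (primitives, arrays, 'L...;' object types).

```python
PRIMITIVES = {
    "Z": "boolean", "B": "byte", "C": "char", "S": "short",
    "I": "int", "J": "long", "F": "float", "D": "double", "V": "void",
}


def _read_type(sig: str, pos: int):
    """Read one type descriptor starting at pos, return (java_type, next_pos)."""
    dims = 0
    while sig[pos] == "[":          # each '[' adds one array dimension
        dims += 1
        pos += 1
    if sig[pos] == "L":             # object type: Lpackage/Class;
        end = sig.index(";", pos)
        name = sig[pos + 1:end].replace("/", ".")
        pos = end + 1
    else:                           # primitive type: single letter
        name = PRIMITIVES[sig[pos]]
        pos += 1
    return name + "[]" * dims, pos


def decode_jni_signature(sig: str):
    """Return (parameter_types, return_type) for a JNI method signature."""
    if not sig.startswith("("):
        raise ValueError("signature must start with '('")
    pos = 1
    params = []
    while sig[pos] != ")":
        java_type, pos = _read_type(sig, pos)
        params.append(java_type)
    return_type, _ = _read_type(sig, pos + 1)
    return params, return_type


params, ret = decode_jni_signature("([BI)[B")
print(params, "->", ret)
```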
So we keep this in mind: Even if the native library does not export the encryption method, it has to store the 1) function address, 2) name and 3) signature in a data structure, in order to provide it to 'RegisterNatives' once the library gets loaded by JNI
We now know almost everything we need, in order to head over to r2frida, except some important facts on my test setup:

- the app is inspected on a physical device, running Android 9
- the device uses a !!32bit!! ARM application core
As I am new to r2frida, chances are high that things could be achieved in easier ways.

Now to get started, I already have the latest @fridadotre server running on my USB connected Android device and 'frida-ls-devices' shows it being ready-for-action
So let's spawn a fresh TikTok instance with r2frida, using the following command:
The command 'launch'es the process on the attached frida 'usb' device with the device id '4c197256'. The process's package name, of course, is 'com.zhiliaoapp.musically'

Instead of 'launch', two other options could be used:
- 'attach' (would attach to an already running process, given by name or PID)
- 'spawn' (like 'launch', but the process would not be resumed automatically after attaching)
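The original command was only visible in a screenshot. As a rough sketch, assuming r2frida's documented URI form 'frida://[action]/[link]/[device]/[target]', the command line for the three options could be assembled like this in Python (the helper name is made up for illustration):

```python
ACTIONS = ("launch", "attach", "spawn")


def r2frida_argv(action: str, device_id: str, target: str) -> list:
    """Assemble an r2 command line for a USB-attached Frida device.

    Assumes the URI layout frida://[action]/[link]/[device]/[target]
    from the r2frida documentation; 'usb' is the link type used in the thread.
    """
    if action not in ACTIONS:
        raise ValueError("unsupported action: " + action)
    return ["r2", "frida://%s/usb/%s/%s" % (action, device_id, target)]


# Device id and package name as mentioned in the thread
argv = r2frida_argv("launch", "4c197256", "com.zhiliaoapp.musically")
print(" ".join(argv))
```

Swapping 'launch' for 'attach' gives the variant that comes in handy later, when late-loaded memory regions have to be visible to r2 itself.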
So for the warm up, let us use the Frida functionality which allows enumerating loaded Java classes. This nicely combines with the r2 syntax (concatenation of single-letter commands, '~' for grep)

Important: commands targeting the r2frida plugin have to be prefixed with '\'
The r2frida command to list classes is '\ic' (note the backslash prefix). The unfiltered result would be a bit overwhelming ...
... so we grep for classes including the term "crypt"
The '\ic <classname>' command lists the !Java! methods of the respective class
The signature of the static method 'EncryptorUtil.a' should look familiar to us (if you read the first tweets). It represents the Java layer of the encryption method and is called 'a' in this version
Note: The information above would be enough to intercept the method from the Java layer (e.g. with @fridadotre or Xposed), in order to inspect the call arguments (the byte[] parameter represents the plain data before encryption)
... but we are here for the native layer and to inspect data at runtime, right?

So let's search the whole address space for our native method name 'ttEncrypt'

Note: If you use r2's plain ASCII search, nothing will be found. You have to use the '\' prefix to search with r2frida
The search ends with two hits:
The screenshot below shows that the attempt to print a hexdump from the address of the first hit fails with r2, while r2frida (backslash prefix) works.

Reason: The memory region was not populated when r2 was started (encryption library was loaded after process launch)
I solved this issue like this:
1) Quit r2
2) Open r2 with r2frida, again, but this time **attach** to the already running process

et voila ... the memory offset is mapped and dumpable with 'px' (without backslash prefix)
Note: The last step is not necessary for a data hexdump, as you could still use '\px', but it turned out to be useful when it comes to printing the disassembly of "late loaded" code regions. This is because I sometimes struggled with '\pd', but 'pd' worked (+ various r2 views)
Having a closer look at the first hit of our string search for 'ttEncrypt', we notice that it is directly followed by a C-string with our method signature.

So chances are high that this data is part of the structure which gets handed to 'RegisterNatives'
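A quick plausibility check in Python: if the name and the signature are stored back to back as zero-terminated C-strings, the signature has to start exactly len("ttEncrypt") + 1 bytes after the name. The byte string below is reconstructed from the description, not copied from the hexdump screenshot.

```python
def read_cstrings(buf: bytes, base: int, count: int):
    """Return [(address, text)] for `count` consecutive NUL-terminated strings."""
    found = []
    offset = 0
    for _ in range(count):
        end = buf.index(b"\x00", offset)          # position of the terminator
        found.append((base + offset, buf[offset:end].decode("ascii")))
        offset = end + 1                           # next string starts after the NUL
    return found


# Reconstructed bytes (assumption): method name directly followed by the signature
dump = b"ttEncrypt\x00([BI)[B\x00"
strings = read_cstrings(dump, 0x8448b74c, 2)
for address, text in strings:
    print(hex(address), text)
```

The second address comes out as 0x8448b756, ten bytes after the first hit at 0x8448b74c.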
Reminder: in order to register the 'ttEncrypt' method with JNI, the 'RegisterNatives' method requires a structure containing:
- method name (C-string)
- method signature (C-string)
- method pointer (native pointer)
So the next step would be to search the process memory space for cross references to the address of this method name string (0x8448b74c). As I haven't applied any auto analysis, I use a simple hex search for this (in my case the byte order of the address has to be reversed ...
... to account for the architecture's endianness).
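The byte reversal is easy to get wrong by hand. A tiny Python sketch that produces the hex pattern for a 32-bit little-endian pointer (the function name is my own):

```python
import struct


def le32_search_pattern(address: int) -> str:
    """Hex string of a 32-bit address as it is laid out in little-endian memory."""
    return struct.pack("<I", address).hex()


# Address of the 'ttEncrypt' name string from the search hit
pattern = le32_search_pattern(0x8448b74c)
print(pattern)
```

For 0x8448b74c this yields the bytes 4c b7 48 84, which is the sequence to feed into the hex search.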

The result is promising: Only one hit, for a search across the whole address space:
Printing the first 12 bytes from this XREF offset reveals 3 pointers (again stored in reversed byte order):

- 0x8448b74c (expected, method name pointer)
- 0x8448b756 (ptr to signature string, yay)
- 0x8448b1d5 (likely pointer to JNI method implementation)
So the layout of the 3 pointers from above speaks a clear language. Very likely this is the data struct passed to 'RegisterNatives' and thus 0x8448b1d5 points to the native implementation of 'ttEncrypt'
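The 12 bytes match the layout of JNI's 'JNINativeMethod' struct (name, signature, fnPtr) on a 32-bit target. A minimal Python sketch of parsing such an entry follows; the raw bytes are rebuilt from the three pointer values listed above, not copied from the screenshot.

```python
import struct
from typing import NamedTuple


class NativeMethod32(NamedTuple):
    """One JNINativeMethod entry on a 32-bit little-endian target."""
    name_ptr: int        # const char *name
    signature_ptr: int   # const char *signature
    fn_ptr: int          # void *fnPtr


def parse_native_method(raw: bytes) -> NativeMethod32:
    # Three consecutive 32-bit little-endian pointers = 12 bytes
    return NativeMethod32(*struct.unpack("<III", raw[:12]))


# Bytes reconstructed from the pointer values in the thread (assumption)
raw = bytes.fromhex("4cb74884" "56b74884" "d5b14884")
entry = parse_native_method(raw)
print([hex(value) for value in entry])
```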
Sorry, before going on I have to insert a small digression on addressing/instruction sets on ARM 32 (specific to my test setup). Anyway, it is crucial:

ARM 32 supports two instruction sets, "ARM mode" (32bit) and "Thumb mode" (16bit), which can be used interchangeably
In order to distinguish if a function call target (branch) should be interpreted as ARM or THUMB, the least significant bit (LSB) of the function address is taken into account

For ARM the LSB is 0 (even address)
For THUMB the LSB is 1 (odd address)
The actual instruction ALWAYS resides on an even address.

This means the function address 0x8448b1d5 holds code in THUMB mode (16bit), while the first instruction resides at 0x8448b1d4

(sorry if it gets a bit complicated, will be clear in a second)
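The rule boils down to two bit operations. A short Python sketch (helper names are my own) that splits a branch target into mode and real instruction address, and goes the other way for hooking:

```python
def decode_branch_target(address: int):
    """Split an ARM32 branch target into (mode, address of the first instruction)."""
    mode = "thumb" if address & 1 else "arm"   # LSB selects the instruction set
    return mode, address & ~1                  # instructions sit on even addresses


def thumb_call_address(instruction_address: int) -> int:
    """Address to use when calling or hooking THUMB code (LSB set to 1)."""
    return instruction_address | 1


mode, first_instruction = decode_branch_target(0x8448b1d5)
print(mode, hex(first_instruction))
```

So the disassembler gets the even address, while anything that behaves like a call (for example a trace hook) gets the odd one.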
So if we print the disassembly at the (assumed) 'ttEncrypt' address, things look a bit weird
... this is because instructions are interpreted in ARM mode (32bit).

Let's fix this:
... looks much better, still the first instruction is off-by-one 🤣

No seriously, as explained, on arm32 we have to disassemble at [THUMB mode address - 1] = 0x8448b1d4
Nice, this looks like a proper function prologue (note how the callee stores register values on the stack, before moving on).

Now, to get a feeling for how often this function is called, let's use 'r2frida' power to trace it.

Important: The thumb address has to be used here!!!
The output of the '\dt' command, which places the trace hook, also indicates that the function address maps to an offset in 'libEncryptor.so' ... let us call this a "nice confirmation"

Some actions in the TikTok app ... trace logs for ttEncrypt-calls arrive
... cigarette break ... stay tuned (if the app crashes meanwhile, I'll start from scratch)
Let's remove the trace hook for now, with '\dt-*'
Remember my screenshot of a decompiled 'ttEncrypt' function from an older TT version? We just traced the corresponding function.

Trying to runtime-parse the function parameters, which represent Java object instances, would be insane (maybe impossible)
... luckily, at least the old implementation internally called a method 'ss_encrypt', which received a C-style byte array pointer and an integer representing the length as its first two parameters.
It would be way easier to runtime-inspect these
Let's take a closer look at the disassembly of our assumed 'ttEncrypt' function, by seeking to its offset with 's 0x8448b1d5' and switching to a more suitable r2 view with the uppercase 'V' command (press 'p' till the view changes to disassembly)
The view from above allows scrolling through the function's code with the cursor keys. Most important: calls to other code parts (branches) are printed bold and suffixed with [1], [2] ...

Hitting [alt+1] moves us straight to the marked branch offset:
The code above does not look like a legit inner function (we do not care about alignment and inspect the next branch).

Hitting 'u' returns us to the parent function, followed by [alt+2] which brings us into the 2nd branch
The 2nd branch at 0x84483aa4 looks better (proper function prologue). We could easily drift back to the static analysis world, to find further evidence for it being the inner 'ss_encrypt' function. But hey, we are working with instrumentation, so let us just inspect the calls
Remember: While we disassembled at 0x84483aa4, the code is THUMB mode. Thus the proper tracing address would be 0x84483aa5 (LSB set to 1), unless you like crashes (restarting here would not be funny, 'cause thanks to ASLR all function offsets would differ)
In contrast to our first tracing attempt, we use the beautiful command for formatted tracing, which allows us to print out function parameters for each call in a predefined format.

Command syntax:
The screenshot below shows how I placed my trace hook. The 'pppp' means that the first 4 function parameters should be printed as hex values (pointers) for each call.

Ultimately 2 calls get logged
Those call traces look very promising. We expect the inner function to receive a pointer to a native byte array as 1st argument and an integer representing the array length as second argument. The logged call traces totally match this assumption.

So let's slightly modify this:
With the formatted trace modification 'pi' from above, the first argument gets printed as hex pointer, the second as decimal integer.

This makes it easy to hexdump the content of this buffer (as long as it is kept alive)
But nooo ... what's this? The encryption input data is not human readable. No worries, pay attention to the first 3 magic bytes ... looks gzipped, doesn't it?!

Luckily r2 has an "unzip print" command, 'prg'.
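Outside of r2, the same check works on a dumped buffer. A minimal Python sketch, assuming the buffer was cut to the length logged as second argument; the sample payload is invented, as the real data was only shown in screenshots.

```python
import gzip

# gzip header: magic bytes 0x1f 0x8b, followed by 0x08 (deflate compression)
GZIP_MAGIC = b"\x1f\x8b\x08"


def maybe_gunzip(buf: bytes) -> bytes:
    """Decompress the buffer if it starts with the gzip header, else return it unchanged."""
    if buf[:3] == GZIP_MAGIC:
        return gzip.decompress(buf)
    return buf


# Invented stand-in for a dumped 'ss_encrypt' input buffer
sample = gzip.compress(b'{"event": "demo"}')
plain = maybe_gunzip(sample)
print(sample[:3].hex(), plain)
```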

Here is an attempt for another buffer (next trace event):
That's it folks, we real-time inspected what data gets encrypted by TikTok, before the app is even able to send it to the internet.

Reader's task: Dump the resulting cipher bytes and find them in intercepted web traffic
Credz to @as0ler @oleavr @trufae @enovella_ and all other devs/contributors to great FOSS projects, which allow doing things like this ❤️❤️
