Cracking Ransomware

/ 07.06.15 / Derek Soeder

In early 2013, an organization approached Cylance for help recovering from a devastating ransomware attack that made it impossible to access large numbers of critical files. The attacker used a version of the "Anti-Child Porn Spam Protection" ransomware, which combed every drive it could find and encrypted critical files. The backup drives were mounted when the attack hit, so they faced a total data loss. Fortunately we were able to derive the password used to encrypt that data and commence recovery. This blog presents the technical story behind the work we did to crack that code.

We have held off on going public out of concern that releasing this data could prompt the ransomware authors to identify methods for better securing their passwords. There is plenty of evidence that malware developers follow the work of security firms. For example, a few months after we cracked this case, another firm publicly announced that they could recover files encrypted by the same ransomware. Although the firm did not publish details, it appears that the malware authors took the announcement seriously enough to take countermeasures. A new version of the ransomware soon appeared that was no longer susceptible to the same password-guessing technique. The authors even professed explicit knowledge of the weak password-generation flaw in comments inserted in the malware. We presented at Black Hat USA in 2013 on attacking pseudorandom number generators, but this is the first time we have discussed how we cracked the ransomware.

We encourage other researchers to exercise discretion when they discover a correctable flaw in ransomware. Work privately with trusted agencies and organizations that victims are likely to contact, so that the ransomware remains flawed and its victims can be helped for as long as possible.

(Editor's note: A shorter version of this report appeared last month on the RSA Conference Blog.)

Improved Version of ACCDFISA

Earlier ransomware specimens from the same family, commonly called ACCDFISA, were defeated by researchers who exploited mistakes or oversights by the ransomware's developer. The version we faced seemed to have been improved by the experience. It claimed to use AES-256 encryption with a 256-character random key generated individually for each victim and communicated back to the attacker. It also claimed to securely delete files to prevent recovery of any unencrypted original files or passwords. Those claims demonstrated an awareness of how a naive ransomware attack could be beaten, suggesting that correct technical countermeasures had been taken to dash any hopes of recovery. Had the ransomware author finally gotten everything right?

We examined one of the victim’s encrypted files, which was renamed with the instructions "(!! to decrypt email id ... to ...@... !!).exe." The file was a WinRAR self-extractor containing an encrypted RAR. Finding a flaw in WinRAR's cryptographic implementation didn't seem a promising approach, so instead we decided to crack the password. To do that we needed the code that created it.

We found the malware and a number of files that seemed to be associated with it on the infected drive. One file was Microsoft Sysinternals' sdelete utility, which permanently deletes files and which we also didn't imagine would contain a bug that would lead us to a quick victory. We also found a "NoSafeMode" library and a RAR utility for making self-extractors. The presence of these files suggested that the attackers had created a Frankenstein's monster of stolen code, crudely sewn together with a main ransomware executable that appeared to be written in PureBasic. The RAR utility gave us a place to start reverse engineering the ransomware. The utility accepts the encryption key password as a command-line argument, so we reasoned that backtracking from the point in the ransomware code that launches the utility would lead us to the place where the password is constructed.

Finding the Password Generator

We started by running the ransomware on a disposable system. We attached a debugger and intercepted the CreateProcess calls that launch the RAR utility to encrypt our files. With a little effort, the debugger broke and we were able to view the full command line, which includes the password in the "-hp" switch.

Running the ransomware under a debugger, we can intercept its attempt to launch the RAR utility (disguised as "svchost.exe"). At that point, we can display the exact command line that the ransomware is executing, and this reveals the password being used to encrypt our files.

This test run gave us a password. It was not necessarily the password used to encrypt the victim's files, but it gave us some clues to guide our reverse-engineering effort: We could look for the password or pieces of it, we could search for the likely "alphabet" used to generate the password if it is random, and we could search for the "-hp" string used to build the password portion of the command line.

The intercepted password appeared to be a 57 character mixture of letters, digits and punctuation marks. It was too random to have been keyed in by a human and had "aes" in the prefix. This latter feature could just be a coincidence, like spotting a meaningful word or number on a license plate, or it could be an intentional prefix which turns up as a hard-coded string in the ransomware code. In fact, when we open the ransomware in a disassembler, we found not just an "aes" string, but the full "aesT322" prefix:

This hard-coded "aesT322" string, appearing in a disassembly of the ransomware, exactly matches the first seven characters of the password we intercepted earlier. We assigned names to some of the functions ("_main", "_strcat_edx") and global variables ("PasswordPrefix") during an earlier reverse engineering pass.

This told us that the password is actually "aesT322" followed by 50 presumably random characters.

In the screenshot above, the partially highlighted instruction is where the program loads a reference to the "aesT322" string. We guessed that the next instruction loads a reference to some global variable where that string will be stored. We've already named the variable "PasswordPrefix", but it was easy to double-check that assertion. First we located where that variable resides in memory.

This portion of the ransomware's data section contains some global variables involved in random password generation. Like before, we've renamed and commented on some of the variables.

With the addresses of the variables, we returned to the debugger to see what values they held during our live test run. Here's what we found:

While the disassembler lets us easily browse the ransomware's code as a static or "dead" program on disk, the debugger enables us to pick through the memory of a "live" ransomware process as its running. Here, we can view the values taken by three string variables.

Just as we had expected, the variable named "PasswordPrefix" pointed to a copy of the "aesT322" prefix string. "PasswordRandom" pointed to a string of 50 random characters and "PasswordFull" pointed to a string comprising the two parts concatenated.

We then validated our findings and methodology by revisiting the third approach, tracking down the "-hp" string. Back in the disassembler, a very quick search led to one of a few instances:

We found one instance of the "-hp" string in code that clearly constructs a command line containing the password and executes it (via CreateProcessA or ShellExecuteA).

We understood more about the ransomware but were still not sure if we could help the victim. We had ruled out the possibility that the password might be the same for all victims. Fortunately, the disassembler makes it easy to find every place a variable is accessed, so we could backtrack to the code where "PasswordFull" is constructed:

This code builds "PasswordFull" by concatenating "aesT322" with the random string.

Next, we followed cross-references to "PasswordRandom."

This loop (emphasized by the thick dashed arrow on the left) strings together 50 random characters from a 78-character "alphabet" which seems to represent the selection of random characters we noticed in the intercepted password.

We found a loop that counts from 1 to 50, which matches the 50-character length of the password’s random portion. Inside the loop we found a string that looked like an "alphabet" from which the 50 characters would be randomly chosen. The alphabet included 26 lowercase letters, 26 uppercase letters, 10 digits, and 16 punctuation symbols.

The next step was to figure out which function (from the subroutines called inside the loop) selects a random character. Despite popular belief, computers struggle to be truly random; attacking what's often and more appropriately referred to as the pseudorandom number generator has long been a fruitful approach to defeating encryption. We labeled the function "_get_random_Alphabet_char" above. The function disassembles as follows:

According to this disassembly, the "_get_random_Alphabet_char" function appears to generate a random number, store it in the local variable "var_8", and loop sequentially through the characters of "Alphabet" until it reaches the chosen one. That character then becomes a single-character string pointed to by the global variable we've named "RandomAlphabetChar."

That disassembly was fairly easy reading after we determined that the highlighted function is "Rnd" (PureBasic's random number generating function). This is what the function looks like inside:

The "Rnd" random number function ensures that the pseudorandom number generator is initialized and then generates a number between zero and the specified maximum "arg_0", inclusive.

The "Rnd" function wraps calls to even more functions, which we named. The function "_internal_Randomize_default" calls a few Windows functions, as well as a function internal to the ransomware that we've named "_internal_RandomInit." The following screenshot displays side-by-side disassemblies of both.

The left window shows a disassembly of the "_internal_Randomize_default" function, while the window on the right depicts an interesting portion of the "_internal_RandomInit" function.

We finally had a breakthrough. On the left, we see that the PRNG is initialized, or seeded, with a 32-bit number derived from the identifier of the thread executing this code and how long the system has been running in milliseconds, both rather predictable values. On the right, we've highlighted a "magic" constant--a special-purpose number that typically lends itself to an Internet search. The number appears here in hexadecimal as 53A9B4FBh, although other likely representations include 0x53A9B4FB or the decimal representation 1403630843. The instructions that follow it can be translated to the expression "1 - EAX * 0x53A9B4FB," meaning the constant we see may actually be the negation -1403630843, which could be represented AC564B05h, 0xAC564B05, or 2891336453 if treated as unsigned. Searching the Internet for these various terms eventually led to source code ostensibly related to random number generators, as well as a disassembly posted in a PureBasic forum.

Below, the disassembly of the purported "Rnd" function further corroborated our findings with its prominent rotate operations, which have counterparts in the source code we found online.

The "ror" mnemonics highlighted above signify rotate-right operations, by 13 and 5 bit positions respectively. Some of our first Internet search results led us to source code that performed rotates-left in a similar context and by similar bit counts.

This is where things got exciting. Thanks to the 32-bit seed, we knew there could be at most 4 billion possible passwords, not the nonillions of nonillions of possibilities that 50 characters picked truly randomly from an alphabet of 78 would yield. This is because, as noted before, computers usually operate with rigid determinism even when they're trying to act random. For any given seed value, the PRNG will produce the exact same numbers in the exact same order any time it's initialized with that value. Since the seed is a 32-bit number, it can range from zero to about 4 billion, and therefore the realm of possible initial states is equally confined.

Of course, a list of 4 billion passwords is no trifling thing. In this case, we're greatly assisted by the choice of seed sources--a thread ID and the system uptime. The former is a multiple of 4 and typically less than 10,000, while the latter is more variable. Over the course of 49.7 days, the uptime will count from zero to 4 billion and then wrap around to become zero again, typically counting by 15 or 16 as it goes. If we can catch a hint of how long the victimized system had been running prior to the attack, we can greatly narrow down the possible values of the uptime component of the seed and accordingly the number of passwords to try.

Guessing Passwords

The problem with guessing passwords is that it's expensive in terms of time. Generating a password from a given seed is very fast, but in the quick-and-dirty system we threw together, we could only test a guess by attempting to decrypt the RAR and checking if anything sensible came out, a comparatively costly operation especially when attempted millions of times. As it turns out, we didn't find a record of when the victim's computer had started last, but on the way we discovered something even better.

One of the first things we noticed when examining the infected drive was a variety of strange files stashed in the "\ProgramData" folder, under hidden subdirectories of various randomly-lettered names. We also found a text file, "\ProgramData\svcfnmainstvestvs\stppthmainfv.dll", containing 21 lines of eight random letters each. With further scrutiny, we realized that each line was the reverse of a random subdirectory or file name also created by the ransomware under "\ProgramData".

The contents of "\ProgramData\svcfnmainstvestvs\stppthmainfv.dll" from our test run. Even though the file has a ".dll" extension, it's actually a text file. Each line, when reversed (e.g., "chlqfohk"), corresponds to the random name of a file or subdirectory which the ransomware creates somewhere under "\ProgramData".

The value of this data isn't that we need a list of names--its value is that it represents output from the PRNG that gets left behind after an infection. In the case of a real infection, like the one we were called in to resolve, we of course wouldn't have captured the password like we did in our test run, but chances are good we could still find "stppthmainfv.dll" or reconstruct it based on what’s in the infected drive's "\ProgramData" directory. With this data, we can simply brute-force all possible seed values--all the way to four billion if needed--and figure out which value was used to seed the PRNG before it cranked out all these random names. The search should take no more than a few hours on a reasonable computer, making it orders of magnitude faster than feeding guessed passwords to a RAR utility.

There are a few catches though. First, as suggested by the Thread Local Storage (TLS) calls spotted in the random functions, each thread has its own PRNG state, initialized independently the first time "Rnd" is called. It so happens that the random eight-letter names are generated on the ransomware process's main thread, while the password is decided on a second thread. The following code is the loop where the 21 names are generated; the code cross-reference ("CODE XREF") comments, in green, indicate that the code resides in the program's "start" function, which runs on the process's first thread.

This loop, in the ransomware's "start" function, creates the 21 random file and subdirectory names.

This is the loop in the function annotated above that chooses the eight random letters from a regular A-to-Z alphabet. As expected, the function calls "Rnd" once per letter.

The password-generation code, on the other hand, we traced to a function tagged "sub_406582", which is called by a function we named "_ServiceMain", which Windows executes on a separate thread when the ransomware runs as a Windows service. This means that the two code regions of interest will execute with separate PRNG states, each seeded by a different value. Brute-forcing the seed value that gave the random names won't directly give us the password seed value, although it should put is in the neighborhood, owing to the simplicity and predictability of its bases. Put another way, the seed values will differ in both the system uptime and thread ID components, since the threads start at different times and necessarily must have different IDs if they run concurrently, but they won't differ by much. With the first seed value in hand, we can conservatively narrow the possibilities for the second to perhaps a range of a few hundred thousand.

A second catch, easily overlooked, is that we have a sequence of letters, but the PRNG technically issues a sequence of numbers. In this case, as depicted in the preceding screenshot, numbers from 0 to 25 straightforwardly represent the letters "a" through "z", so the intuitive alphabet mapping letters to numbers is the correct one. In other cases, however, letters could have been omitted, duplicated, reordered, or interspersed with other characters that we didn't happen to see in "stppthmainfv.dll", any of which could have meant wasted hours of misguided brute-forcing attempts if we hadn't been paying attention.

A third complication is that there's no guarantee that the PRNG is being seeded immediately before the ransomware generates the names or password. Other calls to "Rnd" could take place earlier in either or both threads, meaning that the PRNG's state wouldn't be the pristine seeded state when the random generation code in question executes. We needed to figure out how many times "Rnd" is called in each thread prior to the calls we care about, and discard that many random numbers before generating our own speculative names or passwords.

So let's look for "Rnd" calls. Scrolling up a bit in the disassembly, we see only one call to "Rnd" in the "start" function, shown here:

The first call to "Rnd" executed by the ransomware, and the only call made before the file and subdirectory name-generation loop.

On the other hand, the "_ServiceMain" thread which executes the password generation code calls "Rnd" three times before it uses the PRNG to construct the password, as depicted below.

Three calls to "Rnd" precede the password generation code on the "_ServiceMain" thread.

These three calls mean that, once we're ready to start generating candidate passwords, we'd better discard the first three random numbers after each seeding.

Finally, we were ready to begin brute-forcing. Our program for finding the names seed value looked something like this: (We've omitted the PureBasic PRNG reimplementation for brevity.)

for (unsigned int seed = 0; ; seed++)
{
// seed the PRNG with a possible value
RandomInit(seed);

// discard one random number
Rnd();

    size_t i;
    for (i = 0; i < 21 * 8; i++)
    {
        // generate the next random letter
        // Rnd(25): from 0 to 25 inclusive
        char ch = "abcdefghijklmnopqrstuvwxyz"[Rnd(25)];

        // does this letter match the next in the sequence?
        if (ch != "chlqfohkayfwicdd...dszeljdp"[i])
            break;
    }

    // did we complete the entire sequence?
    if (i == 21 * 8)
    {
        // yes, display the result and finish
        printf("Names seed = %u\n", seed);
        break;
    }

// no, try the next seed value
}

In about 4 seconds on a single CPU core, the code tested 31,956,209 possibilities and found that the last one--seed value 31,956,208--generated the same sequence of letters as observed in "stppthmainfv.dll". Obtaining that one number validated all of our work up to that point.

The system we devised for turning this information into results was considerably less elegant, but at that point we figured any working prototype would do. Taking a mostly wild guess, we assumed that the seed value responsible for generating the password would be no further than 32,768 below or 180,000 (3 minutes in milliseconds) above the names seed value we just recovered. Accordingly, we generated a list of approximately 200,000 passwords based on the possible seed values in that range, using code like the following:

// this is the names seed value we brute-forced earlier
unsigned int namesseed = 31956208;

// loop through a range of seed values around the determined names seed value
for (unsigned int seed = namesseed - 32768; seed <= namesseed + 180000; seed++)
{
// seed the PRNG with a candidate password seed value
RandomInit(seed);

    // discard three random numbers
    Rnd();
    Rnd();
    Rnd();

char pwrandom[51];
pwrandom[50] = '\0';

    // generate the fifty-char random portion of the password corresponding to
    // the candidate seed value, using the alphabet extracted from the ransomware
    // Rnd(77): from 0 to 77 inclusive (this alphabet contains 78 characters)
    for (size_t i = 0; i < 50; i++)
      pwrandom[i] = "abc...xyzABC...XYZ0123456789!@#$%^&*&*()-+_="[Rnd(77)];;

// output the full password; this output can be captured to compile a list
printf("aesT322%s\n", pwrandom);
}

Running this code with its output redirected into a text file gave us a roughly 12MB list of passwords to test, which compares very favorably to the 236GB text file we would need to hold all 4 billion possibilities if we hadn't been able to narrow down the password seed value. With the list in hand, we cobbled together a crude batch file to run the free archive utility 7-Zip against an encrypted RAR self-extractor, testing in sequence each password in the list. 7-Zip seemed to experience some false positives when detecting a successful decryption, so we used "find" to search for output that was very likely to appear if and only if the file decrypted into something sensible. Here's what the batch file looked like:

@for /f %%p in (pwlist.txt) do @(
7z.exe l "testtargetfile_0123456789abcdef.docx(!! to decrypt email id ... to ... !!).exe" "-p%%p" 2>&1 |find ".."
if NOT ERRORLEVEL 1 (echo "%%p")
)

We let the batch file run overnight, and by morning it had finished with exactly one password waiting for us on the screen--the correct password. Our experiment was a success! We had earned the victim's faith in us, and at last, we were ready to get their data back.

About Derek Soeder

Senior Threat Researcher at Cylance

Derek Soeder is a Senior Threat Researcher at Cylance. He has reverse-engineered, prototyped, and programmed for offense and defense with a number of companies, including eEye Digital Security. Derek specializes in Windows internals and manipulating systems at a machine-code level.

Back