You, Me, and .NET GUIDs

03.26.15 / Brian Wallace

Research done here at CylanceSPEAR is not limited to vulnerabilities. For instance, in the following research I describe two GUIDs that can be extracted from .NET assemblies to identify project and build information. These GUIDs can help a reverse engineer identify related samples more easily.

Born in a Pit of Cleavers

In December of 2014, we released the Operation Cleaver report. This report was the result of an enormous effort in investigation, reverse engineering, development, and design. Given limited human resources and an ever-shrinking time window, the investigation required some innovation.

Early in the investigation, we encountered multiple anonymous FTP servers that threat actors were using both to store their malware binaries and to exfiltrate information from victims. When we first uncovered these servers and mirrored their content, an initial assessment revealed an immense number of malware samples that we needed to identify and reverse engineer to determine their functionality.

Faced with over 500 executables and too few hours to reverse engineer them all, I needed a way to cut this number down to something more reasonable. I developed a few solutions, but one of them I had never seen used elsewhere.

As a self-proclaimed pwngrammer (developer and security researcher) with a good chunk of .NET development experience under my belt, I recalled that Visual Studio generates an AssemblyInfo.cs file whenever a new C# project is created.

AssemblyInfo.cs in Visual C# Project

Most of this source file contains information about the project, such as its name and version. It also contains a GUID generated by Visual Studio.

GUID in AssemblyInfo.cs

This GUID is referred to as the Typelib ID. Although it is a version 4 GUID (not known to leak any information about the developer's computer), it is embedded in the compiled assembly for every build of the project unless it is removed from AssemblyInfo.cs (we have observed samples without this GUID).
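To make the version 4 point concrete, the sketch below uses Python's standard uuid module to report what a GUID's version reveals. A version 1 GUID embeds a 60-bit timestamp and a node field that is often the generating machine's MAC address, whereas a version 4 GUID is random apart from its version and variant bits. The helper name describe_guid and the sample GUID string are hypothetical illustrations, not part of any tool described in this post.

```python
import uuid


def describe_guid(guid_string):
    """Summarize what a GUID string's RFC 4122 version reveals (hypothetical helper)."""
    u = uuid.UUID(guid_string)
    if u.version == 1:
        # Version 1 GUIDs carry a creation timestamp and a node ID,
        # which is frequently the MAC address of the generating machine.
        return "version 1: timestamp %d, node %012x" % (u.time, u.node)
    if u.version == 4:
        # Version 4 GUIDs are random apart from the version and variant bits.
        return "version 4: random, no host information"
    return "version %s" % u.version


# The third group starts with "4", so this is a version 4 GUID.
print(describe_guid("0f8fad5b-d9cb-469f-a165-70867728950e"))
# prints "version 4: random, no host information"
```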

To extract this information from a large number of samples, we could simply use .NET reflection. I do not recommend this method, though, as loading unknown .NET assemblies via reflection may expose unexplored attack surface. Searching StackOverflow for how to do this in .NET turns up this post, whose second answer correctly describes how to obtain this GUID from the currently running application. Since we want to use this for static malware analysis, we instead need to load the target assembly from disk.

    using System;
    using System.Reflection;
    using System.Runtime.InteropServices;

    namespace dotnetguid
    {
      class Program
      {
        static void Main(string[] args)
        {
          // Load for inspection only; the assembly's code is not executed
          var assembly = Assembly.ReflectionOnlyLoadFrom("dotnetguid.exe");
          // Reflection-only assemblies expose attributes as CustomAttributeData
          var attributes = assembly.GetCustomAttributesData();
          foreach (var attr in attributes)
          {
            // The Typelib ID is the argument of the assembly-level GuidAttribute
            if (attr.AttributeType == typeof(GuidAttribute))
            {
              Console.WriteLine(attr.ConstructorArguments[0].Value);
              break;
            }
          }
        }
      }
    }

You may notice that the proof of concept above differs from that answer: the assembly is loaded with Assembly.ReflectionOnlyLoadFrom to minimize the security risk. For more details about that risk, see my previous blog post on executing arbitrary code when loading assemblies. Again, I do not suggest using this proof of concept code on untrusted assemblies, as there may be unexplored attack surface.

Proof of Concept getting GUID from Assembly

A particularly useful property of this GUID is that it is preserved even when the .NET assembly is obfuscated with SmartAssembly. We discovered this when the method grouped TinyZBot samples that ssdeep clustering had not grouped together.
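The grouping step itself takes only a few lines. The sketch below assumes the GUIDs have already been extracted into a hypothetical dictionary mapping each sample name (for example its SHA-256) to a (typelib_id, mvid) tuple, with None for a missing value; that layout and the function name are illustrative assumptions. Because the grouping key is the GUID rather than the file's bytes, obfuscation that rewrites the code but keeps the attribute does not split a group.

```python
from collections import defaultdict


def group_by_typelib(results):
    """Map each shared Typelib ID to the sorted names of the samples that carry it.

    `results` is a hypothetical {sample_name: (typelib_id, mvid)} dictionary;
    samples without a Typelib ID are skipped, and singleton groups are dropped.
    """
    groups = defaultdict(list)
    for sample, (typelib_id, _mvid) in results.items():
        if typelib_id is not None:
            groups[typelib_id].append(sample)
    return {typelib_id: sorted(samples)
            for typelib_id, samples in groups.items() if len(samples) > 1}
```

Each resulting group is a candidate family, so a reverse engineer might analyze one representative per group first.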

A second GUID found in .NET assemblies is also useful in this context: the Module Version ID (MVID), which identifies a particular build of the project. If two distinct samples share both an MVID and a Typelib ID, the differences between them were most likely introduced after compilation (barring a GUID collision).
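The reasoning above can be expressed as a small comparison function. This is a hedged sketch: the labels and the function name are hypothetical, and each argument is assumed to be a (typelib_id, mvid) tuple with None for a missing value. Only the case where both GUIDs match supports the "same build, modified after compilation" conclusion; the other labels are weaker hints.

```python
def relate(sample_a, sample_b):
    """Classify how two samples relate, given (typelib_id, mvid) tuples."""
    typelib_a, mvid_a = sample_a
    typelib_b, mvid_b = sample_b
    # A missing GUID never counts as a match.
    same_typelib = typelib_a is not None and typelib_a == typelib_b
    same_mvid = mvid_a is not None and mvid_a == mvid_b
    if same_typelib and same_mvid:
        # Same build; any byte differences likely came after compilation.
        return "same build"
    if same_typelib:
        # Same Visual Studio project, different builds.
        return "same project"
    if same_mvid:
        # E.g. the GuidAttribute was removed from AssemblyInfo.cs.
        return "same MVID only"
    return "no GUID link"
```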

Note, however, that it would not be difficult for an attacker to change these values. These GUIDs are not absolute identifiers and should be combined with other indicators for identification. In the Operation Cleaver investigation, we used them to prioritize malware analysis.

GetNETGUIDs

It's no secret that I'm a fan of developing security tools in Python, and this case will be no exception. The tool I've developed to extract these GUIDs is simply named GetNETGUIDs. By statically extracting these GUIDs without reflection, we avoid any potentially unexplored attack surface and avoid unnecessary risk.
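GetNETGUIDs' own implementation is not reproduced here. As an illustration of the same idea, the following standard-library-only sketch reads both GUIDs straight from the file bytes without loading the assembly. It rests on assumptions worth stating loudly: the metadata root is located by searching for its "BSJB" signature instead of walking the PE headers; the MVID is taken as the first GUID in the #GUID heap (typical of C# compiler output, whereas a full parser would follow the Module table's Mvid index); and the Typelib ID is found by scanning the #Blob heap for the byte pattern of a serialized GuidAttribute value. The names extract_guids and read_stream_headers are hypothetical.

```python
import re
import struct
import uuid

# Heuristic byte pattern for a serialized GuidAttribute value in the #Blob heap:
# blob length 0x29, prolog 0x01 0x00, string length 0x24 (36 chars),
# the GUID text itself, then 0x00 0x00 (no named arguments).
TYPELIB_PATTERN = re.compile(
    rb"\x29\x01\x00\x24([0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-"
    rb"[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12})\x00\x00"
)


def read_stream_headers(data, root):
    """Parse the metadata root at file offset `root` into {name: (file_offset, size)}."""
    # Signature(4) MajorVersion(2) MinorVersion(2) Reserved(4) Length(4)
    (version_length,) = struct.unpack_from("<I", data, root + 12)
    pos = root + 16 + version_length
    # Flags(2) followed by the number of stream headers(2)
    (count,) = struct.unpack_from("<H", data, pos + 2)
    pos += 4
    streams = {}
    for _ in range(count):
        offset, size = struct.unpack_from("<II", data, pos)
        pos += 8
        end = data.index(b"\x00", pos)
        name = data[pos:end].decode("ascii")
        # Stream names are null-terminated and padded to a 4-byte boundary.
        pos += (end - pos + 1 + 3) & ~3
        # Stream offsets are relative to the start of the metadata root.
        streams[name] = (root + offset, size)
    return streams


def extract_guids(data):
    """Return (typelib_id, mvid) as lowercase strings, or None where not found."""
    root = data.find(b"BSJB")  # assumption: the first match is the metadata root
    if root == -1:
        return None, None
    try:
        streams = read_stream_headers(data, root)
    except (struct.error, ValueError):
        return None, None
    typelib_id = mvid = None
    if "#Blob" in streams:
        offset, size = streams["#Blob"]
        match = TYPELIB_PATTERN.search(data, offset, offset + size)
        if match:
            typelib_id = match.group(1).decode("ascii").lower()
    if "#GUID" in streams:
        offset, _size = streams["#GUID"]
        chunk = data[offset:offset + 16]
        if len(chunk) == 16:
            # Assumption: the Module table's Mvid refers to GUID heap index 1.
            # .NET stores GUIDs with little-endian leading fields, hence bytes_le.
            mvid = str(uuid.UUID(bytes_le=bytes(chunk)))
    return typelib_id, mvid
```

To use it, pass the bytes of a file opened in binary mode to extract_guids. A pattern scan keeps the sketch short, but it can be fooled by crafted input, so a proper metadata-table parser is the safer choice for production triage.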

The tool takes one or more paths to files or directories and scans them for .NET assemblies. It checks each file and attempts to extract both GUIDs. When only the MVID is present in an assembly, the Typelib ID is displayed as None.

You can download GetNETGUIDs here.

If GetNETGUIDs is run with the -r flag, it recursively checks every file in the directory tree below the supplied paths.
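The path handling described above might look like the following sketch. It mirrors the documented behavior (files, directories, and an optional recursive walk, with recursive=True standing in for -r), but it is not GetNETGUIDs' actual code; the function names and the cheap MZ/BSJB pre-filter are assumptions made for illustration.

```python
import os


def iter_candidate_files(paths, recursive=False):
    """Yield file paths from the given files/directories (hypothetical helper).

    Without `recursive`, only files directly inside each directory are yielded;
    with it, every file in the directory tree below each path is yielded.
    """
    for path in paths:
        if os.path.isfile(path):
            yield path
        elif os.path.isdir(path):
            if recursive:
                for root, _dirs, files in os.walk(path):
                    for name in sorted(files):
                        yield os.path.join(root, name)
            else:
                for name in sorted(os.listdir(path)):
                    full_path = os.path.join(path, name)
                    if os.path.isfile(full_path):
                        yield full_path


def looks_like_dotnet(file_path):
    """Cheap heuristic: an MZ header plus the 'BSJB' metadata signature somewhere."""
    with open(file_path, "rb") as handle:
        data = handle.read()
    return data.startswith(b"MZ") and b"BSJB" in data
```

Filtering with looks_like_dotnet before extraction skips most native executables and unrelated files in a mirrored dump without parsing them.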

GetNETGUIDs output from scanning CSext Samples

Once installed, GetNETGUIDs is also simple to use as a Python module. To extract the GUIDs from a .NET assembly at a given file path, we can use the following code.

    import getnetguids

    print(getnetguids.get_assembly_guids("csext/10cf7a186897243363278cf0283a1687749d9ba43fa713b9f974050f56e97cca"))

When we run this, we get the following result.

Results from executing using_as_module.py

Real World Results

Since this tool was originally developed to reduce the workload during the Operation Cleaver investigation, it only makes sense to share the results of scanning my Operation Cleaver investigation backup directory. The results can be found here. We can visualize this data with some Python and Gephi.

OpCleaver .NET Samples clustered by .NET GUIDs

Typelib IDs are red, MVIDs are light blue, and samples are greenish blue. The graph makes it apparent that a few .NET projects produced significantly more samples than others (TinyZBot, for instance). Using the streaming API plugin for Gephi, we can watch these associations organize themselves into the graph above.
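One way to get results like these into Gephi is to write node and edge tables as CSV, which Gephi's spreadsheet import accepts (nodes need an Id column, edges need Source and Target columns). This sketch assumes a hypothetical dictionary mapping each sample name to a (typelib_id, mvid) tuple and an edge layout of sample to MVID and MVID to Typelib ID. The post does not say how its graph was built, so treat both the layout and the extra Kind column (handy for coloring nodes by partition) as assumptions.

```python
import csv
import io


def to_gephi_tables(results):
    """Build (nodes_csv, edges_csv) strings from {sample: (typelib_id, mvid)}."""
    kinds = {}
    edges = set()
    for sample, (typelib_id, mvid) in results.items():
        kinds[sample] = "sample"
        if mvid is not None:
            kinds[mvid] = "mvid"
            edges.add((sample, mvid))
        if typelib_id is not None:
            kinds[typelib_id] = "typelib"
            # Link the Typelib ID to the build when known, else to the sample.
            edges.add((mvid if mvid is not None else sample, typelib_id))

    nodes_out = io.StringIO()
    writer = csv.writer(nodes_out, lineterminator="\n")
    writer.writerow(["Id", "Label", "Kind"])
    for node, kind in sorted(kinds.items()):
        writer.writerow([node, node, kind])

    edges_out = io.StringIO()
    writer = csv.writer(edges_out, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for source, target in sorted(edges):
        writer.writerow([source, target])
    return nodes_out.getvalue(), edges_out.getvalue()
```

Routing the Typelib edge through the MVID keeps the graph layered (samples, then builds, then projects), which tends to make large project clusters stand out.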

There is only a limited amount of well-known .NET malware with which to demonstrate this tool, but thanks to VirusShare, we have access to a large collection of .NET malware in its DotNET malware set. The results from this scan can be found here. Graphing them in Gephi causes a bit of lag, but the resulting image is quite telling.

Clustering VirusShare DotNET by Assembly GUID and MVID

We can see that a handful of Typelib IDs and MVIDs are prevalent in this sample set, which could help a reverse engineer identify prevalent families simply by running a Python script.
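Ranking identifiers by how many samples share them is a quick way to surface those prevalent families without drawing a graph at all. A minimal sketch, assuming a hypothetical dictionary mapping sample names to (typelib_id, mvid) tuples produced by whatever extractor is in use:

```python
from collections import Counter


def most_prevalent(results, top=5):
    """Return the `top` most common Typelib IDs and MVIDs across all samples."""
    # Missing GUIDs (None) are ignored rather than counted as a shared value.
    typelib_counts = Counter(t for t, _ in results.values() if t is not None)
    mvid_counts = Counter(m for _, m in results.values() if m is not None)
    return typelib_counts.most_common(top), mvid_counts.most_common(top)
```

A Typelib ID near the top of the list points at a prolific project; an MVID with a high count points at one build that was repacked or otherwise modified many times.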

Conclusion

With the creation of new programming languages comes the creation of new identifiable artifacts. Applications developed in .NET languages carry a wealth of extractable information, and while it may not always lead to identifying the author, it can ease the load of malware analysis if we know what to look for. In this case, by using an artifact that Visual Studio generates when a project is created, we can identify samples from the same project, and even samples from the same build. Here at CylanceSPEAR, we consider any new way to gain leverage against threat actors and malware developers worth sharing.

About Brian Wallace

Lead Security Data Scientist at Cylance

Brian Wallace is a data scientist, security researcher, malware analyst, threat actor investigator, cryptography enthusiast and software engineer. Brian acted as the leader and primary investigator for a deep investigation into Iranian offensive cyber activities which resulted in the Operation Cleaver report, coauthored with Stuart McClure.

Brian also authors the A Study in Bots blog series, which covers malware families in depth and provides novel research that benefits a wide audience.
