The other day, I was working on a DragonForce ransomware case where the ransomware had encrypted not only the content of the files but also the file names. I spotted a pattern: similar file names resulted in similar ciphertext. It looked somewhat like base64, but it wasn’t. I compared the file sizes and file name lengths of known and encrypted files, and came up with three examples where the plaintext was known, and one example where I had a pretty good guess what the file name would be.
Why did we need this? The customer wanted to decrypt a specific file from the file system. During negotiations, attackers typically offer to decrypt a few files to prove that they have the decryption key. But as the threat actor had encrypted all the file names, it was hard to know which file was the right one. Hence, we set out to attempt to break the encryption and reverse-engineer the scheme.
Breaking a cipher is much easier when you have known-plaintext examples. Luckily, some files are always present on a Windows system, such as a few files in every user profile: the user’s registry hive, stored in a file called NTUSER.DAT, plus several variants that are lowercase (ntuser.ini) or have extensions appended, such as .LOG1 (ntuser.dat.LOG1).
I took a moment to collect a few examples, plus one I wasn’t sure of but had a pretty good hunch about.
| File name | Encrypted file name | File size |
| --- | --- | --- |
| ntuser.ini (10 chars) | zsub5mdlekr37dx4.df_win (16 chars minus extension) | 20 |
| ntuser.dat.LOG1 | zsub5mdlekr3kpotls36hisi.df_win | 32768 |
| NTUSER.DAT (10 chars) | 4s5l5tul5kr6kv5t.df_win (16 chars minus extension) | 786432 |
| NTUSER.DAT{c76cbcdb-afc9-11eb-8234-000d3aa6d50e}.TMContainer00000000000000000001.regtrans-ms | ? (hunch: starts with 4s5l5tul5kr6kv5t and will be long) | 524288 |
| ? (hunch: similar to the one above) | 4s5l5tul5kr6kv5tczixezxn24iqkpiz2lx33e4z36pq5pizbgsn3zwz3gpngswx2lvx2swy3w7ruo7t4ywqhdxt2ltqmso73gpng2wv3gpng2wv3gpng2wv3gpng2w7lss35sdte4vqmmszzysv.df_win | 524825 |
If you look at the encrypted file names, you’ll spot patterns. For instance, file names with identical lengths have identical ciphertext lengths.
Thankfully, I could easily distinguish the NTUSER.DAT.LOG1 and the NTUSER.DAT.LOG2 files from each other because the .LOG2 filename wasn’t encrypted, probably because the file was zero bytes long in that specific folder. I checked another system to ensure the encryption wasn’t different per system – it wasn’t, the NTUSER.DAT filename was encrypted to exactly the same string.
Let’s get crackin’
We then started our analysis of the ciphertext. Looking back, it’s obvious it isn’t base64: the encrypted names contain only lowercase letters and the digits 2 to 7, which is the 32-character base32 alphabet in lowercase. Furthermore, the plaintext and encrypted file names have a 1-to-1.6 length ratio, which is indicative of base32, as it encodes every 5 bytes as 8 characters. It’s not straight-up base32, though, so there’s something else going on.
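Both base32 tells are easy to check mechanically. The following is a minimal sketch, assuming the standard RFC 4648 alphabet in lowercase and without padding; the helper names `looks_like_base32` and `b32_length` are made up for this illustration and are not part of any tool used in the case.

```python
import base64
import string

# Standard base32 alphabet (RFC 4648), lowercased: a-z plus the digits 2-7.
B32_CHARS = set(string.ascii_lowercase + "234567")


def looks_like_base32(name: str) -> bool:
    """True if every character belongs to the lowercase base32 alphabet."""
    return bool(name) and set(name) <= B32_CHARS


def b32_length(n_bytes: int) -> int:
    """Unpadded base32 length: every 5 bytes become 8 characters (ratio 1.6)."""
    return (n_bytes * 8 + 4) // 5


plain = "ntuser.ini"
cipher = "zsub5mdlekr37dx4"  # encrypted name from the table, extension stripped

# The character set fits base32 ...
print(looks_like_base32(cipher))
# ... and so does the length (expected length vs. observed length) ...
print(b32_length(len(plain)), len(cipher))
# ... yet decoding it as plain base32 does not give the file name back.
print(base64.b32decode(cipher.upper()) == plain.encode())
```

The last check is what tells you that an extra transformation sits on top of the base32 encoding.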
Because similar file names encrypt to similar ciphertext, we first looked at a simple XOR encryption scheme. We quickly saw that the XOR key isn’t fixed, as it varies across pairs of plaintext and ciphertext bytes. We then tried a Vigenère cipher applied at the base32 level, but as it turns out, the scheme wasn’t that sophisticated: it really was a simple substitution cipher on the base32 characters. We noticed another repeating pattern (3gpng2wv) in the ciphertext and used that to recover more mappings of ciphertext to plaintext. It took a little brute force to decode the full file name, as not all mappings could be found from the examples. Six mappings were unknown, leaving 6! = 720 possible permutations, but checking which candidates decoded to plain ASCII, as file names usually are, quickly led to a decrypted file name: NTUSER.DAT{c76cbcdb-afc9-11eb-8234-0000000000000000001.regtrans-ms.
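The known-plaintext step can be sketched as follows. This is a minimal reconstruction of the idea, not the code that was actually run during the analysis: base32-encode each known file name, line it up character by character with the encrypted name, and record which base32 character maps to which ciphertext character. The function name `recover_table` is hypothetical. The crib pair assumes that the repeated block `3gpng2wv` lines up with five ASCII zeros.

```python
import base64

# Known plaintext / ciphertext pairs from the table (extension stripped).
KNOWN_PAIRS = [
    ("NTUSER.DAT", "4s5l5tul5kr6kv5t"),
    ("ntuser.ini", "zsub5mdlekr37dx4"),
    ("ntuser.dat.LOG1", "zsub5mdlekr3kpotls36hisi"),
]
# Assumption: the repeated block corresponds to five ASCII zero characters.
CRIB = ("00000", "3gpng2wv")


def b32(name: str) -> str:
    """Lowercase, unpadded base32 of a file name."""
    return base64.b32encode(name.encode()).decode().lower().rstrip("=")


def recover_table(pairs):
    """Align base32(plaintext) with the ciphertext and collect the mappings."""
    table = {}
    for plaintext, ciphertext in pairs:
        encoded = b32(plaintext)
        if len(encoded) != len(ciphertext):
            raise ValueError(f"length mismatch for {plaintext!r}")
        for p, c in zip(encoded, ciphertext):
            # A character that maps to two different outputs would refute
            # the monoalphabetic substitution hypothesis.
            if table.setdefault(p, c) != c:
                raise ValueError(f"inconsistent mapping for {p!r}")
    return table


from_pairs = recover_table(KNOWN_PAIRS)
with_crib = recover_table(KNOWN_PAIRS + [CRIB])
print(len(from_pairs), "mappings from the known pairs")
print(len(with_crib), "mappings once the crib is added")
print(32 - len(with_crib), "mappings left to brute-force")
```

The absence of any inconsistency across all positions is what rules out position-dependent schemes such as XOR with a running key or Vigenère.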
Hold up, let me rephrase that
And when I said “we” in the paragraph above, I actually meant Claude Code running Opus 4.6. After prompting it, I went back to my other work; Claude Code autonomously analysed the three ciphertext examples, came up with different hypotheses, and tested them, and finally was able to break the cipher. All I had to do was feed Claude a simple prompt (“read example.txt and analyse. these are encrypted file names, the exact encryption scheme is unknown. come up with a way to break the cipher and then decrypt the file name in break.txt”) and two files with data to analyse.
This took about fifteen minutes, and then three additional minutes to produce the first version of the decrypt script. Claude Code returned triumphantly:
The decryption is verified - re-encrypting produces the exact original ciphertext.
## Results
### Cipher scheme identified
The encryption is a **monoalphabetic substitution cipher applied to base32-encoded filenames**:
1. Take the plaintext filename bytes
2. Base32-encode them (lowercase, no padding)
3. Substitute each base32 character using a fixed lookup table
4. Append `.df_win`
### How it was broken
1. **Recognized base32** from the 1.6x length ratio (plaintext to ciphertext) and the character set (a-z, 2-7)
2. **Built a substitution table** by base32-encoding the known plaintexts and aligning character-by-character against the ciphertexts — the shift at each position depends only on the input character, not position
3. **Exploited the repeating pattern** `3gpng2wv` in the target (which decodes to `"00000"`, five ASCII zero characters) to recover 4 additional mappings
4. **Brute-forced** the remaining 6! = 720 possible mappings, filtering for valid ASCII output
### Decrypted filename
```NTUSER.DAT{c76cbcdb-afc9-11eb-8234-0000000000000000001.regtrans-ms```
I was amazed and stoked. This would’ve likely taken me many hours and a lot of hair-pulling, as I’m neither a trained cryptographer nor a codebreaker. I did not even spot the base32 length ratio. I only saw that the ciphertext vaguely resembled base64 and that similarly named files were encrypted to similar ciphertext. On a hunch, I thought to myself: “Let’s burn through some tokens while I work on this other stuff” and was mostly curious how Opus would handle such a request. It passed with flying colors, in my opinion. Afterwards, I went through more ciphertexts to recover the remaining three mappings. The resulting decryption (and encryption) script is below; most of the LLM-generated code is still in there, even though the full conversion table is now known. We share this so that you do not need to burn more tokens.
#!/usr/bin/env python3
"""
Encrypt/decrypt filenames using a base32 substitution cipher.

Cipher scheme (encryption):
    1. Base32-encode the plaintext filename (lowercase, no padding)
    2. Apply a monoalphabetic substitution on each base32 character
    3. Append ".df_win"

Decryption reverses the process:
    1. Strip ".df_win"
    2. Apply the inverse substitution on each character
    3. Base32-decode to recover the original filename

Usage:
    python decrypt.py -d <ciphertext>    Decrypt a filename
    python decrypt.py -e <plaintext>     Encrypt a filename
    python decrypt.py                    Decrypt from break.txt

The substitution table was recovered by aligning base32-encoded known
plaintexts against their ciphertexts.
"""
import argparse
import base64
import sys
from itertools import permutations

B32_ALPHA = "abcdefghijklmnopqrstuvwxyz234567"

# Forward substitution table: plain base32 char -> cipher base32 char
# Recovered from known-plaintext analysis of example.txt pairs, confirmed
# by round-trip verification against the break.txt ciphertext.
SUBSTITUTION_TABLE = {
    "a": "g", "b": "w", "d": "n", "e": "6", "f": "l",
    "g": "3", "h": "b", "i": "k", "j": "4", "k": "5",
    "l": "o", "m": "2", "n": "z", "o": "e", "p": "c",
    "q": "v", "r": "i", "s": "7", "t": "x", "u": "t",
    "v": "y", "w": "q", "x": "r", "y": "p", "z": "s",
    "2": "u", "3": "d", "4": "m", "6": "h", "5": "a",
    "7": "j", "c": "f"
}
INVERSE_TABLE = {v: k for k, v in SUBSTITUTION_TABLE.items()}

KNOWN_PAIRS = [
    ("4s5l5tul5kr6kv5t", "NTUSER.DAT"),
    ("zsub5mdlekr37dx4", "ntuser.ini"),
    ("zsub5mdlekr3kpotls36hisi", "ntuser.dat.LOG1"),
]


def b32_encode(data: str | bytes) -> str:
    if isinstance(data, str):
        data = data.encode()
    return base64.b32encode(data).decode().lower().rstrip("=")


def b32_decode(b32_str: str) -> bytes:
    padding = (8 - len(b32_str) % 8) % 8
    return base64.b32decode(b32_str.upper() + "=" * padding)


def encrypt(plaintext: str) -> list[str]:
    """Encrypt a plaintext filename and return possible ciphertexts.

    Returns a single result when all base32 characters have known mappings,
    or all valid permutations of the 3 unknown entries when they are needed.
    """
    b32 = b32_encode(plaintext)
    missing = sorted(set(c for c in b32 if c not in SUBSTITUTION_TABLE))
    if not missing:
        return ["".join(SUBSTITUTION_TABLE[c] for c in b32) + ".df_win"]

    remaining_plain = sorted(c for c in B32_ALPHA if c not in SUBSTITUTION_TABLE)
    remaining_cipher = sorted(c for c in B32_ALPHA if c not in INVERSE_TABLE)
    results = []
    for perm in permutations(remaining_cipher):
        trial_fwd = dict(SUBSTITUTION_TABLE)
        for p, c in zip(remaining_plain, perm):
            trial_fwd[p] = c
        results.append("".join(trial_fwd[c] for c in b32) + ".df_win")
    return sorted(set(results))


def decrypt(ciphertext: str) -> list[str]:
    """Decrypt a ciphertext by inverting the substitution table.

    If the table is incomplete for this ciphertext, brute-forces the remaining
    mappings and returns all candidates that decode to valid printable ASCII.
    """
    if ciphertext.endswith(".df_win"):
        ciphertext = ciphertext[: -len(".df_win")]

    # Check if all cipher chars have known inverses
    missing_cipher = sorted(set(c for c in ciphertext if c not in INVERSE_TABLE))
    if not missing_cipher:
        dec_b32 = "".join(INVERSE_TABLE[c] for c in ciphertext)
        return [b32_decode(dec_b32).decode("ascii")]

    # Brute-force the 3 unknown entries (6 permutations)
    remaining_plain = sorted(c for c in B32_ALPHA if c not in SUBSTITUTION_TABLE)
    remaining_cipher = sorted(c for c in B32_ALPHA if c not in INVERSE_TABLE)
    results = []
    for perm in permutations(remaining_cipher):
        trial_inv = dict(INVERSE_TABLE)
        for p, c in zip(remaining_plain, perm):
            trial_inv[c] = p
        try:
            dec_b32 = "".join(trial_inv[c] for c in ciphertext)
            raw = b32_decode(dec_b32)
            text = raw.decode("ascii")
        except (KeyError, UnicodeDecodeError):
            continue
        if all(ch.isprintable() for ch in text):
            results.append(text)
    return sorted(set(results))


def main():
    parser = argparse.ArgumentParser(
        description="Encrypt or decrypt filenames using the base32 substitution cipher."
    )
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("-e", "--encrypt", metavar="FILENAME",
                      help="Encrypt a plaintext filename")
    mode.add_argument("-d", "--decrypt", metavar="CIPHERTEXT",
                      help="Decrypt an encrypted filename")
    parser.add_argument("-v", "--verify", action="store_true",
                        help="Show substitution table and verify known pairs")
    args = parser.parse_args()

    # --- Encrypt mode ---
    if args.encrypt:
        candidates = encrypt(args.encrypt)
        if len(candidates) == 1:
            print(candidates[0])
        else:
            print(f"{len(candidates)} possible encryptions (3 unknown table entries):")
            for c in candidates:
                print(f" {c}")
        return

    # --- Decrypt mode ---
    if args.decrypt:
        target = args.decrypt
    elif not args.verify:
        # Default: read from break.txt
        try:
            with open("break.txt") as f:
                target = f.read().strip()
        except FileNotFoundError:
            parser.print_help()
            sys.exit(1)
    else:
        target = None

    # --- Optional verification output ---
    if args.verify:
        print(f"Substitution table ({len(SUBSTITUTION_TABLE)}/32 known):")
        for ch in B32_ALPHA:
            if ch in SUBSTITUTION_TABLE:
                print(f" '{ch}' -> '{SUBSTITUTION_TABLE[ch]}'")
            else:
                print(f" '{ch}' -> ??? (brute-forced per ciphertext)")
        print("\nKnown-pair verification:")
        for ct, pt in KNOWN_PAIRS:
            results = encrypt(pt)
            status = "PASS" if ct + ".df_win" in results else "FAIL"
            print(f" {pt:25s} [{status}]")
        if not target:
            return
        print()

    # --- Decrypt target ---
    if target:
        candidates = decrypt(target)
        if not candidates:
            print("No valid decryption found.", file=sys.stderr)
            sys.exit(1)
        if len(candidates) == 1:
            print(candidates[0])
        else:
            for c in candidates:
                print(c)


if __name__ == "__main__":
    main()
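Now that all 32 mappings are known, the brute-force branches in the script are no longer exercised, and the whole scheme fits in a few lines. The following is a compact sketch of the same cipher built on `str.translate`, assuming Python 3.9 or newer. The names `encrypt_name`, `decrypt_name` and `CIPHER_ALPHA` are made up for this illustration; the cipher alphabet string is simply the script’s substitution table written out in base32 alphabet order.

```python
import base64

B32_ALPHA = "abcdefghijklmnopqrstuvwxyz234567"
# The substitution table from the script above, listed in B32_ALPHA order.
CIPHER_ALPHA = "gwfn6l3bk45o2zecvi7xtyqrpsudmahj"
EXTENSION = ".df_win"

ENC = str.maketrans(B32_ALPHA, CIPHER_ALPHA)
DEC = str.maketrans(CIPHER_ALPHA, B32_ALPHA)


def encrypt_name(name: str) -> str:
    """Base32-encode (lowercase, unpadded), substitute, append the extension."""
    b32 = base64.b32encode(name.encode()).decode().lower().rstrip("=")
    return b32.translate(ENC) + EXTENSION


def decrypt_name(encrypted: str) -> str:
    """Strip the extension, undo the substitution, base32-decode."""
    b32 = encrypted.removesuffix(EXTENSION).translate(DEC)
    b32 += "=" * (-len(b32) % 8)  # restore the padding that was stripped
    return base64.b32decode(b32.upper()).decode()


print(encrypt_name("NTUSER.DAT"))                       # 4s5l5tul5kr6kv5t.df_win
print(decrypt_name("zsub5mdlekr3kpotls36hisi.df_win"))  # ntuser.dat.LOG1
```

The longer script remains useful as a template for cases where the table is only partially known, since it keeps the brute-force fallback.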
New possibilities at our fingertips
Having spotted this new capability waiting for us in the terminal, just a tmux session away, it was easy to find more uses like this one. Instead of reverse-engineering an opaque database format by hand, we simply pointed Claude Code at the GitHub repository of an RMM tool that the threat actor had used, and asked it to create a tool to dump the contents of the database. Claude Code promptly analysed the code, came up with a plan for such a tool, and proceeded to build it. We could then easily dump the database contents.
In the end, the database didn’t contain what we had hoped (such as exact timestamps of connections), but it was far better than squinting our eyes at the output of strings or having to scroll through a hex dump of the database.
Breaking asymmetry
Coming up with a prompt to do something is typically much easier than doing the thing itself, especially when the code doesn’t need to run in production and the tool is a one-off. So while attackers may be using LLMs to accelerate their attacks, incident responders can use LLMs to speed up processes on their end too, breaking the inherent asymmetry that exists between attackers and defenders. A small idea that would once have been set aside because it would take too long is now just a matter of the right perspective and a small prompt to an LLM.