Problem Set 3:
CSI: Blockchain

Due: Sunday, 8 November at 8:29pm

Purpose

The goal of this assignment is for everyone to gain experience analyzing the blockchain and understanding of the anonymity of bitcoin (and mechanisms that are used to increase or decrease it), and to hopefully help the FBI catch some heinous criminals.

Collaboration Policy

For this assignment, you may either work alone or with one other person of your choice. If you work with a partner, you should together submit one assignment with both of your names on it that reflects your combined efforts. You may, and are encouraged to, discuss all of the problems with anyone else you want (both on-line using the course web site or any other means you choose, and in person).

Submission

Submit your answers as a single ZIP file using this link. The name of your file should be <your email ID>-ps3.zip (if you worked alone) or <partner1 email ID>-<partner2 email ID>-ps3.zip. Your submission should include a PDF document containing all your written (non-code) answers, all of the source code for your programs, and any additional data you think it useful to provide.

Warm-Up: Manual Exploration

You should have received by email a list of bitcoin addresses that were given to us by the FBI. These are addresses that were used to collect ransom payments from victims of bitcoin ransomware attacks. In the scheme perpetrators encrypt a victim’s files and then demand that ransom payment be made to a specific bitcoin address in return for the key to decrypt the files.

One of the addresses are marked with your email ID. The intent of this is to ensure that the examination efforts are well distributed over the provided addresses. For the manual exploration question, you should start with that address (but feel free to look at any of the addresses in the file, and for the programmatic analysis questions, you’ll want to use all the addresses).

You can use any tool you want to answer these questions, but we suggest using blockchain.info. One useful thing you can do there is look at the taint analysis to see which other addresses a given addresses is interacting with (e.g., examine the reverse taint analysis to see the addresses that are receiving bitcoin from the given address).

Problem 1.
(a) What address are you investigating?
(b) When was that address in use?
(c) How many victims appear to have paid into the address?
(d) How much was collected from each victim (the most relevant value would be the value in US dollars at the time of the payment)? Did all the victims pay the same amount, or did the requested amount vary by victim?
(e) How did the ransomist (owner of this address) distribute the proceeds? Is there a percentage split, or some other fee being paid by the ransomist to other addresses?
(f) What else can you learn about the operation starting from your transaction?
(g) Added 4 Nov: If you are answering Problem 1 after class Wednesday, also answer: How is your address similar to and different from what has been observed by other students for other addresses?

Since our goal is to learn as much as we can about the overall operation, you should post your answers to Problem 1 (for your address) as a comment on this page.

Privacy in the Blockchain

The questions in the section focus on this paper:

You should also read:

Problem 2. Part of the strategy used in the Sarah Meiklejohn et al. paper was for the authors to open accounts with services and conduct transactions with questionable merchants.
(a) Why was it useful to actually conduct transactions with merchants operating in dark markets to conduct this research?
(b) What ethical and legal issues does contudcting a transaction with a questionable merchant raise?

Problem 3. The Heuristic 1 (Section 4.3) used to detemine sets of public addresses owned by the same entity is based on the assumption that all inputs to a bitcoin transaction are controlled by the same entity. According to the paper, “the sender in the transaction must know the private signing key belonging to each public key used as an input, so it is unlikely that the collection of public keys are controlled by multiple entities (as these entities would need to reveal their private keys to each other).” Explain why this is not actually true. (A good answer will consider in more detail what is needed in the unlocking script to spend each input.)

Problem 4. The analysis in this paper was done before hierarchical deterministic wallets became standard. How does the widespread use of HD wallets today impact the effectiveness of their Heuristic 2?

Blockchain APIs

One way to analyze the blockchain is to run a full bitcoin node, and operate on your own local copy of the blockchain. This requires downloading and processing a large amount of data (approximately 45 GB today). Instead, you can use an external service that provides an API for interacting with the blockchain. Several such APIs exist including blockchain.info, Bitpay’s Insight API, and BlockCypher’s Blockchain API.

For this assignment, you are welcome to use any services and open source bitcoin libraries and openly-licensed code you want, but must follow the license requirements of any code you use and credit this code in your submission.

The starting examples we provide will use Python and the BlockCypher Blockchain API, and we provide some directions to get started using this next (but feel free to use other tools if you prefer).

Setting up BlockCypher API

Install Python. Start by downloading and installing Python (https://www.python.org/downloads/release/python-350/. Note that Python V3 is incompatible with Python V2 (on MacOSX, the default python is Python 2.7.2; to use Python 3, use python3). Even if you are not familiar with Python, if you feel comfortable learning a new language on the fly you should be fine jumping right into this assignment. If you prefer a more structured intro to Python there are many tutorials available, including http://www.learnpython.org/.

Install the Blockcypher Python library. The BlockCypher Python library provides a convenient way to use the APIs. To install it, execute pip install blockcypher (in the command shell).

Here is some example code that uses the BlockCypher API to find all the addresses which received bitcoin directly from a sending address:

import blockcypher

def get_receivers(sending_address):
    receiving_addresses = set()
    r = blockcypher.get_address_details(sending_address)
    for tx in r['txrefs']: # loop through all transactions
        hash = tx['tx_hash']
        transaction = blockcypher.get_transaction_details(hash)
        # was this address an input
        sender = False
        for input in transaction['inputs']:
            if sending_address in input['addresses']:
                sender = True
                break
        if sender:
            for output in transaction['outputs']:
                for adr in output['addresses']:
                    receiving_addresses.add((adr, output['value']))

    return receiving_addresses

def satoshi_to_BTC(sval):
    return sval / 100000000

sender = '1Ez69SnzzmePmZX3WpEzMKTrcBF2gpNQ55'
print ("Received from " + sender + ": ")
for receiver in get_receivers(sender):
    print("   " + receiver[0] + " (" + str(satoshi_to_BTC(receiver[1])) + " BTC)")

To avoid API limits, you will want to obtain an API key, and add an api_key=APIKEY parameter to your requests:

   txdetails = blockcypher.get_transaction_details(tx, api_key=API_KEY)

You can also use direct web API requests:

     r = requests.get('https://api.blockcypher.com/v1/btc/main/addrs/' + adr 
       	 	       + '/full', params={'token': API_KEY}).json() 

To learn if an address is connected with a known wallet, you can use the WalletExplorer.com API (note the WalletExporer API is not publicly documented, but Aleš Janda, the WalletExplorer developer, kindly made it available to us):

def address_display(adr):
    try:
        r = requests.get('https://www.walletexplorer.com/api/1/address-lookup?address=' 
		         + adr + '&caller=virginia.edu').json()
        return adr + " (" + r['label'] + ")"
    except:
        return adr

Mixing Services

Before looking at the ransomware addresses, get some practice using a blockchain API by examining a simple mixer (as far as we can tell, the ransomwarists are not actually using any mixing service, though).

In transaction 0197dabc3c31c8221b5d7883a9d03240bcf7a3042e1bf6dcc26c8d3aa60c58ab, Dave sent 0.01523 BTC to the mixing service BitMixer.io.

BitMixer is a mixing service for the impatient - it will send the input amount, less the collected fees, to the requested output address immediately after the input transaction is confirmed (a single confirmation is enough for small amounts, so the mixer output transaction is very likely to be in the next block in the blockchain). (In this case, Block 379818).

Note BitMixer’s fee structure: “Our minimum fee is 0.5% plus 0.0005 BTC for every forward address. We recommend to set higher custom fee to prevent advanced amount-based blockchain analysis.” (They don’t, to many customers dismay, actually make any promises about their maximum fee.)

Problem 5. Assuming you know the output transaction is in the block immediately following the input transaction and that BitMixer’s actual fee on this transaction is between 0.5% + 0.0005 BTC and 0.6% + 0.0005 BTC (but don’t know anything else), how big is the anonymity set for the output transaction? (Hint: use the blockcypher API to look at all the output values for transactions in the target block.)

Challenge Bonus. Which address is the output address? Your answer should include (a) a wager amount indicating your confidence (up to 10 points), which you win or lose depending on the correctness of your answer, and (b) an explanation of why you think that is the actual output address.

Problem 6. The BitMixer.io site suggests, “In order to further enhance the security of your transactions we provide the opportunity to use two or more forward addresses as well as convenient time delays.” How much does it enhance anonymity to split your output across two forward addresses? (A good answer will get into how using two forward addresses impacts both the size of the anonymity set and the computational cost of performining the de-anonymizing analysis. A great answer would actually test this by examining the blockchain to get a concrete result.)

Follow the Ransomware Money

Now it’s time to follow the money! We’ll provide a few hints for things to do, but you should use these as starting points and form your own ideas about ways to explore the blockchain.

First, we examine the direct transactions involving the ransomware addresses (in the suspects.txt file provided to you). Since we know these addresses were used to collect money from ransomware victims, we are more interested in looking at transactions where bitcoin is sent from the ransomware collection addresses to other addresses.

Problem 7. Write a program that reads in a file containing suspect addresses (e.g., the suspects.txt file containing the ransomware payment addresses that was sent to you), and examines the blockchain to find addresses that receive bitcoin from the suspect addresses. By using the WalletExplorer API, you should be able to find some interesting transactions, including one where the ransomware address sent bitcoin to an address WalletExplorer identifies as being “BitPay.com-old”. Report on any interesting transactions found by your program.

If the criminals are more careful, they should avoid making any revealing transactions directly from the collection addresses, and instead try to hide their money flows by making lots of transfers between other addresses (or, if they trust a mixnet operator sufficiently, use a mixer to disconnect their money trail).

So, we want to figure out if there are any common addresses that eventually receive coin from more than one of the starting addresses. This could detect if the suspect addresses are running received coins through a mixnet, but outputting to the same address eventually, which could mean payments to the same vendor.

A good approach to this problem could be writing a program that iteratively follows successive outputs from a series of seed addresses down to a reasonable depth and then checks for overlap between sets. We want to find addresses that eventually receive bitcoin sent from more than one of the ransom collection addresses, since these are likely to be addresses involved in meaningful transactions.

Problem 8. Write a program that reads in a file containing suspect addresses (e.g., the suspects.txt file containing the ransomware payment addresses that was sent to you), and outputs a list of all the addresses that eventually receive bitcoin from more than one of the suspect addresses. The receiving addresses are not limited to direct transactions, but could receive funds through a sequence of transactions starting from suspect addresses. From the ransonware payment addresses you should be able to find at least 34 such addresses (including the 18dwCxqqmya2ckWjCgTYReYyRL6dZF6pzL address which was already known to the FBI to have been used by the ransomware operation.)

We are also interested in finding out the relative importance of the addresses found. You can determine this based on factors such as amount of bitcoins they receive, number of seed addresses they are associated with, or how many hops away the addresses are on average from associated seed addresses. You can also use the WalletExplorer.com API to see if the addresses found are associated with known wallets.

Problem 9. Improve your program to provide additional information about the addresses found, and order the addresses according to the total value they received connected to the starting addresses.

The final problem is open-ended. Your goal is to hopefully find some information that would be useful to the FBI, or short of that, to at least impress your classmates.

We have a few suggested options below (from which you can select one to do). If you have a better idea for what to do, though, feel free to do that instead. If you want to do anything that involves interacting with criminals, though, please discuss your idea with the course staff and get permission before doing anything potentially risky!

Problem 10 (option 1). Develop and implement a new strategy for exploring the ransomware transactions, and report on what you find.

Problem 10 (option 2). Try applying your programs to other interesting sets of addresses. There are lots of potential sources of such addresses, including, of course, searching the blockchain itself for addresses involved in interesting transactions. (You can also look at posts such as 9 Infamous Bitcoin Addresses or Blockchain.info’s list of popular addresses.)

Problem 10 (option 3). Figure out how a mixing service like BitMixer.io (or select another one) works, and how much anonymity it really provides.

Submission

Follow the submission instructions at the beginning of this page by 8:29pm on Sunday, 8 November. If you do find anything of particular interest, please let us know right away, though!

Credits: This assignment was developed by Ori Shimony and David Evans for Cryptocurrency Cabal Fall 2015. We thank Jeremy D’Errico, Benjamin Blome, David Cochran, and Stuart VonCanon from the FBI (Richmond Office) for providing the impetus and data for this assignment. We thank Aleš Janda for making the WalletExplorer API available to us.

More Reading
Newer// Projects
Comments powered by Disqus