Problem Set 3:
CSI: Blockchain
Purpose
The goal of this assignment is for everyone to gain experience analyzing the blockchain and understanding of the anonymity of bitcoin (and mechanisms that are used to increase or decrease it), and to hopefully help the FBI catch some heinous criminals.
Collaboration Policy
For this assignment, you may either work alone or with one other person of your choice. If you work with a partner, you should together submit one assignment with both of your names on it that reflects your combined efforts. You may, and are encouraged to, discuss all of the problems with anyone else you want (both on-line using the course web site or any other means you choose, and in person).
Submission
Submit your answers as a single ZIP file using this
link. The name
of your file should be <your email ID>-ps3.zip
(if you worked alone)
or <partner1 email ID>-<partner2 email ID>-ps3.zip
. Your submission
should include a PDF document containing all your written (non-code)
answers, all of the source code for your programs, and any additional
data you think it useful to provide.
Warm-Up: Manual Exploration
You should have received by email a list of bitcoin addresses that were given to us by the FBI. These are addresses that were used to collect ransom payments from victims of bitcoin ransomware attacks. In the scheme perpetrators encrypt a victim’s files and then demand that ransom payment be made to a specific bitcoin address in return for the key to decrypt the files.
One of the addresses are marked with your email ID. The intent of this is to ensure that the examination efforts are well distributed over the provided addresses. For the manual exploration question, you should start with that address (but feel free to look at any of the addresses in the file, and for the programmatic analysis questions, you’ll want to use all the addresses).
You can use any tool you want to answer these questions, but we suggest using blockchain.info. One useful thing you can do there is look at the taint analysis to see which other addresses a given addresses is interacting with (e.g., examine the reverse taint analysis to see the addresses that are receiving bitcoin from the given address).
(a) What address are you investigating?
(b) When was that address in use?
(c) How many victims appear to have paid into the address?
(d) How much was collected from each victim (the most relevant value would be the value in US dollars at the time of the payment)? Did all the victims pay the same amount, or did the requested amount vary by victim?
(e) How did the ransomist (owner of this address) distribute the proceeds? Is there a percentage split, or some other fee being paid by the ransomist to other addresses?
(f) What else can you learn about the operation starting from your transaction?
(g) Added 4 Nov: If you are answering Problem 1 after class Wednesday, also answer: How is your address similar to and different from what has been observed by other students for other addresses?
Since our goal is to learn as much as we can about the overall operation, you should post your answers to Problem 1 (for your address) as a comment on this page.
Privacy in the Blockchain
The questions in the section focus on this paper:
- A Fistful of Bitcoins: Characterizing Payments among Men (sic) with No Names by Sarah Meiklejohn, Marjori Pomarole, Grant Jordan, Kirill Levchenko, Damon McCoy, Geoffrey M. Voelker, and Stefan Savage. Internet Measurement Conference, 2013.
You should also read:
- Chapter 6: Bitcoin and Anonymity, from Arvind Narayanan, Joseph Bonneau, Edward Felten, Andrew Miller, Steven Goldfeder. Bitcoin and Cryptocurrency Technologies
(a) Why was it useful to actually conduct transactions with merchants operating in dark markets to conduct this research?
(b) What ethical and legal issues does contudcting a transaction with a questionable merchant raise?
Blockchain APIs
One way to analyze the blockchain is to run a full bitcoin node, and operate on your own local copy of the blockchain. This requires downloading and processing a large amount of data (approximately 45 GB today). Instead, you can use an external service that provides an API for interacting with the blockchain. Several such APIs exist including blockchain.info, Bitpay’s Insight API, and BlockCypher’s Blockchain API.
For this assignment, you are welcome to use any services and open source bitcoin libraries and openly-licensed code you want, but must follow the license requirements of any code you use and credit this code in your submission.
The starting examples we provide will use Python and the BlockCypher Blockchain API, and we provide some directions to get started using this next (but feel free to use other tools if you prefer).
Setting up BlockCypher API
Install Python. Start by downloading and installing Python
(https://www.python.org/downloads/release/python-350/.
Note that Python V3 is incompatible with Python V2 (on MacOSX, the
default python
is Python 2.7.2; to use Python 3, use python3
).
Even if you are not familiar with Python, if you feel comfortable
learning a new language on the fly you should be fine jumping right
into this assignment. If you prefer a more structured intro to Python
there are many tutorials available, including
http://www.learnpython.org/.
Install the Blockcypher Python library. The BlockCypher Python
library provides a
convenient way to use the APIs. To install it, execute pip install
blockcypher
(in the command shell).
Here is some example code that uses the BlockCypher API to find all the addresses which received bitcoin directly from a sending address:
import blockcypher
def get_receivers(sending_address):
receiving_addresses = set()
r = blockcypher.get_address_details(sending_address)
for tx in r['txrefs']: # loop through all transactions
hash = tx['tx_hash']
transaction = blockcypher.get_transaction_details(hash)
# was this address an input
sender = False
for input in transaction['inputs']:
if sending_address in input['addresses']:
sender = True
break
if sender:
for output in transaction['outputs']:
for adr in output['addresses']:
receiving_addresses.add((adr, output['value']))
return receiving_addresses
def satoshi_to_BTC(sval):
return sval / 100000000
sender = '1Ez69SnzzmePmZX3WpEzMKTrcBF2gpNQ55'
print ("Received from " + sender + ": ")
for receiver in get_receivers(sender):
print(" " + receiver[0] + " (" + str(satoshi_to_BTC(receiver[1])) + " BTC)")
To avoid API limits, you will want to obtain an API
key, and add an api_key=APIKEY
parameter to your requests:
txdetails = blockcypher.get_transaction_details(tx, api_key=API_KEY)
You can also use direct web API requests:
r = requests.get('https://api.blockcypher.com/v1/btc/main/addrs/' + adr
+ '/full', params={'token': API_KEY}).json()
To learn if an address is connected with a known wallet, you can use the WalletExplorer.com API (note the WalletExporer API is not publicly documented, but Aleš Janda, the WalletExplorer developer, kindly made it available to us):
def address_display(adr):
try:
r = requests.get('https://www.walletexplorer.com/api/1/address-lookup?address='
+ adr + '&caller=virginia.edu').json()
return adr + " (" + r['label'] + ")"
except:
return adr
Mixing Services
Before looking at the ransomware addresses, get some practice using a blockchain API by examining a simple mixer (as far as we can tell, the ransomwarists are not actually using any mixing service, though).
In transaction 0197dabc3c31c8221b5d7883a9d03240bcf7a3042e1bf6dcc26c8d3aa60c58ab, Dave sent 0.01523 BTC to the mixing service BitMixer.io.
BitMixer is a mixing service for the impatient - it will send the input amount, less the collected fees, to the requested output address immediately after the input transaction is confirmed (a single confirmation is enough for small amounts, so the mixer output transaction is very likely to be in the next block in the blockchain). (In this case, Block 379818).
Note BitMixer’s fee structure: “Our minimum fee is 0.5% plus 0.0005 BTC for every forward address. We recommend to set higher custom fee to prevent advanced amount-based blockchain analysis.” (They don’t, to many customers dismay, actually make any promises about their maximum fee.)
Follow the Ransomware Money
Now it’s time to follow the money! We’ll provide a few hints for things to do, but you should use these as starting points and form your own ideas about ways to explore the blockchain.
First, we examine the direct transactions involving the ransomware
addresses (in the suspects.txt
file provided to you). Since we know
these addresses were used to collect money from ransomware victims, we
are more interested in looking at transactions where bitcoin is sent
from the ransomware collection addresses to other addresses.
suspects.txt
file containing the ransomware
payment addresses that was sent to you), and examines the blockchain to
find addresses that receive bitcoin from the suspect addresses. By
using the WalletExplorer API, you should be able to find some
interesting transactions, including one where the ransomware address
sent bitcoin to an address WalletExplorer identifies as being
“BitPay.com-old”. Report on any interesting transactions found by your
program.If the criminals are more careful, they should avoid making any revealing transactions directly from the collection addresses, and instead try to hide their money flows by making lots of transfers between other addresses (or, if they trust a mixnet operator sufficiently, use a mixer to disconnect their money trail).
So, we want to figure out if there are any common addresses that eventually receive coin from more than one of the starting addresses. This could detect if the suspect addresses are running received coins through a mixnet, but outputting to the same address eventually, which could mean payments to the same vendor.
A good approach to this problem could be writing a program that iteratively follows successive outputs from a series of seed addresses down to a reasonable depth and then checks for overlap between sets. We want to find addresses that eventually receive bitcoin sent from more than one of the ransom collection addresses, since these are likely to be addresses involved in meaningful transactions.
suspects.txt
file containing the ransomware
payment addresses that was sent to you), and outputs a list of all the
addresses that eventually receive bitcoin from more than one of the
suspect addresses. The receiving addresses are not limited to direct
transactions, but could receive funds through a sequence of transactions
starting from suspect addresses. From the ransonware payment addresses
you should be able to find at least 34 such addresses (including the
18dwCxqqmya2ckWjCgTYReYyRL6dZF6pzL
address which was already known to
the FBI to have been used by the ransomware operation.)We are also interested in finding out the relative importance of the addresses found. You can determine this based on factors such as amount of bitcoins they receive, number of seed addresses they are associated with, or how many hops away the addresses are on average from associated seed addresses. You can also use the WalletExplorer.com API to see if the addresses found are associated with known wallets.
The final problem is open-ended. Your goal is to hopefully find some information that would be useful to the FBI, or short of that, to at least impress your classmates.
We have a few suggested options below (from which you can select one to do). If you have a better idea for what to do, though, feel free to do that instead. If you want to do anything that involves interacting with criminals, though, please discuss your idea with the course staff and get permission before doing anything potentially risky!
Submission
Follow the submission instructions at the beginning of this page by 8:29pm on Sunday, 8 November. If you do find anything of particular interest, please let us know right away, though!