Problem Set 2:
The Blockchain
Purpose
The goal of this assignment is for everyone in the class to understand how bitcoin mining and the blockchain work, and evaluate risks to the blockchain.
Collaboration Policy
For this assignment, everyone should submit their own assignment and should writeup their own answers to the questions. You may, and are encouraged to, discuss all of the problems with anyone else you want (both on-line using the course web site or any other means you choose, and in person).
Submission
Submit your answers as a single PDF file using this
link. The name
of your file should be <your email ID>-ps2.pdf
.
Your submission should include clearly marked answers for all the problems (highlighted in yellow). None of the questions require submitting a program, although you may find it helpful to write programs to develop your answers. If the code you write is less than a page, it is best to just include it in the PDF writeup. If it is longer, you may submit separate code files (and mention them in the PDF submission).
Blockchain Consensus
These questions concern the original bitcoin paper:
- Satoshi Nakamoto, Bitcoin: A Peer-to-Peer Electronic Cash System, 2008.
What are the assumptions necessary to support Satoshi’s claim that it is more profitable for a greedy attacker with a majority of the mining power “to play by the rules”? (other than the assumption that the greedy attacker is a “he”)
Merkle Trees and Storage
Bitcoin uses Merkle Trees to record transactions in a way that enables a single hash to be used to record a set of transactions, and a small (logarithmic in the number of transactions) number of hashes to be sufficient to verify a transaction. In this question, you’ll explore another use of Merkle Trees to provide verifiable cloud storage.
Imagine you are storing a database using a cloud storage provider, since you do not have local storage. Assume that you have just enough local storage to store a small number of cryptographic hashes or keys. You fear, however, that your storage provider may be skimping, and may not actually be storing all of your data.
So, you want to verify each record as you receive it, to make sure the data is the same as what you wrote. In order to do so, you are willing to write extra cryptographic information with each write, but need to limit the amount of local storage.
Assume we are storing n records, and we can query the server for the i-th record for any i from 0 to n - 1. The data content of the i-th record will be denoted as data[i].
(a) There is a problem with this scheme unless the bytes of data[i] indicate that it is in fact from i-th index. What is this problem?
(b) Suppose we perform a write at position i, both data and signature. Later on, when we read it back, if the signature matches the data, can we be sure that it is indeed the data item we wrote? Explain.
(a) What is the write/read/verify procedure for this system?
(b) How does the cost of reading and writing to the database scale with n (the number of records)?
(a) What is the write/read/verify procedure for this system?
(b) How does the cost of reading and writing to the database scale with n (the number of records)?
Blockchain
These questions are related to this paper (and what was covered in Class 10 and Class 11):
- Ittay Eyal and Emin Gün Sirer, Majority is not Enough: Bitcoin Mining is Vulnerable, November 2013.
Orphaned Blocks are blocks that are submitted to the bitcoin network, and that are valid, but do not become part of the longest (consensus) blockchain.
One way to detect selfish mining is to be on the lookout for “orphaned” blocks, or blocks that were mined but never became part of the final blockchain.
Let’s say you set up a node to monitor orphaned blocks, and you are in a favorable position in the network that allows you to observe all orphaned blocks. Assume that the hashing difficulty stays constant and the expected block rate is constant at 10 minutes per block.
(a) If a mining pool has 15% (α = 0.15) of the total network hashing power, how many blocks is it expected to find in a day?
(b) Obtain a general formula for expected number of blocks a mining pool with α fraction of the total hashing power should find in t minutes.
For the next questions, you may assume a very simplified network model where you can view the network having two “supernodes”: one represents a particular mining pool, and the other represents all other nodes in the network. The latency between the two supernodes is L seconds.
With this simple (but very unrealistic) model, we assume that when one supernode announces a block, all of the rest of the network learns of the new block L seconds later (but no one learns of it before that).
Pool Hopping
Answer either question 10A or 10B (your choice).
Bonus Opportunities
The following questions are suggestions for further work, but it is not expected that everyone will solve them. (Note that some of these could be starting ideas for larger projects, although a good answer to any of them would be most impressive.)
Challenge 1. The network model in questions 8 and 9 is very unrealistic. Answer these questions for a more realistic model of the bitcoin network, and compare your results with the actual rates of orphaned blocks. To better detect misbehaving miners, you would want to also look into the contents of the dual blocks (note that the one that was orphaned is not necessarily the one from the selfish miner). There is an API for obtaining orphaned blocks provided by https://api.biteasy.com.
Challenge 2. The analysis in the paper and in class assumed that if the selfish miner is ahead by 2 blocks, it will always win by releasing the blocks when it observes a new block on the blockchain. It could still lose if the rest of the network finds a second block before the selfish miner’s blocks have propagated over the network. How does this possibility impact the analysis?
Challenge 3. The site that was demonstrated in Class 11 uses a very simple model to estimate the profitability of a particular bitcoin miner. Produce a better model, and use it to evaluated the expected profit (or loss) for mining hardware such as Antminer S7. A better model would need to capture expected increases in the difficulty, cost of capital, and other costs of mining besides electricity.
Submission
Follow the submission instructions at the beginning of this page by 8:29pm on Friday, 9 October.