I was trying to improve the functional tests for bumpfee
, a Bitcoin Core wallet feature that lets you increase the fee of a transaction that’s unconfirmed and stuck. Unfortunately I introduced a bug in the test, which I’m still in the process of tracking down. Every disadvantage has its advantage, so I took the opportunity to better understand the functional test framework and its powerful debugging tools.
Thanks to everyone who pointed me in the right direction on IRC (as well as for tips on how to use IRC without going insane).
You can view the changes I made, including the bug, in this pull request. Caveat: this is a PR to my own fork: don’t make pull requests like this to github.com/bitcoin/bitcoin. It’s generally a bad idea to change so many things at the same time, if only because it’s too much burden for code reviewers.
To reproduce the error yourself (assuming OSX or Linux, see here for Windows build instructions at the time this post was written):
git clone https://github.com/Sjors/bitcoin.git git checkout blog-2017-11-29-bug ./autogen ./configure --without-gui --with-incompatible-bdb make cd test/functional ./bumpfee.py
The test will start running and you’ll see a log message “Running tests”.
TestFramework (INFO): Initializing test directory /var/.../test4ivt2s9y TestFramework (INFO): Running tests TestFramework (ERROR): Assertion failed Traceback (most recent call last): ... File "./bumpfee.py", line 91, in test_simple_bumpfee_succeeds sync_mempools((rbf_node, peer_node)) ... raise AssertionError("Mempool sync failed")
Log statements
There’s not much information between the “Running tests” and “Assertion failed” log messages. To see more, switch the log level from the default INFO
to DEBUG
:
./bumpfee.py --loglevel=DEBUG
As I was investigating, I added several self.log.debug
statements to the test file, to get a better sense of what was going on before the error. Now my log looked something like this:
2017-11-29 12:29:24.935000 TestFramework (INFO): Initializing test directory /var/.../test4ivt2s9y 2017-11-29 12:29:25.068000 TestFramework.node0 (DEBUG): bitcoind started, waiting for RPC to come up 2017-11-29 12:29:25.071000 TestFramework.node1 (DEBUG): bitcoind started, waiting for RPC to come up ... 2017-11-29 12:29:27.297000 TestFramework (INFO): Running tests 2017-11-29 12:29:27.400000 TestFramework (DEBUG): Create RBF transaction 2017-11-29 12:29:27.485000 TestFramework (DEBUG): {'amount': Decimal('-25.00000000'), 'fee': Decimal('-0.00001000'), 'confirmations': 0, 'trusted': True, 'txid': '79cbe3db1f815a90b16cbfb27733b970722bc9658982984cd8e18d0e4683d861', 'walletconflicts': [], 'time': 1511958567, 'timereceived': 1511958567, 'bip125-replaceable': 'yes', 'details': [{'account': '', 'address': 'mk7wP6GE5YsqWtHUKfQmTG8xd6oMrHhc5Q', 'category': 'send', 'amount': Decimal('-25.00000000'), 'vout': 0, 'fee': Decimal('-0.00001000'), 'abandoned': False}], 'hex': '0200000001110bdbeedd5ee0e31ff1a93b45ffc8ddcb7479f3e37e3036778dbd23ef66cedc00000000494830450221009a383031670d0392488a0eec0eaef73a20380fa84b410e0309dd9f8eea0199b90220242020a6b6b44b8888e6954f8b300deb14b13a6804fb439f948ffef49eb08a4401fdffffff0200f90295000000001976a914327e4caec554e71fee268830dd13f594c6cf96ce88ac18f50295000000001976a914f0209d1e02367663173c51f1ccc63b9d65b7c50888ac00000000'} 2017-11-29 12:29:27.486000 TestFramework (DEBUG): ['79cbe3db1f815a90b16cbfb27733b970722bc9658982984cd8e18d0e4683d861'] 2017-11-29 12:29:27.484000 TestFramework (DEBUG): {'size': 1, 'bytes': 192, 'usage': 1008, 'maxmempool': 300000000, 'mempoolminfee': Decimal('0E-8')} 2017-11-29 12:29:27.485000 TestFramework (DEBUG): {'size': 0, 'bytes': 0, 'usage': 0, 'maxmempool': 300000000, 'mempoolminfee': Decimal('0E-8')} 2017-11-29 12:29:27.487000 TestFramework (DEBUG): Sync mempools after creating RBF transaction 2017-11-29 12:30:27.838000 TestFramework (ERROR): Assertion failed
I used self.log.debug(rbftx)
to print information about the RBF transaction that the test generated, self.log.debug(rbf_node.getrawmempool())
to show that the transaction made it into the mempool of the node that created it.
self.log.debug(peer_node.getrawmempool())
shows that it didn’t propagate to the other node. At least not immedidately, which makes sense: synchronisation of both mempools is not expected to happen until the next statement sync_mempools((rbf_node, peer_node))
. This test helper function waits for both mempols to be identical and fails otherwise.
A good way to learn more about what’s going on is to intentionally break things. For example if I modify the spend_one_input()
helper function to pay a 1 BTC miner fee, the test predicatably fails in a different way:
2017-11-29 12:45:14.515000 TestFramework (INFO): Running tests 2017-11-29 12:45:14.612000 TestFramework (DEBUG): Create RBF transaction 2017-11-29 12:45:14.692000 TestFramework (ERROR): JSONRPC error Traceback (most recent call last): ... File "./bumpfee.py", line 318, in spend_one_input txid = node.sendrawtransaction(signedtx["hex"]) ... test_framework.authproxy.JSONRPCException: 256: absurdly-high-fee (-26)
Print RPC commands
The functional tests work by sending commands to the test nodes via RPC. You can log these commands and their responses using ./bumpfee.py --tracerpc
. Now you can clearly see sync_mempools
in action:
2017-11-29 12:53:27.140000 TestFramework (DEBUG): Sync mempools after creating RBF transaction -31-> getrawmempool [] <-31- [0.000546] ["8646e3d63420ba49326f42b5327abe432058e3d93fa000f7a26fab1a0ae5b35d"] -32-> getrawmempool [] <-32- [0.000478] [] -33-> getrawmempool [] <-33- [0.000874] ["8646e3d63420ba49326f42b5327abe432058e3d93fa000f7a26fab1a0ae5b35d"]
sync_mempools
works by sending both nodes the getrawmempool
command and comparing the result. After a while it gives up and throws an error.
We still don’t know why the mempools aren’t synchronizing. So let’s dig deeper.
View node log files
So far we’ve been looking at the test logs. But there’s more: each test node has its own directory which includes a log file that you can inspect.
Note the path given after “Initializing test directory”, in this example /var/…/test4ivt2s9y
. The logs for the node that created the test transaction are in /var/…/test4ivt2s9y/node1/regtest/debug.log
.
You can change the level of detail in these blogs by adding "-debug=all"
to self.extra_args
in set_test_params()
. More fine-grained log options can be found via ../../src/bitcoind --help
:
-debug=<category> Output debugging information (default: 0, supplying <category> is optional). If <category> is not supplied or if <category> = 1, output all debugging information. <category> can be: net, tor, mempool, http, bench, zmq, db, rpc, estimatefee, addrman, selectcoins, reindex, cmpctblock, rand, prune, proxy, mempoolrej, libevent, coindb, qt, leveldb.
You can even combine the log files of all test nodes, in order to get a chronological picture of what happened:
test/functional/combine_logs.py /var/.../test4ivt2s9y > combined.log
I published my combined logs here:
node1 2017-11-28 18:24:14.601308 sending inv (37 bytes) peer=0 node0 2017-11-28 18:24:14.601634 received: inv (37 bytes) peer=0 node0 2017-11-28 18:24:14.601899 got inv: tx ca43e3bf038fd2b270f1f3d0b3a9c3a866df30afdbfec2078273376b5ba80ebf new peer=0 ... node1 2017-11-28 18:24:18.567446 sending inv (37 bytes) peer=1 node0 2017-11-28 18:24:18.567902 received: inv (37 bytes) peer=1 node0 2017-11-28 18:24:18.568054 got inv: tx ca43e3bf038fd2b270f1f3d0b3a9c3a866df30afdbfec2078273376b5ba80ebf new peer=1
It’s clear that node1
sent the transaction to node0
and there’s no obvious error message. One interesting observation is that node1
broadcast the transaction twice, even though the test was only run once.
View other node artifacts
Both node directories contain a file mempool.dat. Although you need other tools to really inspect their contents, in my case it was trivial to see that this file was empty for node0
and not empty for node1
, consistent with the--tracerpc
output above.
Use Python debugger
So we still don’t know what went wrong. Perhaps through manually interacting with the test nodes we can find out more. One way to do that is to use bumpfee.py --pdbonfailure
.
This gives you a Python console, where you ask things like:self.nodes[0].getrawmempool()
and notice it’s still empty.
Let’s try a manual broadcast:
# get a full list of RPC commands pp self.nodes[0].help() # see that the mempool of node0 is empty: pp self.nodes[0].getrawmempool() # get the full transaction in hexidecimal format from the mempool node1: tx = self.nodes[1].getrawtransaction("6332fe546864bf8e81a66a3f6c1868f006a88a3f8323c7eedb467de8d05531f6") # give it to node0: self.nodes[0].sendrawtransaction(tx) # see that the mempool of node0 now contains our transaction! pp self.nodes[0].getrawmempool()
Aha, that’s interesting! After this manual broadcast, we find that our transaction finally made it to node0. This still doesn’t solve our mystery, but at least provides another clue.
Inspect running nodes
If things get really desperate, you can leave the nodes running after the test, using--noshutdown
. That way you can poke at it using some other tool.
Use serendipity
No, that’s not a tool, it’s a proccess. Just go do other stuff. Eventually you might run into a solution. It turned out the test nodes thought they were still in IBD (Initial Blockchain Download), during which process they don’t synchronize their mempools. To tell the test nodes IBD is over, you need to mine an additional block using peer_node.generate(1)
. So I broke the tests by removing peer_node.generate(110)
.
More tests welcome
There’s still plenty of tests to write and improve in Bitcoin Core. Some integration tests, like the one in this article, are written in Python. Those could be a good place to start, until you’re a bit more familiar with the codebase. Please follow recommended practices. There are also C++ integration tests, as well as unit tests.