Transaction Management
This guide explains how to build a transaction management harness that can scale on the Aptos blockchain.
Background
In Aptos, transactions are mapped back to an account in terms of the entity that signs or authorizes that transaction and provides an account-based sequence number. When the Aptos network receives a new transaction, several rules are followed with respect to this:
- The transaction sent from an account must be authorized correctly by that account.
- The current time as defined by the most recent ledger update must be before the expiration timestamp of the transaction.
- The transaction’s sequence number must be equal to or greater than the sequence number on-chain for that account.
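The three rules above can be sketched as a single admission check. This is a minimal illustration, not the node's actual validation code: the `Transaction` fields, the `signature_valid` flag (a stand-in for real signature verification), and the `admission_error` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transaction:
    sender: str
    sequence_number: int
    expiration_timestamp_secs: int
    signature_valid: bool  # assumption: stands in for real signature verification


def admission_error(txn: Transaction,
                    on_chain_sequence_number: int,
                    ledger_timestamp_secs: int) -> Optional[str]:
    """Return None if the transaction passes all three rules, else a reason."""
    # Rule 1: the sending account must have authorized the transaction.
    if not txn.signature_valid:
        return "not authorized by sender"
    # Rule 2: the ledger time must be before the expiration timestamp.
    if ledger_timestamp_secs >= txn.expiration_timestamp_secs:
        return "expired"
    # Rule 3: the sequence number must be >= the on-chain sequence number.
    if txn.sequence_number < on_chain_sequence_number:
        return "sequence number too old"
    return None
```

Note that a sequence number greater than the on-chain value passes this check; whether the transaction can then make progress is a separate question.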
Once the initial node has accepted a transaction, one more rule governs how the transaction moves through the system. If a transaction’s sequence number is higher than the current on-chain sequence number, it can only progress toward consensus if every node in the path has seen a transaction for each sequence number between the on-chain value and the transaction’s own sequence number.
Example:
Alice owns an account whose current on-chain sequence number is 5.
Alice submits a transaction to node Bob with sequence number 6.
Bob the node accepts the transaction but does not forward it, because Bob has not seen 5.
To make progress, either Alice must send Bob transaction number 5, or Bob must be notified by consensus that 5 was committed. The latter case implies that Alice submitted transaction 5 through another node.
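To make the example concrete, here is a small simulation of the forwarding rule, assuming a node tracks the on-chain sequence number and the set of sequence numbers it has accepted. The `NodeMempool` class and its methods are hypothetical and only model the behavior described above.

```python
class NodeMempool:
    """Toy model of one node's view of a single account."""

    def __init__(self, on_chain_sequence_number: int):
        self.committed = on_chain_sequence_number  # next number expected on-chain
        self.pending = set()

    def accept(self, sequence_number: int) -> None:
        # Accept anything at or above the on-chain sequence number.
        if sequence_number >= self.committed:
            self.pending.add(sequence_number)

    def notify_committed(self, new_on_chain_sequence_number: int) -> None:
        # Consensus reports progress; drop transactions that are now committed.
        self.committed = new_on_chain_sequence_number
        self.pending = {s for s in self.pending if s >= self.committed}

    def forwardable(self) -> list:
        """Sequence numbers forming an unbroken run from the on-chain value."""
        ready = []
        nxt = self.committed
        while nxt in self.pending:
            ready.append(nxt)
            nxt += 1
        return ready


bob = NodeMempool(on_chain_sequence_number=5)
bob.accept(6)
stuck = bob.forwardable()       # 6 is held back because 5 is missing
bob.accept(5)
unblocked = bob.forwardable()   # both 5 and 6 can now move forward
```

In this model, calling `notify_committed(6)` instead of `accept(5)` unblocks transaction 6 in the same way.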
Beyond this there are two remaining principles:
- A single account can have at most 100 uncommitted transactions submitted to the blockchain. Any more than that and the transactions will be rejected. This can happen silently if Alice submits the first 100 to Bob the node and the next 100 to Carol the node. If both those nodes share a common upstream, then that upstream will accept Alice’s 100 sent via Bob but silently reject Alice’s 100 sent via Carol.
- Submitting distinct transactions to multiple nodes results in slow resolution, because a transaction will not make progress from the node it was submitted to until that node knows that all preceding transactions have been committed. For example, if Alice sends the first 50 via Bob and the next 50 via Carol, Carol holds her 50 until she learns that the 50 sent via Bob have been committed.
Building a Transaction Manager
Now that we understand the nuances of transactions, let’s dig into building a robust transaction manager. This consists of the following core components:
- A sequence number generator that allocates and manages available sequence numbers for a single account.
- A transaction manager that receives payloads from an application or a user, sequence numbers from the sequence number generator, and has access to the account key to combine the three pieces together into a viable signed transaction. It then also takes the responsibility for pushing the transaction to the blockchain.
- An on-chain worker-leader harness that lets multiple accounts share the signer of a single shared account.
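As a rough sketch of how the second component fits together, the class below combines a payload, a sequence number, and a signing function into a signed transaction and hands it to a submit function. Everything here is hypothetical: `TransactionManager`, `SignedTransaction`, the injected `next_sequence_number`, `sign`, and `submit` callables, and the message layout, which is a placeholder and not the real Aptos serialization format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SignedTransaction:
    sender: str
    sequence_number: int
    payload: bytes
    signature: bytes


class TransactionManager:
    def __init__(self,
                 sender: str,
                 next_sequence_number: Callable[[], int],
                 sign: Callable[[bytes], bytes],
                 submit: Callable[[SignedTransaction], None]):
        self._sender = sender
        self._next_sequence_number = next_sequence_number
        self._sign = sign        # assumption: closes over the account key
        self._submit = submit    # assumption: pushes to a single node

    def push(self, payload: bytes) -> SignedTransaction:
        # Combine the three pieces: payload, sequence number, and signature.
        seq = self._next_sequence_number()
        # Placeholder message layout, for illustration only.
        message = f"{self._sender}:{seq}:".encode() + payload
        txn = SignedTransaction(self._sender, seq, payload, self._sign(message))
        self._submit(txn)
        return txn
```

Injecting the three callables keeps the manager independent of any particular SDK and makes it straightforward to test with fakes.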
Currently, this framework assumes that the network builds no substantial queue; that is, a submitted transaction executes and commits with little to no delay. To address high demand, this work needs to be extended with the following components:
- Optimizing the base_gas_unit price to ensure priority transactions can be committed to the blockchain.
- Further handling of transaction processing rates to ensure that the expiration timer is properly set.
- Handling of transaction failures to either be ignored or resubmitted based upon desired outcome.
Note, an account should be managed by a single instance of the transaction manager. Otherwise, each instance of the transaction manager will likely have stale in-memory state resulting in overlapping sequence numbers.
Implementations
Managing Sequence Numbers
Each transaction requires a distinct sequence number that is sequential to previously submitted transactions. This can be provided by the following process:
1. At startup, query the blockchain for the account’s current sequence number.
2. Support up to 100 transactions in flight at the same time; that is, 100 sequence numbers can be allocated without confirming that any have been committed.
3. If there are 100 transactions in flight, determine the actual committed state by querying the network. This will update the current sequence number.
   - If there are fewer than 100 transactions in flight, return to step 2.
   - Otherwise, sleep for 0.1 seconds and continue to re-evaluate the current on-chain sequence number.
4. All transactions should have an expiration time. If the expiration time has passed, assume that there has been a failure and reset the sequence number. The trivial case is to only monitor for failures when the maximum number of transactions are in flight and to let other services manage this otherwise.
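The allocation loop above might look roughly like the following. This is a simplified, single-threaded sketch: `SequenceNumberGenerator` and the injected `fetch_on_chain_sequence_number` callable are hypothetical, and expiration handling is left out.

```python
import time
from typing import Callable


class SequenceNumberGenerator:
    def __init__(self,
                 fetch_on_chain_sequence_number: Callable[[], int],
                 max_in_flight: int = 100,
                 poll_interval_secs: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep):
        self._fetch = fetch_on_chain_sequence_number
        self._max_in_flight = max_in_flight
        self._poll_interval_secs = poll_interval_secs
        self._sleep = sleep
        # At startup, read the account's current on-chain sequence number.
        self._committed = self._fetch()
        self._next = self._committed

    def next_sequence_number(self) -> int:
        # When the window is full, refresh the committed state and,
        # if nothing has freed up, sleep before checking again.
        while self._next - self._committed >= self._max_in_flight:
            self._committed = self._fetch()
            if self._next - self._committed >= self._max_in_flight:
                self._sleep(self._poll_interval_secs)
        # A slot is available: allocate without waiting for confirmation.
        allocated = self._next
        self._next += 1
        return allocated
```

The `sleep` parameter is injected only so that the polling behavior can be tested without real delays. A production version would also need a lock if several tasks allocate concurrently, plus the expiration-based reset.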
In parallel, monitor newly submitted transactions. Once the earliest transaction’s expiration time has passed, synchronize up to that transaction. Then repeat the process for the next transaction.
If there is any failure, wait until all outstanding transactions have timed out and leave it to the application to decide how to proceed, e.g., replay failed transactions. The best way to wait for outstanding transactions is to query the ledger timestamp and ensure that at least the maximum timeout has elapsed since the last transaction’s submit time. From there, validate with mempool that all transactions since the last known committed transaction are either committed or no longer exist within the mempool. This can be done by querying the REST API for the transactions of a specific account, specifying the sequence number currently being evaluated and setting the limit to 1. Once these checks are complete, the local sequence number can be resynchronized.
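The two checks in this recovery procedure can be sketched as pure functions. The names below are hypothetical, both timestamps are assumed to be in microseconds, and `lookup` stands in for whatever REST query the application uses; it is assumed to return "committed", "pending", or "absent" for a given sequence number.

```python
from typing import Callable, Optional


def outstanding_have_expired(ledger_timestamp_usecs: int,
                             last_submit_time_usecs: int,
                             max_timeout_secs: int) -> bool:
    """True once the ledger clock has passed the last submit time plus the timeout."""
    deadline = last_submit_time_usecs + max_timeout_secs * 1_000_000
    return ledger_timestamp_usecs >= deadline


def resynchronized_sequence_number(start: int,
                                   end: int,
                                   lookup: Callable[[int], str]) -> Optional[int]:
    """Return the sequence number to resume from, or None if any
    transaction in [start, end) is still pending in mempool."""
    resume = start
    for seq in range(start, end):
        status = lookup(seq)
        if status == "pending":
            return None          # not safe to resynchronize yet
        if status == "committed":
            resume = seq + 1     # committed, so resume after it
    return resume
```

Resynchronization should only proceed once the first function returns True and the second returns a number rather than None.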
These failure handling steps are critical for the following reasons:
- Mempool does not immediately evict expired transactions.
- A new transaction cannot overwrite an existing transaction, even if it is expired.
- Consensus, i.e., the ledger timestamp, dictates expirations. The local node will only expire a transaction after it sees a committed timestamp later than the transaction’s expiration time and a garbage collection has happened.
Managing Transactions
Once a transaction has been submitted it goes through a variety of steps:
1. Submission to a REST endpoint.
2. Pre-execution validation in the Mempool during submission.
3. Transmission from Mempool to Mempool with pre-execution validation happening on each upstream node.
4. Inclusion in a consensus proposal.
5. One more pre-execution validation.
6. Execution and committing to storage.
There are many potential failure cases that must be considered:
- Failure during transaction submission (1 and 2):
  - Visibility: The application will receive an error either that the network is unavailable or that the transaction failed pre-execution validation.
  - If the error is related to availability or duplicate sequence numbers, wait until access is available and the sequence number has re-synchronized.
  - Other pre-execution validation failures are currently out of scope. Beyond duplicate sequence numbers, they are likely account issues: an invalid key for the account, or insufficient funds for gas.
- Failure between submission and execution (3, 4, and 5):
  - Visibility: Only known by waiting until the transaction has expired.
  - These are the same as other pre-execution validation errors, but are caused by changes to the account as earlier transactions execute. The likely causes are duplicate sequence numbers or insufficient funds for gas.
- Failure during execution (6):
  - Visibility: These are committed to the blockchain.
  - These errors occur as a result of on-chain state issues. They tend to be application specific, such as an auction where a new bid might not actually be higher than the current bid.
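One way to organize these cases in application code is to classify each transaction by what the application can observe about it. The sketch below is illustrative only; the `Outcome` enum, the `classify` function, and its four inputs are hypothetical names rather than part of any SDK.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    IN_FLIGHT = "in flight"
    SUCCEEDED = "succeeded"
    FAILED_AT_SUBMISSION = "failed at submission"        # steps 1 and 2
    FAILED_BEFORE_EXECUTION = "failed before execution"  # steps 3, 4, and 5
    FAILED_DURING_EXECUTION = "failed during execution"  # step 6


def classify(submit_error: Optional[str],
             committed: bool,
             execution_succeeded: bool,
             expired: bool) -> Outcome:
    # Submission failures are visible immediately as an error response.
    if submit_error is not None:
        return Outcome.FAILED_AT_SUBMISSION
    # Execution results, including failures, are committed on-chain.
    if committed:
        if execution_succeeded:
            return Outcome.SUCCEEDED
        return Outcome.FAILED_DURING_EXECUTION
    # Anything lost in between is only detectable through expiration.
    if expired:
        return Outcome.FAILED_BEFORE_EXECUTION
    return Outcome.IN_FLIGHT
```

The application can then decide per outcome whether to ignore the failure or resubmit.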
Workers and Identity
Using the above framework, a single account can push up to 100 transactions from the start of a block to the end of a block. Assuming that all 100 transactions are consumed within 1 block, it will take a bit of time for the next 100 slots to be available. This is due to network delays as well as the multi-staged validator pipeline.
To fully leverage the blockchain for massive throughput, using a single user account is not enough. Instead, Aptos supports the concept of worker accounts that can share the responsibility of pushing work through a shared account, also known as a resource account.
In this model, each worker has access to the SignerCap of the shared account, which enables them to impersonate the shared account or generate the signer for the shared account. Upon gaining the signer, the transaction can execute the logic that is gated by the signer of the shared account.
Another model, if viable, is to decouple the signer altogether from permissions and to make an application-specific capability. This capability can then be given to each worker, letting them operate on the shared infrastructure.
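On the client side, spreading work across worker accounts can be sketched as a small pool that tracks in-flight transactions per worker, since each worker account has its own sequence numbers and its own limit of 100 uncommitted transactions. The `WorkerPool` class is hypothetical and does not cover the on-chain capability logic.

```python
from typing import Iterable, Optional


class WorkerPool:
    def __init__(self, worker_addresses: Iterable[str], max_in_flight: int = 100):
        self._in_flight = {addr: 0 for addr in worker_addresses}
        self._max_in_flight = max_in_flight

    def acquire(self) -> Optional[str]:
        """Pick the least-loaded worker, or None if every worker is full."""
        addr = min(self._in_flight, key=self._in_flight.get)
        if self._in_flight[addr] >= self._max_in_flight:
            return None
        self._in_flight[addr] += 1
        return addr

    def release(self, addr: str) -> None:
        """Call when a transaction from this worker commits or expires."""
        if self._in_flight[addr] > 0:
            self._in_flight[addr] -= 1
```

With N workers, this allows up to N times 100 transactions in flight, subject to the conflict caveat that applies to the shared infrastructure.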
Note that parallelization on the shared infrastructure can be limited if any transaction would have any read or write conflicts. This won’t prevent multiple transactions from executing within a block, but can impact maximum blockchain performance.