
Hyperledger v1 Ledger

High-level Design

Objectives

Support v1 endorsement/consensus model - separation of simulation (chaincode execution) and block commit
Endorsement/simulation (chaincode execution) can be performed on a subset of peers
Parallel execution of chaincode (concurrency)
Improved scalability

Embed transaction read/write sets on the blockchain (input-version and post-image)
Immutability, Auditing, Provenance

Optimize data storage for blockchain use pattern


New file-based ledger for improved performance
Continue using RocksDB for indexes to optimize ledger queries

Support pluggable data stores and rich query language


Challenging: given the first objective, most databases do not support the simulation and read/write-set requirements, so limitations are likely. Next priority for investigation...

Ledger - Current work focus

KV-ledger
(High level components)
Block storage
Stores and retrieves blocks
Assumes blocks arrive in exact sequence
Queries supported
Retrieve blocks by block-hash and block-number
Scan a range of blocks between two block numbers
Retrieve Transaction by txId

Transaction execution
Simulates transactions and produces ReadWriteSet (Endorser)
Queries/Updates GetKey/SetKey/GetKeyRange
Validates and applies ReadWriteSet (Committer)
Key version based validation (MVCC)
Read-only queries
GetKey/GetKeyRange
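The simulate/commit split above can be sketched in Go. This is a minimal, illustrative model (not the actual Fabric v1 API): the simulator reads committed state while buffering writes into a write set, recording the version of every key it reads for later MVCC validation.

```go
package main

import "fmt"

// versionedValue is what the committed state stores per key.
type versionedValue struct {
	version int
	value   []byte
}

// simulator reads committed state but buffers writes into a write set
// instead of applying them (names are illustrative, not Fabric's API).
type simulator struct {
	state    map[string]versionedValue // committed state (read-only here)
	readSet  map[string]int            // key -> version observed at read time
	writeSet map[string][]byte         // key -> new value (post-image)
}

func newSimulator(state map[string]versionedValue) *simulator {
	return &simulator{state: state, readSet: map[string]int{}, writeSet: map[string][]byte{}}
}

func (s *simulator) GetKey(key string) []byte {
	vv := s.state[key]
	s.readSet[key] = vv.version // record the version for MVCC validation
	return vv.value
}

func (s *simulator) SetKey(key string, value []byte) {
	s.writeSet[key] = value // buffered, not applied to committed state
}

func main() {
	state := map[string]versionedValue{"a": {version: 3, value: []byte("old")}}
	sim := newSimulator(state)
	_ = sim.GetKey("a")
	sim.SetKey("a", []byte("new"))
	fmt.Println(sim.readSet["a"], string(state["a"].value))
}
```

The read/write sets captured here are what the endorser serializes into the transaction; the committer later replays the validation against them.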

Filesystem-based Block Storage

Blocks are stored in file segments

Each file segment contains

File segment header (version, etc.)
A sequence of: varint-encoded length of block-bytes, followed by the block-bytes

RocksDB contains block indexes to support common queries

Default segment size 64 MB

Index block by hash, Index block by number, Index transaction by Id


Value of index is a file-offset-pointer
Potentially encode the starting block number in the segment file name, include a segment-specific block index at
the end of each segment file, and use blockNumber_tranId as the transaction id. This makes it easy to jump to the
right segment file given a block number or transaction id, without needing an external blockNum or txId index
(a blockHash external index would still be needed)

Usage

Raw ledger stores batches of raw transactions to be committed
Final validated ledger stores committed blocks of valid transactions

(Diagram: segment files and RocksDB block index)

RocksDB block index:
  blockHash -> SegNo + offset
  blockNum  -> SegNo + offset
  txId      -> SegNo + offset

File seg-1 layout:
  File seg header | Block-1 length | Block-1 | Block-2 length | Block-2 | ...
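Each index entry in the diagram maps a key (block hash, block number, or txId) to a segment number plus a byte offset. A fixed-width encoding for that pointer value might look like the following sketch; the concrete wire layout is an assumption, not Fabric's actual format.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// fileLocPointer is the value stored under each RocksDB index key:
// which segment file, and where in it the block record starts.
type fileLocPointer struct {
	segNo  uint32
	offset uint64
}

// encodeLoc packs the pointer into 12 big-endian bytes.
func encodeLoc(p fileLocPointer) []byte {
	buf := make([]byte, 12)
	binary.BigEndian.PutUint32(buf[0:4], p.segNo)
	binary.BigEndian.PutUint64(buf[4:12], p.offset)
	return buf
}

// decodeLoc is the inverse of encodeLoc.
func decodeLoc(b []byte) fileLocPointer {
	return fileLocPointer{
		segNo:  binary.BigEndian.Uint32(b[0:4]),
		offset: binary.BigEndian.Uint64(b[4:12]),
	}
}

func main() {
	loc := fileLocPointer{segNo: 1, offset: 64}
	fmt.Println(decodeLoc(encodeLoc(loc)) == loc)
}
```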

Filesystem-based Block Storage


Pros
Blocks arrive in sequential order, resulting in an efficient append-only workload
Avoids the write amplification associated with RocksDB and other storage solutions
Becomes more feasible to move large numbers of blocks in bulk, for example when a new
peer comes online (move entire files instead of reading/writing N blocks).

Cons
Custom block data management on file system
Need to maintain integrity of file segments and consistency between block files and RocksDB
indexes
Need utilities to validate that block files and RocksDB are in sync, and to re-build indexes
as needed

Logical structure of a RWSet


Block {
  "Transactions": [
    {
      "Id": "txUUID1",
      "Invoke": "Method(arg1, arg2, .., argN)",
      "TxRWSet": [
        {
          "Chaincode": "ccId",
          "Reads":  [{"key": "key1", "version": "v1"}], // if a Tx performs both read and write on a key, the key appears only in Writes
          "Writes": [{"key": "key2", "version": "v2", "value": "bytes1"}] // a missing value indicates a delete operation
        } // end chaincode RWSet
      ] // end TxRWSet
    }, // end transaction with "Id" txUUID1
    { /* another transaction */ }
  ] // end Transactions
} // end Block

JSON syntax only for conceptual representation


Data is serialized in a binary representation, with ccIds in sorted order and keys sorted within each chaincode
Notes:
Need to add chaincode version. It will be used for auditing, and perhaps for commit validation as well - especially upon chaincode upgrade. Need to go
through all upgrade scenarios, e.g. ensure simulation was done on the latest chaincode version available.
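The logical structure above maps naturally onto Go types. The field names below follow the JSON sketch and are illustrative; the actual Fabric v1 structures are protobuf messages with different names.

```go
package main

import "fmt"

// KVRead records the version a key had when the transaction read it.
type KVRead struct {
	Key     string
	Version string
}

// KVWrite records the post-image of a key; a delete carries no value.
type KVWrite struct {
	Key      string
	Value    []byte
	IsDelete bool // stands in for the "missing value" delete marker
}

// ChaincodeRWSet groups reads and writes per chaincode; serialization
// uses sorted ccIds and sorted keys within each chaincode.
type ChaincodeRWSet struct {
	Chaincode string
	Reads     []KVRead
	Writes    []KVWrite
}

type Transaction struct {
	ID      string
	Invoke  string
	TxRWSet []ChaincodeRWSet
}

type Block struct {
	Transactions []Transaction
}

func main() {
	b := Block{Transactions: []Transaction{{
		ID:     "txUUID1",
		Invoke: "Method(arg1, arg2)",
		TxRWSet: []ChaincodeRWSet{{
			Chaincode: "ccId",
			Reads:     []KVRead{{Key: "key1", Version: "v1"}},
			Writes:    []KVWrite{{Key: "key2", Value: []byte("bytes1")}},
		}},
	}}}
	fmt.Println(len(b.Transactions), b.Transactions[0].TxRWSet[0].Reads[0].Version)
}
```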

Transaction execution - Version maintenance

Version maintenance
Should be possible to detect if a key has changed between simulation and committing phase of a
transaction (MVCC validation)
Versioning scheme for a unique version per key - two options:
Incrementing numbers (initial implementation)
txID of the last committed transaction that updated the key (implement with config option and compare)
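The MVCC check described above reduces to a version comparison: a transaction is valid only if every key it read still carries the version it observed during simulation. A minimal sketch, using the incrementing-number scheme of the initial implementation:

```go
package main

import "fmt"

// read is one entry of a transaction's read set.
type read struct {
	key     string
	version int // version observed at simulation time
}

// mvccValid reports whether every read still matches the committed version.
func mvccValid(reads []read, committed map[string]int) bool {
	for _, r := range reads {
		if committed[r.key] != r.version {
			return false // key changed between simulation and commit
		}
	}
	return true
}

func main() {
	committed := map[string]int{"k1": 5}
	fmt.Println(mvccValid([]read{{"k1", 5}}, committed)) // still valid
	committed["k1"] = 6                                  // another tx updated k1
	fmt.Println(mvccValid([]read{{"k1", 5}}, committed)) // now invalid
}
```

With the txID option, the `version` field would simply hold the id of the last committing transaction instead of a counter; the comparison is unchanged.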

Pro/Cons of using TxId as version identifier


Pro
Does not require introducing a new concept (e.g., auto-incrementing number for each key
separately)
Consistent with the popular Bitcoin transaction structure
(key + version) is equivalent to an input; (key + newValue) is equivalent to a UTXO output
Provides built-in provenance: a pointer to the prior transaction for this key that can easily be
traversed backwards to track the full history of a key over time
Separate fork ID not required in PoW for uniqueness

Cons
Transaction ids are significantly longer than incrementing numbers (txIds may be 32 bytes if a crypto
hash of the contents is used, as in the case of PBFT)

Transaction execution - Simulation (Chaincode execution)

Transaction simulation

RocksDB contains latest state index for fast simulation queries

A scheme for simulating a transaction on a consistent copy of the data

Index by composite key (ccId:keyId)


If chaincodes are limited in number, use a separate column family per chaincode (configurable?)
Collocating the keys of a chaincode enables faster transaction simulation, particularly for range-scan queries
Latest value encoded as [version:deleteMarker:latestValueBytes(if present)]
Value bytes can be a file-offset-pointer into block storage for very large values (configurable; default - over 1 MB?)

Tx simulation to perform on a stable snapshot, supporting concurrency (initial two options):

Locking based concurrency control (initial implementation)


Read locks on RocksDB state by simulator(s) and write lock during commit
Snapshot based concurrency control (implement with config option and compare under load)
Create a RocksDB snapshot and simulate on the snapshot
Does not prevent concurrent commit of new blocks
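The locking option above can be sketched with Go's `sync.RWMutex`: many simulators may hold the read lock concurrently, while a block commit takes the write lock and briefly excludes them. This is an illustrative model of the initial implementation, not the actual Fabric code.

```go
package main

import (
	"fmt"
	"sync"
)

// state guards the latest-state index with a reader/writer lock.
type state struct {
	mu   sync.RWMutex
	data map[string]string
}

// simulate reads under a shared lock; many simulators can run at once.
func (s *state) simulate(key string) string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.data[key]
}

// commit applies a write under the exclusive lock, blocking simulators.
func (s *state) commit(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
}

func main() {
	s := &state{data: map[string]string{"k": "v1"}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); _ = s.simulate("k") }() // concurrent simulators
	}
	wg.Wait()
	s.commit("k", "v2") // exclusive commit
	fmt.Println(s.simulate("k"))
}
```

The snapshot option avoids this exclusion entirely: simulation runs against a RocksDB snapshot, so commits never wait on simulators.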

(Diagram: RocksDB state index)
  ccId+keyId -> version+deleteMarker+latestValueBytes

This is a simulation runtime optimization. Alternatively, the state key index could point into the ledger block/transaction
storage write sets, and we could read values from there as the single source of truth, but that would not be as efficient.
Bitcoin uses a similar index in LevelDB for unspent transactions.
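The [version:deleteMarker:latestValueBytes] value layout of the state index can be sketched as below. The exact byte layout is an assumption for illustration; the idea is that a point read returns version, delete marker, and value in one lookup.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeStateValue packs version, delete marker, and the latest value
// bytes into one RocksDB value (layout is illustrative).
func encodeStateValue(version uint64, deleted bool, value []byte) []byte {
	buf := make([]byte, 9, 9+len(value))
	binary.BigEndian.PutUint64(buf[0:8], version)
	if deleted {
		buf[8] = 1 // delete marker; value bytes absent in practice
	}
	return append(buf, value...)
}

// decodeStateValue is the inverse of encodeStateValue.
func decodeStateValue(b []byte) (version uint64, deleted bool, value []byte) {
	return binary.BigEndian.Uint64(b[0:8]), b[8] == 1, b[9:]
}

func main() {
	enc := encodeStateValue(7, false, []byte("latest"))
	v, del, val := decodeStateValue(enc)
	fmt.Println(v, del, string(val))
}
```

For very large values, the value portion would instead hold a file-offset-pointer into block storage, as noted above.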

Transaction execution - Validation/Commit


Committing peer choreography
Receive batches of transactions from consensus (ordering service)
Call Validation System Chaincode (VSCC) to ensure endorsement policy has been fulfilled
Call ledger to perform the Multiversion Concurrency Control (MVCC) check; remove invalid
transactions; build a block of the remaining valid transactions
Initial implementation with sequential validation
Extend to parallel validation of transactions in a block
Using a lock manager that maintains one lock per key (acquire locks
sequentially; once all locks are acquired, start performing validation)
Split transactions into conflict-free batches by dependency analysis and perform
validation in parallel
Call Committer System Chaincode (CSCC) via gossip to ensure final blocks are same
across peers
Call ledger to commit validated block to file-based storage and update RocksDB indexes
Note: also need to validate that the transaction id has not already been used.
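The choreography above can be condensed into a sketch of the sequential initial implementation: filter out transactions that fail the endorsement-policy or MVCC checks, apply the survivors' writes, and keep them for the final block. The check fields below are stand-ins for VSCC and the ledger's MVCC validation, not actual Fabric APIs.

```go
package main

import "fmt"

// tx is a simplified transaction for the commit pipeline.
type tx struct {
	id       string
	endorsed bool           // stand-in for the VSCC endorsement-policy check
	reads    map[string]int // key -> version observed at simulation
	writes   map[string]int // key -> new version (values elided)
}

// buildValidBlock keeps endorsed transactions whose reads still match the
// committed versions, applying writes sequentially in arrival order.
func buildValidBlock(batch []tx, state map[string]int) []tx {
	var valid []tx
	for _, t := range batch {
		if !t.endorsed {
			continue // endorsement policy not fulfilled
		}
		ok := true
		for k, v := range t.reads {
			if state[k] != v {
				ok = false // MVCC conflict: key changed since simulation
				break
			}
		}
		if !ok {
			continue
		}
		for k, v := range t.writes {
			state[k] = v // commit this tx's writes
		}
		valid = append(valid, t)
	}
	return valid
}

func main() {
	state := map[string]int{"k": 1}
	batch := []tx{
		{id: "t1", endorsed: true, reads: map[string]int{"k": 1}, writes: map[string]int{"k": 2}},
		{id: "t2", endorsed: true, reads: map[string]int{"k": 1}}, // stale after t1 commits
	}
	fmt.Println(len(buildValidBlock(batch, state))) // t2 dropped
}
```

The parallel variants mentioned above change only how this loop is scheduled: per-key locks, or dependency analysis into conflict-free batches.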

