
New World Economic Forum report examines nine financial use cases for blockchains

Luke Parker, 14 Aug 2016 - Automated Compliance, Proxy Voting, Trade Finance, WEF

The World Economic Forum (WEF) published a comprehensive, 130-page report on Friday titled
'The future of financial infrastructure: An ambitious look at how blockchain can reshape
financial services'.
The entire document speaks to blockchain’s usefulness and its effects on the
world’s future financial infrastructure. The WEF stated that “Distributed ledger technology will
form the foundation of next generation financial services infrastructure in conjunction with
other existing and emerging technologies.”
“Our findings suggest this technology has the potential to “live-up to the hype” and
reshape financial services, but requires careful collaboration with other emerging
technologies, regulators, incumbents and additional stakeholders to be successful.”
The WEF is a Swiss nonprofit foundation, based in Geneva, that is widely recognized as the
premier international institution for cooperation between the public and private sectors. Its
mission is to "improve the state of the world by engaging business, political, academic, and
other leaders of society to shape global, regional, and industry agendas."
In January, IMF Managing Director Christine Lagarde addressed the forum, describing “Virtual
Currency Schemes,” including Bitcoin, for the first time. Since that event, the international public
sector has produced paper after paper on blockchain technology.
“Using the existing World Economic Forum’s framework for “Disruptive Innovation in
Financial Services”, the report identifies potential blockchain use cases from across the
industry.”- Jesse McWaters, Project Lead, Disruptive Innovation in Financial Services, World
Economic Forum
The first of the use cases listed, Global Payments, is the one where the most change is likely,
especially in remittances. “The focus of this use case is on low value−high volume
payments from an individual/business to an individual via banks or money transfer
operators,” they said.
A ‘seamless’ Know Your Customer (KYC) process is the first big improvement to existing
payments processes that they predict. Leveraging the digital user profiles of both the sender
and the receiver, stored on-blockchain, establishes trust and authenticates both parties quickly,
they point out.
Global Payments Future-state benefits
Also, the payments section suggests that “Banks can leverage cryptocurrency on the DLT to
facilitate global payments, eliminating supporting settlement platforms and foreign currency
buffers in nostro accounts,” referring to accounts denominated in foreign currencies.
“The use of cryptocurrency will add to additional volatility and will demand additional
hedging instruments, and Banks would be required to hold cryptocurrency as assets on
their books.”
The section concludes by pointing out that blockchains can enable banks to settle in real time,
reduce fraud, enable micropayments, and eliminate errors using smart contracts.
Insurance is another use case and the WEF examined property and casualty (P&C) claims
processing. P&C insurance is issued to protect against property losses such as homes or cars
and/or against legal liability resulting from injury or damage to the property of others.
“DLT has the potential to optimize the back-office operational costs of property and casualty
insurers,” the report reads. Through a smart contract, claim submissions are simplified and
automated, the need for brokers is eliminated, and processing time and fraud are reduced.

P&C claims processing future-state benefits


Deposits and lending is an area where the WEF examined two specific use cases: syndicated
loans and trade finance.
Syndicated loans are large-scale diversified financing funded by a group of investors.
An independent report by Thomson Reuters states that global syndicated lending reached
US$4.7 trillion during full year 2015. “DLT has the potential to optimize syndicated loan back-
office operations,” the WEF report says, citing its record keeping functionality to provide
transparency and automate the selection process.
Some companies are already exploring this usage. In February, one of Japan's largest financial
holding companies, Mizuho Financial Group, announced a blockchain project to use Microsoft Azure
Blockchain-as-a-Service for the company's syndicated loan business. In March, Ipreo and
Symbiont also announced a project to overhaul the global syndicated loans market.

Syndicated loans future-state benefits


The second use case within the Deposits and Lending category is Trade Finance, which
is how importers and exporters mitigate their trade risk through the use of trusted third parties
and investments. “DLT has the potential to optimize the regulatory and operations costs of
trade finance,” the report reads.
The report then suggests that blockchain tech can improve almost every step of the trade finance
process, from real-time review to proof of ownership, providing transparency, eliminating the
need for correspondent banks, and preventing double spending problems, an issue that is very
prominent in trade finance.

Trade Finance future-state benefits


This is another area where ongoing projects exist; Bank of America Merrill Lynch, for example,
has been working on its own trade finance platform since March.
The Capital Raising use case outlined in the report is Contingent Convertible (CoCo) Bonds.
Unlike regular convertible bonds, CoCo bonds have an additional threshold that triggers the
conversion, such as the bank's capital ratio falling below 7.5%. CoCos are also similar to Catastrophe
(Cat) bonds but usually have a longer duration and deal in much larger amounts.
Blockchain tech’s record keeping functionality can increase confidence and "lead to developing
a “CoCo” bond rating system," the report continued, which would attract large institutional
investors. Direct integration of calculations into DLT can also improve data input across banks
which can reduce the time to convert CoCos into equity.
CoCo future-state benefits
The WEF concludes that “No significant applications of DLT within the “CoCo” bond life cycle
have been reported or discussed within blockchain research released to date.” However, in
June, insurance giant Allianz and Nephila announced that they had successfully
piloted blockchain technology for Cat swaps, which is essentially the little brother of
CoCo bonds.
For investment management, the WEF referred to processes that take advantage of smart
contracts, specifically automated compliance and proxy voting.
Blockchain-based ‘Automated compliance’ has “the potential to increase operational
efficiencies and provide regulators with enhanced enforcement tools,” says the WEF. “This use
case focuses on the key opportunities in the financial statement audit process to highlight an
automated compliance solution.”

Automated Compliance future-state benefits


Proxy voting is already the subject of a test by Nasdaq that started in October 2015.
According to the WEF, blockchain technology “has the potential to transfer value irrefutably.
This use case highlights the key opportunities to improve retail investor participation in proxy
voting,” the report reads.

Proxy Voting future-state benefits


Also listed is Market Provisioning, within which the WEF outlined two use cases: asset
rehypothecation and equity post-trade.
Asset rehypothecation is the practice by banks and brokers of using their clients’ collateral
for their own purposes. Clients may be compensated with a lower cost of borrowing or fee
rebates.
The report states that “DLT increases processing efficiency, reducing manual processes and
associated costs,” citing the many proof-of-concepts in the works at various
organizations. Examples include the gold market, repurchase markets, and asset transfers.

Asset rehypothecation future-state benefits


In March, the Depository Trust & Clearing Corporation (DTCC) and Blythe Masters' Digital Asset
Holdings announced a similar plan, in this case using the Hyperledger blockchain to manage the
clearing and settlement of U.S. Treasury and other agency mortgage-backed repurchase
agreement (repo) transactions.
Equity Post-Trade is one of the use cases studied in the WEF report. “DLT has the potential to
improve the efficiency of asset transfer,” it stated. “This use case highlights the key
opportunities to streamline clearing and settlement processes in cash equities.”

Equity Post-Trade future-state benefits


The report also foresees a possible process where custodian banks send trade details to the DLT
and a smart contract validates the trade details and ensures transfer accuracy for all parties.
Finally, confirmation is stored on the DLT in real time and settlement is reduced to real time,
rather than the current trade date plus one to two days.
Data, Databases and Distributed Networks
By malcolm mcewen | July 25, 2016 | greenman-23 digest

Random Thoughts on Cargo, Ships and Oceans
(Data, Databases and Distributed Network)

We tend to regard data as if it were a thing with dimensions and boundaries. A product of the
information age we live in, it travels like the cargo of a ship on the virtual ocean that is the
information highway; when in fact the cargo, the ship and the information highway are all data.
There is only ocean.

This ocean of data drives society, determines national budgets, aids decisions in industry and
pigeonholes us into social and economic groups. From the global to the personal level, data
plays a significant role in all the decision processes of everyone’s life. Processes that, if based on
poor, inaccurate, out-of-date or misleading data, risk producing decisions that are equally poor,
misleading and out of date.

So, if we are to make good decisions, we need to know the outcomes, the benefits and
consequences of our actions on ourselves, our neighbours and our environment. We need to
understand the relationship between the macro and the micro, the local and the global and the
only way to do that is through the data.

According to some reports we have generated more data in the last five years than in our entire
history, and each year we generate more. With this explosion in data come opportunities for
improving our decision processes and achieving global sustainability objectives. However, with
those opportunities come challenges in handling, differentiating and working out just what is and
is not useful. For no data is better than the wrong data. The right data, however, despite what
Mark Twain would aver, makes for good statistics, and good statistics support good decision
processes. But what is the ‘right data’ in an information age awash with the stuff?

What Is Data
The internet is data; everything on it and every piece of software on a computer is made up of
data. However, in the context herein, data has the narrower scientific definition of
“a set of values or measurements of qualitative or quantitative variables, records or
information collected together for reference or analysis” (Wikipedia).
It is the cargo on our ship…

The contents of a telephone book are an example of data collected for reference. Data that can
be, and is, put into databases for analysis. Once entered it can be re-organized and sorted so as to
reveal how the names are distributed, measure their frequency and estimate ethnic or
socioeconomic distributions. The analysis might reveal odd correlations, trends and anomalies, such as
the frequency at which three sixes appear in the telephone numbers of people with double-barrelled
names, that would otherwise be missed. Such anomalies can fuel conspiracies and are examples
of statistics being used the way a drunk uses a lamp post: more for support than illumination. In truth,
though, there is little one can get from a telephone book other than a telephone number and an
address. That’s not to say that data isn’t useful.

Types Of Data
Data categorisation is very much dependent on purpose; there is no single category structure
applicable to all. With that in mind I propose four Data ‘spheres‘ to initially distinguish data types.
Personal Data
A telephone book is just one source of personal data, as is a mailing list, a club membership, a
bank account or a tax office receipt. Individually these data sources provide limited information
about an individual, but they contain fields (name, address, etc.) that make it easy to link the data so
that collectively it documents extensive details about an individual's personal and financial life.
Scary stuff, and whilst it is the most precious kind of data, it makes up an insignificant
fraction of the total data currently held or being generated on the internet.

Economic Data
The state of the nation, the productivity of industry and the movement of goods and services
within and between trading entities rely on the supply of good data. The budget, government
policy and changes to or creation of new laws all rely on good, relevant data. Without it there
would be no means to balance the books, to calculate a nation's GDP or to value its currency.
However, data collection currently lags behind the policy that relies on it. At best the figures are
for the previous quarter, but more often than not they are estimates aggregated together from
different sources.

Sociopolitical Data
Domestic government policy on health and education, as well as changes to and creation of new
laws, all rely on good data. At the regional level, data determines how policy will be implemented
and budgets distributed between schools, policing, refuse collection, etc. National and local
government therefore needs quantitative and qualitative data on the demographics, social
trends, and political, cultural and ethnic identities of the people it serves.

Environmental Data
Environmental data includes any lab, field and desktop data from any chemical, physical or
biological discipline from the natural sciences. All data relating to Earth and biological disciplines
from theoretical particle physics to the applied science of agriculture are forms of Environmental
Data.
Non-Exclusive Nature Of Data
Within these spheres data can be quantitative/qualitative, spatial/temporal,
deterministic/stochastic or combinations thereof. The data may similarly be relevant to a few or
many, or have a lasting or fleeting influence, and whilst most data conforms to the categories
above, some straddles more than one, and all of it interacts with and influences the data in the others.
So whilst we can compartmentalize data, we can only understand it in the context of the
whole.

What Is A Database
A database is an application (program) into which data can be input and organised to provide an
indexing system or to display statistical information about the data. A simple data set could be the
membership list of a golf club, each entry containing details of a member's name, age, address,
joining/subscription date and achievements (e.g. handicap, or records held). The
database would allow the club to sort the details by any field (name, age, address, joining date,
subscription renewal, handicap, etc.) and compile simple statistics (e.g. average age, length of
membership) or see who hadn’t paid their subs. A database might store values, charts, tables,
files or just the location of the data, as with BitTorrent file-sharing sites or search engines (e.g.
Google).
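
To make the golf club example concrete, here is a minimal sketch in Python using the built-in sqlite3 module; the table layout and member records are invented for illustration, not taken from the text.

```python
import sqlite3

# Toy golf club membership table; field names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE members (
        name TEXT,
        age INTEGER,
        address TEXT,
        joined DATE,
        subs_paid INTEGER,   -- 1 if this year's subscription is paid
        handicap REAL
    )
""")
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?, ?, ?, ?)",
    [("A. Palmer", 58, "1 Fairway Lane", "2001-04-12", 1, 2.0),
     ("B. Hogan", 44, "9 Links Road", "2010-06-01", 0, 5.5)],
)

# Sort by any field, compile simple statistics, or see who hasn't paid their subs.
print(conn.execute("SELECT AVG(age) FROM members").fetchone())
print(conn.execute("SELECT name FROM members WHERE subs_paid = 0").fetchall())
```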

Types Of Database
All databases store information, ideally for easy retrieval. What differentiates one from another is
the way the data is stored (within the database itself, or links to an external location), where the
database is held (central or distributed), and how the data is subsequently accessed (public or
private).

Traditional Database
Whilst limited and not generally regarded as a true database, a spreadsheet performs all the
basic functions of one. MySQL, the database in the LAMP (Linux Apache MySQL PHP) stack that
drives much of the internet, is an example of a more complex database. A MySQL database stores a
web site's content and links to its media. This content is accessed through PHP scripts (e.g. a
Content Management System like WordPress) and then served to the internet by an Apache
server built on Linux.

Distributed Hash Table (DHT)


A Distributed Hash Table (DHT) is a database that stores only the location(s) of a file along with a
hash value (a unique reference derived from the contents of the file). The hash value stored
in the database can then be compared with that of the external file in order to verify the
integrity of the external file. A DHT may also hold data on when the file(s) was added, the last
time it was accessed and the total number of calls made to the file. A DHT is the mechanism used
for indexing and distributing files across a P2P network.
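
As a toy illustration of the idea (not an implementation of any real DHT protocol), a table keyed by content hash can both locate a file and verify its integrity:

```python
import hashlib

# The "table" stores only a content hash and the locations claiming to hold the file.
dht = {}

def announce(content: bytes, location: str) -> str:
    key = hashlib.sha256(content).hexdigest()   # hash value summarising the file
    dht.setdefault(key, set()).add(location)
    return key

def verify(key: str, retrieved: bytes) -> bool:
    # Integrity check: the hash of the fetched file must match the stored key.
    return hashlib.sha256(retrieved).hexdigest() == key

key = announce(b"some file contents", "node-42")
print(dht[key])                               # {'node-42'}
print(verify(key, b"some file contents"))     # True
print(verify(key, b"tampered contents"))      # False
```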
Blockchain
The bitcoin blockchain solves trust issues for cryptocurrency, but burns a lot of fossil fuel in the
process. Although the bitcoin blockchain is referred to as a distributed database, it is more a
duplicated ledger, with every node maintaining an identical copy of the entire database. All nodes
compete to balance the ledger by guessing a hash value; a value that can’t be calculated easily
and can only be discovered by brute force. Guessed correctly, it balances the entire system and
creates a block. That, in a nutshell, is the proof-of-work concept that makes the Bitcoin blockchain
secure: a very energy-hungry solution to an integrity issue with Homo sapiens.
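
A minimal sketch of that brute-force guessing, with an artificially easy difficulty so it runs in a moment; real Bitcoin mining hashes an 80-byte block header against a vastly harder target:

```python
import hashlib

def mine(block_data: bytes, difficulty_bits: int = 18) -> int:
    """Keep trying nonces until the hash falls below a difficulty target."""
    target = 2 ** (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce          # a "correct guess" that creates the block
        nonce += 1

print(mine(b"example block"))
```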

A Framework For Sustainability


In the previous post I summarised a recent technical report by the Open Data Institute
(ODI) which raised the need for a “blockchain ecosystem to emerge that mirrored the common
LAMP 7 web stack” and was “compatible with the Web we have already.”

Reliable and secure as the software that underpins the LAMP stack is, it is now nearly 20 years old
and has arguably reached its peak. It has similarly evolved to be better at generating data than
dealing with it. It’s good at serving files, not dealing with the information in them, so whilst a data
stack needs to evolve alongside the existing web structure, its evolution will likely be independent
of it. One ‘promising’ data stack identified by the ODI team which met this criteria was “Ethereum
as an application layer, BigchainDB as a database layer and the Interplanetary File System (IPFS)
as a storage layer”.
Application Database Storage (ADS) Network
Unlike the LAMP stack, the data ecosystem is more likely to evolve as a weave of intertwined data
streams that converge on nodes that use the data. Whereas with the LAMP stack exchanges
between nodes occur at the server level, in an ADS network exchanges of data would occur in all
layers: Application, Database and Storage.

The Application Layer


What makes databases powerful are the scripts, applications, programs and content
management systems that use them. These scripts are similarly responsible for entering data, and
with the rapid growth in smart appliances and the IoT this data inputting is increasingly becoming
automated. How useful all that data ultimately turns out to be will depend as much on the
applications that can use the data effectively as on the databases that store and organize it. Once
data no longer has a processing value it would be archived, an action that would also be performed
by an application.

The Database and Storage Layers


Data with different economic, social and environmental relevance, much of it originating from the
application layer, is indexed and organized through the database layer before finding its way into
the storage layer. There is a degree of blurring between these two layers, with
the database layer being dynamic whilst the storage layer is more for large files, legacy
databases, and redundant or archived data.

Blockchain As Metronomes In An ADS Network


The main function of a blockchain is to provide an immutable ledger that can be trusted. It’s a
property an ADS network can exploit in order to synchronize databases. In particular, supply chain
auditing on a blockchain would provide a trusted data source for multiple users in a network,
blockchain being the ideal tool with which to build an authentication and tracking system that
shadows produce as it moves from farm to fork (strengthening the food chain with a blockchain).
A Manifest Of Global Agricultural Produce

With an authentication and tracking system on the blockchain, the origin of produce and the
route it took to market could be verified, providing invaluable data to producers, importers,
retailers and consumers alike.
Once established, a consumer would have access to an audit trail where they would be able to
authenticate the origin, standards in production or the carbon footprint of food. Detailing the
precise route that the produce took from the field to the shelf would give importers and retailers
insight into double handling, stalling and wastage en route, whilst national and supranational
bodies would have precise data on the production, origin and consumption of agricultural
produce. If data is the cargo in an ADS network, a supply chain authentication and tracking system
is the ship that carries it.
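
As an illustrative sketch only (not any particular vendor's API), the kind of hash-linked record such an authentication and tracking system might append as produce moves from farm to fork could look like this:

```python
import hashlib, json, time

# Each event carries the hash of the previous one, so the route to market
# can later be verified end to end. Actors and locations are invented.
def add_event(trail, actor, action, location):
    prev_hash = trail[-1]["hash"] if trail else "0" * 64
    event = {"actor": actor, "action": action, "location": location,
             "time": time.time(), "prev_hash": prev_hash}
    event["hash"] = hashlib.sha256(
        json.dumps(event, sort_keys=True).encode()).hexdigest()
    trail.append(event)
    return trail

trail = []
add_event(trail, "Farm Co-op 12", "harvested", "St Lucia")
add_event(trail, "Shipper A", "loaded", "Castries port")
add_event(trail, "Corner shop", "received", "London")
print([e["action"] for e in trail])
```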

Sowing The Seeds For Integrated Crop Production And Management Systems
With an authentication and tracking system in place a farmer would be able to track in real time
how much produce left the farm and reached the intended market. He would be able to see this
relative to his neighbour, relative to the acreage of a given crop in a region and relative to all the
routes that crop took to market. Without having to communicate directly, all farmers in a publicly
accessible authentication and tracking system would be exchanging data that would help all of
them plan and co-ordinate crop choices and market logistics.

It is a small step for that hub to widen, to encourage integrated crop production and
management in farms across a region and improved logistics to tackle over- and under-production
and transport wastage. One more step and farmers could begin to operate in their
own regional networks, not only to produce and supply food but to create co-operatives that
allocate resources more amicably or develop integrated fertility programmes.
My experiment with IRCC Cameroon was an attempt to remotely put such a structure in place.
Supporting The Development Of A Peer To Peer Economy
As well as farmers, retailers and consumers could build co-operatives around a supply chain.
Orders could be automatically coordinated through logistics operators to find the optimum
route, and then tracked to the delivery address. On arrival the order could trigger payment or
payments. It’s a future that relies on the establishment of an authentication and tracking system
as well as the marketplaces to promote and display the wares.

A good example of a blockchain authentication and tracking system is Deloitte’s ArtTracktive
blockchain. Launched in May of this year to “prove the provenance and movements of artwork”, the
same technology, despite the huge difference in the value of the goods, could be used to
authenticate and track a hand of bananas from the Caribbean to the corner shop as easily as it
can track a basket of fruit from Caravaggio to the Biblioteca Ambrosiana in Milan.
Widening the tracking remit are the London-based startups Blockverify and Provenance.
A blockchain initiative on the Ethereum platform, Provenance currently provides authentication
and traceability of bespoke goods, and is actively exploring retail supply chain
tracking. Blockverify similarly claims to be able to provide blockchain authentication to the
pharmaceutical, luxury goods, diamond and electronics industries.
Cropster, a company that creates software solutions for the speciality coffee industry,
similarly provides provenance to coffee producers so they can “instantly connect to a centralized
market where thousands of roasters are actively looking.” That provenance could be enhanced
further by an authentication and tracking system that follows the beans’ entire journey from
plantation to cup.
Undermining The Dark Web
Openbazaar, a peer-to-peer marketplace now integrated with IPFS, is a decentralized
Amazon/eBay that charges no fees and uses an escrow system with Bitcoin for payments.
Although Openbazaar discourages illicit trade, being a P2P network makes policing that policy
difficult. Escrow brings in a new layer of authentication, a layer that would be enhanced and
strengthened by an authentication and tracking system.
A decentralised marketplace using Bitcoin and supply chain tracking on a blockchain would
represent the first completely decentralized marketplace created on the web. Whilst not
completely ending the Dark Web, an authentication and tracking system would address many of
the anonymity issues P2P networks and cryptocurrency create by authenticating the sender, the
delivery and the recipient. Potentially it is a mechanism better suited to assisting the development
of wholesale markets than a P2P reinvention of Yahoo Auctions.
So what is a Bitcoin and what is a Blockchain?

All of us know Bitcoin as a digital currency unit with which we can transfer money from
anywhere in the world to anywhere else, as long as we have an internet connection, just as
one can make a call over the internet without the intervention of any third parties.

It sounds great but is it so easy?

What if you have 5 bitcoins in your account and simultaneously execute spend transactions over the
internet for 5 bitcoins to 5 different accounts at the same time? Since there is no central party or
mediator, as in the case of a banking transaction, tallying the accounts on a live basis, it
could well be possible that all 5 recipients would receive the transaction at the same time
and account for it! This is the classic problem called ‘double spending’, which many technologists
and digital evangelists had not been able to solve for decades.

In computer science, this problem was formulated as the Byzantine Generals' Problem by Leslie
Lamport, Robert Shostak and Marshall Pease in their 1982 paper. The name refers to the
situation faced by a group of generals encircling and attacking a
city without any reliable communication path among themselves. The challenge is to come to the
right consensus, going with the right majority decision, in the face of a lack of trust amongst
themselves due to the possible infiltration of their camp by the rival kingdom.

Byzantine fault tolerance (BFT) is the dependability of a fault-tolerant computer system,
particularly a distributed computing system, where components may fail and there is imperfect
information on whether a component has failed. In a "Byzantine failure", a component such as
a server can inconsistently appear both failed and functioning to failure-detection systems,
presenting different symptoms to different observers.
It is difficult for the other components to declare it failed and shut it out of the network, because
they first need to reach a consensus regarding which component has failed in the first place.
A solution to this problem would also have the capability to address the issue of ‘double spending’
described earlier.
While many attempts have been made to solve this problem, it was only in October 2008 that a
person or a group of persons under the name ‘Satoshi Nakamoto’ presented what was, for its time,
a foolproof solution in the white paper ‘Bitcoin: A Peer-to-Peer Electronic Cash System’,
available at https://bitcoin.org/bitcoin.pdf .
Before we understand how Bitcoin works, let us look at an interesting situation.
We have a group of 100 businessmen maintaining their ledger in a book.
Every day, they group all their transactions onto a page and add the page to the book.
All of them verify each transaction, approve it on reaching a majority, and reach a consensus to
add the page. So they added 100 pages and things were going fine.
One day, they realised that some of the members of the group were trying to push wrong
transactions deliberately, and this led to a lot of mistrust. Over and above that, someone once
misplaced the book and they had a nightmare fearing the loss of all their transactions.
This is the classic problem of a ‘single point of failure’, which, if attacked by an adversary,
could cause the whole system to collapse.
So they decided that every one of the members should hold a copy of the book, so that they were
not only able to verify all the transactions, but were also able to update the rest in case any of the
pages was torn off or any of the books was spoiled.
ALL THE MEMBERS NOW HELD A COPY OF ALL THE TRANSACTIONS, GROUPED IN
SERIALLY LINKED PAGES, AND TO CHANGE ANY TRANSACTION ON ANY PAGE, A MAJORITY
HAD TO AGREE TO CHANGE ALL THE TRANSACTIONS FROM THERE TILL THE END.
So it was working fine and everyone was happy about the past data.
Whenever a new page had to be added, all of them would vote on the transactions and the
majority-approved transactions would get added. So, in order to change any transaction and push
through one’s own view of a different set of transactions, one needed the approval of over 50
other persons.
Over a period, the number of persons undertaking transactions started growing by leaps and
bounds, and so did the number of transactions. They had to find some way to handle the volume,
as the number of people on the board approving the transactions was far smaller than the
number of people conducting the transactions that needed to be approved and grouped.
They devised a new system.
Each board member would pick around 100 transactions (fixed as the number that can fit on a
page) from among the transactions that needed to be approved, approve or reject them,
group them into a page and put the page up for voting. All the members would view that
page holistically, verify each transaction once again and add it to the book. When more
than 50% of the board members approved the page, it was considered final and the
race for the next page was on. The member whose page was added to the book was given a
reward of Rs 25/- and he could also earn some money out of the fees offered by the users
whose transactions he chose to verify.
Verification of each transaction involved checking whether the transacting parties were
genuine and the sender had the necessary balance to conduct the transaction. This multiple
verification process, and the serial-number-based adding of the transactions in an immutable
manner, was by and large able to solve the problem of ‘double spending’, or multiple entries for the
same transaction.
But another problem remained. The huge number of transactions and the speed at which they
were being generated resulted in multiple pages being created at the same time and there was
utter confusion as to whose page had to be added to the book!
To overcome this they decided to add one more layer for selecting the page from among those
created. They decided to make the page creators play a game: whoever won,
their page would be selected, and they would be rewarded for creating the page.
Everyone had to guess a random number called a ‘nonce’ (number used only once). The nonce
was paired with the transactions grouped for the page and a number derived from the latest page,
and a new number was derived from these using a standard calculation, taking into account all the
transactions included for the page, the unique nonce, a single representative element for all the
transactions, and a representation of the previous page.
For example, if x was the previous page, a function f(x) was created to represent all the
details up to and including that page. The winning member had to create a new number of the
form f( f(x) + representation of all the transactions grouped for the new page + nonce ).
The function f(x) is termed the hash of x, while the element which represents all the transactions
in a single unique number is called the Merkle root.
While a hash is a unique number derived from the base data, the Merkle root is derived by
hashing pairs of transactions together until only one element is left.
Since the hash is unique, a change in any transaction would result in a change in the Merkle
root, and hence anyone trying to tamper with even one transaction at a later date would be easily
caught.
A challenge was then thrown at the board members, also called the mining pool, in
the form of a target value below which the number computed by the creator of the page had to fall
for him to be considered the winner; his page of transactions could then be added to the book and
the reward offered. For this, the member had to try many nonces before he could meet the
target, and this took a lot of effort.
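
Putting the analogy into code, here is a simplified sketch of the page construction just described: pairwise hashing of the transactions gives the Merkle root, and the page is sealed by searching for a nonce such that f(previous page + Merkle root + nonce) falls below the target. The difficulty here is deliberately tiny so the example runs instantly.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(transactions):
    # Hash pairs of transactions together until only one element is left.
    level = [h(tx.encode()) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last element if the count is odd
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def seal_page(prev_page_hash: bytes, transactions, difficulty_bits: int = 16):
    root = merkle_root(transactions)
    target = 2 ** (256 - difficulty_bits)
    nonce = 0
    while True:
        page_hash = h(prev_page_hash + root + nonce.to_bytes(8, "big"))
        if int.from_bytes(page_hash, "big") < target:
            return nonce, page_hash        # the winning guess seals the page
        nonce += 1

nonce, page_hash = seal_page(b"\x00" * 32, ["Alice pays Bob 2", "Bob pays Carol 1"])
print(nonce, page_hash.hex())
```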
All the board members have been playing this game ever since, and new pages are being
created every 10 minutes (or whatever interval is prescribed by the group, based on which the
difficulty of the challenge is chosen).
There have also been some issues with different pages being added to the book at the same time,
but the group was able to successfully resolve this by ensuring that at any time the book with the
largest serial number of approved pages, held by the majority of the group members, prevails; the
rest of the pages are torn off and the transactions on those pages have to be taken into new pages
in the future, albeit via the same process.
The system has been going well and there have so far been no issues of double spending or of any
member trying to override the transactions through brute force or by any other means.
It is now doubly difficult for any member wanting to change a transaction, as he not only has to
get his page entered into the consideration set for approval through a tedious process, but also
has to get the majority onto his side, a task which is almost impossible.

In the case of Bitcoin, the members are called miners and those who create transactions
are the account holders submitting their transactions through their wallets. The pages are
represented by blocks and the transactions are the transfer details of the digital currency unit
from one account to another.
Bitcoin also provides one level of identity protection for users by delinking the real names
of the account holders from the transactions, and by ensuring that only authorised persons are
able to create and verify transactions and check balances.

https://medium.com/ipdb-blog/forever-isnt-free-the-cost-of-storage-on-a-blockchain-database-59003f63e01

Forever Isn’t Free: The Cost of Storage on a Blockchain Database
Cloud storage services work as follows: You pay a monthly fee
up front for a fixed amount of storage space. During the paid
time, you can use any amount of storage space up to that limit.
When your paid time expires, you have two choices: pay for
another month or your files get deleted. Your cloud provider
only keeps your files for as long as you keep paying.

Blockchain databases can’t work on this model. A blockchain database must store data
indefinitely, so the recurring payment model doesn’t work. Data storage costs must be paid up
front, and must cover not just that month but all the months and years to come.

IPDB has developed a sustainable model for the long-term storage of data: a one-time, up-front
payment that covers the cost of indefinite data storage. The payment must be enough to cover
the cost of storage and the IPDB Foundation’s operating expenses.

This blog post is a deep dive into the numbers that led to a
single per-GB price point — the cost of storing data indefinitely
in a blockchain database.
This kind of analysis has been lacking in the hype around
blockchain technology. There are many problems that could be
addressed with blockchain technology, but without an
understanding of what a blockchain solution will cost, it is
impossible to say whether economic efficiencies can be
achieved. This post is a first step toward understanding which
use cases could truly benefit from the application of
blockchains.

Assumptions of the model:


Before we dive into the model, let’s outline some of our
underlying assumptions:

Conservative predictions: As a general rule, we have tried to keep estimates and assumptions
very conservative. We would rather have happy surprises than unhappy surprises if our numbers
turn out to be off.

Replication: We want to have a replication factor of six, meaning at least six copies of each
transaction on six distinct nodes. For extra comfort and security, there will be one additional
backup of the entire network.

Transaction Volume: We have estimated the rate of adoption for IPDB. In the first few years we
assume adoption to increase exponentially, like most technological adoption. This curve is
modelled by transactions/sec(t) = 1,000,000 / (1 + e^(16 − 1.2t)), where t is the number of years
away from 2017, i.e. t = 1 for 2018.

We chose to model transaction growth using an S-curve because the adoption of similar
technologies followed that pattern. In our model, the denominator gives the curve its S-shape. We
assume the IPDB will start with approximately 0.37 transactions/sec in 2018, so we include the 16
to shift the curve to start there. The model exhibits a conservative ramp-up, with the 1.2 providing
the compression of the curve’s growth. These numbers were chosen to model the rapid adoption
of successful technologies in the 21st century and the usage we are predicting.

The number of transactions the network can handle can’t grow to infinity, so we provide a limit
for the number of transactions per second. The cap is described by the numerator: 1 million
transactions every second, which is achieved by the model in about 15 years. We also consider
limits at 500,000 transactions/sec and 5 million transactions/sec.
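
The formula above is reconstructed from the description in the text (the 1,000,000 tx/sec cap in the numerator, the 16 shift so the curve starts near 0.37 tx/sec in 2018, and the 1.2 compression); a short sketch makes it easy to check those anchor points:

```python
import math

def transactions_per_sec(t: float, cap: float = 1_000_000) -> float:
    # S-curve as reconstructed from the text; the original formula is an image.
    return cap / (1 + math.exp(16 - 1.2 * t))

for year in (2018, 2025, 2032):
    t = year - 2017
    print(year, round(transactions_per_sec(t), 2))
# 2018 comes out near 0.37 tx/sec, and the cap is approached roughly 15 years out.
```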

Transaction Size:
The maximum BSON document size in MongoDB is 16 MB. Since a block is a document and can
contain up to 1,000 transactions, we assume the soft limit for a single transaction to be
16 MB/1000 = 16 kB. The size of a single transaction may be anything smaller than this, however,
so we consider 1.5 kB, 7 kB and 15 kB in our calculations. Given usage of the IPDB Test Network so
far, we expect transaction size to trend toward smaller sizes, likely in the 1.5 kB to 7 kB range.

Time Value of Money: Since we are planning to store data for as long as possible, the majority of
the cost of storing that data will be spread out over years. The initial payment will leave the IPDB
Foundation with a significant balance that will be invested conservatively. We assume a modest
3% return on that balance, compounded annually.

Inflation: For all our costs, we account for inflation, which has historically been around 2%,
compounded annually.

Forever: The IPDB plans to store data indefinitely, but we only run our calculations to 50 years.
We embrace long-term thinking, but even this timeframe is difficult to work with given the pace
of technological change.

The Model

Let’s start with the money coming into the IPDB Foundation.
Revenue
1.1 One Time Payment

Users will pay per gigabyte to write data to IPDB. In practice, this will be an up-front fee that
allows a certain amount of storage, but for simplicity we will use a flat fee per gigabyte of storage
used. This is calculated as the amount of data stored in GB, multiplied by the cost per GB in
dollars. There will be no ongoing cost for storing data. The initial fee is for indefinite storage.

1.2 Balance from the Previous Year

The Balance is key to the sustainability of the IPDB financial model. The amount not spent each
year can be invested and used in following years to cover the costs of indefinite storage. Each
year's balance is therefore the previous year's balance plus the investment return on it, plus that
year's revenue (X multiplied by the GB of new data stored), minus that year's costs, where X is the
per-GB cost in dollars.

Costs
2.1 Storage Costs

The cost of storing data has decreased exponentially as technology improves, which we model as
c(t) = A × e^(−kt), where:

- c(t) is the cost to store 1 GB of data in any given year t.
- A is the cost of storing 1 GB for one year at 2017 prices. Even though we are using Microsoft
Azure, we look to Amazon Elastic File System for pricing here. Amazon EFS is more expensive on
a per-GB basis than traditional cloud pricing, but offers an ease of scaling that would be desirable
if a similar product becomes available on Azure or when we roll out nodes on the Amazon
platform. With EFS, storing six copies of 1 GB of data for a whole year costs $21.60 a year; A =
$21.60.
- k controls the rate at which storage costs go down over time. The larger k is, the faster prices
drop. Historical data from the past 35 years sets k = 0.2502, but predictions for future storage
costs suggest this rate of change will decline in the future. We adopt the lower value and set
k = 0.173.

That shows us what a GB of storage will cost in any given year. Now we can calculate our total
costs for storage. Each year we have to pay for the new data received in that year and continue to
store all data from previous years, so each year's storage cost is c(t) multiplied by the total
amount of data stored up to and including that year.
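
A small sketch of that calculation, using the A and k values given above and purely illustrative data volumes:

```python
import math

A, k = 21.60, 0.173   # 2017 cost of six replicated copies of 1 GB, and the assumed decline rate

def c(t):
    """Cost to store 1 GB (replicated six times) in year t."""
    return A * math.exp(-k * t)

def storage_cost(new_gb_per_year):
    costs, cumulative = [], 0.0
    for t, new_gb in enumerate(new_gb_per_year, start=1):
        cumulative += new_gb              # all data from previous years is kept
        costs.append(c(t) * cumulative)   # pay this year's rate on everything stored
    return costs

# Illustrative volumes only (GB of new data in years 1, 2, 3).
print([round(x, 2) for x in storage_cost([100, 1_000, 10_000])])
```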

2.2. Intercluster Communication Costs

We need to factor in the cost of sending and receiving data. Intracluster communication costs are
the costs of transferring data from one node to another within the same cloud network, whereas
intercluster costs are for outbound data transfers. During our initial rollout, all nodes will be
hosted in the Microsoft Azure cloud for ease of deployment and support. Within Azure, all
inbound data transfers are free. Once we are running in 2019 we aim to have approximately 2/3 of
all nodes hosted outside the Azure network. As a reference we use Azure's tiered pricing for
outbound data transfers.
By 2025 we aim to have 50 nodes, with approximately 34 not on
Azure. We predict 366,184 GB of new data for 2025, all of which
must be sent to each of those 34 external nodes.

In 2025, the first 120,000 GB of data will cost $0.138 per GB, so we’ll pay $16,560. Similarly we pay
$64,800 for the next 480,000 GB of data, $156,000 for the next 1.2 million GB, $504,000 for the
next 4.2 million GB and $620,613 for the remaining data.

In total this works out to $1,361,973 in intercluster costs for 2025.
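
Re-running that 2025 arithmetic as a check (the per-tier rates for the first four tiers come from the text; the $620,613 is taken as quoted for whatever data remains):

```python
# 366,184 GB of new data is sent to the 34 nodes outside Azure.
new_data_gb = 366_184
external_nodes = 34
outbound_gb = new_data_gb * external_nodes           # about 12.45 million GB

tiers = [  # (GB in tier, quoted cost for that tier)
    (120_000, 16_560),       # $0.138 per GB
    (480_000, 64_800),       # $0.135 per GB
    (1_200_000, 156_000),    # $0.130 per GB
    (4_200_000, 504_000),    # $0.120 per GB
]
remaining_gb = outbound_gb - sum(gb for gb, _ in tiers)
total = sum(cost for _, cost in tiers) + 620_613      # remaining data, as quoted

print(outbound_gb, remaining_gb, total)               # ..., ..., 1361973
```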

To make it easier we will define:

- N(t) as the number of nodes at time t.
- Azure_outbound_data_cost(Predicted_GB(t)) as a function that takes the total outbound data
transfer (in GB) as an input and calculates the intercluster costs for a given year.
We must also consider that bandwidth costs have been decreasing rapidly since 1997. The
literature shows a decrease of 27% annually. It seems safe to assume this trend will continue, as
we see higher utilisation rates for existing networks and new fibre coming online. If costs
continue to decline 27% each year, by 2025 outbound transfer rates will be a small fraction of
today's prices. This decrease has a significant effect on intercluster costs over time: in general,
each year's intercluster cost is the Azure outbound cost for the data sent to nodes outside Azure,
scaled down by the cumulative 27% annual decline in bandwidth prices.

In reality, given the large volumes of data transfers predicted, IPDB will be in a position to
negotiate wholesale data transfer rates, so this model provides an upper limit on how high the
price for outbound data could be. Once we also factor in inflation, intercluster costs grow by a
further 2% a year.

2.3. Fixed Costs


So far we’ve only considered the cost of storing and transferring
data. What about operational costs like staff, facilities,
marketing and outreach, legal and accounting, and other
expenses necessary to support IPDB? Unlike physical storage
costs, logistical costs (staff, rent, etc.) do not decline but rather
increase over time. Staffing costs assume we will grow the team
to keep up with the volume of work, and offer wage increases to
at least match inflation. Other costs increase to match the needs
of the organisation and to account for inflation. We’ve also
assumed that some people will not like IPDB or the data stored
on it, so we’ve budgeted for legal fees.

At the outset, operational costs are the majority of IPDB’s expenses. These costs become a much
smaller percentage of IPDB’s overall expenses as usage increases. Over time, fixed costs per GB
decrease significantly. For example, even with the new hires we have budgeted for, the ratio of
data stored to staff member salary will increase by factors of over 100.

The ultimate goal is for IPDB to become self-sustaining. This will happen by 2023, according to
this model and our assumptions. Until then, we will work to minimize operational costs. Many
costs will be covered by BigchainDB. Further operational costs will be funded by grants and
donations in this period. The total costs each year are the sum of storage costs, intercluster costs
and F(t), where F(t) refers to fixed costs.

Cost Estimate
So what’s the final number?
Financial sustainability is the most important piece of the puzzle. If we set the price too low, even
though we can cover the cost of storing data for a long time, eventually the number of new
transactions each year will wane and yearly revenue will fall below yearly costs. We need to set
the price such that investment income on the balance, not new fees, can be used to cover
ongoing costs.

That final number is a one-time fee of $100 per GB. This allows us to store data indefinitely while
covering the cost of operating the IPDB Foundation.

Charging $100 per GB will see us becoming revenue-positive by 2023, with a total shortfall of
$3,248,796 that must be recovered through donations or grants. This per-GB price also allows us
to break even in the same time scale if our transaction rate is halved and capped at 500,000
transactions per second.

As it scales up, the marginal cost of storing each additional GB falls significantly, allowing IPDB to
focus resources on becoming fully decentralized, semi-autonomous internet infrastructure that
can store vast amounts of data.

The $100 price point is a maximum because of our conservative estimates. If costs drop faster
than expected, we could reduce that price over time. For example, decentralized file storage
provided by services like IPFS may prove cheaper than existing cloud options or even
self-managed storage, or technological breakthroughs could dramatically reduce costs. But for
now, $100 is a safe estimate that provides certainty to people hoping to build on IPDB.

At our $100 price point, if we assume a single transaction is of average size (7 kB), how much will
it cost an IPDB user to validate and store it for 50 years in IPDB? A 7 kB transaction is 7/1,000,000
of a GB, so the one-time fee is 7/1,000,000 × $100 = $0.0007, or 7/100 of a cent per transaction.
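
The same arithmetic as a one-line check:

```python
# A 7 kB transaction is a tiny fraction of a GB, priced once at $100/GB.
price_per_gb = 100.0
tx_kb = 7
print(tx_kb / 1_000_000 * price_per_gb)   # 0.0007 dollars, i.e. 7/100 of a cent
```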

Comparison
As a comparison, how much would it cost to store the same amount of data for an indefinite
period on the Bitcoin or Ethereum blockchains?

Bitcoin
Even though it is possible to store data on the Bitcoin
blockchain, the Bitcoin protocol was not designed with data
storage in mind. However, as blockchain use cases expanded
beyond finance, many companies started using the Bitcoin
blockchain as a database all the same.

To store data on the Bitcoin blockchain we would enter the data in the OP_RETURN field of
Bitcoin transactions. The OP_RETURN field allows a user to send a transaction that doesn’t
actually send money to anyone, but allows a small amount of data to be written to the Bitcoin
blockchain. Each OP_RETURN output has a maximum size of 80 bytes, and each transaction can
have one OP_RETURN output.

To store the same 7 kB transaction we have been working with would require 88 OP_RETURN
messages. As long as each one is a valid transaction, with a dust fee of 546 satoshis or more, each
message will be propagated through the network and mined into a block.

At the current BTC/USD exchange rate ($2,518) the dust fee is about $0.014 per message. As of
July 2017, the median Bitcoin transaction fee is about $1.82, so the cost to store 7 kB across 88
messages would be roughly $160.

1 GB would need 12,500,000 OP_RETURN messages and so would cost approximately
$22,766,250. This figure is highly dependent on transaction fees, which have increased
dramatically over the past year as Bitcoin has not found a scaling solution. In any event, this is a
theoretical exercise and not a proposal to use the Bitcoin blockchain for large-scale data storage.
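
The Bitcoin arithmetic above as a quick check; note that the per-GB figure quoted in the post implies a slightly higher effective fee per message than the rounded $1.82:

```python
import math

# 80 bytes per OP_RETURN output, one output per transaction, at the July 2017
# median fee of about $1.82. (The dust fee of 546 satoshis at $2,518/BTC,
# roughly $0.014, is negligible by comparison.)
fee_usd = 1.82
op_return_bytes = 80

msgs_7kb = math.ceil(7_000 / op_return_bytes)       # 88 messages
print(msgs_7kb, round(msgs_7kb * fee_usd, 2))       # roughly $160 for 7 kB

msgs_1gb = 1_000_000_000 // op_return_bytes         # 12,500,000 messages
print(msgs_1gb, round(msgs_1gb * fee_usd))          # about $22.75M for 1 GB
```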

Ethereum
Transactions in Ethereum work completely differently than in
Bitcoin, requiring “gas” to have data processed.

Executing any regular transaction with no embedded data on the Ethereum blockchain uses
21,000 gas. This is the minimum gas limit required.

If you want to include data in your transaction you can do so in one of two ways: by creating a
contract, or by sending a message call. Sending a message call allows the user to interact with
other accounts or smart contracts without having to create their own contract. It requires the
least gas of the two methods, so we’ll send 7 kB via a message call.

The gas cost is not just based on how big your data is but also how complex it is. The most basic
data we could send in a 7 kB message call would be comprised of only zeroed bytes; if the data
included text, the message would have non-zero bytes. According to the Ethereum Yellow Paper,
each zeroed 32-byte word costs 4 gas and every non-zero 32-byte word of data requires 68 gas to
send, so assuming all the bytes are zero provides a minimum gas cost.

7000/32 = 219 32-byte words, so we would need an additional 219 × 4 = 876 gas.

That doesn’t seem like much if that was all you wanted to do, but storing the sent data is an
additional operation. Every 32-byte word costs 20,000 gas to store. This is one of the largest gas
requirements for any EVM opcode, reflecting that this is not a simple operation but one that is
being replicated and stored across thousands of nodes. To store all 7 kB would be
219 × 20,000 = 4,380,000 gas.

Storing and sending 7 kB of data therefore requires 21,000 + 876 + 4,380,000 = 4,401,876 units of
gas. At the current median gas price (28 Gwei) and ETH/USD exchange rate ($267), this
transaction will cost you about $32.91.
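
The gas arithmetic above, following the post's assumptions (a 21,000-gas base transaction, 4 gas per zeroed 32-byte word to send, 20,000 gas per 32-byte word to store):

```python
base_gas = 21_000
words = 7_000 // 32 + (1 if 7_000 % 32 else 0)   # 219 32-byte words
send_gas = words * 4                             # 876 gas
store_gas = words * 20_000                       # 4,380,000 gas
total_gas = base_gas + send_gas + store_gas
print(total_gas)                                 # 4,401,876 gas

gas_price_eth = 28e-9                            # 28 Gwei
eth_usd = 267
print(round(total_gas * gas_price_eth * eth_usd, 2))   # about $32.91
```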

In general, the cost to store data on Ethereum works out to approximately 17,500 ETH/GB, or
around $4,672,500 at today’s prices.

As noted above, we are not suggesting that large quantities of data — images, videos, audio, other
datasets — should be written to Bitcoin or Ethereum. We understand this is not the point.

However, it is important to understand which use cases are economical and which are not. Many
of the applications that have been proposed for blockchains — energy markets, music streaming
services, IoT, and so on — will require storage of vast quantities of transactional information. This
is exactly the kind of data that should be stored within the blockchain database.
In future posts we will explore the economic implications of
this. What use cases can IPDB unlock that would be
uneconomical on other blockchain databases?

Co-authored with Simon Schwerin and Greg McMullen.

Visuals by Wojciech Hupert (unless noted).

Thanks to Trent McConaghy, Bruce Pon, Troy McConaghy, Tim Daubenschütz, and Simon de la
Rouviere for comments and support.
