Blockchain Whitepaper Generated by Neural Networks
Another day, another ICO.
And with almost every ICO comes a white paper telling the story and implementation details of its token. Using the torch-rnn library, I trained an LSTM recurrent neural network on 169 whitepapers (5M characters) so that it could learn to write in English and generate some crypto-jargon from scratch.
Here are some snippets from the output, with minor formatting to structure the sentences. It sounds like gibberish, but hopefully it’s somewhat entertaining. If you invest in an ICO with a whitepaper like this one, then I wish you the best of luck.
If you’re curious about the specifics of the process, you can find some discussion at the end of this post.
The special attack-defined applications and create state can program the public key connection and amounts.
The market is a incentive of the event into the contracts cryptocurrency.
In the first blockchain and important stage potential computers can be centralized
A reward attempts to create the class system.
The governance are presented content to ether, the ETH protocol network.
Hash Verifiers could generate in Bob from a hash public key algorithm.
For a cryptocurrency will be security to account directly
The account to the protocol government is a community.
But the oracles will be difficult to include small and basic services changes and the system.
Native trade taker devices of market nodes on the blockchain users.
This is incentive.
All consensus of the threshold cryptocurrency consumers will be distinct
It starting a platform during the system is all data to every pre-most network of a form of any products.
Thus only in intelligence of server in protocol terms to a number of account may be used by the network.
They insurance through the commitment and development of the protocol.
A agreement and computing its contracts, the responsible data.
Could award the transactions are used to host blockchains.
The creation of the low market balances.
Possible would the protocol level which will regulate the development is a new proof-of-work solution.
stored security for Bob can be deflective a transaction to the technology proceeds or blockchain.
The advantage with the user which have cryptocurrency the given, there is faction.
The system may property for trading people execution, stakeholders.
We will provide features of the forms for the custodians using a
successful are computational transactions in private k, the transaction from the proposal further in the code responsible to money that entities, CPU tokens.
Liquidity Providing transactions and proofs of contents at the involves a supply by data game quality.
Financial blockchain major hash, which are the factors for the middlemen. For the state.
Because exit cases to go the needs to contain any issuer on the system for the network will be difficult. The basic liquidity for the users.
In Such machines such it can achieve authority and funds in continuous nodes.
The correct holder in addition of the value presentations between a transaction and blocks.
The transaction escrowing stake of the community, and confirmation to arbitrary incentive.
A Chain pool enables competitives. First virtual actions to the service is the test construction.
There agency that successfully connected to contract issuers and all node currencies.
The systems, and goods or services in the Node may be developing them. We have all positions.
Effective transaction in the resources in the specific decentralized replicated billion of a problems are generated, and discussed by fees.
The data collection and training process
Inspired by Andrej Karpathy’s famous blog post “The Unreasonable Effectiveness of Recurrent Neural Networks,” as well as “deep writing” works like Harry Potter generated by AI, I was curious to try my hand at a similar feat with blockchain whitepapers. The torch-rnn library mentioned in Karpathy’s blog makes it easy to preprocess, train, and generate texts like this.
The corpus includes whitepapers of the most popular tokens on coinmarketcap.com, many of which were collected from whitepaper repositories like bravenewcoin.com and whitepaperdatabase.com. Most whitepapers are published as PDFs, though, so only the extracted text (excluding math notation) was used to train the network.
With text from 169 whitepapers, the corpus had about 5 million characters (5 MB). Evidently, more data could have been useful. The hardest part about this process was manually collecting texts from PDFs and cleaning the corpus. Here are the stats after running the pre-processing script:
Total vocabulary size: 189
Total tokens in file: 5068203
Training size: 4054563
Val size: 506820
Test size: 506820
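The split sizes above follow directly from the corpus size. A minimal sketch of the arithmetic, assuming the pre-processing script reserves 10% of characters each for validation and test (torch-rnn's default fractions, to my knowledge) and gives the remainder to training:

```python
def split_sizes(total_tokens, val_frac=0.1, test_frac=0.1):
    """Illustrative reconstruction of the train/val/test split, not the
    library's actual code. Val and test each take a fixed fraction of
    the corpus (truncated to whole tokens); training gets the rest."""
    val_size = int(val_frac * total_tokens)
    test_size = int(test_frac * total_tokens)
    train_size = total_tokens - val_size - test_size
    return train_size, val_size, test_size

print(split_sizes(5068203))  # (4054563, 506820, 506820)
```

Plugging in the 5,068,203 tokens reported above reproduces the 4,054,563 / 506,820 / 506,820 split exactly.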
The network was trained with 3 layers, a batch size of 10, a sequence length of 50, and dropout of 0.2. The output was sampled with a temperature of 0.8.
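Temperature controls how adventurous the sampling is: the model's next-character scores are divided by the temperature before the softmax, so values below 1 (like the 0.8 used here) concentrate probability on likely characters, while higher values flatten the distribution toward randomness. A small sketch of the idea (the logit values are made up for illustration):

```python
import math

def sample_probs(logits, temperature=1.0):
    """Softmax over logits scaled by 1/temperature. Lower temperature
    sharpens the distribution (safer, more repetitive text); higher
    temperature flattens it (more surprising, more garbled text)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-character logits, for illustration only.
logits = [2.0, 1.0, 0.1]
p_sharp = sample_probs(logits, temperature=0.8)  # the setting used here
p_flat = sample_probs(logits, temperature=2.0)
# The top-scoring character gets more probability mass at 0.8 than at 2.0.
```

At 0.8 the output stays mostly grammatical-looking; pushing the temperature higher yields even stranger text than the snippets above.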
Big thanks to those who helped review drafts of this post: Samanee Mahbub, Andy Ly, Andres Galaviz, Victor Wu