Summary
Understanding RLP
- RLP (Recursive Length Prefix) is an encoding method essential to Ethereum’s data management.
- It efficiently handles arbitrarily nested arrays of binary data.
- RLP ensures compact and effective data storage and transmission within the Ethereum network.
Prefix Details
- Each item starts with a prefix byte indicating its type and length.
- This prefix guides how the data should be interpreted and decoded.
RLP In Practice
- RLP encodes transactions, blocks, and state data, making it integral to Ethereum’s operation.
- Its role in the Ethereum Virtual Machine (EVM) underscores its importance in maintaining Ethereum’s performance and scalability.
RLP, or Recursive Length Prefix, might not be the most glamorous topic, but it’s the unsung hero behind Ethereum’s data efficiency. While JSON or XML are common in web development, RLP’s minimalist and efficient design is what makes Ethereum’s complex, nested data structures both compact and fast.
In Ethereum, RLP plays a crucial role by encoding transactions, blocks, and state data, contributing to the network’s efficiency and scalability. Its role is fundamental within the Ethereum Virtual Machine (EVM), ensuring that data is stored and transmitted in an optimal format.
The Basics
RLP stands for Recursive Length Prefix. It is a binary serialization method used in Ethereum to encode arbitrarily nested arrays of binary data for efficient storage and transmission.
Its main uses are encoding transactions, blocks, and state data in a form that is compact and easy to decode. It is a key component of the Ethereum protocol and is used extensively in the Ethereum Virtual Machine (EVM).
The Gory Details
What can be encoded with RLP?
RLP can encode an “item”, which is defined as one of:
- an unsigned integer (encoded as its big-endian binary representation, with no leading zero bytes)
- a string (encoded as a sequence of bytes)
- a byte string (a sequence of bytes)
- or a list of items
More formally, an item can be described in TypeScript as:
type RLPItem = number | string | Uint8Array | RLPItem[];
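Since RLP itself only knows about bytes and lists, an integer has to be turned into a byte string before it can be encoded. Here’s a minimal sketch of that step, assuming the usual Ethereum convention of big-endian bytes with no leading zeros (so zero becomes the empty byte string). The name `toMinimalBytes` is my own, and the sketch only handles safe JavaScript integers; real balances would need `bigint`.

```typescript
const MAX_SAFE = 9007199254740991; // 2^53 - 1, the largest safe JS integer

// Convert a non-negative integer to big-endian bytes with no leading zeros.
// Zero maps to the empty byte string.
function toMinimalBytes(value: number): Uint8Array {
  if (value < 0 || Math.floor(value) !== value || value > MAX_SAFE) {
    throw new RangeError("expected a non-negative safe integer");
  }
  const bytes: number[] = [];
  let rest = value;
  while (rest > 0) {
    bytes.unshift(rest % 256); // peel off the lowest byte
    rest = Math.floor(rest / 256);
  }
  return new Uint8Array(bytes);
}

console.log(toMinimalBytes(42));   // one byte: 0x2a
console.log(toMinimalBytes(256));  // two bytes: 0x01 0x00
console.log(toMinimalBytes(1024)); // two bytes: 0x04 0x00
console.log(toMinimalBytes(0));    // no bytes at all
```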
What’s the “prefix” in RLP?
The “P” in RLP stands for “prefix”. Each item in RLP encoding starts with a single byte that indicates both the type of the item and its length, providing a way to determine how to decode the following data.
There are 5 types of values represented by the prefix byte:
- Single byte value: If the item is a single byte between 0x00 and 0x7f (0-127), the prefix is simply the byte itself. For example, the number 42 (0x2a) is encoded as 0x2a.
- Short string: If the item is a string between 0 and 55 bytes long (and is not a single byte covered by the previous case), the prefix is 0x80 plus the length of the string. For example, the string “hello” is encoded as 0x8568656c6c6f.
- Long string: If the item is a string longer than 55 bytes, the prefix is 0xb7 plus the number of bytes needed to store the string’s length, followed by the length and then the string itself.
- Short list: If the item is a list whose encoded items total less than 56 bytes, the prefix is 0xc0 plus that total length. For example, the empty list [] is encoded as 0xc0.
- Long list: If the item is a list whose encoded items total more than 55 bytes, the prefix is 0xf7 plus the number of bytes needed to store that total length, followed by the length and then the list items.
The data that follows the prefix byte is the actual data being encoded.
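To make the five ranges concrete, here’s a small sketch that looks at a prefix byte and reports which case it falls into. The function name `classifyPrefix` and the shape of the returned object are my own choices for illustration, not part of any library.

```typescript
type PrefixKind =
  | "single byte"
  | "short string"
  | "long string"
  | "short list"
  | "long list";

interface PrefixInfo {
  kind: PrefixKind;
  payloadLength?: number;  // short forms: length is known from the prefix alone
  lengthOfLength?: number; // long forms: how many following bytes hold the length
}

function classifyPrefix(prefix: number): PrefixInfo {
  if (Math.floor(prefix) !== prefix || prefix < 0 || prefix > 0xff) {
    throw new RangeError("prefix must be a single byte (0x00-0xff)");
  }
  if (prefix <= 0x7f) return { kind: "single byte" };
  if (prefix <= 0xb7) return { kind: "short string", payloadLength: prefix - 0x80 };
  if (prefix <= 0xbf) return { kind: "long string", lengthOfLength: prefix - 0xb7 };
  if (prefix <= 0xf7) return { kind: "short list", payloadLength: prefix - 0xc0 };
  return { kind: "long list", lengthOfLength: prefix - 0xf7 };
}

console.log(classifyPrefix(0x2a)); // single byte
console.log(classifyPrefix(0x85)); // short string, payloadLength 5
console.log(classifyPrefix(0xb9)); // long string, lengthOfLength 2
console.log(classifyPrefix(0xc0)); // short list, payloadLength 0
console.log(classifyPrefix(0xf8)); // long list, lengthOfLength 1
```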
These are a little hard to fully digest, so let’s look at an example for each.
Examples
Values less than or equal to 0x7f (127) are encoded as themselves:
- 0x00 is encoded as 0x00
- 0x69 is encoded as 0x69
- 0x7f is encoded as 0x7f
If the value is between 0 and 55 bytes long (and does not match the above), it is encoded as 0x80 + the length of the byte array, followed by the byte array:
- 0x80 is encoded as 0x8180
  - 0x81 (0x80 + 1 byte) - the prefix
  - 0x80 - the byte array
- 0x0100 (256) is encoded as 0x820100
  - 0x82 (0x80 + 2 bytes) - the prefix
  - 0x01 0x00 - the byte array
- 0x123456 is encoded as 0x83123456
  - 0x83 (0x80 + 3 bytes) - the prefix
  - 0x12 0x34 0x56 - the byte array
Otherwise, if the value is longer than 55 bytes, it is encoded as 0xb7 + the length of the length, followed by the length and then the byte array:
- a value that is 56 bytes long is encoded as 0xb838 + the 56 bytes
  - 0xb8 (0xb7 + 1 byte) - the prefix
  - 0x38 - the length of the byte array
- a value that is 1024 bytes long is encoded as 0xb90400 + the 1024 bytes
  - 0xb9 (0xb7 + 2 bytes) - the prefix
  - 0x04 0x00 - the length of the byte array
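The three byte-string rules fit in a few lines of code. This is a sketch rather than a reference implementation; `encodeBytes`, `lengthBytes`, and `toHex` are names I made up for illustration.

```typescript
// Big-endian bytes of a positive length, with no leading zeros.
function lengthBytes(length: number): number[] {
  const out: number[] = [];
  for (let rest = length; rest > 0; rest = Math.floor(rest / 256)) {
    out.unshift(rest % 256);
  }
  return out;
}

// Encode a byte string using the single-byte, short-string, or long-string rule.
function encodeBytes(bytes: Uint8Array): Uint8Array {
  if (bytes.length === 1 && bytes[0] <= 0x7f) {
    return bytes; // a single low byte is its own encoding
  }
  let header: number[];
  if (bytes.length <= 55) {
    header = [0x80 + bytes.length];
  } else {
    const len = lengthBytes(bytes.length);
    header = [0xb7 + len.length].concat(len);
  }
  const out = new Uint8Array(header.length + bytes.length);
  out.set(header, 0);
  out.set(bytes, header.length);
  return out;
}

// Render bytes as a 0x-prefixed hex string for display.
function toHex(bytes: Uint8Array): string {
  let hex = "0x";
  for (let i = 0; i < bytes.length; i++) {
    hex += (bytes[i] < 16 ? "0" : "") + bytes[i].toString(16);
  }
  return hex;
}

console.log(toHex(encodeBytes(new Uint8Array([0x69]))));       // 0x69
console.log(toHex(encodeBytes(new Uint8Array([0x80]))));       // 0x8180
console.log(toHex(encodeBytes(new Uint8Array([0x01, 0x00])))); // 0x820100
console.log(toHex(encodeBytes(new Uint8Array(56)).subarray(0, 2)));   // 0xb838
console.log(toHex(encodeBytes(new Uint8Array(1024)).subarray(0, 3))); // 0xb90400
```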
However, if we’re encoding a list of items, we have a different prefix byte. If the total encoded length of the list’s items is less than 56 bytes, it is encoded as 0xc0 + that total length, followed by the items:
- [] is encoded as 0xc0
- [0x00] is encoded as 0xc100
  - 0xc1 (0xc0 + 1) - the prefix
  - 0x00 - the item
- [[0x00, 0x01], [0x02]] is encoded as 0xc5c20001c102
  - 0xc5 (0xc0 + 5 bytes) - the prefix
  - 0xc2 0x00 0x01 - the first item (a two-item list containing 0x00 and 0x01)
  - 0xc1 0x02 - the second item (a one-item list containing 0x02)
And finally, if the total encoded length of the list’s items is greater than 55 bytes, it is encoded as 0xf7 + the length of the length, followed by the length and then the items:
- A list whose encoded items total 56 bytes (for example, 56 single-byte items) is encoded as 0xf838 + the encoded items
  - 0xf8 (0xf7 + 1 byte) - the prefix
  - 0x38 - the total length of the encoded items
- A list whose encoded items total 1024 bytes is encoded as 0xf90400 + the encoded items
  - 0xf9 (0xf7 + 2 bytes) - the prefix
  - 0x04 0x00 - the total length of the encoded items
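Putting the string and list rules together gives the “recursive” part of RLP: a list is encoded by encoding each of its items and then prefixing the concatenated result. Here’s a rough sketch that works on byte strings and nested lists of them. All the names (`RLPInput`, `encode`, `withPrefix`, and the helpers) are hypothetical.

```typescript
type RLPInput = Uint8Array | RLPInput[];

// Big-endian bytes of a positive length, with no leading zeros.
function lengthBytes(length: number): number[] {
  const out: number[] = [];
  for (let rest = length; rest > 0; rest = Math.floor(rest / 256)) {
    out.unshift(rest % 256);
  }
  return out;
}

function concat(parts: Uint8Array[]): Uint8Array {
  let total = 0;
  for (const part of parts) total += part.length;
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}

// Prepend the short-form or long-form prefix for a payload.
// shortBase/longBase are 0x80/0xb7 for strings and 0xc0/0xf7 for lists.
function withPrefix(payload: Uint8Array, shortBase: number, longBase: number): Uint8Array {
  let header: number[];
  if (payload.length <= 55) {
    header = [shortBase + payload.length];
  } else {
    const len = lengthBytes(payload.length);
    header = [longBase + len.length].concat(len);
  }
  return concat([new Uint8Array(header), payload]);
}

function encode(item: RLPInput): Uint8Array {
  if (item instanceof Uint8Array) {
    if (item.length === 1 && item[0] <= 0x7f) return item;
    return withPrefix(item, 0x80, 0xb7);
  }
  // A list: encode every item, concatenate, then add the list prefix.
  return withPrefix(concat(item.map((child) => encode(child))), 0xc0, 0xf7);
}

function toHex(bytes: Uint8Array): string {
  let hex = "0x";
  for (let i = 0; i < bytes.length; i++) {
    hex += (bytes[i] < 16 ? "0" : "") + bytes[i].toString(16);
  }
  return hex;
}

const b = (...values: number[]) => new Uint8Array(values);

console.log(toHex(encode([])));                         // 0xc0
console.log(toHex(encode([b(0x00)])));                  // 0xc100
console.log(toHex(encode([[b(0x00), b(0x01)], [b(0x02)]]))); // 0xc5c20001c102
```

Note that the list prefix counts the bytes of the already-encoded items, not the number of items.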
Decoding RLP
I’ve created a handy tool that allows you to paste in an RLP-encoded string and see the decoded result. If you’re not quite sure about some of the examples above, or just want to play around, here’s your chance!
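If you’d rather see the decoding logic in code, here’s a minimal sketch of a decoder that walks the prefix rules in reverse. It is deliberately lenient: it does not reject non-canonical encodings (such as a single low byte wrapped in a 0x81 prefix), which a production decoder should. The names `Decoded`, `decodeAt`, and `decode` are my own.

```typescript
type Decoded = Uint8Array | Decoded[];

// Read `count` bytes starting at `offset` as a big-endian length.
function readLength(data: Uint8Array, offset: number, count: number): number {
  if (offset + count > data.length) throw new Error("length runs past end of input");
  let length = 0;
  for (let i = 0; i < count; i++) length = length * 256 + data[offset + i];
  return length;
}

// Decode one item starting at `offset`; return it and the offset just past it.
function decodeAt(data: Uint8Array, offset: number): { item: Decoded; next: number } {
  if (offset >= data.length) throw new Error("unexpected end of input");
  const prefix = data[offset];
  if (prefix <= 0x7f) {
    return { item: data.subarray(offset, offset + 1), next: offset + 1 };
  }
  const isList = prefix >= 0xc0;
  let start = offset + 1;
  let length = prefix - (isList ? 0xc0 : 0x80);
  if (length > 55) {
    // Long form: the prefix only says how many bytes hold the real length.
    const lengthOfLength = length - 55;
    length = readLength(data, start, lengthOfLength);
    start += lengthOfLength;
  }
  const end = start + length;
  if (end > data.length) throw new Error("item runs past end of input");
  if (!isList) {
    return { item: data.subarray(start, end), next: end };
  }
  const items: Decoded[] = [];
  let cursor = start;
  while (cursor < end) {
    // Restrict children to the list payload so they cannot overrun it.
    const child = decodeAt(data.subarray(0, end), cursor);
    items.push(child.item);
    cursor = child.next;
  }
  return { item: items, next: end };
}

function decode(data: Uint8Array): Decoded {
  const result = decodeAt(data, 0);
  if (result.next !== data.length) throw new Error("trailing bytes after item");
  return result.item;
}

// 0xc5c20001c102 decodes to a two-item list of nested lists.
console.log(decode(new Uint8Array([0xc5, 0xc2, 0x00, 0x01, 0xc1, 0x02])));
```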
RLP In Practice
RLP shows up in many places in Ethereum. Here are a few examples:
- Encoding transactions - transactions are encoded using RLP before being included in a block
- Encoding accounts - in the Ethereum state trie, accounts are stored as the RLP encoding of the array [nonce, balance, storageRoot, codeHash]
  - nonce is the number of transactions sent from the account
  - balance is the amount of ether in the account (represented as wei)
  - storageRoot is the root of the storage trie for the account (article coming soon!)
  - codeHash is the Keccak-256 hash of the code for the account
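As a rough illustration of the account case, the sketch below encodes the four-field array by hand. Several things here are assumptions for the sake of a short example: the `Account` interface and function names are mine, the two hashes are placeholder bytes rather than real trie roots or code hashes, and `nonce` and `balance` are plain numbers (a real implementation would need `bigint` for balances in wei).

```typescript
interface Account {
  nonce: number;           // transactions sent from the account
  balance: number;         // wei; real code would use bigint
  storageRoot: Uint8Array; // 32-byte storage trie root
  codeHash: Uint8Array;    // 32-byte Keccak-256 hash of the code
}

// Big-endian bytes with no leading zeros; zero becomes the empty byte string.
function toMinimalBytes(value: number): number[] {
  const out: number[] = [];
  for (let rest = value; rest > 0; rest = Math.floor(rest / 256)) {
    out.unshift(rest % 256);
  }
  return out;
}

// Encode a byte string of at most 55 bytes, which covers every account field.
function encodeField(bytes: ArrayLike<number>): number[] {
  if (bytes.length > 55) throw new RangeError("field too long for this sketch");
  const out: number[] = [];
  const isSingleLowByte = bytes.length === 1 && bytes[0] <= 0x7f;
  if (!isSingleLowByte) out.push(0x80 + bytes.length);
  for (let i = 0; i < bytes.length; i++) out.push(bytes[i]);
  return out;
}

function encodeAccount(account: Account): Uint8Array {
  const payload = ([] as number[]).concat(
    encodeField(toMinimalBytes(account.nonce)),
    encodeField(toMinimalBytes(account.balance)),
    encodeField(account.storageRoot),
    encodeField(account.codeHash)
  );
  // With these field sizes the payload stays under 256 bytes,
  // so the long-list form needs exactly one length byte (prefix 0xf8).
  const header =
    payload.length <= 55 ? [0xc0 + payload.length] : [0xf8, payload.length];
  return new Uint8Array(header.concat(payload));
}

// Placeholder 32-byte value standing in for a real hash.
function placeholderHash(fill: number): Uint8Array {
  const out = new Uint8Array(32);
  for (let i = 0; i < out.length; i++) out[i] = fill;
  return out;
}

const encoded = encodeAccount({
  nonce: 0,
  balance: 0,
  storageRoot: placeholderHash(0xaa),
  codeHash: placeholderHash(0xbb),
});
console.log(encoded.length); // 70
```

The payload is 1 + 1 + 33 + 33 = 68 bytes (0x44), which is more than 55, so the result starts with 0xf8 0x44, followed by 0x80 0x80 for the zero nonce and balance.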
It is also a building block used in many other concepts, including Patricia Merkle Tries (which will be introduced in the next article - stay tuned!).
Conclusion
RLP serialization is the left-pad of Ethereum: essential yet often underappreciated. It provides a robust mechanism for encoding complex data structures efficiently, and it plays a central role in maintaining Ethereum’s performance and scalability.
By understanding RLP, you’re preparing to dive into many topics in Ethereum, such as the one we’ll cover next: Patricia Merkle Tries. Stay tuned for more!