I wrote a simple demo to figure out the data layout of FlatBuffers, because the official documentation did not make enough sense to me.

Imagine we have some boxes to store things. Each box has a ‘name’ that identifies its owner, a ‘weight’, and a ‘goods’ field that tracks everything in it. Each good has a ‘category’ associated with it. We can write the following FlatBuffers schema to represent boxes and goods.

FlatBuffers schema
namespace glove.flatbuffer.example;

enum Category:byte { Clothes = 0, Instruments = 1, Foods }

struct Good {
  category:Category;
}

table Box {
  name:string;
  weight:int;
  goods:[Good];
}

root_type Box;

Now a person named ‘wzy’ puts his ‘honey’ and ‘clothes’ into a box, using the Python API.

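import flatbuffers
# Modules generated by flatc from the schema above.
from glove.flatbuffer.example import Category, Good, Box

# 9527 is only the initial buffer size in bytes; the builder grows as needed.
builder = flatbuffers.Builder(9527)

# Structs are written inline into the vector. The builder fills the buffer
# back to front, so the good created last (Clothes) ends up first.
Box.BoxStartGoodsVector(builder, 2)
honey = Good.CreateGood(builder, Category.Category.Foods)
shirt = Good.CreateGood(builder, Category.Category.Clothes)
goods = builder.EndVector(2)  # newer flatbuffers releases take no argument here

# Strings and vectors must be created before the table that refers to them.
name = builder.CreateString("wzy")

Box.BoxStart(builder)
Box.BoxAddName(builder, name)
Box.BoxAddWeight(builder, 80)
Box.BoxAddGoods(builder, goods)
box = Box.BoxEnd(builder)

builder.Finish(box)
buf = builder.Output()

# Write the finished buffer to disk; the file is closed automatically.
with open("example.bin", "wb") as fout:
    fout.write(buf)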

The code above records wzy’s actions in a binary file, example.bin. The FlatBuffers compiler, flatc, can easily convert this file to JSON.

Shell command to convert the binary data to JSON

flatc --raw-binary -t glove.fbs -- example.bin

Here is the converted JSON.

{
  name: "wzy",
  weight: 80,
  goods: [
    {
      category: "Clothes"
    },
    {
      category: "Foods"
    }
  ]
}

Note that the keys are not quoted in the converted output, so it is not standard JSON. Passing --strict-json to flatc produces quoted keys.

Let’s analyze the binary data layout.

Hexdump of data in binary format

00000000  10 00 00 00 00 00 0a 00  10 00 0c 00 08 00 04 00
00000010  0a 00 00 00 14 00 00 00  50 00 00 00 04 00 00 00
00000020  03 00 00 00 77 7a 79 00  02 00 00 00 00 02 00 00

The buffer begins with a dword (32-bit, little-endian) offset to the root table, which is therefore located at 0x10.

The first dword of the root table is a signed offset to its vtable. To get the address of the vtable, we subtract this offset from the address of the root table and get 0x06 (0x10 - 0x0a = 0x06).
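The two lookups so far can be reproduced with Python's standard struct module. This is only a sketch: the name DATA is my own, and its bytes are copied by hand from the hexdump above rather than read from example.bin.

```python
import struct

# The 48 bytes of example.bin, copied from the hexdump.
DATA = bytes.fromhex(
    "10 00 00 00 00 00 0a 00 10 00 0c 00 08 00 04 00 "
    "0a 00 00 00 14 00 00 00 50 00 00 00 04 00 00 00 "
    "03 00 00 00 77 7a 79 00 02 00 00 00 00 02 00 00 "
)

# The buffer starts with an unsigned 32-bit little-endian offset to the root table.
(root,) = struct.unpack_from("<I", DATA, 0)

# The root table starts with a signed 32-bit offset that is SUBTRACTED
# from the table address to reach the vtable.
(soffset,) = struct.unpack_from("<i", DATA, root)
vtable = root - soffset

print(hex(root), hex(vtable))  # 0x10 0x6
```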

The vtable at 0x06 starts with two words (16 bits each): the size of the vtable itself, 0x0a, and the size of the data stored inline in the box table, 0x10. The following three words are the offsets of the ‘name’, ‘weight’ and ‘goods’ fields, measured from the start of the box table.

To get the ‘name’ field, we add its offset to the address of the root table and get 0x1c (0x10 + 0x0c = 0x1c). Because strings and vectors are stored outside the table, the dword at 0x1c is not the string itself but an offset, 0x04, to it. Moving 0x04 bytes forward from 0x1c leads us to 0x20, where the string lives. Its first dword is the length, 0x03, and the next four bytes are the three characters plus a ‘\0’ terminator.
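As a rough check of the vtable and string lookup, the same steps can be written with the struct module. DATA, ROOT, VTABLE and read_string are names I made up for this sketch; the bytes are copied from the hexdump, and the two addresses are the ones derived earlier.

```python
import struct

# The 48 bytes of example.bin, copied from the hexdump.
DATA = bytes.fromhex(
    "10 00 00 00 00 00 0a 00 10 00 0c 00 08 00 04 00 "
    "0a 00 00 00 14 00 00 00 50 00 00 00 04 00 00 00 "
    "03 00 00 00 77 7a 79 00 02 00 00 00 00 02 00 00 "
)
ROOT = 0x10    # address of the root table (first dword of the buffer)
VTABLE = 0x06  # ROOT minus the signed offset stored at ROOT

# First two 16-bit words: size of the vtable, size of the table's inline data.
vtable_size, table_size = struct.unpack_from("<HH", DATA, VTABLE)

# The rest of the vtable holds one 16-bit offset per field, in schema order.
n_fields = (vtable_size - 4) // 2
name_off, weight_off, goods_off = struct.unpack_from(f"<{n_fields}H", DATA, VTABLE + 4)


def read_string(buf, field_addr):
    """Follow the 32-bit offset stored at field_addr and decode the string there."""
    (rel,) = struct.unpack_from("<I", buf, field_addr)
    start = field_addr + rel                      # address of the length prefix
    (length,) = struct.unpack_from("<I", buf, start)
    return buf[start + 4 : start + 4 + length].decode("utf-8")


name = read_string(DATA, ROOT + name_off)
print(vtable_size, table_size, name)  # 10 16 wzy
```

The ‘\0’ terminator is not counted in the length prefix, which is why the slice stops after three bytes.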

To get the ‘weight’ field, we add 0x08 to 0x10. Because weight is a scalar, it is stored inline in the root table: the dword at 0x18 is 0x50, which is 80 in decimal.
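A small sketch of the scalar lookup, again with the struct module. The helper read_int32_field is hypothetical and not part of the FlatBuffers library; it also shows the usual convention that a vtable entry of 0 means the field was not written, so the reader falls back to the schema default.

```python
import struct

# The 48 bytes of example.bin, copied from the hexdump.
DATA = bytes.fromhex(
    "10 00 00 00 00 00 0a 00 10 00 0c 00 08 00 04 00 "
    "0a 00 00 00 14 00 00 00 50 00 00 00 04 00 00 00 "
    "03 00 00 00 77 7a 79 00 02 00 00 00 00 02 00 00 "
)
ROOT = 0x10           # address of the root table
WEIGHT_OFFSET = 0x08  # vtable entry for 'weight'


def read_int32_field(buf, table, field_offset, default=0):
    """Read an inline 32-bit signed scalar, or return default if the field is absent."""
    if field_offset == 0:  # a zero vtable entry means the field was not stored
        return default
    (value,) = struct.unpack_from("<i", buf, table + field_offset)
    return value


weight = read_int32_field(DATA, ROOT, WEIGHT_OFFSET)
print(weight, hex(weight))  # 80 0x50
```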

To get the ‘goods’ vector, we add 0x04 to 0x10; the dword stored at 0x14 is an offset to the actual vector. Adding that value to the address where it is stored (0x14 + 0x14 = 0x28) takes us to 0x28. The first dword there is the number of elements, 0x02, and the following two bytes are the stored enum values, 0x00 (Clothes) and 0x02 (Foods).
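The vector lookup can be sketched the same way. CATEGORY_NAMES is a hand-written copy of the enum from the schema, and the other names are my own; nothing here uses the generated FlatBuffers code.

```python
import struct

# The 48 bytes of example.bin, copied from the hexdump.
DATA = bytes.fromhex(
    "10 00 00 00 00 00 0a 00 10 00 0c 00 08 00 04 00 "
    "0a 00 00 00 14 00 00 00 50 00 00 00 04 00 00 00 "
    "03 00 00 00 77 7a 79 00 02 00 00 00 00 02 00 00 "
)
ROOT = 0x10          # address of the root table
GOODS_OFFSET = 0x04  # vtable entry for 'goods'
CATEGORY_NAMES = {0: "Clothes", 1: "Instruments", 2: "Foods"}

field_addr = ROOT + GOODS_OFFSET                     # 0x14
(rel,) = struct.unpack_from("<I", DATA, field_addr)  # offset relative to field_addr
vector = field_addr + rel                            # 0x28
(count,) = struct.unpack_from("<I", DATA, vector)    # element count, not byte size

# Each Good is a struct holding one byte-sized enum, so elements are 1 byte apart.
raw = struct.unpack_from(f"<{count}b", DATA, vector + 4)
goods = [CATEGORY_NAMES[value] for value in raw]

print(hex(vector), goods)  # 0x28 ['Clothes', 'Foods']
```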

The remaining bytes in the file are padding.
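One way to double-check the padding claim is to mark every byte that the walkthrough accounts for and list what is left. This sketch hard-codes knowledge of this particular schema (field 0 is a string, field 2 is a vector of 1-byte structs); the helper mark and the other names are illustrative.

```python
import struct

# The 48 bytes of example.bin, copied from the hexdump.
DATA = bytes.fromhex(
    "10 00 00 00 00 00 0a 00 10 00 0c 00 08 00 04 00 "
    "0a 00 00 00 14 00 00 00 50 00 00 00 04 00 00 00 "
    "03 00 00 00 77 7a 79 00 02 00 00 00 00 02 00 00 "
)

used = set()


def mark(start, size):
    """Record that bytes [start, start + size) carry real data."""
    used.update(range(start, start + size))


def u32(addr):
    return struct.unpack_from("<I", DATA, addr)[0]


root = u32(0)
mark(0, 4)                                         # offset to root table
vtable = root - struct.unpack_from("<i", DATA, root)[0]
vtable_size, table_size = struct.unpack_from("<HH", DATA, vtable)
mark(vtable, vtable_size)                          # vtable
mark(root, table_size)                             # inline part of the box table

name_off, _weight_off, goods_off = struct.unpack_from("<3H", DATA, vtable + 4)

name_addr = root + name_off + u32(root + name_off)
mark(name_addr, 4 + u32(name_addr) + 1)            # length + characters + '\0'

goods_addr = root + goods_off + u32(root + goods_off)
mark(goods_addr, 4 + u32(goods_addr) * 1)          # length + 1 byte per Good

padding = [hex(i) for i in range(len(DATA)) if i not in used]
print(padding)  # ['0x4', '0x5', '0x2e', '0x2f']
```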

If the walkthrough above is hard to follow, the following table summarizes the layout.

| definition | address | content | value / target address |
|---|---|---|---|
| offset to root table | 0x00 | 10 00 00 00 | 0x10 |
| padding | 0x04 | 00 00 | |
| vtable | | | |
| size of vtable | 0x06 | 0a 00 | 10 |
| inline data size of ‘box’ table | 0x08 | 10 00 | 16 |
| offset of ‘name’ from start of ‘box’ table | 0x0a | 0c 00 | 0x10 + 0x0c = 0x1c |
| offset of ‘weight’ from start of ‘box’ table | 0x0c | 08 00 | 0x10 + 0x08 = 0x18 |
| offset of ‘goods’ from start of ‘box’ table | 0x0e | 04 00 | 0x10 + 0x04 = 0x14 |
| box table / root table | | | |
| offset of vtable | 0x10 | 0a 00 00 00 | 0x10 - 0x0a = 0x06 |
| offset of ‘goods’ from here | 0x14 | 14 00 00 00 | 0x14 + 0x14 = 0x28 |
| inline data of ‘weight’ | 0x18 | 50 00 00 00 | 80 |
| offset of ‘name’ from here | 0x1c | 04 00 00 00 | 0x1c + 0x04 = 0x20 |
| ‘name’ string | | | |
| length | 0x20 | 03 00 00 00 | 3 |
| content | 0x24 | 77 7a 79 00 | “wzy\0” |
| ‘goods’ vector | | | |
| length | 0x28 | 02 00 00 00 | 2 |
| enum value Clothes | 0x2c | 00 | 0 |
| enum value Foods | 0x2d | 02 | 2 |
| padding | 0x2e | 00 00 | |