# influxdb-pb-data-protocol

**Repository Path**: mirrors_influxdata/influxdb-pb-data-protocol

## Basic Information

- **Project Name**: influxdb-pb-data-protocol
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-07-09
- **Last Updated**: 2025-08-30

## README

# InfluxDB Protobuf Data Protocol

This protocol defines the format and rules for writing data to [InfluxDB/IOx](https://github.com/influxdata/influxdb_iox).
The protocol is optimized for efficient ingestion into IOx, and is also a good choice for transporting time series data in general.
It is not intended to replace [line protocol](https://docs.influxdata.com/influxdb/v2.0/reference/syntax/line-protocol/), but it should be considered a good alternative to it.

Features include:
- Column-oriented
- Canonical implementation of rules in Golang
- All line protocol features
  - string tags, typed fields
- All IOx types and features
  - explicit null values
  - `Time` semantic type
  - IEEE-754 non-finite values
- Defined as protocol buffer and gRPC

The protocol is based on the [InfluxDB/IOx internal write flatbuffer format](https://github.com/influxdata/influxdb_iox/blob/10f89a3e8dbca6bb6a71865a4480e622ba5ac8b0/entry/src/entry.fbs#L58-L123).

## Definition

The protocol is defined in [influxdb-pb-data-protocol.proto](influxdb-pb-data-protocol.proto).

### `DatabaseBatch`

`DatabaseBatch` associates multiple `TableBatch`s with one database.

The table names referenced by `TableBatch`s contained within a single `DatabaseBatch` are not required to be unique,
but the schemata of multiple `TableBatch`s referencing one table within a `DatabaseBatch` must be compatible with each other.
For example, if a `TableBatch` referencing table `mytable` references column `mycol` with type `IOx/I64`,
then any other `TableBatch` referencing column `mycol` in table `mytable` must also treat it as `IOx/I64`.

Most requests operate on a single database.
For example, `WriteRequest` is a wrapper for `DatabaseBatch`.
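
A receiving agent could enforce the compatibility rule by remembering the type of each (table, column) pair while walking the batch. The Go sketch below assumes hypothetical, simplified `TableBatch`, `Column`, and `ColumnType` structs with string-valued types; they are not the generated protobuf types.

```go
package main

import "fmt"

// ColumnType is a hypothetical stand-in for a column's (semantic, value) type pair.
type ColumnType struct {
	Semantic string // e.g. "IOx", "Time", "Field", "Tag"
	Value    string // e.g. "I64", "F64", "U64", "String", "Bool", "Bytes"
}

// Column and TableBatch are hypothetical simplifications of the protobuf messages.
type Column struct {
	Name string
	Type ColumnType
}

type TableBatch struct {
	TableName string
	Columns   []Column
}

// checkCompatible returns an error if two TableBatches in one DatabaseBatch
// give the same (table, column) pair different types.
func checkCompatible(batches []TableBatch) error {
	seen := make(map[[2]string]ColumnType)
	for _, tb := range batches {
		for _, c := range tb.Columns {
			key := [2]string{tb.TableName, c.Name}
			if prev, ok := seen[key]; ok && prev != c.Type {
				return fmt.Errorf("table %q column %q: %s/%s conflicts with %s/%s",
					tb.TableName, c.Name, c.Type.Semantic, c.Type.Value, prev.Semantic, prev.Value)
			}
			seen[key] = c.Type
		}
	}
	return nil
}

func main() {
	a := TableBatch{"mytable", []Column{{"mycol", ColumnType{"IOx", "I64"}}}}
	b := TableBatch{"mytable", []Column{{"mycol", ColumnType{"IOx", "F64"}}}}
	fmt.Println(checkCompatible([]TableBatch{a, a})) // <nil>
	fmt.Println(checkCompatible([]TableBatch{a, b})) // prints the conflict error
}
```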

### `TableBatch`

`TableBatch` associates a table name with data stored in `Column`s.
`TableBatch` includes a `row_count` field, which must be set to the length of all `Column`s within the `TableBatch`.
The length of a `Column` is the count of all null and non-null values.

The `Column`s referenced in a single `TableBatch` must (1) have unique names and (2) be of equal length.

Receiving agents should reject a `TableBatch` if:
- the value of `row_count` is less than the length of any column within the `TableBatch`, or
- the value of `row_count` is greater than zero and the length of any column is zero.
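
These rejection rules, together with the unique-name rule, can be sketched in Go. The `Column` and `TableBatch` structs are hypothetical simplifications (only I64 values are modelled), and the sketch assumes that a column's encoded length is the number of values present plus the number of bits set in its null mask.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Column and TableBatch are hypothetical simplifications; only I64 values are modelled.
type Column struct {
	Name      string
	I64Values []int64
	NullMask  []byte
}

type TableBatch struct {
	RowCount int
	Columns  []Column
}

// encodedLen counts the non-null values present plus the null bits set.
// Assumption: this is the "length" a receiver can observe on the wire.
func encodedLen(c Column) int {
	n := len(c.I64Values)
	for _, b := range c.NullMask {
		n += bits.OnesCount8(b)
	}
	return n
}

// validate rejects duplicate column names, columns longer than row_count,
// and empty columns when row_count is greater than zero.
func validate(tb TableBatch) error {
	names := make(map[string]bool)
	for _, c := range tb.Columns {
		if names[c.Name] {
			return fmt.Errorf("duplicate column %q", c.Name)
		}
		names[c.Name] = true
		n := encodedLen(c)
		if tb.RowCount < n {
			return fmt.Errorf("column %q: length %d exceeds row_count %d", c.Name, n, tb.RowCount)
		}
		if tb.RowCount > 0 && n == 0 {
			return fmt.Errorf("column %q is empty but row_count is %d", c.Name, tb.RowCount)
		}
	}
	return nil
}

func main() {
	ok := TableBatch{RowCount: 3, Columns: []Column{{Name: "a", I64Values: []int64{1, 2, 3}}}}
	bad := TableBatch{RowCount: 2, Columns: []Column{{Name: "a", I64Values: []int64{1, 2, 3}}}}
	fmt.Println(validate(ok))  // <nil>
	fmt.Println(validate(bad)) // column "a": length 3 exceeds row_count 2
}
```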

`row_count` may be greater than the encoded length of a column, because some optimizations omit trailing null-mask bytes or repeated values.
For details, see [Optimizations](#optimizations).

### `Column`

`Column` associates a column name with data.
`Column` values may be null.
Each `Column` has two types: a value type and a semantic type.

#### Value Type

Value type describes how the data is stored.

- I64
  - 64-bit signed integer
  - Range: `[-9223372036854775808, 9223372036854775807]`
- F64
  - Double-precision floating point (IEEE-754)
  - Allowed: +/-0, +/-∞, NaN (including payload bits)
- U64
  - 64-bit unsigned integer
  - Range: `[0, 18446744073709551615]`
- String
  - UTF-8 encoded strings
  - Empty string allowed
  - Receiving agents should reject invalid UTF-8
- Bool
  - Allowed: `true` and `false`
- Bytes
  - Arbitrary sequence of 8-bit bytes
  - Allowed: zero bytes (`0x00`), empty sequences

#### Semantic Type

Semantic type indicates how the data is treated at query time.
Only the following (semantic type, value type) pairs are allowed:

- IOx
  - General purpose types
  - Value types: all
- Time
  - Point in time as nanoseconds since the Unix epoch
  - Value type: I64
- Field
  - Line protocol data columns
  - Value types: I64, F64, U64, String, Bool
- Tag
  - Line protocol primary key columns
  - Value type: String
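
The allowed pairs can be captured as a lookup table. In this Go sketch, types are hypothetical plain strings rather than the enums of the generated code.

```go
package main

import "fmt"

// allowedPairs mirrors the list above: IOx accepts every value type,
// while Time, Field, and Tag are restricted.
var allowedPairs = map[string][]string{
	"IOx":   {"I64", "F64", "U64", "String", "Bool", "Bytes"},
	"Time":  {"I64"},
	"Field": {"I64", "F64", "U64", "String", "Bool"},
	"Tag":   {"String"},
}

// pairAllowed reports whether a column may combine the given types.
// Unknown semantic types have no entry and are therefore rejected.
func pairAllowed(semantic, value string) bool {
	for _, v := range allowedPairs[semantic] {
		if v == value {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(pairAllowed("Time", "I64"))    // true
	fmt.Println(pairAllowed("Tag", "I64"))     // false
	fmt.Println(pairAllowed("Field", "Bytes")) // false
}
```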

#### Null Values

Column data values may be null.

Non-null values are stored, in order, in the field of message `Column.values` that matches the column's value type.
Field `Column.null_mask` is a bitmap with one bit per row: bit `i % 8` of byte `i / 8` (counting from the least significant bit) is on when row `i` is null.
For example, a column containing I64 values `(10,11,12,13,14,null,16,17,null,99,100)` is represented as:
```text
Column:
  values:
    i64_values: 10,11,12,13,14,16,17,99,100

             7      0  15     8
  null_mask: 00100000  00000001
```
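
The following Go sketch encodes and decodes a nullable I64 column in the layout shown above. The function names are hypothetical, nil pointers stand in for null values, and the decoder assumes one entry in `values` per non-null row (that is, without Optimization #2 applied).

```go
package main

import "fmt"

// encodeI64 splits nullable rows into dense values plus a null mask:
// bit i%8 of byte i/8 (least-significant bit first) is set when row i is null.
func encodeI64(rows []*int64) (values []int64, nullMask []byte) {
	nullMask = make([]byte, (len(rows)+7)/8)
	for i, v := range rows {
		if v == nil {
			nullMask[i/8] |= 1 << (i % 8)
		} else {
			values = append(values, *v)
		}
	}
	return values, nullMask
}

// decodeI64 rebuilds rowCount rows. Bits beyond the end of nullMask are
// treated as "not null", so a trimmed mask decodes correctly.
func decodeI64(values []int64, nullMask []byte, rowCount int) []*int64 {
	rows := make([]*int64, rowCount)
	next := 0
	for i := range rows {
		if i/8 < len(nullMask) && nullMask[i/8]&(1<<(i%8)) != 0 {
			continue // row i is null
		}
		v := values[next]
		rows[i] = &v
		next++
	}
	return rows
}

func main() {
	p := func(v int64) *int64 { return &v }
	rows := []*int64{p(10), p(11), p(12), p(13), p(14), nil, p(16), p(17), nil, p(99), p(100)}
	values, mask := encodeI64(rows)
	fmt.Println(values)                                  // [10 11 12 13 14 16 17 99 100]
	fmt.Printf("%08b %08b\n", mask[0], mask[1])          // 00100000 00000001
	fmt.Println(len(decodeI64(values, mask, len(rows)))) // 11
}
```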

## Schema Constraints

Stateful receiving agents should reject requests that do not agree with existing schemata.

## Column Type Constraints

Receiving agents should reject data that does not conform to these column type constraints.

`TableBatch`s containing at least one column with
(semantic type `IOx`) OR (semantic type `Time` AND any name other than `time`):
- May contain zero to many columns with semantic type `Time`
- Must not contain any column with semantic type `Field` or `Tag`

Semantic types `Tag` and `Field` exist for compatibility with line protocol.
(If your use case doesn't require line protocol compatibility, then don't use the `Tag` or `Field` semantic types.)
`TableBatch`s containing at least one column with semantic type `Field`:
- Must contain exactly one column named `time` with semantic type `Time`
- Must not contain any column with semantic type `IOx`
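
A minimal Go sketch of these column type constraints, assuming a hypothetical `Col` struct that carries only a name and a string-valued semantic type. Column-name uniqueness is a separate `TableBatch` rule and is not rechecked here.

```go
package main

import (
	"errors"
	"fmt"
)

// Col is a hypothetical, simplified column: only the name and semantic type.
type Col struct {
	Name     string
	Semantic string // "IOx", "Time", "Field" or "Tag"
}

// checkColumnTypes applies the column type constraints to one TableBatch.
func checkColumnTypes(cols []Col) error {
	ioxStyle := false      // an IOx column, or a Time column not named "time"
	hasField := false      // at least one Field column
	hasTagOrField := false // at least one Tag or Field column
	timeCols := 0          // Time columns named "time"
	for _, c := range cols {
		switch c.Semantic {
		case "IOx":
			ioxStyle = true
		case "Time":
			if c.Name == "time" {
				timeCols++
			} else {
				ioxStyle = true
			}
		case "Field":
			hasField, hasTagOrField = true, true
		case "Tag":
			hasTagOrField = true
		}
	}
	// This check also covers "Field batches must not contain IOx columns".
	if ioxStyle && hasTagOrField {
		return errors.New("batch with IOx-style columns must not contain Field or Tag columns")
	}
	if hasField && timeCols != 1 {
		return fmt.Errorf("batch with Field columns needs exactly one Time column named \"time\", found %d", timeCols)
	}
	return nil
}

func main() {
	lineProtocol := []Col{{"host", "Tag"}, {"usage", "Field"}, {"time", "Time"}}
	mixed := []Col{{"usage", "Field"}, {"time", "Time"}, {"extra", "IOx"}}
	fmt.Println(checkColumnTypes(lineProtocol)) // <nil>
	fmt.Println(checkColumnTypes(mixed))        // batch with IOx-style columns must not contain Field or Tag columns
}
```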

## Optimizations

The following optimizations can reduce the size of the serialized protocol.
They work within the limits of the protocol, without adding undue complexity.

Sending agents may implement these optimizations.

Receiving agents should accept data encoded with these optimizations.

### Optimization #1: Trim null masks

Trailing zeros in a null mask can be omitted.
For example, this `Column` contains 27 values:
the first three values are null,
and the remaining 24 values are not null:
```text
TableBatch:
  row_count: 27

  columns:
    Column:
      values:
        i64_values: 3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26

                 7      0  15     8  23    16  31    24
      null_mask: 00000111  00000000  00000000  00000000
```

Trim the last three bytes to reduce the size of the bitmask:
```text
TableBatch:
  row_count: 27

  columns:
    Column:
      values:
        i64_values: 3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26

                 7      0
      null_mask: 00000111
```
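
Trimming amounts to dropping trailing zero bytes; a receiving agent then treats any bit beyond the end of the mask as zero (not null). A minimal Go sketch with a hypothetical helper name; when every byte is zero it returns nil, which is also the unset mask described in Optimization #1b.

```go
package main

import "fmt"

// trimNullMask drops trailing zero bytes from a null mask.
// An all-zero (or empty) mask becomes nil.
func trimNullMask(mask []byte) []byte {
	end := len(mask)
	for end > 0 && mask[end-1] == 0 {
		end--
	}
	if end == 0 {
		return nil
	}
	return mask[:end]
}

func main() {
	m := trimNullMask([]byte{0x07, 0x00, 0x00, 0x00})
	fmt.Printf("%d byte(s): %08b\n", len(m), m[0])       // 1 byte(s): 00000111
	fmt.Println(trimNullMask([]byte{0x00, 0x00}) == nil) // true
}
```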

### Optimization #1b: Omit empty null masks

If a column contains no null values, then `Column.null_mask` contains only zeros, and Optimization #1 trims it to zero bytes.
Leave the field `Column.null_mask` unset (how to do so depends on your protobuf library).

### Optimization #2: Trim repeated tail values

Trailing duplicate non-null values can be omitted; the receiving agent restores them by repeating the last value present.
For example, the non-null values in this column end with four equal values:
```text
TableBatch:
  row_count: 8

  columns:
    Column:
      values:
        i64_values: 14,14,14,14,99,99,99,99
```

Trim the trailing three values to reduce the length of field `i64_values`:
```text
TableBatch:
  row_count: 8

  columns:
    Column:
      values:
        i64_values: 14,14,14,14,99
```

As demonstrated, the space saved by this optimization is not necessarily significant.
Some use cases benefit more than others.
In this example, the column contains only one value, repeated one thousand times:
```text
TableBatch:
  row_count: 1000

  columns:
    Column:
      values:
        i64_values: 14,14,14,14,14,14,14,14,14,14,14,14,...
```

Again, trim the trailing duplicate values in field `i64_values`:
```text
TableBatch:
  row_count: 1000

  columns:
    Column:
      values:
        i64_values: 14
```

This example demonstrates significant space reduction.
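
Both sides of this optimization fit in a few lines of Go. This is a sketch with hypothetical helper names, covering I64 values only; it assumes the receiving agent knows how many non-null values to expect, namely `row_count` minus the number of set bits in the column's null mask.

```go
package main

import "fmt"

// trimRepeatedTail drops trailing values equal to their predecessor,
// always keeping at least one value.
func trimRepeatedTail(values []int64) []int64 {
	end := len(values)
	for end > 1 && values[end-1] == values[end-2] {
		end--
	}
	return values[:end]
}

// expandTail is the receiving side: it repeats the last value until there
// are want non-null values. An empty input is returned unchanged.
func expandTail(values []int64, want int) []int64 {
	out := append([]int64(nil), values...)
	for len(out) > 0 && len(out) < want {
		out = append(out, out[len(out)-1])
	}
	return out
}

func main() {
	trimmed := trimRepeatedTail([]int64{14, 14, 14, 14, 99, 99, 99, 99})
	fmt.Println(trimmed)                // [14 14 14 14 99]
	fmt.Println(expandTail(trimmed, 8)) // [14 14 14 14 99 99 99 99]
}
```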

## Reference Implementations

Consistency between implementations is guided by (1) this document and (2) reference implementations.
Corrections to the protocol specification should be implemented in both.

### Sending Agent

The reference sending agent exists in this repository at [go-influxdb-pb-data-protocol](go-influxdb-pb-data-protocol/README.md).
It is written in Golang.
The implementation includes generated protobuf and gRPC code, and an easy-to-use SDK.
The optimizations mentioned above are included in this SDK.

### Receiving Agent

A reference receiving agent exists in [InfluxDB/IOx](https://github.com/influxdata/influxdb_iox/blob/4e5d5c8c4c64fe55e304de45e125bc50b444a92e/entry/src/entry.rs#L303).
It is written in Rust.
The implementation is closely tied to InfluxDB/IOx, so it is not generally useful to other applications.
The optimizations and constraints mentioned above are implemented.