
Commit

refactor: v2 (#150)
lspgn committed Aug 10, 2023
1 parent 1298b94 commit ae56e41
Showing 75 changed files with 4,876 additions and 4,799 deletions.
91 changes: 46 additions & 45 deletions README.md
@@ -30,17 +30,20 @@ Minimal changes in the decoding libraries.

## Modularity

-In order to enable load-balancing and optimizations, the GoFlow library has a `decoder` which converts
-the payload of a flow packet into a Go structure.
+In order to enable load-balancing and optimizations, the GoFlow2 library has a `decoder` which converts
+the payload of a flow packet into a structure.

-The `producer` functions (one per protocol) then converts those structures into a protobuf (`pb/flow.pb`)
-which contains the fields a network engineer is interested in.
+The flow packets usually contain multiple samples.
+This acts as an abstraction of a sample.
+The `producer` converts the samples into another format.
+Out of the box, this repository provides a protobuf producer (`pb/flow.pb`)
+and a raw producer.
+In the case of the protobuf producer, the records in a single flow packet
+are extracted and made into their own protobuf messages. Custom mapping allows
+new fields to be added without rebuilding the proto.

-The `format` directory offers various utilities to process the protobuf. It can convert
+The `format` directory offers various utilities to format a message. It calls specific
+functions to marshal as JSON or text, for instance.

-The `transport` provides different way of processing the protobuf. Either sending it via Kafka or
+The `transport` provides different ways of processing the message, either sending it via Kafka or
+sending it to a file (or stdout).

GoFlow2 is a wrapper of all the functions and chains them.
@@ -103,55 +106,47 @@ By default, the samples received will be printed in JSON format on the stdout.

```json
{
-  "Type": "SFLOW_5",
-  "TimeFlowEnd": 1621820000,
-  "TimeFlowStart": 1621820000,
-  "TimeReceived": 1621820000,
-  "Bytes": 70,
-  "Packets": 1,
-  "SamplingRate": 100,
-  "SamplerAddress": "192.168.1.254",
-  "DstAddr": "10.0.0.1",
-  "DstMac": "ff:ff:ff:ff:ff:ff",
-  "SrcAddr": "192.168.1.1",
-  "SrcMac": "ff:ff:ff:ff:ff:ff",
-  "InIf": 1,
-  "OutIf": 2,
-  "Etype": 2048,
-  "EtypeName": "IPv4",
-  "Proto": 6,
-  "ProtoName": "TCP",
-  "SrcPort": 443,
-  "DstPort": 46344,
-  "FragmentId": 54044,
-  "FragmentOffset": 16384,
-  ...
-  "IPTTL": 64,
-  "IPTos": 0,
-  "TCPFlags": 16,
+  "type": "SFLOW_5",
+  "time_received_ns": 1681583295157626000,
+  "sequence_num": 2999,
+  "sampling_rate": 100,
+  "sampler_address": "192.168.0.1",
+  "time_flow_start_ns": 1681583295157626000,
+  "time_flow_end_ns": 1681583295157626000,
+  "bytes": 1500,
+  "packets": 1,
+  "src_addr": "fd01::1",
+  "dst_addr": "fd01::2",
+  "etype": "IPv6",
+  "proto": "TCP",
+  "src_port": 443,
+  "dst_port": 50001
}
```

If you are using a log integration (e.g. Loki with Promtail, Splunk, Fluentd, Google Cloud Logs, etc.),
just send the output to a file.

```bash
$ ./goflow2 -transport.file /var/logs/goflow2.log
```

To enable Kafka and send protobuf, use the following arguments:

```bash
-$ ./goflow2 -transport=kafka -transport.kafka.brokers=localhost:9092 -transport.kafka.topic=flows -format=pb
+$ ./goflow2 -transport=kafka \
+    -transport.kafka.brokers=localhost:9092 \
+    -transport.kafka.topic=flows \
+    -format=bin
```

By default, the distribution will be randomized.
-To partition the feed (any field of the protobuf is available), the following options can be used:
-```
--transport.kafka.hashing=true \
--format.hash=SamplerAddress,DstAS
-```
+In order to partition the feed, you need to configure the `key`
+in the formatter.

By default, compression is disabled when sending data to Kafka.
To change the Kafka compression type on the producer side, configure the following option:

```
-transport.kafka.compression.type=gzip
```
@@ -189,9 +184,9 @@ in the InIf protobuf field without changing the code.
```yaml
ipfix:
  mapping:
    - field: 252
-     destination: InIf
+     destination: in_if
    - field: 253
-     destination: OutIf
+     destination: out_if
```
### Output format considerations
@@ -218,22 +213,28 @@ with a database for Autonomous System Number and Country.
Similar output options as GoFlow are provided.

```bash
-$ ./goflow2 -transport.file.sep= -format=pb -format.protobuf.fixedlen=true | ./enricher -db.asn path-to/GeoLite2-ASN.mmdb -db.country path-to/GeoLite2-Country.mmdb
+$ ./goflow2 -transport.file.sep= -format=bin | \
+    ./enricher -db.asn path-to/GeoLite2-ASN.mmdb -db.country path-to/GeoLite2-Country.mmdb
```

For a more scalable production setting, Kafka and protobuf are recommended.
Stream operations (aggregation and filtering) can be done with stream-processing tools,
for instance Flink, or the more recent Kafka Streams and ksqlDB.
Direct storage can be done with data warehouses like Clickhouse.

-In some cases, the consumer will require protobuf messages to be prefixed by
-length. To do this, use the flag `-format.protobuf.fixedlen=true`.
+Each protobuf message is prefixed by its varint length.

This repository contains [examples of pipelines](./compose) with docker-compose.
The available pipelines are:
* [Kafka+Clickhouse+Grafana](./compose/kcg)
* [Logstash+Elastic+Kibana](./compose/elk)

## Security notes and assumptions

By default, the buffer for UDP is 9000 bytes.
Protections were added to avoid DoS on sFlow since the various length fields are 32 bits.
There are assumptions on how many records and list items a sample can have (e.g. AS-Path).

## User stories

Are you using GoFlow2 in production at scale? Add yourself here!
81 changes: 18 additions & 63 deletions cmd/enricher/main.go
@@ -2,35 +2,30 @@ package main

 import (
 	"bufio"
-	"bytes"
-	"context"
-	"encoding/binary"
+	"errors"
 	"flag"
 	"fmt"
 	"io"
 	"net"
-	"net/http"
 	"os"
 	"strings"

-	"github.com/oschwald/geoip2-golang"
-
-	"github.com/golang/protobuf/proto"
-	flowmessage "github.com/netsampler/goflow2/cmd/enricher/pb"
+	flowmessage "github.com/netsampler/goflow2/v2/cmd/enricher/pb"

 	// import various formatters
-	"github.com/netsampler/goflow2/format"
-	_ "github.com/netsampler/goflow2/format/json"
-	_ "github.com/netsampler/goflow2/format/protobuf"
-	_ "github.com/netsampler/goflow2/format/text"
+	"github.com/netsampler/goflow2/v2/format"
+	_ "github.com/netsampler/goflow2/v2/format/binary"
+	_ "github.com/netsampler/goflow2/v2/format/json"
+	_ "github.com/netsampler/goflow2/v2/format/text"

 	// import various transports
-	"github.com/netsampler/goflow2/transport"
-	_ "github.com/netsampler/goflow2/transport/file"
-	_ "github.com/netsampler/goflow2/transport/kafka"
+	"github.com/netsampler/goflow2/v2/transport"
+	_ "github.com/netsampler/goflow2/v2/transport/file"
+	_ "github.com/netsampler/goflow2/v2/transport/kafka"

-	"github.com/prometheus/client_golang/prometheus/promhttp"
+	"github.com/oschwald/geoip2-golang"
 	log "github.com/sirupsen/logrus"
+	"google.golang.org/protobuf/encoding/protodelim"
 )

var (
Expand All @@ -49,19 +44,9 @@ var (
 	Format = flag.String("format", "json", fmt.Sprintf("Choose the format (available: %s)", strings.Join(format.GetFormats(), ", ")))
 	Transport = flag.String("transport", "file", fmt.Sprintf("Choose the transport (available: %s)", strings.Join(transport.GetTransports(), ", ")))

-	MetricsAddr = flag.String("metrics.addr", ":8081", "Metrics address")
-	MetricsPath = flag.String("metrics.path", "/metrics", "Metrics path")
-
-	TemplatePath = flag.String("templates.path", "/templates", "NetFlow/IPFIX templates list")
-
 	Version = flag.Bool("v", false, "Print version")
 )

-func httpServer() {
-	http.Handle(*MetricsPath, promhttp.Handler())
-	log.Fatal(http.ListenAndServe(*MetricsAddr, nil))
-}

 func MapAsn(db *geoip2.Reader, addr []byte, dest *uint32) {
 	entry, err := db.ASN(net.IP(addr))
 	if err != nil {
@@ -117,61 +102,31 @@ func main() {
 		defer dbCountry.Close()
 	}

-	ctx := context.Background()
-
-	formatter, err := format.FindFormat(ctx, *Format)
+	formatter, err := format.FindFormat(*Format)
 	if err != nil {
 		log.Fatal(err)
 	}

-	transporter, err := transport.FindTransport(ctx, *Transport)
+	transporter, err := transport.FindTransport(*Transport)
 	if err != nil {
 		log.Fatal(err)
 	}
-	defer transporter.Close(ctx)
+	defer transporter.Close()

 	switch *LogFmt {
 	case "json":
 		log.SetFormatter(&log.JSONFormatter{})
 	}

-	log.Info("Starting enricher")
-
-	go httpServer()
+	log.Info("starting enricher")

 	rdr := bufio.NewReader(os.Stdin)

 	msg := &flowmessage.FlowMessageExt{}
-	lenBufSize := binary.MaxVarintLen64
 	for {
-		msgLen, err := rdr.Peek(lenBufSize)
-		if err != nil && err != io.EOF {
-			log.Error(err)
-			continue
-		}
-
-		l, vn := proto.DecodeVarint(msgLen)
-		if l == 0 {
-			continue
-		}
-
-		_, err = rdr.Discard(vn)
-		if err != nil {
-			log.Error(err)
-			continue
-		}
-
-		line := make([]byte, l)
-
-		_, err = io.ReadFull(rdr, line)
-		if err != nil && err != io.EOF {
-			log.Error(err)
-			continue
-		}
-		line = bytes.TrimSuffix(line, []byte("\n"))
-
-		err = proto.Unmarshal(line, msg)
-		if err != nil {
+		if err := protodelim.UnmarshalFrom(rdr, msg); err != nil && errors.Is(err, io.EOF) {
+			return
+		} else if err != nil {
 			log.Error(err)
 			continue
 		}
2 changes: 1 addition & 1 deletion cmd/enricher/pb/flowext.pb.go

