IPFIXcol and distributed environment
For distributed data processing, let's define two types of server nodes: proxy and subcollector. Both node types run IPFIXcol, but with different configurations.
- Proxy is the entry point for flows from your network probes (exporters). The main purpose of this node is to distribute flow records to the specified subcollectors. Because this node can be a bottleneck, its IPFIXcol configuration should contain only the plugins for receiving and forwarding flows.
- Subcollector is a node for processing and storing flows.
An example configuration file for a UDP proxy node shows the configuration of the input plugin for receiving UDP packets ("udp-cpgCollector") from IPFIX or NetFlow probes and the configuration of the output plugin ("forwarding").
As you have probably noticed, the name of the input plugin is slightly different. In connection with the Quick Start Guide, we previously mentioned the problem of distributing IPFIX templates over the UDP protocol.
Because we want to create a distributed high-availability collector, we must be able to replace an active proxy node with any backup proxy, for example when the node with the active proxy fails. This is not possible with the standard UDP input plugin ("udpCollector"): the new proxy would have to wait for the templates, and in the meantime many IPFIX records would be lost. The modified UDP plugin continuously shares the templates among all (backup) proxies, using Corosync as the communication layer, and thus solves the issue with the templates. Compared to the original plugin, it adds a new option, <CPGName>ipfixcol</CPGName>, which specifies the name of the synchronization group.
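To make the shape of such a configuration concrete, here is a rough sketch of the input part of a proxy node's startup configuration. Only "udp-cpgCollector" and <CPGName> are taken from the text above; the surrounding element names, the port, and the process names are illustrative assumptions and should be checked against the linked example configuration file.

```xml
<!-- Sketch of a proxy node's collecting process (illustrative, not verbatim).
     "udp-cpgCollector" and <CPGName> come from the text above; the other
     element names and values are assumptions. -->
<collectingProcess>
  <name>UDP-CPG collector</name>
  <udp-cpgCollector>
    <!-- Listen for IPFIX/NetFlow packets from the exporters -->
    <localPort>4739</localPort>
    <!-- Corosync synchronization group shared by all (backup) proxies -->
    <CPGName>ipfixcol</CPGName>
  </udp-cpgCollector>
  <exportingProcess>Forward flows</exportingProcess>
</collectingProcess>
```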
The forwarding plugin distributes IPFIX packets over the network to one or more subcollectors using the TCP protocol and non-blocking sockets. In this configuration, every packet is forwarded to one of the configured destinations using a round-robin distribution model. If one or more destination subcollectors are disconnected, the plugin periodically tries to reconnect to them. When a packet cannot be delivered to a destination, it is sent to the next destination to prevent packet loss.
In this case, the most important part of the configuration is the list of destination IP addresses (<destination>) of the subcollector nodes that will store flows. All flows arriving at the proxy will be distributed to these machines. To avoid sending flows to the backup proxies, which can run on the same nodes as the subcollectors, flows should be forwarded to a different port (for example, 4741). An alternative version of the startup configuration for receiving TCP packets ("tcpCollector") is available here.
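The destination list described above might look roughly like the following sketch. Only <destination>, the round-robin behaviour, and port 4741 come from the text; the remaining element names and the documentation IP addresses are illustrative assumptions, so refer to the linked example configuration for the exact schema.

```xml
<!-- Sketch of the forwarding plugin's destination list (illustrative).
     Element names other than <destination> are assumptions. -->
<forwarding>
  <distribution>RoundRobin</distribution>
  <!-- Non-standard port, so backup proxies on the same nodes are not hit -->
  <defaultPort>4741</defaultPort>
  <destination>
    <ip>192.0.2.11</ip>  <!-- subcollector node 1 -->
  </destination>
  <destination>
    <ip>192.0.2.12</ip>  <!-- subcollector node 2 -->
  </destination>
</forwarding>
```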
Feel free to copy the configurations above and edit (at least) the list of destination IP addresses.
The configuration of a proxy node is common to all the tasks below, so the following subsections show only how to configure subcollector nodes for a particular task type.
We strongly recommend keeping the system time of the subcollector nodes synchronized using an NTP service, because the files with flows are created based on the system time of the machine. Differing system times may cause the results of future fdistdump queries to be misleading!
An essential task of subcollector nodes is storing incoming flows into files covering 5-minute time slots, readable by fdistdump (or nfdump). For this purpose, you can use the configuration with the input plugin ("tcpCollector") for receiving packets with flows forwarded from a proxy node. Flow storage is handled by the output plugin ("lnfstore"), which converts and stores IPFIX data into nfdump files (only IPFIX fields compatible with nfdump are stored).
In the configuration, you should at least specify the storage path <storagePath>. All files will be stored into directories based on the template <storagePath>/YYYY/MM/DD/, where "/YYYY/MM/DD" means year/month/day and is replaced by the system time at the time of file creation.
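The storage part of such a subcollector configuration could be sketched as follows. Only <storagePath>, the YYYY/MM/DD layout, and the 5-minute time slots come from the text; the other element names and the example path are illustrative assumptions — check them against the lnfstore plugin documentation.

```xml
<!-- Sketch of the lnfstore storage plugin parameters (illustrative).
     Only <storagePath> and the 5-minute window are stated in the text;
     the other elements are assumptions. -->
<lnfstore>
  <!-- Files end up in /data/flow/YYYY/MM/DD/ based on the system time -->
  <storagePath>/data/flow</storagePath>
  <!-- 5-minute time slots, expressed in seconds -->
  <timeWindow>300</timeWindow>
</lnfstore>
```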
Profiling is a specific type of view on flow data that allows you to filter and store selected flows. A profile is defined by its name, type, one or more channels with flow filters, and a directory for data storage. At least the profile 'live' is always available; it is used to store all incoming flow data. The 'live' profile includes one or more subprofiles that are used for the selection of flows.
An example configuration for IPFIXcol basically extends the previous configuration. It adds a new intermediate plugin ("profiler") and changes the parameters of the storage plugin ("lnfstore") to use metadata (directories, etc.) from the profiler for storing flows.
Profiles must be specified in an XML file (example here), and the path to this file must be specified in the <profiles> parameter of the <collectingProcess>.
For more info about profiles and the structure of the file, see the Wiki page or manual page of the intermediate profiler plugin (ipfixcol-profiler-inter).
Note: The <storagePath> specification is omitted from the startup configuration because the directories are specified in the configuration of the profiles.
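A profiles file with a 'live' profile and one subprofile might be sketched as below. The 'live' profile, its role of storing all flows, and the name/channels/filter/directory structure come from the text; the concrete element names, directories, and filter syntax are illustrative assumptions — see the linked example and the ipfixcol-profiler-inter manual page for the real format.

```xml
<!-- Sketch of a profiles XML file (illustrative). Element names and the
     filter syntax are assumptions; only the overall structure (name, type,
     channels with filters, storage directory) is stated in the text. -->
<profile name="live">
  <directory>/data/flow/live</directory>
  <channelList>
    <!-- 'live' stores all incoming flow data -->
    <channel name="all">
      <filter>*</filter>
    </channel>
  </channelList>
  <subprofileList>
    <!-- Hypothetical subprofile selecting only DNS flows -->
    <profile name="dns">
      <directory>/data/flow/dns</directory>
      <channelList>
        <channel name="dns">
          <filter>port 53</filter>
        </channel>
      </channelList>
    </profile>
  </subprofileList>
</profile>
```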
You can also stream flow data in a readable JSON format for further processing. To do this, use the configuration with the JSON output plugin. This plugin can stream JSON data directly to fixed IP addresses, or it can act as a TCP server. For the JSON format configuration and more information, see the manual page of the JSON plugin (ipfixcol-json-output).
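The "stream to a fixed address" mode mentioned above could be configured along these lines. Only the two output modes (fixed addresses vs. TCP server) come from the text; all element names, the IP address, and the port are illustrative assumptions — consult the ipfixcol-json-output manual page for the actual parameters.

```xml
<!-- Sketch of the JSON output plugin in "send to fixed address" mode
     (illustrative; element names are assumptions). -->
<json>
  <output>
    <ip>192.0.2.100</ip>   <!-- hypothetical consumer of the JSON stream -->
    <port>4444</port>
    <protocol>tcp</protocol>
  </output>
</json>
```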
Note: All the configurations of subcollector nodes above can also be used in a non-distributed environment, where flow data comes directly from network probes (useful for testing). Just keep in mind that all the examples for subcollector nodes above use a non-standard port (to avoid a collision with proxy nodes). The given configurations can also be combined, so one collector can store and stream flows simultaneously.
The SecurityCloud project is supported by the Technology Agency of the Czech Republic under grant No. TA04010062, "Technology for processing and analysis of network data in big data concept".