Observability
ntop Webinar 2022 ntopconf 2023
Please respect & protect the privacy of others.
The purpose of this software is not to spy on others, but to detect network anomalies and malicious traffic.
nDPId is a set of daemons and tools to capture, process and classify network traffic. Its minimal dependencies (besides a half-way modern C library and POSIX threads) are libnDPI (>=4.9.0 or current GitHub dev branch) and libpcap.
The daemon nDPId is capable of multithreaded packet processing, but without mutexes, for performance reasons.
Instead, synchronization is achieved by a packet distribution mechanism.
To balance the workload (more or less) equally across all threads, a unique identifier, represented as a hash value, is calculated from a 3-tuple consisting of: the IPv4/IPv6 src/dst addresses; the IP header value of the layer4 protocol; and (for TCP/UDP) the src/dst ports. Other protocols, e.g. ICMP/ICMPv6, lack relevance for DPI, thus nDPId does not distinguish between different ICMP/ICMPv6 flows coming from the same host. This saves memory and improves performance, but might change in the future.
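To illustrate the distribution idea, here is a minimal Python sketch. It is not nDPId's actual hash function: the CRC32 over a direction-independent key and the helper names `flow_hash` and `reader_thread_for` are assumptions made for illustration. Endpoints are put into a fixed order so that both directions of a connection map to the same thread, and ICMP packets (which have no ports) collapse into one flow per address pair.

```python
import ipaddress
import zlib


def flow_hash(src_ip: str, dst_ip: str, l4_proto: int,
              src_port: int = 0, dst_port: int = 0) -> int:
    """Hypothetical direction-independent hash over addresses, L4 protocol and ports.

    Ports only matter for TCP/UDP; for ICMP/ICMPv6 they stay 0, so all ICMP
    packets between the same two hosts share one hash value.
    """
    a = (ipaddress.ip_address(src_ip).packed, src_port)
    b = (ipaddress.ip_address(dst_ip).packed, dst_port)
    low, high = sorted((a, b))  # same order for A->B and B->A
    key = (low[0] + low[1].to_bytes(2, "big")
           + high[0] + high[1].to_bytes(2, "big")
           + bytes([l4_proto]))
    return zlib.crc32(key)


def reader_thread_for(packet_hash: int, reader_threads: int) -> int:
    """Pick the thread that owns a flow (assumed: plain modulo distribution)."""
    return packet_hash % reader_threads
```

Because the owning thread is a pure function of header fields, every packet of a flow ends up on the same thread, so per-flow state never has to be shared between threads.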
nDPId uses libnDPI's JSON serialization interface to generate a JSON message for each event it receives from the library, which it then sends to a UNIX-socket (default: /tmp/ndpid-collector.sock). From such a socket, nDPIsrvd (or other custom applications) can retrieve incoming JSON messages and process them further or distribute them to higher-level applications.
Unfortunately, nDPIsrvd does not yet support any encryption/authentication for TCP connections (TODO!).
This project uses a kind of microservice architecture.
              connect to UNIX socket [1]      connect to UNIX/TCP socket [2]
_______________________   |                                 |   __________________________
|     "producer"      |___|                                 |___|       "consumer"       |
|---------------------|      _____________________________      |------------------------|
|                     |      |         nDPIsrvd          |      |                        |
| nDPId --- Thread 1 >| ---> |>           |             <| ---> |< example/c-json-stdout |
| (eth0) `- Thread 2 >| ---> |> collector | distributor <| ---> |________________________|
|        `- Thread N >| ---> |>    >>> forward >>>      <| ---> |                        |
|_____________________|   ^  |____________|______________|   ^  |< example/py-flow-info  |
|                     |   |                                  |  |________________________|
| nDPId --- Thread 1 >|   `- send serialized data [1]        |  |                        |
| (eth1) `- Thread 2 >|                                      |  |< example/...           |
|        `- Thread N >|         receive serialized data [2] -'  |________________________|
|_____________________|
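where:
nDPId captures traffic, extracts traffic data (with libnDPI) and sends a JSON-serialized output stream to an already existing UNIX-socket;
nDPIsrvd collects the serialized data from local nDPId instances via an incoming UNIX-socket ([1] above) and distributes it via an outgoing UNIX or TCP socket ([2] above);
consumers are common/custom applications able to receive selected flows/events via either a UNIX-socket or a TCP-socket.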
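The collector/distributor split can be pictured with the toy Python model below. It sketches the data flow only and is not nDPIsrvd's implementation or API: the `Distributor` class, its methods and the predicate-based event selection are hypothetical, and the real daemon works on sockets and serialized messages rather than in-process objects.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]
Predicate = Callable[[Message], bool]
Handler = Callable[[Message], None]


class Distributor:
    """Hypothetical in-process model of collector -> distributor -> consumers."""

    def __init__(self) -> None:
        self._subscribers: List[Tuple[Predicate, Handler]] = []

    def subscribe(self, wants: Predicate, deliver: Handler) -> None:
        # A consumer registers which events it wants to receive.
        self._subscribers.append((wants, deliver))

    def forward(self, message: Message) -> int:
        """Forward one collected message to every interested consumer.

        Returns the number of consumers the message was delivered to.
        """
        delivered = 0
        for wants, deliver in self._subscribers:
            if wants(message):
                deliver(message)
                delivered += 1
        return delivered
```

For example, `d.subscribe(lambda m: "flow_event_name" in m, flows.append)` would model a consumer that only cares about flow events.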
JSON messages streamed by both nDPId and nDPIsrvd are framed as [5-digit-number][JSON message], where the 5-digit decimal number gives the size of the entire JSON message, including the newline (\n) at its end, as in the following example:
01223{"flow_event_id":7,"flow_event_name":"detection-update","thread_id":12,"packet_id":307,"source":"wlan0", ...snip...}
00458{"packet_event_id":2,"packet_event_name":"packet-flow","thread_id":11,"packet_id":324,"source":"wlan0", ...snip...}
00572{"flow_event_id":1,"flow_event_name":"new","thread_id":11,"packet_id":324,"source":"wlan0", ...snip...}
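A consumer has to split this byte stream back into individual messages. The following Python sketch assumes that the 5-digit prefix is the decimal length of the JSON message including its trailing newline; the function name `split_messages` is illustrative. Incomplete data at the end of the buffer is returned so that it can be prepended to the next read from the socket.

```python
import json
from typing import List, Tuple

LENGTH_DIGITS = 5  # width of the decimal length prefix


def split_messages(buf: bytes) -> Tuple[List[dict], bytes]:
    """Decode every complete length-prefixed JSON message in buf.

    Returns the decoded messages plus any trailing bytes that do not yet
    form a complete message.
    """
    messages = []
    offset = 0
    while len(buf) - offset >= LENGTH_DIGITS:
        prefix = buf[offset:offset + LENGTH_DIGITS]
        if not prefix.isdigit():
            raise ValueError(f"invalid length prefix: {prefix!r}")
        length = int(prefix)
        start = offset + LENGTH_DIGITS
        end = start + length
        if end > len(buf):
            break  # message not fully received yet
        payload = buf[start:end]
        if not payload.endswith(b"\n"):
            raise ValueError("message is not terminated by a newline")
        messages.append(json.loads(payload))
        offset = end
    return messages, buf[offset:]
```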
The full stream of nDPId-generated JSON events can be retrieved directly from nDPId, without relying on nDPIsrvd, by providing a properly managed UNIX-socket.
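Here, providing the socket means listening on the path that nDPId is given with -c. The Python sketch below is one minimal way to do that; the default path, the one-thread-per-connection design and the helper names are assumptions. It accepts several connections because nDPId may open more than one (the note about OpenBSD netcat and max-reader-threads=1 further down hints at this), and it assumes the 5-digit length-prefix framing described above.

```python
import json
import os
import socket
import threading
from typing import BinaryIO, Iterator


def read_messages(stream: BinaryIO) -> Iterator[dict]:
    """Yield JSON objects from a stream of 5-digit length-prefixed messages."""
    while True:
        prefix = stream.read(5)
        if len(prefix) < 5:
            return  # peer closed the connection
        length = int(prefix)
        payload = stream.read(length)
        if len(payload) < length:
            return  # truncated message at end of stream
        yield json.loads(payload)


def handle(conn: socket.socket) -> None:
    """Print every message received on one connection."""
    with conn, conn.makefile("rb") as stream:
        for msg in read_messages(stream):
            print(msg)


def serve(path: str = "/tmp/listen.sock") -> None:
    """Hypothetical collector: listen on path and handle each connection in a thread."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket left over from a previous run
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(path)
        # nDPId drops privileges (default user: nobody), so this socket must
        # be accessible to that user, e.g. via chown as shown further down.
        srv.listen()
        while True:
            conn, _ = srv.accept()
            threading.Thread(target=handle, args=(conn,), daemon=True).start()
```

Calling `serve()` and then starting nDPId with `-c /tmp/listen.sock` should print the incoming events.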
Technical details about the JSON message format can be obtained from the related .schema file included in the schema directory.
nDPId generates JSON messages, each of which is assigned to a certain event.
Those events specify the contents (key-value pairs) of the JSON message.
They are divided into four categories, each with a number of subevents.
Error events: there are 17 distinct events, indicating that layer2 or layer3 packet processing failed or that not enough flow memory was available.
Detailed JSON-schema is available here
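Daemon events: there are 4 distinct events indicating startup/shutdown or status events as well as a reconnect event if there was a previous connection failure (collector):
init: nDPId startup
reconnect: the collector connection was lost previously and has been re-established
shutdown: nDPId terminates gracefully
status: status information about the daemon itself
Detailed JSON-schema is available here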
Packet events: there are 2 events containing base64 encoded packet payloads either belonging to a flow or not:
packet: does not belong to any flow
packet-flow: belongs to a flow
Detailed JSON-schema is available here
Flow events: there are 9 distinct events related to a flow, among them:
analyse: disabled by default, enable with -A
guessed: libnDPI was not able to reliably detect a layer7 protocol and falls back to IP/port-based detection
detected: libnDPI successfully detected a layer7 protocol
detection-update: libnDPI dissected more layer7 protocol data (after detection was already done)
Detailed JSON-schema is available here. Also, a graphical representation of the Flow Events timeline is available here.
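To route or count messages by category, a consumer can check which `*_event_name` key a message carries. The keys `flow_event_name` and `packet_event_name` appear in the stream examples above; that daemon and error events use `daemon_event_name` and `error_event_name` is an assumption to verify against the schema files. The helper names below are illustrative.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Assumed key prefixes, one per event category ("<category>_event_name").
CATEGORIES = ("error", "daemon", "packet", "flow")


def classify(msg: dict) -> Optional[Tuple[str, str]]:
    """Return (category, event name), or None if no known key is present."""
    for category in CATEGORIES:
        name = msg.get(f"{category}_event_name")
        if name is not None:
            return category, name
    return None


def tally(messages: Iterable[dict]) -> Counter:
    """Count how often each (category, event name) pair occurs."""
    return Counter(c for c in map(classify, messages) if c is not None)
```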
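A flow can have three different states while it is being tracked by nDPId:
skipped: the flow is tracked, but no detection happens, to reduce memory usage (see the command line arguments -I and -E)
finished: detection finished and the memory used for the detection is freed
info: detection is in progress and all flow memory required for libnDPI is allocated (this state consumes most memory)
The nDPId build system is based on CMake: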
git clone https://github.com/utoni/nDPId.git
[...]
cd nDPId
mkdir build
cd build
cmake ..
[...]
make
see below for a full/test live-session
Depending on your build environment and/or requirements, you may need:
mkdir build
cd build
ccmake ..
or, to build with a statically linked libnDPI:
cmake -S . -B ./build \
-DSTATIC_LIBNDPI_INSTALLDIR=[path/to/your/libnDPI/installdir] \
-DNDPI_NO_PKGCONFIG=ON
cmake --build ./build
If you use the latter, make sure that you've configured libnDPI with ./configure --prefix=[path/to/your/libnDPI/installdir] and remember to set all necessary CMake variables to link against the shared libraries used by your nDPI build.
You'll also need to use -DNDPI_NO_PKGCONFIG=ON if STATIC_LIBNDPI_INSTALLDIR does not contain a pkg-config file.
e.g.:
cmake -S . -B ./build \
-DSTATIC_LIBNDPI_INSTALLDIR=[path/to/your/libnDPI/installdir] \
-DNDPI_NO_PKGCONFIG=ON \
-DNDPI_WITH_GCRYPT=ON -DNDPI_WITH_PCRE=OFF -DNDPI_WITH_MAXMINDDB=OFF
cmake --build ./build
Or let a shell script do the work for you:
cmake -S . -B ./build \
-DBUILD_NDPI=ON
cmake --build ./build
The CMake cache variable -DBUILD_NDPI=ON builds a version of libnDPI residing as a git submodule in this repository.
As mentioned above, in order to run nDPId, a UNIX-socket needs to be provided to stream the related JSON data.
Such a UNIX-socket can be provided either by the included nDPIsrvd daemon or, if you simply need a quick check, by the ncat utility, with a simple ncat -U /tmp/listen.sock -l -k. Remember that OpenBSD netcat is not able to handle multiple connections reliably.
Once the socket is ready, you can run nDPId to capture and analyze your own traffic, with something similar to: sudo nDPId -c /tmp/listen.sock
If you're using OpenBSD netcat, you need to run: sudo nDPId -c /tmp/listen.sock -o max-reader-threads=1
Make sure that the UNIX socket is accessible by the user that nDPId switches to (see -u; default: nobody).
Of course, both ncat and nDPId need to point to the same UNIX-socket; nDPId provides the -c option exactly for this. By default, nDPId uses /tmp/ndpid-collector.sock, the same default path that nDPIsrvd uses for its incoming socket.
Give nDPId some real traffic. You can capture your own traffic with something similar to:
socat -u UNIX-Listen:/tmp/listen.sock,fork - # does the same as `ncat`
sudo chown nobody:nobody /tmp/listen.sock # default `nDPId` user/group, see `-u` and `-g`
sudo ./nDPId -c /tmp/listen.sock -l
nDPId also supports UDP collector endpoints:
nc -d -u 127.0.0.1 7000 -l -k
sudo ./nDPId -c 127.0.0.1:7000 -l
or you can generate an nDPId-compatible JSON dump with:
./nDPId-test [path-to-a-PCAP-file]
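The UDP collector endpoint shown above (-c 127.0.0.1:7000) could also be served by a small Python receiver instead of nc. This sketch assumes that each datagram carries exactly one message with the same 5-digit length prefix used on the UNIX socket; that framing, the default address and the helper names are assumptions, so check them against real datagrams first.

```python
import json
import socket
from typing import Iterator


def decode_datagram(data: bytes) -> dict:
    """Decode one datagram, assumed to hold a 5-digit length prefix plus JSON."""
    length = int(data[:5])
    payload = data[5:5 + length]
    if len(payload) != length:
        raise ValueError("datagram shorter than its length prefix")
    return json.loads(payload)


def udp_messages(host: str = "127.0.0.1", port: int = 7000) -> Iterator[dict]:
    """Listen on host:port (the address passed to nDPId via -c) and yield messages."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, _addr = sock.recvfrom(65535)  # one datagram per message (assumed)
            yield decode_datagram(data)
```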
You can also start both nDPId and nDPIsrvd automatically with:
Daemons:
make -C [path-to-a-build-dir] daemon
Or a manual approach with:
./nDPIsrvd -d
sudo ./nDPId -d
or for a usage printout:
./nDPIsrvd -h
./nDPId -h
And why not a flow-info example?
./examples/py-flow-info/flow-info.py
or anything below ./examples.
It is possible to change nDPId internals without recompiling by using -o subopt=value.
But be careful: changing the default values may render nDPId useless and is not well tested.
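When nDPId is launched from a script, the -o arguments can be assembled programmatically. The Python sketch below only uses flags that appear in this README (-c and -o subopt=value); passing one -o per suboption, the default binary path and the helper name `ndpid_argv` are assumptions. It rejects names missing from this README's suboption list to catch typos before nDPId sees them.

```python
import shlex
from typing import Dict, List, Union

# Suboption names as listed in this README.
KNOWN_SUBOPTS = {
    "max-flows-per-thread", "max-idle-flows-per-thread", "max-reader-threads",
    "daemon-status-interval", "compression-scan-interval",
    "compression-flow-inactivity", "flow-scan-interval",
    "generic-max-idle-time", "icmp-max-idle-time", "udp-max-idle-time",
    "tcp-max-idle-time", "tcp-max-post-end-flow-time",
    "max-packets-per-flow-to-send", "max-packets-per-flow-to-process",
    "max-packets-per-flow-to-analyze", "error-event-threshold-n",
    "error-event-threshold-time",
}


def ndpid_argv(collector: str, subopts: Dict[str, Union[int, str]],
               binary: str = "./nDPId") -> List[str]:
    """Build an nDPId command line with one '-o subopt=value' pair per entry."""
    argv = [binary, "-c", collector]
    for name, value in subopts.items():
        if name not in KNOWN_SUBOPTS:
            raise ValueError(f"unknown suboption: {name}")
        argv += ["-o", f"{name}={value}"]
    return argv


# Mirrors the OpenBSD netcat invocation shown earlier (without sudo);
# prints: ./nDPId -c /tmp/listen.sock -o max-reader-threads=1
print(shlex.join(ndpid_argv("/tmp/listen.sock", {"max-reader-threads": 1})))
```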
Suboptions for -o:
Format: subopt (unit, comment): description
max-flows-per-thread (N, caution advised): affects max. memory usage
max-idle-flows-per-thread (N, safe): max. allowed idle flows whose memory gets freed after flow-scan-interval
max-reader-threads (N, safe): amount of packet processing threads, every thread can have a max. of max-flows-per-thread flows
daemon-status-interval (ms, safe): specifies how often the daemon event status is generated
compression-scan-interval (ms, untested): specifies how often nDPId scans for inactive flows ready for compression
compression-flow-inactivity (ms, untested): the shortest period of time elapsed before nDPId considers compressing a flow (e.g. nDPI flow struct) that neither sent nor received any data
flow-scan-interval (ms, safe): min. amount of time after which nDPId scans for idle or long-lasting flows
generic-max-idle-time (ms, untested): time after which a non TCP/UDP/ICMP flow times out
icmp-max-idle-time (ms, untested): time after which an ICMP flow times out
udp-max-idle-time (ms, caution advised): time after which a UDP flow times out
tcp-max-idle-time (ms, caution advised): time after which a TCP flow times out
tcp-max-post-end-flow-time (ms, caution advised): a TCP flow that received a FIN or RST waits this amount of time before flow tracking stops and the flow memory is freed
max-packets-per-flow-to-send (N, safe): max. packet-flow events generated for the first N packets of each flow
max-packets-per-flow-to-process (N, caution advised): max. amount of packets processed by libnDPI
max-packets-per-flow-to-analyze (N, safe): max. packets to analyze before sending an analyse event, requires -A
error-event-threshold-n (N, safe): max. error events to send until threshold time has passed
error-event-threshold-time (N, safe): time after which the error event threshold resets
The recommended way to run regression / diff tests:
cmake -S . -B ./build-like-ci \
-DBUILD_NDPI=ON -DENABLE_ZLIB=ON -DBUILD_EXAMPLES=ON
# optional: -DENABLE_CURL=ON -DENABLE_SANITIZER=ON
./test/run_tests.sh ./libnDPI ./build-like-ci/nDPId-test
# or: make -C ./build-like-ci test
Run ./test/run_tests.sh to see some usage information.
Remember that all test results are tied to a specific libnDPI commit hash as part of the git submodule. Using test/run_tests.sh for other commit hashes will most likely result in PCAP diffs.
You may generate code coverage by using:
cmake -S . -B ./build-coverage \
-DENABLE_COVERAGE=ON -DENABLE_ZLIB=ON
# optional: -DBUILD_NDPI=ON
make -C ./build-coverage coverage-clean
make -C ./build-coverage clean
make -C ./build-coverage all
./test/run_tests.sh ./libnDPI ./build-coverage/nDPId-test
make -C ./build-coverage coverage
make -C ./build-coverage coverage-view
Special thanks to Damiano Verzulli (@verzulli) from GARRLab for providing server and test infrastructure.