When trying to figure out why the internet is slow, it can be hard to learn exactly which device on the network is eating up all the bandwidth. Many solutions to this problem require software to be installed on every device to be monitored. Instead, I tried to build a custom Raspberry Pi network monitor.
This post will show you how to monitor all internet traffic for every device on your network, without buying any specialty hardware.
This open-source solution has been used by readers of this site for monitoring family internet usage, LAN parties, and more.
This project came about when retrofitting our cabin in the woods into a smart home. The internet providers out here advertise about 30 Mbps (down) and 2 Mbps (up). This isn’t much to work with. Plus, we have many home-made IOT devices scattered around the house. So… who knows where the traffic is going?
My goal was to not require any special software, yet monitor the internet traffic for every network device.
Ideally, I also wanted to create a beautiful Grafana dashboard. This would let me see what sites the devices were contacting with the Raspberry Pi network monitor.
It seemed like a couple of others on Reddit were also interested in a solution to this problem, so I decided to give it a try. The result was an open-source Python script / Docker container, meant to be run on a Raspberry Pi, that exports data to Prometheus. While I set up a Raspberry Pi, the code should run on any Linux distribution. There are Docker images for both arm and amd.
Skip to the end of this post for source code & installation.
But first, let’s consider the different ways to monitor traffic on a home network…
Internet Traffic Monitor: Approaches
Based upon experience and some research, these are the possibilities I came up with:
Pi as a router: The obvious way to monitor network traffic. The Raspberry Pi sits between the devices to be tracked and the internet (e.g., acting as a router or access point). Unfortunately, this can slow down the network, which causes many to avoid the approach (see the next section).
Router reporting: Some modern routers provide features along these lines, but custom firmware is generally required.
Device reporting: The standard protocol for this is SNMP, which relies on device-side installations to self-report. It integrates well with Prometheus/Grafana, though.
Packet sniffing: You could theoretically monitor the wireless traffic (if all you care about is WiFi). This is the same concept that allows attackers to sniff traffic on a WiFi network.
Each of these has its drawbacks. I did not want to buy a new router, so router reporting was not an option. I could not install the necessary software on all the IOT devices, which prevents device reporting. And packet sniffing is an interesting idea, but I wanted to be able to handle wired as well as wireless traffic.
This left only one approach: set up a Raspberry Pi network monitor.
Raspberry Pi Home Network Monitor
If all internet traffic is going to pass through a device, it is good to use caution.
The first concern is that of security. I won’t say too much about that here, except to mention that a firewall of some kind is a good idea. I went with Uncomplicated Firewall (ufw) because it is, well, uncomplicated.
A less obvious concern is that of speed. When traffic passes through a router/switch, the primary bottleneck is the ethernet hardware. In other words, the CPU and RAM are not as important as in other cases. This was something of a problem with the Raspberry Pi 3B (and lower). However, the Raspberry Pi model 4 has an upgraded on-board 1000 Mbps eth0 port.
Make sure that the ethernet hardware meets the needs.
Failure to do so could slow down the entire network!
With that in mind, here is the exact list of parts I used.
The WiFi router is in Bridge mode. This means that eth0 must act as the DHCP server (assign IP addresses to the network).
Traffic between devices on the network will not flow through the Raspberry Pi. See the Performance Tests, below.
This means that the Pi is only a bottleneck for internet traffic. With 1000 Mbps hardware and an ISP that only provides 30 Mbps, we won’t be hitting this limit any time soon.
There are many ways to set up the eth1 <> eth0 connection. You could configure this using internet bonding software. This would let you add another internet connection (eth2) to make the internet connection even faster. For a more complete DIY Raspberry Pi router solution:
Running a custom router gives unprecedented insight into everything happening in a network. Building your own router with a Raspberry Pi may be a little daunting, but it's surprisingly easy and rewarding to do... and the benefits are tremendous.
If you’re not using the Raspberry Pi as a router, this section is for you.
The traffic must flow through the device.
You must have two network interfaces over which the traffic you wish to capture passes.
You could flip the WiFi router and Raspberry Pi network monitor from the above diagram. This approach can work better if you prefer to not use the Raspberry Pi as a router:
The WiFi router connects to the modem/internet (not in bridge mode).
The Raspberry Pi connects to the internet through the WiFi router.
The Raspberry Pi should have a static IP assigned by your WiFi router (see its documentation).
However, it does have one major disadvantage: the WiFi traffic (going to the router) will not be monitored. But the major advantage is: if you ever want to remove the Raspberry Pi network monitor, just plug the WiFi router directly into the switch.
You could also run a separate DHCP server on the WAN side of the Raspberry Pi. In this case, again, the Pi is not the router. However, if the two network interfaces are bridged, then the traffic is flowing through the Pi.
No matter the design, the device acting as the router connects to the internet, and the device connected to the switch is in bridge mode. In other words, you must manually bridge the two interfaces on the Raspberry Pi. Therefore, the Pi’s eth0 is able to see traffic passing in and out of the LAN.
Performance Tests
After implementing the Pi as a router, I saw no decrease in speed for intra-network traffic. This was tested with iperf3. It showed:
~910 Mbits/sec for two computers connected via a physical switch.
~180 Mbits/sec when separated by a long WiFi hop.
Shockingly, I saw improved external (internet) speeds with the Raspberry Pi network monitor. I already had Node RED running a speedtest every 5 minutes and recording the data to Home Assistant + Prometheus. When using the CenturyLink-provided DSL router, I rarely saw speeds above 25 Mbps (down). Now we are consistently seeing speeds in the ~33 Mbps range. This is likely because the Raspberry Pi is using pppoeconf to establish the DSL connection directly, and it does a better job managing this connection than the modem provided by the ISP.
Accuracy Tests
Now, to test the accuracy of the Raspberry Pi network monitor.
Using Prometheus for throughput/bandwidth will not be perfectly accurate on a short time scale. This is due to the way a rate is averaged over an interval. However, by downloading a large file, I was able to compare the reported download speed from Chrome with that of the traffic graph:
In addition, the total download size matched that reported by Chrome:
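To illustrate why a rate averaged over an interval hides short bursts while the total still adds up, here is a minimal sketch. It is a simplification and not Prometheus' exact rate() calculation; the sample values are hypothetical.

```python
def average_rate(samples, window_seconds):
    """Average per-second increase of a counter over the trailing window.

    `samples` is a list of (timestamp_seconds, counter_value) pairs in
    ascending time order. Simplified on purpose: no extrapolation and no
    counter-reset handling, unlike Prometheus' real rate().
    """
    end_time = samples[-1][0]
    in_window = [s for s in samples if s[0] >= end_time - window_seconds]
    if len(in_window) < 2:
        raise ValueError("need at least two samples inside the window")
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical byte counter scraped every 15 s: idle, one 15 s burst of
# 30 MB, then idle again.
samples = [(0, 0), (15, 0), (30, 30_000_000), (45, 30_000_000), (60, 30_000_000)]

burst_only = average_rate(samples[:3], window_seconds=15)  # window covers only the burst
smoothed = average_rate(samples, window_seconds=60)        # same burst, averaged over 60 s
total_bytes = samples[-1][1] - samples[0][1]               # unaffected by the window size

print(burst_only, smoothed, total_bytes)
```

The wider the window, the lower the peak appears on the graph, but the counter's total increase is the same either way, which is why the total download size is the more reliable comparison.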
Screenshots, Installation, & Source Code
This project is open-source. It is available as a Python script or Docker image.
The most important part of the configuration is setting up the tcpdump filters. For example, the following will restrict the captured traffic to that which flows in or out of the 192.168.0.0/24 subnet:
(src net 192.168.0.0/24 and not dst net 192.168.0.0/24) or (dst net 192.168.0.0/24 and not src net 192.168.0.0/24)
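If your LAN uses a different subnet, the filter has to be rewritten in four places. As a convenience, a small helper like the following could generate it. This is only a sketch; boundary_filter is a hypothetical name and is not part of the project.

```python
import ipaddress


def boundary_filter(subnet: str) -> str:
    """Build a tcpdump filter for traffic that crosses the edge of `subnet`.

    Traffic that stays inside the subnet (LAN to LAN) is excluded.
    """
    # Validate the input; raises ValueError for things like "192.168.0.1/24".
    net = ipaddress.ip_network(subnet, strict=True)
    outbound = f"(src net {net} and not dst net {net})"
    inbound = f"(dst net {net} and not src net {net})"
    return f"{outbound} or {inbound}"


if __name__ == "__main__":
    print(boundary_filter("192.168.0.0/24"))
```

For 192.168.0.0/24, this prints the same filter string shown above.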
I’ll try to give copy and paste instructions below. However, as I mentioned in the “Using the Pi as a Router” section, some of these steps may be highly individual. The bridge steps, in particular, can depend on the exact Linux version you have installed. I’ll assume you…
Have eth0 (LAN) and eth1 (WAN) on your device.
Pi will reside at 192.168.0.1, either as the router or as a pass-through (alternate design).
IP addresses will be handed out on the 192.168.0.0/24 subnet.
You want to use Google’s DNS servers (8.8.8.8 and 8.8.4.4)
Running Raspbian Buster.
1. Build or Configure the Router
If you wish to use the Raspberry Pi as the router (first option), please see this article. If you wish for the WiFi router to connect to the internet (alternate design), follow its instruction manual to assign the Pi a static IP address (192.168.0.1 in this example).
2. Run the Raspberry Pi network monitor script
If you’re comfortable with it, a Docker/Kubernetes install may be easier than these manual steps. Otherwise…
sudo apt-get install git python3-pip tcpdump
sudo pip3 install argparse prometheus_client
git clone https://github.com/zaneclaes/network-traffic-metrics.git
cd ./network-traffic-metrics
sudo python3 ./network-traffic-metrics.py "(src net 192.168.0.0/24 and not dst net 192.168.0.0/24) or (dst net 192.168.0.0/24 and not src net 192.168.0.0/24)"
Open your web browser to http://192.168.0.1:8000/metrics to see the counters being exported for Prometheus. Verify that you’re seeing data that seems to match the traffic on your network. It may be hard to read, but upon refreshing the page you should see more lines added as people connect to different sites.
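If the page is too dense to eyeball, you could count the exported samples from a script instead. The following is a rough sketch using only the standard library. The URL is the one from this example setup, and it simply counts non-comment lines in the Prometheus text format without assuming any particular metric names.

```python
from urllib.request import urlopen


def count_samples(metrics_text: str) -> int:
    """Count sample lines (not blank, not '#' comments) in Prometheus text output."""
    return sum(
        1
        for line in metrics_text.splitlines()
        if line.strip() and not line.startswith("#")
    )


def fetch_metrics(url: str = "http://192.168.0.1:8000/metrics") -> str:
    """Download the raw metrics page; adjust the URL to your Pi's address/port."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")
```

Calling count_samples(fetch_metrics()) a few minutes apart should give a growing number as devices contact new sites.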
You can change several options by passing certain command-line flags to the script. For example, the script assumes that you want to listen to the eth0 interface. If you aliased this to the name lan, per the Raspberry Pi router guide, you could add the --interface lan flag. Or, the --port 80 flag would change from listening on port 8000 to port 80. Note that all the configuration variables may also be set via environment variables, like NTM_INTERFACE.
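For readers curious how a flag can fall back to an environment variable, here is one common pattern with argparse. It is a sketch of the general idea, not the script's actual code: NTM_INTERFACE comes from the text above, while NTM_PORT and the build_parser name are assumptions made for illustration.

```python
import argparse
import os


def build_parser(env=os.environ):
    """Create a parser whose defaults come from environment variables.

    Precedence: command-line flag, then environment variable, then the
    built-in default.
    """
    parser = argparse.ArgumentParser(description="Flag/env-var fallback sketch")
    parser.add_argument("--interface", default=env.get("NTM_INTERFACE", "eth0"))
    # NTM_PORT is an assumed variable name, used here only as an example.
    parser.add_argument("--port", type=int, default=int(env.get("NTM_PORT", "8000")))
    parser.add_argument("filters", nargs="?", default="")
    return parser


if __name__ == "__main__":
    # Parse a fixed example command line rather than sys.argv.
    args = build_parser().parse_args(["--port", "80", "(src net 192.168.0.0/24)"])
    print(args.interface, args.port, args.filters)
```

Run the real script with --help to see which flags and variables it actually supports.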
To make the script start on reboot, type sudo crontab -e and add:
@reboot python3 /home/pi/network-traffic-metrics/network-traffic-metrics.py "(src net 192.168.0.0/24 and not dst net 192.168.0.0/24) or (dst net 192.168.0.0/24 and not src net 192.168.0.0/24)" &
Alternatively, you could create a systemd service (generally preferred).
3. Install Prometheus
Prometheus and Grafana can be run anywhere on the same network. I’d recommend not running them on the same device doing the metric exporting. This prevents slowing down the machine. The following instructions are copied more or less exactly from the official Prometheus docs:
Edit prometheus.yml (which should have been included in the download). See the two comments in this sample file. In particular, make sure that the last line matches the IP/port of the device running the metrics script from the last step.
global:
  scrape_interval: 15s # How frequently to report
  external_labels:
    monitor: 'network-traffic-metrics'
scrape_configs:
  - job_name: 'network-traffic-metrics'
    static_configs:
      - targets: ['192.168.0.1:8000'] # The Network Traffic Metrics IP/port
Run Prometheus: ./prometheus --config.file=prometheus.yml
Check that you can access Prometheus: localhost:9090/metrics (or wherever it is located).
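Beyond loading the page, it helps to confirm that Prometheus is actually scraping the Pi. Prometheus exposes its scrape targets at /api/v1/targets, so a short script can flag any target that is not healthy. This is a sketch that assumes Prometheus is reachable at localhost:9090, as above; the function names are my own.

```python
import json
from urllib.request import urlopen


def unhealthy_targets(payload: dict) -> list:
    """Return the scrape URLs of active targets whose health is not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [t.get("scrapeUrl", "?") for t in targets if t.get("health") != "up"]


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Query the Prometheus HTTP API for the current scrape targets."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as response:
        return json.load(response)
```

An empty list from unhealthy_targets(fetch_targets()) suggests the target in prometheus.yml is being scraped. If the Pi's address shows up in the list, double-check the IP/port in the last line of the config.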
4. Install Grafana
Again, the official docs are a good place to start (these are copied fairly directly):
Open the side menu by clicking the Grafana icon in the top header.
In the side menu under the Dashboards link you should find a link named Data Sources.
Click the + Add data source button in the top header.
Select Prometheus from the Type dropdown.
The default options should match your installation from above. If you’ve used containers or otherwise installed Prometheus differently, you will need to use the appropriate URL for the Prometheus server. For example, I used the URL http://prometheus-server with a Kubernetes helm deployment. If your data source is configured correctly, you should now be able to use the Explore section to see the data in Prometheus.
You should now have a working Raspberry Pi network monitor that can be accessed from Grafana. The only thing left to do is make sure you have the filters set correctly. When you open the dashboard, at the top of the screen, are the settings which configure what data are shown:
LocalIPs: the IP addresses on your LAN to show.
Services: e.g., http, https
Protos: e.g., tcp, udp
ExcludedServers: regex for servers (outside your network) to hide.
You should verify that your local IPs show up in the dropdown, and that you have not excluded any data you might want to be visualizing. For example, I intentionally filter speedtests and similar such traffic, as they add noise to the data:
If your IP addresses are not showing up there, it is likely the case that the IP addresses on your local network do not conform to local subnet ranges. To fix this, open up the dashboard settings and look at the LocalIP variable’s regex. This regex filters all possible values, showing you the values which meet the regex at the bottom of the screen. You’ll need to modify the regex so that the IP addresses on your network show up in the “Preview of Values:”
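Before editing the variable, it can be easier to try a candidate regex against a few addresses outside of Grafana. The pattern below is a hypothetical example that matches the common private IPv4 ranges; it is not necessarily the regex shipped with the dashboard, so treat it as a starting point.

```python
import re

# Hypothetical pattern: 10.x.x.x, 172.16.x.x through 172.31.x.x, and 192.168.x.x.
LOCAL_IP = re.compile(r"^(10\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.)")


def is_local(ip: str) -> bool:
    """Return True if `ip` starts with one of the private-range prefixes."""
    return LOCAL_IP.match(ip) is not None


for candidate in ["192.168.0.12", "10.0.0.5", "172.20.1.1", "8.8.8.8"]:
    print(candidate, is_local(candidate))
```

If your LAN uses addresses outside these ranges, extend the alternation with your own prefix and check it the same way before pasting it into the dashboard settings.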
Can you please let me know what is wrong? However, I encounter following errors while running this step
2.Run the Raspberry Pi network monitor script. pi@raspberrypi:~/Downloads/network-traffic-metrics $ sudo python ./network-traffic-metrics.py (src net 192.168.0.0/24 and not dst net 192.168.0.0/24) bash: syntax error near unexpected token `(‘
My python is version 3.7 pi@raspberrypi:~/Downloads/network-traffic-metrics $ python --version Python 3.7.0
I have installed both argparse and prometheus_client
pi@raspberrypi:~/Downloads/network-traffic-metrics $ pip3 install prometheus_client Looking in indexes: [link to pypi.org], [link to www.piwheels.org] Requirement already satisfied: prometheus_client in /home/pi/.local/lib/python3.7/site-packages (0.8.0)
pi@raspberrypi:~/Downloads/network-traffic-metrics $ pip3 install argparse Looking in indexes: [link to pypi.org], [link to www.piwheels.org] Requirement already satisfied: argparse in /home/pi/.local/lib/python3.7/site-packages (1.4.0)
Hey, glad you enjoyed it! And sorry, this was my mistake — looks like the example got garbled when I formatted the code. You need quotation marks around the filters argument, so that the command is: sudo python ./network-traffic-metrics.py "(src net 192.168.0.0/24 and not dst net 192.168.0.0/24)". The post should be updated with the fix now, as well.
In case you’re curious, it’s because the entire filter clause is a single argument into the python script. Without the quotes, bash is trying to parse the arguments itself before passing them to the script, which it does not know how to do.
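The effect of the quotes can be seen with Python's shlex module, which approximates POSIX shell word splitting. This is only an illustration: shlex does not reproduce bash's syntax error on unquoted parentheses, but it does show how the quoted filter arrives as a single argument.

```python
import shlex

filters = "(src net 192.168.0.0/24 and not dst net 192.168.0.0/24)"

# With double quotes, the whole filter is one argument to the script.
quoted = shlex.split(f'sudo python3 ./network-traffic-metrics.py "{filters}"')
print(len(quoted), quoted[-1])

# Without quotes, the filter is broken apart at every space.
unquoted = shlex.split(f"sudo python3 ./network-traffic-metrics.py {filters}")
print(len(unquoted))

# shlex.quote produces a form that is safe to paste into a shell.
print(shlex.quote(filters))
```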
Thanks for the quick response! I understand what could have been the issue. However, I am hitting another error. Sorry. I’m really bad at regex, so no idea what’s wrong here.
pi@raspberrypi:~/Downloads/network-traffic-metrics $ sudo python ./network-traffic-metrics.py “(src net 192.168.0.0/24 and not dst net 192.168.0.0/24)” File “./network-traffic-metrics.py”, line 29 return f'(?P{pattern})’ ^ SyntaxError: invalid syntax
Ah. I suspect that you have both python2 and python3 installed. To check, try python --version. Whatever it says is the default Python version on your machine.
If it is lower than 3, as I suspect, first make sure that which python3 works. You should discover the location of the actual python3 executable. If that exists, the easiest “fix” is to replace sudo python ... with sudo python3 .... Another approach would be to make an alias from /usr/bin/python to python3 so that v3 becomes your default python environment. Or you could use a python version management tool. Which approach is right for you depends on your circumstance, but the first is probably the least hassle.
Thanks! Yes! Indeed. My python installations seems to be a mess. I will fix that first. Thanks again.
Nice guide. I’m trying a slightly different approach – using one Docker container for Grafana and another for Prometheus. With that in mind, when creating the dashboard in the last step, the JSON file has the “DS_PROMETHEUS” that won’t work. Is there any way to make this work with this Docker setup? What do I have to change?
Thanks for saying so! FWIW, I do have one container for Grafana and one for Prometheus, just like you. The difference may be that I use Kubernetes to deploy the containers, not the Docker agent. The problem you’re experiencing suggests that you do not have Prometheus configured as a “Data Source” in Grafana, or that somehow the name of that datasource does not match the convention (DS_PROMETHEUS).
Hi, this is probably perfect for what I’m looking for. The instructions seem good too. One question, I would probably do this for the non-wifi devices so keep my router as the internet connection. That has a 192.168.1.1 ip address and i’d like to keep the other devices on the same subnet. Does the pi HAVE to be 192.168.0.1 or could it be 192.168.1.2 meaning all other devices which have static IP addresses would not have to change?
Glad to hear it 🙂 In that case, what you describe is actually preferable. The subnet I used was for example purposes. If your Pi is behind the router, it’s easiest to keep it on the same subnet. To do so, you just want to bridge the traffic between the two interfaces without adding a DHCP server or anything of that nature. This link might help: Bridging eth0 and eth1. If that doesn’t work or you need more help lmk.
I’m keeping the router as the DHCP server and will connect the pi to that and have the router set the static address.
Do I then just follow the alternate settings or do I still need to do the bridging eth0 and eth1 step in your original reply. and then install what after that?
Hello and thank you so much for the guide! I’m running into a problem running the monitor script, and I’d appreciate your help.
I’ve confirmed that pip3 installed prometheus_client, but when I run the script straight out of the box, I get [code]ModuleNotFoundError: No module named ‘prometheus_client'[/code] I tried to fix this by adding sys.path.append() on line 2 to the path pip3 gave for prometheus_client. That cleared up that error, but gave me this one:
File “/usr/lib/python3.7/subprocess.py”, line 1522, in _execute_child raise child_exception_type(errno_num, err_msg, err_filename) FileNotFoundError: [Errno 2] No such file or directory: ‘tcpdump’: ‘tcpdump’
It sounds like you’re having Python environment problems. You may have multiple versions of Python3 installed and/or some strange symlinks, given that python3 couldn’t find the package installed by pip3. Specifically, it sounds like Python3 is not referencing the same import paths as pip3 is installing to. If forcing an absolute path worked on the import, great, though I do tend to worry this may be causing other problems down the line.
The second problem suggests that you don’t have tcpdump installed on the machine, or that Python3 cannot find it. The latter would match with your prior problem. Namely, that the shell environment from which Python3 is being run is not resolving your user paths. For example, on my RPi, which tcpdump gives /usr/sbin/tcpdump. If that command also works for you, it suggests that you’re invoking Python3 in such a way that this requirement is not resolved in the same way it is from your shell. You could just edit line 71 of the script to invoke the absolute path to tcpdump, just like you did with the prior problem, but again I worry that your shell environment may continue to cause problems.
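One way to compare environments is to run the same small check with python3 and again with sudo python3, then look for differences. The following is a sketch using only the standard library; describe_environment is a hypothetical helper name.

```python
import importlib.util
import shutil
import sys


def describe_environment(modules, tools):
    """Report which interpreter is running and what it can find."""
    report = {"python": sys.executable}
    for name in modules:
        # True if this interpreter can import the module.
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    for name in tools:
        # Absolute path if the tool is on this process's PATH, else None.
        report[f"tool:{name}"] = shutil.which(name)
    return report


if __name__ == "__main__":
    for key, value in describe_environment(["prometheus_client"], ["tcpdump"]).items():
        print(key, value)
```

If the two runs print different interpreter paths, or one run finds prometheus_client and the other does not, the package was installed for a different user or interpreter than the one running the script.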
For the Python issue, I tried defining PYTHONPATH in my .bashrc, but that didn’t help. So, for now I stuck with the edit on your script.
The tcpdump issue was simply that it wasn’t installed. Easy fix.
After I got those resolved, I started the script and received an error that eth0 couldn’t be found. Following the router guide, I had given eth0 the alias ‘lan’ and eth1 ‘wan’. So I updated the one instance of eth0 and now it seems to be running happily.
“(src net 192.168.1.0/24 and not dst net 192.168.1.0/24) or (dst net 192.168.1.0/24 and not src net 192.168.1.0/24)” tcpdump: listening on lan, link-type EN10MB (Ethernet), capture size 262144 bytes [SKIP] 06:33:43.603149 IP
Ah! Thanks for pointing out the lan/wan alias thing. I had added that “convenience” to the guide after the initial development of the tool. Glad you got it working!
By the way: rather than editing the script, you can just add “--interface lan” to the script’s arguments. If you run the script with just the --help flag, you can see that it accepts several command-line flags to change its default settings (here’s the bit of code which parses those arguments).
I love this post – this is EXACTLY what I have been looking for, for like six months. I’ve even converted one of my old machines to pfsense just to do this, but had to scrap it because it was a big waste, even as VM.
Question: Would this conflict with PiHole? It is my DHCP provider at the moment.
Also a problem, I am running this: sudo python3 ./network-traffic-metrics.py “(src net 192.168.1.0/24 and not dst net 192.168.1.0/24) or (dst net 192.168.1.0/24 and not src net 192.168.1.0/24)” and receiving this error “ModuleNotFoundError: No module named ‘prometheus_client'” I’ve tried installing pyEnv to fix this but no dice.
Do you happen to have a docker container or yml with everything running? That would be much easier for people.
I don’t see any reason this would conflict with PiHole. I run AdGuard Home myself these days, but I used to run PiHole. You should be able to substitute the DHCP server for PiHole’s.
Your problem with prometheus_client being missing is likely because you need to use sudo pip3 install. If you install without the sudo, but then try to run sudo python3, you’ll be using a different environment.
This is working now – the graphs are empty in Grafana, though. The question I have is why is it not doing a passthrough from eth0 (WAN) to eth1 (LAN)? I don’t think this was part of this guide. Do I need to manually bridge the interfaces?
You have already mentioned that you are not using the “Pi as a Router” guide (no DHCP server). Therefore, technically you fall under the “Alternate Design” header in the tutorial (which indicates you must bridge the two interfaces). This would also explain the lack of data in Grafana. FWIW, I’d recommend taking the time to stop at the Prometheus step and perform the recommended validations, and then go “Explore” the data in Grafana, so you understand what is actually going on.
Note that in the Alt design section I mainly referred to a Wifi router. You need to reason about your network and ensure that the traffic you wish to capture is flowing through your Pi, one way or another. That’s the key. As long as that is true, then the script will capture the data across those two network interfaces. There are lots of ways to do this, and without analyzing a network diagram of your setup it is hard to be perfectly accurate in my descriptions =/
No worries you put me on the right track after all – I am still new to Prometheus and grafana so I’ll play around with them. Yes – regarding the network setup I’ve already done that and everything should be flowing from the PI once I am done with the bridging. VDSL Router (No Wifi) => PI => Switch => Rest of the network. So nothing will connect to the internet without hitting the PI first.
Was unable to reply to your last comment for some reason so I put this here.
Many thanks for your support so far.
HI,
Could you please expand on how to set up the alternate design? It seems like wifi traffic cannot be captured in this design.
In the “Pi as a Router” guide which this page links to, there is a section dedicated to firewalls. tl;dr: I use firewalld with zone-based routing. Start with a “secure by default” mentality and only open the ports you need and you’ll be fine.
If you really don’t want to go that route, there is one other option, though it requires more hardware. Use another physical router upstream of the Pi. But keep your WiFi router in bridge mode, downstream of the Pi.
Hi! This may sound very very amateur, but I am extremely new to coding and program. I see a lot of helpful information online and I see the examples in the gray boxes. My question is where do you put those codes? What program are you using and how do you make those codes work on your computer? I get the process but I don’t get those at first initial step I’m very new. I use macOS and have a raspberry pi 4. I’m stuck at this very initial step.
Hello, this tutorial requires at least a beginner understanding of the “shell” (or “terminal” or “command prompt”). Every computer has one. I would recommend you follow some tutorials from the official Raspberry Pi website first and make sure you understand concepts like “SSH” and “sudo.” If you’re not at least familiar with these ideas, there is a good chance you will end up breaking something trying to follow upper-beginner tutorials like these.
Hello! I know about shell, Sudo, Linux, and I know perfecting SSH. My specialty is the command line. I just don’t know the first basics about coding. I’m taking a few online courses and they have been helping. I will be there soon. I have my raspberry pi-4 and speedify, I want to be able to monitor everything on my network- (like Lil Snitch without the GUI drivel). My question is what program or text editor are you plugging those commands in on. The ones in the boxes. Is it something like sublime or atom atom? Or is it something via the command line and terminal. That is my basic question and I appreciate the follow through because you are the only one I found online who still gives a damn!
everything I mean at work that’s why I chose is the tutorialis my thing command line is my thing I just don’t know what program those boxes are referring to is it a tech text editor like sublime or Adam’s at something else.
Hello? Thank you for the follow through, I really appreciate it. I do know the command line, SSH, and also the command prompt. Coding is my next hurdle and I plan to conquer that by the end of summer. I’m already enrolled in online classes and they’re actually paying off. I’ve seen these boxes all over the Internet and they seem helpful. Yet not a single site has ever said where they paste those codes. Is it into a text editor like Atom or Sublime? Or directly into the command prompt? Please help me out, I really appreciate it.
The gray boxes in the “Step-by-Step” section are shell commands. I’m assuming you can recognize common commands, like cd, tar, apt-get, etc. The last of these, for example, is how you install software on the Raspberry Pi. When it comes to editing files, it makes absolutely no difference which editor you use (which is why nobody talks about it). Thankfully, your shell will have a default editor built-in, like nano or vi, that lets you edit files from the command line (Google these commands for help).
If you haven’t made it far enough to understand these concepts, with all due respect, I highly suggest you start with a book or more basic resource than this post. I’m writing for a relatively technical audience, and need to skip over things I think they understand. If you don’t know how to edit a file from the command line, try googling that specific problem first. Then work your way up.
Of course I recognize those commands those are classic Linux commands. With all due respect, it’s answers like those that keep people silent and unknowing for so long. No I’m not too familiar with editing a file at the command line. Instead of telling me of what you think I don’t know you could at least provide links or books are helpful to beginners. I went from hopeful to now feeling shamed. “A true wizard teaches their apprentice without forethought …” *said like Tyrion Lannister*
This was not my intent at all. I am here responding to you because I wish to help. At this point, I’m honestly confused myself, though. It’s sometimes very hard to tell what someone’s skill level is. If I under-estimated yours, I apologize for sounding condescending by explaining “down” to you. You’re right to say that I could/should have provided you with direct links to resources to learn things. I actually Googled for some for you while I was writing my last comment, but I didn’t have the time to sort through them and try to figure out what would be most useful to you. My second paragraph was not meant to be snarky (though I can see how it would come across that way). Being able to Google your way out of something you don’t understand is, tbh, the most important skill I think exists wrt programming. I probably search Google over 100x a day for answers on computer questions I’m working on. I was hoping I could show you how to break down your question/problems into chunks and research them yourself.
Best, – Zane
Firstly, Zane, thank you very much for this write-up. I am very interested in getting this set up. Secondly, I am having a variety of issues starting with the correct setup of the RPi. I have tried to sign up for your mailing list multiple times and keep getting a system error from your end. I’m assuming this is what I need to get access to the build guide.
I am not using RPi hardware any longer. Rather, I am using Proxmox VM’s but setting them up as RPi’s. This has worked well. I have added a second USB Ethernet adapter and the basic Debian install is working. Python 3.7.3 is installed. I have another VM already running with Grafana, Influxdb and also successfully installed Prometheus.
I’m not sure where my problems are but believe once I can review the build instructions, I can troubleshoot it. Frankly, I am confused regarding the proper configuration of the ethernet. I am using a Cisco router with Ubiquiti network switches and AP. My home network is on 192.168.2.x.
Any assistance and guidance is appreciated. Thanks in advance.
Hi Mike, thanks. Not sure what you mean about the system error; the mailing list is hosted by Mailchimp. You can drop me a line on the contact page, if it helps.
I’ve never personally used Proxmox, though I use a LOT of Docker+Kubernetes. I can’t quite tell what your problem is based on the description, but I can say that VMs really mess with networking. In the case of Docker/K8s, everything happens on a virtual private bridge network… isolated from the main network. I’m sorry that I don’t know enough about Proxmox to comment if this is specifically related to your case or not. One way to test this would be to get into a command prompt inside the VM. If you can ping the Cisco router at 192.168.2.1 (or whatever), all is well.
Next, I’m not sure exactly what you’re trying to accomplish with the VM per se. The crux of this post is that the RPi can “spy” on the traffic flowing through it. It requires two physical ports, bridging the traffic from the LAN to the WAN. The Raspberry Pi must act as a pass-through for the traffic, which can be done via the two methods described in this post. Once that is the case, the github repo / script will work to collect stats on that traffic which is already being passed through the Pi.
Thanks, Zane. I tried again to sign up but was again rejected due to a “system problem”. If you are using MailChimp, it may have something to do with running PiHole on my network.
There is no magic to Proxmox. I have used Debian VM’s to replace all the utilities that I used to have running on RPi’s. I have no problem pinging my Cisco router. I have a second USB Ethernet adapter that is seen by the VM. I will try again. I also just learned that I can mirror a port on my Cisco router which may allow me to mirror the WAN port for monitoring. I’ll post an update in the coming days/week and let you know if I’ve gotten any further.
Got it. I guess I’m used to having to work around the K8s abstractions 😉 That’s interesting about port mirroring on the Cisco router. I had no idea such a feature existed (I’m just a hobbyist). But it seems like a great way to avoid the bandwidth bottlenecks you might otherwise impose with a Raspberry Pi pass-through (?).
After a few days of research and tweaks, I have this mostly working.
I did not use RPi’s since I did not want to purchase more RPi’s. I have several older RPi 3’s but did not want to invest in RPi 4. I moved to Proxmox VM on an Intel NUC some time ago for my utility devices.
I created a new VM using Debian 10 (Buster) and struggled a bit to add a second USB network adapter. But I finally succeeded in getting it working after tweaking /etc/network/interfaces. I now had two working NIC’s on the VM.
After researching promiscuous mode some more, I was able to set one adapter into promiscuous mode but was only able to read broadcast packets. And this made sense since I still was not reading ALL network traffic.
I have a Cisco SMB router and Ubiquiti switches and AP. I was able to mirror the outbound port of my Cisco SMB router (all network traffic) to another network port, which I plugged into the second network adapter on the VM. After running a test with tcpdump, I was able to see all network traffic. I also tested it with the Python script and verified data was flowing to Prometheus. So far, so good.
Next I added the Grafana dashboard and this is where I am currently stuck. I validated that I can read the ntm data elements in both Prometheus console and Grafana. However, the Grafana dashboard is not working. I suspect it has something to do with the queries and the exclusions. Any suggestions on troubleshooting this are appreciated. Many thanks in advance.
Progress! The issue was that the regex used in the Grafana local server dropdown was filtering to 192.168.0.x subnet. My home network is on 192.168.2.x. Once I updated the regex, the Grafana dashboard worked!
I still need to do some performance analysis to see if this will work using a Proxmox VM long term. Thanks so much, Zane, for making this available. I learned a lot during the implementation on my home network.
Wow! So glad to hear it. I really want to try out this promiscuous mode approach. I’m going to see if I can even make a RPi flashable image available to folks 🙂
Hello again! I’ve successfully gotten my Pi running as both a router and network monitor. As I’m error checking it before implementing it, I have a question about name resolution for local hosts.
I put all of my local devices in /etc/hosts, and from looking at the NTM metrics, it appears they are resolving correctly:
However, in Grafana, all the instances of that host are showing up as the IP 192.168.0.10 in the By Host, Bytes Transferred, and individual detail dropdowns.
Since it looks like the data Prometheus is getting has the alias, how can I get it to display it in the graphs?
That’s very strange. Based upon the Prometheus line you pasted, it’s not even recording the local IP address. Are you sure you’re not accidentally viewing old data? You may want to use the data explorer directly in Grafana, not the pre-built dashboard. Try filtering for the device on both the sending and receiving side.
BLUF: If your router resolves local devices you need to edit your REGEX field for the local network variable.
Hey — excellent guide. Your ability to step through things is uncanny.
Wanted to highlight an issue I had. I run opnsense (fork of pfsense) and a netgear switch that can port mirror. I basically mirror everything coming into the opnsense router from my wifi network (this represents 90% of traffic, only my main PC and pihole are on the wire). The mirrored packets go to a raspi that runs the monitor script — all good I see everything.
The problem was that opnsense resolves all of my local devices. So the graphs showed nothing, because the variables that define the local network contain only the private address space. I replaced the regex with [A-Za-z0-9\.\-]{0,} and it started showing stuff. I have a couple of things left to massage before it’s all right, but thought I’d put it out there that resolved devices on the local network will break the graphs.
Good point about the resolved devices. Since that didn’t seem possible with my setup, I indeed coded it to look for IP addresses instead. But I would certainly prefer to have cleartext names instead of obscure IP addresses, myself. Maybe eventually I’ll figure out how to do this without opnsense. It seems like since I’m running the DHCP server on the Pi anyways, it should be able to resolve these names…
My router (AVM FritzBox) is resolving the names automatically. Here’s the regex I’m using to get named devices included in the stats: /^(([A-Za-z0-9\-]+\.fritz)|(fritz.box)|((127\.\d+\.)|(10\.\d+\.)|(172\.1[6-9]\.)|(172\.2[0-9]\.)|(172\.3[0-1]\.)|(192\.168\.)\d+\.\d+))$/
I don’t understand regex. Could you give me some more details on how your regex with Fritz works? My router is the DHCP server (192.168.2.1); how would I make the regex work for that? Thanks
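For readers in the same position, one way to get comfortable with these patterns is to try them in Python before pasting them into Grafana, since the syntax is close enough for simple cases like this. The sketch below assumes a 192.168.2.x LAN and a hypothetical `.lan` suffix for resolved device names; the sample values are invented, and `is_local` is just an illustrative helper, not part of the project.

```python
import re

# Assumed LAN: 192.168.2.0/24. The ".lan" suffix is a hypothetical local
# domain; replace it with whatever your router appends to device names.
LOCAL_PATTERN = re.compile(
    r"^("
    r"192\.168\.2\.\d{1,3}"      # any IPv4 address in 192.168.2.x
    r"|[A-Za-z0-9\-]+\.lan"      # or a resolved name such as laptop.lan
    r")$"
)


def is_local(label):
    """Return True if a src/dst label value would pass the dropdown filter."""
    return LOCAL_PATTERN.match(label) is not None


# Invented sample values, as they might appear in the exported metrics.
samples = ["192.168.2.1", "192.168.2.57", "192.168.0.10", "laptop.lan", "example.com"]
for value in samples:
    print(f"{value:15} local={is_local(value)}")
```

In Grafana the equivalent pattern would go between slashes in the variable’s Regex field, for example `/^(192\.168\.2\.\d{1,3}|[A-Za-z0-9\-]+\.lan)$/`. Each `|` separates one alternative, so supporting another naming scheme usually means adding one more alternative inside the parentheses.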
I’m able to explore the data in Grafana. But when importing the dashboard, there is no data to show. I have checked and metrics are being produced just not in the dashboard.
I’m going to guess that your LocalIPs are not showing up in the dropdown at the top of the dashboard. I just updated the post with instructions how to fix this, in case you’re using a nonstandard subnet. Let me know how it goes!
I have been looking around at the many network monitoring tools and the overall data capture and presentation of yours cannot be topped.
I am also planning to install piHole on my system according to [link to www.smarthomebeginner.com]. I saw a comment earlier in this post but wanted to get a fresh one going. How would I add that server to your diagram above? I am also thinking that piHole should be on a separate RPi, unless you think differently. Any help with the settings would also be appreciated. I am a bit newer to the RPi and my Unix days are decades back.
TBH, I would highly recommend AdGuard Home over PiHole. The Home Assistant folks switched over the entire community many months ago, and I agree with that decision. It’s much easier to use. I run it on my router, actually. I just installed it via the official instructions on the router. No other steps required, IIRC. But you could equally well run it on a different Pi.
Great write-up. I was able to follow everything with no errors. The piece I’m struggling with is: “Open your web browser to [link to 192.168.0.1] to see the counters being exported for Prometheus.” I get a timeout from the raspi. I currently have (WAN)->Raspi_WAN->Raspi_LAN->WirelessRouter_Bridged->Ethernet_Client. If I force stop the script on the Raspi, I get something like this:
pi@raspberrypi:~/network-traffic-metrics $ sudo python3 ./network-traffic-metrics.py -i lan "src net 192.168.0.0/24 or dst net 192.168.0.0/24"
tcpdump: listening on lan, link-type EN10MB (Ethernet), capture size 262144 bytes
[SKIP]
04:09:52.472470 IP ^C914994 packets captured
915130 packets received by filter
Thanks! First, just to address the obvious: have you checked that you’re trying to connect to the correct IP address? If the IP of your Pi is not `192.168.0.1`, you will need to change it. If that is correct, try SSHing into the Pi and running `curl localhost:8000/metrics` to see if you can access the Prometheus endpoint from the Pi itself. If that works, then something about your network topology (firewall?) is preventing the other machine from accessing the Pi. If it doesn’t work, then for some reason the script is not listening or creating the web server. Check `journalctl -xe` for errors reported by the script. You could also try changing the port it listens on, e.g., `--port 8001`.
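The same check as the `curl` step above can be scripted, which is handy when repeating it from several machines. This is a minimal sketch using only the standard library; `check_metrics` and `count_samples` are hypothetical helper names, and the URLs you pass in are whatever addresses apply to your own setup.

```python
import urllib.error
import urllib.request
from typing import Tuple


def count_samples(body: str) -> int:
    """Count exported samples: non-empty lines that are not '#' comments."""
    return sum(1 for line in body.splitlines() if line and not line.startswith("#"))


def check_metrics(url: str, timeout: float = 3.0) -> Tuple[bool, str]:
    """Try to fetch a Prometheus metrics page and summarize the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, timeout, DNS failure, and similar problems.
        return False, f"unreachable: {exc}"
    return True, f"{count_samples(body)} samples exported"
```

Calling `check_metrics("http://localhost:8000/metrics")` on the Pi and then `check_metrics("http://192.168.0.1:8000/metrics")` from another machine separates the two cases described above: a failure only on the second call points at the network path, while a failure on both points at the script itself.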
Big thanks for this tutorial, it is detailed and easy to follow.
I followed all the steps and it seems to work, because I see some data at localhost:8000/metrics
However, I am unable to get the data to show up in Grafana despite following all the instructions. In addition, I don’t get METRICS when I go to Explore, as shown in the tutorial.
I’m using a Raspberry Pi 4 8GB with Raspbian OS 32-bit, connected via Ethernet to an ASUS router linked to a Freebox router. Maybe the double DHCP creates an issue with Grafana? (That seems strange, because Prometheus is working well and Grafana should pick up the Prometheus data.)
Hmm, it sounds like Prometheus is not configured correctly or otherwise unable to scrape the `metrics` endpoint. I’d triple-check how you’ve configured the service in Grafana, ensuring that the Prometheus connection can actually reach the target URL from whatever host machine is instantiating the request. You could also look at the Prometheus & Grafana logs to see if they complain about anything in particular.
Thanks for this write-up Zane! I am up & running, all on a RPi4B. Here are my “lessons learned.”
I think there’s a small typo in the Prometheus run command. As written, it uses the example/default config file (prometheus.yml). Readers here should either change that to “prometheus.yaml”, or save the config file they create, overwriting prometheus.yml. I installed a service for this to run at startup, similar to TCPdump (below).
I had some trouble installing Grafana on my Raspberry Pi 4B (Raspberry Pi OS 64 bit, Linux raspberrypi 5.4.51-v8+ #1333 SMP PREEMPT Mon Aug 10 16:58:35 BST 2020 aarch64 GNU/Linux): the “sudo apt-get install grafana-enterprise” command returns “E: Unable to locate package grafana-enterprise”. I had to follow some slightly different steps on their website: [link to grafana.com]
As others mentioned, I needed to update the network-traffic-metrics.py script to look for ‘lan’ instead of ‘eth0’. I also used a systemd service to get tcpdump to run at startup. This is tcpdump.service, saved in /etc/systemd/system/:
[Unit]
Description=TCPDump service for traffic monitoring
After=network-online.target
[Service]
Type=idle
ExecStart=python3 /home/pi/network-traffic-metrics/network-traffic-metrics.py "(src net 192.168.0.0/24 and not dst net 192.168.0.0/24) or (dst net 192.168.0.0/24 and not src net 192.168.0.0/24)"
[Install]
WantedBy=default.target
Ugh, the whole `yml` vs. `yaml` extension thing always gets me. Every tool seems to have some different level of compatibility 🙁
I’ll try to take a pass at incorporating this feedback soon. Very glad to have people posting such helpful comments, to make sure it works well for all. Cheers! – Z
Thanks for putting this article together. It was a lot of fun setting up and easy to follow. I did run into one snag that I thought I would share and give you the option to include in the article if deemed appropriate.
I run dynamic DNS on my home network and configured the rpi router with the proper search domain and DNS servers to make use of that. For the devices on my network that had registered their hostnames, data wasn’t populating through to the dashboard. Looking at the data exposed by the network-monitor python script, it had populated the data with the FQDNs instead of IPs which, awesome! The issue was that now those flows have to be identified by name and not by IP address. The dashboard, however, is only configured to look at IPs. I updated the regex for the LocalIPs variable of the dashboard to include “(.*\.myhomedomain\.local)” and then everything worked as expected. The document alludes to this, but only if the IP subnet is nonstandard for local networks. Granted, it was that note that led me to the proper place to make the update.
I don’t know if it’s possible for the user to supply this as a variable somehow as part of the build since it’s the dashboard that ultimately needs to be updated. Perhaps just a note in the same area mentioning what to do for DDNS setups.
In any event, this was a really fun project, thanks again for putting it together!
I am using a DNS also and I was trying to figure out a way to display the names instead of the IPs (I did not even realise that the named resources were being left out :-)). I will be updating the regex to include my domain – thanks for that. But I have a question – did you have to update the filter also?
(src net 192.168.0.0/24 and not dst net 192.168.0.0/24) or (dst net 192.168.0.0/24 and not src net 192.168.0.0/24)
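To make the filter quoted above easier to reason about: tcpdump evaluates `src net` and `dst net` against the IP addresses in each packet, not against resolved names, so the expression describes traffic that crosses the LAN boundary in one direction or the other. The sketch below mirrors that logic with Python’s `ipaddress` module. The subnet, the sample addresses, and the `classify` helper are assumptions for illustration only.

```python
import ipaddress

LAN = ipaddress.ip_network("192.168.0.0/24")  # assumed LAN subnet


def classify(src, dst, lan=LAN):
    """Mirror the capture filter: keep only packets that cross the LAN edge."""
    src_local = ipaddress.ip_address(src) in lan
    dst_local = ipaddress.ip_address(dst) in lan
    if src_local and not dst_local:
        return "upload"      # "src net LAN and not dst net LAN"
    if dst_local and not src_local:
        return "download"    # "dst net LAN and not src net LAN"
    return "ignored"         # LAN-to-LAN or unrelated traffic is not captured


print(classify("192.168.0.10", "93.184.216.34"))
print(classify("93.184.216.34", "192.168.0.10"))
print(classify("192.168.0.10", "192.168.0.20"))
```

Because the decision is made on addresses, the capture filter would only need to change when the subnet itself differs, while the choice between names and IPs shows up later, in the labels the dashboard regex has to match.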
Hi, this is great. Loved the tutorial. I got the metrics to work on port 8000 without any issues, running it as a service on the Raspi-4 (Prom + Grafana already running and having some dashboards).
When I update prometheus.yml with the parameters mentioned and try to restart the service, it fails with the following:
● prometheus.service - Prometheus Server
Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2020-12-28 18:04:46 GMT; 3s ago
Docs: [link to prometheus.io]
Process: 1186 ExecStart=/home/pi/prometheus/prometheus --config.file=/home/pi/prometheus/prometheus.yml --storage.tsdb.path=/home/pi/prometheus/data (code=exited, status=2)
Main PID: 1186 (code=exited, status=2)
Dec 28 18:04:46 raspberrypi systemd[1]: prometheus.service: Service RestartSec=100ms expired, scheduling restart.
Dec 28 18:04:46 raspberrypi systemd[1]: prometheus.service: Scheduled restart job, restart counter is at 5.
Dec 28 18:04:46 raspberrypi systemd[1]: Stopped Prometheus Server.
Dec 28 18:04:46 raspberrypi systemd[1]: prometheus.service: Start request repeated too quickly.
Dec 28 18:04:46 raspberrypi systemd[1]: prometheus.service: Failed with result 'exit-code'.
Dec 28 18:04:46 raspberrypi systemd[1]: Failed to start Prometheus Server.
I have given the pi user permissions to the data folder (I can see data being written there for the other dashboards), but for some reason this fails on start.
I can’t quite see enough information there to know what went wrong. The log only shows that the application crashed, but does not include the reason for the crash. You need to run `journalctl -xe` and scroll up to find the actual cause of the problem.
I use a raspberry pi with a single eth interface connected to my modem. The pi has a static address, serves as DHCP server and uses ip forwarding to redirect all traffic to the modem/router. I disabled ICMP redirects.
When I try to download large files, it seems that all the traffic goes through the pi and the traffic is rather in agreement with Chrome. However, the traffic shown in Grafana is significantly lower than what is shown by Chrome/iftop (In the order of a few Mb rather than 500Mb).
How exactly are you viewing the data in Grafana? If you’re using one of the SUM aggregations (e.g., total bytes), keep in mind that cardinality resets will create cliffs, so your recording needs to be continuous through the interval period (and Prometheus must have sufficient storage). If you’re looking at one of the line graphs, keep in mind that interval aggregation is prone to sampling errors depending on the size of your interval windows. For example, if you’re summing in 1min chunks, you’ll actually only look at the average over that one minute (which will be very low).
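The “cliffs” mentioned above can be illustrated with a few numbers. This sketch reads a reset as the exported counter dropping back toward zero (for example when the monitor script restarts), which is an assumption about what was meant. The readings are invented, and `total_increase` is a hypothetical helper that treats any drop as a restart, similar in spirit to how Prometheus counter functions handle resets but without their extrapolation.

```python
def total_increase(samples):
    """Sum the growth of a counter, treating any drop as a restart from zero."""
    total = 0
    previous = None
    for value in samples:
        if previous is not None:
            if value >= previous:
                total += value - previous
            else:
                total += value  # counter restarted; count up from zero again
        previous = value
    return total


# Invented byte-counter readings with one restart between 400 and 30.
readings = [100, 250, 400, 30, 90]
print("naive last - first:", readings[-1] - readings[0])
print("reset-aware total: ", total_increase(readings))
```

The naive difference comes out negative here, which is the kind of cliff a plain sum over raw counter values would show, while the reset-aware total keeps counting across the restart.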
Thanks for this guide! I have a couple of comments on it.
1. I haven’t found how to add 2 static IP’s to the 10-network.rules file
2. Before I could start Prometheus, sudo chmod 777 -R on the prometheus folder was required
3. The installation guide for Grafana has changed a little since this article was posted, so it’s better to follow the original guide on the Grafana website
What is missing for me on Grafana dashboard is the labels for local IP’s. My Access point (Orbi) detects device name correctly. Now it is a question how to assign them (dynamically) to IPs on dashboard
1. It should just be the same syntax, repeated in two different blocks. What’s not working, there?
2. You just gave full R/W access to everybody, which is generally not a good idea. If the Prometheus folder was not owned by the user running Prometheus, that should be fixed in a more targeted and secure way (i.e., sudo chown -R 1000:1000 ./prometheus).
3. Agreed. Though, thanks for the heads up: I’ve updated the Grafana section.
Re: assigning names, one easy way to do this may be to add Grafana variables to the dashboard representing your various IPs. It’s much harder to get working implicitly, and would require setting up MDNS such that all the IPs were internally resolved to their canonical hostnames.
Had fun building this and getting to know some new parts of Debian / Linux I didn’t know. A few additions for beginners like me:
1. systemd is a pain to make work with Prometheus, and in the end I used an Environment variable:
[Service]
Environment="Netmonitor="\""(src net 192.168.2.0/24 and not dst net 192.168.2.0/24) or (dst net 192.168.2.0/24 and not src net 192.168.2.0/24)"\"""
ExecStart=/usr/bin/python3 /home/pi/network-traffic-metrics/network-traffic-metrics.py $Netmonitor
2. For Prometheus I used the guide here [link to devconnected.com] to create the prometheus links in /usr/local/bin and then set up the service in systemd using:
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path="/data/prometheus"
3. Grafana automatically starts up after install so no .service file needed
4. My pi bridge blocked all network requests (DHCP etc.) to anything downstream when first set up. The reason was that these variables were set to 1 when the bridge was up (/proc/sys/net/bridge):
net.bridge.bridge-nf-call-arptables
net.bridge.bridge-nf-call-ip6tables
net.bridge.bridge-nf-call-iptables
Adding the above 3 lines with =0 at the end to /etc/sysctl.conf fixed the issue for me and persisted across reboots. See more here: [link to wiki.libvirt.org]
Indeed, systemd can be challenging at first. Your first point talks about Prometheus, but the code is not for Prometheus (it’s for the script). I’m not clear what your environment variable is meant to solve, though. Why was this work-around necessary?
For (2), I would caution other readers about following this post too literally. It involves nginx and reverse proxies, which is something I very intentionally kept out of the discussion for both simplicity and security reasons. Glad that the systemd file worked for you though.
Hi Zane, thanks for the reply. Great solution this, although my kids don’t agree as I can now show them their computers are stopping the TV working 🙂 For the systemd part: yes, my error, it is for the Python script, not Prometheus. I had to use the environment variable because the arguments (src net…..) weren’t being parsed correctly. It wouldn’t take the whole argument, as systemd removes the quotes. I tried lots of different escape characters, but it only worked when I used the variable. One other question: how easy would it be to get Grafana to show the network name instead of the IP where one exists? Thanks again for the great solution.
After all, this is super amazing to play around with. Some more thoughts:
1) it looks like this is only working with IPv4. Any chance to get it capturing IPv6 as well?
2) any specific reason for manually installing Prometheus instead of using the version from the repository? Might save some headache…
3) I’d love to have a stat for the average bandwidth utilization (up/down) over a selected period. I clicked a bit around in the Grafana interface, but not sure I found the right parameters. If this could be added to the dashboard, would be amazing!
In general: I’m using Raspberry Pi as WiFi access point to measure utilization for certain devices, hence I’m not 100% sure if up/down bandwidth might get mixed up in that setup. But that’s a minor issue. Thanks a lot, this is exactly what I was looking for!
Correction: UDP traffic *is* being *captured* correctly, however it will not show up in the results as UDP does not include the “service” variable and NULL-values are currently not supported for variables (see [link to github.com]).
Manually adjusting the queries to not include the “services” filter does the trick.
Good catch! I decided to modify the python script instead to output ‘ntm-Unknown’ during the service lookup function to avoid modifying the dashboard. Actually, having a method to filter services can be pretty nice.
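A fallback of the kind described above could look roughly like the following. This is only a sketch of the idea and not the code from the repository: `lookup_service` is a hypothetical name, and it leans on the standard library’s `socket.getservbyport`, which raises `OSError` when a port has no entry in the local services database.

```python
import socket


def lookup_service(port, proto="tcp", resolver=socket.getservbyport):
    """Map a port number to a service name, with a fixed fallback label."""
    try:
        return resolver(port, proto)
    except (OSError, OverflowError):
        # No entry in the services database, or the port is out of range.
        return "ntm-Unknown"


# What gets printed depends on the services database of the machine.
print(lookup_service(443, "tcp"))
print(lookup_service(54321, "udp"))
```

Because every sample then carries some value for the service label, dashboard variables that filter on it no longer silently drop the traffic that had no match.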
Thanks so much for this posting! It was pretty straightforward to set up with your walkthrough, but I’m a little stuck on the Grafana piece. It seems like if I turn off host resolution for tcpdump in the Prometheus client script (‘-n’ : Don’t convert addresses to names), all the traffic will properly display on the dashboards, but if anything in the internal addresses is resolved using -f or default resolution, then most traffic is dropped on the dashboard. Can you help me walk through how the Grafana dashboards are set up? Maybe I can figure out what’s happening and push a fix back to your repo? I have no idea how Grafana expressions work and it might be faster just to chat.
It sounds like you need to adjust the filters on the Grafana dashboard. It is unlikely that the traffic is being dropped, since there are no filters that happen prior to the dashboard itself, which is merely a visual filter (it does not restrict the data which is collected). You should start by using the “Explore” section of Grafana to see all the data in order to debug, and then adjust the dashboard filters appropriately.
This is a great project – I hacked this into a Rock64 board with DietPi, it’s been a great learning experience.
Everything works for me up to the monitor script. I run the traffic metrics script but don’t get the 192.168.0.1:8000/metrics page, it’s like the webserver isn’t starting. I see traffic in my terminal, so I know something is working. I’ve struggled with this particular piece all day without getting anywhere.
If anyone comes across this and has a recommendation, i’d appreciate it.
Can you curl the localhost endpoint from the same device? If so, then the IP address is wrong. If not, you’ll need to look into the logs from the terminal. Most likely the web server failed to start for some reason.
Hi Zane, Thanks a lot for the awesome work here. When I try to filter on just one host like below for example: “(src net 192.168.1.122/32) or (dst net 192.168.1.122/32)” I get very accurate results, but when I filter on a range like below, I get totally erroneous readings: “(src net 192.168.1.0/24) or (dst net 192.168.1.0/24)”
I have only one interface, connected behind a router. It looks like I only get upload traffic. How can I graph both? My filters are “src net 192.168.0.0/24 or dst net 192.168.0.0/24”. Can you help?
Hello! Thank you for your work. I have one question: is it possible to monitor on standalone server, which is not router? I already installed your monitor, but my ntm metrics are not shown in Prometheus. How can I repair this problem?
As one of the headlines in the post says: “the traffic must flow through the device.” Just below that, it explains the process for bridging the traffic through the device (which does not constitute a router). However, it still means that the device needs to act as the physical connection between the LAN and the WAN, even if not acting as a router itself it must be bridging the traffic. Note that some other commenters have pointed out that certain high-end routers can avoid this need, but they are relatively uncommon.
First I want to say thanks for this tutorial. It works so far, but there are some open points or questions.
I have chosen the alternate setup: WWW FritzBox eth1 – Raspberry 4B – eth0 LAN
The FritzBox Wifi is almost idle (no clients connected) since I have another WLAN mesh, not the AVM mesh, so all the internet traffic goes through the Raspberry Pi.
I had some trouble setting up the bridge mode, finally I used the original tutorial from raspberry about wireless access-point-bridged replacing the wireless interface by eth1 and leaving out wireless setup. Because my FritzBox uses 192.168.178.0/24 network my network-traffic-metrics.py parameters are:
(src net 192.168.178.0/24 and not dst net 192.168.178.0/24) or (dst net 192.168.178.0/24 and not src net 192.168.178.0/24)
I added --fqdn as well; without it Grafana will not display much, or only weird data with the FritzBox. According to AFO’s post here I changed the LocalIps regex matching my LAN slightly to be:
So now I see a lot of data in Grafana, wow! I started to play around using speedtest-cli and curl some big files from the internet using another Raspberry Pi in my LAN. While the FritzBox shows the traffic immediately, nothing happens in Grafana! I tried to set the scrape_interval in Prometheus to 1s, with no effect. Grafana shows the client running the curl command, but only some B/s, not the peak expected (kB or MB). And only upstream data is shown, no downstream. Any suggestions?
I enjoy this project very much, so after solving the above problem I will try to create a Grafana dashboard showing not only the throughput of each client, which is still very helpful, but also the addresses it is talking to. I think that information is within the metrics.
Thanks in advance for feedback about the curl/speedtest-cli problem.
First, let’s look at the fact that the curl command is not showing the throughput you’d expect. It sounds like you may be misunderstanding what the “scrape interval” does. Each time prometheus scrapes, it’s simply counting upwards. If you are downloading 1 byte per second, over the course of 15 seconds, a 1 second scrape interval would generate 15 data points of 1 byte each. OTOH, a 15 second scrape interval would generate just 1 data point of 15 bytes. In other words, changing the interval will not impact the integrity of the data, only the fidelity.
What you’re looking for is probably the Grafana $__interval (see the chart configs). This tells Grafana the time window in which to aggregate the data. For example, even if you have 1 second SCRAPE intervals, if Grafana is only aggregating at 30 seconds then it will combine 30 data points into 30s of averaged data. If your download (or curl command) took, say 1 second… then you’d only see 1/30th the speed you expect, because the other 29 seconds were idle.
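The averaging effect described above is easy to reproduce with a little arithmetic. The sketch below uses invented traffic (a single one-second burst inside thirty otherwise idle seconds) and a hypothetical `average_rate` helper; it is not how Grafana computes anything internally, just the same idea in plain Python.

```python
def average_rate(bytes_per_second, window):
    """Average per-second byte counts over fixed windows of `window` seconds."""
    return [
        sum(bytes_per_second[i:i + window]) / window
        for i in range(0, len(bytes_per_second), window)
    ]


# Invented traffic: a 1-second burst of 30 MB inside 30 otherwise idle seconds.
traffic = [0] * 30
traffic[10] = 30_000_000

print("peak with 1 s windows: ", max(average_rate(traffic, 1)), "B/s")
print("peak with 30 s windows:", max(average_rate(traffic, 30)), "B/s")
```

The total number of bytes is identical in both cases. Only the peak changes: spread over a 30-second window, the burst shows up at one thirtieth of its real speed.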
As for why some data would be missing, I can only really point you at tcpdump. The NTM script is EXTREMELY simple. All it does is capture that data and make it available to Prometheus. It’s a very low-level, ubiquitous program. So without digging into your network architecture, I can only speculate that some of the traffic is not flowing across the Pi, or the filters are excluding that traffic.
FWIW, the fact that you had to add `.fritz` and `fritz.box` to your local IP regex is quite suspicious to me. I’m not familiar with FritzBox, but it sounds like it’s set up your DNS resolution in a pretty nonstandard way. My recommendation would be to take a step back and look at the raw Prometheus data by using the “Explore” section in Grafana. If the data is missing there, then your problem lies with network architecture or TCPdump filters. Otherwise, your problem is with Grafana queries.
Hi, thanks for your comments about the scrape interval. You are right, of course. I read some docs in the meantime about the scrape interval and came to the same understanding you explained so well; thanks again for pointing it out. So my scrape interval is back to 15 now.
One problem when zooming into a smaller time range (e.g., 5 minutes) is that you sometimes get “No Data” in Grafana. This is because the interval used in the Grafana dashboard should always be longer than the Prometheus scrape interval. When zoomed in that far, $__interval bottoms out at the scrape interval, which is the theoretical smallest time you can use to get (meaningful?) data in a Grafana dashboard. For longer time ranges (15-60 minutes) it works so far, I guess by chance. So according to my evaluation one should use $__interval+1, or maybe even better $__rate_interval, which is at least four times the scrape interval. There are several explanations of that on the internet. That way one would not run into the “No data” problem when zooming in, which is confusing in the beginning.
The “missing data” problem is ongoing. I am working on it. All the data definitely goes through the raspberry, there is no other way. So yes, it must have something to do with the tcpdump settings and/or how the FritzBox does the networking. May be special grafana dashboards are necessary too. I do it step by step now. We will see. Thanks for all so far.
Hi again, meanwhile I think something goes wrong, or not as expected, when parsing the tcpdump output within the Python script. But I am not a Python person. Starting the metrics script without --fqdn gives output at pibridge:8000/metrics looking like the following, where piblue is the Raspberry Pi doing the speedtest-cli:
Hi again, meanwhile I solved the “problem”. In the extract_domain function the domain names from tcpdump are somehow transformed if no --fqdn parameter was given. Somehow this does not match the traffic when using FritzBox. I do not exactly understand what is happening in the function because I am not a Python programmer. So I just simplified (or disabled) that function by always using “return string”. That’s it. I can use the Python script with the --fqdn parameter or without it; both work now. Thanks for all, loden
Interesting. The purpose of the “extract_domain” function is to effectively strip subdomains. It will convert “www.technicallywizardry.com” to “technicallywizardry.com”. I implemented it in rather simplistic way, though. It’s basically just keeping only the last two pieces of the URL (where each “piece” is separated by a dot). Fritzbox appears to append unique identifiers to the end of the TLD (“piblue.fritz.box.53528”). This is semantically abhorrent, as it adds unique identifiers to the END of the TLD instead of a subdomain.
In any case, I think a better solution for you would be to simply add a single line near the beginning of the “extract_domain” function: if string.startswith('piblue.fritz.box'): return 'fritz.box'. This will preserve all of the other functionality, but tag your fritzbox traffic as such. You could even change the return value to anything you wanted.
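Put together, the behavior described above plus the suggested special case might look like the sketch below. It is reconstructed only from the description in these comments, so the real function in the repository may well differ (for instance in how it treats plain IP addresses); only the “keep the last two pieces” rule and the `startswith` check come from the text.

```python
def extract_domain(string):
    """Keep only the last two dot-separated pieces of a hostname."""
    # Special case suggested above for the FritzBox naming scheme.
    if string.startswith("piblue.fritz.box"):
        return "fritz.box"
    parts = string.split(".")
    return ".".join(parts[-2:])


print(extract_domain("www.technicallywizardry.com"))
print(extract_domain("piblue.fritz.box.53528"))
print(extract_domain("localhost"))
```

Without the special case, the second example would reduce to `box.53528`, which is why names with a trailing identifier end up looking nothing like the host they came from.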
Hi again, thanks for your reply. I agree with you that “piblue.fritz.box.53528” is “semantically abhorrent” as you mentioned. Just to clarify, may be I was a little bit inaccurate: “piblue.fritz.box.53528” is the direct tcpdump output. I guess including the port. “53528” disappears in the …:8080/metrics output somehow. May be because of different tcpdump parameters I used in command line and the script uses. Anyway, it still works fine now since a week! loden
Hello Zane,
Nice tutorial you wrote above, but I have several questions about it. Does the Network Monitor work with a Pi used as local DNS (Pi-hole)?
A little explanation of my architecture: all my clients connect to the router, which acts as the DHCP server, and I set the router’s DNS address to the static IP of my Raspberry Pi, which is also connected to that router. It works well, and I can see any client’s traffic in Pi-hole. Then I tried following your tutorial and successfully ran the services at x.x.x.x:8000/metrics and x.x.x.x:9090/metrics (they show some data). But when I try to show them in Grafana with your imported Grafana dashboard, it doesn’t show any data, and even the client IPs don’t show. Any idea about it? Really appreciate it, thanks!
Yes, the NTM can be used with a self-hosted DNS server. It sounds like you need to adjust the filters at the top of the Grafana dashboard to match your network settings. Try using the “explore” tab in Grafana to play around with the data and get used to how it works.
First: This whole setup works perfectly with Raspberry Pi OS – Bullseye, prometheus 2.33 and Grafana 8.3.4.
Second: Like others have commented here, it would be nice to have host names in the graphs rather than IP addresses. I’ve got 30 IP devices in the house, it’s hard to keep track. Here’s how I do it:
1. Manually maintain a list of IP devices –> hostnames I’ve assigned in /etc/hosts:
192.168.0.2 -network_accesspoint_basement
192.168.0.102 -security_WyzeCam_back_yard
192.168.0.117 -entertainment_Sonos_kitchen
etc.
2. Notice the dash in front of each device name. So in the Grafana dashboard I edited the Regex for LocalIPs to be “/^\-|^192.168.0./”
3. In the first several hours/days using this system, some forgotten devices pop up. So just edit /etc/hosts, save, then (since I’m using systemd for the tcpdump, etc.) just restart the service to get the new IP–>hostname mapping:
sudo systemctl stop network-metrics
sudo systemctl start network-metrics
Hi, I really appreciate this project and it can fit in perfectly with the master’s thesis I am doing. I have a small problem when representing the data in Grafana for each device. I am running an access point in bridge mode with the hostapd daemon on a raspberry pi. So when prometheus captures the information it only shows the devices connected by LAN but it doesn’t show Grafana any wireless devices. I would greatly appreciate some clarification on where I can fix this problem.
Thanks for the tutorial! I have an alternate design. I work from home, and my company provides a hardware firewall that also acts as the router for my home network. Everything (work and non-work data) goes through my wireless router, acting as an access point (no DHCP). I’m trying to determine how much of my monthly data usage is due to my kids gaming, etc. So the Pi is purely for monitoring throughput on the home side. The problem I’m having is in the Grafana setup: everything is in terms of Mbps in your dashboard. I’ve fiddled with adding panels and changing the definition of the existing ones, but I’ve had no luck. I’m looking for the total data passed for each device. Any ideas on how I can modify/create a panel to summarize my total usage per time period based on IP address? Here’s an example of how my ISP shows my total usage: [link to imgur.com] Thanks!
Hi there, thanks for the tutorial. I am running into a problem that I cannot connect the prometheus instance to my existing Grafana server. I am running Grafana inside Home Assistant and this is the error I get.
Error reading Prometheus: Post “http://10.71.71.1:8000/metrics/api/v1/query”: dial tcp 10.71.71.1:8000: connect: no route to host. -10.71.71.0 is my subnet.
I have tried doing port scans on the router and found that both ports 8000 and 9090 aren’t open even though both programs – python and prometheus – are running without any errors. Thanks in advance.
Thanks, followed the above, using an RPi 3B+ sat inside my network, with Grafana installed on Home Assistant (RPi 4). The URL in Grafana is set to the 3B+ on port 8000. If I do “Save and test” on the data source I get the following error:
Error reading Prometheus: bad_response: readObjectStart: expect { or n, but found #, error found in #1 byte of …|# HELP pyth|…, bigger context …|# HELP python_gc_objects_collected_total Objects co|…
It looks like Grafana is reading everything, including the # HELP, # TYPE, etc. Is there a way to get it to ignore these, so it only reads what it is expecting? Thanks
Zane, How can I make this totally passive so I could install this at a customer site and collect usage data. I just want to be able to collect usage data for a week or two so I know what is actually being consumed average and peak. It would need to store the data on the RPI or have an email method to send me the collected resulting data. That way I don’t have to reconfigure anything at the customer site. Thx
Hello, Zane. Thank you for this tutorial. I don’t know if you are still checking out this page but I’ve got a question. I seem to have got the python code and dashboard set up and working, but I’ve got two problems, and I was hoping to get some help.
First, the downloaded bytes seem to be very small. I am only getting about a tenth or less of the bytes recorded in the dashboard. My guess would be that when tcpdump outputs multiple lines with the same contents very fast, the python code only gets the first or the last and discards all the duplicates, which is not ideal as the length (bytes) decreases significantly.
Second, some of the records from tcpdump do not seem to make it to the dashboard. E.g., while I can find the record below on the Raspberry Pi (192.168.100.189:8000/metrics), “comcast.net” doesn’t show up in the “bytes transferred (by server)” part of the dashboard:
ntm_packets_total{dst="comcast.net",proto="udp",service="",src="192.168.100.191"} 7056.0
I have no idea why this is happening.
Any ideas or help would be greatly appreciated. Thank you in advance.
So I have a cable connection and therefore can’t put the Pi on the WAN side of the main router. I also don’t have a USB Ethernet port so looking to use only the on board Pi port (using a virtual interface/IP) ([link to forums.raspberrypi.com])
I was looking to set the Pi up on the LAN side of the ISP router. The 192.168.0.x/24 network would only have one ip (.1) on the LAN of the ISP router and one ip (.2) on the WAN side of the Pi. The Pi would then have a second (virtual) interface with 192.168.1.x/24 subnet running the DHCP server. All network devices would be on the 192.168.1.x/24 subnet.
Any reason why this wouldn’t work? I know it would double the bandwidth used on the Pi ethernet connection but it’s 1Gbps so it should have lots of headroom.
I briefly tried it but couldn’t get it to respond to DHCP requests.
Brilliant, many thanks for your efforts. Just what I was looking for to keep an eye on the use of the internet by my teenage grand-daughters. Took me a couple of hours to set up the Pi, install the software and get it all working, simply connected to the network behind my ISP router 🙂 Just one question: how easy would it be to add a list of named IP addresses and get the (known) names displayed in Grafana in place of the IP addresses? I’m not a programmer but can usually cobble together enough Python to get a job done. Some pointers as to how to achieve this would be extremely helpful.
Thank you for the tutorial!
However, I encounter the following errors while running this step. Can you please let me know what is wrong?
2.Run the Raspberry Pi network monitor script.
pi@raspberrypi:~/Downloads/network-traffic-metrics $ sudo python ./network-traffic-metrics.py (src net 192.168.0.0/24 and not dst net 192.168.0.0/24)
bash: syntax error near unexpected token `(‘
My python is version 3.7
pi@raspberrypi:~/Downloads/network-traffic-metrics $ python --version
Python 3.7.0
I have installed both argparse and prometheus_client
pi@raspberrypi:~/Downloads/network-traffic-metrics $ pip3 install prometheus_client
Looking in indexes: [link to pypi.org], [link to www.piwheels.org]
Requirement already satisfied: prometheus_client in /home/pi/.local/lib/python3.7/site-packages (0.8.0)
pi@raspberrypi:~/Downloads/network-traffic-metrics $ pip3 install argparse
Looking in indexes: [link to pypi.org], [link to www.piwheels.org]
Requirement already satisfied: argparse in /home/pi/.local/lib/python3.7/site-packages (1.4.0)
Hey, glad you enjoyed it! And sorry, this was my mistake: it looks like the example got garbled when I formatted the code. You need quotation marks around the filters argument, so that the command is:
sudo python ./network-traffic-metrics.py "(src net 192.168.0.0/24 and not dst net 192.168.0.0/24)"
The post should be updated with the fix now, as well.
In case you’re curious, it’s because the entire filter clause is a single argument into the python script. Without the quotes, bash is trying to parse the arguments itself before passing them to the script, which it does not know how to do.
Hope that helps!
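The quoting behaviour described above can be seen with Python's standard shlex module, which splits a command line into words roughly the way a POSIX shell does. This is only an illustration: shlex does not reproduce bash's syntax error on bare parentheses, it just shows how many arguments the script would receive in each case.

```python
import shlex

FILTER = "(src net 192.168.0.0/24 and not dst net 192.168.0.0/24)"

# With quotes, the whole filter clause arrives as one argument.
quoted = shlex.split(f'python3 ./network-traffic-metrics.py "{FILTER}"')

# Without quotes, the clause is broken apart at every space.
unquoted = shlex.split(f'python3 ./network-traffic-metrics.py {FILTER}')

print(len(quoted), quoted[-1])
print(len(unquoted), unquoted[2:])
```

The script expects the filter as a single positional argument, which is why only the quoted form works.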
Thanks for the quick response! I understand what could have been the issue. However, I am hitting another error. Sorry. I’m really bad at regex, so no idea what’s wrong here.
pi@raspberrypi:~/Downloads/network-traffic-metrics $ sudo python ./network-traffic-metrics.py “(src net 192.168.0.0/24 and not dst net 192.168.0.0/24)”
File “./network-traffic-metrics.py”, line 29
return f'(?P{pattern})’
^
SyntaxError: invalid syntax
Ah. I suspect that you have both python2 and python3 installed. To check, try python --version. Whatever it says is the default Python version on your machine.
If it is below 3, as I suspect, first make sure that which python3 works. You should discover the location of the actual python3 executable. If that exists, the easiest “fix” is to replace sudo python ... with sudo python3 .... Another approach would be to make an alias from /usr/bin/python to python3 so that v3 becomes your default python environment. Or you could use a python version management tool. Which approach is right for you depends on your circumstance, but the first is probably the least hassle.
Thanks!
Yes! Indeed. My python installations seems to be a mess. I will fix that first. Thanks again.
Thanks for the quick reply. However, I am getting another error now. Sorry.
pi@raspberrypi:~/Downloads/network-traffic-metrics $ sudo python ./network-traffic-metrics.py “(src net 192.168.0.0/24 and not dst net 192.168.0.0/24)”
File “./network-traffic-metrics.py”, line 29
return f'(?P{pattern})’
^
SyntaxError: invalid syntax
Nice guide. I’m trying a slightly different approach: using one Docker container for Grafana and another for Prometheus.
With that in mind, when creating the dashboard in the last step, the JSON file has the “DS_PROMETHEUS” that won’t work. Is there any way to make this work with this Docker setup? What do I have to change?
Thanks
Thanks for saying so! FWIW, I do have one container for Grafana and one for Prometheus, just like you. The difference may be that I use Kubernetes to deploy the containers, not the Docker agent. The problem you’re experiencing suggests that you do not have Prometheus configured as a “Data Source” in Grafana, or that somehow the name of that datasource does not match the convention (DS_PROMETHEUS).
Others have had small hiccups importing the dashboard as well. You prompted me to do a little research and add the dashboard to the GrafanaLabs shared dashboards website. You can follow the import instructions with the GUID 12619. Hopefully that works better! First time I’ve shared a Grafana dashboard publicly 🙂
Hi, this is probably perfect for what I’m looking for. The instructions seem good too. One question, I would probably do this for the non-wifi devices so keep my router as the internet connection. That has a 192.168.1.1 ip address and i’d like to keep the other devices on the same subnet. Does the pi HAVE to be 192.168.0.1 or could it be 192.168.1.2 meaning all other devices which have static IP addresses would not have to change?
Glad to hear it 🙂 In that case, what you describe is actually preferable. The subnet I used was for example purposes. If your Pi is behind the router, it’s easiest to keep it on the same subnet. To do so, you just want to bridge the traffic between the two interfaces without adding a DHCP server or anything of that nature. This link might help: Bridging eth0 and eth1. If that doesn’t work or you need more help lmk.
Hello, I’m finally about to give this a go!
I’m keeping the router as the DHCP server and will connect the pi to that and have the router set the static address.
Do I then just follow the alternate settings or do I still need to do the bridging eth0 and eth1 step in your original reply. and then install what after that?
Thanks!
Hello and thank you so much for the guide! I’m running into a problem running the monitor script, and I’d appreciate your help.
I’ve confirmed that pip3 installed prometheus_client, but when I run the script straight out of the box, I get:
ModuleNotFoundError: No module named 'prometheus_client'
I tried to fix this by adding sys.path.append() on line 2 with the path pip3 gave for prometheus_client. That cleared up that error, but gave me this one:
File “/usr/lib/python3.7/subprocess.py”, line 1522, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: ‘tcpdump’: ‘tcpdump’
Hey Mike,
It sounds like you’re having Python environment problems. You may have multiple versions of Python3 installed and/or some strange symlinks, given that python3 couldn’t find the package installed by pip3. Specifically, it sounds like Python3 is not referencing the same import paths as pip3 is installing to. If forcing an absolute path worked on the import, great, though I do tend to worry this may be causing other problems down the line.
The second problem suggests that you don’t have tcpdump installed on the machine, or that Python3 cannot find it. The latter would match with your prior problem. Namely, that the shell environment from which Python3 is being run is not resolving your user paths. For example, on my RPi, which tcpdump gives /usr/sbin/tcpdump. If that command also works for you, it suggests that you’re invoking Python3 in such a way that this requirement is not resolved in the same way it is from your shell. You could just edit line 71 of the script to invoke the absolute path to tcpdump, just like you did with the prior problem, but again I worry that your shell environment may continue to cause problems.
Cheers! Hope that helps a little…
– Z
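As a rough way to test the diagnosis above, the following sketch asks Python itself whether it can locate tcpdump on its PATH. The helper name find_tool is made up for this example; shutil.which is the standard-library equivalent of the shell's which command.

```python
import os
import shutil

def find_tool(name="tcpdump"):
    """Return the absolute path of `name` as this Python process sees it, or None."""
    return shutil.which(name)

path = find_tool()
if path is None:
    # Either tcpdump is not installed or this environment's PATH lacks its directory.
    print("tcpdump not found; PATH is:", os.environ.get("PATH", ""))
else:
    print("tcpdump found at:", path)
```

Running it once with python3 and once with sudo python3 should show whether the two environments resolve the tool differently.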
Thanks for your help!
For the Python issue, I tried defining PYTHONPATH in my .bashrc, but that didn’t help. So, for now I stuck with the edit on your script.
The tcpdump issue was simply that it wasn’t installed. Easy fix.
After I got those resolved, I started the script and received an error that eth0 couldn’t be found. Following the router guide, I had given eth0 the alias ‘lan’ and eth1 ‘wan’. So I updated the one instance of eth0 and now it seems to be running happily.
“(src net 192.168.1.0/24 and not dst net 192.168.1.0/24) or (dst net 192.168.1.0/24 and not src net 192.168.1.0/24)”
tcpdump: listening on lan, link-type EN10MB (Ethernet), capture size 262144 bytes
[SKIP] 06:33:43.603149 IP
Ah! Thanks for pointing out the lan/wan alias thing. I had added that “convenience” to the guide after the initial development of the tool. Glad you got it working!
By the way, rather than editing the script, you can just add “--interface lan” to the script’s arguments. If you run the script with just the --help flag, you can see that it accepts several command-line flags to change its default settings (here’s the bit of code which parses those arguments).
I love this post – this is EXACTLY what I have been looking for, for like six months.
I’ve even converted one of my old machines to pfsense just to do this, but had to scrap it because it was a big waste, even as VM.
Question: Would this conflict with PiHole? It is my DHCP provider at the moment.
Also a problem, I am running this:
sudo python3 ./network-traffic-metrics.py “(src net 192.168.1.0/24 and not dst net 192.168.1.0/24) or (dst net 192.168.1.0/24 and not src net 192.168.1.0/24)”
and receiving this error “ModuleNotFoundError: No module named ‘prometheus_client'”
I’ve tried installing pyEnv to fix this but no dice.
Do you happen to have a docker container or yml with everything running? That would be much easier for people.
Hey! Thanks for saying so.
I don’t see any reason this would conflict with PiHole. I run AdGuard Home myself these days, but I used to run PiHole. You should be able to substitute the DHCP server for PiHole’s.
Your problem with prometheus_client being missing is likely because you need to use sudo pip3 install. If you install without the sudo, but then try to run sudo python3, you’ll be using a different environment.
Please refer to the section of the README devoted to deploying via Docker. There’s also a section devoted to Kubernetes based deployment, which is what I use personally.
Many thanks – I will give it a go.
This is working now – the graphs are empty in Grafana, though.
The question I have is why is it not doing a passthrough from eth0 (WAN) to eth1 (LAN)?
I don’t think this was part of this guide. Do I need to manually bridge the interfaces?
You have already mentioned that you are not using the “Pi as a Router” guide (no DHCP server). Therefore, technically you fall under the “Alternate Design” header in the tutorial (which indicates you must bridge the two interfaces). This would also explain the lack of data in Grafana. FWIW, I’d recommend taking the time to stop at the Prometheus step and perform the recommended validations, and then go “Explore” the data in Grafana, so you understand what is actually going on.
Note that in the Alt design section I mainly referred to a Wifi router. You need to reason about your network and ensure that the traffic you wish to capture is flowing through your Pi, one way or another. That’s the key. As long as that is true, then the script will capture the data across those two network interfaces. There are lots of ways to do this, and without analyzing a network diagram of your setup it is hard to be perfectly accurate in my descriptions =/
No worries you put me on the right track after all – I am still new to Prometheus and grafana so I’ll play around with them.
Yes – regarding the network setup I’ve already done that and everything should be flowing from the PI once I am done with the bridging.
VDSL Router (No Wifi) => PI => Switch => Rest of the network.
So nothing will connect to the internet without hitting the PI first.
Was unable to reply to your last comment for some reason so I put this here.
Many thanks for your support so far.
HI,
Could you please expand on how to set up the alternate design? It seems like wifi traffic cannot be captured in this design.
Hi Nick,
The alternate design section says:
Was there some other clarification you were looking for?
What kind of firewalls rules did you put in place at the pi?
I’m concerned about the security using pi as the router.
In the “Pi as a Router” guide which this page links to, there is a section dedicated to firewalls. tl;dr: I use firewalld with zone-based routing. Start with a “secure by default” mentality and only open the ports you need and you’ll be fine.
If you really don’t want to go that route, there is one other option, though it requires more hardware. Use another physical router upstream of the Pi. But keep your WiFi router in bridge mode, downstream of the Pi.
Hi! This may sound very very amateur, but I am extremely new to coding and program. I see a lot of helpful information online and I see the examples in the gray boxes. My question is where do you put those codes? What program are you using and how do you make those codes work on your computer? I get the process but I don’t get those at first initial step I’m very new. I use macOS and have a raspberry pi 4. I’m stuck at this very initial step.
Hello, this tutorial requires at least a beginner understanding of the “shell” (or “terminal” or “command prompt”). Every computer has one. I would recommend you follow some tutorials from the official Raspberry Pi website first and make sure you understand concepts like “SSH” and “sudo.” If you’re not at least familiar with these ideas, there is a good chance you will end up breaking something trying to follow upper-beginner tutorials like these.
Hello! I know about shell, sudo, Linux, and I know SSH perfectly. My specialty is the command line. I just don’t know the first basics about coding. I’m taking a few online courses and they have been helping. I will be there soon. I have my Raspberry Pi 4 and Speedify, and I want to be able to monitor everything on my network (like Lil Snitch without the GUI drivel). My question is what program or text editor you are plugging those commands into, the ones in the boxes. Is it something like Sublime or Atom? Or is it something via the command line and terminal? That is my basic question and I appreciate the follow through because you are the only one I found online who still gives a damn!
I mean, the command line is my thing, which is why I chose this tutorial. I just don’t know what program those boxes are referring to. Is it a text editor like Sublime or Atom, or something else?
Hello? Thank you for the follow through I really appreciate it. I do know the command line, SSH, and also the command prompt. Coding is my next hurdle and I plan to conquer that by the end of summer. I’m already in rolled in online classes and they’re actually paying off. I’ve seen these boxes all over the Internet and they seem helpful. Yet, has a single one site has ever said where they paste those codes. Is it into a text editor like Atom or Sublime? Or directly into the command prompt? Please help me out I really appreciate it.
The gray boxes in the “Step-by-Step” section are shell commands. I’m assuming you can recognize common commands, like cd, tar, apt-get, etc. The last of these, for example, is how you install software on the Raspberry Pi. When it comes to editing files, it makes absolutely no difference which editor you use (which is why nobody talks about it). Thankfully, your shell will have a default editor built-in, like nano or vi, that lets you edit files from the command line (Google these commands for help).
If you haven’t made it far enough to understand these concepts, with all due respect, I highly suggest you start with a book or more basic resource than this post. I’m writing for a relatively technical audience, and need to skip over things I think they understand. If you don’t know how to edit a file from the command line, try googling that specific problem first. Then work your way up.
Of course I recognize those commands those are classic Linux commands. With all due respect, it’s answers like those that keep people silent and unknowing for so long. No I’m not too familiar with editing a file at the command line. Instead of telling me of what you think I don’t know you could at least provide links or books are helpful to beginners. I went from hopeful to now feeling shamed. “A true wizard teaches their apprentice without forethought …” *said like Tyrion Lannister*
This was not my intent at all. I am here responding to you because I wish to help. At this point, I’m honestly confused myself, though. It’s sometimes very hard to tell what someone’s skill level is. If I under-estimated yours, I apologize for sounding condescending by explaining “down” to you. You’re right to say that I could/should have provided you with direct links to resources to learn things. I actually Googled for some for you while I was writing my last comment, but I didn’t have the time to sort through them and try to figure out what would be most useful to you. My second paragraph was not meant to be snarky (though I can see how it would come across that way). Being able to Google your way out of something you don’t understand is, tbh, the most important skill I think exists wrt programming. I probably search Google over 100x a day for answers on computer questions I’m working on. I was hoping I could show you how to break down your question/problems into chunks and research them yourself.
Best,
– Zane
Firstly, Zane, thank you very much for this write-up. I am very interested in getting this set up. Secondly, I am having a variety of issues starting with the correct setup of the RPi. I have tried to sign up for your mailing list multiple times and keep getting a system error from your end. I’m assuming this is what I need to get access to the build guide.
I am not using RPi hardware any longer. Rather, I am using Proxmox VM’s but setting them up as RPi’s. This has worked well. I have added a second USB Ethernet adapter and the basic Debian install is working. Python 3.7.3 is installed. I have another VM already running with Grafana, Influxdb and also successfully installed Prometheus.
I’m not sure where my problems are but believe once I can review the build instructions, I can troubleshoot it. Frankly, I am confused regarding the proper configuration of the ethernet. I am using a Cisco router with Ubiquiti network switches and AP. My home network is on 192.168.2.x.
Any assistance and guidance is appreciated. Thanks in advance.
Hi Mike, thanks. Not sure what you mean about the system error; the mailing list is hosted by Mailchimp. You can drop me a line on the contact page, if it helps.
I’ve never personally used Proxmox, though I use a LOT of Docker+Kubernetes. I can’t quite tell what your problem is based on the description, but I can say that VMs really mess with networking. In the case of Docker/K8s, everything happens on a virtual private bridge network… isolated from the main network. I’m sorry that I don’t know enough about Proxmox to comment if this is specifically related to your case or not. One way to test this would be to get into a command prompt inside the VM. If you can ping the Cisco router at 192.168.2.1 (or whatever), all is well.
Next, I’m not sure exactly what you’re trying to accomplish with the VM per se. The crux of this post is that the RPi can “spy” on the traffic flowing through it. It requires two physical ports, bridging the traffic from the LAN to the WAN. The Raspberry Pi must act as a pass-through for the traffic, which can be done via the two methods described in this post. Once that is the case, the github repo / script will work to collect stats on that traffic which is already being passed through the Pi.
Thanks, Zane. I tried again to sign up but was again rejected due to a “system problem”. If you are using MailChimp, it may have something to do with running PiHole on my network.
There is no magic to Proxmox. I have used Debian VM’s to replace all the utilities that I used to have running on RPi’s. I have no problem pinging my Cisco router. I have a second USB Ethernet adapter that is seen by the VM. I will try again. I also just learned that I can mirror a port on my Cisco router which may allow me to mirror the WAN port for monitoring. I’ll post an update in the coming days/week and let you know if I’ve gotten any further.
Got it. I guess I’m used to having to work around the K8s abstractions 😉 That’s interesting about port mirroring on the Cisco router. I had no idea such a feature existed (I’m just a hobbyist). But it seems like a great way to avoid the bandwidth bottlenecks you might otherwise impose with a Raspberry Pi pass-through (?).
After a few days of research and tweaks, I have this mostly working.
I did not use RPi’s since I did not want to purchase more RPi’s. I have several older RPi 3’s but did not want to invest in RPi 4. I moved to Proxmox VM on an Intel NUC some time ago for my utility devices.
I created a new VM using Debian 10 (Buster) and struggled a bit to add a second USB network adapter. But I finally succeeded in getting it working after tweaking /etc/network/interfaces. I now had two working NIC’s on the VM.
After researching promiscuous mode some more, I was able to set one adapter into promiscuous mode but was only able to read broadcast packets. And this made sense since I still was not reading ALL network traffic.
I have a Cisco SMB router and Ubiquiti switches and AP. I was able to mirror the outbound port of my Cisco SMB router (all network traffic) to another network port which I plugged into the second network adapter on the VM. After running a test with tcpdump, I was able to see all network traffic. I also tested it with the Python script and verified that data was flowing to Prometheus. So far, so good.
Next I added the Grafana dashboard and this is where I am currently stuck. I validated that I can read the ntm data elements in both Prometheus console and Grafana. However, the Grafana dashboard is not working. I suspect it has something to do with the queries and the exclusions. Any suggestions on troubleshooting this are appreciated. Many thanks in advance.
Progress! The issue was that the regex used in the Grafana local server dropdown was filtering to 192.168.0.x subnet. My home network is on 192.168.2.x. Once I updated the regex, the Grafana dashboard worked!
I still need to do some performance analysis to see if this will work using a Proxmox VM long term. Thanks so much, Zane, for making this available. I learned a lot during the implementation on my home network.
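For anyone adapting the dashboard to another subnet, the sketch below builds a local-address pattern from a subnet prefix and tries it against a few addresses. It uses Python's re module as a stand-in for Grafana's regex field, and the function name local_ip_pattern is hypothetical; the exact expression in the dashboard may differ.

```python
import re

def local_ip_pattern(prefix):
    """Build a regex matching addresses that start with `prefix`, e.g. '192.168.2.'."""
    # re.escape turns each '.' into '\.', so dots match literally.
    return re.compile("^" + re.escape(prefix) + r"\d+$")

pattern = local_ip_pattern("192.168.2.")
for address in ["192.168.2.15", "192.168.0.15", "192.168.20.15"]:
    print(address, bool(pattern.match(address)))
```

Escaping the dots matters: an unescaped dot matches any character, so a looser pattern could accept addresses outside the intended subnet.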
Wow! So glad to hear it. I really want to try out this promiscuous mode approach. I’m going to see if I can even make a RPi flashable image available to folks 🙂
Hello again! I’ve successfully gotten my Pi running as both a router and network monitor. As I’m error checking it before implementing it, I have a question about name resolution for local hosts.
I put all of my local devices in /etc/hosts, and from looking at the NTM metrics, it appears they are resolving correctly:
ntm_bytes_total{dst=”Zero”,proto=”tcp”,service=”https”,src=”1e100.net”} 25187.0
However, in Grafana, all the instances of that host are showing up as the IP 192.168.0.10 in the By Host, Bytes Transferred, and individual detail dropdowns.
Since it looks like the data Prometheus is getting has the alias, how can I get it to display it in the graphs?
That’s very strange. Based upon the Prometheus line you pasted, it’s not even recording the local IP address. Are you sure you’re not accidentally viewing old data? You may want to use the data explorer directly in Grafana, not the pre-built dashboard. Try filtering for the device on both the sending and receiving side.
BLUF: If your router resolves local devices you need to edit your REGEX field for the local network variable.
Hey — excellent guide. Your ability to step through things is uncanny.
Wanted to highlight an issue I had. I run opnsense (fork of pfsense) and a netgear switch that can port mirror. I basically mirror everything coming into the opnsense router from my wifi network (this represents 90% of traffic, only my main PC and pihole are on the wire). The mirrored packets go to a raspi that runs the monitor script — all good I see everything.
The problem was that opnsense resolves all of my local devices, so the graphs showed nothing: the variables that define the local network contain only the private address space. I replaced the regex with [A-Za-z0-9\.\-]{0,} and it started showing data. I have a couple of things left to massage before it’s all right, but I thought I’d put it out there that resolved devices on the local network will break the graphs.
Thanks again for this write up!
Hey Jason, thanks for saying so!
Good point about the resolved devices. Since that didn’t seem possible with my setup, I indeed coded it to look for IP addresses instead. But I would certainly prefer to have cleartext names instead of obscure IP addresses, myself. Maybe eventually I’ll figure out how to do this without opnsense. It seems like since I’m running the DHCP server on the Pi anyways, it should be able to resolve these names…
My router (AVM FritzBox) is resolving the names automatically. Here’s the regex I’m using to get named devices included in the stats:
/^(([A-Za-z0-9\-]+\.fritz)|(fritz.box)|((127\.\d+\.)|(10\.\d+\.)|(172\.1[6-9]\.)|(172\.2[0-9]\.)|(172\.3[0-1]\.)|(192\.168\.)\d+\.\d+))$/
I don’t understand regex. Could you give me some more details on how your regex works with the Fritz?
My router is the DHCP server (192.168.2.1). How would I make the regex work for that? Thanks
found the solution?
I’m in the same boat 🙁
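For anyone stuck on the regex, here is a minimal sketch for experimenting with a LocalIPs pattern before pasting it into the dashboard variable. The pattern is a simplified, hypothetical variant for a 192.168.2.x network, not the one shipped with the dashboard, and the helper name `is_local` is made up for illustration. Grafana evaluates the variable regex itself, so Python is only used here as a scratchpad; for a pattern this simple the syntax should behave the same way.

```python
import re

# Hypothetical LocalIPs pattern for a 192.168.2.x network. It accepts:
#   - short hostnames ending in ".fritz" (e.g. "piblue.fritz")
#   - the router itself, "fritz.box"
#   - any label of the form 192.168.2.<digits>
# Drop the first two alternatives if your router does not resolve names.
LOCAL_IPS = re.compile(
    r"^(([A-Za-z0-9\-]+\.fritz)|(fritz\.box)|(192\.168\.2\.\d+))$"
)


def is_local(label: str) -> bool:
    """Return True if a src/dst label would pass this LocalIPs pattern."""
    return LOCAL_IPS.match(label) is not None


if __name__ == "__main__":
    for label in ["192.168.2.1", "192.168.0.10", "piblue.fritz", "1e100.net"]:
        print(f"{label:15} local={is_local(label)}")
```

To adapt it, change `192\.168\.2\.` to the first three octets of your own subnet. Every dot that should match a literal dot needs a backslash in front of it.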
I’m able to explore the data in Grafana. But when importing the dashboard, there is no data to show. I have checked and metrics are being produced just not in the dashboard.
I’m going to guess that your LocalIPs are not showing up in the dropdown at the top of the dashboard. I just updated the post with instructions how to fix this, in case you’re using a nonstandard subnet. Let me know how it goes!
– Zane
I have been looking around at the many network monitoring tools and the overall data capture and presentation of yours cannot be topped.
I am also planning to install piHole on my system according to [link to www.smarthomebeginner.com]. I saw a comment earlier in this post but wanted to get a fresh one going. How would I add that server in your above diagram. I am also thinking that piHole should be on a separate RP unless you think differently. Any help with the settings would also be appreciated. I am a bit newer to the RPi and my Unix days are decades back.
Any suggestions would be appreciated.
Thanks! Glad you found it useful.
TBH, I would highly recommend AdGuard Home over PiHole. The Home Assistant folks switched over the entire community many months ago, and I agree with that decision. It’s much easier to use. I run it on my router, actually. I just installed it via the official instructions on the router. No other steps required, IIRC. But you could equally well run it on a different Pi.
Great write-up. I was able to follow everything with no errors. The piece I’m struggling with is: “Open your web browser to [link to 192.168.0.1] to see the counters being exported for Prometheus.” I get a timeout from the raspi.
I currently have (WAN)->Raspi_WAN->Raspi_LAN->WirelessRouter_Bridged->Ethernet_Client. If I force stop the script on the Raspi, I get something like this:
pi@raspberrypi:~/network-traffic-metrics $ sudo python3 ./network-traffic-metrics.py -i lan "src net 192.168.0.0/24 or dst net 192.168.0.0/24"
tcpdump: listening on lan, link-type EN10MB (Ethernet), capture size 262144 bytes
[SKIP] 04:09:52.472470 IP
^C914994 packets captured
915130 packets received by filter
Thanks! First, just to address the obvious: have you checked that you’re trying to connect to the correct IP address? If the IP of your Pi is not `192.168.0.1`, you will need to change it. If that is correct, try SSHing into the Pi and doing a `curl localhost:8000/metrics` to see if you can access the Prometheus endpoint from the Pi itself. If that works, then something about your network topology (firewall?) is preventing the other machine from accessing the Pi. If it doesn’t work, then for some reason the script is not listening or creating the webserver. Check out `journalctl -xe` to look for errors reported by the script, perhaps. You could also try changing the port it listens on, e.g., `--port 8001`.
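The same check can be scripted. The sketch below is only an illustration of the curl test described above; the function names are hypothetical, and the URL and port are the defaults assumed in this post. It fetches the metrics page and counts the lines that start with `ntm_`, so you can run it once on the Pi itself and once from the other machine and compare.

```python
import urllib.request
from typing import Optional


def fetch_metrics(url: str, timeout: float = 5.0) -> Optional[str]:
    """Return the body of the metrics page, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print(f"Could not reach {url}: {exc}")
        return None


def count_ntm_samples(body: str) -> int:
    """Count exported sample lines whose metric name starts with 'ntm_'."""
    return sum(1 for line in body.splitlines() if line.startswith("ntm_"))


if __name__ == "__main__":
    # Assumed default address; change it to match your Pi and port.
    body = fetch_metrics("http://localhost:8000/metrics")
    if body is not None:
        print(f"Endpoint is up, {count_ntm_samples(body)} ntm_ samples exported")
```

If this works on the Pi but not from the machine running Prometheus, the script is fine and the problem lies somewhere in between (firewall, wrong IP, wrong interface).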
Big thanks for this tutorial, it is detailed and easy to follow.
I followed all the steps and it seems to work, because I see some data at localhost:8000/metrics.
However, I am unable to get the data to flow into Grafana despite following all the instructions…
In addition, I don’t get METRICS when I go to Explore as shown in the tutorial.
I’m using a Raspberry Pi 4 8GB with 32-bit Raspbian OS, connected via Ethernet to an ASUS router linked to a Freebox router. Maybe the double DHCP creates an issue with Grafana? (That seems strange, because Prometheus is working well and Grafana should pick up the Prometheus data.)
Hmm, it sounds like Prometheus is not configured correctly or otherwise unable to scrape the `metrics` endpoint. I’d triple-check how you’ve configured the service in Grafana, ensuring that the Prometheus connection can actually reach the target URL from whatever host machine is instantiating the request. You could also look at the Prometheus & Grafana logs to see if they complain about anything in particular.
Thanks for this write-up Zane! I am up & running, all on a RPi4B. Here are my “lessons learned.”
I think there’s a small typo in the Prometheus run command. As written, it uses the example/default config file (prometheus.yml).
Readers here should either change that to “prometheus.yaml”, or save the config file they create, overwriting prometheus.yml.
I installed a service for this to run at startup, similar to TCPdump (below).
I had some trouble installing Grafana on my Raspberry Pi 4B (Raspberry Pi OS 64 bit, Linux raspberrypi 5.4.51-v8+ #1333 SMP PREEMPT Mon Aug 10 16:58:35 BST 2020 aarch64 GNU/Linux): the “sudo apt-get install grafana-enterprise” command returns
“E: Unable to locate package grafana-enterprise”.
I had to follow some slightly different steps on their website:
[link to grafana.com]
As others mentioned, I needed to update the network-traffic-metrics.py script to look for ‘lan’ instead of ‘eth0’.
I also used a systemd service to get tcpdump to run at startup:
tcpdump.service saved in /etc/systemd/system/:
[Unit]
Description=TCPDump service for traffic monitoring
After=network-online.target
[Service]
Type=idle
ExecStart=/usr/bin/python3 /home/pi/network-traffic-metrics/network-traffic-metrics.py "(src net 192.168.0.0/24 and not dst net 192.168.0.0/24) or (dst net 192.168.0.0/24 and not src net 192.168.0.0/24)"
[Install]
WantedBy=default.target
Followed by:
sudo systemctl daemon-reload
sudo systemctl enable tcpdump.service
sudo reboot
Hope this helps others. Please correct me for better ways to do this!
Thanks for the detailed thoughts, Will!
Ugh, the whole `yml` vs. `yaml` extension thing always gets me. Every tool seems to have some different level of compatibility 🙁
I’ll try to take a pass at incorporating this feedback soon. Very glad to have people posting such helpful comments, to make sure it works well for all. Cheers!
– Z
Thanks for putting this article together. It was a lot of fun setting up and easy to follow. I did run into one snag that I thought I would share and give you the option to include in the article if deemed appropriate.
I run dynamic DNS on my home network and configured the rpi router with the proper search domain and DNS servers to make use of that. For the devices on my network that had registered their hostnames, data wasn’t populating through to the dashboard. Looking at the data exposed by the network-monitor Python script, it had populated the data with the FQDNs instead of IPs, which is awesome! The issue was that those flows now have to be identified by name and not by IP address. The dashboard, however, is only configured to look at IPs. I updated the regex for the LocalIPs variable of the dashboard to include “(.*\.myhomedomain\.local)” and then everything worked as expected. The document alludes to this, but only for the case where the IP subnet is nonstandard for local networks. Granted, it was that note that led me to the proper place to make the update.
I don’t know if it’s possible for the user to supply this as a variable somehow as part of the build since it’s the dashboard that ultimately needs to be updated. Perhaps just a note in the same area mentioning what to do for DDNS setups.
In any event, this was a really fun project, thanks again for putting it together!
Hey Rob,
I am using a DNS also and I was trying to figure it out a way to display the names instead of the IPs (I did not even realise that the named resources were being left out :-))
I will be updating the regex to include my domain – thanks for that. But I have a question – Did you have to update the filter also?
(src net 192.168.0.0/24 and not dst net 192.168.0.0/24) or (dst net 192.168.0.0/24 and not src net 192.168.0.0/24)
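A note on that filter, with a small sketch to make its logic concrete. tcpdump’s `net` primitive matches on the IP addresses in each packet, not on resolved names, so the filter only needs to change when your subnet is different. The filter keeps a packet when exactly one endpoint is inside the local subnet, which is what drops LAN-to-LAN traffic. The helper names below are hypothetical and the second function only mimics the filter for illustration; tcpdump does the real matching.

```python
import ipaddress


def build_filter(cidr: str) -> str:
    """Build the tcpdump filter string used in the post for a given local subnet."""
    return (
        f"(src net {cidr} and not dst net {cidr}) or "
        f"(dst net {cidr} and not src net {cidr})"
    )


def crosses_boundary(src: str, dst: str, cidr: str) -> bool:
    """Mimic the filter: True when exactly one endpoint is in the local subnet."""
    net = ipaddress.ip_network(cidr)
    src_local = ipaddress.ip_address(src) in net
    dst_local = ipaddress.ip_address(dst) in net
    return src_local != dst_local


if __name__ == "__main__":
    print(build_filter("192.168.178.0/24"))
    print(crosses_boundary("192.168.0.5", "8.8.8.8", "192.168.0.0/24"))      # upload
    print(crosses_boundary("192.168.0.5", "192.168.0.9", "192.168.0.0/24"))  # LAN only
```

The simpler form `src net X or dst net X` also matches traffic between two local devices, so the totals it produces include LAN-internal transfers as well as internet traffic.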
Hi, this is great. loved the tutorial. i got the metrics to work on port 8000 without any issues running it as a service on the Raspi-4 (Prom + Grafana already running and having some dashboards)
when i update the prometheus.yml with the following parameters mentioned and try to restart the service it fails with the following
● prometheus.service - Prometheus Server
Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2020-12-28 18:04:46 GMT; 3s ago
Docs: [link to prometheus.io]
Process: 1186 ExecStart=/home/pi/prometheus/prometheus --config.file=/home/pi/prometheus/prometheus.yml --storage.tsdb.path=/home/pi/prometheus/data (code=exited, status=2)
Main PID: 1186 (code=exited, status=2)
Dec 28 18:04:46 raspberrypi systemd[1]: prometheus.service: Service RestartSec=100ms expired, scheduling restart.
Dec 28 18:04:46 raspberrypi systemd[1]: prometheus.service: Scheduled restart job, restart counter is at 5.
Dec 28 18:04:46 raspberrypi systemd[1]: Stopped Prometheus Server.
Dec 28 18:04:46 raspberrypi systemd[1]: prometheus.service: Start request repeated too quickly.
Dec 28 18:04:46 raspberrypi systemd[1]: prometheus.service: Failed with result ‘exit-code’.
Dec 28 18:04:46 raspberrypi systemd[1]: Failed to start Prometheus Server.
I have given the pi user permissions to the data folder (I can see data being written there for the other dashboards), but for some reason this fails on startup.
Any help would be appreciated.
I can’t quite see enough information there to know what went wrong. The log only shows that the application crashed, but does not include the reason for the crash. You need to run `journalctl -xe` and scroll up to find the actual cause of the problem.
I am so close to getting this to work. The only link that seems to be not working is that my remote machine can not see 192.168.0.1:8000/metrics.
When I do curl 192.168.0.1:8000/metrics on the machine itself via SSH, the data stream comes up nicely.
My firewall doesn’t seem to be logging any problems. I don’t know enough to know why it isn’t working on the machine running Prometheus.
I had the same problem, and after opening port 8000/tcp on the firewalld zone=home for the “lan” side, I was able to get the browser to see the data.
Hi,
Thanks for this nice and complete tutorial.
I use a raspberry pi with a single eth interface connected to my modem. The pi has a static address, serves as DHCP server and uses ip forwarding to redirect all traffic to the modem/router. I disabled ICMP redirects.
When I try to download large files, it seems that all the traffic goes through the pi and the traffic is rather in agreement with Chrome. However, the traffic shown in Grafana is significantly lower than what is shown by Chrome/iftop (In the order of a few Mb rather than 500Mb).
Any idea of what I could have missed?
Thanks
How exactly are you viewing the data in Grafana? If you’re using one of the SUM aggregations (e.g., total bytes), keep in mind that cardinality resets will create cliffs, so your recording needs to be continuous through the interval period (and Prometheus must have sufficient storage). If you’re looking at one of the line graphs, keep in mind that interval aggregation is prone to sampling errors depending on the size of your interval windows. For example, if you’re summing in 1min chunks, you’ll actually only look at the average over that one minute (which will be very low).
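To illustrate the point about resets: the exported values are counters that only count upwards, and they start again from zero whenever the script restarts. A naive "last value minus first value" then undercounts badly. The sketch below shows the idea of tolerating resets; it is a simplified illustration with a hypothetical function name, roughly in the spirit of how Prometheus treats counters, and not the exact algorithm it uses.

```python
from typing import Sequence


def total_increase(samples: Sequence[float]) -> float:
    """Sum the growth of a counter across samples, tolerating resets.

    A drop between two consecutive samples is treated as a reset to zero
    (for example, the script restarted), so the new value counts in full.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        total += (curr - prev) if curr >= prev else curr
    return total


if __name__ == "__main__":
    # Counter grows to 250, the script restarts, then it grows to 90 again.
    samples = [0, 100, 250, 40, 90]
    print("naive:", samples[-1] - samples[0])
    print("reset-aware:", total_increase(samples))
```

Anything transferred between the last scrape before a restart and the restart itself is still lost, which is why recording needs to be continuous over the period you are summing.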
It seems like only the up traffic is monitored, not the down. It must be an issue with my network configuration on the Pi.
Thanks for this guide!
I have a couple comments to it.
1. I haven’t found how to add 2 static IP’s to 10-network.rules file
2. Before I could start Prometheus, sudo chmod 777 -R on prometheus folder was required
3. The installation guide for Grafana has changed a little since this article was posted, so it’s better to follow the original guide on the Grafana website
What is missing for me on the Grafana dashboard is labels for local IPs. My access point (Orbi) detects device names correctly. Now the question is how to assign them (dynamically) to the IPs on the dashboard
1. It should just be the same syntax, repeated in two different blocks. What’s not working, there?
2. You just gave full R/W access to everybody, which is generally not a good idea. If the Prometheus folder was not owned by the user running Prometheus, that should be fixed in a more targeted and secure way (i.e., sudo chown -R 1000:1000 ./prometheus).
3. Agreed. Though, thanks for the heads up — I’ve updated the Grafana section.
Re: assigning names, one easy way to do this may be to add Grafana variables to the dashboard representing your various IPs. It’s much harder to get working implicitly, and would require setting up MDNS such that all the IPs were internally resolved to their canonical hostnames.
Had fun building this and getting to know some new parts of Debian / Linux I didn’t know. A few additions for beginners like me:
1. systemd is a pain to make work with Prometheus, and in the end I used an Environment variable:
[Service]
Environment=”Netmonitor=”\””(src net 192.168.2.0/24 and not dst net 192.168.2.0/24) or (dst net 192.168.2.0/24 and not src net 192.168.2.0/24)”\”””
ExecStart=/usr/bin/python3 /home/pi/network-traffic-metrics/network-traffic-metrics.py $Netmonitor
2. For Prometheus I used the guide here [link to devconnected.com] to create the prometheus links in /usr/local/bin and then setup the service in systemd using
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path="/data/prometheus"
3. Grafana automatically starts up after install so no .service file needed
4. My pi bridge blocked all network requests (DHCP etc. etc.) to anything downstream when first setup.
The reason was these variables were set to 1 when bridge was up (/proc/sys/net/bridge):
net.bridge.bridge-nf-call-arptables
net.bridge.bridge-nf-call-ip6tables
net.bridge.bridge-nf-call-iptables
Adding above 3 lines with =0 at the end to /etc/sysctl.conf fixed the issue for me and persisted on reboots.
See more here: [link to wiki.libvirt.org]
Interesting findings!
Indeed, systemd can be challenging at first. Your first point talks about Prometheus, but the code is not for Prometheus (it’s for the script). I’m not clear what your environment variable is meant to solve, though. Why was this work-around necessary?
For (2), I would caution other readers about following this post too literally. It involves nginx and reverse proxies, which is something I very intentionally kept out of the discussion for both simplicity and security reasons. Glad that the systemd file worked for you though.
Great find on (4)!
Cheers,
– Zane
Hi Zane, thanks for the reply. Great solution this, although my kids don’t agree as I can now show them their computers are stopping the TV working 🙂
For the systemd point: yes, my error. It is for the Python script, not Prometheus.
I had to use the environment variable because the arguments (src net…..) weren’t being parsed correctly. systemd removes the quotes, so it wouldn’t take the whole argument. I tried lots of different escape characters, but it only worked when I used the variable.
One other question, how easy would it be to get Grafana to show the Network name instead of IP where one exists ?
Thanks again for the great solution.
Thanks for the tutorial. I think there is a step missing for adding Grafana repository to apt sources list (see [link to grafana.com])
This was already explained in the “official docs” link in the post (the first sentence in the Grafana section).
After all, this is super amazing to play around with. Some more thoughts:
1) it looks like this is only working with IPv4. Any chance to get it capturing IPv6 as well?
2) any specific reason for manually installing Prometheus instead of using the version from the repository? Might save some headache…
3) I’d love to have a stat for the average bandwidth utilization (up/down) over a selected period. I clicked a bit around in the Grafana interface, but not sure I found the right parameters. If this could be added to the dashboard, would be amazing!
In general: I’m using Raspberry Pi as WiFi access point to measure utilization for certain devices, hence I’m not 100% sure if up/down bandwidth might get mixed up in that setup. But that’s a minor issue. Thanks a lot, this is exactly what I was looking for!
Here’s how tcpdump output looks different for IPv6 and IPv4. unfortunately, I’m not experienced enough in regex to fix the script myself:
[IPv6]
21:46:25.653730 IP6 (flowlabel 0x9ffb1, hlim 58, next-header TCP (6) payload length: 1452) speedtest.your-server.de.https > X1-AFO.fritz.box.64348: Flags [.], cksum 0x4bcc (correct), seq 2116401:2117833, ack 1038, win 8, length 1432
[IPv4]
21:44:59.969737 IP (tos 0x0, ttl 57, id 55329, offset 0, flags [DF], proto TCP (6), length 1452)
nbg.icmp.hetzner.com.https > X1-AFO.fritz.box.62424: Flags [.], cksum 0x8ce9 (correct), seq 272061:273461, ack 896, win 8, options [nop,nop,TS val 2752536159 ecr 3535136344], length 1400
Another observation: UDP traffic is not being captured correctly, e.g. when doing a MS Teams video call, almost no data appears in the metrics.
Correction: UDP traffic *is* being *captured* correctly, however it will not show up in the results as UDP does not include the “service” variable and NULL-values are currently not supported for variables (see [link to github.com]).
Manually adjusting the queries to not include the “services” filter does the trick.
Good catch! I decided to modify the python script instead to output ‘ntm-Unknown’ during the service lookup function to avoid modifying the dashboard. Actually, having a method to filter services can be pretty nice.
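A minimal sketch of that kind of fallback, for readers who want to try the same thing. The function name and its placement are assumptions, since the actual service lookup in the script is not shown in this thread; only the `ntm-Unknown` placeholder comes from the comment above.

```python
from typing import Optional

# Placeholder label taken from the comment above.
UNKNOWN_SERVICE = "ntm-Unknown"


def service_label(raw: Optional[str]) -> str:
    """Return a non-empty service label so dashboard filters never see ''."""
    if raw is None:
        return UNKNOWN_SERVICE
    cleaned = raw.strip()
    return cleaned if cleaned else UNKNOWN_SERVICE


if __name__ == "__main__":
    for raw in ["https", "", None]:
        print(repr(raw), "->", service_label(raw))
```

With a non-empty label on every sample, the dashboard’s service variable can stay in the queries and UDP traffic shows up under its own entry.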
Thanks so much for this posting! It was pretty straight forward to setup with your walkthrough, but I’m a little stuck on the Grafana piece – it seems like if I turn off host resolution for tcpdump in the prometheus client script (‘-n’ : Don’t convert addresses to names), all the traffic will properly display on the dashboards, but if anything in the internal addresses is resolved using -f or default resolution, then most traffic is dropped on the dashboard. Can you help me walkthrough how the grafana dashboards are setup? Maybe I can figure out what’s happening and push a fix back to your repo? I have no idea how Grafana expressions work and it might be faster just to chat.
It sounds like you need to adjust the filters on the Grafana dashboard. It is unlikely that the traffic is being dropped, since there are no filters that happen prior to the dashboard itself, which is merely a visual filter (it does not restrict the data which is collected). You should start by using the “Explore” section of Grafana to see all the data in order to debug, and then adjust the dashboard filters appropriately.
Outstanding work. Looking to implement on a 4. Undecided as to the OS version. NOOBS or Ubuntu 20.04 or something else?
This is a great project – I hacked this into a Rock64 board with DietPi, it’s been a great learning experience.
Everything works for me up to the monitor script. I run the traffic metrics script but don’t get the 192.168.0.1:8000/metrics page, it’s like the webserver isn’t starting. I see traffic in my terminal, so I know something is working. I’ve struggled with this particular piece all day without getting anywhere.
If anyone comes across this and has a recommendation, i’d appreciate it.
Can you curl the localhost endpoint from the same device? If so, then the IP address is wrong. If not, you’ll need to look into the logs from the terminal. Most likely the web server failed to start for some reason.
Hi Zane,
Thanks a lot for the awesome work here.
When I try to filter on just one host like below for example:
“(src net 192.168.1.122/32) or (dst net 192.168.1.122/32)”
I get very accurate results, but when I filter on a range like below, I get totally erroneous readings:
“(src net 192.168.1.0/24) or (dst net 192.168.1.0/24)”
I have only one interface, connected behind a router. It looks like I only get upload traffic. How can I graph both? My filters are “src net 192.168.0.0/24 or dst net 192.168.0.0/24”. Can you help?
Sounds like your router is not configured to mirror all traffic into the port. The fix will depend on your router.
Hello! Thank you for your work. I have one question: is it possible to monitor on standalone server, which is not router? I already installed your monitor, but my ntm metrics are not shown in Prometheus. How can I repair this problem?
As one of the headlines in the post says: “the traffic must flow through the device.” Just below that, it explains the process for bridging the traffic through the device (which does not constitute a router). However, it still means that the device needs to act as the physical connection between the LAN and the WAN, even if not acting as a router itself it must be bridging the traffic. Note that some other commenters have pointed out that certain high-end routers can avoid this need, but they are relatively uncommon.
First, I want to say thanks for this tutorial. It works so far, but there are some open points and questions.
I have chosen the alternate setup:
WWW FritzBox eth1 – Raspberry 4B – eth0 LAN
The FritzBox Wifi is almost idle (no clients connected) since I have another WLAN mesh, not the AVM mesh, so all the internet traffic goes through the Raspberry Pi.
I had some trouble setting up the bridge mode, finally I used the original tutorial from raspberry about wireless access-point-bridged replacing the wireless interface by eth1 and leaving out wireless setup.
Because my FritzBox uses 192.168.178.0/24 network my network-traffic-metrics.py parameters are:
(src net 192.168.178.0/24 and not dst net 192.168.178.0/24) or (dst net 192.168.178.0/24 and not src net 192.168.178.0/24)
I added --fqdn as well; without it, Grafana will not display much, or displays weird data, with the FritzBox. Following AFO’s post here, I changed the LocalIPs regex slightly to match my LAN:
/^(([A-Za-z0-9\-]+\.fritz)|(fritz.box)|((127\.\d+\.)|(10\.\d+\.)|(172\.1[6-9]\.)|(172\.2[0-9]\.)|(172\.3[0-1]\.)\d+)|(192\.168\.178)\.\d+)$/
So now I see a lot of data in Grafana, wow!
I started to play around using speedtest-cli and curl to download some big files from the internet using another Raspberry Pi in my LAN.
While the FritzBox shows the traffic immediately, nothing happens in Grafana!
I tried setting the scrape_interval in Prometheus to 1s, with no effect. Grafana shows the client running the curl command, but only at a few B/s, not the expected peak (kB/s or MB/s). And only upstream data is shown, no downstream.
Does anyone have a suggestion?
I enjoy this project very much, so after solving the above problem I will try to create a Grafana dashboard showing not only the throughput of each client, which is already very helpful, but also the addresses it is talking to. I think that information is within the metrics.
Thanks in advance for feedback about the curl/speedtest-cli problem.
Heya,
First, let’s look at the fact that the curl command is not showing the throughput you’d expect. It sounds like you may be misunderstanding what the “scrape interval” does. Each time prometheus scrapes, it’s simply counting upwards. If you are downloading 1 byte per second, over the course of 15 seconds, a 1 second scrape interval would generate 15 data points of 1 byte each. OTOH, a 15 second scrape interval would generate just 1 data point of 15 bytes. In other words, changing the interval will not impact the integrity of the data, only the fidelity.
What you’re looking for is probably the Grafana $__interval (see the chart configs). This tells Grafana the time window in which to aggregate the data. For example, even if you have 1 second SCRAPE intervals, if Grafana is only aggregating at 30 seconds then it will combine 30 data points into 30s of averaged data. If your download (or curl command) took, say 1 second… then you’d only see 1/30th the speed you expect, because the other 29 seconds were idle.
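The arithmetic in the two paragraphs above can be checked with a few lines of Python. This is only a toy model with made-up numbers and hypothetical helper names: the counter is a list with one cumulative reading per second, `scrape` picks every n-th reading, and `windowed_rate` averages the growth over each window the way a rate over an interval would.

```python
def scrape(counter, every):
    """Sample a per-second cumulative counter every `every` seconds."""
    return counter[::every]


def windowed_rate(counter, window):
    """Average bytes/s over each `window`-second step of the counter."""
    samples = scrape(counter, window)
    return [(b - a) / window for a, b in zip(samples, samples[1:])]


# 1 byte per second for 15 seconds: the counter reads 0, 1, ..., 15.
steady = list(range(16))

# A 3,000,000-byte download that finishes within the first second of a 30 s span.
burst = [0] + [3_000_000] * 30

if __name__ == "__main__":
    # Scrape interval changes how many points you get, not the total.
    print(scrape(steady, 1)[-1] - scrape(steady, 1)[0])    # growth seen with 1 s scrapes
    print(scrape(steady, 15)[-1] - scrape(steady, 15)[0])  # growth seen with 15 s scrapes

    # The aggregation window changes the peak you see.
    print(max(windowed_rate(burst, 1)))   # peak with 1 s windows
    print(windowed_rate(burst, 30))       # the same download averaged over 30 s
```

Both scrape intervals report the same 15 bytes of growth, while the one-second burst shrinks to one thirtieth of its real speed once it is averaged over a 30 second window.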
As for why some data would be missing, I can only really point you at tcpdump. The NTM script is EXTREMELY simple. All it does is capture that data and make it available to Prometheus. It’s a very low-level, ubiquitous program. So without digging into your network architecture, I can only speculate that some of the traffic is not flowing across the Pi, or the filters are excluding that traffic.
FWIW, the fact that you had to add `.fritz` and `fritz.box` to your local IP regex is quite suspicious to me. I’m not familiar with FritzBox, but it sounds like it’s set up your DNS resolution in a pretty nonstandard way. My recommendation would be to take a step back and look at the raw Prometheus data by using the “Explore” section in Grafana. If the data is missing there, then your problem lies with network architecture or tcpdump filters. Otherwise, your problem is with Grafana queries.
Good luck!
-Z
Hi,
Thanks for your comments about the scrape interval. You are right of course; meanwhile I read some docs about the scrape interval and came to the same understanding you explained so well. Thanks again for pointing it out. So my scrape interval is back to 15 now.
One problem when zooming into a smaller time range (e.g., 5 minutes) is that you sometimes get “No Data” in Grafana. This is because one should always use a longer interval in the Grafana dashboard than the Prometheus scrape interval. The $__interval is exactly the scrape interval and is the theoretical smallest time you can use to get (meaningful?) data in a Grafana dashboard. For longer time ranges (15-60 minutes) it works so far, I guess by chance. So according to my evaluation one should use $__interval+1, or maybe even better $__rate_interval, which is four times the $__interval. There are several explanations of that on the internet. That way one would not run into the “No data” problem in Grafana while zooming in, which is confusing in the beginning.
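As far as I can tell from Grafana’s documentation, `$__rate_interval` is defined a little differently from the rule of thumb above: it is the larger of `$__interval` plus the scrape interval and four times the scrape interval. The sketch below just evaluates that formula; the function name is hypothetical and the 15 second default is the scrape interval assumed in this thread, so treat it as a way to sanity-check numbers rather than a statement of what your Grafana version does.

```python
def rate_interval(interval_s: float, scrape_interval_s: float = 15.0) -> float:
    """Evaluate max($__interval + scrape interval, 4 * scrape interval), in seconds."""
    return max(interval_s + scrape_interval_s, 4 * scrape_interval_s)


if __name__ == "__main__":
    for interval in (15, 60, 120):
        print(f"$__interval={interval}s -> $__rate_interval={rate_interval(interval)}s")
```

Either way, the practical point stands: a rate window needs to span at least two scrapes to return data, which is why very short windows produce “No Data” when zooming in.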
The “missing data” problem is ongoing. I am working on it. All the data definitely goes through the Raspberry Pi; there is no other way. So yes, it must have something to do with the tcpdump settings and/or how the FritzBox does the networking. Maybe special Grafana dashboards are necessary too.
I do it step by step now. We will see.
Thanks for all so far.
Hi again,
meanwhile I think something goes wrong, or not as expected, when parsing the tcpdump output within the Python script. But I don’t know Python.
Starting the metrics script without --fqdn gives output at pibridge:8000/metrics like the following, where piblue is the Raspberry Pi running speedtest-cli:
ntm_packets_total{dst="igmp.mcast",proto="igmp",service="",src="piblue.fritz"} 14.0
Only very few of these lines can be found, with different counts.
While doing tcpdump directly on the command line on the pibridge using the same network filter gives output like this:
14:36:02.918600 IP xxx.xxxxxxx.xxx-xxx.xx.xxxx-xxx > piblue.fritz.box.53528: Flags [.], seq 41886:43326, ack 276, win 507, options [nop,nop,TS val 22730681 ecr 2217052601], length 1440: HTTP
And there are lots of them in the dump while running speedtest-cli. Remark: I replaced the real src IP with x.
In the pibridge:8000/metrics I also found lots of:
ntm_packets_created{dst="xxx.xx.xx.x",proto="udp",service="",src="fritz.box"} 1.6271286862585142e+09
ntm_packets_created{dst="fritz.box",proto="udp",service="",src="xxx.xx.xx.x"} 1.6271286862589083e+09
So my theory is that the Python script trims, e.g., “piblue.fritz.box” to “fritz.box” in src and dst.
Also I found something like the following in the pibridge:8000/metrics:
ntm_packets_created{dst="igmp.mcast",proto="igmp",service="",src="piblue.fritz"} 1.6271286920686042e+09
So in other situations “piblue.fritz.box” is stripped to “piblue.fritz”.
Again, I am not a Python programmer, sorry.
Any suggestions/fixes/changes?
Thanks in advance.
loden
Hi again,
meanwhile I solved the “problem”.
In the extract_domain function, the domain names from tcpdump are somehow transformed if no --fqdn parameter was given. Somehow this does not match the traffic when using a FritzBox. I do not understand exactly what is happening in the function because I am not a Python programmer. So I just simplified (or disabled) that function by always using “return string”.
That’s it. I can run the Python script with or without the --fqdn parameter; both work now.
Thanks for all
loden
Interesting. The purpose of the “extract_domain” function is to effectively strip subdomains. It will convert “www.technicallywizardry.com” to “technicallywizardry.com”. I implemented it in a rather simplistic way, though. It’s basically just keeping only the last two pieces of the URL (where each “piece” is separated by a dot). Fritzbox appears to append unique identifiers to the end of the TLD (“piblue.fritz.box.53528”). This is semantically abhorrent, as it adds unique identifiers to the END of the TLD instead of a subdomain.
In any case, I think a better solution for you would be to simply add a single line near the beginning of the “extract_domain” function:
if string.startswith('piblue.fritz.box'): return 'fritz.box'
This will preserve all of the other functionality, but tag your fritzbox traffic as such. You could even change the return value to anything you wanted. Anyways, glad it’s working!
– Z
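For readers following along, here is a rough sketch of the "keep the last two pieces" behaviour described above. It is a reconstruction from that description, not the code in the repository: the `fqdn` parameter and the IP address check are assumptions added so the example stands on its own.

```python
import ipaddress


def extract_domain(string: str, fqdn: bool = False) -> str:
    """Strip subdomains by keeping only the last two dot-separated pieces.

    Assumptions for this sketch: names are returned untouched when fqdn is
    True, and plain IP addresses are never shortened.
    """
    if fqdn:
        return string
    try:
        ipaddress.ip_address(string)
        return string  # leave addresses such as 192.168.0.10 alone
    except ValueError:
        pass
    pieces = string.split(".")
    return ".".join(pieces[-2:])


if __name__ == "__main__":
    for name in ["www.technicallywizardry.com", "piblue.fritz.box", "192.168.0.10"]:
        print(name, "->", extract_domain(name))
```

This shows why every FritzBox client collapses into a single `fritz.box` label: the device name is the piece that gets stripped. Passing the full name through, as with `--fqdn`, keeps the devices distinguishable.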
Hi again,
thanks for your reply. I agree with you that “piblue.fritz.box.53528” is “semantically abhorrent”, as you put it. Just to clarify, since maybe I was a little inaccurate: “piblue.fritz.box.53528” is the direct tcpdump output, I guess including the port. The “53528” somehow disappears in the …:8080/metrics output, maybe because the tcpdump parameters I used on the command line differ from the ones the script uses.
Anyway, it still works fine now since a week!
loden
Hello Zane,
Nice tutorial you wrote there, but I have a question about it: does the Network Monitor work with a Pi used as a local DNS server (Pi-hole)?
A little explanation of my architecture: all my clients connect to the router, which acts as the DHCP server, and I set the router’s DNS address to the static IP of my Raspberry Pi, which is also connected to that router. It works well, and I can see all client traffic in Pi-hole. Then I followed your tutorial, and the services run successfully at x.x.x.x:8000/metrics and x.x.x.x:9090/metrics (they show some data). But when I try to display them in Grafana with your imported dashboard, it doesn’t show any data, not even the client IPs. Any idea why? I really appreciate it, thanks!
Yes, the NTM can be used with a self-hosted DNS server. It sounds like you need to adjust the filters at the top of the Grafana dashboard to match your network settings. Try using the “explore” tab in Grafana to play around with the data and get used to how it works.
Wow, so awesome! Thanks for sharing!
First: This whole setup works perfectly with Raspberry Pi OS – Bullseye, prometheus 2.33 and Grafana 8.3.4.
Second: Like others have commented here, it would be nice to have host names in the graphs rather than IP addresses. I’ve got 30 IP devices in the house, it’s hard to keep track. Here’s how I do it:
1. Manually maintain a list of IP devices –> hostnames I’ve assigned in /etc/hosts:
192.168.0.2 -network_accesspoint_basement
192.168.0.102 -security_WyzeCam_back_yard
192.168.0.117 -entertainment_Sonos_kitchen
etc.
2. Notice the dash in front of each device name. So in the Grafana dashboard I edited the Regex for LocalIPs to be “/^\-|^192.168.0./”
3. In the first several hours/days using this system, some forgotten devices pop up. So just edit /etc/hosts, save, then (since I’m using systemd for the tcpdump, etc.) just restart the service to get the new IP–>hostname mapping:
sudo systemctl stop network-metrics
sudo systemctl start network-metrics
Works pretty well so far!
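To sanity-check a LocalIPs pattern like the one above before pasting it into Grafana, a quick test in Python can help. This is only a sketch: Grafana evaluates the regex with its own engine, but a pattern this simple behaves the same way in Python's `re`. The stricter variant with escaped dots is a suggestion of mine, not part of the original comment.

```python
import re

# Pattern from the comment: a leading dash OR the 192.168.0.x prefix.
# Note that the unescaped dots match ANY character.
LOCAL_PATTERN = re.compile(r"^\-|^192.168.0.")

# Hypothetical stricter variant with the dots escaped.
STRICT_PATTERN = re.compile(r"^-|^192\.168\.0\.")


def is_local(label, pattern=LOCAL_PATTERN):
    """Return True if the label would be treated as a local device."""
    return pattern.search(label) is not None


for label in ["-security_WyzeCam_back_yard", "192.168.0.117", "comcast.net"]:
    print(label, is_local(label))
```

Both patterns accept the dash-prefixed hostnames and the raw 192.168.0.x addresses; the escaped version merely avoids matching look-alike strings where the dots are some other character.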
Hi, I really appreciate this project, and it fits in perfectly with the master’s thesis I am doing. I have a small problem when representing the data in Grafana for each device. I am running an access point in bridge mode with the hostapd daemon on a Raspberry Pi. When Prometheus captures the information, it only shows the devices connected by LAN; Grafana doesn’t show any wireless devices. I would greatly appreciate some clarification on where I can fix this problem.
Thanks for the tutorial! I have an alternate design. I work from home, and my company provides a hardware firewall that also acts as the router for my home network. Everything (work and non-work data) goes through my wireless router, acting as an access point (no DHCP). I’m trying to determine how much of my monthly data usage is due to my kids gaming, etc. So the Pi is purely for monitoring throughput on the home side. The problem I’m having is in the Grafana setup: everything in your dashboard is in terms of Mbps. I’ve fiddled with adding panels and changing the definition of the existing ones, but I’ve had no luck. I’m looking for the total data passed for each device. Any ideas on how I can modify/create a panel to summarize my total usage per time period based on IP address? Here’s an example of how my ISP shows my total usage: [link to imgur.com]
Thanks!
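As a rough illustration of the idea behind a “total usage” panel: the exporter publishes ever-increasing counters, so the total for a period is the amount the counter grew over that period, not a rate. The sketch below shows that arithmetic on made-up sample values. Prometheus's `increase()` function performs a similar calculation (with some extrapolation), so a panel built on it instead of a rate would be one place to start experimenting.

```python
def total_increase(samples):
    """Sum the growth of a counter across samples, tolerating resets.

    `samples` is a list of counter readings in time order (hypothetical data).
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # A drop means the counter restarted from zero (e.g. the service
            # was restarted), so the new reading counts in full.
            total += cur
    return total


# Made-up byte counter readings for one device.
readings = [100, 250, 400]
print(total_increase(readings))
```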
Hi there, thanks for the tutorial. I am running into a problem that I cannot connect the prometheus instance to my existing Grafana server. I am running Grafana inside Home Assistant and this is the error I get.
Error reading Prometheus: Post “http://10.71.71.1:8000/metrics/api/v1/query”: dial tcp 10.71.71.1:8000: connect: no route to host. -10.71.71.0 is my subnet.
I have tried doing port scans on the router and found that both ports 8000 and 9090 aren’t open even though both programs – python and prometheus – are running without any errors.
Thanks in advance.
Thanks,
followed the above, using rpi 3B+ sat inside my network, with Grafana installed on HomeAssistant(rpi 4)
Url in Grafana is set to the 3B+ port 8000
If I do “Save and test” on the data source, I get the following error:
Error reading Prometheus: bad_response: readObjectStart: expect { or n, but found #, error found in #1 byte of …|# HELP pyth|…, bigger context …|# HELP python_gc_objects_collected_total Objects co|…
it looks like grafana is reading everything including the # HELP # TYPE etc. is there a way to get it to ignore these? so it only reads what it is expecting??
Thanks
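One likely reading of this error: the address on port 8000 is the exporter's /metrics page, which serves plain text (the “# HELP” and “# TYPE” lines), whereas a Grafana Prometheus data source expects the JSON query API of a Prometheus server. Assuming Prometheus itself listens on port 9090, as in the earlier comment, pointing the data source there rather than at port 8000 would be the first thing to try. The sketch below only demonstrates the format mismatch; both sample bodies are illustrative, not captured output.

```python
import json

# Illustrative start of an exporter /metrics page (text exposition format).
EXPORTER_TEXT = "# HELP python_gc_objects_collected_total Objects co\n"

# Illustrative shape of a Prometheus HTTP API response (JSON).
API_JSON = '{"status": "success", "data": {"resultType": "vector", "result": []}}'


def looks_like_prometheus_api(body):
    """Return True if the body parses as a JSON object with a 'status' field."""
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        # The text format starts with '#', which is not valid JSON.
        return False
    return isinstance(parsed, dict) and "status" in parsed


print(looks_like_prometheus_api(EXPORTER_TEXT))
print(looks_like_prometheus_api(API_JSON))
```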
Zane, How can I make this totally passive so I could install this at a customer site and collect usage data. I just want to be able to collect usage data for a week or two so I know what is actually being consumed average and peak. It would need to store the data on the RPI or have an email method to send me the collected resulting data. That way I don’t have to reconfigure anything at the customer site. Thx
Hello, Zane.
Thank you for this tutorial.
I don’t know if you are still checking out this page but I’ve got a question.
I seem to have got the python code and dashboard set up and working, but I’ve got two problems, and I was hoping to get some help.
First, the downloaded bytes seem to be very small. I am only getting about a tenth (or less) of the bytes recorded in the dashboard.
My guess would be that when tcpdump outputs multiple lines of same contents very fast, the python code seems to only get the first or the last and discards all the duplicates, which is not ideal as the length (bytes) get decreased significantly.
Second, I seem to not get some of the records from tcpdump to transfer to the dashboard.
E.g., while I can find the record below on the Raspberry Pi (192.168.100.189:8000/metrics), “comcast.net” doesn’t show up in the “bytes transferred (by server)” part of the dashboard.
ntm_packets_total{dst="comcast.net",proto="udp",service="",src="192.168.100.191"} 7056.0
This, I have no idea why it’s happening.
Any ideas or help would be greatly appreciated.
Thank you in advance.
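When comparing /metrics output against a dashboard panel, it can help to break a sample line into its metric name, labels, and value and check them against the panel's query and filters. Note that the quoted sample is a packet counter, while the panel in question reports bytes. The parser below is a simplified sketch for lines shaped like the one quoted above; it does not handle escaped quotes inside label values.

```python
import re

# Matches: metric_name{label="value",...} number
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Split one labelled metric line into (name, labels dict, float value).

    Returns None for comments or lines that do not match the expected shape.
    """
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        return None
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


line = 'ntm_packets_total{dst="comcast.net",proto="udp",service="",src="192.168.100.191"} 7056.0'
print(parse_sample(line))
```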
So I have a cable connection and therefore can’t put the Pi on the WAN side of the main router. I also don’t have a USB Ethernet port so looking to use only the on board Pi port (using a virtual interface/IP) ([link to forums.raspberrypi.com])
I was looking to set the Pi up on the LAN side of the ISP router. The 192.168.0.x/24 network would only have one ip (.1) on the LAN of the ISP router and one ip (.2) on the WAN side of the Pi. The Pi would then have a second (virtual) interface with 192.168.1.x/24 subnet running the DHCP server. All network devices would be on the 192.168.1.x/24 subnet.
Any reason why this wouldn’t work? I know it would double the bandwidth used on the Pi ethernet connection but it’s 1Gbps so it should have lots of headroom.
I briefly tried it but couldn’t get it to respond to DHCP requests.
Brilliant, many thanks for your efforts. Just what I was looking for to keep an eye on the use of the internet by my teenage grand-daughters. Took me a couple of hours to set up the Pi, install the software and get it all working, simply connected to the network behind my ISP router 🙂
Just one question: how easy would it be to add a list of named IP addresses and get the (known) names displayed in Grafana in place of the IP addresses? I’m not a programmer but can usually cobble together enough Python to get a job done, so some pointers on how to achieve this would be extremely helpful.