Everything to know about Self-Healing Power over Ethernet (PoE) Networks

What is a PoE Watchdog?

A PoE watchdog function on a Power over Ethernet network switch is a “self-healing” network feature that monitors the status of connected PoE-enabled devices and resets them if they become unresponsive or stop working properly. If a PoE-enabled device, or Powered Device (PD), does not respond within a certain amount of time, the switch automatically resets it by cutting off and then restoring its power. This procedure is designed to minimize network downtime and keep connected devices operational without manual intervention. The PoE watchdog function is especially useful where connected devices are critical to network operation, such as in industrial or surveillance settings, and it contributes to overall network stability and reliability.

This technology is also referred to as PD Alive Check, PD Device Alive Check, Powered Device Monitor or PoE Watchdog. Regardless of how individual switch manufacturers and chipset vendors implement it, each version performs the same basic function: monitor connected PoE devices and, if they stop responding, power cycle them by briefly cutting power and forcing a restart. After the restart, the PoE device will typically function again without the intervention of a network technician.

Why do we need PoE Self-healing?

Why is this even an issue? Why do we need a PD Alive check in the first place? How often do network surveillance cameras crash?

Several variables can affect how frequently a PoE-enabled device will crash or fail. Using a network surveillance camera as an example, some of these variables include the quality of the camera, its age, the environment it is installed in, and how well it is maintained. In general, high-quality network security cameras that are installed and maintained in a suitable environment are unlikely to frequently crash or fail. Utilizing top-notch cameras from dependable manufacturers, adhering to suggested maintenance practices, and routinely updating the camera firmware are all crucial for reducing the risk of camera crashes or failures. The risk of problems can also be decreased by placing cameras in appropriate environments and sheltering them from extreme weather.

But occasionally, even the best cameras can and will develop problems.

Like any computer, network cameras eventually need to be rebooted, and this is where the PoE Watchdog (PDM, PD Alive) function can help reduce downtime and maintain the stability of the network.

Can a PoE Self-Healing Network Switch save money?

Industry reports and polls show that in 2021, US IT service providers charged $100 to $200 per hour, with some charging up to $300 per hour for specialist services. These figures are averages and may not reflect the rates of particular IT service providers or services. The COVID-19 pandemic has also affected IT costs and service delivery, and not in a good way.

A PD-Alive-enabled Power over Ethernet switch can restart network devices that have stopped working, without any human interaction. So, the answer is a resounding “Yes, PoE Switches with PoE Watchdog functionality can and will indeed save money.”
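
As a rough illustration of the savings argument, the sketch below multiplies avoided site visits by an hourly rate. The hourly rate comes from the range quoted above; the number of lockups per year and the billed hours per visit are hypothetical placeholders, and the model assumes that every automatic recovery replaces exactly one technician visit.

```python
def estimated_annual_savings(auto_recoveries_per_year: int,
                             billed_hours_per_visit: float,
                             hourly_rate_usd: float) -> float:
    """Technician cost avoided when the switch recovers PDs on its own.

    Simplifying assumption: each automatic power cycle replaces one
    on-site visit lasting billed_hours_per_visit hours.
    """
    return auto_recoveries_per_year * billed_hours_per_visit * hourly_rate_usd


# Hypothetical inputs: 6 camera lockups per year, 1.5 billed hours per visit,
# $150/hour (the middle of the 2021 range quoted above).
print(estimated_annual_savings(6, 1.5, 150.0))  # 1350.0
```

Travel time and after-hours surcharges are not modeled, so treat the result as an order-of-magnitude estimate.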

How does the PoE Watchdog / PD Alive function work?

There are multiple ways a PoE switch can monitor connected PoE devices, or powered devices (PDs). We will look at two common ones.

1. PoE Watchdog using the Network Ping command

This implementation is only available in PoE switches that provide management functionality (e.g., an admin interface that can be accessed via a standard web browser). The PoE switch sends ping packets to specified IP addresses, and the connected devices are expected to reply. If no response is received within a specified time interval, the switch cuts power to the port that the unresponsive device is connected to and then restores it.
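
A single ping check can be sketched on an ordinary Linux host as follows. A switch performs this inside its firmware, so this is only an illustration: it assumes the iputils `ping` utility, where `-c` sets the number of echo requests and `-W` the reply timeout in seconds (other platforms use different flags), and the function name `ping_once` is hypothetical.

```python
import subprocess


def ping_once(ip: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request via the system ping utility.

    Assumes Linux iputils ping (-c = count, -W = timeout in seconds).
    Returns True only if a reply was received.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), ip],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,  # safety net in case ping itself hangs
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping is missing, not executable, or hung: treat it as "no reply".
        return False
    # iputils ping exits with 0 only when at least one reply arrived.
    return result.returncode == 0
```

For example, `ping_once("192.168.0.100")` would return True if the camera at the address used in the example configuration answers within one second.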

Below is an example configuration illustrating common parameters typically found in managed or web-smart PoE switches that provide a PD Alive Check via ping.

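Port # | Status | IP Address | Interval (sec.) | #Retries | Action | Power Off Time (sec.) | Start-up Time (sec.)
---|---|---|---|---|---|---|---
1 | Enabled | 192.168.0.100 | 60 | 2 | reboot | 10 | 120
2 | Enabled | 192.168.0.101 | 120 | 3 | alarm | 15 | 180
3 | Disabled | 0.0.0.0 | - | - | - | - | -
4 | Disabled | 0.0.0.0 | - | - | - | - | -
5 | Disabled | 0.0.0.0 | - | - | - | - | -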
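
The example table maps naturally onto a small record type, one row per port. This is a minimal sketch for illustration; the class and field names are hypothetical and do not correspond to any particular switch's configuration API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PdAliveConfig:
    """One row of the example PD Alive Check table (hypothetical field names)."""
    port: int
    enabled: bool
    ip: str
    interval_s: Optional[int] = None   # seconds between ping requests
    retries: Optional[int] = None      # re-sends before the PD counts as unresponsive
    action: Optional[str] = None       # "reboot" or "alarm" in the example
    power_off_s: Optional[int] = None  # how long the port stays unpowered
    startup_s: Optional[int] = None    # grace period before probing resumes


EXAMPLE_TABLE: List[PdAliveConfig] = [
    PdAliveConfig(1, True, "192.168.0.100", 60, 2, "reboot", 10, 120),
    PdAliveConfig(2, True, "192.168.0.101", 120, 3, "alarm", 15, 180),
    PdAliveConfig(3, False, "0.0.0.0"),
    PdAliveConfig(4, False, "0.0.0.0"),
    PdAliveConfig(5, False, "0.0.0.0"),
]


def monitored_ports(table: List[PdAliveConfig]) -> List[int]:
    """Return the ports the switch will actually probe."""
    return [row.port for row in table if row.enabled]


print(monitored_ports(EXAMPLE_TABLE))  # [1, 2]
```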

Explanations

IP Address:
An IP address is a unique numerical identifier assigned to devices on a network to enable communication and data transfer. In our example, a PoE surveillance camera with the IP address 192.168.0.100 is connected to switch port number 1.

Interval (sec.):
Defines how often the switch sends a ping request. Care should be taken not to send too many requests in too short a time unless a device's uptime is mission-critical. In the example, the switch pings the device on port 1 once every 60 seconds and the device on port 2 once every 120 seconds.

#Retries:
If a ping request goes unanswered, the PoE switch will try again as many times as specified here. In the example above, the switch re-sends the ping up to two times for port 1 and up to three times for port 2.
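
The retry rule can be expressed as a small function. This sketch assumes the switch declares a PD unresponsive only after the initial ping and every retry have all gone unanswered; `probe` is a hypothetical stand-in for one ping attempt.

```python
from typing import Callable


def is_unresponsive(probe: Callable[[], bool], retries: int) -> bool:
    """Return True only if the first attempt and all `retries` re-sends fail.

    `probe` performs one ping attempt and returns True when a reply arrives.
    """
    for _ in range(1 + retries):
        if probe():
            return False  # any reply means the PD is considered alive
    return True


# Port 1 in the example allows 2 retries: two misses followed by a reply is not a failure.
replies = iter([False, False, True])
print(is_unresponsive(lambda: next(replies), retries=2))  # False
```

Whether retries are sent immediately or one interval apart varies by vendor, so the time it takes to detect a fault varies as well.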

Action:
Some PoE switches let you specify which corrective action to take if the switch deems a connected PD to be unresponsive. In our example, if an incident occurred on port 1, the switch would briefly cut power to the port and thereby reboot the connected PD. If port 2 were triggered, the switch would merely alert the system administrator by sending an SNMP trap message. The exact implementation depends heavily on the chipset used and on the firmware customizations made by the switch manufacturer.
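
Choosing between the two example actions amounts to a dispatch on the configured value. In this sketch the `reboot` and `alarm` callbacks are hypothetical placeholders for the switch's power-cycle routine and its notification mechanism; real firmware may offer other actions.

```python
from typing import Callable, Dict


def handle_unresponsive(port: int, action: str,
                        reboot: Callable[[int], None],
                        alarm: Callable[[int], None]) -> None:
    """Run the corrective action configured for an unresponsive port."""
    handlers: Dict[str, Callable[[int], None]] = {"reboot": reboot, "alarm": alarm}
    if action not in handlers:
        raise ValueError(f"unsupported action {action!r} on port {port}")
    handlers[action](port)


events = []
handle_unresponsive(1, "reboot", lambda p: events.append(("power cycle", p)),
                    lambda p: events.append(("notify admin", p)))
handle_unresponsive(2, "alarm", lambda p: events.append(("power cycle", p)),
                    lambda p: events.append(("notify admin", p)))
print(events)  # [('power cycle', 1), ('notify admin', 2)]
```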

Power Off Time (Sec.):
This parameter defines how many seconds the port remains without power before power is re-applied. Waiting at least five seconds allows residual electrical charges to dissipate and the device's capacitors to discharge fully, so that when power is restored, the device starts up correctly and operates as intended. In the example above, the switch waits 10 seconds on port 1 and 15 seconds on port 2 before re-activating the port.
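
The power-cycle sequence itself is short: power off, wait, power on. In this sketch `set_port_power` is a hypothetical hook standing in for whatever the switch's PoE controller exposes, `sleep` is injectable so the sequence can be tested without waiting, and rejecting power-off times below five seconds is a design choice of this sketch based on the rule of thumb above, not necessarily how any switch behaves.

```python
import time
from typing import Callable

MIN_POWER_OFF_S = 5.0  # rule of thumb from the text above


def power_cycle(port: int, power_off_s: float,
                set_port_power: Callable[[int, bool], None],
                sleep: Callable[[float], None] = time.sleep) -> None:
    """Turn PoE off on `port`, wait `power_off_s` seconds, then turn it back on."""
    if power_off_s < MIN_POWER_OFF_S:
        raise ValueError(f"power-off time {power_off_s}s is below {MIN_POWER_OFF_S}s")
    set_port_power(port, False)  # cut PoE power; the PD shuts down
    sleep(power_off_s)           # let residual charge dissipate
    set_port_power(port, True)   # restore power; the PD boots again


log = []
power_cycle(1, 10, lambda port, on: log.append((port, on)),
            sleep=lambda s: log.append(("wait", s)))
print(log)  # [(1, False), ('wait', 10), (1, True)]
```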

Start-Up Time (Sec.):
Different network devices have different boot-up times. One network camera may be operational 30 seconds after power is restored, whereas another may require twice as long. During the start-up time, the switch does not probe the connected PD and sends no ping requests. Specifying too short a start-up time could create a never-ending loop in which the switch cuts power to the PD before it has had a chance to become operational.
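
The reboot loop can be reproduced with a simplified timing model. The assumptions are stated in the code: the PD answers once its boot time has elapsed, the first probe goes out when the start-up time expires, and each retry follows one interval later. The function name is hypothetical, and real switches may space retries differently.

```python
from typing import Optional


def first_successful_probe(startup_s: int, interval_s: int, retries: int,
                           boot_time_s: int) -> Optional[int]:
    """Seconds after power-up at which a probe first succeeds, or None.

    Simplified model (an assumption): the PD answers once boot_time_s seconds
    have passed, the first probe is sent when the start-up time expires, and
    each retry follows one interval later. None means every attempt fails, so
    the switch power-cycles again and the identical sequence repeats forever.
    """
    for attempt in range(retries + 1):
        probe_time = startup_s + attempt * interval_s
        if probe_time >= boot_time_s:
            return probe_time
    return None


# A camera that needs 90 s to boot:
print(first_successful_probe(120, 60, 2, 90))  # 120: start-up time is long enough
print(first_successful_probe(30, 10, 2, 90))   # None: probes at 30, 40 and 50 s all fail
```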

2. PoE Watchdog Monitoring Port Traffic

This implementation is available on both managed "smart" and unmanaged PoE switches. Instead of using very targeted ping commands, this method has the switch monitor traffic on the specified port. More specifically, the switch simply checks for any kind of network packets. If no packets are received, the switch will cut power to the offending port for a moment and then re-apply power to the port. The idea behind this implementation is that if a connected device with an active link no longer sends any traffic for a specific time interval, the device must be malfunctioning and therefore needs to be rebooted.
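
A traffic-based watchdog can be modeled as a per-port timer that restarts whenever the received-packet counter changes. This is a hypothetical model, not a vendor API: it assumes the switch periodically samples each port's receive counter and link state, and `idle_timeout_s` is an illustrative name for the configured silence interval.

```python
from dataclasses import dataclass


@dataclass
class TrafficWatchdog:
    """Idle-traffic monitor for one port (hypothetical model, not a vendor API)."""
    idle_timeout_s: float
    last_rx_packets: int = 0
    last_activity_s: float = 0.0

    def update(self, now_s: float, rx_packets: int, link_up: bool) -> bool:
        """Feed one counter sample; return True if the port should be power cycled."""
        if not link_up or rx_packets != self.last_rx_packets:
            # New packets arrived, or there is no link to judge: restart the idle timer.
            self.last_rx_packets = rx_packets
            self.last_activity_s = now_s
            return False
        if now_s - self.last_activity_s >= self.idle_timeout_s:
            # Link is up but the PD has been silent for the whole timeout.
            self.last_activity_s = now_s  # avoid re-triggering on the next sample
            return True
        return False


wd = TrafficWatchdog(idle_timeout_s=60)
samples = [(0, 100), (30, 150), (60, 150), (90, 150)]
print([wd.update(t, count, link_up=True) for t, count in samples])
# [False, False, False, True]
```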

What are the limitations of PoE Self-Healing?

We have seen how PoE self-healing technology can help improve the uptime of your networking devices. However, the technology has limitations, which we will address here. Consider a common scenario in which a few network cameras and an NVR are connected to a PoE switch.

Of course, a network camera's most important function is to transmit video, possibly audio, and provide night-vision capabilities. Many technicians in the surveillance industry know that, in many cases, when a surveillance camera develops a fault and stops sending audio/video, it can still be accessed and configured via a web browser and continues to reply to ping requests. Ping is a basic TCP/IP tool for troubleshooting connectivity, reachability and name resolution, and it operates at a far lower level than the video encoding mechanism, so an IP camera can respond to pings while no longer transmitting video. And since the camera still responds at the network level, neither the ping-based nor the traffic-monitoring PoE watchdog would recognize this problem.

Additional PoE management features, such as PoE scheduling, which restarts connected devices preemptively on a fixed schedule, can further improve the uptime of certain networking devices.
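
A fixed restart schedule boils down to computing the next restart slot. This sketch assumes a hypothetical daily restart at 03:00 and uses naive local times; actual scheduling options, time zones and daylight-saving handling vary by switch.

```python
from datetime import datetime, time as dtime, timedelta


def next_scheduled_restart(now: datetime, restart_at: dtime) -> datetime:
    """Return the next moment, strictly after `now`, matching the daily restart time."""
    candidate = datetime.combine(now.date(), restart_at)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate


print(next_scheduled_restart(datetime(2024, 1, 1, 2, 0), dtime(3, 0)))  # 2024-01-01 03:00:00
print(next_scheduled_restart(datetime(2024, 1, 1, 4, 0), dtime(3, 0)))  # 2024-01-02 03:00:00
```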

"PoE Self-Healing is not the be-all and end-all solution.
But it helps."