I have a small OrangePi Zero on my home network that turns on my desktop PC using Wake-on-LAN, so I can SSH in and grab files or run processing when I am out and about. I use ZeroTier so I can SSH into both the OrangePi and my desktop without needing Dynamic DNS or opening ports on my router.
However, the SD card used is a bit unreliable, and so occasionally the device will lock up. It would be nice to detect this automatically and reboot when it happens. The board also has some LEDs, which we could use to show the health of the system as well.
I am using Armbian, but these scripts should work with most embedded Linux boards, provided you change the LEDs used. [1]
Setting up the watchdog
A watchdog timer is a hardware device which counts down [2] at a fixed rate unless reset or “kicked”. When it reaches 0, the system is reset. Watchdog timers are useful because they allow us to restart the system automatically if a process crashes or freezes, by having that process periodically reset the watchdog.
So useful and simple are watchdog peripherals, that they can be found built into most processors, even tiny 8-bit devices. The Allwinner H3 processor on our OrangePi contains a watchdog peripheral, as does the ever popular Raspberry Pi.
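To make the countdown concrete, here is a toy simulation in shell. It touches no hardware: the function name simulate_watchdog and the kick/miss event names are made up for illustration, and a real watchdog counts in time rather than in loop iterations.

```shell
#!/bin/bash
# Toy model of a watchdog countdown. No real device is involved.
# Usage: simulate_watchdog TIMEOUT EVENT...
#   TIMEOUT  value the counter is reloaded with on every kick
#   EVENT    "kick" (the monitored process is healthy) or anything else (a missed tick)
simulate_watchdog() {
    local timeout=$1
    shift
    local counter=$timeout
    local event
    for event in "$@"; do
        if [ "$event" = "kick" ]; then
            counter=$timeout            # a healthy process reloads the counter
        else
            counter=$((counter - 1))    # nobody kicked, so the countdown continues
        fi
        if [ "$counter" -le 0 ]; then
            echo "reset"                # real hardware would reboot the board here
            return 0
        fi
    done
    echo "running"
}
```

For example, simulate_watchdog 3 kick miss miss miss prints reset, because three ticks pass without a kick.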
Linux’s support for watchdogs consists of two parts: a kernel driver that creates the /dev/watchdog device, and a userspace daemon, watchdog(8), that repeatedly checks the system is behaving and writes to the watchdog device, causing the countdown timer to reset. If misbehavior is detected, the daemon will trigger a clean shutdown. If the daemon itself fails, the hardware watchdog timer will time out and trigger a reset. This may lead to filesystem corruption, but it is generally a reasonable risk: if the watchdog daemon is dead, the system itself is probably not in a recoverable state.
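Before installing the daemon, it can be worth confirming that the kernel driver has created the device node. A minimal sketch, assuming the node appears as watchdog or watchdog0 in /dev (the helper name has_watchdog_device is hypothetical). It only tests that the node exists: opening or writing to the device by hand typically starts the countdown, so avoid doing that on a machine you do not want to reboot.

```shell
#!/bin/bash
# has_watchdog_device [DEV_DIR]: succeed if a watchdog device node is present.
# DEV_DIR defaults to /dev; passing another directory is only useful for testing.
has_watchdog_device() {
    local dev_dir=${1:-/dev}
    # Check for the legacy node name and the first numbered node
    [ -e "$dev_dir/watchdog" ] || [ -e "$dev_dir/watchdog0" ]
}
```

On the board itself, has_watchdog_device && echo "watchdog driver present" should print the message if the driver is loaded.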
Installing the watchdog daemon is easy: just run sudo apt-get install watchdog.
However, the daemon is not configured to do very much by default, so we need to edit the configuration file /etc/watchdog.conf to check for some errors (e.g. loss of network).
I use the following settings, which check network connectivity, check that the ZeroTier service zerotier-one is running, and run a script to flash an LED. By extension, this also checks that the filesystem is accessible, as the LED script cannot be run if the filesystem has failed.
/etc/watchdog.conf
# Check Google DNS
ping = 8.8.8.8
# Check router
ping = 192.168.0.1
interface = eth0
# Run tests every 20 seconds
interval = 20
test-binary = /usr/local/bin/watchdog_led.sh
test-timeout = 10
realtime = yes
priority = 1
# Check if zerotier-one is still running by enabling the following line
pidfile = /var/lib/zerotier-one/zerotier-one.pid
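Because a failing test program counts as a failed check, it may be worth sanity-checking the file before restarting the daemon. The sketch below is not the daemon's own parser: it is a rough sed-based reading of key = value lines that drops anything after a #, and the function names conf_values and check_test_binaries are hypothetical.

```shell
#!/bin/bash
# conf_values FILE KEY: print each value assigned to KEY, one per line.
# KEY is used inside a regular expression, so keep it to plain option names.
conf_values() {
    local file=$1
    local key=$2
    # Strip comments and trailing blanks, then print what follows "KEY ="
    sed -n \
        -e 's/#.*//' \
        -e 's/[[:space:]]*$//' \
        -e "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*//p" \
        "$file"
}

# check_test_binaries FILE: succeed only if every test-binary is executable.
check_test_binaries() {
    local file=$1
    local status=0
    local path
    while IFS= read -r path; do
        [ -n "$path" ] || continue      # skip the empty line when nothing is configured
        if [ ! -x "$path" ]; then
            echo "test-binary is missing or not executable: $path"
            status=1
        fi
    done <<EOF
$(conf_values "$file" test-binary)
EOF
    return "$status"
}
```

Running check_test_binaries /etc/watchdog.conf after saving watchdog_led.sh should print nothing and return 0 once the script exists and is executable.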
Initializing LEDs
The Linux LED subsystem exposes LEDs under /sys/class/leds, where each one can be controlled directly or driven by an event source in the kernel using a trigger.
A number of triggers are available, such as disk-activity, bluetooth-power, etc. Here we set the red LED's trigger to none so it is directly controlled, and the green LED's trigger to heartbeat, which causes the kernel to flash it at a rate that increases with system load.
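Reading an LED's trigger file lists every available trigger, with the active one in square brackets. Here is a small sketch for pulling out the active trigger; the function name active_trigger is an assumption, as is the sample line in the note that follows.

```shell
#!/bin/bash
# active_trigger LED_DIR: print the trigger currently selected for an LED,
# e.g. LED_DIR=/sys/class/leds/orangepi:red:status
active_trigger() {
    local led_dir=$1
    # The kernel marks the active trigger with [brackets]; print only that word
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$led_dir/trigger"
}
```

If the trigger file contains none mmc0 [heartbeat] timer, this prints heartbeat.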
The following script is run once at boot using a trivial systemd service. By default, my Armbian distribution uses the green LED as a power LED, so this script also causes it to change to flashing when booted, which is a pleasant side effect.
heartbeat_led.sh
Put this under /usr/local/bin/heartbeat_led.sh and make it executable.
#!/bin/bash
#Enable the heartbeat LED on boot, and set status LED to on demand
set -e
echo heartbeat > /sys/devices/platform/leds/leds/orangepi:green:pwr/trigger
echo "Enabled heartbeat led"
echo none > /sys/devices/platform/leds/leds/orangepi:red:status/trigger
echo 0 > /sys/devices/platform/leds/leds/orangepi:red:status/brightness
echo "Turned off status LED"
heartbeat_led.service
Put this under /etc/systemd/system/heartbeat_led.service [3] and run sudo systemctl enable heartbeat_led.service.
[Unit]
Description=Heartbeat LED enabler
[Service]
Type=oneshot
ExecStart=/bin/bash /usr/local/bin/heartbeat_led.sh
[Install]
WantedBy=multi-user.target
Flashing the red LED every time the watchdog runs
The watchdog daemon can run external programs to check the status of the system, either a single program or a collection of them in /etc/watchdog.d. A return code other than 0 indicates failure, and the watchdog will restart the system.
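As an illustration of the return-code convention, here is a sketch of an extra check that fails when a directory cannot be written to, which could be one symptom of a failing SD card. The function name check_writable and the choice of directory are hypothetical, and this check is not part of the configuration shown earlier.

```shell
#!/bin/bash
# check_writable DIR: return 0 if a file can be created in DIR, non-zero otherwise.
check_writable() {
    local dir=$1
    local probe
    # mktemp fails if DIR is missing, read-only, or otherwise unwritable
    probe=$(mktemp "$dir/.watchdog_probe.XXXXXX" 2>/dev/null) || return 1
    rm -f "$probe"      # clean up so repeated runs leave nothing behind
}
```

A script placed in /etc/watchdog.d could define this function and end with check_writable /var/tmp, so that the script's exit code reflects the result.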
We can slightly abuse this functionality to flash an LED every time the watchdog runs, and so get another visual indication that the watchdog has actually been started. The script below turns on the LED for half a second, and is run by the daemon by setting test-binary = /usr/local/bin/watchdog_led.sh in watchdog.conf. Therefore, the LED will flash every time the daemon checks the system, which is every 20 seconds in my case.
Note that this program needs the LED trigger to be set to none, which is done by heartbeat_led.sh. Otherwise, the LED trigger may have been set by your distro to something else (e.g. disk activity) and you will get unexpected results.
watchdog_led.sh
Save this under /usr/local/bin/watchdog_led.sh and make it executable.
#!/bin/bash
# Chris Hemingway 2019, MIT License
# Flash the LED every time the watchdog runs
# Must be run as root, which should be the case for watchdogd
LED=/sys/devices/platform/leds/leds/orangepi:red:status
set -e # Exit on first error
echo 255 > $LED/brightness
sleep 0.5
echo 0 > $LED/brightness
exit 0 # Success
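For trying the flash logic without waiting for the daemon, the same steps can be wrapped in a function that takes the LED directory and on-time as arguments. This is a variation on the script above, not a replacement for it: the name flash_led and its arguments are assumptions, and it reads the standard max_brightness attribute instead of hard-coding 255, falling back to 255 if that file is absent.

```shell
#!/bin/bash
# flash_led LED_DIR [SECONDS]: switch an LED fully on, wait, then switch it off.
# LED_DIR is a sysfs LED directory whose trigger is already set to none.
flash_led() {
    local led=$1
    local duration=${2:-0.5}
    local max=255
    # Use the LED's own maximum if the kernel exposes it
    [ -r "$led/max_brightness" ] && max=$(cat "$led/max_brightness")
    echo "$max" > "$led/brightness" || return 1   # fail if the LED is not writable
    sleep "$duration"
    echo 0 > "$led/brightness"
}
```

Run as root, flash_led /sys/devices/platform/leds/leds/orangepi:red:status 0.5 should give the same half-second flash.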
[1] Run ls /sys/class/leds/ to see what LEDs your board has. If you don’t see any, your kernel might not have any configured, and you will need to change your kernel config or add a device tree overlay.
[2] It can count up on some systems, but this is abstracted away.
[3] As this is not from a Debian package etc., /etc/systemd/system is the preferred directory.