patch-2.4.19 linux-2.4.19/Documentation/watchdog-api.txt

diff -urN linux-2.4.18/Documentation/watchdog-api.txt linux-2.4.19/Documentation/watchdog-api.txt
@@ -0,0 +1,390 @@
+The Linux Watchdog driver API.
+
+Copyright 2002 Christer Weingel <wingel@nano-system.com>
+
+Some parts of this document are copied verbatim from the sbc60xxwdt
+driver which is (c) Copyright 2000 Jakob Oestergaard <jakob@ostenfeld.dk>
+
+This document describes the state of the Linux 2.4.18 kernel.
+
+Introduction:
+
+A Watchdog Timer (WDT) is a hardware circuit that can reset the
+computer system in case of a software fault.  You probably knew that
+already.
+
+Usually a userspace daemon notifies the kernel watchdog driver at
+regular intervals, via the /dev/watchdog special device file, that
+userspace is still alive.  When such a notification occurs, the driver will
+usually tell the hardware watchdog that everything is in order, and
+that the watchdog should wait for yet another little while to reset
+the system.  If userspace fails (RAM error, kernel bug, whatever), the
+notifications cease to occur, and the hardware watchdog will reset the
+system (causing a reboot) after the timeout occurs.
+
+The Linux watchdog API is a rather ad hoc construction, and different
+drivers implement different, and sometimes incompatible, parts of it.
+This file is an attempt to document the existing usage and allow
+future driver writers to use it as a reference.
+
+The simplest API:
+
+All drivers support the basic mode of operation, where the watchdog
+activates as soon as /dev/watchdog is opened and will reboot unless
+the watchdog is pinged within a certain time; this time is called the
+timeout or margin.  The simplest way to ping the watchdog is to write
+some data to the device.  So a very simple watchdog daemon would look
+like this:
+
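+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+int main(void)
+{
+	/* Opening the device starts the watchdog. */
+	int fd = open("/dev/watchdog", O_WRONLY);
+	if (fd == -1) {
+		perror("watchdog");
+		exit(EXIT_FAILURE);
+	}
+	/* Any write pings the watchdog; the interval must be shorter than the timeout. */
+	while (1) {
+		write(fd, "\0", 1);
+		sleep(10);
+	}
+}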
+
+A more advanced daemon could, for example, check that an HTTP server is
+still responding before doing the write call to ping the watchdog.
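+
+As an illustration, here is a minimal sketch of such a conditional
+ping.  The helper name ping_if_healthy is hypothetical.  The health
+check itself (for example an HTTP request) is assumed to be done by
+the caller, which passes in the result.  When the check fails nothing
+is written, so the watchdog runs down and eventually resets the system.
+
```c
#include <unistd.h>

/* Hypothetical helper: ping the watchdog only if the caller's health
 * check (e.g. "did the HTTP server answer?") succeeded.
 * Returns 1 if a ping was written, 0 if it was deliberately skipped,
 * and -1 if the write failed. */
int ping_if_healthy(int fd, int healthy)
{
	if (!healthy)
		return 0;	/* no ping: let the watchdog time out */
	return write(fd, "\0", 1) == 1 ? 1 : -1;
}
```
+
+In the daemon above, the loop body would then become something like
+ping_if_healthy(fd, http_server_responds()).  Here http_server_responds
+is a placeholder for whatever check the application needs.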
+
+When the device is closed, the watchdog is disabled.  This is not
+always such a good idea, since if there is a bug in the watchdog
+daemon and it crashes, the system will not reboot.  Because of this,
+some of the drivers support the configuration option "Disable watchdog
+shutdown on close", CONFIG_WATCHDOG_NOWAYOUT.  If it is set to Y when
+compiling the kernel, there is no way of disabling the watchdog once
+it has been started.  So, if the watchdog daemon crashes, the system
+will reboot after the timeout has passed.
+
+Some other drivers will not disable the watchdog unless a specific
+magic character 'V' has been sent to /dev/watchdog just before closing
+the file.  If the userspace daemon closes the file without sending
+this special character, the driver will assume that the daemon (and
+userspace in general) died, and will stop pinging the watchdog without
+disabling it first.  This will then cause a reboot.
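+
+A minimal sketch of a clean shutdown on such drivers follows.  It
+assumes fd refers to an open /dev/watchdog, and the helper name
+watchdog_magic_close is hypothetical.  Writing 'V' is harmless on
+drivers without this feature, because any write counts as a ping.
+With CONFIG_WATCHDOG_NOWAYOUT the watchdog stays armed regardless.
+
```c
#include <unistd.h>

/* Hypothetical helper: write the magic character 'V' immediately
 * before close(), so drivers with magic close handling disable the
 * watchdog.  On other drivers the 'V' is just one more ping.
 * Returns 0 on success, -1 if either call failed. */
int watchdog_magic_close(int fd)
{
	int rc = 0;

	if (write(fd, "V", 1) != 1)
		rc = -1;
	if (close(fd) != 0)
		rc = -1;
	return rc;
}
```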
+
+The ioctl API:
+
+All conforming drivers also support an ioctl API.
+
+Pinging the watchdog using an ioctl:
+
+All drivers that have an ioctl interface support at least one ioctl,
+KEEPALIVE.  This ioctl does exactly the same thing as a write to the
+watchdog device, so the main loop in the above program could be
+replaced with:
+
+	while (1) {
+		ioctl(fd, WDIOC_KEEPALIVE, 0);
+		sleep(10);
+	}
+
+The argument to the ioctl is ignored.
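+
+Not every driver has an ioctl interface (wdt977.c, described below,
+supports none).  A daemon may therefore prefer to fall back to
+write() when the ioctl fails.  A minimal sketch; the name
+watchdog_ping is hypothetical:
+
```c
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical helper: ping via the KEEPALIVE ioctl, falling back to
 * a one-byte write() if the ioctl fails, e.g. on a driver without
 * ioctl support.  Returns 0 on success, -1 if both methods failed. */
int watchdog_ping(int fd)
{
	if (ioctl(fd, WDIOC_KEEPALIVE, 0) == 0)
		return 0;
	return write(fd, "\0", 1) == 1 ? 0 : -1;
}
```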
+
+Setting and getting the timeout:
+
+For some drivers it is possible to modify the watchdog timeout on the
+fly with the SETTIMEOUT ioctl; those drivers have the WDIOF_SETTIMEOUT
+flag set in their options field.  The argument is a pointer to an
+integer holding the timeout in seconds.  The driver returns the real
+timeout used in the same variable, and this timeout might differ from
+the requested one due to limitations of the hardware.
+
+    int timeout = 45;
+    ioctl(fd, WDIOC_SETTIMEOUT, &timeout);
+    printf("The timeout was set to %d seconds\n", timeout);
+
+This example might actually print "The timeout was set to 60 seconds"
+if the device has a granularity of minutes for its timeout.
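+
+Because of this rounding, a daemon should derive its ping interval
+from the value the driver returns, not from the one it asked for.
+The following is a minimal sketch with hypothetical helper names.
+Pinging at half the timeout is an assumed safety margin, not
+something the API requires.
+
```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Hypothetical helper: ask for a new timeout and return the value the
 * driver actually chose, or -1 if the driver rejects SETTIMEOUT. */
int watchdog_set_timeout(int fd, int seconds)
{
	int timeout = seconds;

	if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout) != 0)
		return -1;
	return timeout;
}

/* Assumed policy: ping at half the granted timeout.  sleep() counts in
 * whole seconds, so the interval is never below 1; timeouts under two
 * seconds would need a finer-grained timer. */
int ping_interval(int timeout)
{
	int interval = timeout / 2;

	return interval > 0 ? interval : 1;
}
```
+
+On a device with minute granularity, watchdog_set_timeout(fd, 45)
+might return 60, in which case ping_interval(60) gives 30 seconds.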
+
+Starting with the Linux 2.4.18 kernel, it is possible to query the
+current timeout using the GETTIMEOUT ioctl.
+
+    ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
+    printf("The timeout is %d seconds\n", timeout);
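+
+Kernels older than 2.4.18 do not have GETTIMEOUT, and judging from
+the driver summary below most current drivers do not implement it
+either.  A daemon should therefore be ready for the ioctl to fail.
+A minimal sketch, assuming the caller knows a sensible fallback such
+as the driver's documented default; the helper name is hypothetical.
+
```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Hypothetical helper: return the current timeout in seconds, or the
 * caller-supplied fallback (e.g. the driver's documented default) if
 * GETTIMEOUT is not supported. */
int watchdog_get_timeout(int fd, int fallback)
{
	int timeout;

	if (ioctl(fd, WDIOC_GETTIMEOUT, &timeout) != 0)
		return fallback;
	return timeout;
}
```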
+
+Environmental monitoring:
+
+Many watchdog drivers can also return more information about the
+system: some do temperature, fan and power level monitoring, and some
+can tell you the reason for the last reboot of the system.  The
+GETSUPPORT ioctl is available to ask what the device can do:
+
+	struct watchdog_info ident;
+	ioctl(fd, WDIOC_GETSUPPORT, &ident);
+
+The fields returned in the ident struct are:
+
+	identity		a string identifying the watchdog driver
+	firmware_version	the firmware version of the card if available
+	options			flags describing what the device supports
+
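+A minimal sketch of reading and printing this information follows;
+the helper names are hypothetical.  The identity field is a
+fixed-size array, so the sketch limits how much of it is printed
+rather than relying on a terminating NUL.
+
```c
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Hypothetical helper: format "identity (firmware N)" into buf.
 * At most sizeof(info->identity) characters of identity are used,
 * even if the array is not NUL-terminated. */
void format_ident(const struct watchdog_info *info, char *buf, size_t len)
{
	snprintf(buf, len, "%.*s (firmware %u)",
		 (int)sizeof(info->identity), (const char *)info->identity,
		 (unsigned)info->firmware_version);
}

/* Hypothetical helper: fill *info with GETSUPPORT.
 * Returns 0 on success, -1 if the ioctl failed. */
int watchdog_identify(int fd, struct watchdog_info *info)
{
	memset(info, 0, sizeof(*info));
	return ioctl(fd, WDIOC_GETSUPPORT, info) == 0 ? 0 : -1;
}
```
+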
+The options field can have the following bits set, and describes what
+kind of information the GETSTATUS and GETBOOTSTATUS ioctls can
+return.   [FIXME -- Is this correct?]
+
+	WDIOF_OVERHEAT		Reset due to CPU overheat
+
+The machine was last rebooted by the watchdog because the thermal limit was
+exceeded.
+
+	WDIOF_FANFAULT		Fan failed
+
+A system fan monitored by the watchdog card has failed.
+
+	WDIOF_EXTERN1		External relay 1
+
+External monitoring relay/source 1 was triggered.  Controllers intended for
+real-world applications include external monitoring pins that will trigger
+a reset.
+
+	WDIOF_EXTERN2		External relay 2
+
+External monitoring relay/source 2 was triggered.
+
+	WDIOF_POWERUNDER	Power bad/power fault
+
+The machine is showing an undervoltage status.
+
+	WDIOF_CARDRESET		Card previously reset the CPU
+
+The last reboot was caused by the watchdog card.
+
+	WDIOF_POWEROVER		Power over voltage
+
+The machine is showing an overvoltage status.  Note that if one level is
+under and one is over, both bits will be set; this may seem odd but makes
+sense.
+
+	WDIOF_KEEPALIVEPING	Keep alive ping reply
+
+The watchdog saw a keepalive ping since it was last queried.
+
+	WDIOF_SETTIMEOUT	Can set/get the timeout
+
+The driver supports changing the timeout with the SETTIMEOUT ioctl.
+
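+The flag descriptions above can be turned into a small decoder.  It
+is useful for logging the GETSUPPORT options or a GETSTATUS or
+GETBOOTSTATUS result.  The sketch below uses a hypothetical function
+name and only knows the bits listed in this document.
+
```c
#include <stddef.h>
#include <string.h>
#include <linux/watchdog.h>

/* Hypothetical helper: write the names of the known WDIOF_* bits set
 * in flags into buf, separated by spaces.  The output is truncated
 * (but always NUL-terminated) if buf is too small. */
void describe_watchdog_flags(unsigned int flags, char *buf, size_t len)
{
	static const struct { unsigned int bit; const char *name; } names[] = {
		{ WDIOF_OVERHEAT, "OVERHEAT" },
		{ WDIOF_FANFAULT, "FANFAULT" },
		{ WDIOF_EXTERN1, "EXTERN1" },
		{ WDIOF_EXTERN2, "EXTERN2" },
		{ WDIOF_POWERUNDER, "POWERUNDER" },
		{ WDIOF_CARDRESET, "CARDRESET" },
		{ WDIOF_POWEROVER, "POWEROVER" },
		{ WDIOF_SETTIMEOUT, "SETTIMEOUT" },
		{ WDIOF_KEEPALIVEPING, "KEEPALIVEPING" },
	};
	size_t i;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (!(flags & names[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, names[i].name, len - strlen(buf) - 1);
	}
}
```
+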
+For those drivers that return any bits set in the options field, the
+GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
+status, and the status at the last reboot, respectively.
+
+    int flags;
+    ioctl(fd, WDIOC_GETSTATUS, &flags);
+
+    or
+
+    ioctl(fd, WDIOC_GETBOOTSTATUS, &flags);
+
+Note that not all devices support these two calls, and some only
+support the GETBOOTSTATUS call.
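+
+Some drivers return values that do not follow the WDIOF_* bits (see
+i810-tco.c below).  It is therefore safest to interpret a boot status
+only when GETSUPPORT advertised the bit in question.  Below is a
+minimal sketch for checking whether the watchdog caused the last
+reboot; both helper names are hypothetical.
+
```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Hypothetical helper: 1 if a GETBOOTSTATUS value says the watchdog
 * card caused the last reboot, otherwise 0. */
int boot_status_is_watchdog_reset(int flags)
{
	return (flags & WDIOF_CARDRESET) != 0;
}

/* Hypothetical helper: returns 1 or 0 as above, or -1 if the GETSUPPORT
 * options lack WDIOF_CARDRESET or GETBOOTSTATUS fails. */
int last_reboot_by_watchdog(int fd, unsigned int support_options)
{
	int flags = 0;

	if (!(support_options & WDIOF_CARDRESET))
		return -1;
	if (ioctl(fd, WDIOC_GETBOOTSTATUS, &flags) != 0)
		return -1;
	return boot_status_is_watchdog_reset(flags);
}
```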
+
+Some drivers can measure the temperature using the GETTEMP ioctl.  The
+returned value is the temperature in degrees Fahrenheit.
+
+    int temperature;
+    ioctl(fd, WDIOC_GETTEMP, &temperature);
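+
+Converting the reading to Celsius is a matter of C = (F - 32) * 5 / 9.
+Here is a minimal sketch that assumes the Fahrenheit unit stated
+above.  The helper names are hypothetical, and integer division
+truncates towards zero.
+
```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Hypothetical helper: convert degrees Fahrenheit to whole degrees
 * Celsius; integer division truncates towards zero. */
int fahrenheit_to_celsius(int fahrenheit)
{
	return (fahrenheit - 32) * 5 / 9;
}

/* Hypothetical helper: read the card temperature into *celsius.
 * Returns 0 on success, -1 if GETTEMP is unsupported or fails
 * (in which case *celsius is left untouched). */
int watchdog_read_celsius(int fd, int *celsius)
{
	int fahrenheit;

	if (ioctl(fd, WDIOC_GETTEMP, &fahrenheit) != 0)
		return -1;
	*celsius = fahrenheit_to_celsius(fahrenheit);
	return 0;
}
```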
+
+Finally the SETOPTIONS ioctl can be used to control some aspects of
+the card's operation; right now the pcwd driver is the only one
+supporting this ioctl.
+
+    int options = WDIOS_DISABLECARD;
+    ioctl(fd, WDIOC_SETOPTIONS, &options);
+
+The following options are available:
+
+	WDIOS_DISABLECARD	Turn off the watchdog timer
+	WDIOS_ENABLECARD	Turn on the watchdog timer
+	WDIOS_TEMPPANIC		Kernel panic on temperature trip
+
+[FIXME -- better explanations]
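+
+Below is a minimal sketch of switching the card off and on, for
+example around planned maintenance; the helper name is hypothetical.
+Since pcwd is currently the only driver implementing SETOPTIONS, the
+failure path matters on every other driver.
+
```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Hypothetical helper: turn the watchdog timer on (enable != 0) or
 * off via SETOPTIONS.  The option word is passed by address.
 * Returns 0 on success, -1 if the driver rejects or lacks the ioctl. */
int watchdog_set_enabled(int fd, int enable)
{
	int options = enable ? WDIOS_ENABLECARD : WDIOS_DISABLECARD;

	return ioctl(fd, WDIOC_SETOPTIONS, &options) == 0 ? 0 : -1;
}
```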
+
+Implementations in the current drivers in the kernel tree:
+
+Here I have tried to summarize what the different drivers support and
+where they do strange things compared to the other drivers.
+
+acquirewdt.c -- Acquire Single Board Computer
+
+	This driver has a hardcoded timeout of 1 minute
+
+	Supports CONFIG_WATCHDOG_NOWAYOUT
+
+	GETSUPPORT returns WDIOF_KEEPALIVEPING.  GETSTATUS will return 1
+	if the device is open, 0 if not.  [FIXME -- isn't this rather
+	silly?  To be able to use the ioctl, the device must be open
+	and so GETSTATUS will always return 1].
+
+advantechwdt.c -- Advantech Single Board Computer
+
+	The timeout defaults to 60 seconds and can be changed with SETTIMEOUT.
+
+	Supports CONFIG_WATCHDOG_NOWAYOUT
+
+	GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
+	The GETSTATUS call returns if the device is open or not.
+	[FIXME -- silliness again?]
+	
+eurotechwdt.c -- Eurotech CPU-1220/1410
+
+	The timeout can be set using the SETTIMEOUT ioctl and defaults
+	to 60 seconds.
+
+	Also has a module parameter "ev" (event type) which controls
+	what happens on a timeout: it is either the string "int" or
+	anything else, which causes a reboot.  [FIXME -- better description]
+
+	Supports CONFIG_WATCHDOG_NOWAYOUT
+
+	GETSUPPORT returns WDIOF_CARDRESET and WDIOF_SETTIMEOUT, but
+	GETSTATUS is not supported and GETBOOTSTATUS just returns 0.
+
+i810-tco.c -- Intel 810 chipset
+
+	Also has support for a lot of other i8x0 hardware; the
+	watchdog is just one of its features.
+
+	The timeout is set using the module parameter "i810_margin",
+	which is in steps of 0.6 seconds where 2<i810_margin<64.  The
+	driver supports the SETTIMEOUT ioctl.
+
+	Supports CONFIG_WATCHDOG_NOWAYOUT.
+
+	GETSUPPORT returns WDIOF_SETTIMEOUT.  The GETSTATUS call
+	returns some kind of timer value which is not compatible with
+	the other drivers.  GETBOOTSTATUS returns some kind of
+	hardware-specific boot status.  [FIXME -- describe this]
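+
+	Below is a minimal sketch of converting a desired timeout in
+	seconds to an i810_margin value.  It assumes the driver
+	interprets the parameter exactly as described above, and the
+	helper name is hypothetical.  For example, i810_margin=50
+	corresponds to 50 * 0.6 = 30 seconds.
+
```c
/* Hypothetical helper: convert seconds to i810_margin steps of 0.6 s
 * (seconds / 0.6 == seconds * 10 / 6, using integer arithmetic).
 * The result is rounded down, then clamped to 3..63, the range
 * 2 < i810_margin < 64, i.e. 1.8 to 37.8 seconds. */
int i810_seconds_to_margin(int seconds)
{
	int margin = seconds * 10 / 6;

	if (margin < 3)
		margin = 3;
	if (margin > 63)
		margin = 63;
	return margin;
}
```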
+
+ib700wdt.c -- IB700 Single Board Computer
+
+	Default timeout of 30 seconds and the timeout is settable
+	using the SETTIMEOUT ioctl.  Note that only a few timeout
+	values are supported.
+
+	Supports CONFIG_WATCHDOG_NOWAYOUT
+
+	GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
+	The GETSTATUS call returns if the device is open or not.
+	[FIXME -- silliness again?]
+
+machzwd.c -- MachZ ZF-Logic
+
+	Hardcoded timeout of 10 seconds
+
+	Has a module parameter "action" that controls what happens
+	when the timeout runs out which can be 0 = RESET (default), 
+	1 = SMI, 2 = NMI, 3 = SCI.
+
+	Supports CONFIG_WATCHDOG_NOWAYOUT and the magic character
+	'V' close handling.
+
+	GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call
+	returns if the device is open or not.  [FIXME -- silliness
+	again?]
+
+mixcomwd.c -- MixCom Watchdog
+
+	[FIXME -- I'm unable to tell what the timeout is]
+
+	Supports CONFIG_WATCHDOG_NOWAYOUT
+
+	GETSUPPORT returns WDIOF_KEEPALIVEPING, GETSTATUS returns if
+	the device is opened or not [FIXME -- I'm not really sure how
+	this works, there seems to be some magic connected to
+	CONFIG_WATCHDOG_NOWAYOUT]
+
+pcwd.c -- Berkshire PC Watchdog
+
+	Hardcoded timeout of 1.5 seconds
+
+	Supports CONFIG_WATCHDOG_NOWAYOUT
+
+	GETSUPPORT returns WDIOF_OVERHEAT|WDIOF_CARDRESET and both
+	GETSTATUS and GETBOOTSTATUS return something useful.
+
+	The SETOPTIONS call can be used to enable and disable the card
+	and to ask the driver to call panic if the system overheats.
+
+sbc60xxwdt.c -- 60xx Single Board Computer
+
+	Hardcoded timeout of 10 seconds
+
+	Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic
+	character 'V' close handling.
+
+	No bits set in GETSUPPORT
+
+scx200.c -- National SCx200 CPUs
+
+	Not in the kernel yet.
+
+	The timeout is set using a module parameter "margin" which
+	defaults to 60 seconds.  The timeout can also be set using
+	SETTIMEOUT and read using GETTIMEOUT.
+
+	Supports a module parameter "nowayout" that is initialized
+	with the value of CONFIG_WATCHDOG_NOWAYOUT.  Also supports the
+	magic character 'V' handling.
+
+shwdt.c -- SuperH 3/4 processors
+
+	[FIXME -- I'm unable to tell what the timeout is]
+
+	Supports CONFIG_WATCHDOG_NOWAYOUT
+
+	GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call
+	returns if the device is open or not.  [FIXME -- silliness
+	again?]
+
+softdog.c -- Software watchdog
+
+	The timeout is set with the module parameter "soft_margin",
+	which defaults to 60 seconds; the timeout is also settable
+	using the SETTIMEOUT ioctl.
+
+	Supports CONFIG_WATCHDOG_NOWAYOUT
+
+	WDIOF_SETTIMEOUT bit set in GETSUPPORT
+
+w83877f_wdt.c -- W83877F Computer
+
+	Hardcoded timeout of 30 seconds
+
+	Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic
+	character 'V' close handling.
+
+	No bits set in GETSUPPORT
+
+wdt.c -- ICS WDT500/501 ISA and
+wdt_pci.c -- ICS WDT500/501 PCI
+
+	Default timeout of 60 seconds.  The timeout is also settable
+	using the SETTIMEOUT ioctl.
+
+	Supports CONFIG_WATCHDOG_NOWAYOUT
+
+	GETSUPPORT returns with bits set depending on the actual
+	card. The WDT501 supports a lot of external monitoring, the
+	WDT500 much less.
+
+wdt285.c -- Footbridge watchdog
+
+	The timeout is set with the module parameter "soft_margin"
+	which defaults to 60 seconds.  The timeout is also settable
+	using the SETTIMEOUT ioctl.
+
+	Does not support CONFIG_WATCHDOG_NOWAYOUT
+
+	WDIOF_SETTIMEOUT bit set in GETSUPPORT
+
+wdt977.c -- Netwinder W83977AF chip
+
+	Hardcoded timeout of 3 minutes
+
+	Supports CONFIG_WATCHDOG_NOWAYOUT
+
+	Does not support any ioctls at all.
+

FUNET's LINUX-ADM group, linux-adm@nic.funet.fi
TCL-scripts by Sam Shen (who was at: slshen@lbl.gov)