Yang Chen, Omprakash Gnawali, Maria Kazandjieva, Philip Levis, John Regehr, Surviving Sensor Network Software Faults, In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP 2009), October 11-14, 2009.

Abstract:

We describe Neutron, a version of the TinyOS operating system that efficiently recovers from memory safety bugs. Where existing schemes reboot an entire node on an error, Neutron's compiler and runtime extensions divide programs into recovery units and reboot only the faulting unit. The TinyOS kernel itself is a recovery unit: a kernel safety violation appears to applications as the processor being unavailable for 10–20 milliseconds. Neutron further minimizes safety violation cost by supporting 'precious' state that persists across reboots. Application data, time synchronization state, and routing tables can all be declared as precious. Neutron's reboot sequence conservatively checks that precious state is not the source of a fault before preserving it. Together, recovery units and precious state allow Neutron to reduce a safety violation's cost to time synchronization by 94% and to a routing protocol by 99.5%. Neutron also protects applications from losing data. Neutron provides this recovery on the very limited resources of a tiny, low-power microcontroller.

Bibtex:

@inproceedings{cgklr-ssnsf-09,
	author = {Yang Chen and Omprakash Gnawali and Maria Kazandjieva and Philip Levis and John Regehr},
	title = {{Surviving Sensor Network Software Faults}},
	booktitle = "{Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP 2009)}",
	year = "2009",
	month = "October",
	address = "Big Sky, Montana, USA",
}