One of my customers is using various (Samsung) SSD’s in their servers, and the first of these have started reaching their end-of-life. SSD’s have a somewhat different failure scenario then spinning metal disks, so monitoring their life-expectancy can be interesting.
Besides just logging and graphing the SMART attributes, it is also handy to have some alerting on when certain thresholds are crossed. To do this, I’ve written a simple nagios/icinga script which will alert on interesting SMART attributes, and will also calculate the percentage of total guaranteed writes on the SSD’s. Since the guaranteed TBW value will differ between various SSD vendors and product-ranges, this value needs to be specified on the command-line by the user.
I’ve integrated this check-script into my normal monitoring-scripts, but it can off-course also be used as a stand-alone tool. If has options to specify the device to smartctl, so disks behind raid-controllers can also be monitored.