Archive for June, 2005

computer problems

Friday, June 24th, 2005

I hate computers. I'm thinking it may be a good idea to consolidate some of our machines into virtual images under Xen, UML, or VServers.

Recently my workstation ran out of memory and swap at nearly one year of uptime; it hadn't been rebooted since I moved into the house. Of course, Linux went crazy and killed the X server instead of whatever was responsible, so the console was unusable. ssh couldn't fork a shell process either, so I had no choice but to hit the reset button.
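For what it's worth, current Linux kernels expose a per-process "badness" number in /proc/<pid>/oom_score, and the process with the highest score is the one most likely to be killed when memory runs out. The following is only a rough sketch of how one might list the likeliest victims ahead of time. The function name rank_by_oom_score and its proc_root parameter are made up for illustration, and it reads /proc/<pid>/comm, which as far as I know only exists on kernels newer than the ones in this post.

```python
import os


def rank_by_oom_score(proc_root="/proc", limit=5):
    """Return up to `limit` (score, pid, name) tuples, highest score first.

    `proc_root` is a hypothetical parameter so the function can be pointed
    at a test directory instead of the real /proc.
    """
    ranked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directory names are processes
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited, or the file was missing/unreadable
        ranked.append((score, int(entry), name))
    # Highest score first; ties broken by lowest pid.
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return ranked[:limit]


if __name__ == "__main__":
    for score, pid, name in rank_by_oom_score():
        print(f"{score:6d} {pid:7d} {name}")
```

If something like the X server shows up near the top of that list, that is a hint it will go first, whether or not it is the process actually leaking memory.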

Today:
The power went out at home around 4:30 while I was at work. Eventually the UPSes shut down, which is when I noticed I had lost network connectivity to my home machines. The UPSes powered back on, but hardly any machine came back up correctly. The IBM keyboard on the server console was non-functional for some reason…

router: Couldn't bring up the network because of a malformed interfaces file that it had previously accepted. The DNS server was also non-functional; upgrading to the latest version and redoing the config files seemed to fix it.
hydra: Can't find kernel. Of course, this machine has a non-functional PS/2 keyboard/mouse bus, so a console is impossible to use on it. The hard drive will have to be examined in another machine.
dbz: Couldn't log in because pam_krb5 couldn't find krb5.conf – even though pam_krb5 is optional! Sigh. I have the same problem on laptops since they are frequently disconnected from the network. Filed a bug.
zephyr: Came back up and auto-fscked after the outage. It is spewing ATA errors all over the console because the Maxtor drive is all of a sudden dying, yet fsck continues blindly and “fixes” whatever it finds, probably trashing whatever I might otherwise have been able to recover. A SMART test on the drive fails out immediately. Applied for an advance RMA. May have to send this one to a recovery house. fsck should not be so braindead when there is obviously failing hardware. It should also not attempt to fsck partitions which are set 'noauto'. Just in case, I will be getting a new power supply for this machine, because this is the third Maxtor drive that has failed in it within two years.
fileserver: Rebooted into a 2.6 kernel with no RAID. No IDE driver depends on ide-generic, so it had to be added to the initrd modules list or no hard drive was accessible. Apparently RAID and udev don't get along, so mdadm needs to create the md device nodes. The CPU fan alarm was on, but this time it was not a false alarm – the CPU2 fan had become clogged and noisy. Took it off the heatsink, blew both pieces clean with compressed air, oiled the fan, and plugged the hole with lithium grease, since it was previously capped with hard plastic. We will see how long it lasts.

blizzard, aurora, syrinx: Since fileserver and DNS were down, these machines were very confused.
laptop: CD drive no longer responds correctly to ATA identify commands, so the drive is impossible to use.
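On the zephyr complaint about fsck and 'noauto': the sixth field of each /etc/fstab line is the check pass number, and 0 there means "never check". A minimal sketch of the policy I would like at boot (skip pass 0, and also skip anything marked noauto) could look like the following. This illustrates the wish, not how any real fsck decides; the function name boot_fsck_targets and the sample fstab contents are invented for the example.

```python
# Hypothetical fstab contents, for illustration only.
SAMPLE_FSTAB = """\
# <device>  <mountpoint>  <type>  <options>     <dump> <pass>
/dev/hda2   /home         ext3    defaults      0      2
/dev/hda1   /             ext3    defaults      0      1
/dev/hdb1   /mnt/rescue   ext3    noauto,ro     0      2
/dev/hdc    /media/cdrom  iso9660 noauto,user   0      0
proc        /proc         proc    defaults      0      0
"""


def boot_fsck_targets(fstab_text):
    """Return (device, mountpoint) pairs to check at boot, in pass order.

    Entries are skipped when the pass number is 0 or missing, or when the
    mount options include 'noauto'.
    """
    targets = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        device, mountpoint, _fstype, options = fields[:4]
        # The dump and pass fields are optional and default to 0.
        passno = int(fields[5]) if len(fields) > 5 else 0
        if passno == 0:
            continue  # fstab says never check this filesystem
        if "noauto" in options.split(","):
            continue  # the wished-for rule: leave noauto partitions alone
        targets.append((passno, device, mountpoint))
    targets.sort(key=lambda t: t[0])  # root (pass 1) first, then the rest
    return [(device, mountpoint) for _passno, device, mountpoint in targets]


if __name__ == "__main__":
    for device, mountpoint in boot_fsck_targets(SAMPLE_FSTAB):
        print(device, mountpoint)
```

With the sample above, only / and /home would be checked; /mnt/rescue is left alone even though its pass number is non-zero, which is exactly the case that bit me.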

I definitely need a supply of spare fans, as well as a hard drive big enough to back up my 200 GB drive, two 40 GB drives, and 20 GB drive. Hard drive failures are ridiculously annoying.
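As a quick sanity check on "big enough", here is the arithmetic as a small sketch. It assumes the worst case where every drive is completely full; the function name and the optional headroom percentage are my own invention, not a recommendation.

```python
def backup_capacity_needed(drive_sizes_gb, headroom_percent=0):
    """Return the backup capacity in GB needed to hold every drive in full.

    `headroom_percent` is a hypothetical safety margin; the result is
    rounded up to a whole GB.
    """
    total = sum(drive_sizes_gb)
    scaled = total * (100 + headroom_percent)
    return -(-scaled // 100)  # ceiling division using integers only


if __name__ == "__main__":
    drives = [200, 40, 40, 20]  # the drive sizes mentioned in the post, in GB
    print(backup_capacity_needed(drives))
    print(backup_capacity_needed(drives, headroom_percent=20))
```

So a single backup drive would need to hold 300 GB if all four drives were full, and somewhat more with any margin for growth.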