ADMIN: apology/explanation/corrective-action
Apology:
I apologize for the duplicated messages that were sent to the
tp750 list, and for whatever loss of service you suffered as
a result. I attempt to make my mailing lists robust and
reasonably immune to catastrophe, but this time they fell short
of that goal.
Explanation:
The massive dumping of duplicate messages to the tp750 list was caused
by a criminally brain-damaged mail gateway used by three of the
recipients of the list. For every mail message sent to the list, the
gateway was generating three separate bounce messages to the list owner
(that's me). In addition, the gateway was sending the entire message
back to the list. Our log files show a total of 1854 errant messages
sent by that gateway before it was stopped.
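One common defense against a gateway that sends list mail back to the list is to refuse to redistribute a message whose Message-ID has already gone out. What follows is only a minimal sketch of that idea, not a description of how the tp750 list software works; the class name LoopGuard and its method are hypothetical, invented for illustration.

```python
from email import message_from_string


class LoopGuard:
    """Hypothetical helper: remembers Message-IDs already sent to the list."""

    def __init__(self):
        self.seen_ids = set()

    def should_distribute(self, raw_message: str) -> bool:
        """Return False if this message was already distributed once."""
        msg = message_from_string(raw_message)
        msg_id = msg.get("Message-ID")
        if msg_id is None:
            # No Message-ID header: this sketch cannot judge, so let it through.
            return True
        msg_id = msg_id.strip()
        if msg_id in self.seen_ids:
            # Same Message-ID seen before: treat it as a looped copy.
            return False
        self.seen_ids.add(msg_id)
        return True
```

A check like this only catches copies that come back with the original Message-ID intact. A gateway that rewrites headers would need a different test, such as a list-specific loop marker header added on the way out.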
It took a long time to find the problem because of the tremendous
traffic load, which caused our mail server to run out of swap
space (thus slowing down local delivery) and my postmaster mailbox
to run out of disk space (which also caused delivery failure to my
tp750 mailbox). I spent quite some time attempting to clean up
those problems before I realized that the tp750 list was also
getting flooded. At that point I shut down our entire mail system,
and spent several hours flushing all of the messages which were queued
for the list, and devising a way to turn things back on again without
causing another loop. All in all it took a total of about 8 hours
from the time I noticed the problem until I was able to turn things
back on.
What makes this even more annoying (to me) is that I have contacted
this site several times to warn them that their mail gateway software
was likely to cause just this kind of problem, and to urge them to
get their software fixed. They never even bothered to reply.
(In the future, I will immediately remove any site whose mail software
demonstrates the same bugs as this one.)
Corrective action:
1) the recipients have been permanently removed from the tp750 list
2) the mail server at cs.utk.edu now has a "dead end" route installed
so that it cannot talk to the mailer that caused the problem.
(the other end will always see "connection timed out").
Other actions (technical and otherwise) are being considered.
Suggestions for retaliation are welcomed.
Keith Moore
postmaster@cs.utk.edu