I ran into the following problems while helping with a friend’s WordPress blog.
- The FeedWordPress plugin, when updating a feed (either scheduled or on demand), would throw an XML parser error.
Feed Error: [http://abc.com/feed/rss/] update returned error: This XML document is invalid, likely due to invalid characters. XML error: Attribute without value at line 37, column 42
If I fetched the same feed URL in a browser or with wget, it worked and produced a valid XML file (also, one that had far fewer than 37 lines). Mystery! Also,
- The Broken Link Checker plugin would show a “403 Forbidden” in the Status column for a particular webpage, http://abc.com/page.html.
But again, if I fetched the same supposedly failing URL by other means, there was no problem.
So, how could I find the cause?
FeedWordPress has a nice diagnostic feature. It can be enabled by going to Syndication->Diagnostics. Under Diagnostics output, check “Echo in web browser as they are issued”; this can make the raw output that will be dumped easier to read. Then, under Update Diagnostics, click “displaying the raw HTTP data passed to and from the feed being checked for updates”. When you update a feed, you should see the raw headers and HTML in the output.
By doing this, I found that the response the RSS server returned to the WordPress blog server was different from what my browser had received: it was a CloudFlare captcha test!
Please complete the security check to access abc.com
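This kind of mismatch can also be detected in code. The following is a minimal Python sketch, not anything FeedWordPress itself does: it takes a response body and reports whether it looks like a feed, a challenge page, or something else. The function name `classify_feed_response` and the marker strings are my own assumptions (the first marker is simply the message quoted above). It also hints at why the plugin reported an XML error: HTML allows things like attributes without values, which an XML parser rejects.

```python
import xml.etree.ElementTree as ET

# Assumed marker strings; the first is taken from the captcha page quoted above.
CHALLENGE_MARKERS = ("complete the security check", "captcha")

# Root element names of RSS 2.0, Atom and RSS 1.0 (RDF) documents.
FEED_ROOT_TAGS = ("rss", "feed", "rdf")


def classify_feed_response(body: str) -> str:
    """Return 'feed', 'challenge' or 'invalid' for a fetched response body."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        root = None  # not well-formed XML, e.g. ordinary HTML

    if root is not None:
        # Drop a namespace prefix such as '{http://www.w3.org/2005/Atom}'.
        tag = root.tag.rsplit("}", 1)[-1].lower()
        if tag in FEED_ROOT_TAGS:
            return "feed"

    lowered = body.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return "challenge"
    return "invalid"
```

A feed fetcher could call this on every response and log "challenge" results explicitly, instead of handing the HTML to the XML parser and reporting a confusing parse error.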
The root cause was apparently a bad reputation for the server’s IP address, which could be verified by checking SenderBase and/or DNS blocklists (RBLs). In this case, another domain (a different hosting client) served from the same IP address as the WordPress blog seemed to be sending a large volume of email that had been reported as spam. It seems that a poor IP reputation triggers CloudFlare to require a captcha solution before a request can pass its perimeter.
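For the RBL part of that check, the lookup can be scripted. DNS-based blocklists are conventionally queried by reversing the octets of the IPv4 address, appending the list's zone, and resolving the resulting name: an answer in 127.0.0.0/8 means "listed", and no such name means "not listed". Here is a rough Python sketch of that convention. The zone `dnsbl.example.org` is a placeholder, not a real list, and the function names are mine.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up, e.g. 10.2.0.192.<zone> for 192.0.2.10."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blocklist zone answers with a 127.x.x.x address."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # The name does not resolve (or the lookup itself failed): treat as not listed.
        return False
    return answer.startswith("127.")


if __name__ == "__main__":
    # Placeholder zone: substitute the zone of a blocklist you are allowed to query.
    print(is_listed("192.0.2.10", "dnsbl.example.org"))
```

Some list operators restrict who may query them or reserve particular return codes for errors, so check the documentation of whichever list you use before trusting the result.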
To avoid failures, a service that automatically pulls RSS feeds or other information from sites protected by CloudFlare needs to run from an IP address with a good reputation; otherwise this functionality will break in non-obvious ways.
The above debugging steps should help find similar problems. For example, the FeedWordPress debugging feature could also be used to find problems fetching other arbitrary links on the server side (such as those flagged by Broken Link Checker).
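If shell access to the web server is available, the same kind of check can be done without the plugin. The Python sketch below is only an illustration of the idea, with a made-up function name (`fetch_raw`) and User-Agent string: it fetches a URL and returns the status code, the response headers, and the start of the body, including for error responses such as a 403. The important part is to run it on the server that hosts the blog, so that the request leaves from the same IP address the plugins use.

```python
import urllib.error
import urllib.request


def fetch_raw(url: str, timeout: float = 10.0, max_bytes: int = 2000):
    """Return (status, headers, start of body) for url, even on 4xx/5xx."""
    # The User-Agent value is an arbitrary, hypothetical identifier.
    request = urllib.request.Request(url, headers={"User-Agent": "link-debug/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read(max_bytes)
            return response.status, dict(response.headers), body.decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Error responses still carry headers and a body worth reading.
        body = err.read(max_bytes)
        err.close()
        return err.code, dict(err.headers), body.decode("utf-8", errors="replace")


if __name__ == "__main__":
    status, headers, body = fetch_raw("http://abc.com/page.html")
    print(status)
    for name, value in headers.items():
        print(f"{name}: {value}")
    print(body)
```

Comparing this output with what a desktop browser receives for the same URL should show quickly whether the server is being handed a challenge page instead of the real content.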