The attack is simple. I create a page which looks like legitimate content and link to your site, except the link includes a nefarious query string designed to (hopefully) hack your site. Then I get Google to index my page, e.g. by submitting a sitemap. Googlebot will crawl my page and follow all of its links, including the ones with the nefarious query strings. The benefit of this method for the attacker is twofold:
- You don't know who tried to hack you or who succeeded, so you can't block them or easily trace the attack back to its source. Tougher forensics mean less risk of being caught.
- Many sites give Googlebot a "free pass", not enforcing any security checks on traffic that appears to come from Googlebot. This makes it far less likely that the attack will be blocked. (See the sketch right below for how Googlebot can actually be verified.)
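To be clear, a "free pass" should never be granted based on the User-Agent header alone, since anyone can fake it. If you do want to special-case Googlebot, verify it the way Google documents: a reverse DNS lookup on the IP, then a forward DNS lookup on the resulting hostname. Here is a minimal Python sketch of that check; it is my own illustration, not code from Admin Tools.

```python
import socket

GOOGLE_DOMAINS = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip: str) -> bool:
    """Return True only if the IP really belongs to Googlebot.

    Uses the reverse-then-forward DNS check Google documents:
    the PTR record must end in googlebot.com or google.com, and
    that hostname must resolve back to the same IP address.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)               # reverse lookup
    except OSError:
        return False

    if not hostname.endswith(GOOGLE_DOMAINS):
        return False

    try:
        _, _, forward_ips = socket.gethostbyname_ex(hostname)   # forward lookup
    except OSError:
        return False

    return ip in forward_ips
```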
Regarding the solution I proposed, here's what is going on right now: if Googlebot repeatedly triggers security exceptions on your site, its IP will be automatically blocked by Admin Tools. You don't want that. You want Googlebot to keep seeing the 403s for the nefarious URLs (so that it eventually stops trying to index them), while still being able to "see" the rest of your site. That's exactly what my proposed solution does.
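In rough pseudocode terms, the idea looks like the Python sketch below. It reuses the verification helper from above; the threshold and data structures are hypothetical placeholders, not the actual Admin Tools implementation.

```python
from collections import defaultdict

AUTO_BAN_THRESHOLD = 3          # hypothetical value; configurable in reality
exception_counts = defaultdict(int)
banned_ips = set()

def handle_security_exception(client_ip: str) -> int:
    """Reject a malicious request and decide whether to auto-ban the IP.

    Returns the HTTP status to send. The nefarious URL always gets a 403,
    crawler or not, so Google eventually drops it from its crawl queue.
    """
    if is_verified_googlebot(client_ip):
        # Verified Googlebot never counts towards the auto-ban threshold,
        # so it can keep crawling the legitimate parts of the site.
        return 403

    # Everyone else counts towards the automatic IP ban.
    exception_counts[client_ip] += 1
    if exception_counts[client_ip] >= AUTO_BAN_THRESHOLD:
        banned_ips.add(client_ip)   # from now on, block this IP outright

    return 403
```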
Nicholas K. Dionysopoulos
Lead Developer and Director
🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!