#41295 Site going up and down - high bandwidth?

Posted in ‘Admin Tools for WordPress’
Environment Information

WordPress version
6.6.2
PHP version
8.2.18
Admin Tools version
1.6.7

Latest post by anotart on Wednesday, 13 November 2024 15:42 CST

anotart

This site has been going up and down since the end of August -- almost every day, and often four times a day.  Each occurrence lasts from a few seconds to around 30 minutes, most frequently 7 to 14 minutes. I became aware of this a few weeks ago and have been unable to figure it out. Hostgator has not been helpful. (mysites.guru sends me notices, which is how I discovered it.) When it happens, there is just a white screen with no error message. mysites.guru reports either ETIMEDOUT or 500.

I noticed that the bandwidth usage is extremely high and has been at this level since May (I included a PDF showing the bandwidth history).  I'm also wondering about the bots (I included the AWStats lists for Oct and Nov).  I suspect this is causing the problem.  The last two days I've been combing through the Admin Tools manual and made many adjustments.  One that I hoped might help was the .htaccess Maker with Block User Agents turned on.  However, the site has gone down twice since implementing that.

In the Oct list, the first thing I saw was related to Facebook and by googling, I found a plugin that would stop that.  I implemented it and it looks like that has stopped (I don't know if that's good or not).

This site is not complex and ran fine for years before this. There have been no additions or changes for a long time, other than minor content changes. I have talked to my client about moving it off her Hostgator account, but we have to deal with 30GB of email first (I wondered if that was the issue, but I'm getting mixed answers on that). I talked to my A2 Hosting support, and they encouraged me to find the cause of this before trying to move it. (They are the ones who clued me in to the AWStats bot list.)

Can Admin Tools help with the high bandwidth usage or whatever else might be causing this? Can you point me in any other direction to solve this problem? I've never dealt with anything like this before.

Thank you!

nicholas
Akeeba Staff
Manager

The naïve answer would be that yes, of course Admin Tools can help you, since .htaccess Maker has a feature which allows you to block traffic by User-Agent string.

This is a naïve answer because it ignores where your traffic comes from. For example, the majority of your traffic for October came from the Meta crawler used by Facebook / Instagram / Messenger. Sure you can block it, but your site will become invisible to these services. Is this really what you want?

The second biggest traffic source is search engine indexing. Again, sure, you can block them, but your site will become invisible to search engines. I am pretty confident you do NOT want that.

What I would do instead is put the site behind CloudFlare CDN, and use the System - Page Cache plugin to make the frontend of the site cacheable. This would serve the majority of your traffic directly from CloudFlare CDN, without hitting your server. It would alleviate your problems without drastic action that makes your site invisible to the very services through which, I presume, people find out that your site even exists.

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

anotart

Thank you, Nicholas,

I apologize for the very late reply -- I've been mentally exhausted and "other occupied" from the results of our election.

Thank you for your suggestion -- I am naive about all of this.  Actually, the plugin I added "throttled" Facebook (facebookexternalhit) -- and I still see that item lower down on the list with an updated last visit date.

I was confused by your comment that the GB+ item was legit because I saw Googlebot, Bingbot, Applebot and others lower on the list with reasonable bandwidth.  Or, maybe you thought I intended to add them all to your tool?

One thing I discovered when looking at the raw access logs was many occurrences of this: meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler).  When I looked that up, I found this on the Meta website:

Meta-ExternalAgent

The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly.

The specific UA string that you will see in your log files will be similar to one of the following:

  • meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)
  • meta-externalagent/1.1

This crawler will roll out gradually over the next few weeks, expected to complete by 10/31/2024.

***

Somewhere else, I think I found start dates for this that lined up with when the bandwidth started being consumed. Also, I don't think I care if Meta can't teach its AI from this site. I added Meta-ExternalAgent to your provided list of user agents to block, and the site hasn't gone down since. The bandwidth is now more in line with the other similar sites I manage. I'm still seeing both of the meta strings in the log, but they are no longer filling it, and I don't understand why.
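One way to check what is happening to those remaining meta-externalagent entries is to tally them by HTTP status code in the raw access log. A blocked request is normally still written to the log, only with a refusal status (typically 403) and a very small response instead of a full page. The sketch below is a minimal illustration in Python, assuming the log is in Apache's Combined Log Format. The sample lines, IP addresses and byte counts are made up for the example and are not taken from the attached log.

```python
import re
from collections import defaultdict

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

def summarize(lines, needle="meta-externalagent"):
    """Count requests and bytes per status code for user agents containing `needle`."""
    totals = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or needle not in match["agent"].lower():
            continue  # unparseable line, or a different user agent
        size = 0 if match["bytes"] == "-" else int(match["bytes"])
        totals[match["status"]]["requests"] += 1
        totals[match["status"]]["bytes"] += size
    return dict(totals)

# Hypothetical sample lines (documentation IP ranges, invented sizes).
SAMPLE_LINES = [
    '203.0.113.5 - - [13/Nov/2024:10:00:00 -0600] "GET /page/ HTTP/1.1" 200 48211 "-" '
    '"meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)"',
    '203.0.113.5 - - [13/Nov/2024:10:00:05 -0600] "GET /page/ HTTP/1.1" 403 199 "-" '
    '"meta-externalagent/1.1"',
    '198.51.100.7 - - [13/Nov/2024:10:00:09 -0600] "GET / HTTP/1.1" 200 30120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    for status, stats in sorted(summarize(SAMPLE_LINES).items()):
        print(status, stats["requests"], "requests,", stats["bytes"], "bytes")
```

To run it against a real log, pass an open file instead of the sample list, for example `summarize(open("access.log", encoding="utf-8", errors="replace"))`, where `access.log` is a placeholder name. If the entries logged after the block went in are almost all 403 with small sizes, the crawler is still knocking but is being turned away cheaply.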

I hoped to be able to avoid adding Cloudflare -- I used it in the past and it just added confusion I hope to avoid.  The site was fine for years until August.

This is all very new to me -- I wanted to run this by you to make sure what I discovered makes sense and what I did doesn't cause a different problem. I'm attaching today's raw access log, an image of the robots & spiders list, and the bandwidth by day, which shows the dramatic drop.

Thank you for your help!

anotart

Re-sending the log that didn't send with my prior comment.

nicholas
Akeeba Staff
Manager

Since you are okay with blocking bots knowing full well what that entails, yes, what you are doing is not only correct, it's exactly what I would recommend doing for blocking specific bots from your site.
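To illustrate what a block by User-Agent amounts to, here is a minimal Python sketch. It assumes the rule is a case-insensitive substring match against the request's User-Agent header. That is an assumption for illustration only, not the actual code Admin Tools generates, and the `BLOCKED_AGENTS` list and `is_blocked` function are hypothetical names.

```python
# Hypothetical block list: only the entry discussed in this ticket.
BLOCKED_AGENTS = ["Meta-ExternalAgent"]

def is_blocked(user_agent, blocked=BLOCKED_AGENTS):
    """Return True if any blocked token appears in the User-Agent (case-insensitive)."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in blocked)

if __name__ == "__main__":
    agents = [
        "meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)",
        "meta-externalagent/1.1",
        "facebookexternalhit/1.1",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    ]
    for agent in agents:
        print("blocked" if is_blocked(agent) else "allowed", "-", agent)
```

Under this assumption, both meta-externalagent strings are refused, while facebookexternalhit and the search engine crawlers are untouched, because their User-Agent strings do not contain the blocked token.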

Before signing off, I want to give you my perspective on “AI” and social media bots, starting with the disclaimer that I find “AI” (can we please call them what they are, LLMs?) to be 90% hype and 10% substance, and social media to be the cesspits of our world.

The reality is that no matter what my personal opinion on LLMs is (Large Language Models – what people very erroneously call “AI” these days), a significant number of our clients use them as a faster substitute for support. This creates a bit of a problem for me. If these LLMs don't have training data from our public tickets they will regurgitate the very frequently asinine suggestions you find in other public fora, like the Joomla! Forum. Our clients will be getting wrong information, and stupidly blame us for it. Therefore it is in our best interest to let the bots that harvest our public tickets for AI training do so.

Even though my regard for mainstream social media is emphatically low, I would be remiss to ignore the vibrant local Joomla! communities using them to stay in touch, and indeed share content from my sites. Therefore, I have to allow these bots as well.

These are finer points that most clients I've had a discussion with about blocking bots have not considered. That's why I tend to err towards allowing these bots in my replies to you, and let you know about these points. I think it's important that voices of reason don't erase themselves out of every tool of mass communication.


anotart

Thank you so much, Nicholas!  And, thanks for your perspective.  I have been EXTREMELY frustrated with company support that uses AI or LLMs (for chat and phone). Not only has it NEVER helped me, too many times it sends me into loops I can't get out of when trying to reach a real person.  The answers they provide are ones I've already researched.  Chat for dummies, I guess. I totally understand your points and why you need these bots to access your site.  My main reason to block this one is that it keeps bringing down the site, and my client has no need for Facebook to be learning from her site anyway. I really don't understand why it consumed so much bandwidth, and so far I haven't seen that on any other site.

I appreciate your help and that you are always so willing to go further and help me understand more. 

Have a good day,

Anne