Support

Akeeba Backup for Joomla!

#33661 Remote quota management

Posted in ‘Akeeba Backup for Joomla! 4 & 5’
This is a public ticket

Everybody will be able to see its contents. Do not include usernames, passwords or any other sensitive information.

Environment Information

Joomla! version
n/a
PHP version
n/a
Akeeba Backup version
n/a

Latest post on Friday, 09 October 2020 01:17 CDT

davidtorr

Please look at the bottom of this page (under Support Policy Summary) for our support policy summary, containing important information regarding our working hours and our support policy. Thank you!

EXTREMELY IMPORTANT: Please attach a ZIP file containing your Akeeba Backup log file in order for us to help you with any backup or restoration issue. If the file is over 2MB, please upload it on your server and post a link to it.

Description of my issue:

I am trying to configure remote quotas. We have a 100 GB account with Google and our typical backup is around 15GB so it seemed sensible to set a remote quota of 50GB. GB is not an option so I typed in 50000 MB and it was immediately reset to 4390.80 MB, which is not exactly very useful!

tampe125
Akeeba Staff

Hello,

What you see is the maximum allowed size: some PHP installations can't handle larger numbers. As soon as we can raise the minimum PHP requirement to PHP 7.0 and stop supporting PHP 5.6, we will increase the maximum limit.

By the way, be aware that quotas are applied on a per-profile basis. If you have multiple sites and/or profiles being backed up to the same Google Drive account, each quota only sees its own profile's backups, so they cannot enforce an overall limit on the account.

Davide Tampellini

Developer and Support Staff

🇮🇹Italian: native 🇬🇧English: good • 🕐 My time zone is Europe / Rome (UTC +1)
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

davidtorr

Thank you for the response. A pity that such limits were not mentioned anywhere I could see on the website when I purchased the product, as I now have to rethink how we will use it and indeed whether it will be usable. Yes we can switch to a certain number of past versions being kept but that will still require intervention to check how much space is being used.

At the moment I would say your product is only really suitable for small websites and not those with reasonable amounts of data to back up.

nicholas
Akeeba Staff
Manager

Hello David,

I am Nicholas, the business owner and lead developer of this software since 2006, when it was first released as JoomlaPack.

The way the quota management works is documented in the Quota Management documentation page. Let me copy and paste the exact wording:

Quotas let you automatically remove backup archives and / or backup records based on specific criteria. Quotas are always calculated against the backup records, not the backup archives on disk or on remote storage. In other words, if you do not see a backup record in the Manage Backups page it is NOT taken into account when applying quotas.

Furthermore, quotas will take into account only the backup record, without checking if the file exists. If a backup is listed as OK or Remote in the Manage Backups page it participates in the quotas.

The quotas apply per backup profile. They will only take into account backup records in the same backup profile.

Finally note that the quotas are only being applied at the end of a successful backup, even if post-processing (transferring it to remote storage) failed. It is therefore recommended that you keep an eye out for failed transfers – appearing as warnings in the backup logs and the CLI backup script's output – to avoid an over-zealous quota setting from removing your last full, good backup.

If you thought this wording was not clear enough you could have always asked a pre-sales question before making a purchase. We are obliged by EU law to answer these questions truthfully.

As to why it is implemented the way it is: with 14 years of experience writing backup software we can see that what you are asking for is bound to cause significant problems. In fact, these are things that yours truly had already thought about a decade ago (March 2010, if you want me to be more precise) when choosing how to implement remote quotas. Please let me explain the top problems.

First of all, this would require recursively listing the contents of the remote folder and all of its subdirectories. Especially for Google Drive this incurs a 4-5 second delay per subdirectory. If you have more than 100 files you need to add this delay for each and every batch of 100 files. A Google Drive that holds the backups of 20 different sites with an average backup size of 100MB and a limit of 50GB would need more than 2 minutes to list its contents. This could only ever work when running backups from the CLI, and only if the host did not impose a small maximum CPU usage limit. It would also only work if PHP had enough memory to hold the list of files and sizes three times over, which is not a given either. The vast majority (over 95%) of our users do not fall in this very narrow set of criteria. Therefore we'd be shipping software that is broken by default.
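The back-of-the-envelope arithmetic can be sketched as follows. The per-request delay, batch size, and site/quota figures are the ones quoted in this post, not official Google Drive API limits:

```python
# Rough estimate of how long a recursive Google Drive listing would take,
# using the figures quoted above (assumptions, not official API limits).

SECONDS_PER_REQUEST = 5   # observed per-request listing delay quoted above
FILES_PER_PAGE = 100      # files returned per listing batch
SITES = 20                # subdirectories, one per site
AVG_BACKUP_MB = 100
QUOTA_GB = 50

total_files = (QUOTA_GB * 1024) // AVG_BACKUP_MB  # files at the quota cap
pages = -(-total_files // FILES_PER_PAGE)         # ceiling division
# One request per subdirectory plus one request per page of results:
total_seconds = (SITES + pages) * SECONDS_PER_REQUEST

print(f"{total_files} files, {pages} pages, ~{total_seconds} seconds to list")
```

With these numbers the listing alone takes about 130 seconds, i.e. the "more than 2 minutes" figure above, before a single quota decision has been made.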

But let's say that this could be an option that only people like you, who do fall in this narrow set of criteria, would enable. The next problem we have is that we let you choose freeform naming of the backup archives and we let you change that name anytime. There is no guarantee that an Akeeba Backup archive will be named in any particular way and there is no reliable way to deduce the site and backup profile that generated a file. The only assumption we can make is that any file with an extension of .jpa, .jps, .zip, .j** (e.g. .j01, .j02, ...) or .z** (e.g. .z01, .z02, ...) is likely to be an Akeeba Backup archive.
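That extension heuristic could be expressed as a regular expression. This is an illustrative sketch, not code from Akeeba Backup itself:

```python
import re

# Matches the archive extensions listed above: .jpa, .jps, .zip,
# plus the multipart extensions .j01, .j02, ... and .z01, .z02, ...
ARCHIVE_RE = re.compile(r'\.(jpa|jps|zip|[jz]\d{2})$', re.IGNORECASE)

def looks_like_backup_archive(filename: str) -> bool:
    """Heuristic only: the name tells us nothing about which site
    or backup profile produced the file."""
    return ARCHIVE_RE.search(filename) is not None
```

Note that this is the strongest claim the extension can support: it can say a file is *probably* a backup archive, never whose backup it is.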

It is impossible to go back and force the naming of backup archives. Not only is it a regression (we would be removing functionality established since the very first version in 2006), it would also be futile! Think about what happens when you have a site accessible from different domains/subdomains, e.g. www.example.com, www.example.net and example.net. If the subdomain and domain name are used as keys for determining the originating site, we are already treating backups of the same site taken by accessing the backend from its different sub-/domains as different sites. This would lead to false assumptions. We would have to use a long, random ID, but that would make the archives useless for humans. Which site does the file qwetgwnwert62ugw_002_1599551259.jpa correspond to, and when was it taken? Something like site-www.akeeba.com-full_site-20200908-104739eest.jpa is far easier to understand. Basically, there is no good way of having a naming scheme that makes sense to both humans and machines. Our clients overwhelmingly prefer something that makes sense to humans, because that's the only way to restore a site when everything on the server is lost.

This creates a very dangerous situation when applying remote quotas based on the results of listing a remote folder's contents. For example, if you accidentally select your Google Drive root, the quotas would be applied to all files stored there. You would end up losing every ZIP file in your Drive, even ones completely unrelated to Akeeba Backup.

Even if you use a special folder on your Drive just for backups this still has major problems. As I said, there's no way to know which site and profile a backup was taken with. Therefore the quota setting would remove files indiscriminately, even if a file is the only backup of a specific site and/or backup profile. You'd only know when it's too late, when you want to restore a backup that's not there. Chances are you'd blame us, not yourself, for it. In fact, even with the very strict and confined remote quotas we do have, this still happens. This tells us that if humans can't take responsibility for something that can be described in a single sentence at an eighth-grade reading level, how can we possibly ask them to understand something that is far more abstract and convoluted? It would be terrible UX.

Moreover, quotas would be applied on each backup run, on each site. This provides NO guarantee against concurrent execution of the quotas. If anything, it makes it very likely. At best you'd have multiple sites trying to delete the same files at the same time, which causes an error that stops quotas dead in their tracks. At worst, they'd be trying to delete each other's files, creating dangerous inconsistencies in the Manage Backups page on each site.

This also means that there's a lot of opportunity for a small human error to cause massive data loss. Think about having, say, 20 sites backing up to the same Drive folder. On 18 of the sites you disable the count quota (which was enabled by default) and enable the size quota, setting it to 50GB. On one site you forget to unset the count quota, which has a default value of 3. On another site you accidentally set the size quota to 50MB. Each of your sites backs up to around 100MB. Guess what happens next? Only the backups of the site with the 50MB quota would survive the remote quotas. Everything else would be deleted forever. If one of the other sites needs to be restored from a backup you'd be surprised to find out its backups are wiped. Are you going to blame yourself, or would you file an angry ticket about our crappy software losing your site? Based on 14 years of experience, you'd do the latter and we'd have to prove we're not elephants.

Also, with any number of sites all applying quotas on the same remote storage folder, you can no longer have frozen records or keep specific backup records like we do with the backup age quotas. This is a corollary of not being able to deduce which site and profile a backup was taken from.

So, what you think you want us to do is MOST DEFINITELY NOT what you actually need us to do. We shouldn't ever give you a feature with so many hidden gotchas that will cause permanent data loss. That would be unconscionable at best.

Based on our experience, what you really need is the quotas as implemented. Take a few backups and see how big they are. You can put that information in a spreadsheet and get a feeling of how many backups you can afford to keep from each site (and leave some room for frozen records). You can then use count quotas to implement sensible limits.

The key to your decision shouldn't be just how many backups you can cram in the Drive but also how many backups you might actually need. If a site is small enough to allow 500 backups to be stored and another one is big enough to only justify one, I can tell you that these numbers are wrong in the context of what you'd really need when restoring a site from backup is your only viable option. It would make more sense to have the small site capped at 7 to 15 backups and the big site set up to store at the very least 3 backups, with a better value being 7 if you can spare the storage. For bigger sites it makes more sense to keep 7-15 daily backups excluding large static media folders and one weekly backup with static media. That's why you have backup profiles. If you are about to make, or just made, a big change on your site you may want to Freeze a backup record, which exempts it from quotas until you Thaw it again. Backups should be structured in a way that makes it easier for you to restore your site to a last known good state, not in a way that simply keeps you under an arbitrary disk size limit.

But I understand that you don't believe this is a good fit for you. What you describe you need is a way to keep the total size of that folder under 50GB. This should NOT be part of the quotas of each backup, for the reasons described above. It would best be implemented through an external script that runs once, after all backups have executed, pruning the oldest files until the folder is back under the 50GB limit. The fact that nobody has written something like that for Google Drive should probably be a good indication that it's a bad idea. You can look for something like that, but I'm telling you that I have never seen anyone implement it before.
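For what it's worth, the pruning step of such an external script is the easy part once you have a file listing; everything risky is in the listing and in deciding what a file belongs to, as discussed above. A minimal sketch of size-based pruning, assuming the records would be fed from a recursive Drive listing (the `RemoteFile` type and field names are hypothetical, and 50GB is just this thread's example figure):

```python
from dataclasses import dataclass

@dataclass
class RemoteFile:
    name: str        # filename as listed on the remote storage
    size: int        # size in bytes
    modified: float  # Unix timestamp of last modification

def files_to_prune(files: list[RemoteFile], quota_bytes: int) -> list[RemoteFile]:
    """Return the oldest files to delete so the total fits under quota_bytes.

    Note: this blindly trusts the listing. It cannot tell which site or
    profile a file belongs to, so it may doom a site's only backup --
    exactly the danger described in this thread.
    """
    newest_first = sorted(files, key=lambda f: f.modified, reverse=True)
    kept_bytes = 0
    doomed = []
    for f in newest_first:
        if kept_bytes + f.size <= quota_bytes:
            kept_bytes += f.size  # newest files are kept while they fit
        else:
            doomed.append(f)      # everything older gets pruned
    return doomed
```

Producing the listing and actually issuing the deletions against the Drive API is left to the hypothetical script; nothing in this logic distinguishes an expendable archive from an irreplaceable one.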

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

System Task
system
This ticket has been automatically closed. All tickets which have been inactive for a long time are automatically closed. If you believe that this ticket was closed in error, please contact us.

Support Information

Working hours: We are open Monday to Friday, 9am to 7pm Cyprus timezone (EET / EEST). Support is provided by the same developers writing the software, all of whom live in Europe. You can still file tickets outside of our working hours, but we cannot respond to them until we're back at the office.

Support policy: We would like to kindly inform you that when using our support you have already agreed to the Support Policy which is part of our Terms of Service. Thank you for your understanding and for helping us help you!