
#8573 Amazon S3 backup scenario

Posted in ‘Akeeba Backup for Joomla! 4 & 5’

Environment Information

Joomla! version: n/a
PHP version: n/a
Akeeba Backup version: n/a

Latest post by nicholas on Saturday, 23 July 2011 16:28 CDT

Brent58
Hi there,
I have what I think is a fairly straightforward requirement, but I'm not sure of the most efficient way to go about it.
I manage a number of websites and have bought the Akeeba Pro edition to help me back up the sites to Amazon S3.
What I want is to retain 4 weekly backups and 12 monthly backups for each site on S3 with names like
"site-name-week1.jpa" ... "site-name-week4.jpa"
"site-name-january.jpa" .... "site-name-december.jpa"
Alternatively, I could just schedule weekly and monthly cron jobs with the standard names, but I would then need some way of having old backups deleted automatically on Amazon S3 (perhaps by setting the "expire" metadata tag?).
Does anyone know the easiest way to achieve this?

Many thanks,
Brent
BTW, I'd just like to say that I think this is a well written and thought out product.

nicholas
Akeeba Staff
Manager
No, the expire meta tag only affects how the files are served, not how long they are stored. We do not support quotas on cloud storage services for several reasons, the most important being that it's impossible to reliably get a file listing without risking a timeout error. The only trick I've found (as I'm doing exactly what you describe) is to use one directory per site, then perform a manual clean-up of the backup archives (delete the old ones) every 2 weeks using CloudBerry Explorer for Amazon S3. Since the default backup naming scheme includes the date and time of the backup, this is fairly easy and doesn't take more than half a minute per site.

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!
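For anyone who would rather automate the clean-up described above than run it by hand, a minimal sketch along these lines is shown below. It is not part of Akeeba Backup; it assumes the AWS SDK for Python (boto3), one key prefix per site, and the default naming scheme that embeds the backup date in the file name. The bucket name, prefix, retention window and file name pattern are placeholders.

```python
# Hypothetical clean-up script, not part of Akeeba Backup.
# Assumes one key prefix ("directory") per site and a naming scheme that
# embeds the backup date in the file name, e.g. ...-20110723-162800.jpa.
import re
from datetime import datetime, timedelta, timezone

import boto3  # AWS SDK for Python

BUCKET = "my-backup-bucket"   # placeholder
PREFIX = "www.example.com/"   # one prefix per site
KEEP_DAYS = 28                # roughly four weekly backups

cutoff = datetime.now(timezone.utc) - timedelta(days=KEEP_DAYS)
date_re = re.compile(r"-(\d{8})-\d{6}\.(?:jpa|jps|zip)$")

s3 = boto3.client("s3")
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        match = date_re.search(obj["Key"])
        if not match:
            continue  # not a backup archive we recognise
        taken = datetime.strptime(match.group(1), "%Y%m%d").replace(tzinfo=timezone.utc)
        if taken < cutoff:
            print("deleting", obj["Key"])
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
```

Run from a workstation or any always-on machine (for example via cron), this does outside of PHP what the backup engine cannot safely do during a backup run: list the bucket and delete by age.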

Brent58
Hi Nicholas,
That's a shame, but if there's no other way I'll settle for that. Thanks.

cazzani
The only trick I've found (as I'm doing exactly what you describe) is to use one directory per site, then perform a manual clean-up of the backup archives (delete the old ones) every 2 weeks using CloudBerry Explorer for Amazon S3

Can anyone suggest a similar solution to regularly clean up old backups on the Dropbox cloud storage service?
The 2 GB of free space would quickly fill up otherwise.

Stefano

nicholas
Akeeba Staff
Manager
With Dropbox it's very easy. Under most operating systems you can run a search against your local (synchronized) Dropbox folder for files older than X days. Select all, delete. Dropbox automatically synchronizes the deletions to its servers. Done :) If you are more tech-savvy you can even create a script to run this procedure against your local Dropbox folder automatically every day.
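A minimal sketch of such a script is shown below, assuming Python is available; the Dropbox folder path, the age threshold and the archive extensions are placeholders to adjust.

```python
# Minimal sketch: delete backup archives older than MAX_AGE_DAYS from the
# local (synchronized) Dropbox folder; Dropbox then propagates the
# deletions to its servers.
import time
from pathlib import Path

DROPBOX_BACKUPS = Path.home() / "Dropbox" / "backups"  # placeholder path
MAX_AGE_DAYS = 30
EXTENSIONS = {".jpa", ".jps", ".zip"}

cutoff = time.time() - MAX_AGE_DAYS * 86400

for archive in DROPBOX_BACKUPS.rglob("*"):
    if (archive.is_file()
            and archive.suffix.lower() in EXTENSIONS
            and archive.stat().st_mtime < cutoff):
        print("deleting", archive)
        archive.unlink()
```

Scheduled daily (cron on Linux or macOS, Task Scheduler on Windows), it keeps the local folder trimmed and lets Dropbox propagate the deletions.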

cazzani
Thanks Nicholas, interesting workaround.

However, I still think that native, automatic deletion of old backups stored in the cloud would be a desirable feature; I will file it as a feature request.

nicholas
Akeeba Staff
Manager
It has already been asked, and I have already explained exactly why it can't be reliably implemented (unless you don't mind your backup crashing due to timeouts). Don't add it as a feature request; just browse the feature request forum and find my answer.

cazzani
Nicholas

I am not saying it's trivial to implement, of course, but maybe you could consider a simpler quota management scheme that would suit the most common requirements.

If I understood correctly, your statement

We do not support quotas on cloud storage services for several reasons, the most important being that it's impossible to reliably get a file listing without risking a timeout error.

implies that you need to get a file listing.

But what about a much simpler quota management scheme based on the LAST N backups (like the "Count quota" we already have in Akeeba for local storage)?

In that case I think you could send delete commands to the cloud service for the previously used filenames, which are available in the Akeeba database tables, as we can see them in the "Administer Backup Files" menu.

Please note that I am not a professional programmer anymore; maybe I am oversimplifying things. I am just suggesting how to further improve an already excellent product like Akeeba Pro.

Greetings from a happy customer who would like to be even happier...
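For what it's worth, the idea in this suggestion can be sketched roughly as follows. This is illustrative only, not Akeeba Backup code: it assumes boto3 and a hypothetical list of remote keys as they might be read from the backup records, and it deletes everything beyond the last N known backups without listing the bucket.

```python
# Illustrative only -- not Akeeba Backup code. The remote file names of
# older backups are already known (recorded when they were uploaded), so
# they can be deleted directly, without listing the bucket.
import boto3  # AWS SDK for Python

BUCKET = "my-backup-bucket"  # placeholder
COUNT_QUOTA = 3              # keep the last 3 backups

# Hypothetical list of remote keys, oldest first, as it might be read
# from the backup records.
known_backups = [
    "www.example.com/site-www.example.com-20110601-030000.jpa",
    "www.example.com/site-www.example.com-20110701-030000.jpa",
    "www.example.com/site-www.example.com-20110708-030000.jpa",
    "www.example.com/site-www.example.com-20110715-030000.jpa",
]

to_delete = known_backups[:-COUNT_QUOTA]  # everything beyond the quota
if to_delete:
    boto3.client("s3").delete_objects(
        Bucket=BUCKET,
        Delete={"Objects": [{"Key": key} for key in to_delete]},
    )
```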

nicholas
Akeeba Staff
Manager
When you tell Akeeba Backup to keep, say, the last 3 backups, it has to remove all the backup archives except those three. Since we may be talking about hundreds of records, if it tried to delete all of them it would time out. It would also need a file listing to figure out which of them are present on the cloud storage service, which would likewise cause a timeout.

In the case of size quotas, the file size is not cached in the database, mainly because PHP internally uses 32-bit signed integers except on the Linux x86-64 platform. This simply means that anything over 2 GB is calculated unreliably. So the only way to figure out the backup size in order to apply quotas is, once more, a file listing.

I could create a half-baked approach of only applying count quotas and assuming that all backups before the last 3 have already been deleted, so that only the files of the fourth-from-last backup attempt need to be removed. This causes two major problems:
- Most people wouldn't like the difference in functionality between local and cloud quotas, would file it as a bug, and I would ultimately end up removing the feature to stop having to explain that it is not a bug.
- Not all cloud storage services support file deletion. Dropbox and backup-to-email, most notably, can't do that. Even those which can may not be allowed to because of ACLs. For example, I always use a write-only account to save my backup archives to S3; this account can neither list files nor delete them. These differences between cloud storage engines would also cause support requests, as people would consider them bugs when they're not.

So the only sane approach is not to add a feature which would be inconsistent with how the rest of Akeeba Backup currently works, would increase support requests without adding significant value to the product, and which can be substituted with an easy manual process.

cazzani
I see your point and I understand that it's not as easy as one might expect.
Thanks for the explanation.

However, I strongly disagree on just this point:
add a feature ... without adding significant value to the product and which can be substituted with an easy manual process.

The key point here is "manual process".
That's exactly what both I and my customers want to avoid!

I would be prepared to pay extra for an Akeeba Pro+ version or a plug-in with such an automated feature.
If someone else agrees on this point, maybe you will change your mind... ;)

nicholas
Akeeba Staff
Manager
Upon re-reading my post, I think I've found a solution which will make all of us happy. The idea is this:
- Each post-processing engine can define its own count quota handling settings and provide the relevant code to run it.
- The quota enforcement can run in a new step to avoid timeouts.
- Using variables (e.g. [DATE], [TIME]) in directory names will render quotas ineffective, but this shouldn't be that big of a concern, because you'd only use them if you don't care about quotas :)
- For S3, the quota handling code will assume that only the files of the backup which just fell outside the quota need to be removed, i.e. if you have a quota of 3 backups, only the fourth most recent backup will be removed.
- Failure to delete due to ACL restrictions will be logged as a warning, not a fatal error, so it won't stop the process from completing.

Would that be a satisfactory solution?
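As a rough illustration of the behaviour proposed above (not the actual implementation), the count quota step could look something like the sketch below; backup_records and delete_remote_file are hypothetical names used only for the illustration.

```python
# Rough sketch of the proposed behaviour, not the actual implementation.
# Only the single backup that just fell outside the count quota is
# removed, and a failed delete (e.g. from a write-only account that may
# not delete) is logged as a warning instead of aborting the backup.
import logging

log = logging.getLogger("remote_quotas")
COUNT_QUOTA = 3

def enforce_remote_count_quota(backup_records, delete_remote_file):
    """backup_records: oldest-first list of dicts holding each backup's
    remote file names; delete_remote_file: engine-specific callable.
    Both names are hypothetical, used here for illustration only."""
    if len(backup_records) <= COUNT_QUOTA:
        return  # still within quota, nothing to do
    # Assume earlier runs already removed anything older, so only the
    # record that just dropped out of the quota needs handling.
    expired = backup_records[-(COUNT_QUOTA + 1)]
    for remote_name in expired["remote_files"]:
        try:
            delete_remote_file(remote_name)
        except Exception as exc:  # e.g. ACL / permission error
            log.warning("Could not delete %s: %s", remote_name, exc)
```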

cazzani
Sounds good to me, although I am not sure what exactly this comment means:
- Using variables (e.g. [DATE], [TIME]) in directory names will render quotas ineffective, but this shouldn't be that big of a concern, because you'd only use them if you don't care about quotas


Anyway, I agree these settings should be off by default and applied only as a separate, optional, additional step for cloud storage.

A warning (ideally an email with a distinct subject) if something goes wrong with the quota management would be more than enough. The goal in this scenario is to fully automate day-to-day backup operations and resort to manual intervention only in case of problems.

nicholas
Akeeba Staff
Manager
Great! Added to the to-do list for version 3.2.

user29168
The s3cmd tool, which works great under Cygwin, allows you to delete from a bucket via wildcard. So if your backups are named like sitename-yyyymmdd-hhmmsstt, it is very easy to set up a scripted policy to delete old filenames; it only takes a little math with dates (minimal if you are fine working at the month level), and surely there is already something written for this.
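The "little math with dates" can be sketched in a few lines of Python, for example at the month level: list the month prefixes that fall outside the retention window, then feed each prefix (e.g. "sitename-201105") to a prefix or wildcard delete with s3cmd or any other S3 tool. The site name and retention window are placeholders.

```python
# List the month-level file name prefixes older than the retention
# window; each one can then be used for a prefix/wildcard delete.
from datetime import date

SITE = "sitename"  # placeholder
KEEP_MONTHS = 2    # keep the current and the previous month

year, month = date.today().year, date.today().month
stale_prefixes = []
for steps_back in range(1, 13):  # look back one year
    month -= 1
    if month == 0:
        year, month = year - 1, 12
    if steps_back >= KEEP_MONTHS:
        stale_prefixes.append(f"{SITE}-{year:04d}{month:02d}")

print(stale_prefixes)
```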

nicholas
Akeeba Staff
Manager
Remote quotas have been implemented, as promised, since Akeeba Backup 3.2. The latest version, Akeeba Backup 3.3.1, goes one step further. It now allows you to keep daily backups for a predefined period (e.g. the last 30 days) and also keep the backups taken on the Xth day of the month (e.g. every 1st of the month). This lets you keep monthly backups forever and daily backups only for the last month. No need for s3cmd or any other tool; just Akeeba Backup all by itself :)
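As an illustration only (not Akeeba Backup's own code), the retention rule described above boils down to something like this:

```python
# Keep every backup from the last N days, plus any backup taken on a
# given day of the month, which is kept forever. Illustrative values.
from datetime import date, timedelta

KEEP_LAST_DAYS = 30
KEEP_DAY_OF_MONTH = 1  # e.g. backups from the 1st of each month

def should_keep(backup_date, today=None):
    today = today or date.today()
    if backup_date >= today - timedelta(days=KEEP_LAST_DAYS):
        return True                              # within the daily window
    return backup_date.day == KEEP_DAY_OF_MONTH  # monthly keeper

# Relative to 23 July 2011 (the date of this post), a backup taken on
# 1 May 2011 is kept, while one from 15 May 2011 is not.
print(should_keep(date(2011, 5, 1), today=date(2011, 7, 23)))   # True
print(should_keep(date(2011, 5, 15), today=date(2011, 7, 23)))  # False
```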

