#29944 Restore does not really restore the db

Posted in ‘Akeeba Backup for Joomla! 4 & 5’

Latest post by on Tuesday, 28 August 2018 17:17 CDT

bom
Restoring a db does not restore the db to its state when the backup was taken. Although set to "drop all tables", it only drops the tables that are in the backup, not the new ones that were added after the backup was taken (e.g. by installing a new component) and messed up the site. Is there an additional switch "drop ALL tables" that I missed?

tampe125
Akeeba Staff
Hello,

that's the expected behavior.
Akeeba Backup will restore the state of your database at the time of backup. This means that any table added later won't be dropped because it is not part of the backup. Moreover, sometimes you deliberately exclude some tables from the backup, and you really don't want those tables to be dropped.
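
To make the behavior concrete, here is a minimal sketch in Python. It is not Akeeba Backup's actual code, and the table names are made up; it only models a restore that drops and recreates the tables contained in the backup and leaves everything else alone.

```python
def tables_after_restore(existing, in_backup):
    """Return the table names present after a restore that only drops
    and recreates the tables contained in the backup."""
    # Tables that are not part of the backup are never touched.
    leftover = set(existing) - set(in_backup)
    # Tables in the backup are dropped (if present) and recreated.
    return sorted(set(in_backup) | leftover)


# Hypothetical example: a component installed after the backup added a table.
existing = ["jos_content", "jos_users", "jos_newcomponent_items"]
in_backup = ["jos_content", "jos_users"]
print(tables_after_restore(existing, in_backup))
# ['jos_content', 'jos_newcomponent_items', 'jos_users']
```

The table added after the backup survives the restore, which is the situation described in the question.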

Adding an option to drop all the tables is too dangerous for the average user. If you really want to perform a complete cleanup, you should connect to your database manager and drop them manually.

Davide Tampellini

Developer and Support Staff

🇮🇹Italian: native 🇬🇧English: good • 🕐 My time zone is Europe / Rome (UTC +1)
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

bom
"and drop them manually." - that is what I am forced to do, but none of my customers can do that, so I have to do all the restoring myself. That is not a sustainable workflow.

This concept of NOT restoring a site to the state it was in at the time of backup is highly questionable IMHO. I can't find any valid supporting argument.

"Adding the option to drop all the tables is too dangerous for the average user" - I don't get that one, sorry. As if the options currently available were not dangerous :)

Please add an option to purge ALL tables in the db, which of course defaults to NO.

tampe125
Akeeba Staff
I'll forward this request to Nicholas, but he is out of the office at the moment and will be back next week.

nicholas
Akeeba Staff
Manager
There is a VERY GOOD reason why we do not automatically remove files and database tables when restoring a backup. In fact, several of them.

Multiple sites, same hosting account. Many clients need to create temporary or permanent sites in the same hosting account as their main site. These sites could be dev sites to test something out, a special project of the company (I did that a lot with NGOs in the early ‘00s) and so on. Many, if not most, of these people cannot afford the fancy hosting package which allows for site isolation in the account. So they are forced to use a sub-domain, by default hosted as a sub-directory of the main site, and the same database as the main site but with a different prefix. If we deleted everything it would be catastrophic and quite the opposite of a safe backup solution.

Advanced partial backup and restore. This is what the real pros do. Take a full backup of the live site. Restore it locally. Work on it. Back up the new state of the live site. Restore that on a staging server. Back up only the things which have changed from the dev site and restore that partial backup on the staging site. Make sure everything works. Restore the partial dev site backup on the live site. Site massively and instantly updated. If we removed files and folders not in the backup we’d be screwing up your sites, not restoring them.

Even if we limit the scope of automatic deletion to what you said, database tables, it’s still a massively bad idea since your request is open ended and subject to interpretation. Which table is a ripe subject for removal? Any table not in the backup? Any table with the backed up prefix? Any table with the new prefix you gave during restoration? Any table not having the prefix of the backed up site? Any table not having the prefix you gave during restoration? Should some tables be exempt from deletion? Should some tables always be deleted? What about backup tables? If you chose an option contingent on the table prefix, what about the tables without a prefix which are currently in the backup? What about those NOT in the backup and, most importantly, how can you tell which tables don’t have a prefix since by definition they have no common component in their name (or maybe they do, e.g. “cartridges” and “carnivals” both start with “car”)? All these questions are open ended and you’d end up with approximately 30 new and extremely confusing options which might or might not do what you want. Even worse, even if you chose to “Backup” the old tables they might disappear anyway: this option only works for the tables having the same prefix you chose to restore with.
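
The ambiguity can be illustrated with a small Python sketch. The rule names, table names and prefix below are invented for the example and do not correspond to any existing option; the point is only that three plausible readings of "remove the extra tables" select three different sets.

```python
def removal_candidates(existing, in_backup, prefix):
    """Apply three plausible deletion rules to the same database and
    return the tables each rule would remove."""
    existing, in_backup = set(existing), set(in_backup)
    return {
        "not in backup": sorted(existing - in_backup),
        "has prefix": sorted(t for t in existing if t.startswith(prefix)),
        "lacks prefix": sorted(t for t in existing if not t.startswith(prefix)),
    }


# Hypothetical database shared by a main site (abc_) and a dev site (dev_),
# plus one table without any prefix that is part of the backup.
existing = ["abc_content", "abc_extra", "dev_content", "cartridges"]
in_backup = ["abc_content", "cartridges"]

for rule, tables in removal_candidates(existing, in_backup, "abc_").items():
    print(rule, "->", tables)
# not in backup -> ['abc_extra', 'dev_content']
# has prefix -> ['abc_content', 'abc_extra']
# lacks prefix -> ['cartridges', 'dev_content']
```

Under the first rule the dev site loses its table; under the third, a table that is in the backup is removed as well. None of the rules is obviously "the" right one.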

At best we could add a “Drop ALL tables before restoring” option in the database restoration, possibly hidden under an “advanced” section, and it would be disabled by default. However, before implementing such a massively dangerous feature I want to know: is this what you REALLY want? Read what I explained above. Do you want ALL tables of the database to be removed BEFORE restoring anything, deleting all data and making it impossible to retrieve anything after the fact? Do you REALLY want the nuclear option? Think about this twice before answering. Both times this has come up in the past it turned out that the client did not want the nuclear option, the requirements of what they wanted to do were a bit cloudy so they ended up conceding that human action is the best approach. That’s why this feature was never implemented so far: the people who thought they needed it did not, in fact, need it.

Sorry for the long post, I can’t see how I could make it smaller and still explain thoroughly how much thought has gone into not implementing it yet.

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

bom
Thank you for the elaboration and insight, very helpful to comprehend the possible situations.
I thought my clients have the most common hosting setup but maybe not. All my clients that I maintain have a dedicated webspace and db per site. There is no subdirectory installation or anything. They really need the EXACT folder/file structure and db they had at the time of the backup. So any file or table/column added during a (failed) extension installation needs to be gone after restore. Same in case of a hack - any uploaded files or injected tables/columns/rows need to be deleted during restore. For these scenarios a regular customer with no concept of db handling via phpmyadmin and files via FTP needs the "nuke" option. For my use cases that (hidden) option would make my life a lot easier.

tampe125
Akeeba Staff
I'm sorry, but Nicholas gave you a detailed explanation of why we won't implement it for all users.
You have a very specific use case that doesn't fit with the majority of our user base.

bom
What do you mean? He said "At best we could add a “Drop ALL tables before restoring” option in the database restoration, possibly hidden under an “advanced” section, and it would be disabled by default." That would already help a lot. Purging all files that are not in the backup is not that easy for sure since that would need some coding to prevent timeouts and so on. I still think my use case is the most common one. I never came across a client that has multiple sites or dev or test installations in the same webspace. I would appreciate it if Nic would consider putting that advanced option in.

nicholas
Akeeba Staff
Manager
Um, let's back up a bit :) It's not like I have never thought of what you mention. I've already explained my thinking about tables. Now let me explain my thinking about the residual two points in your post: handling file / directory removal and commonality of the requested features.

Your use case is NOT the most common one, I'm afraid. Even if I include the file system structure that you mentioned in passing, it is something that has come up a grand total of six times in the nearly twelve years I've been doing this. That's why dealing with it was low on my priorities. I have, however, done extensive research into it and have two pending to-do items.

Now bear with me for a second because I have to explain the thinking behind handling filesystem removal. The operating assumption in my research was that it was to be handled by the restoration script. This is an important point and a big fallacy as it turns out.

Remember that you extract the backup archive before you restore it. So what you really need to do is a TRIPLE pass just to determine which files and folders to remove. One, go through the backup archive to get a list of the files present in the archive. Two, list what is currently on the filesystem. Three, find the files in the second set which don't belong in the first set. This requires 10MB of memory and several dozen seconds for a small site. This is an O(N^2) problem, so things get quadratically worse the more files you have.
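
The three passes can be sketched as follows in Python. This is an illustration only, not the restoration script: the archive listing is passed in as a plain list of relative paths (an assumption; reading a real archive is not shown), and hash sets are used for the comparison, which avoids comparing every file against every other file but still has to hold both listings in memory.

```python
import os
import tempfile


def files_to_remove(archive_paths, site_root):
    """Return site-relative paths that exist on disk but not in the archive."""
    # Pass 1: collect the paths recorded in the backup archive.
    in_archive = {os.path.normpath(p) for p in archive_paths}

    # Pass 2: collect the paths that currently exist under the site root.
    on_disk = set()
    for dirpath, _dirnames, filenames in os.walk(site_root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            on_disk.add(os.path.normpath(os.path.relpath(full_path, site_root)))

    # Pass 3: anything on disk that the archive does not know about.
    return sorted(on_disk - in_archive)


# Usage: a throwaway directory stands in for the site root.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "images"))
    for rel in ("index.php", os.path.join("images", "logo.png"), "hacked.php"):
        open(os.path.join(root, rel), "w").close()
    leftovers = files_to_remove(["index.php", "images/logo.png"], root)
    print(leftovers)  # ['hacked.php']
```

Even in this simplified form, the whole listing has to be built before a single file can be deleted, which is the part that does not fit into one short web request.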

I know this is a stupid idea because that's how JoomlaPack 1.0 worked back in 2006; it'd run a first pass to determine what to back up and then run a second pass to copy everything into an uncompressed TAR archive. It'd take 10+ minutes to back up a tiny site and would fail on anything real-world. I had to write a proper backup engine for JoomlaPack 1.1 and then refactor it to Hell and back in 1.2 to improve the backup speed by an order of magnitude.

So I already know that just figuring out what to delete is going to fail spectacularly - and I even know why and how. But let's pretend I am naive or insane and went with it.

Now the question is, how do you delete all those files? You can't do it in a single page load because of PHP and Apache timeouts. If you follow that line of thought you end up... with Akeeba Engine (the backup engine), namely its file-only backup mode. The only difference is that instead of an archiver which creates backup archives you have one which deletes files. Now we've hit another roadblock. The Engine is big and complex, hardly what you want to have in a restoration script. Not only does it add bloat and a thick layer of JavaScript to use it, it also requires configuration. Suddenly the restoration becomes this complicated hot mess which requires you to fiddle with settings for several minutes before you can restore your site. And you will fail. And you have to retry. And these settings cannot be saved since the restoration script is transient. That's how you lose 99% of your users. The 1% remaining will have by now figured out that it's really dumb to spend all that time in the restoration interface when FTP and SFTP have been around literally for decades, are easier to grasp and take more than an order of magnitude less time.

But let's pretend I am thicker than a brick wall and want to go with it anyway. Ownership and permissions of the files to be deleted. Right. How do I deal with that? Of course, the Hybrid filesystem engine in Kickstart! Only that you need to configure it and you need to get it right. Otherwise here's another point of configuration failure you have to endure. Don't assume that because you run PHP under FastCGI and / or know what you're doing everyone else does too. It's hardly the case, as 12 years of experience have taught me, the very hard way.

You know what this thing is called? Overengineered. There are two universal truths about overengineering: it always fails spectacularly and it's never a sign of intelligence on the engineer's part. I have worked long enough as a business consultant, mechanical engineer and software engineer to attest to the veracity of this axiom based on mistakes I've seen and mistakes I've made. That's the good thing about mistakes though: I get to learn from them and not repeat them.

The more I was thinking about this the more evident it became that this is getting out of hand and we're essentially writing a massive and fragile application just to restore a site to what would amount to restoring it on brand new hosting.

Whoa. Wait. Brand new hosting? Is it that simple?

It turns out that if you definitely want to go back to the exact state of the backup, with nothing added or removed, there is a simple way:

1. Delete all files from your web root using FTP, SFTP or even cPanel. Time to complete: 1'.
2. Delete all tables from your database using phpMyAdmin (again accessible through cPanel). Time to complete: 1'.
3. Restore the backup using Kickstart. Time to complete: 5' or less (our s l o w video tutorial is a testament to that).

Step 3 is already automated in code. Doing a cost / benefit analysis it turns out that steps 1 and 2 can be somewhat automated for specific use cases which do match yours.

Step 1 has been on my to-do list since December and it amounts to this: "Add an option to Kickstart to delete all files and folders EXCEPT itself, the archive and its temporary directory". Even if you live in Permissions Hell, Kickstart has the Hybrid file write option which can be used for file deletion as well. The problem amounts to creating a recursive filesystem deletion which can save its state, a problem already solved in the Akeeba Engine and which probably can be ported in a concise form. Had I not spent 2+ months writing a GDPR compliance solution from scratch I'd have done the research and probably implemented this feature already.
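
A minimal sketch of that idea in Python, purely illustrative and not Kickstart's or Akeeba Engine's code. It deletes everything under a folder except a list of protected paths, works in short time-limited steps, and can be called again to continue. One simplification is an assumption of this sketch: the only "state" carried between steps is the filesystem itself (whatever is already deleted is simply not found again), and ownership or permission problems are not handled at all. The file and folder names are hypothetical.

```python
import os
import tempfile
import time


def _is_kept(path, keep):
    """True if path is a protected entry or lives inside a protected folder."""
    return any(path == k or path.startswith(k + os.sep) for k in keep)


def _contains_kept(path, keep):
    """True if a protected entry lives somewhere below this folder."""
    return any(k.startswith(path + os.sep) for k in keep)


def delete_step(folder, keep, deadline):
    """Delete unprotected entries under folder; stop once the deadline passes.

    Returns True when the folder is fully processed, False when the time
    budget ran out and another step is needed.
    """
    for entry in sorted(os.listdir(folder)):
        path = os.path.join(folder, entry)
        if _is_kept(path, keep):
            continue
        if os.path.isdir(path) and not os.path.islink(path):
            if not delete_step(path, keep, deadline):
                return False
            if not _contains_kept(path, keep):
                os.rmdir(path)  # the folder is empty by now
        else:
            os.remove(path)
        # Checked after each deletion, so every step makes some progress.
        if time.monotonic() > deadline:
            return False
    return True


def delete_all_except(root, keep, budget_seconds=2.0):
    """Run steps until nothing is left to delete; return the step count."""
    root = os.path.abspath(root)
    keep = [os.path.abspath(k) for k in keep]
    steps = 1
    while not delete_step(root, keep, time.monotonic() + budget_seconds):
        steps += 1  # on a web server each step would be a separate request
    return steps


# Usage on a throwaway directory standing in for the site root.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "kicktemp"))
    os.makedirs(os.path.join(root, "components", "com_example"))
    for rel in ("kickstart.php", "site.jpa", "index.php",
                os.path.join("kicktemp", "part.tmp"),
                os.path.join("components", "com_example", "a.php")):
        open(os.path.join(root, rel), "w").close()
    keep = [os.path.join(root, n) for n in ("kickstart.php", "site.jpa", "kicktemp")]
    delete_all_except(root, keep)
    remaining = sorted(os.listdir(root))
    print(remaining)  # ['kickstart.php', 'kicktemp', 'site.jpa']
```

Note that every candidate path is compared against the whole list of protected paths, so the cost grows with the length of that list.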

Step 2 is also on my to-do list and it amounts to a "Drop all tables" option in ANGIE. The sticking point was foreign keys and non-table entities. The former is straightforward: tell MySQL to ignore foreign keys and drop one table after the other. The latter is more complicated and won't be implemented: if you have functions, stored procedures or triggers they are not going to be automatically removed (hence "drop all tables"). Dropping them requires listing them, which is really complicated as we know from building the backup engine.
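
For illustration, the foreign key part could look like the Python sketch below, which only builds the SQL statements and does not connect to a database. It is an assumption about the approach, not ANGIE's code; the table names are invented, and the list of tables is taken as given. `SET FOREIGN_KEY_CHECKS` and `DROP TABLE IF EXISTS` are standard MySQL statements; stored functions and procedures are not touched by them.

```python
def drop_all_tables_sql(table_names):
    """Build MySQL statements that drop the given tables one after the
    other while foreign key checks are switched off."""

    def quote(name):
        # Backtick-quote the identifier, doubling any embedded backticks.
        return "`" + name.replace("`", "``") + "`"

    statements = ["SET FOREIGN_KEY_CHECKS = 0;"]
    statements += [f"DROP TABLE IF EXISTS {quote(name)};" for name in table_names]
    # Re-enable the checks for the rest of the session.
    statements.append("SET FOREIGN_KEY_CHECKS = 1;")
    return statements


# Hypothetical table names.
for statement in drop_all_tables_sql(["jos_orders", "jos_order_items"]):
    print(statement)
# SET FOREIGN_KEY_CHECKS = 0;
# DROP TABLE IF EXISTS `jos_orders`;
# DROP TABLE IF EXISTS `jos_order_items`;
# SET FOREIGN_KEY_CHECKS = 1;
```

With the checks disabled, the order in which the tables are dropped no longer matters, even when one table references another.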

The whole reason for these two features is to provide a quick "start over" feature in Kickstart. This cannot be implemented in the integrated restoration from the Manage Backups page because you would be leaving stuff behind, of the variety which has security implications (the removal of all files would make it impossible for the extraction script to know how to remove the restoration.php file which unlocks restore.php for over-the-web access, something which can be used over the span of several days / months to crack its encryption scheme and upload arbitrary code to your site).

So, while it's not exactly the same as manually removing everything and restoring a backup, it would be really darned close and for your use case it'd be just fine. The whole point is finding time to do that. Frankly, it's a low priority item for me since it's an unusual use case AND it has a viable if manual workaround for those few people who want it. At the end of the day feature implementation is the cross section of a popularity contest and engineering sprint. If only I could code the features I find interesting on a technical level, not just the ones that actually sell subscriptions and earn me a living :(

Sorry for another long post and for being probably less coherent than usual. I'm just back from a transatlantic trip, meaning I got 2 hours of bad sleep on an airplane seat the size of a matchbox in the last 24 hours or so.

bom
Another in-depth post. Thank you for taking the time to explain in detail. A good insight for a backup novice like me. I did not know many of the pitfalls you highlighted. And I could not even spell r e s t o r e after such a flight 8) - respect!

The manual workaround you describe is the one I use at the minute. Unfortunately that means it is always me who has to do it because my customers are not capable of that. Plus the manual process consumes quite some time. The "start over" is exactly what I would need. But I fully understand that you have to keep the demand in mind for feature implementation. And I agree on the point of overengineering and user failure in complex settings.

For the directory structure, would that delete all files in document root (mostly htdocs probably) or would the script look up the config to see if files outside document root were included in the backup?

The non-table entities are not an issue for my use case as far as I can tell. Your announcement sounds promising - happy days.

PS: Why did you write that GDPR compliance solution from scratch? I thought in the end you used that one from Richeyweb although it does not (or did not) exclude the authentication cookie.

nicholas
Akeeba Staff
Manager
I've been doing this backup and restoration thing for so long I can explain all the pitfalls in my sleep - quite literally, as you witnessed :D

Now, regarding removing the folders, it will not be consulting the backup. The problem is that we have to delete the existing folders before we extract the backup archive (otherwise we'd be removing the extracted files which is all sorts of wrong). Since we cannot consult the backup we have to make assumptions. The only reasonable assumption which can be made is that you want me to delete all files and folders under your site's root (where kickstart.php lives) before extracting the backup archive. It's a really dumb solution.

Moreover, there is an issue about security here. If someone can guess the name of Kickstart's file they can remove your entire site before you're done restoring. This is in the same security risk category as being able to download arbitrary files from external sources (download from URL and download from S3). Therefore, this is going to be a Kickstart Professional feature only. Kickstart Professional currently won't work unless you rename it to something which doesn't include the word "kickstart" in its name, making guessing its name more difficult. One of the additional features I plan on including is being able to password protect it as well.

However, now that I'm typing this I am realizing that maybe it's not what your clients want. If your clients only know how to use the integrated restoration (from Akeeba Backup itself) we cannot implement this feature. The reason is pretty simple: deleting all files and folders would also delete the backup archives and restore.php, making restoration impossible (you have neither the backup nor the extraction script anymore!). So your clients would need to upload Kickstart and the backup archive by FTP, in which case they might just as well delete the folders themselves. Am I correct in my assumptions about your intent?

Regarding the GDPR solution, Michael's solution is only for cookies (the upcoming ePrivacy directive), not GDPR data compliance. I tried other solutions for GDPR data compliance but I found them lacking for the specific purpose I needed them. They are good in dealing with core Joomla! data but they could not deal with tickets and subscriptions. So I wrote my own.

Regarding Michael's ePrivacy extension for cookies, yes, it does not delete the authentication cookie because I've modified it not to do so. It should NOT delete it. The cookie laws (and GDPR) allow you to implicitly accept session / authentication cookies without which the site would not work. They are called "mandatory cookies". These are the cookies you cannot reject. The only other cookies used by our site are those set by Google Analytics. So these are the only cookies you can accept or reject. Note that this is not a GDPR requirement; we use session anonymization in Google Analytics (the last quarter of the IP address is removed), therefore the cookies are not personally identifiable information and can be implicitly accepted. The only reason we let you decline them is the upcoming ePrivacy directive which will make it a requirement to be able to reject all cookies which are not strictly required for the operation of the site, i.e. anything but session cookies. It's future-proofing.
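
As a side note on what "the last quarter of the IP address is removed" means for an IPv4 address, here is a tiny Python sketch. It only illustrates the masking idea and is not Google Analytics code; the function name is made up and the address comes from a documentation-only range.

```python
import ipaddress


def anonymize_ipv4(address):
    """Zero the last octet (the last quarter) of an IPv4 address."""
    ip = ipaddress.IPv4Address(address)
    # Keep the first 24 bits, clear the last 8.
    return str(ipaddress.IPv4Address(int(ip) & 0xFFFFFF00))


print(anonymize_ipv4("203.0.113.77"))  # 203.0.113.0
```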

bom
Thanks for the extensive points.

"The only reasonable assumption which can be made is that you want me to delete all files and folders under your site's root (where kickstart.php lives) before extracting the backup archive. It's a really dumb solution."
==> That is exactly what I do right now manually before restoring (except for the backup folder).

"deleting all files and folders would also delete the backup archives and restore.php"
==> that specified backup folder should indeed not be deleted. I don't know if that is possible. Should the backups maybe be stored outside document root? But surely ALL tables in the db can be deleted.

"So your clients would need to upload Kickstart and the backup archive by FTP, in which case they might just as well delete the folders themselves. Am I correct in my assumptions about your intent?"
==> correct, my clients have no concept of FTP, they rely fully on the Akeeba backend functions.

Thanks for the in-depth explanation of the GDPR extensions.

nicholas
Akeeba Staff
Manager
OK, thank you for confirming my assumptions. Including this feature in the integrated restoration will come with a caveat: I can tell restore.php to not delete certain folders or files. I think I should be able to blacklist restore.php itself, the restoration.php file which holds the keys to the browser-server communication and the folder which holds the backup archive you are restoring (not just that one backup). I cannot tell it to not delete any backups because that requires looping through all backup records.

Why, you may ask, not just loop through all backup profiles and exclude their output directories? Excellent question! It's because the output directory of each backup profile is mutable. The backup profile could be created with output directory <site root>/administrator/components/com_akeeba/backup (default), take a few backups, then move it to an output directory <site root>/backups, take some more backups and finally change it to <site root parent folder>/backups and take another backup. The backup profile has backup archives in three different folders but its output directory is currently set up to a folder above the site's root. If I loop just the backup profiles to collect directories I should blacklist I will end up deleting your backups anyway. Each backup record stores the full path to the backup archive exactly for this reason: because the output directory can change.
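
The difference can be shown with a short Python sketch. The data structures and field names (`output_directory`, `archive_path`) and all paths are hypothetical and only mirror the scenario described above; they are not Akeeba Backup's actual schema.

```python
import posixpath


def dirs_from_profiles(profiles):
    """Folders that would be protected by looking only at the current
    output directory of each backup profile."""
    return {p["output_directory"] for p in profiles}


def dirs_from_records(records):
    """Folders that actually contain archives, taken from the full path
    stored with each backup record."""
    return {posixpath.dirname(r["archive_path"]) for r in records}


# One profile whose output directory was changed twice over time.
profiles = [{"id": 1, "output_directory": "/home/example/backups"}]
records = [
    {"profile_id": 1, "archive_path":
        "/home/example/public_html/administrator/components/com_akeeba/backup/site-1.jpa"},
    {"profile_id": 1, "archive_path": "/home/example/public_html/backups/site-2.jpa"},
    {"profile_id": 1, "archive_path": "/home/example/backups/site-3.jpa"},
]

# Folders holding backups that a profile-based blacklist would miss.
unprotected = sorted(dirs_from_records(records) - dirs_from_profiles(profiles))
print(unprotected)
# ['/home/example/public_html/administrator/components/com_akeeba/backup',
#  '/home/example/public_html/backups']
```

Both missed folders are inside the site root, so a "delete everything" pass guided only by the profiles would remove the archives stored in them.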

Why not loop backup records, then? Because that would take a disproportionate amount of time and could lead to a timeout error before the restoration begins. Also, depending on how many backups you have and where they are stored, it might cause the list of blacklisted folders to grow too big to be practical. The way the blacklist (do not delete) code is implemented requires going through that list for each folder and file we are about to remove. Too many items on the list could cause a timeout there.

So I have to make another reasonable assumption here: if you will be using the "delete everything before extraction" feature for integrated restoration you are probably a user with a simple setup: one or a few backup profiles, all with the same backup output directory, and you'd like to keep these backups on your server just in case.

Based on what you described I think I am on the right track with this assumption.

bom
Correct, my site structure is very simple and does not get changed after I set it up. There is the Joomla standard installation and 2 backup profiles (full and db) saving to one folder at doc root with htaccess protection, which is the folder that needs to be blacklisted from deletion. Sounds good, thanks.

nicholas
Akeeba Staff
Manager
OK! You will probably see these features in the next feature update around September.

bom
Brilliant! Thank you.

System Task
system
This ticket has been automatically closed. All tickets which have been inactive for a long time are automatically closed. If you believe that this ticket was closed in error, please contact us.
