#29984 ZIP Archiver split archives bug

Posted in ‘Akeeba Backup for Joomla! 4 & 5’


Latest post on Tuesday, 28 August 2018 17:17 CDT

drlukacs
Backing up with a "Part size for split archives" setting that requires splitting the ZIP file into more than one part causes Akeeba Backup to enter an infinite loop at the end of the backup process. The following debug messages repeat:

DEBUG   |180722 00:12:01|PHP WARNING (not an error; you can ignore) on line 181 in file <root>/administrator/components/com_akeeba/BackupEngine/Archiver/Zip.php:
DEBUG   |180722 00:12:01|feof() expects parameter 1 to be resource, null given
DEBUG   |180722 00:12:01|PHP WARNING (not an error; you can ignore) on line 195 in file <root>/administrator/components/com_akeeba/BackupEngine/Archiver/Zip.php:


(The logs filled nearly 3 GB of disk space before I stopped it.)

I have tested this with "Part size for split archives" set to 5 MB, 10 MB and 20 MB.

nicholas
Akeeba Staff
Manager
This has nothing to do with split archives. The error you are receiving cannot happen unless PHP is broken. The variable which is null is a file pointer which is assigned a few lines above. PHP can either return a file pointer (resource) or boolean false, not null. I am adding a further check in the backup engine to catch this and prevent the infinite loop but I am not sure if the ZIP file will have been successfully finalized at this point.

Please use the latest development release (rev747413EE) and retry backing up. Let me know if the generated ZIP file can be extracted by Kickstart, Akeeba eXtract Wizard and PKZIP or WinRAR (please do not try other ZIP extraction software; they are using the decompression library InfoZIP which does not conform to the ZIP specification for split ZIP files, therefore they always fail to extract split ZIPs).

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

drlukacs
"This has nothing to do with split archives."

I am skeptical about this. The ZIP archiver works perfectly well when the part size is large enough for the backup to complete without splitting the file.

This problem occurs only when the archive needs to be split.

drlukacs
I can confirm that with the latest development release, the files are created without an issue, and they can be extracted without difficulty with Kickstart.

When do you expect this to become a stable release?

nicholas
Akeeba Staff
Manager
I am very sure that the problem has nothing to do with split archives since I saw exactly where your problem occurs in the code. It's in the ZIP finalization which is called for all archives and the problem spot is before we make any checks for split ZIPs. Moreover I see that the problem is that PHP returns null when we ask it to open the temporary file holding the ZIP Central Directory records which is common for single and multi part ZIP files.

I figured that for some reason this happens on your server after we're mostly done with finalization and have removed that file but not finished up finalizing yet. Normally this value should be boolean false, not null. Something's wrong with your PHP version, something we have never seen anywhere else before and cannot reproduce anywhere else either.

Since this is an extremely low priority fix (it fixes an impossibility which affects exactly one client) it does not warrant a new release anytime soon. Moreover we are about to enter August which is the period where we wind down operations so we can spend some time with our families ;) We plan a new release for Akeeba Backup around late September, with new features. Until then you can use the development release.


drlukacs
"Moreover I see that the problem is that PHP returns null when we ask it to open the temporary file holding the ZIP Central Directory records which is common for single and multi part ZIP files."

Why, then, does this work fine for single-part ZIP archives but not for multi-part ones? Is it possible that the file name generation is affected by the passage of time?

I also noticed (in the JPA format as well) that the [TIME] value used to name some of the files differs from the one used for the others.

nicholas
Akeeba Staff
Manager
A ZIP file consists of the following areas:
  • An indicator whether it's a split archive at the beginning of the file
  • File data records
  • ZIP Central Directory
  • ZIP stats and end of archive marker


The central directory contains some stats and essentially the headers of the files which are already present in the archive. It's used to quickly scan the contents of the archive and produce a ZIP file listing without having to read the entire archive. The trick is that the ZIP stats at the end of the file contain a pointer to the Central Directory (CD) so unzip applications can locate the CD fast. This was very important in the days of floppies because file seek time was in the order of hundreds of milliseconds. It's quite irrelevant now that we have SSDs and large in-memory file caches but the ZIP format is what it is and we have to abide by it (hence our JPA format: it doesn't have a central directory and follows other simplifications which can be afforded by modern computer architectures).
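To make the layout described above concrete, here is a minimal Python sketch (not Akeeba Backup's code). It builds a small single-part ZIP in memory with the standard `zipfile` module, then reads the end-of-archive record to find where the Central Directory starts. The helper names `build_sample_zip` and `read_end_record` are made up for this illustration, and the sketch ignores ZIP64 and archive comments.

```python
import io
import struct
import zipfile

# End of Central Directory record: signature, disk numbers, entry counts,
# Central Directory size and offset, comment length (22 bytes in total).
EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_FORMAT = "<4s4H2LH"
CD_ENTRY_SIGNATURE = b"PK\x01\x02"


def build_sample_zip() -> bytes:
    """Create a tiny two-file ZIP archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("a.txt", "hello")
        archive.writestr("b.txt", "world")
    return buffer.getvalue()


def read_end_record(data: bytes) -> dict:
    """Find the end-of-archive record and return the fields pointing at the Central Directory."""
    position = data.rfind(EOCD_SIGNATURE)
    if position < 0:
        raise ValueError("no end-of-archive record found")
    (_sig, _disk, _cd_disk, _entries_here, total_entries,
     cd_size, cd_offset, _comment_len) = struct.unpack_from(EOCD_FORMAT, data, position)
    return {"total_entries": total_entries, "cd_size": cd_size, "cd_offset": cd_offset}


if __name__ == "__main__":
    data = build_sample_zip()
    record = read_end_record(data)
    print(record)
    # The offset stored in the end record lands on the first Central Directory entry.
    start = record["cd_offset"]
    print(data[start:start + 4] == CD_ENTRY_SIGNATURE)
```

An unzip tool only needs to read the last few bytes of the file to learn where the listing lives, which is the seek-saving trick described above.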

Because the CD is duplicated data at the end of the file we only add the file data records during backup to the archive (after the split file indicator!) and write the ZIP Central Directory records in a temporary file. At the end of the backup process we need to "glue together" the already written file data with the central directory and finally append the ZIP stats and end of archive marker. This is called the finalization process.

When you have a single part archive this happens as the final step of the backup process and it's pretty straightforward. You just do a quick buffered copy from one file to the other.
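As a rough illustration of that buffered copy, here is a Python sketch. It is not the actual engine code (which is PHP); the function name `append_file`, the file names and the chunk size are assumptions made for the example. The loop ends on an empty read, and a file that cannot be opened raises an error before the loop starts, so it cannot spin on an invalid handle.

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # copy buffer size; an arbitrary choice for this sketch


def append_file(archive_path: str, cd_path: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Append the temporary Central Directory file to the archive; return the bytes copied."""
    copied = 0
    # open() raises OSError if either file cannot be opened, so the loop
    # below never runs with an invalid handle.
    with open(cd_path, "rb") as source, open(archive_path, "ab") as target:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            target.write(chunk)
            copied += len(chunk)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        archive = os.path.join(folder, "backup.zip")
        central_directory = os.path.join(folder, "backup.cd.tmp")
        with open(archive, "wb") as handle:
            handle.write(b"FILE-DATA")
        with open(central_directory, "wb") as handle:
            handle.write(b"CENTRAL-DIRECTORY")
        print(append_file(archive, central_directory))
```

A real finalization would then append the ZIP stats and end of archive marker, which this sketch leaves out.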

When it's a multipart archive you have to take into account two things. First, the ZIP format says that a Central Directory record cannot span two volumes (parts). Therefore we have to see if we have enough space in our last part. If not, we need to create a new part.
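The space check can be sketched in Python as follows. This is only an illustration under assumptions: the function name `assign_records_to_parts` is hypothetical, and checking the free space record by record is my reading of the rule, not a description of how the backup engine actually does it.

```python
from typing import List


def assign_records_to_parts(record_sizes: List[int], part_size: int,
                            used_in_last_part: int) -> List[int]:
    """Return, for each record, the index of the part it goes to (0 = the current last part)."""
    assignments = []
    part_index = 0
    free = part_size - used_in_last_part
    for size in record_sizes:
        if size > part_size:
            raise ValueError("record is larger than a whole part")
        if size > free:
            # Not enough room left: start a new part rather than split the record.
            part_index += 1
            free = part_size
        assignments.append(part_index)
        free -= size
    return assignments


if __name__ == "__main__":
    # Three 40-byte records, 100-byte parts, 70 bytes already used in the last part.
    print(assign_records_to_parts([40, 40, 40], part_size=100, used_in_last_part=70))
    # prints [1, 1, 2]
```

In the example the first record does not fit in the 30 bytes left over, so a new part is created for it, and the third record forces yet another part.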

Here comes the second consideration, post-processing. If you have chosen to post-process each archive part immediately, the creation of a new part in the previous step triggers an immediate upload of that part. At this point there may be a step break, in which case the finalization resumes on the next page load.

I can't see why that would cause a NULL return since we are, indeed, closing any open file pointers at the end of each backup step. Something weird happens there with PHP returning null when it's trying to reopen the file? I am not sure since I cannot reproduce this and nobody else has reported it. Blind debugging is no fun and it's about as accurate as trying to hit a fly's wing at a hundred yards with a slingshot while blindfolded and spun around on a merry-go-round so there's that...

The fact that the archive DOES extract tells me that the NULL is returned after we are done finalizing which is truly weird but, hey, at least the code I've now put in place is catching that. I'll accept this as a viable solution.

Regarding the time format, please note that there are two variables, [TIME] and [TIME_TZ]. New backup profiles use the latter which includes the timezone (please read the documentation about it) whereas your old profiles would use the old default, [TIME], which prints out the UTC time stamp. People were confused with UTC, especially around the summer time switch-over twice a year, hence the introduction of TIME_TZ.
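The difference between a UTC time stamp and one that carries the timezone can be shown with a short Python sketch. The exact output formats used here for [DATE], [TIME] and [TIME_TZ] are assumptions for illustration only (please check the documentation for the real ones), and `expand_name` is a hypothetical helper, not part of Akeeba Backup.

```python
from datetime import datetime, timedelta, timezone


def expand_name(template: str, moment: datetime) -> str:
    """Expand the naming variables for a timezone-aware moment (assumed formats)."""
    utc = moment.astimezone(timezone.utc)
    values = {
        "[DATE]": utc.strftime("%Y%m%d"),        # assumed: date in UTC
        "[TIME]": utc.strftime("%H%M"),          # UTC time stamp
        "[TIME_TZ]": moment.strftime("%H%M%z"),  # local time plus UTC offset
    }
    for variable, value in values.items():
        template = template.replace(variable, value)
    return template


if __name__ == "__main__":
    # 14:01 local time in a UTC+3 zone, i.e. 11:01 UTC.
    local_zone = timezone(timedelta(hours=3))
    moment = datetime(2018, 7, 24, 14, 1, tzinfo=local_zone)
    print(expand_name("site-[DATE]-[TIME]", moment))     # site-20180724-1101
    print(expand_name("site-[DATE]-[TIME_TZ]", moment))  # site-20180724-1401+0300
```

The same instant produces two different-looking names, which is the kind of confusion the timezone-aware variable is meant to avoid.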


drlukacs
Thank you for your detailed explanation about ZIP archives.

In order to isolate the issue, I was using no post-processing at all. The issue existed even without post-processing.

With respect to the time, here is what I noticed in split archives: the main archive (ZIP or JPA) and the first part (.?01) get a different [TIME] value from all the other parts. Their [TIME] value is a few minutes later than that of the other parts.

So, if the naming instruction is site-[DATE]-[TIME], it creates filenames like these:

site-20180724-1101.z02
site-20180724-1101.z03
...
site-20180724-1115.z01
site-20180724-1115.zip

nicholas
Akeeba Staff
Manager
I cannot reproduce that. I made a very slow ZIP backup which creates a new part every 45-60 seconds but all files had the same base name. This makes perfect sense since the base name is calculated once, at the start of the backup, and then we only change the file extension for each part.

It also makes no sense that part #1 has a timestamp of 11:15 and part #2 has a timestamp of 11:01. That would require the backup executing backwards in time. Remember that the part files are created in the order .z01, .z02, ..., .zip.

What I believe happens is that you are looking at part files from two different backups.
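The naming scheme described in this reply can be sketched in a few lines of Python. The helper `part_file_names` is hypothetical and only illustrates the idea that the base name is fixed once and only the extension changes; it is not taken from the backup engine.

```python
from typing import List


def part_file_names(base_name: str, total_parts: int) -> List[str]:
    """List the part files of one backup in creation order: .z01, .z02, ..., .zip."""
    if total_parts < 1:
        raise ValueError("a backup has at least one part")
    # The base name is computed once; only the extension differs per part.
    names = [f"{base_name}.z{number:02d}" for number in range(1, total_parts)]
    names.append(f"{base_name}.zip")  # the last part gets the .zip extension
    return names


if __name__ == "__main__":
    print(part_file_names("site-20180724-1101", 3))
    # ['site-20180724-1101.z01', 'site-20180724-1101.z02', 'site-20180724-1101.zip']
```

Under this scheme, every part of a single backup shares one base name, so two different base names in the same folder point to two different backups.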


System Task
system
This ticket has been automatically closed. All tickets which have been inactive for a long time are automatically closed. If you believe that this ticket was closed in error, please contact us.
