#41117 Backup or restore case insensitive files

Posted in ‘Akeeba Backup for Joomla! 4 & 5’

Environment Information

Joomla! version: 5
PHP version: 8.2.23
Akeeba Backup version: 9.9.2

Latest post by nicholas on Monday, 16 September 2024 08:59 CDT

arocholl

I have a problem restoring a Joomla backup: files with mixed-case names are not restored with their original character case. My source (backup) server is a VPS running AlmaLinux 8.10 and my restore server is a VM running AlmaLinux 8.10, both using Joomla 5 and Akeeba Backup Pro.

For instance, the file images/pages/rf-explorer-pro/RFEProHand.png is being restored in lowercase as images/pages/rf-explorer-pro/rfeprohand.png

In the attached backup log I can see that images/pages/rf-explorer-pro/RFEProHand.png seems to be handled correctly, but after the restore it becomes lowercase. I see no option to modify this behavior, and I see no log of the restore process that would explain why this is happening.

nicholas
Akeeba Staff
Manager

The log file never made it, but it's mostly irrelevant anyway.

File names are always stored case-sensitive, in the same character case reported by your filesystem. This is also what's logged, so we already know the correct file name was stored (RFEProHand.png).

The restoration script always asks PHP to open the file with the same character case it was stored with in the backup, which generally means that the file name's character case is preserved. However, whether that's the case depends entirely on your Operating System, its filesystem driver, and the configuration of both the filesystem and its driver.

You may think that Linux is always case-sensitive, so what is this guy talking about? That's not entirely accurate. That common knowledge only applies to the default options of the ext4 and btrfs filesystems used by default in the vast majority of Linux distributions. These are neither the only filesystems, nor are the default options the only possible options. It is perfectly possible to configure your Linux server in such a way that you end up in a situation like the one you described.

I've been using Linux for 25 years, and I've been doing cross-filesystem and cross-OS restorations since 2004, when I wrote the first set of backup scripts. These evolved into JoomlaPack in 2006, which ultimately became Akeeba Backup in 2010. I have a very good understanding of the pitfalls one may encounter. So, let's talk about Linux in particular. Your distro is a RHEL-compatible rebuild. That's great; I cut my sysadmin teeth on RHEL and derivatives – I was using Mandrake Linux, a Red Hat-based distro, as my daily driver from 2004 to 2010. Ah, the good old times! But I digress.

If the filesystem is on a Samba network share (smb3 or cifs filesystem type) which has been mounted with the nocase mount option, or on an NTFS volume (ntfs-3g, lowntfs-3g or ntfs filesystem type) mounted with the ignore_case mount option, all mixed-case or uppercase filenames will be squashed to lowercase.

It's possible to have a filesystem in Linux which is case-insensitive but does NOT squash filenames to all lowercase (that's the default behavior of cifs and ntfs-3g in the absence of the mount attributes I mentioned above). This causes problems as we will see below, because existing files or backed up files with the same name but different file case (e.g. abc versus ABC) will be treated as the same file.

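Moreover, it's possible to have a traditional, case-sensitive filesystem which acts as case-insensitive, e.g. ext4 with the casefold feature. Strictly speaking this is a filesystem feature which is applied per directory through the +F flag, not a mount option. What does that feature do? As per the ext4 man page: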

This ext4 feature provides file system level character encoding support for directories with the casefold (+F) flag enabled.  This feature is name-preserving on the disk, but it allows applications to lookup for a file in the file system using an encoding equivalent version of the file name.

That is to say, this feature will let the filesystem store a file with the file case you specified (e.g. abc) but will allow applications to check if the file exists, or open the file, using any variation of its file case (e.g. ABC, Abc, aBc, etc). Essentially, ext4 now works like NTFS in that you have file case, but the filesystem is effectively case-insensitive. Ouch!
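A quick way to see how a given directory behaves is to create a mixed-case file and then look it up by its lowercase name. The following is only a minimal sketch: the function name probe_case_sensitivity and the CaseProbe file prefix are made up for illustration, and the probe cannot tell a case-insensitive filesystem apart from a case-squashing one.

```shell
# Hypothetical helper: reports how name lookups behave inside a directory.
# Usage: probe_case_sensitivity /path/to/site/root
probe_case_sensitivity() {
  dir=${1:-.}
  # Create a uniquely named probe file whose name contains uppercase letters.
  probe=$(mktemp "$dir/CaseProbe.XXXXXX") || return 2
  base=${probe##*/}
  # Build the all-lowercase spelling of the same file name.
  lower=$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]')
  # If the lowercase spelling resolves, lookups ignore character case here.
  if [ -e "$dir/$lower" ]; then
    echo "case-insensitive"
  else
    echo "case-sensitive"
  fi
  rm -f "$probe"
}
```

Run it against the folder you are restoring into, not just the filesystem root, since a flag like casefold applies per directory.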

I think you can already see where this is going.

When everything gets squashed to lowercase, the problem is immediately obvious. While we ask PHP to open a filename with uppercase characters (e.g. Abc), the filesystem will only create and open a file with lowercase characters (e.g. abc).

The case-insensitive filesystems (either by design, or acting as such because of a flag) cause more insidious problems which may be hard to understand. They all boil down to the fact that the filesystem treats any combination of filename character case as the same file: abc, ABC, Abc, aBc etc. are all effectively the same file.

If you are restoring on an empty server you will not notice a problem in most cases since only ONE file with the same name in any character case will exist in the same folder. This is why Joomla! works, for example, on servers running on Windows and macOS which famously use case-insensitive filesystems (NTFS, FAT, exFAT, HFS+ case-insensitive, or APFS case-insensitive).

However, things get… complicated on case-insensitive filesystems if you are restoring on a server which already has files with the same name as backed up files in a different character case, OR if you are restoring a backup taken on a case-sensitive filesystem where files with the same name in different character cases exist in the same folder.

If a file named abc already exists but a file ABC is stored in the backup: The restoration script asks PHP to open the file ABC. However, the filesystem considers this to be the same as the existing abc, therefore the restored file will be named abc (and have the contents of the file named ABC in the backup!).

If you have backed up two files, abc and ABC: The restoration script first asks PHP to open the file abc. This file is created and written to. The restoration script asks PHP to open the file ABC. However, the filesystem considers this to be the same as the now-existing file abc, therefore the only restored file will be named abc (and have the contents of the file named ABC in the backup!).
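The two scenarios above can be modelled without a case-insensitive filesystem at hand. This is a rough sketch, not how the restoration script works: simulate_ci_restore is a made-up name, and it simply mimics a case-preserving but case-insensitive filesystem by keying every file on its lowercase name. Input is one "name<TAB>contents" pair per line, in the order the files are written.

```shell
# Hypothetical model of writes on a case-insensitive, case-preserving filesystem.
# Reads "name<TAB>contents" lines from stdin; prints the files that end up on disk.
simulate_ci_restore() {
  awk -F '\t' '
    {
      key = tolower($1)            # the filesystem compares names ignoring case
      if (!(key in name)) {        # first spelling written decides the stored name
        name[key] = $1
        order[++n] = key
      }
      body[key] = $2               # every later write replaces the contents
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s\t%s\n", name[order[i]], body[order[i]]
    }'
}
```

Feeding it abc followed by ABC, for example with printf 'abc\tcontents of abc\nABC\tcontents of ABC\n' | simulate_ci_restore, leaves a single file named abc holding the contents of ABC.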

Kindly note that none of that is something that can be addressed in our code. This is something that is entirely controlled by your Operating System, its filesystem drivers, and how you have configured your filesystems. In fact, this is something that PHP itself cannot control or provide any information about because PHP itself communicates with the Operating System using the standard C library (glibc, the GNU C standard library, in your distro).

The correct way to address these problems is the following:

  • NEVER attempt to restore to a filesystem which does not preserve file case at all (e.g. cifs with the nocase option, very old FAT12/FAT16 filesystems etc). This will always break your restored site. There is no solution to the problems caused by case squashing. By definition, case squashing causes loss of information: the information of filename casing is irreversibly lost!
  • If you have multiple files with the same name in different file cases, exclude the files with the “wrong” file case from your backup.
  • Delete existing files and folders before restoring a backup on a case-insensitive (or effectively case-insensitive such as ext4 with casefold) filesystem.
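To find candidates for the second point before taking the backup, you can list paths on the source site which differ only in character case. This is a sketch under assumptions: case_collisions is a hypothetical helper name, it expects one path per line on standard input (so paths containing newlines are not handled), and the case folding done by awk may only cover ASCII letters depending on your awk and locale.

```shell
# Hypothetical helper: prints every path whose lowercase form clashes with
# another path in the input. Typical use: find . -print | case_collisions
case_collisions() {
  awk '
    {
      key = tolower($0)
      if (key in first) {
        # Print the first spelling once, then each later conflicting spelling.
        if (!(key in reported)) { print first[key]; reported[key] = 1 }
        print $0
      } else {
        first[key] = $0
      }
    }'
}
```

Each group of lines it prints is a set of files that would collapse into one file on a case-insensitive filesystem. No output means no collisions were found.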

I can also tell you why we don't try to catch those problems.

First of all, we cannot simply check if a file with a specific file case exists, since case-insensitive and case-squashing filesystems will return true if a file with the “wrong” file case exists, for the reasons I explained above. This means that the only possible way to detect that would be to ask PHP to create a directory listing after writing each file and check if the file in the correct case exists. However, this is incredibly slow – it will slow down the archive extraction 50 times or more, making it completely impractical.

Moreover, issues like that are extremely rare. Yours is only the second time in over ten years I have seen this.

Finally, even if we catch those issues, all we can do is report that there might be a problem; we cannot do anything about it. Reporting, but not being able to fix, a ridiculously rare problem in a way that breaks the extraction for pretty much everyone is not a good idea. I'd rather have the one person every ten to fifteen years ask me directly instead.
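For a one-off manual check after a restoration, the directory listing approach described above is perfectly usable. Here is a minimal sketch of it in shell; exists_exact_case is a made-up name, and this is not something the extraction script does.

```shell
# Hypothetical helper: succeeds only if the directory listing contains an entry
# spelled exactly like the given name. A plain existence test is not enough,
# because a case-insensitive filesystem answers "yes" for any spelling.
# Usage: exists_exact_case images/pages/rf-explorer-pro RFEProHand.png
exists_exact_case() {
  dir=$1
  wanted=$2
  for entry in "$dir"/* "$dir"/.*; do
    # Skip unmatched glob patterns, which the shell leaves in place literally.
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    # Compare the stored spelling, as reported by the listing, byte for byte.
    if [ "${entry##*/}" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}
```

Because it walks the whole directory for every file it checks, it also illustrates why doing this for each extracted file would be so slow.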

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

arocholl

The source system has its root filesystem mounted as ext4, and I confirmed it to be case-sensitive:

[ ~]$ mount | grep ' / '
/dev/xx on / type ext4 (rw,relatime,lazytime,discard,data=ordered,balloon_ino=12,jqfmt=vfsv1,usrjquota=aquota.user,grpjquota=aquota.group)

The target system has its root filesystem mounted as xfs, and I confirmed it to be case-sensitive:

[~]# mount | grep ' / '
/dev/mapper/almalinux-root on / type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)

I also manually created a couple of files on each system to confirm that case is preserved and that Test.txt and test.txt can coexist in the same folder with different contents.

So in this case I rule out the mounted filesystem as the cause.

The process I followed was to create the JPA file on the source system, download it to an intermediate Windows machine with NTFS, then upload it directly to the target system. It was never placed in a mounted CIFS or Samba folder. As I understand it, the intermediate NTFS storage should not play a role...

Is there any restore log or other way to check on what is going on at the restore time to find the root cause?

 

nicholas
Akeeba Staff
Manager

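The mount options you have on XFS shouldn't make it case-insensitive. The only way I know of that XFS could work like that is the version=ci naming option, which is chosen when the filesystem is created (mkfs.xfs -n version=ci) rather than at mount time. That doesn't seem to be the case here, since your manual check shows that Test.txt and test.txt can coexist.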

You are right that where you store the backup archive doesn't make a difference. Its binary contents do not change.

We can try creating a debug log for the extraction script. Edit the kickstart.php file and find these lines:

// Uncomment the following line to enable Kickstart's debug mode
//define('KSDEBUG', 1);

change them so that they read:

// Uncomment the following line to enable Kickstart's debug mode
define('KSDEBUG', 1);

It will now create a file called debug.txt which tells you what Kickstart is doing when extracting files.
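Once you have the debug log, you can check which spellings of the affected file name appear in it. The exact format of debug.txt is not described in this thread, so the sketch below treats it as plain text and only searches for the name. The helper name name_spellings is made up for illustration.

```shell
# Hypothetical helper: prints each distinct spelling (character case) of a
# file name found in a plain-text log, one per line.
# Usage: name_spellings RFEProHand.png debug.txt
name_spellings() {
  wanted=$1
  log=$2
  # -i ignores case, -o prints only the matched text, -F treats the name as a
  # literal string so the dot is not a regex wildcard.
  grep -i -o -F -- "$wanted" "$log" | LC_ALL=C sort -u
}
```

If only RFEProHand.png shows up in the log while the file on disk is rfeprohand.png, the name was changed after the extraction script handed it over, which points at the server rather than the archive.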
