Yes, if there's a log file growing in size it will definitely make all offset calculations go awry. The next file you try to read will have the wrong offset which lands you in the middle of a random file. That's probably what happened. Luckily it was another log file so you filed a ticket and I could tell you where the problem might lie.
> 1/ is it "normal" that one of the jxx chunck was a plain text file (part of the 260 Mb cron.log file) rather than a binary compressed one as the other jxx chunks ?
You are looking at this wrong. What you are seeing is part of the File Data Block for that file. Since that file is immense (260 Mb) it's stored uncompressed. So what you see is not plain text, it's the exact binary representation of that file's uncompressed data. Why am I saying they are different? “Plain text” implies that the application modifies line endings to be OS-specific (CRLF on Windows, CR on classic Mac OS 9 and earlier, LF everywhere else). We don't do that. We keep the file content binary identical to the source.
So, the correct question is “is it normal that a big file is stored uncompressed?” to which I will reply yes, of course, and it's documented at https://www.akeeba.com/documentation/akeeba-solo/archiver-engines.html#archiver-jpa. It's the Big File Threshold, which has the following documentation block:
Files over this size will be stored in the archive file uncompressed. Do note that in order for a file to be compressed, the application has to load it in its entirety to memory, compress it and then write it to disk. As a rule of thumb, you need to have free memory equal to 1.8 times the size of the file to compress, e.g. 18Mb for a 10Mb file. Furthermore, compression is resource intensive and will increase the time to produce a backup. If this value is too high, you might run into timeout errors. On most servers a value between 1 and 5Mb works best.
So, you now get the reason why it's stored uncompressed :)
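To make the rule of thumb concrete, here is a minimal Python sketch of the decision described in the documentation block. It is only an illustration, not the actual archiver code: the function names and the 5 MiB threshold value are hypothetical, and only the 1.8 factor comes from the documentation.

```python
MB = 1024 * 1024
MEMORY_FACTOR = 1.8  # rule of thumb quoted in the documentation block


def estimated_memory_bytes(file_size: int) -> int:
    """Approximate free memory needed to compress a file of this size."""
    return round(file_size * MEMORY_FACTOR)


def should_compress(file_size: int, threshold: int) -> bool:
    """Files over the threshold are stored uncompressed."""
    return file_size <= threshold


threshold = 5 * MB  # hypothetical Big File Threshold setting

for size in (10 * MB, 260 * MB):
    action = "compress" if should_compress(size, threshold) else "store as-is"
    needed = estimated_memory_bytes(size) / MB
    print(f"{size // MB} MB file: {action}, ~{needed:.0f} MB needed to compress")
```

With a threshold in the recommended 1 to 5 Mb range, a 260 Mb log file is far above it, so it goes into the archive as-is.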
> 2/ even if doing backup at night during a time frame were the human activity is the lowest, it could happen that someone uploading a pretty big file (video...) while we are bakcuping. Also Moodle as an internal cron triggered every minutes by the system cron. This cron is doing a lot of tasks including some which can involve create, delete or modify files in size or content during the backup time frame. Would such situation generate a broken jpa backup ? if so, how could we be sure that a given backup is reliable ? ie restorable ?
Files being created or deleted after the directory listing is produced are not a problem. They will just not be backed up.
Files becoming larger are typically not a problem because it's very unlikely that the file will grow while it's being read to be backed up. The only problem is when you have a very big file which takes several seconds or minutes to read. We DO try not to read past the file size the filesystem reported when we started backing up the file, but this may still not work properly for various esoteric reasons that have to do with filesystems and operating systems. The biggest problem is large files being truncated or deleted while they are being backed up. In that case we have no data to read and the archive WILL be broken. However, this is detected and you are given a warning.
In practice, only REALLY large files changing size are a problem and these are almost always log files. In fact, the only three use cases I know of in the PHP world where a very big file can change size over time are 1. log files; 2. backup archives; 3. chunked uploads (Moodle does not use that). Therefore, in practice, the only thing you need to take care of is to exclude log and temporary files. This will minimise the chance that the backup will be broken.
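The idea of not reading past the size reported at the start can be sketched like this in Python. This is a simplified illustration under my own assumptions, not the real backup engine code; `copy_bounded` and its parameters are hypothetical names.

```python
import os


def copy_bounded(path, dst, chunk_size=1024 * 1024):
    """Copy at most as many bytes as the file had when the copy started.

    Returns (expected, written). If written < expected, the file was
    truncated or removed while being read and the entry is incomplete.
    """
    with open(path, "rb") as src:
        expected = os.fstat(src.fileno()).st_size  # size recorded up front
        remaining = expected
        while remaining > 0:
            # Never ask for more than what is left of the recorded size,
            # so data appended in the meantime is ignored.
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                break  # no more data: the file shrank underneath us
            dst.write(chunk)
            remaining -= len(chunk)
    return expected, expected - remaining
```

A caller would compare the two returned numbers and raise a warning when they differ, which corresponds to the truncated or deleted file case described above.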
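As a generic illustration of what excluding log and temporary files amounts to, here is a small Python sketch of a path filter. The patterns and the function name are hypothetical, and this is not how exclusions are configured in the backup software itself; use its own exclusion features for that.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns; adjust them to where your site keeps logs and temp files.
EXCLUDE_PATTERNS = ["*.log", "tmp/*", "temp/*"]


def is_excluded(relative_path: str) -> bool:
    """True if the path matches any exclusion pattern."""
    return any(fnmatchcase(relative_path, p) for p in EXCLUDE_PATTERNS)


paths = ["index.php", "logs/cron.log", "tmp/upload.part"]
to_back_up = [p for p in paths if not is_excluded(p)]
print(to_back_up)
```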
Ideally you would need to disable these housekeeping CRON jobs during the backup. They can resume afterwards.
Uploading videos etc. is not a big deal, as I said, because the file appears instantly: PHP uploads to a temporary folder outside the site, then Moodle moves the file to its final location, which is an atomic filesystem operation. One moment the file isn't there, the next moment it's there in whole.
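The "write elsewhere, then move" pattern looks roughly like this in Python. It is a minimal sketch, not Moodle's or PHP's actual code, and `publish_atomically` is a hypothetical name. One assumption to note: a rename is only atomic when the temporary file and the final location are on the same filesystem, so the sketch creates the temporary file next to the destination.

```python
import os
import tempfile


def publish_atomically(data: bytes, final_path: str) -> None:
    """Write data to a temporary file, then move it into place in one step."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # Same directory as the destination, so both are on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Atomic rename: readers see either no file or the complete file.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A backup that lists the directory before the move simply does not see the file; one that lists it afterwards sees the whole file, never a half-written one.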
Nicholas K. Dionysopoulos
Lead Developer and Director
🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!