#24907 Provide hash for archives, perhaps in log

Posted in ‘Akeeba Backup for Joomla! 4 & 5’

Environment Information

Joomla! version
n/a
PHP version
n/a
Akeeba Backup version
n/a

Latest post by born2webdesign on Monday, 11 April 2016 02:55 CDT

born2webdesign
Hi, when downloading a backup archive from the admin backend, I think it is a good idea to nudge the user to use an FTP client—however, the admin download is really more convenient, if I'm already there ;) So, we could have the best of both worlds if there was a hash visible in the admin (or at least at the very end of the log), just md5 would do—that way I could rest assured (almost) that the transfer went smoothly. Actually, wouldn't comparing the hash of a downloaded archive make the backend download even safer than FTP?
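
For reference, the local half of this check takes only a few lines of Python. This is a sketch, not anything Akeeba Backup ships: the function name `file_digest_hex` and the 1 MiB chunk size are illustrative choices, and the server-side hash to compare against is the feature this ticket asks for, so it is treated as hypothetical here.

```python
import hashlib


def file_digest_hex(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so memory use stays flat."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Calling `file_digest_hex("site-backup.jpa")` (a made-up file name) on the downloaded copy would give the value to compare with whatever the server reported.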

OT: I don't think my J! and PHP versions are really relevant here ;)

nicholas
Akeeba Staff
Manager
The problem is that producing an MD5 or SHA1 hash would have to work for all backup archives, of all sizes, on all servers. While producing a SHA1 hash of a 20MB backup archive on a modern VPS with the PHP hash extension is realistic, doing so for a 2GB backup archive part on a 6-year-old shared server without the PHP hash extension means a white page or worse. It's exactly the same problem which prevents us from calculating CRC32 checksums for files larger than 1MB when producing ZIP archives. So while in theory it sounds great, in practice it's a disaster.

Besides, as we tell you, there are two things you have to do when downloading the backup archive:
  1. Make sure that the size in bytes of each part you download is identical to the size in bytes on your server.
  2. Ideally, do a test extraction.


Regarding the first piece of advice, there are two problems which can possibly occur when downloading a backup: it can get truncated or corrupted. Obviously, if it gets truncated the byte size will be smaller. The corruption can only occur when you are using FTP (which in itself is really bad for security, always use SFTP!) in ASCII mode. Then, assuming that you're downloading from a Linux server to a Windows PC, every line feed byte (0x0A, decimal 10) will be replaced with the carriage return plus line feed sequence (0x0D 0x0A, decimal 13 10). This will cause the size to differ by at least a few hundred bytes (which doesn't register if you're viewing the size in MB, which is why I tell you to compare the byte size).
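
A minimal Python sketch of why ASCII mode changes the byte count, assuming the usual Unix-to-Windows translation in which each line feed byte (decimal 10) becomes carriage return plus line feed (decimal 13, 10). The function names are made up for illustration, and a real FTP client or server may handle edge cases differently.

```python
def ascii_mode_unix_to_windows(data: bytes) -> bytes:
    """Mimic the newline translation: every LF (0x0A) becomes CR LF (0x0D 0x0A)."""
    return data.replace(b"\n", b"\r\n")


def size_growth(data: bytes) -> int:
    """Extra bytes added by the translation: one per LF byte in the input."""
    return len(ascii_mode_unix_to_windows(data)) - len(data)


# Binary-looking sample: every byte value four times, so exactly four LF bytes.
sample = bytes(range(256)) * 4
print(len(sample), size_growth(sample))  # prints: 1024 4
```

Compressed archive data contains the byte value 10 roughly once every 256 bytes, so even a small archive grows by a noticeable number of bytes, which is what the byte-size comparison catches.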

Finally note that hashes are only useful when you want to ensure data integrity over an untrusted connection i.e. when you have reasons to believe that a malicious user has either replaced the original file on the server, or performs an active man in the middle attack to deliver a malicious version of the file you are trying to download. Since we're talking about your own server this means that you're either hacked or you are connecting over unencrypted FTP through an untrusted network, therefore the attacker is currently stealing your FTP login and hacking your site. In either case you're screwed because the attacker can also change the hashes, so there's no point comparing hashes anyway.

Considering all of the above, the size in bytes is enough for the purposes of verifying the download of a backup archive from your own server. You already have access to it, it's easy for everyone to use (you don't need a separate application to calculate hashes) and it catches the possible transfer issues. Hashes are overkill which would only cause problems; that's why I have already decided not to implement them.

If you are interested in the security of your downloads, NEVER use plain FTP (or even FTPS). Always use SFTP. In fact, only ever use SFTP with key-based authentication to connect to your server and disable text (username/password) authentication on the server. Moreover, write down the signature (host key fingerprint) of your server and print it on a laminated card packed in a transparent tamper-evident envelope. If you ever need to connect to your server from a different machine, which doesn't have the server signature in the .ssh/known_hosts file, compare the server signature to the one on the tamper-evident card. But if you are REALLY interested in security you should never, ever, EVER use a machine other than your own: a machine with full disk encryption and a password of at least 20 random characters.

Only then can we talk about security and no, hashes are not part of it. It's pretty daft trusting a hash stored on the same server as the file you are downloading. If the server is compromised or the attacker is engaging in a Man In The Middle attack then by definition anything you read from the server is untrustworthy. I know that it all sounds paranoid but you asked me about security and that is still not even close to the minimum required security for sensitive corporate data. It's just better than FTP ;)
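
As an illustration of the fingerprint comparison described above, here is a small Python sketch. It assumes an OpenSSH-style public key line (`type base64-blob comment`) and the `SHA256:` fingerprint format that `ssh-keygen -lf` prints; the function names and the idea of typing in the value from the printed card are illustrative, not part of any Akeeba tool.

```python
import base64
import hashlib


def sha256_fingerprint(public_key_line: str) -> str:
    """Fingerprint of an OpenSSH public key line: 'type base64-blob [comment]'."""
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH shows the digest as base64 with the trailing '=' padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def matches_trusted(public_key_line: str, trusted_fingerprint: str) -> bool:
    """Compare the presented key against the fingerprint written on the card."""
    return sha256_fingerprint(public_key_line) == trusted_fingerprint.strip()
```

The important design point is that the trusted value comes from somewhere other than the connection being checked, which is exactly why a hash stored next to the archive on the same server does not help.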

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷 Greek: native 🇬🇧 English: excellent 🇫🇷 French: basic • My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

born2webdesign
Thank you Nicholas for your thorough reply! But actually, I was only concerned about data safety/integrity and didn't talk about security—so, for me, the first paragraph covered the necessary info ;) No big deal, but for small archives this would still be great—I mean, you could set a limit on the archive size of e.g. 100M and only append a hash to the log if smaller … Calculating an md5 on that would be fast, even on an old server … but you decide, obviously ;)

> hashes are only useful when you want to ensure data integrity over an untrusted connection
Why wouldn't they be good for ensuring integrity over a trusted connection?! (and better than file size alone—I'm thinking corruption, not manipulation)
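
To make the corruption case concrete, here is a tiny Python sketch with made-up data: a single flipped bit leaves the size unchanged but changes the hash. It only illustrates the distinction being discussed and says nothing about how likely such corruption is on a given connection.

```python
import hashlib

original = b"backup archive contents " * 1000

# Flip one bit somewhere in the middle; the length stays the same.
damaged = bytearray(original)
damaged[5000] ^= 0x01
damaged = bytes(damaged)

same_size = len(original) == len(damaged)
same_hash = hashlib.md5(original).digest() == hashlib.md5(damaged).digest()
print(same_size, same_hash)  # prints: True False
```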

When I wrote "FTP", of course I meant with an S—before or after … after all, which millennium is it ;) But I do acknowledge the reality of things … As this is a public ticket, I probably should have been more accurate—good on you for doing a good job of that :) But now you got me thinking: Why are you (slightly) against FTPS? Just because of the lack of key authentication or are there other security implications? Sadly, not every (shared) host supports a jailed SSH.
And yes, I would never connect to a server from a machine I don't (more or less) control (e.g. never from any Windows or Mac).

nicholas
Akeeba Staff
Manager
> I mean, you could set a limit on the archive size of e.g. 100M and only append a hash to the log if smaller … Calculating an md5 on that would be fast, even on an old server


I beg to differ on the speed and reliability of hash calculation. A 5-year-old shared server with 1000 or more sites on it has serious trouble calculating MD5 sums fast and reliably enough on anything over 5-10MB. "Fast enough" means under 3 seconds, because the execution time limit you see on old servers is less than 5 seconds and you need to have enough leeway to work around CPU execution time exhaustion by adding extra time at the tail end of each MD5 sum calculation. "Reliably enough" means that if you have a backup archive split in 20 parts you expect to be able to calculate the MD5 sum for all of them without the server throwing instant 500 Internal Server Errors around part 5 because you've exceeded the maximum CPU time allotment. It's complicated.
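
To show what hashing under a time budget looks like in principle, here is a Python sketch. It is not Akeeba Backup code: the names `hash_step` and `hash_in_steps` and the 3 second default are assumptions taken from the figures above, and it sidesteps the genuinely hard part, which is carrying the hash state from one short-lived web request to the next.

```python
import hashlib
import time


def hash_step(fh, hasher, budget_seconds=3.0, chunk_size=1024 * 1024):
    """Feed chunks to the hasher until the file ends or the budget is spent.

    Returns True when the whole file has been consumed, False otherwise.
    At least one chunk is processed per call, so progress is guaranteed.
    """
    deadline = time.monotonic() + budget_seconds
    while True:
        chunk = fh.read(chunk_size)
        if not chunk:
            return True
        hasher.update(chunk)
        if time.monotonic() >= deadline:
            return False


def hash_in_steps(path, budget_seconds=3.0, chunk_size=1024 * 1024):
    """Drive hash_step to completion; returns (hex digest, number of steps)."""
    hasher = hashlib.md5()
    steps = 0
    with open(path, "rb") as fh:
        done = False
        while not done:
            done = hash_step(fh, hasher, budget_seconds, chunk_size)
            steps += 1
    return hasher.hexdigest(), steps
```

In this sketch every step runs inside one long-lived process. On a shared host each step would have to be a separate page load, with the partial state saved and restored in between, and that is where the speed and reliability concerns above come in.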

> When I wrote "FTP", of course I meant with an S - before or after … after all, which millennium is it ;)


Judging from the clients we ask to give us connection information I think we're stuck in a wormhole that sends us back circa 20 years :p

> Why are you (slightly) against FTPS? Just because of the lack of key authentication or are there other security implications? Sadly, not every (shared) host supports a jailed SSH.


If all clients and servers used TLS 1.2 to implement FTPS I wouldn't be against it. The sad fact is that many implementations fall back to encryption methods that are known to be broken, which makes them insecure.

Furthermore, I've yet to see a commercial FTPS server with properly signed certificates (instead of self-signed ones). This has trained the users to blindly accept whichever certificate is presented to them which is, of course, worse than not having encryption (false sense of security even though you don't protect against man in the middle attacks).

These are two risk factors largely mitigated by SSH, at least when you use it properly. On top of that, SFTP is at least an order of magnitude faster than FTPS when copying large numbers of small files, since it doesn't need to authenticate and create new data ports all the freaking time, and it is an inherently faster protocol anyway.

Finally, SSH can be configured to not allow username/password authentication, making brute force attacks impractical under current technology and in the foreseeable future. I personally consider this the most important reason to only ever use SFTP.

> And yes, I would never connect to a server from a machine I don't (more or less) control (e.g. never from any Windows or Mac).


As Obi Wan Kenobi put it "Only a Sith deals in absolutes" :D

Linux is not any more immune to malware and viruses than Mac OS X. Furthermore, just because it's Linux doesn't mean it comes without spyware (see Ubuntu Linux enabling Amazon search by default in a way that made it impossible for the average user to disable it, which is far harder than the optional data collection found only in the technology preview a.k.a. beta versions of Windows 10; Android reporting everything you do and everywhere you go to mama Google; Linux-based consumer devices like Amazon Kindle etc. spying on you; I could go on forever). Also, having used Windows for about 15 years as my main OS without contracting a virus once, I can tell you that the top priority in keeping your computer safe is having common sense. Don't open attachments if the wording of the email is even slightly off, even when it comes from a trusted friend; don't visit sites with naked ladies; uninstall that sieve of a software called Adobe Flash; and think before you click.

If you are wondering, my "main" desktop OS is Mac OS X, I use Linux for my servers –live and test– and I have a backup desktop running Windows, used mostly for developing Admin Tools features that have to do with IIS. My main phone and tablet run iOS but I have a secondary tablet running Android. I am a daily user of all OS. I am not a fanboy, I just use my brain to make informed choices.

born2webdesign
Alright, thanks, just would have been nice—but no big deal.
And you are right on the TLS certs, of course, thanks.
Closing this, before we get more off-topic ;)

I know you are on a Mac ;) And it makes me really sad that many web-devs seem to be (probably my least favorite company). Something a Mac will never (there is the absolute, again) be able to compete on with Free/OS Software is … openness. You _shouldn't_ trust any system that you haven't proven to be correct and trustworthy—so, basically none, at all; certainly not your average Linux-based system. But IMO you _must not_ trust a system you _cannot_ even begin to verify. Other than that, I largely agree with your notion about security.
