I will explain this in detail and link to Wikipedia as I go since this is a public ticket and other people might be reading it.
TL;DR: It's a networking issue. While you can change the TTL to 60 seconds, that is not a guaranteed solution; it only narrows the time window in which things may break. Ultimately, you need to prevent IP changes around backup time, either by having an ISP which renews the DHCP lease instead of issuing a new one, or by getting a static IP.
And now let's get down to the gory details — and spend a moment of reflection: “Why did I ever think it's a good idea to be a software developer in a niche which requires me to also be a sysadmin and a junior network engineer?”. Quiet moment of reflection done; let's take this show on the road!
The CNAME is not a problem. The CNAME is effectively static in your use case. It always points to the same A record. Its TTL is irrelevant as the CNAME's contents never change.
The real problem is that your A record is expected to change over time, as your FTP server's IP address is changing (a.k.a. “dynamic DNS”). The way Dynamic DNS is implemented, its TTL, and the way caching name servers work combine to produce the problem you observe and, as you'll see, it makes sense that it's intermittent.
First of all, there's a delay between the IP change (technically: the DHCP lease expiration event from your ISP) and the update of the A record in the DNS zone. Remember that the way dynamic DNS systems work is that they periodically execute a script which determines your effective “external” IP address (by using a service which "echoes" back that address). If this period is, say, 15 minutes and your IP address changes the very first second of that period then your A record remains with the old IP address for another 14 minutes and 59 seconds.
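To make the polling behaviour concrete, here is a minimal sketch of one cycle of such a script. It is not the code of any particular dynamic DNS client: `sync_once`, `get_external_ip` and `update_a_record` are hypothetical names I made up, and the two callables stand in for the "echo" service and for your DNS provider's update mechanism. The IP addresses are documentation-only examples.

```python
from typing import Callable, Optional


def sync_once(
    get_external_ip: Callable[[], str],
    update_a_record: Callable[[str], None],
    last_known_ip: Optional[str],
) -> str:
    """Run one polling cycle: read the current external IP and push it to DNS only if it changed."""
    current_ip = get_external_ip()
    if current_ip != last_known_ip:
        update_a_record(current_ip)
    return current_ip


# Stand-ins for the real world (hypothetical, for illustration only).
observed_ips = iter(["203.0.113.10", "203.0.113.10", "203.0.113.20"])
pushed_updates = []

last_ip = None
for _ in range(3):  # a real script would sleep for the polling period between cycles
    last_ip = sync_once(lambda: next(observed_ips), pushed_updates.append, last_ip)

print(pushed_updates)
```

The point to take away is that the A record can only be corrected when a cycle runs. Whatever happens to your IP address between two cycles goes unnoticed until the next one.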
The second factor in your problem is the TTL of the A record, which is 4 minutes. This means that no caching name server will ask your DNS server for that record more than once every 4 minutes. Your host, like all servers, has a caching name server. Assuming it's a fairly up-to-date Linux server using systemd, it's using systemd-resolved. The problem here is that your A record may change at any time within the TTL. Let's say that your host has just asked your DNS server to resolve the domain and, a second later, the A record is updated. This means that for the next 3 minutes and 59 seconds your host has the “wrong” (outdated) IP address for that A record.
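If you prefer to see it in code, the following is a toy model of a caching resolver. It is a sketch of the general idea only, not how any real resolver is implemented: `CachingResolver` and its parameters are names I invented, time is passed in as plain seconds so the example stays deterministic, and the IP addresses are documentation-only examples.

```python
from typing import Callable, Optional


class CachingResolver:
    """Toy cache: keeps an answer for ttl_s seconds before asking the authoritative source again."""

    def __init__(self, authoritative_lookup: Callable[[], str], ttl_s: float) -> None:
        self._lookup = authoritative_lookup
        self._ttl_s = ttl_s
        self._cached_ip: Optional[str] = None
        self._expires_at = 0.0

    def resolve(self, now: float) -> str:
        # Only go back to the authoritative source when there is no answer yet or it has expired.
        if self._cached_ip is None or now >= self._expires_at:
            self._cached_ip = self._lookup()
            self._expires_at = now + self._ttl_s
        return self._cached_ip


zone = {"a_record": "203.0.113.10"}  # stand-in for the DNS zone
resolver = CachingResolver(lambda: zone["a_record"], ttl_s=240)  # 4 minute TTL

first_answer = resolver.resolve(now=0)    # cache is cold, so this asks the zone
zone["a_record"] = "203.0.113.20"         # the A record is updated one second later
stale_answer = resolver.resolve(now=239)  # still inside the TTL: the old IP is served
fresh_answer = resolver.resolve(now=240)  # TTL expired: the zone is asked again

print(first_answer, stale_answer, fresh_answer)
```

For the whole duration of the TTL the cache keeps answering with the old address, even though the zone already holds the new one.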
These two delays can compound. In our example you may end up with up to nearly 19 minutes where your host “sees” the old IP address for your machine. This is a fairly large time window during which transferring your data to your FTP server will fail (since it tries to contact an IP address your FTP server is no longer listening on).
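The arithmetic is simple enough to write down. This is only a back-of-the-envelope sketch: `worst_case_stale_seconds` is a name I made up, and it returns the upper bound (the real worst case is a couple of seconds shorter, which is why I say "nearly").

```python
def worst_case_stale_seconds(update_interval_s: int, ttl_s: int) -> int:
    """Upper bound on how long a caching resolver may keep serving the old IP address.

    The A record can lag the IP change by up to one polling interval, and a cache
    that refreshed just before the record was fixed can lag by up to one more TTL.
    """
    return update_interval_s + ttl_s


# The example from this reply: 15 minute polling interval, 4 minute TTL.
example_window_minutes = worst_case_stale_seconds(15 * 60, 4 * 60) / 60
print(example_window_minutes)
```

You can plug in other values for the polling interval and the TTL to see how the window changes.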
Conversely, if your host tries to resolve your FTP server's domain name to an IP address with a cold name cache, or while the cached record still points to the correct IP address, you will observe no problem. Your upload will work fine.
In other words, whether your upload will work is a game of chance. Even worse, if the IP change happens during the upload, your upload may start but never finish, returning an error about a server timeout.
So, how do you beat this game of chance?
Sure, one solution is to reduce both the TTL and the execution interval of your Dynamic DNS script to one minute. Then the window of opportunity is smaller, about 2 minutes. The downside is that DNS queries for that A record will be overwhelmingly resolved by your DNS server instead of the local name cache of clients. In your use case this should not be a problem. Moreover, this does not solve the problem. It only makes it less likely. It's like going from playing Russian Roulette with 3 bullets loaded to playing Russian Roulette with 1 bullet loaded. Your chances are better, but your survival is by no means guaranteed.
The better solution is to not have your IP change very often, and definitely not around the time you expect to receive data from your backups. However, this is not under your control; it's under your ISP's control. Some ISPs will merely renew the DHCP lease if your VDSL connection is still active (technically, while the PPPoE connection between your modem/router and the ISP's BRAS is alive) instead of issuing a new lease for a different IP address, as changing the IP address mid-connection will result in dropped packets with all sorts of odd results for the client. With ISPs like that, keeping the modem/router on 24/7 and on battery backup should be enough to keep your IP address stable (there's an Eaton mini UPS designed specifically for routers; it saved my sanity when the power company was doing some work along the main street where I live). Other ISPs will change the IP regardless, just to dissuade people like you from running their own servers.
If you are unlucky enough to have one of the latter ISPs, your only realistic chance at removing your headaches is asking for a static IP address which, of course, comes at an additional cost.
Nicholas K. Dionysopoulos
Lead Developer and Director
🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!