Last night, when I was trying to upload a new profile pic in our app, S3 uploads suddenly stopped working. Other S3 API methods were working fine; only the upload (PUT) method failed. I had made many changes to my VM over the past couple of days that could have affected our app, but it turned out to be just a DNS issue.
I run our app inside a Docker container inside a VM. When I attempted to upload a file to AWS S3 (using the AWS SDK for PHP), I got this error:
[curl] 6: Could not resolve host: <my bucket>.s3.amazonaws.com; Unknown error [url] https://<my bucket>.s3.amazonaws.com/profile-pic-123.jpg
I tried to curl the URL inside the Docker container and it worked just fine. Since the container could reach the AWS hostname, I suspected the issue was something else.
I had just switched from bridged networking to NAT (VirtualBox) and assumed the issue was caused by NAT. However, when I switched back to bridged networking, the issue remained. Therefore, the problem had to be on the VM itself or in the Docker container.
I also tried disabling my VPN connection, but the VPN turned out not to be the issue either. This led me to conclude that it could be a DNS issue.
Some more background: I used to use Google’s DNS servers. However, I noticed that Google’s DNS and my ISP don’t get along, and I had trouble resolving hostnames, so I switched to OpenDNS. I configured OpenDNS in my router settings, my Wi-Fi adapter settings, and even the Docker daemon.
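The error above is curl error 6 (CURLE_COULDNT_RESOLVE_HOST), a name-lookup failure, which is different from error 7 (could not connect). A minimal Python sketch that separates the two steps, so you can run it in the same place as the failing code and see which one breaks; the helper name check_host and the port 443 default are my own choices, not part of the AWS SDK.

```python
import socket


def check_host(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Report whether a failure happens at DNS resolution or at TCP connect.

    Roughly mirrors curl's error 6 (could not resolve host) versus
    error 7 (could not connect). check_host is a hypothetical helper.
    """
    try:
        # Step 1: name resolution, using the resolver configuration
        # (/etc/resolv.conf, /etc/hosts) of wherever this code runs.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"resolve-failed: {exc}"

    # Step 2: a plain TCP connect to the first address returned.
    family, socktype, proto, _canonname, sockaddr = infos[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
    except OSError as exc:
        return f"connect-failed: {sockaddr[0]}: {exc}"
    return f"ok: {sockaddr[0]}"
```

For example, check_host("<my bucket>.s3.amazonaws.com") run inside the container returns a string starting with resolve-failed when DNS is the culprit, and ok when both lookup and connect succeed.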
I remembered that Docker uses Google’s DNS by default, so I looked at /etc/resolv.conf inside the container and found that it was indeed using Google’s. My settings had been reset! Well, I regularly rebuild my Docker images due to upstream updates.
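Checking which servers a container actually uses comes down to reading the nameserver lines of its /etc/resolv.conf. A small Python sketch of that check, assuming you feed it the file’s text (for instance the output of docker exec on the container); 8.8.8.8 and 8.8.4.4 are Google’s public DNS addresses, and the helper names are hypothetical.

```python
# Google Public DNS IPv4 addresses.
GOOGLE_PUBLIC_DNS = {"8.8.8.8", "8.8.4.4"}


def nameservers(resolv_conf: str) -> list[str]:
    """Return the nameserver addresses listed in resolv.conf text, in order."""
    servers = []
    for line in resolv_conf.splitlines():
        # resolv.conf treats lines starting with '#' or ';' as comments.
        if line.startswith(("#", ";")):
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_google_dns(resolv_conf: str) -> bool:
    """True if any configured nameserver is one of Google's public resolvers."""
    return any(ns in GOOGLE_PUBLIC_DNS for ns in nameservers(resolv_conf))
```

Inside the container, nameservers(open("/etc/resolv.conf").read()) lists the servers in the order the resolver tries them.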
Then I checked the DNS settings of the VM’s Linux guest and found that it was still using OpenDNS. The last thing I checked was the file /etc/default/docker, which now contained empty settings instead of my custom DNS setting. Beside it was a file called /etc/default/docker.orig containing the original setting that I used to use.
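Spotting what an upgrade dropped is easier with a diff between the live file and its .orig backup. A minimal sketch using Python’s difflib, assuming both files are small text files you have already read into strings; config_diff is a hypothetical helper, and lines prefixed with + exist only in the backup.

```python
import difflib


def config_diff(current: str, original: str,
                current_name: str = "/etc/default/docker",
                original_name: str = "/etc/default/docker.orig") -> list[str]:
    """Unified diff from the live config to its .orig backup.

    Lines starting with '+' are present only in the backup, i.e. what
    the upgrade removed; an empty list means the files are identical.
    """
    return list(difflib.unified_diff(
        current.splitlines(),
        original.splitlines(),
        fromfile=current_name,
        tofile=original_name,
        lineterm="",
    ))
```

Printing "\n".join(config_diff(live_text, orig_text)) gives output in the same format as diff -u.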
Slackware users can relate to this. Remember the prompt after slackpkg upgrade-all that asks how to handle new config files? I just chose the overwrite option, and so it began. Yes, I had upgraded Docker and wiped out the original setting without knowing, or maybe I was aware but just didn’t care. So I restored the original settings.
Docker’s DNS settings
Here is the original content of my /etc/default/docker:
## Set defaults used by the docker daemon
## These are flags passed after `docker -d`
DOCKER_OPTS="--dns 220.127.116.11 --dns 18.104.22.168"
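To confirm which servers the daemon would actually be handed, the DOCKER_OPTS line can be read roughly the way the shell that sources this file would see it. A sketch assuming the file is a simple shell fragment like the one above (no variable references or line continuations); daemon_dns_servers is a hypothetical helper, and it handles both the --dns VALUE and --dns=VALUE spellings.

```python
import shlex


def daemon_dns_servers(default_file: str) -> list[str]:
    """Extract --dns values from DOCKER_OPTS in an /etc/default/docker-style file.

    The file is sourced by a shell, so the last DOCKER_OPTS assignment wins
    and its value is word-split when expanded as $DOCKER_OPTS.
    """
    servers: list[str] = []
    for line in default_file.splitlines():
        line = line.strip()
        if not line.startswith("DOCKER_OPTS="):
            continue  # skips comments and unrelated variables
        # shlex removes the quotes as the shell assignment would
        # (comments=True drops a trailing "# ..." comment)...
        value = " ".join(shlex.split(line[len("DOCKER_OPTS="):], comments=True))
        # ...and plain whitespace splitting mimics expanding $DOCKER_OPTS.
        args = value.split()
        servers = []  # a later assignment overrides an earlier one
        i = 0
        while i < len(args):
            if args[i] == "--dns" and i + 1 < len(args):
                servers.append(args[i + 1])
                i += 2
                continue
            if args[i].startswith("--dns="):
                servers.append(args[i].split("=", 1)[1])
            i += 1
    return servers
```

An empty list for the live file (as after the upgrade here) means the daemon gets no --dns flags and falls back to its defaults.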
I restored the file. Did it fix it? No.
I tried manually editing /etc/resolv.conf inside the container, but that’s silly and didn’t work anyway.
I restarted the container, but that still didn’t fix it.
Worried that I would need to rebuild the Docker image, I read the docs about Docker’s DNS settings, which talk about the daemon. So I concluded that I didn’t need to rebuild the image. Instead, I suspected that I needed to restart the Docker daemon and, at the same time, recreate the container to make it work.
So I stopped all the containers and restarted the Docker daemon. After the daemon restart, I deleted the app’s container, recreated it, and started it. I then logged into the container and looked at /etc/resolv.conf. Yes, it was now updated.
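The sequence that finally worked (stop all containers, restart the daemon, delete and recreate the container, start it, check /etc/resolv.conf) can be written down as a reviewable plan. A sketch in Python, assuming a SysV/upstart-style service command (the kind that reads /etc/default/docker); the container name, image, and create arguments are placeholders for your own, and nothing runs unless dry_run is False.

```python
import subprocess


def recreate_plan(name: str, image: str, create_args: list[str],
                  running: list[str]) -> list[list[str]]:
    """Build the command sequence: stop, restart daemon, recreate, verify."""
    plan: list[list[str]] = []
    if running:
        # Stop every running container (names or IDs, e.g. from `docker ps -q`).
        plan.append(["docker", "stop", *running])
    # Assumption: the daemon is managed as a SysV/upstart-style service.
    plan.append(["sudo", "service", "docker", "restart"])
    # Delete and recreate the app container; a plain restart was not enough here.
    plan.append(["docker", "rm", name])
    plan.append(["docker", "create", "--name", name, *create_args, image])
    plan.append(["docker", "start", name])
    # Print the container's resolver config to confirm the new nameservers.
    plan.append(["docker", "exec", name, "cat", "/etc/resolv.conf"])
    return plan


def run_plan(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For instance, with the hypothetical names plan = recreate_plan("app", "myapp:latest", ["-p", "8080:80"], running=["app"]), call run_plan(plan) to review the commands and then run_plan(plan, dry_run=False) to apply them.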
I also tested the upload and it is now working correctly. Finally!