Last night, while I was trying to upload a new profile pic in our app, S3 uploads suddenly stopped working. Other S3 API methods worked fine; only the upload (PUT) method failed. I had made many changes to my VM over the past couple of days that could have affected our app, but it turned out to be just a DNS issue.
Background
I run our app inside a Docker container inside a VM. When I attempted to upload a file to AWS S3 (using the AWS SDK for PHP), I got this error:
[curl] 6: Could not resolve host: <my bucket>.s3.amazonaws.com; Unknown error [url] https://<my bucket>.s3.amazonaws.com/profile-pic-123.jpg
I tried to curl the URL inside Docker and it worked just fine. Since the container could connect to the AWS hostname, I suspected the issue was something else.
I had just switched from bridged networking to NAT (VirtualBox) and assumed that the issue was due to the NAT thing. However, when I switched back to bridged networking, the issue remained. Therefore, the issue had to be in the VM itself or in the Docker container.
I also tried disabling my VPN connection, but the VPN turned out not to be the issue either. This led me to the conclusion that it could be a DNS issue.
DNS Settings
Some more background: I used to use Google’s DNS servers. However, I noticed that Google hates my ISP, to the point that I had trouble resolving hostnames, so I switched to OpenDNS. I set OpenDNS in my router settings, in my Wi-Fi adapter settings, and even in the Docker daemon.
I remembered that Docker uses Google’s DNS by default, so I looked at /etc/resolv.conf inside the container and found that it was indeed using Google’s. My settings had been reset! Well, I regularly rebuild my Docker images due to upstream updates.
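To make that check repeatable, here is a minimal sketch of two small shell helpers. The function names list_nameservers and is_google_dns are my own, not part of Docker, and the sketch assumes the container has awk available.

```shell
# Print the nameserver addresses listed in a resolv.conf-style file.
# Defaults to /etc/resolv.conf when no path is given.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Succeed if the given address is one of Google's public resolvers.
is_google_dns() {
    case "$1" in
        8.8.8.8|8.8.4.4) return 0 ;;
        *) return 1 ;;
    esac
}
```

Run inside the container, list_nameservers with no argument prints whatever resolvers the container actually uses, and is_google_dns tells you whether one of them is Google's.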
Then I checked the DNS settings of the VM’s Linux guest and found that it still used OpenDNS. The last thing I checked was the file /etc/default/docker, which now contained empty settings instead of my custom DNS setting. Beside it was a file called /etc/default/docker.orig containing the original settings that I used to use.
Slackware users can relate to this. Remember the prompt after a slackpkg upgrade-all that asks how to handle new config files? I just chose the overwrite option, and so it began. Yes, I upgraded Docker and wiped out my original settings without knowing, or maybe I was aware but just didn’t care. So I restored the original settings.
Docker’s DNS settings
Here is the original content of my /etc/default/docker.
## Set defaults used by the docker daemon
## These are flags passed after `docker -d`
DOCKER_OPTS="--dns 208.67.222.222 --dns 208.67.220.220"
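A quick way to confirm which resolvers the daemon is configured to hand out is to pull the --dns values back out of that file. The helper below is only a sketch: configured_dns is a name I made up, and it assumes the file is a plain shell fragment that sets DOCKER_OPTS, as mine does. Note that it sources the file, so only point it at a file you trust.

```shell
# Print every address passed via --dns in a Docker defaults file.
# The body runs in a subshell, so DOCKER_OPTS does not leak into the caller.
configured_dns() (
    DOCKER_OPTS=""
    # Sourcing executes the file; use this only on a trusted config file.
    . "$1"
    prev=""
    # DOCKER_OPTS is left unquoted on purpose so it splits into words.
    for word in $DOCKER_OPTS; do
        case "$word" in
            # Handle the --dns=ADDRESS spelling.
            --dns=*) echo "${word#--dns=}" ;;
        esac
        # Handle the "--dns ADDRESS" spelling: the word after --dns is the address.
        [ "$prev" = "--dns" ] && echo "$word"
        prev=$word
    done
)
```

Calling configured_dns /etc/default/docker and comparing the result with the nameserver lines inside the container shows at a glance whether the container picked up the daemon's settings.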
I restored the file. Did it fix it? No.
I tried manually editing /etc/resolv.conf inside the container, but that’s silly and didn’t work anyway.
I restarted the container, but that still didn’t fix it.
Worried that I would need to rebuild the Docker image, I read the docs about Docker’s DNS settings, which talk about the daemon rather than the image. I therefore concluded that I didn’t need to rebuild the image. I suspected that I needed to restart the Docker daemon and also recreate the container to make it work.
So I stopped all the containers and restarted the Docker daemon. After the daemon restart, I deleted the app’s container, recreated it, and started it. I then logged into the container and looked at /etc/resolv.conf. Yes, it was now updated.
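The sequence above can be sketched as a small shell function. Treat it as an illustration under assumptions, not the exact commands I typed: the container name, the image name, and the bare docker run flags are placeholders, the default restart command /etc/rc.d/rc.docker restart is what I assume a Slackware install uses, and it handles a single container rather than all of them. Setting DRY_RUN=1 only prints the commands.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Restart the daemon, then recreate one container so it gets a fresh resolv.conf.
# $1 = container name (placeholder), $2 = image (placeholder),
# $3 = command that restarts the daemon (assumed default for Slackware).
recreate_with_new_dns() {
    name=$1
    image=$2
    restart_cmd=${3:-/etc/rc.d/rc.docker restart}

    run docker stop "$name"
    # Left unquoted on purpose so the restart command splits into words.
    run $restart_cmd
    run docker rm "$name"
    # A real container would need its usual ports, volumes, and so on here.
    run docker run -d --name "$name" "$image"
    # Show the resolvers the new container ended up with.
    run docker exec "$name" cat /etc/resolv.conf
}
```

For example, DRY_RUN=1 recreate_with_new_dns myapp myimage lists the five steps without touching anything, which is a handy way to review them before running for real.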
I also tested the upload and it is now working correctly. Finally!