When Tailscale MagicDNS isn’t
On a router, I have dnsmasq and kresd as DNS servers. Dnsmasq is accessible on the LAN interface and forwards queries to kresd, which is accessible on the loopback interface. This has been working for a long time.
I recently set up Tailscale and was confused as to why MagicDNS wasn’t working on this one device (I have two other NixOS devices that didn’t have any problems). I’m no stranger to investigating these problems, having used and tinkered with networking on Linux/FreeBSD for many years, and if there’s a problem, it’s always DNS.
My first look at /etc/resolv.conf suggested things should be fine, I thought.
# Generated by resolvconf
search my-network.ts.net
nameserver 127.0.0.1
nameserver ::1
‘Why is this even generated by resolvconf?’, I ask myself: this is a router with a static networking configuration and custom upstream nameservers. I’ll investigate that later, I tell myself.
I eventually realised that it should be using 100.100.100.100 for two reasons.
- A working machine’s /etc/resolv.conf contains:

  # Generated by resolvconf
  search my-network.ts.net
  nameserver 100.100.100.100
  options edns0
- The Tailscale dashboard gives me a hint under the nameservers section:
Nameservers
Set the nameservers used by devices on your network to resolve DNS queries. Learn more ↗
my-network.ts.net ✨MagicDNS
100.100.100.100
The linked documentation (DNS in Tailscale) doesn’t even mention 100.100.100.100, nor does the documentation for MagicDNS. It is explained as part of a blog post under the heading ‘how MagicDNS works’, but that’s not the first place I’d look.
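The comparison between the two machines can be sketched in a few lines of Python. This is a minimal sketch, not anything Tailscale or NixOS ships: the function names `parse_resolv_conf` and `uses_magicdns` are made up for illustration, and the parser only understands the `nameserver` and `search` keywords that appear in the files above.

```python
MAGICDNS = "100.100.100.100"  # address shown in the Tailscale dashboard


def parse_resolv_conf(text):
    """Return (nameservers, search_domains) from resolv.conf-style text."""
    nameservers = []
    search = []
    for raw in text.splitlines():
        # resolv.conf(5): lines starting with '#' or ';' are comments.
        if not raw.strip() or raw[0] in "#;":
            continue
        keyword, *values = raw.split()
        if keyword == "nameserver" and values:
            nameservers.append(values[0])
        elif keyword == "search":
            # Only the last search line is used, so overwrite.
            search = values
    return nameservers, search


def uses_magicdns(text):
    """True if the MagicDNS address is among the configured nameservers."""
    nameservers, _ = parse_resolv_conf(text)
    return MAGICDNS in nameservers


# The two files from this post: the router and a working machine.
broken = """# Generated by resolvconf
search my-network.ts.net
nameserver 127.0.0.1
nameserver ::1
"""
working = """# Generated by resolvconf
search my-network.ts.net
nameserver 100.100.100.100
options edns0
"""
```

With these inputs, `uses_magicdns(broken)` is false and `uses_magicdns(working)` is true, which is the whole difference between the two machines.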
This led me to try to find what might be responsible for the nameserver 127.0.0.1 setting:
- I checked for systemd-resolved, but it wasn’t that (the nameserver would have been 127.0.0.53 if it were).
- I didn’t think it was anything to do with dnsmasq because that’s configured to use the LAN interface, not the loopback.
- I didn’t think it was kresd either, since another device uses kresd, but does not have the problem. kresd is not listening on a loopback address on the default port 53 on either machine, meaning that if it were doing this, DNS resolution would have been broken on both machines for a long time.
Eventually I stumbled upon something that worked: setting networking.resolvconf.useLocalResolver = false. I then started to investigate in order to open an issue with NixOS, but found things got more confusing and I forgot why I was shaving a yak[1].
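The process of elimination above amounts to mapping a nameserver address back to whatever probably put it there. A rough sketch of that mapping, assuming only the addresses mentioned in this post; the function name `guess_resolver` and the labels it returns are my own invention:

```python
import ipaddress

# Addresses taken from this post; the labels are illustrative only.
RESOLVED_STUB = ipaddress.ip_address("127.0.0.53")
MAGICDNS = (
    ipaddress.ip_address("100.100.100.100"),
    ipaddress.ip_address("fd7a:115c:a1e0::53"),
)


def guess_resolver(address):
    """Guess which component a nameserver entry points at."""
    ip = ipaddress.ip_address(address)
    if ip == RESOLVED_STUB:
        return "systemd-resolved stub"
    if ip in MAGICDNS:
        return "Tailscale MagicDNS"
    if ip.is_loopback:
        # 127.0.0.1 or ::1: some resolver on this host, which is what
        # networking.resolvconf.useLocalResolver writes.
        return "resolver on this host"
    return "remote or LAN resolver"
```

For the router, `guess_resolver("127.0.0.1")` gives "resolver on this host", which rules out systemd-resolved but says nothing about which module asked for it.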
Unexpected behaviours
- The NixOS kresd module blindly sets networking.resolvconf.useLocalResolver to default to true because someone ran into resolver loops, and this was accepted on the grounds that it’s ‘good to be consistent’ (with pdns-recursor, for example).
  - I consider this “spooky action at a distance”. I do not think it should do this, but I can see how it could be helpful. In enough cases, though? I’m not sure.
  - In my case, kresd was never even listening on localhost port 53, meaning that this default setting would have led to a broken DNS setup, which would at least have led me to investigate the right thing at that time. (Aside: why didn’t it, then?)
- The NixOS dnsmasq module sets networking.resolvconf.useLocalResolver = true if services.dnsmasq.resolveLocalQueries = true. This is at least less spooky and distant, but still faulty.
  - I had indeed set services.dnsmasq.resolveLocalQueries = true, because I would like to be able to ping foo.lan on the router and it made that possible.
  - My erroneous assumption was likely that this setting changes the system nameserver to match the listen address of dnsmasq, rather than setting networking.resolvconf.useLocalResolver. Setting resolveLocalQueries = false was one of the first things I tried and it didn’t make a difference, which was unexpected (because kresd was setting it).

    If that weren’t enough, services.dnsmasq.resolveLocalQueries sets networking.nameservers to include 127.0.0.1, which makes things more confusing, and is wrong because I haven’t configured dnsmasq to listen on this address.
  - dnsmasq’s --interface=<interface name> setting doesn’t work either as I expected or as it is documented.

    ‘-i, --interface=<interface name>
    Listen only on the specified interface(s).’

    I had this set to listen only on the LAN interface and I could see it nevertheless listening on *:53 in lsof, rather than the addresses of the LAN interface.
    - This behaviour is mentioned, under a different option, --bind-interfaces:

      ‘On systems which support it, dnsmasq binds the wildcard address, even when it is listening on only some interfaces. It then discards requests that it shouldn’t reply to. This has the advantage of working even when interfaces come and go and change address. This option forces dnsmasq to really bind only the interfaces it is listening on. About the only time when this is useful is when running another nameserver (or another instance of dnsmasq) on the same machine.’

      I had enabled this setting when I set up kresd on the machine, because the last sentence applies to my case and interfaces are not ‘coming and going’[2]. The naming is weird as it would suggest it works like --interface; namely that --bind-interfaces=<interface name> would be a reasonable use. Alas, it is a boolean flag and takes no value.
    - Even when --interface and --bind-interfaces are set, dnsmasq decides to ignore my intent and explicitly listen on loopback. This is weird, but, guess what, documented back under --interface:

      ‘Dnsmasq automatically adds the loopback (local) interface to the list of interfaces to use when the --interface option is used.’

      This explains why enabling kresd didn’t break things before; dnsmasq was listening on 127.0.0.1, even though I thought I had told it not to.
- The option name networking.resolvconf.useLocalResolver and its documentation are unclear.

  ‘Use local DNS server for resolving.’

  This sentence adds no information not already present in the option name. Local to what? I’m on a Local Area Network and wish to use a DNS server on the LAN, should I enable this? No, that’s not what this option is for. Looking at its usage, it could mean local to this host. (I deliberately avoided combining the words ‘local’ and ‘host’, for reasons below.)
  - Even if it had a clearer name like useLocalhostResolver, the inaccuracy would remain, as getent hosts localhost and getent ahosts localhost both prefer ::1 over 127.0.0.1, not the other way around.
  - What would have happened if I had enabled this setting with a server listening on ::1 and not 127.0.0.1? It would work, but not be the best setting, as applications would first attempt to reach a nameserver on 127.0.0.1, where nothing is listening.
  - It might be confusing to reference a hostname when talking about nameserver reachability, since an IP address is required to reach a nameserver and a nameserver could be required to resolve a hostname. Setting nameserver localhost in /etc/resolv.conf won’t work, I thought.
    - I tried it. Neovim didn’t highlight it as an error[3] and it did not break name resolution, presumably because localhost is added to /etc/hosts and is a special case in nss-myhostname, the existence of which I might not have known had I not set up multicast DNS to resolve .local hostnames in /etc/nsswitch.conf in the past:

      ‘The hostnames “localhost” and “localhost.localdomain” (as well as any hostname ending in “.localhost” or “.localhost.localdomain”) are resolved to the IP addresses 127.0.0.1 and ::1.’

      Why is localhost added to /etc/hosts if it’s handled by nss-myhostname in /etc/nsswitch.conf? The man page of nsswitch.conf (mirror) gives a small clue:

      ‘/etc/nsswitch.conf is used by the GNU C Library and certain other applications’[4]

      Why isn’t even the choice of a name resolution mechanism for localhost unified in 2024?
    - I tried on macOS and iOS. Both allow setting a named nameserver in the GUI, which surprised me. I remember that on earlier versions of Windows (at least on XP, Vista and 7) there were special input boxes that exclusively allowed an IPv4 address, although there was a separate dialogue to input IPv6 addresses. I wonder if that allows hostnames or not, but I don’t have Windows running at the moment to check.
- Setting networking.resolvconf.enable = false doesn’t appear to do.. well.. anything.
  - The ‘Generated by resolvconf’ comment in /etc/resolv.conf remains, as do the previous nameserver entries.
  - /run/current-system/sw/bin/resolvconf is not removed.
  - man resolvconf has content (because it’s a link to man resolvectl, which is part of systemd-resolved, which isn’t even enabled on this system).
  - I was surprised that /etc/resolv.conf was writable at all, as many files under /etc/ are symlinks to their namesakes under /etc/static, which itself is a symlink to a folder in the nix store which contains… more symlinks. Here I use resolvconf.conf as an example, i.e. the configuration of resolvconf, the program that manages resolv.conf.

    lrwxrwxrwx 1 root root 27 May 26 23:14 /etc/resolvconf.conf -> /etc/static/resolvconf.conf
    lrwxrwxrwx 1 root root 51 May 26 23:14 /etc/static -> /nix/store/pm0yi93ak5kcvfmidv5lckzfixrh2gck-etc/etc/
    lrwxrwxrwx 4 root root 63 Jan 1 1970 /nix/store/pm0yi93ak5kcvfmidv5lckzfixrh2gck-etc/etc/resolvconf.conf -> /nix/store/kf0lrhiqqqrc6w96h4qm0sysffnccx2d-etc-resolvconf.conf
    -r--r--r-- 3 root root 518 Jan 1 1970 /nix/store/kf0lrhiqqqrc6w96h4qm0sysffnccx2d-etc-resolvconf.conf

    That’s too much indirection for me. If ‘we can solve any problem by introducing an extra level of indirection’, this suggests that at least three problems have been solved here.

    Isn’t it odd that the symlinks are writable to all? I know that /nix/store is a read-only filesystem, but it looks odd. Upon searching the web for information, I was directed to the coreutils chmod documentation:

    ‘chmod doesn’t change the permissions of symbolic links; the chmod system call cannot change their permissions on most systems, and most systems ignore permissions of symbolic links’

    Most systems? What does this mean? Is it based on the filesystem used? I would assume that it doesn’t mean ‘this is the case on Linux’ given the reference to the system call of the same name and that Linux was not mentioned. The man page for the system call mentions:

    ‘flags can either be 0, or include the following flag:
    AT_SYMLINK_NOFOLLOW
    If pathname is a symbolic link, do not dereference it: instead operate on the link itself. This flag is not currently implemented.’
  - The default value of networking.resolvconf.enable is !(config.environment.etc ? "resolv.conf"), which I understand as ‘if the content of resolv.conf isn’t otherwise assigned’.
  - There are values in networking.nameservers, but these aren’t used as content for resolv.conf, which I thought would have been reasonable.

  What is the point of networking.resolvconf.enable, then? And what about networking.nameservers? Where do its values even go?[5]
- This issue pushed me to drop flakes on the router so that nixos-option, which does not support flakes[6], would help me.
- The Tailscale module adds resolvconf to its path conditionally. The commit adding this condition explains that ‘trying to use [resolvconf] always fails because /etc/resolvconf.conf contains an exit 1’, which sounds perfectly reasonable.
  - If resolvconf weren’t in tailscaled’s path, Tailscale would fall back to overwriting resolv.conf, which I found out about because it is a common enough problem/question to warrant a heading and its own page.

    This document is the most concise and informative clarification of my original issue; the last paragraph tells me everything I needed to know:

    ‘Even if you set --accept-dns=false, Tailscale’s MagicDNS server still replies at 100.100.100.100 (or fd7a:115c:a1e0::53), as long as MagicDNS is enabled on the tailnet. If you’d like to manually configure your DNS configuration, you can point *.ts.net queries at 100.100.100.100.’

    Sadly I didn’t look at this page earlier as Tailscale isn’t the one overwriting /etc/resolv.conf: it would have set the nameserver to be 100.100.100.100 in that case. Its behaviour is reasonable as ‘there are an incredible number of ways to configure DNS on Linux’.
    - This blog post suggests that the upcoming (as of April 2021) Tailscale 1.8 will use/prefer using systemd-resolved to configure the system resolver.
    - It convinced me that systemd-resolved would be the right choice even on a router as the nameserver should depend on the interface. Thanks Xe, I always like your posts!
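The manual option described in the Tailscale page quoted above, pointing *.ts.net queries at 100.100.100.100 and everything else at the usual resolver, is a per-domain routing decision. A minimal sketch of that decision, assuming a hypothetical `pick_upstream` helper and a placeholder default upstream from the documentation address range (neither comes from Tailscale, dnsmasq or kresd):

```python
MAGICDNS = "100.100.100.100"  # from the Tailscale documentation quoted above


def pick_upstream(qname, default_upstream):
    """Choose the nameserver that should answer a query for qname."""
    # DNS names are case-insensitive and may carry a trailing root dot.
    name = qname.rstrip(".").lower()
    if name.endswith(".ts.net"):
        return MAGICDNS
    return default_upstream


# 192.0.2.53 is a placeholder upstream, not an address from my setup.
for query in ("router.my-network.ts.net", "foo.lan", "example.com."):
    print(query, "->", pick_upstream(query, "192.0.2.53"))
```

A forwarding resolver would express the same rule in its own configuration format; the point is only that the tailnet domain gets its own upstream while the rest of the configuration stays untouched.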
This should be the end of my issues now then, right?
What happens when I enable systemd-resolved and disable resolvconf? The hilarity continues:
# resolv.conf(5) file generated by tailscale
# For more info, see https://tailscale.com/s/resolvconf-overwrite
# DO NOT EDIT THIS FILE BY HAND -- CHANGES WILL BE OVERWRITTEN
nameserver 100.100.100.100
search my-network.ts.net lan my-network.ts.net
I expected Tailscale not to overwrite resolv.conf in this scenario, but to configure systemd-resolved instead (adding the tailnet to the search domains without checking whether it is already present is yet another issue). I think tailscaled won a race, possibly caused by NixOS starting the services at the same time, although tailscaled.service has After=systemd-resolved.service. Explicitly restarting systemd-resolved, then tailscaled, did the right thing:
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
# [...]
nameserver 127.0.0.53
options edns0 trust-ad
search lan my-network.ts.net
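Three different programs have written /etc/resolv.conf over the course of this post, and each leaves a recognisable first line. A small sketch that tells them apart, assuming only the three header comments shown above; the function name `resolv_conf_owner` is hypothetical and other generators will simply come back as unknown:

```python
def resolv_conf_owner(text):
    """Guess who wrote a resolv.conf from its leading comment line."""
    # Take the first non-empty line; the generators seen in this post
    # all announce themselves there.
    first = next((line for line in text.splitlines() if line.strip()), "")
    header = first.lower()
    if "generated by tailscale" in header:
        return "tailscale"
    if "systemd-resolved" in header:
        return "systemd-resolved"
    if "generated by resolvconf" in header:
        return "resolvconf"
    return "unknown"


headers = [
    "# Generated by resolvconf",
    "# resolv.conf(5) file generated by tailscale",
    "# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).",
]
for header in headers:
    print(resolv_conf_owner(header))
```

Checking the header before and after restarting a service is a quick way to see who won.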
DNS can be confusing sometimes!
1. Whilst checking to see if yak shaving was the right turn of phrase, Wiktionary suggested ‘when you’re up to your neck in alligators, it’s hard to remember that your initial objective was to drain the swamp’ which would be more fitting, but I first learned of this expression today and don’t think it’s as widely-known.
2. Since the router has a dynamic CGNAT IP address, it’s true that the addresses are changing, but that’s not relevant to dnsmasq given that it was never configured to listen on this interface.
3. vim’s syntax highlighting in /etc/resolv.conf marks nameserver localhost as an error, which is neat, but somewhat inaccurate here, as this does not appear to be invalid.
4. As a Wikipedia editor would ask, which?
5. I searched nixpkgs on GitHub and found it amusing that all the results where it is set correctly are in tests, but the other results show hardcoded definitions.
6. After reading through the issue, the title does not appear to be accurate given that there are workarounds. However, upon loading the page and seeing that the issue has been open since 2020 and has a small scroll bar, indicating many comments, it’s easy to assume that it continues to be an issue.