When Tailscale MagicDNS isn’t

On a router, I have dnsmasq and kresd as DNS servers. Dnsmasq is accessible on the LAN interface and forwards queries to kresd, which is accessible on the loopback interface. This has been working for a long time.

I recently set up Tailscale and was confused as to why MagicDNS wasn’t working on this one device (I have two other NixOS devices that didn’t have any problems). I’m no stranger to investigating these problems, having used and tinkered with networking on Linux and FreeBSD for many years, and if there’s a problem, it’s always DNS.

My first look at /etc/resolv.conf suggested that things should be fine:

# Generated by resolvconf
search my-network.ts.net
nameserver 127.0.0.1
nameserver ::1

‘Why is this even generated by resolvconf?’ I ask myself. This is a router with a static networking configuration and custom upstream nameservers. I’ll investigate that later, I tell myself.

I eventually realised, for two reasons, that it should be using 100.100.100.100:

  1. A working machine’s /etc/resolv.conf contains:

    # Generated by resolvconf
    search my-network.ts.net
    nameserver 100.100.100.100
    options edns0
    
  2. The Tailscale dashboard gives me a hint under the nameservers section:

    Nameservers

    Set the nameservers used by devices on your network to resolve DNS queries. Learn more ↗

    my-network.ts.net ✨MagicDNS
    100.100.100.100
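
To make the difference between the two files concrete, here is a minimal Python sketch that reads resolv.conf text and checks whether any nameserver is a MagicDNS address (100.100.100.100, plus fd7a:115c:a1e0::53 as given in the Tailscale documentation quoted further down). The function names are my own, and the parser only understands the nameserver, search and domain keywords:

```python
# Sketch only: a tiny reader for the "nameserver", "search" and "domain"
# keywords of resolv.conf(5); every other keyword (e.g. "options") is ignored.
MAGICDNS_ADDRESSES = {"100.100.100.100", "fd7a:115c:a1e0::53"}


def parse_resolv_conf(text):
    """Return the nameservers (in order) and search domains found in the text."""
    nameservers, search = [], []
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and comment lines starting with '#' or ';' are skipped.
        if not line or line[0] in "#;":
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            nameservers.append(values[0])
        elif keyword in ("search", "domain"):
            # Per resolv.conf(5), the last search/domain line wins.
            search = values
    return {"nameservers": nameservers, "search": search}


def uses_magicdns(text):
    """True if any configured nameserver is a MagicDNS address."""
    return any(ns in MAGICDNS_ADDRESSES for ns in parse_resolv_conf(text)["nameservers"])


# The router's file and a working machine's file, as shown above.
ROUTER = """# Generated by resolvconf
search my-network.ts.net
nameserver 127.0.0.1
nameserver ::1
"""
WORKING = """# Generated by resolvconf
search my-network.ts.net
nameserver 100.100.100.100
options edns0
"""

print(parse_resolv_conf(ROUTER)["nameservers"])  # ['127.0.0.1', '::1']
print(uses_magicdns(ROUTER), uses_magicdns(WORKING))  # False True
```

Both files carry the same search domain; only the nameserver line differs, which is why the router could expand short names but never asked MagicDNS about them.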

The linked documentation (DNS in Tailscale) doesn’t even mention 100.100.100.100, nor does the documentation for MagicDNS. The address is explained in a blog post, under the heading ‘how MagicDNS works’, but that’s not the first place I’d look.

This led me to try to find what might be responsible for the nameserver 127.0.0.1 setting:

  • I checked for systemd-resolved, but it wasn’t that (the nameserver would have been 127.0.0.53 if it were).
  • I didn’t think it was anything to do with dnsmasq because that’s configured to use the LAN interface, not the loopback.
  • I didn’t think it was kresd either, since another device uses kresd, but does not have the problem. kresd is not listening on a loopback address on the default port 53 on either machine, meaning that if it were doing this, DNS resolution would have been broken on both machines for a long time.
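
The clues used in this list (a header comment, or systemd-resolved’s 127.0.0.53 stub address) can be checked mechanically. Below is a heuristic Python sketch; the function name is hypothetical, and the marker strings are copied from the resolv.conf files shown in this post, so other tools or versions may write different headers:

```python
# Heuristic only: guess which tool wrote a resolv.conf from its comments and
# nameserver lines. Marker strings come from files seen in this post.
STUB_RESOLVER = "127.0.0.53"  # systemd-resolved's local stub listener


def guess_writer(text):
    """Return 'tailscale', 'systemd-resolved', 'resolvconf' or 'unknown'."""
    comments, nameservers = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            comments.append(line.lower())
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            nameservers.append(parts[1])
    header = " ".join(comments)
    if "generated by tailscale" in header:
        return "tailscale"
    if "systemd-resolved" in header or STUB_RESOLVER in nameservers:
        return "systemd-resolved"
    if "generated by resolvconf" in header:
        return "resolvconf"
    return "unknown"


print(guess_writer("# Generated by resolvconf\nnameserver 127.0.0.1\n"))  # resolvconf
print(guess_writer("nameserver 127.0.0.53\n"))  # systemd-resolved
```

On a live machine, passing the contents of /etc/resolv.conf gives a first hint, although a header comment only says who wrote the file last, not who configured the listener behind 127.0.0.1.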

Eventually I stumbled upon something that worked: setting networking.resolvconf.useLocalResolver = false. I then started investigating in order to open an issue with NixOS, but things only got more confusing and I forgot why I was shaving a yak1.
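
Why would an explicit false win over modules that set true? In the NixOS module system each definition carries a priority and only the definitions with the lowest number are kept: lib.mkForce is 50, an ordinary assignment is 100 and lib.mkDefault is 1000. The Python below is a toy model of that rule, not Nix code, and it assumes that the kresd and dnsmasq modules described in the next section both use mkDefault. That assumption fits the kresd module setting a default and an explicit false taking effect without a conflict error, and it also shows why setting resolveLocalQueries = false on its own changed nothing:

```python
# Toy model (Python, not Nix) of NixOS option priorities: only definitions with
# the lowest priority number are kept, and a boolean option then needs them to agree.
MK_FORCE = 50      # lib.mkForce
PLAIN = 100        # an ordinary `option = value;` in configuration.nix
MK_DEFAULT = 1000  # lib.mkDefault


def resolve_bool(definitions):
    """definitions: list of (source, priority, value) triples."""
    if not definitions:
        raise ValueError("no definitions")
    best = min(priority for _, priority, _ in definitions)
    values = {value for _, priority, value in definitions if priority == best}
    if len(values) > 1:
        raise ValueError("conflicting definition values at priority %d" % best)
    return values.pop()


# Assumed: both modules define useLocalResolver with mkDefault.
modules = [
    ("kresd module", MK_DEFAULT, True),
    ("dnsmasq module (resolveLocalQueries = true)", MK_DEFAULT, True),
]
print(resolve_bool(modules))      # True
print(resolve_bool(modules[:1]))  # True: kresd alone still enables it
print(resolve_bool(modules + [("configuration.nix", PLAIN, False)]))  # False
```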


Unexpected behaviours

  1. The NixOS kresd module blindly makes networking.resolvconf.useLocalResolver default to true, because someone ran into resolver loops, and this was accepted on the grounds that it’s ‘good to be consistent’ (with pdns-recursor, for example).

    • I consider this “spooky action at a distance”. I do not think it should do this, but I can see how it could be helpful. In enough cases, though? I’m not sure.
    • In my case, kresd was never even listening on localhost port 53, meaning that this default setting would have led to a broken DNS setup, which would at least have led me to investigate the right thing at that time. (Why didn’t it, then?)
  2. The NixOS dnsmasq module sets networking.resolvconf.useLocalResolver = true if services.dnsmasq.resolveLocalQueries = true. This is at least less spooky and distant, but still faulty.

    1. I had indeed set services.dnsmasq.resolveLocalQueries = true, because I would like to be able to ping foo.lan on the router and it made that possible.

    2. My erroneous assumption was likely that this setting changes the system nameserver to match the listen address of dnsmasq, rather than setting networking.resolvconf.useLocalResolver. Setting resolveLocalQueries = false was one of the first things I tried, and it didn’t make a difference, which was unexpected at the time; it made no difference because kresd was also setting it.

      As if that weren’t enough, services.dnsmasq.resolveLocalQueries also adds 127.0.0.1 to networking.nameservers, which makes things more confusing, and is wrong because I haven’t configured dnsmasq to listen on this address.

    3. dnsmasq’s --interface=<interface name> setting doesn’t work either as I expected or as it is documented.

      -i, --interface=<interface name>
      Listen only on the specified interface(s).

      I had this set to listen only on the LAN interface and I could see it nevertheless listening on *:53 in lsof, rather than the addresses of the LAN interface.

      1. This behaviour is mentioned, under a different option, --bind-interfaces:

        On systems which support it, dnsmasq binds the wildcard address, even when it is listening on only some interfaces. It then discards requests that it shouldn’t reply to. This has the advantage of working even when interfaces come and go and change address. This option forces dnsmasq to really bind only the interfaces it is listening on. About the only time when this is useful is when running another nameserver (or another instance of dnsmasq) on the same machine.

        I had enabled this setting when I set up kresd on the machine, because the last sentence applies to my case and interfaces are not ‘coming and going’2. The naming is weird as it would suggest it works like --interface; namely that --bind-interfaces=<interface name> would be a reasonable use. Alas, it is a boolean flag and takes no value.

      2. Even when --interface and --bind-interfaces are set, dnsmasq decides to ignore my intent and explicitly listen on loopback. This is weird, but, guess what, documented back under --interface:

        Dnsmasq automatically adds the loopback (local) interface to the list of interfaces to use when the --interface option is used.

      This explains why enabling kresd didn’t break things before; dnsmasq was listening on 127.0.0.1, even though I thought I had told it not to.

  3. The option name networking.resolvconf.useLocalResolver and its documentation are unclear.

    Use local DNS server for resolving.

    This sentence adds nothing that the option name doesn’t already say. Local to what? I’m on a Local Area Network and wish to use a DNS server on the LAN; should I enable this? No, that’s not what this option is for. Judging by its usage, it means local to this host. (I deliberately avoided combining the words ‘local’ and ‘host’, for reasons below.)

    • Even with a clearer name like useLocalhostResolver, the inaccuracy would remain: the generated resolv.conf lists 127.0.0.1 before ::1, whereas getent hosts localhost and getent ahosts localhost both prefer ::1 over 127.0.0.1.

    • What would have happened if I had enabled this setting with a server listening on ::1 but not on 127.0.0.1? It would work, but suboptimally, as applications would first try the nameserver at 127.0.0.1, where nothing is listening.

    • It might be confusing to reference a hostname when talking about nameserver reachability, since an IP address is required to reach a nameserver and a nameserver could be required to resolve a hostname. Setting nameserver localhost in /etc/resolv.conf won’t work, I thought.

      • I tried it. Neovim didn’t highlight it as an error3 and it did not break name resolution, presumably because localhost is added to /etc/hosts and is a special case in nss-myhostname, the existence of which I might not have known had I not set up multicast DNS to resolve .local hostnames in /etc/nsswitch.conf in the past:

        The hostnames “localhost” and “localhost.localdomain” (as well as any hostname ending in “.localhost” or “.localhost.localdomain”) are resolved to the IP addresses 127.0.0.1 and ::1.

        Why is localhost added to /etc/hosts if it’s handled by nss-myhostname in /etc/nsswitch.conf?? The man page of nsswitch.conf (mirror) gives a small clue:

      /etc/nsswitch.conf is used by the GNU C Library and certain other applications4

      Why isn’t even the choice of a name resolution mechanism for localhost unified in 2024?

      • I tried on macOS and iOS. Both allow entering a hostname as the nameserver in the GUI, which surprised me. I remember that earlier versions of Windows (at least XP, Vista and 7) had special input boxes that accepted only an IPv4 address, with a separate dialogue for IPv6 addresses. I wonder whether the IPv6 dialogue accepts hostnames, but I don’t have Windows running at the moment to check.
  4. Setting networking.resolvconf.enable = false doesn’t appear to do... well... anything.

    • The ‘Generated by resolvconf’ comment in /etc/resolv.conf remains, as do the previous nameserver entries.

    • /run/current-system/sw/bin/resolvconf is not removed.

    • man resolvconf has content (because it’s a link to man resolvectl, which is part of systemd-resolved, which isn’t even enabled on this system)

    • I was surprised that /etc/resolv.conf was writable at all, as many files under /etc/ are symlinks to their namesakes under /etc/static, which itself is a symlink to a folder in the nix store which contains… more symlinks. Here I use resolvconf.conf as an example, i.e. the configuration of resolvconf, the program that manages resolv.conf.

      lrwxrwxrwx 1 root root  27 May 26 23:14 /etc/resolvconf.conf -> /etc/static/resolvconf.conf
      lrwxrwxrwx 1 root root  51 May 26 23:14 /etc/static -> /nix/store/pm0yi93ak5kcvfmidv5lckzfixrh2gck-etc/etc/
      lrwxrwxrwx 4 root root  63 Jan  1  1970 /nix/store/pm0yi93ak5kcvfmidv5lckzfixrh2gck-etc/etc/resolvconf.conf -> /nix/store/kf0lrhiqqqrc6w96h4qm0sysffnccx2d-etc-resolvconf.conf
      -r--r--r-- 3 root root 518 Jan  1  1970 /nix/store/kf0lrhiqqqrc6w96h4qm0sysffnccx2d-etc-resolvconf.conf
      

      That’s too much indirection for me. If ‘we can solve any problem by introducing an extra level of indirection’, this suggests that at least three problems have been solved here.

      Isn’t it odd that the symlinks are writable by everyone? I know that /nix/store is a read-only filesystem, but it looks odd. When I searched the web for information, I was directed to the coreutils chmod documentation:

      chmod doesn’t change the permissions of symbolic links; the chmod system call cannot change their permissions on most systems, and most systems ignore permissions of symbolic links

      Most systems? What does this mean? Does it depend on the filesystem used? I would assume that it doesn’t mean ‘this is the case on Linux’, given the reference to the system call of the same name and that Linux is not mentioned. The man page for the system call mentions:

      flags can either be 0, or include the following flag:

        AT_SYMLINK_NOFOLLOW
        If pathname is a symbolic link, do not dereference it: instead operate on the link itself. This flag is not currently implemented.

    • The default value of networking.resolvconf.enable is !(config.environment.etc ? "resolv.conf"), which I understand as ‘enabled unless the content of resolv.conf is set elsewhere’.
    • There are values in networking.nameservers, but these aren’t used as content for resolv.conf, which I thought would have been reasonable.

    What is the point of networking.resolvconf.enable, then? And what about networking.nameservers? Where do its values even go?5

  5. This issue pushed me to drop flakes on the router so that I could use nixos-option, which does not support flakes6.

  6. The Tailscale module adds resolvconf to its path conditionally. The commit adding this condition explains that ‘trying to use [resolvconf] always fails because /etc/resolvconf.conf contains an exit 1’, which sounds perfectly reasonable.

    • If resolvconf weren’t in tailscaled’s path, Tailscale would fall back to overwriting resolv.conf. I found this out because it is a common enough problem to warrant its own heading and page.

      This document is the most concise and informative clarification of my original issue; the last paragraph tells me everything I needed to know:

      Even if you set --accept-dns=false, Tailscale’s MagicDNS server still replies at 100.100.100.100 (or fd7a:115c:a1e0::53), as long as MagicDNS is enabled on the tailnet. If you’d like to manually configure your DNS configuration, you can point *.ts.net queries at 100.100.100.100.

      Sadly, I didn’t look at this page earlier because Tailscale wasn’t the one overwriting /etc/resolv.conf; had it been, the nameserver would have been 100.100.100.100. Its behaviour is reasonable, as ‘there are an incredible number of ways to configure DNS on Linux’.

      • This blog post suggests that the upcoming (as of April 2021) Tailscale 1.8 will use, or prefer, systemd-resolved to configure the system resolver.
      • It convinced me that systemd-resolved would be the right choice even on a router as the nameserver should depend on the interface. Thanks Xe, I always like your posts!
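
Tailscale’s advice to point *.ts.net queries at 100.100.100.100 boils down to choosing an upstream resolver by domain suffix. Here is a toy Python sketch of that decision; the routing table and the local resolver address are hypothetical, and this is not how Tailscale or systemd-resolved implement it. In dnsmasq, a similar rule can be written as server=/ts.net/100.100.100.100.

```python
# Toy split-DNS decision: the routing table and LOCAL_RESOLVER are hypothetical.
MAGICDNS = "100.100.100.100"
LOCAL_RESOLVER = "127.0.0.1"  # hypothetical: wherever the LAN resolver answers

ROUTES = {"ts.net": MAGICDNS}  # DNS suffix -> upstream resolver


def pick_upstream(name, routes=ROUTES, default=LOCAL_RESOLVER):
    """Return the upstream for `name`, preferring the longest matching suffix."""
    labels = name.rstrip(".").lower().split(".")
    # Try whole-label suffixes from longest to shortest:
    # host.my-network.ts.net, my-network.ts.net, ts.net, net.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in routes:
            return routes[suffix]
    return default


print(pick_upstream("router.my-network.ts.net"))  # 100.100.100.100
print(pick_upstream("foo.lan"))                   # 127.0.0.1
print(pick_upstream("notts.net"))                 # 127.0.0.1 (whole labels only)
```

Matching on whole labels rather than raw string suffixes keeps a name like notts.net from being sent to MagicDNS by accident.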

This should be the end of my issues now then, right?

What happens when I enable systemd-resolved and disable resolvconf? The hilarity continues:

# resolv.conf(5) file generated by tailscale
# For more info, see https://tailscale.com/s/resolvconf-overwrite
# DO NOT EDIT THIS FILE BY HAND -- CHANGES WILL BE OVERWRITTEN
nameserver 100.100.100.100
search my-network.ts.net lan my-network.ts.net

I expected Tailscale not to overwrite resolv.conf in this scenario, but to configure systemd-resolved instead (adding the tailnet to the search domains without checking whether it is already there is yet another issue). I suspect a race condition that tailscaled won, perhaps caused by NixOS starting both services at the same time, although tailscaled.service has After=systemd-resolved.service. Restarting systemd-resolved and then tailscaled explicitly did the right thing:

# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
# [...]
nameserver 127.0.0.53
options edns0 trust-ad
search lan my-network.ts.net
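
The duplicated search line in the Tailscale-generated file (search my-network.ts.net lan my-network.ts.net) is the kind of thing an order-preserving membership check avoids. A minimal Python sketch, not Tailscale’s code; comparing names case-insensitively and ignoring a trailing dot is my own assumption about what counts as ‘the same’ domain:

```python
def dedupe_search_domains(domains):
    """Drop repeated search domains, keeping the first occurrence of each."""
    seen = set()
    result = []
    for domain in domains:
        # Compare case-insensitively and ignore a trailing dot.
        key = domain.rstrip(".").lower()
        if key not in seen:
            seen.add(key)
            result.append(domain)
    return result


line = "search my-network.ts.net lan my-network.ts.net"
print("search " + " ".join(dedupe_search_domains(line.split()[1:])))
# search my-network.ts.net lan
```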

DNS can be confusing sometimes!


  1. Whilst checking whether yak shaving was the right turn of phrase, I found that Wiktionary suggests ‘when you’re up to your neck in alligators, it’s hard to remember that your initial objective was to drain the swamp’, which would be more fitting, but I first learned of this expression today and don’t think it’s as widely known.

  2. Since the router has a dynamic public CGNAT IP address, it’s true that the addresses are changing, but that’s not relevant to dnsmasq, given that it was never configured to listen on this interface.

  3. Vim’s syntax highlighting in /etc/resolv.conf marks nameserver localhost as an error, which is neat, but somewhat inaccurate here, as the line does not appear to be invalid.

  4. As a Wikipedia editor would ask: which?

  5. I searched nixpkgs on GitHub and found it amusing that all the results where networking.nameservers is set correctly are in tests, while the other results show hardcoded definitions.

  6. Having read through the issue, I don’t think the title is accurate, given that there are workarounds. However, on loading the page and seeing that the issue has been open since 2020 and has a small scroll bar (indicating many comments), it’s easy to assume that it continues to be an issue.
