
Traefik routes intermittently time out in Docker Swarm due to service DNS resolving to stale VIP instead of task IPs #3480

@LastSkywalkerER

Description


Traefik + Docker Swarm DNS Resolution Issue

To Reproduce

  1. Install Dokploy on a single-node Docker Swarm setup.

  2. Deploy an application as a Docker Swarm service (e.g. frontend app listening on port 3000).

  3. Expose the application through Traefik using the service name as backend (default Dokploy behavior):

    http://<service-name>:3000
    
  4. Access the application via a domain routed through Traefik.

  5. Observe intermittent timeouts / 404 errors.

  6. Inside the Traefik container, resolve the service name:

    getent ahostsv4 <service-name>
  7. Notice that Docker DNS returns multiple IPs, including a stale VIP.

  8. Traefik randomly selects one of the returned IPs and may hit the non-routable VIP, causing timeouts.


Current vs. Expected Behavior

Expected Behavior

Traefik should consistently route traffic to a healthy backend container when using a Docker Swarm service as an upstream.

Current Behavior

Traefik intermittently times out because Docker DNS resolves the service name to:

  • a valid task IP (working)
  • a stale service VIP (not routable in this setup)

Traefik may select the VIP, resulting in connection timeouts.


Environment Information

Operating System:
  Debian GNU/Linux 13 (trixie)

Kernel:
  6.17.2-2-pve

Architecture:
  x86_64 / amd64

Docker Engine:
  Docker Engine – Community 28.5.0
  API version: 1.51
  Containerd: v2.2.1
  runc: 1.3.4

Docker mode:
  Docker Swarm active
  Single-node cluster
  Managers: 1
  Nodes: 1
  Node address: 10.202.20.128

Dokploy:
  Image: dokploy/dokploy:latest
  Running as Docker Swarm service
  (single replica on the same node)

Traefik:
  Image: traefik:v3.6.1
  Version: 3.6.1
  Codename: ramequin
  OS/Arch: linux/amd64
  Deployed and managed by Dokploy

Deployment type:
  Applications are deployed on the same server where Dokploy is installed

Application type:
  Frontend application built with Nixpacks
  Served by Caddy web server
  Internal listening port: 3000
  Deployed as a Docker Swarm service

Affected Area(s)

  • Traefik
  • Docker

Deployment Location

Applications are deployed on the same server where Dokploy is installed.


Technical Investigation (Commands & Findings)

  1. Application is healthy inside the container:

    docker exec -it <task-container> curl http://127.0.0.1:3000
    # HTTP/1.1 200 OK
  2. Application is reachable via task IP:

    docker exec -it <task-container> curl http://<task-ip>:3000
    # HTTP/1.1 200 OK
  3. Traefik can reach the task IP directly:

    docker exec -it dokploy-traefik wget http://<task-ip>:3000
    # HTTP/1.1 200 OK
  4. Service name resolves to multiple IPs:

    docker exec -it dokploy-traefik getent ahostsv4 <service-name>

    Example output:

    10.0.1.43   # service VIP (stale / not routable)
    10.0.1.62   # active task IP
    
  5. The VIP does NOT belong to any container:

    docker network inspect dokploy-network | grep 10.0.1.43
    # no output
  6. Switching the service to DNSRR does not remove the VIP from DNS:

    docker service update --endpoint-mode dnsrr <service-name>

    DNS still returns both IPs:

    10.0.1.43
    10.0.1.62
    
  7. Using tasks.<service-name> resolves only real task IPs:

    docker exec -it dokploy-traefik nslookup tasks.<service-name> 127.0.0.11

    Output:

    Address: 10.0.1.62
    
  8. Traefik successfully connects using tasks DNS:

    docker exec -it dokploy-traefik wget http://tasks.<service-name>:3000
    # HTTP/1.1 200 OK
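The comparison in findings 4–5 can be folded into one check that flags any resolved IP with no backing container. A minimal sketch (the `find_stale_ips` helper and the inline IP lists are illustrative; in a live setup the two inputs would come from `getent ahostsv4 <service-name>` and `docker network inspect`):

```shell
#!/bin/sh
# Sketch: print every DNS-resolved IP that no container on the network owns.
# $1: newline-separated IPs returned by DNS for the service name
# $2: newline-separated IPs actually assigned to containers on the network
find_stale_ips() {
  resolved="$1"
  actual="$2"
  printf '%s\n' "$resolved" | while read -r ip; do
    # Report the IP unless it exactly matches a known container IP.
    printf '%s\n' "$actual" | grep -qx "$ip" || echo "$ip"
  done
}

# Example with the IPs from finding 4 above:
find_stale_ips "10.0.1.43
10.0.1.62" "10.0.1.62"
# prints 10.0.1.43 (the stale VIP)
```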

Root Cause

Docker Swarm DNS resolves a service name to both:

  • the service VIP
  • the task IPs

In single-node Swarm setups (especially with published ports), the VIP may be non-functional.
Traefik may randomly select this VIP, leading to intermittent routing failures.
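For context, a Swarm service's endpoint mode is normally chosen at creation time; as finding 6 above shows, switching an existing service to DNSRR left the stale record in place. A command sketch with hypothetical service, network, and image names (note that DNSRR services cannot publish ports through the routing mesh):

```shell
# Hypothetical names: create the service with DNSRR endpoint mode so its
# name resolves directly to task IPs and no VIP is allocated.
docker service create \
  --name frontend \
  --network dokploy-network \
  --endpoint-mode dnsrr \
  myapp:latest
```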


Workaround / Solution

Configure Traefik backends to use task-specific DNS instead of the service name:

http://tasks.<service-name>:<port>

This ensures that Traefik routes traffic only to active task containers and bypasses the stale VIP entirely.
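With Traefik's file provider, this amounts to pointing the load balancer at the `tasks.` hostname. A hedged sketch (the router name and host rule are placeholders; the port and `tasks.` prefix come from the findings above):

```yaml
# Hypothetical Traefik dynamic configuration fragment (file provider).
http:
  routers:
    frontend:
      rule: "Host(`app.example.com`)"   # placeholder domain
      service: frontend
  services:
    frontend:
      loadBalancer:
        servers:
          # tasks.<service-name> resolves only to live task IPs, never the VIP
          - url: "http://tasks.<service-name>:3000"
```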


Affected Components

  • Traefik
  • Docker Swarm service discovery
  • Dokploy auto-generated Traefik configuration

Additional Context

This issue is reproducible on a clean single-node Swarm installation and disappears immediately when switching Traefik upstreams from <service-name> to tasks.<service-name>.


Will You Send a PR to Fix It?

No
