As I might have mentioned before, we’re using OpenWrt-based routers to connect satellite offices and road warriors to our head office.
Our OpenWrt routers are at the “Backfire” level, with 2.6 kernels and strongSwan 4.5.
With dynamic addresses assigned to the satellite routers, we went for a simple hub-and-spoke setup with “local exits”: users at the remote offices reach general Internet services via local DMZ servers, rather than routing everything through the central office first. Access to central services was implemented with one tunnel per target IP subnet. This led to a lot of tunnels and a rather lengthy setup, especially since we have established a mechanism that updates the firewall rules per satellite, at both the remote and the central end of each tunnel, whenever a tunnel is brought up or down.
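For readers wondering what such a per-tunnel firewall hook can look like: strongSwan calls an “updown” script whenever a connection changes state (a script named via leftupdown= in ipsec.conf is called instead of the default ipsec _updown), passing details in environment variables such as PLUTO_VERB, PLUTO_MY_CLIENT and PLUTO_PEER_CLIENT. The following is a minimal sketch, not our actual script: the function names, the DRY_RUN switch and the iptables FORWARD rules are assumptions made purely for illustration.

```shell
#!/bin/sh
# Hypothetical per-tunnel firewall hook, referenced via "leftupdown=" in
# ipsec.conf. strongSwan passes PLUTO_VERB, PLUTO_MY_CLIENT and
# PLUTO_PEER_CLIENT to updown scripts; the iptables rules are illustrative.

# Run a command, or only print it when DRY_RUN=1 (handy for testing).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 is the updown verb, e.g. "up-client" or "down-client".
fw_hook() {
    case "$1" in
        up-client)   action=-I ;;   # tunnel came up: insert the rules
        down-client) action=-D ;;   # tunnel went down: delete the same rules
        *)           return 0 ;;    # ignore all other verbs
    esac
    # Permit forwarding between the local and the remote subnet of this tunnel.
    run iptables "$action" FORWARD -s "$PLUTO_MY_CLIENT" -d "$PLUTO_PEER_CLIENT" -j ACCEPT
    run iptables "$action" FORWARD -s "$PLUTO_PEER_CLIENT" -d "$PLUTO_MY_CLIENT" -j ACCEPT
}

fw_hook "${PLUTO_VERB:-}"
```

Using the same argument list for insertion (-I) and deletion (-D) ensures that bringing a tunnel down removes exactly the rules its “up” event created.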
In a recent attempt to optimize and simplify our configuration, we decided to set up single tunnels in a sort of “catch-all” manner: instead of tunneling per subnet (e.g. “10.99.1.0 to 10.98.1.0” plus “10.99.1.0 to 10.98.2.0”), we created one tunnel from, e.g., 10.99.1.0/24 to 10.0.0.0/8 and disabled the old tunnel definitions.
Changing and activating the tunnel definitions on the central VPN router had been done earlier and caused no problems. We mirrored the change in the satellite’s ipsec.conf, restarted IPsec… and lost access to the satellite router.
We were testing this sitting in front of one of the satellite VPN routers, so that in case of problems we could access and fix the router directly. But unexpectedly, we found no way to log in to the box :(.
We could not ping the router, could not telnet to it, and there were no ARP replies from it: nothing. Judging by the router LEDs, everything looked good. A reboot didn’t help either: the router would answer ICMP echo requests for a few seconds, then fall silent again. The timing indicated that the problem occurred as soon as the IPsec tunnel was brought up.
Surprisingly, we could access the router from our central office, via the IPsec tunnel, which was obviously established successfully.
The router’s standard diagnostic output gave no indication of the problem’s source: both the LAN and the WAN interface were up and seemed to be functional, “netstat -rn” showed no unusual routing table entries, and the tunnel was active as configured.
A packet capture on the satellite router gave us the proper hint: while all traffic was received on the LAN interface as expected, no traffic was sent out. Since the standard routing table was set up properly (a default route via the WAN interface and a route for the local subnet), we had a look at the kernel’s IPsec policies (“ip xfrm policy”) and saw three entries for the IPsec tunnel (in particular stating that everything going to 10.0.0.0/8 is to be sent through the tunnel), but none for the local subnet.
This is due to a change in the way IPsec traffic is handled, introduced with the native IPsec stack of the 2.6 kernel: the IPsec policies act on traffic independently of the routing table entries. Up to now we had no need to deal with this, as our tunnels were all “per-subnet”. Now that we have created a “super-net connection” whose destination (10.0.0.0/8) also covers the local subnet (10.99.1.0/24), we have to shunt the local traffic past the tunnel, similar to adding more specific routes to a standard Linux routing table when a more general route points to the wrong router.
The strongSwan documentation has an example for configuring shunt policies, but for one reason or another we weren’t able to get its “local-net” connection to apply. So we chickened out and used our existing scripts to add the proper policies once the IPsec tunnel is established:
ip xfrm policy add src 10.99.1.0/24 dst 10.99.1.0/24 proto any dir in ptype main priority 1500
ip xfrm policy add src 10.99.1.0/24 dst 10.99.1.0/24 proto any dir out ptype main priority 1500
ip xfrm policy add src 10.99.1.0/24 dst 10.99.1.0/24 proto any dir fwd ptype main priority 1500
These commands add three policies for local traffic (in, out and fwd) with a higher priority than the policies added by strongSwan, which in XFRM terms means a lower numeric priority value. As they carry no IPsec template, matching packets simply bypass the tunnel.
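To illustrate why the additional policies win, here is a deliberately simplified model of the selection rule for outbound traffic (a sketch, not how the kernel implements it): among all policies whose source and destination selectors match a packet, the one with the lowest numeric priority value is applied. The tunnel priority of 2000 is a made-up placeholder, since the actual value strongSwan assigns depends on its version and the selectors; the router’s LAN address 10.99.1.1 is likewise assumed. The model relies on 64-bit shell arithmetic.

```shell
#!/bin/sh
# Simplified model of IPsec policy selection: the matching policy with the
# lowest numeric priority wins. Not the kernel's actual implementation.

# Convert a dotted-quad IPv4 address to an integer.
ip2int() {
    a=${1%%.*}; rest=${1#*.}
    b=${rest%%.*}; rest=${rest#*.}
    c=${rest%%.*}; d=${rest#*.}
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# in_net ADDR NET/PLEN: succeeds if ADDR lies within NET/PLEN.
in_net() {
    addr=$(ip2int "$1")
    net=$(ip2int "${2%/*}")
    plen=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Policy table, one per line: name, source selector, destination selector,
# priority. Priority 2000 is a hypothetical stand-in for strongSwan's value.
POLICIES='tunnel 10.99.1.0/24 10.0.0.0/8 2000'

# lookup SRC DST: print the winning policy name, or "none" (plain routing).
lookup() {
    best=none; best_prio=
    while read -r name psrc pdst prio; do
        [ -n "$name" ] || continue
        in_net "$1" "$psrc" && in_net "$2" "$pdst" || continue
        if [ -z "$best_prio" ] || [ "$prio" -lt "$best_prio" ]; then
            best=$name; best_prio=$prio
        fi
    done <<EOF
$POLICIES
EOF
    echo "$best"
}

# Router (assumed 10.99.1.1) answering a local client:
echo "without shunt: $(lookup 10.99.1.1 10.99.1.23)"   # prints: without shunt: tunnel

# "shunt" stands for a policy without IPsec template, as added above.
POLICIES="$POLICIES
shunt 10.99.1.0/24 10.99.1.0/24 1500"
echo "with shunt:    $(lookup 10.99.1.1 10.99.1.23)"   # prints: with shunt:    shunt
```

Traffic to the central office (say, 10.98.1.5) still matches only the tunnel policy, so the shunt leaves the intended tunneling untouched.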
Once these policies were in place, we could again reach our router from the local subnet for things like DHCP and DNS.
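Since the shunt policies are only needed while the catch-all tunnel is up, they fit naturally into the same updown mechanism we use for the firewall rules. A minimal sketch, assuming a hook that is called with strongSwan’s PLUTO_VERB and PLUTO_MY_CLIENT variables; the function names and the DRY_RUN switch are hypothetical, and the priority of 1500 is simply the value used in the commands above.

```shell
#!/bin/sh
# Hypothetical updown addition: keep local traffic out of the catch-all tunnel.
# LOCAL_NET defaults to the tunnel's local subnet as exported by strongSwan.
LOCAL_NET=${PLUTO_MY_CLIENT:-10.99.1.0/24}
PRIO=1500   # must be numerically lower than strongSwan's own policy priorities

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: "add" to install the shunt policies, "delete" to remove them.
shunt_policies() {
    for dir in in out fwd; do
        if [ "$1" = add ]; then
            run ip xfrm policy add src "$LOCAL_NET" dst "$LOCAL_NET" \
                proto any dir "$dir" ptype main priority "$PRIO"
        else
            # Deletion identifies a policy by selector, direction and type.
            run ip xfrm policy delete src "$LOCAL_NET" dst "$LOCAL_NET" \
                proto any dir "$dir" ptype main
        fi
    done
}

case "${PLUTO_VERB:-}" in
    up-client)   shunt_policies add ;;
    down-client) shunt_policies delete ;;
esac
```

Using “ip xfrm policy update” instead of “add” would make re-running the hook harmless if the policies already exist.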