Trying to use Linphone for video calls

We’re playing with a number of video-capable VoIP solutions for Linux, and linphone was one of them.

We have an openSUSE 12.2 test system with a webcam and installed the version of Linphone shipped with the distribution (version 3.5.2). Registering it with our Asterisk server was a piece of cake; we then activated “video” and “self-image” in the Linphone drop-down menu and gave it a test drive.

But unfortunately, we got no video. Not on our end, not at the remote end. We checked the webcam configuration, the rest of the system configuration, the Asterisk server, the remote client and even the watering status of the office plants: all to no avail. Although “video” was activated and all other applications could access the device, we could not get any video call established.

Following a faint idea, we checked for debugging options within Linphone and found a command-line switch to activate verbose output.

And there it was, on the unbelievable 228th line of the output:

> linphone --verbose
[...]
linphone-warning : This version of linphone was built without video support.

The same, by the way, applies to the Packman version of this package.

Searching the net reveals the following source code segment, which clearly shows that this is some sort of compile-time option:

#ifndef VIDEO_ENABLED
      if (vcap_enabled || display_enabled)
            ms_warning("This version of linphone was built without video support.");
#endif

I’m not blaming the Linphone developers for the openSUSE or the Packman version being built without video support – this is something the team simply cannot influence.

I’m blaming the Linphone developers for not using the same compile-time switch to decide whether to show or hide the menu entries that seemingly let the user enable video support. At the very least, the user should get a message stating that while the menu entry can be toggled, video support has been disabled at compile time.
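
Just to illustrate the point: this is not actual Linphone code, merely a sketch of how the same compile-time switch could also guard the GTK menu entries (the function and widget names are made up):

#include <gtk/gtk.h>

/* Sketch only: in builds without video support, hide the menu entries
 * instead of offering an option that cannot work. The widget names
 * are placeholders, not the real Linphone widget names. */
static void setup_video_menu(GtkWidget *video_menu_item, GtkWidget *selfview_menu_item)
{
#ifndef VIDEO_ENABLED
      gtk_widget_hide(video_menu_item);
      gtk_widget_hide(selfview_menu_item);
#else
      gtk_widget_show(video_menu_item);
      gtk_widget_show(selfview_menu_item);
#endif
}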

The way it is currently implemented has been a real waste of time.

OpenWrt, strongSwan and policy-based routing

As I might have mentioned before, we’re using OpenWrt-based routers to connect satellite offices and road warriors to our head office.

Our OpenWrt routers are at the “Backfire” level, with 2.6 kernels and strongSwan 4.5.

With dynamic addresses assigned to the satellite routers, we went for a simple hub-and-spoke setup with “local exits”, meaning that users at the remote offices access other Internet services via local DMZ servers (rather than routing everything through the central office first). Access to central services was implemented via one tunnel per target IP subnet. This led to a lot of tunnels and a rather lengthy setup, especially since we have established a mechanism that updates firewall rules per satellite, at both the remote and the central end of each tunnel, whenever a tunnel goes up or is brought down.

In a recent attempt to optimize and simplify the configuration, we decided to set up single tunnels in a sort of “catch-all” manner – instead of tunneling per subnet (e.g. “10.99.1.0 to 10.98.1.0” plus “10.99.1.0 to 10.98.2.0”), we created a single tunnel from 10.99.1.0/24 to 10.0.0.0/8 and disabled the old tunnel definitions.
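
On the satellite side, the ipsec.conf definition of such a catch-all connection looks roughly like this (a sketch: connection name, gateway address and authentication settings are placeholders; only the subnets are the ones from the example above):

conn central-catchall
      left=%defaultroute
      leftsubnet=10.99.1.0/24
      right=vpn.example.com
      rightsubnet=10.0.0.0/8
      auto=start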

Changing and activating the tunnel definitions on the central VPN router had been done earlier and caused no problems. We mirrored the change to the satellite’s ipsec.conf and restarted IPsec… and lost access to the satellite router.

We were testing this while sitting in front of one of the satellite VPN routers, so that, in case of problems, we would be able to access and fix the satellite router directly. But unexpectedly, we found no way to log in to the box :(.

We could not ping the router, could not telnet to it, there were no ARP replies from it, nothing. Judging by the router LEDs, everything looked good. A reboot of the router didn’t help: it would answer ICMP echo requests for a few seconds, then again: silence. The timing indicated that the problem occurred as soon as the IPsec tunnel was brought up.

Surprisingly, we could access the router from our central office, via the IPsec tunnel, which was obviously established successfully.

Looking at the router’s standard diagnostic output gave no indication of the problem’s source: both the LAN and the WAN interface were up and seemed to be functional, “netstat -rn” showed no unusual routing table entries, and the tunnel was active as configured.

A network dump (taken on the satellite router) provided the proper hint: while all traffic was received on the LAN interface as expected, no traffic was sent out. Since the standard routing table was set up properly (a default route via the WAN interface and a route for the local subnet), we had a look at the policy-based routing (“ip xfrm policy”) and saw three entries for the IPsec tunnel (stating, among other things, that everything going to 10.0.0.0/8 is to be routed via the IPsec tunnel), but none for the local subnet. And since our own LAN, 10.99.1.0/24, lies within 10.0.0.0/8, even replies to local hosts were caught by the tunnel policy.
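
For such a tunnel, the policies installed by strongSwan look roughly like this in the “ip xfrm policy” output (addresses, priorities and reqids below are illustrative, not copied from the affected box):

src 10.99.1.0/24 dst 10.0.0.0/8
      dir out priority 2344
      tmpl src 192.0.2.10 dst 198.51.100.1
            proto esp reqid 1 mode tunnel
src 10.0.0.0/8 dst 10.99.1.0/24
      dir fwd priority 2344
      tmpl src 198.51.100.1 dst 192.0.2.10
            proto esp reqid 1 mode tunnel
src 10.0.0.0/8 dst 10.99.1.0/24
      dir in priority 2344
      tmpl src 198.51.100.1 dst 192.0.2.10
            proto esp reqid 1 mode tunnel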

This is a consequence of the way IPsec traffic has been handled since kernel 2.6 (the xfrm framework). Until now, we had no need to deal with it, as our tunnels were all “per-subnet”. Now that we have created a “super-net” connection, we have to shunt the local traffic past the tunnel, similar to creating more specific subnet entries in a standard Linux routing table when a more general rule points to the wrong router.

The strongSwan pages have an example for configuring shunt policies, but for one reason or another we weren’t able to get the “local-net” connection to apply (see the sketch at the end of this post). So we decided to chicken out and used our existing scripts to add the proper rules once the IPsec tunnel is established:

ip xfrm policy add src 10.99.1.0/24 dst 10.99.1.0/24 proto any dir in ptype main priority 1500
ip xfrm policy add src 10.99.1.0/24 dst 10.99.1.0/24 proto any dir out ptype main priority 1500
ip xfrm policy add src 10.99.1.0/24 dst 10.99.1.0/24 proto any dir fwd ptype main priority 1500

Basically, we’re adding three policies for local traffic (in, out and forwarding) with higher priority than the rules added by strongSwan (which means “a lower numeric priority value”).

Once these policies were in place, we could again reach our router from the local subnet for things like DHCP and DNS.
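
For the record, the “local-net” shunt connection would look roughly like this in ipsec.conf (a sketch following the strongSwan documentation; this is the variant we could not get to apply in our setup):

conn local-net
      leftsubnet=10.99.1.0/24
      rightsubnet=10.99.1.0/24
      authby=never
      type=passthrough
      auto=route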

A quick work-around for the NFS mount bug

Sometimes, we’re stuck with older systems. Not because we like ’em, but because our customers need to keep an old version of something for one reason or another. And as a side-effect of this, we’re stuck with old bugs.

One of these oldies is the “NFS page allocation failure” bug in Linux, which manifests as failing mount commands and a syslog message like:

mount.nfs: page allocation failure. order:4, mode:0xd0

We run into this quite often on an openSUSE 11.4 system, but have seen it on a more recent openSUSE 12.1, too. On the 11.4 machine the problem usually persists – once we’re hit, we can no longer mount any NFS file systems.

There are two typical scenarios in which you may expect this error to occur: one is with auto-mounted file systems, where you simply cannot access the requested file(s) because the mount fails (silently, from the user’s point of view); the other is direct invocation of the “mount” command on the command line or within some shell script.

By pure chance I’ve come across something that can help as a work-around if you’re struck by this bug during manual or script invocation: issuing a simple “umount” fixes the situation quite regularly for me!

So when your NFS mount fails, do an umount and retry the mount command. While the umount will of course fail (there’s no mounted filesystem, since the original mount failed), the following mount succeeds:

# mount nfserver:/path/to/share /your/mount/point
[insert your favorite error message here]
# umount /your/mount/point
umount: /your/mount/point: not mounted
# mount nfserver:/path/to/share /your/mount/point
#

Of course, this also works if your mount details are hidden within /etc/fstab and you’re running “mount /your/mount/point” without explicitly mentioning the NFS server.
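
For scripted mounts, the whole work-around can be wrapped in a small helper, roughly along these lines (a sketch; server and mount point in the example invocation are placeholders):

# try the mount; on failure issue the (expectedly failing) umount and retry once
nfs_mount_with_retry() {
      mount "$1" "$2" && return 0
      umount "$2" 2>/dev/null
      mount "$1" "$2"
}

# example invocation:
# nfs_mount_with_retry nfserver:/path/to/share /your/mount/point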

First of all… your mileage may vary. (If it does, please leave a comment and let me know!)

But secondly, and this is what puzzles me most: detailed discussions of this bug usually point out that “mount.nfs” is doing an order-4 allocation, which is considered too much. But if mount.nfs is the source of the trouble, how does invoking “umount” cure it? Remember, the umount is issued against an unmounted directory; there is no indication to the system that this is to be the location of a future NFS mount.

And thirdly, all of this of course doesn’t really help in the auto-mounter scenario. I don’t have enough “sample data” to determine whether the single “umount” cures the problem until the next reboot, or whether the failure may recur at any time afterwards. I don’t believe that an early “umount” (issued before failures happen) will be of any help, and I haven’t tested whether *any* umount will do or whether you have to try to un-mount the exact (previously) failing mount point. So if you’re using autofs to mount some user directories from a NAS and see the failure messages in syslog, you’ll be better off looking for a true solution to this problem.

Speaking of solutions: it has been suggested that setting the Linux kernel compile-time option “CONFIG_NFS_USE_NEW_IDMAPPER” will fix this, and I’ve seen reports that kernel version 3.3.0 has this issue fixed. Whether that helps, I cannot tell, but I’ve never seen the problem with a 3.4.x kernel, so there’s hope.
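
If you want to check whether your running kernel was built with that option, something along these lines should work (which of the two files exists depends on the distribution and kernel configuration):

# /proc/config.gz is only there if the kernel was built with CONFIG_IKCONFIG_PROC
zgrep NFS_USE_NEW_IDMAPPER /proc/config.gz
# openSUSE ships the kernel configuration under /boot
grep NFS_USE_NEW_IDMAPPER /boot/config-$(uname -r)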

Finally, I’d really like to hear from you whether issuing the “umount” helped. If you’re reading this because you ran into the problem and have a chance to test my suggestion, please leave a comment and tell me if it worked. After all, I wouldn’t want to bark up the wrong tree.
