We’ve been having trouble with our SAN server for years, up to and including severe corruption of disk contents. We’ve been using pre-packaged software called DSS, and the problem has persisted through the current release.
Maybe our setup isn’t like that of most other DSS users, and we’re pushing the software closer to its limits – but well within the documented functionality (which is, by the way, quite nice: I’d still recommend DSS to the general user, despite the problems that led to this article). But since the company that created DSS was unable to help us, we’ve taken things into our own hands. They did try, and I don’t want to leave the impression that their support is unresponsive, unfriendly or incompetent – it’s just that the company seems to lack resources in the Fibre Channel area and has resorted to simply re-packaging an open-source component there.
We’re running a set of SLES 11 servers as a Xen cluster, using Fibre Channel storage for virtual disks. In preparation for a future Xen feature (the ability to use virtual FC adapters inside the VMs) we gave each VM a virtual HBA address, using a capability of the Fibre Channel protocol called NPIV. Simply speaking, we create virtual Fibre Channel adapters on our Xen servers, one per VM. The disk resources of each VM are accessed via its virtual HBA, and since the virtual HBA is created when the VM starts (and destroyed after the VM shuts down), only the disks of active VMs are attached to a Xen server, leaving fewer chances to corrupt them accidentally through parallel access.
What happened is that we accidentally corrupted those disks simply by moving VMs between cluster nodes.
After a lot of digging we found out what happens and why: DSS uses code from the SCST open-source project to provide iSCSI and Fibre Channel target support. And the piece of code that provides the target services for our hardware setup (“qla2x00t”) contains a bug – a serious one. We’ve since provided a patch, which is included in the head of development, but not in the 2.2 release that was put out just a few days ago. The target milestone is said to be version 3.0, so be warned: if you run into the following scenario, you’re running into trouble.
From the SCST target’s point of view, NPIV adapters (“vHBAs”) on any initiator are “slots in a table”, one table per physical initiator. The initiator keeps a corresponding table of its own, and when you destroy one vHBA and create another, the new one inherits the “slot” of the former. Typically the new vHBA has a WWPN different from that of the old one, so the two could be distinguished – but without the mentioned patch, the SCST target won’t bother to check!
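To make the failure mode concrete, here is a minimal Python sketch of that slot-table behaviour as I understand it. The class and all names are invented for illustration – this is not SCST code, just a toy model of the lookup logic before and after the patch:

```python
# Toy model (all names invented) of the target's per-initiator slot table.
class Target:
    def __init__(self):
        self.slots = {}   # slot index -> WWPN the target thinks owns the slot
        self.luns = {}    # WWPN -> disk served to that WWPN

    def login_unpatched(self, slot, wwpn):
        # BUG: reuse whatever WWPN already occupies the slot,
        # ignoring the WWPN the initiator actually presents.
        owner = self.slots.setdefault(slot, wwpn)
        return self.luns[owner]

    def login_patched(self, slot, wwpn):
        # FIX: a WWPN mismatch means the old vHBA is gone - rebind the slot.
        if self.slots.get(slot) != wwpn:
            self.slots[slot] = wwpn
        return self.luns[wwpn]

t = Target()
t.luns = {"wwpn-A": "disk-of-A", "wwpn-B": "disk-of-B"}
assert t.login_unpatched(0, "wwpn-A") == "disk-of-A"  # vHBA A logs in
# vHBA A is destroyed; vHBA B is created and inherits slot 0:
assert t.login_unpatched(0, "wwpn-B") == "disk-of-A"  # wrong disk served!
assert t.login_patched(0, "wwpn-B") == "disk-of-B"    # with the check: correct
```

The essential point is the single missing comparison: the unpatched path trusts the slot index alone, while the patched path notices that the WWPN occupying the slot no longer matches the one logging in.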
So without the patched version, you’ll run into one of two nasty situations:
- You destroy vHBA A and create vHBA B on the same physical initiator, without creating vHBA A on another node. From SCST’s point of view, it is still talking to vHBA A, so it serves the virtual disks defined for A’s WWPN to the server – which is running the VM that expects the disks of vHBA B. You’ll get surprising results, just as when you boot a physical machine with the wrong disk installed.
- You destroy vHBA A and create vHBA B on a physical initiator, and create vHBA A on a second node. This is a common case: you migrate a VM from node 1 to node 2 and then start another VM on node 1.
What happens in addition to the first case is that the migrated VM on node 2 gets its own virtual disks – the same disk space that VM B on node 1 is accessing. Two VMs living on one disk: yes, you’re in trouble. This is the *really* nasty case. And we’ve had more than a few of these before we found out what was going on.
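The second scenario can be played through in a few lines. Again, this is only a hypothetical model (the class and names are invented, one slot table per physical port as described above), not actual SCST code:

```python
# Toy model (names invented) of the migration scenario: one slot table
# per physical initiator port, with the unpatched slot reuse.
class Target:
    def __init__(self, luns):
        self.luns = luns      # WWPN -> disk served to that WWPN
        self.tables = {}      # physical port -> {slot index: WWPN}

    def login_unpatched(self, port, slot, wwpn):
        table = self.tables.setdefault(port, {})
        owner = table.setdefault(slot, wwpn)   # BUG: WWPN never re-checked
        return self.luns[owner]

t = Target({"wwpn-A": "disk-of-A", "wwpn-B": "disk-of-B"})
assert t.login_unpatched("node1", 0, "wwpn-A") == "disk-of-A"  # VM A on node 1
# VM A migrates to node 2; VM B then starts on node 1 and reuses slot 0:
assert t.login_unpatched("node2", 0, "wwpn-A") == "disk-of-A"  # node 2: correct
assert t.login_unpatched("node1", 0, "wwpn-B") == "disk-of-A"  # node 1: VM B
# also gets disk-of-A - two running VMs now share the same disk.
```

Node 2’s login is perfectly legitimate, which is what makes this case so insidious: everything looks healthy until the two VMs start overwriting each other’s blocks.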
Unfortunately, the company providing DSS has not switched to the patched version to this day, so we’re on our own. Fortunately, we’re skilled enough to be on our own – and our insistence on open-source software (and I’m not talking about the “free beer” notion) has paid off again. Had it been closed source, we’d have had no chance even to dig down to the root cause.