Redundant iSCSI storage for Linux


Here’s how to set up relatively cheap redundant iSCSI storage on Linux. The redundancy is achieved with LVM mirroring, and the storage servers are commodity hardware running the OpenFiler Linux distribution, exposing their disks to clients via iSCSI over Ethernet. The servers are completely separate entities, and the purpose of the mirroring is to keep the logical volumes available even while one of the storage servers is down for maintenance or due to hardware failure.

Ultimately the disks of the iSCSI target servers will show up as normal SCSI disks on the client (/dev/sdb, /dev/sdc, …). The data moves across the network transparently. It is preferable to use multiple gigabit network interface cards on both the initiator and the target, bonded together for reliability and speed (or use Device Mapper Multipath). A separate VLAN for iSCSI traffic is recommended for both security and performance. By default the traffic is not encrypted, so your disk blocks can easily be sniffed with tcpdump.

I created identical logical volumes on both OpenFiler servers and mapped them to iSCSI targets. The iSCSI initiator (client) here is an Ubuntu 9.04 desktop.

Install Open-iSCSI and map targets

On the client, install Open-iSCSI.
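On Ubuntu the package is called open-iscsi, so installing it should be as simple as:

    sudo apt-get install open-iscsi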

Run the discovery to see available targets (the IP address is the address of one of the servers).
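With iscsiadm the sendtargets discovery looks like this (192.168.0.101 is a placeholder; use your first OpenFiler server's address):

    sudo iscsiadm -m discovery -t sendtargets -p 192.168.0.101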

You should get a target list as the output.
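The list has one line per target: portal address and port, the target portal group tag, and the target IQN. With OpenFiler it looks something like this (the IQN below is a made-up placeholder):

    192.168.0.101:3260,1 iqn.2006-01.com.openfiler:tsn.demo1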

Map the target to a SCSI disk.
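Logging in to the target attaches it as a SCSI disk (same placeholder IQN and address as above):

    sudo iscsiadm -m node -T iqn.2006-01.com.openfiler:tsn.demo1 -p 192.168.0.101 --login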

dmesg should now show that a new SCSI disk was detected.
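A quick way to check:

    dmesg | tail
    sudo fdisk -l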

You can now use the disk as a normal SCSI disk.

Discover the second storage server.
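Same command as before, pointed at the second server (again a placeholder address):

    sudo iscsiadm -m discovery -t sendtargets -p 192.168.0.102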

Target found (again, an illustrative placeholder):
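    192.168.0.102:3260,1 iqn.2006-01.com.openfiler:tsn.demo2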

Map the target.
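As before, with the second target's placeholder IQN and address:

    sudo iscsiadm -m node -T iqn.2006-01.com.openfiler:tsn.demo2 -p 192.168.0.102 --login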

Make persistent across reboots

The discovered nodes will automatically show up under /etc/iscsi/nodes. If you wish to make them available automatically after reboot, change the following line in the corresponding node file (the setting in question is node.startup):
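    node.startup = manual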

Change to:
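    node.startup = automatic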

Partition with fdisk (optional)

I partitioned the disks with fdisk. This is optional, but I like to do it because it makes it easier to detect the type of the disk just by checking the partition table.
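A sketch of the idea, assuming the two iSCSI disks appeared as /dev/sdb and /dev/sdc: create one primary partition on each and set its type to 8e (Linux LVM), which is what makes the disk's purpose visible in the partition table.

    sudo fdisk /dev/sdb    # n (new), p (primary), 1, accept defaults, t, 8e, w (write)
    sudo fdisk /dev/sdc    # same steps for the second disk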

The LVM Part

Install Logical Volume Manager.
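On Ubuntu:

    sudo apt-get install lvm2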

Create physical volumes and the volume group.
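Assuming the partitions created above, with an illustrative volume group name:

    sudo pvcreate /dev/sdb1 /dev/sdc1
    sudo vgcreate vg_iscsi /dev/sdb1 /dev/sdc1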

Create a mirrored logical volume.
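With only two physical volumes there is no third device to hold the mirror log, so the log has to be kept in memory. A sketch (size and names illustrative; older lvm2 releases spell the log option --corelog):

    sudo lvcreate -L 10G -m 1 --mirrorlog core -n lv_mirror vg_iscsi

Note that an in-memory log means the mirror is fully resynced after every reboot of the client.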

Create filesystem and mount.
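For example, with ext3 and an arbitrary mount point:

    sudo mkfs.ext3 /dev/vg_iscsi/lv_mirror
    sudo mkdir -p /mnt/mirror
    sudo mount /dev/vg_iscsi/lv_mirror /mnt/mirror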

Speed

Test read speeds.
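A simple sequential read test with dd (block size and count arbitrary):

    sudo dd if=/dev/vg_iscsi/lv_mirror of=/dev/null bs=1M count=1000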

About 10 MB/s is the maximum I can get on this test system, which uses 100 Mbit/s Ethernet; that is close to the theoretical limit of roughly 12.5 MB/s.

On a production system, gigabit is a must (preferably multiple links bonded).

Status of the Mirrored Logical Volume

To check the status of the mirrored logical volume, run the command “lvs”:
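The output looks roughly like this (illustrative, using the volume names from above):

    LV        VG       Attr   LSize  Origin Snap%  Move Log Copy%
    lv_mirror vg_iscsi mwi-ao 10.00G                        100.00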

The Copy% column shows the percentage of synchronized extents; 100% indicates the mirror halves are in sync. While a mirror is out of sync and being resynchronized, the percentage will be lower.

The commands “lvdisplay -m” and “pvdisplay -m” will show you a detailed map of the extents on the physical volumes:

    lvdisplay -m
    pvdisplay -m

Testing for failure

When one of the iSCSI servers was brought down, it took about two minutes before the iSCSI initiator gave up on it. After that, the mounted volume kept working without problems. During the two-minute timeout, I/O stalled and there was noticeable waiting.

After the iSCSI server was brought back up, the other half of the mirror was restored and resynced automatically. In conclusion, I would say my mirrored logical volume can reasonably be called highly available.

It seems that the timeout value can be set in the node configuration file (although I didn’t test it):
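The relevant setting should be replacement_timeout; its open-iscsi default of 120 seconds matches the roughly two minutes observed above:

    node.session.timeo.replacement_timeout = 120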

6 thoughts on “Redundant iSCSI storage for Linux”

  1. Hi,

    How did you end up with the read cache disabled on sdb?

    I'm trying to set up a cluster filesystem on iSCSI, and I can't disable the read cache, so all servers see different data on the disk 🙁

    thanks…

  2. Bill, I have to admit I don’t know. That’s how it was by default. Perhaps a setting somewhere under /etc/iscsi…?

  3. Hi,

    I seem to have the same problem as Bill. I have two nodes sharing the same iSCSI device, and if node1 writes to that device (I used dd), node2 keeps reading from its cache. If I invalidate the cache manually (echo 3 > /proc/sys/vm/drop_caches) on node2, I see the correct content.

    Strangely enough, this only happens on my VMware-based Linux cluster. On bare-metal Linux (same version and same OpenFiler as iSCSI target) this does not happen.

    Any suggestions?

    Many thanks in advance
    Reinhard

  4. Actually, this post describes a single host using dual storage servers for mirroring the same data. So it is storage server high availability, not clustering two nodes with a shared filesystem.

    I guess you really do need to disable caching on the nodes if you are using more than one client node. I don’t really know how to do that – someone wiser could comment on this.

    As a side note, I discovered that GFS works with dual client nodes, but performance is horrible when using LVM mirrored disks underneath (2 GFS nodes, 2 storage servers). I would like to test OCFS2 in this regard, but my test system is down, perhaps permanently, so I probably need to build a new one when I have some time. My guess is that DRBD mirroring is the way to go instead of LVM mirroring.

  5. You absolutely NEED a clustered filesystem to use the same iSCSI disk (LUN) from more than one client. Look for GFS2 or OCFS2. You can create an LVM VG on top of this shared iSCSI disk, but then you need the same protection for the LVM metadata. Look for CLVM.

  6. How does LVM know which of the two iSCSI disks in a mirror is the one with the newest data? I mean, if one of the iSCSI disks gets disconnected and comes back after some time (or a reboot), which disk is mirrored to the other?
