    Avoid deleted iSCSI LUNs in the kernel · 6d23d8ed
    Ben Swartzlander authored
    This change ensures that iSCSI block devices are deleted after
    unmounting, and implements scanning of individual LUNs rather
    than scanning the whole iSCSI bus.
    
    In cases where an iSCSI bus is in use by more than one attachment,
    detaching used to leave behind phantom block devices, which could
    cause I/O errors, long timeouts, or even data corruption if the
    underlying LUN number was recycled. This change makes
    sure to flush references to the block devices after unmounting.
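
A minimal sketch of the cleanup step described above, assuming the standard Linux sysfs interface in which writing "1" to /sys/block/<name>/device/delete removes a SCSI device from the kernel. The function names and the sysRoot parameter are hypothetical and are not taken from this commit; the actual change may also flush buffers or handle multipath devices, which this sketch does not do.

```go
package main

import (
	"fmt"
	"os"
	"path"
)

// deleteFilePath returns the sysfs file that removes a SCSI block device
// from the kernel when "1" is written to it.
func deleteFilePath(sysRoot, deviceName string) string {
	return path.Join(sysRoot, "block", deviceName, "device", "delete")
}

// removeBlockDevice drops the kernel's reference to devicePath (for example
// "/dev/sdb"). It should only be called after the filesystem is unmounted.
// sysRoot is "/sys" on a real node; it is a parameter (an assumption of this
// sketch) so the function can be pointed at a test directory.
func removeBlockDevice(sysRoot, devicePath string) error {
	name := path.Base(devicePath)
	if name == "." || name == "/" {
		return fmt.Errorf("invalid device path %q", devicePath)
	}
	// Open without O_CREATE: if the device is already gone, report an error
	// instead of creating a stray file.
	f, err := os.OpenFile(deleteFilePath(sysRoot, name), os.O_WRONLY, 0)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteString("1")
	return err
}

func main() {
	// Show which sysfs file would be written for /dev/sdb.
	fmt.Println(deleteFilePath("/sys", "sdb"))
}
```

Removing the device explicitly, rather than waiting for an iSCSI logout, means the cleanup does not depend on other attachments that share the same session.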
    
    The original iSCSI code scanned the whole target every time a LUN
    was attached. On storage controllers that export multiple LUNs on
    the same target IQN, this led to a situation where nodes would
    see SCSI disks that they weren't supposed to -- possibly dozens or
    hundreds of extra SCSI disks. This caused 3 significant problems:
    
    1) The large number of disks wasted resources on the node and
    caused a minor drag on performance.
    2) The scanning of all the devices caused a huge number of uevents
    from the kernel, causing udev to bog down for multiple minutes in
    some cases, triggering timeouts and other transient failures.
    3) Because Kubernetes was not tracking all the "extra" LUNs that
    got discovered, they would not get cleaned up until the last LUN
    on a particular target was detached, causing a logout. This led
    to significant complications:
    
    In the time window between when a LUN was unintentionally scanned,
    and when it was removed due to a logout, if it was deleted on the
    backend, a phantom reference remained on the node. In the best
    case, the phantom LUN would cause I/O errors and timeouts in the
    udev system. In the worst case, the backend could reuse the LUN
    number for a new volume, and if that new volume were to be
    scheduled to a pod with a phantom reference to the old LUN by the
    same number, the initiator could get confused and possibly corrupt
    data on that volume.
    
    To avoid these problems, the new implementation only scans for
    the specific LUN number it expects to see. It's worth noting that
    the default behavior of iscsiadm is to automatically scan the
    whole bus on login. That behavior can be disabled by setting
        node.session.scan = manual
    in iscsid.conf, and for the reasons mentioned above, it is
    strongly recommended to set that option. This change still works
    regardless of the setting in iscsid.conf, and while automatic
    scanning will cause some problems, this change doesn't make the
    problems any worse, and can make things better in some cases.
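
Scanning for a single LUN can be sketched as follows, assuming the Linux sysfs interface in which a SCSI host's scan file accepts a "<channel> <target> <lun>" triple and "-" acts as a wildcard (so "- - -" rescans everything). The function names, the sysRoot parameter, and the example host and LUN numbers are hypothetical; this is an illustration, not the code from this commit.

```go
package main

import (
	"fmt"
	"os"
	"path"
)

// scanFilePath returns the sysfs scan file for the given SCSI host number.
func scanFilePath(sysRoot string, host int) string {
	return path.Join(sysRoot, "class", "scsi_host", fmt.Sprintf("host%d", host), "scan")
}

// scanCommand builds the "<channel> <target> <lun>" string that asks the
// kernel to probe exactly one LUN instead of the whole bus.
func scanCommand(channel, target, lun int) string {
	return fmt.Sprintf("%d %d %d", channel, target, lun)
}

// scanOneLun probes a single LUN on one SCSI host. sysRoot is "/sys" on a
// real node; it is a parameter (an assumption of this sketch) for testing.
func scanOneLun(sysRoot string, host, channel, target, lun int) error {
	f, err := os.OpenFile(scanFilePath(sysRoot, host), os.O_WRONLY, 0)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteString(scanCommand(channel, target, lun))
	return err
}

func main() {
	// Show what would be written, and where, to probe LUN 5 on host 3.
	fmt.Println(scanFilePath("/sys", 3))
	fmt.Println(scanCommand(0, 0, 5))
}
```

Because only the requested LUN is probed, the node never creates block devices for other LUNs that the target happens to export, so none of them can linger as phantom references.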
io_util.go 1.41 KB