Multipath and Anaconda
======================

:Authors:
    Ales Kozumplik <akozumpl@redhat.com>

Introduction
------------

If there are two block devices in your /dev for which udev reports the same
'ID_SERIAL', then you can create a device-mapper device that uses either of
them to access the one underlying physical device. That is multipath [1].

For instance, suppose there are::

    /dev/sda, with ID_SERIAL of 20090ef12700001d2, and
    /dev/sdb, with the same ID_SERIAL.

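To make the grouping idea concrete, here is a minimal Python sketch that groups block devices by the 'ID_SERIAL' udev reports and keeps every serial shared by more than one device. It is only an illustration: the real multipath tools identify paths by WWID and apply their own configuration, the input dictionary stands in for actual udev data, and `find_multipath_candidates` is a hypothetical name.

```python
from collections import defaultdict


def find_multipath_candidates(udev_info):
    """Group block devices that report the same ID_SERIAL.

    udev_info maps a device name (e.g. 'sda') to its udev property
    dictionary. Returns {serial: [device names]} for every serial shared
    by at least two devices, i.e. every potential multipath device.
    Devices without an ID_SERIAL are ignored.
    """
    by_serial = defaultdict(list)
    for name, props in sorted(udev_info.items()):
        serial = props.get("ID_SERIAL")
        if serial:
            by_serial[serial].append(name)
    # A single path is just an ordinary disk; only shared serials qualify.
    return {serial: names for serial, names in by_serial.items()
            if len(names) > 1}


if __name__ == "__main__":
    devices = {
        "sda": {"ID_SERIAL": "20090ef12700001d2"},
        "sdb": {"ID_SERIAL": "20090ef12700001d2"},
        "sdc": {"ID_SERIAL": "3600508b1001c5f2a"},  # hypothetical local disk
    }
    print(find_multipath_candidates(devices))
    # {'20090ef12700001d2': ['sda', 'sdb']}
```
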
Those probably correspond to two adapters in the system that connect your box
to a storage area network (SAN) somewhere. There are perhaps two cables, one
for sda and one for sdb, and if one of the cables gets cut the other can still
transmit data. Normally the system does not recognize that sda and sdb have
this special relation to each other, but by setting up a suitable device map
with the multipath tools [2] we can create a DM device, /dev/mapper/mpatha,
and use it for storing and retrieving data.

The device mapper then automatically routes I/O requests sent to
/dev/mapper/mpatha to either sda or sdb, depending on factors such as the load
on each line or congestion on the particular network, and keeps using the
remaining path if one of them fails.

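A very loose model of this routing is a selector that rotates I/O over the healthy paths and skips failed ones. The sketch below assumes plain round-robin; the kernel's dm-multipath path selectors (such as round-robin, queue-length and service-time) are more elaborate, and `PathSelector` is a hypothetical class written for illustration only.

```python
class PathSelector:
    """Toy round-robin selector over the paths of one multipath device."""

    def __init__(self, paths):
        self.paths = list(paths)  # e.g. ['sda', 'sdb']
        self.failed = set()       # paths currently marked as down
        self._next = 0            # index where the next search starts

    def fail_path(self, path):
        self.failed.add(path)

    def reinstate_path(self, path):
        self.failed.discard(path)

    def select(self):
        """Return the next healthy path, or raise if every path is down."""
        for offset in range(len(self.paths)):
            index = (self._next + offset) % len(self.paths)
            path = self.paths[index]
            if path not in self.failed:
                self._next = index + 1
                return path
        raise OSError("all paths to the device have failed")


if __name__ == "__main__":
    mpatha = PathSelector(["sda", "sdb"])
    print([mpatha.select() for _ in range(4)])  # ['sda', 'sdb', 'sda', 'sdb']
    mpatha.fail_path("sda")                     # e.g. the cable to sda is cut
    print([mpatha.select() for _ in range(2)])  # ['sdb', 'sdb']
```
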
The nomenclature I will use here is:

- 'multipath device' for the smart /dev/mapper/mpathX device.
- 'multipath member device' for the '/dev/sdX' devices, also called 'a path'.

What is expected from Anaconda
------------------------------

Anaconda is expected to:

- detect that multipath devices are present.
- coalesce all relevant (e.g. exclusiveDisks) multipath devices.
- only let the user interact with the multipath devices in the filtering,
  cleardiskssel and partitioning screens; that is, once we know 'sdc' and
  'sdd' are part of 'mpathb', show only 'mpathb' and never the paths.
- install the bootloader on, and boot from, an mpath device.
- make sure all the multipath devices we used for installation (whether or
  not they carry the root filesystem) are correctly coalesced in the booted
  system. This is achieved by generating a suitable /etc/multipath.conf and
  writing it into the sysroot.
- be able to refer to mpath devices from kickstart, either by name like
  'mpatha' or by their ID like 'disk/by-id/scsi-20090ef12700001d2'.

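Referring to an mpath device from kickstart by either form boils down to normalising the given string to one device. The helper below is a hypothetical sketch, not Anaconda's kickstart code; it assumes a table mapping friendly names to WWIDs and assumes the by-id form is 'disk/by-id/scsi-' followed by the WWID, as in the example above.

```python
def resolve_mpath_spec(spec, wwid_by_name):
    """Return the friendly name of the multipath device `spec` refers to.

    Accepts 'mpatha', '/dev/mapper/mpatha', 'disk/by-id/scsi-<wwid>' and
    '/dev/disk/by-id/scsi-<wwid>'. Returns None if nothing matches.
    `wwid_by_name` is an assumed {friendly name: wwid} table.
    """
    name_by_wwid = {wwid: name for name, wwid in wwid_by_name.items()}
    spec = spec.strip()
    # Drop a leading device directory, most specific prefix first.
    for prefix in ("/dev/mapper/", "/dev/"):
        if spec.startswith(prefix):
            spec = spec[len(prefix):]
            break
    if spec in wwid_by_name:
        return spec
    by_id_prefix = "disk/by-id/scsi-"
    if spec.startswith(by_id_prefix):
        return name_by_wwid.get(spec[len(by_id_prefix):])
    return None


if __name__ == "__main__":
    table = {"mpatha": "20090ef12700001d2"}
    print(resolve_mpath_spec("mpatha", table))                             # mpatha
    print(resolve_mpath_spec("disk/by-id/scsi-20090ef12700001d2", table))  # mpatha
    print(resolve_mpath_spec("sdq", table))                                # None
```
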
How Anaconda handles multipath
------------------------------

To detect the presence of multipath devices we rely on the multipath tools,
and the same goes for coalescing; see pyanaconda/storage/devicelibs/mpath.py,
the file that provides some abstraction over the mpath tools. During the
device scan we use the 'multipath -d' output to find out which devices are
going to end up as multipath members. The MultipathTopology object also
enhances the multipath members' udev dictionaries with 'ID_FS_TYPE' set to
'multipath_member' (yes, this is a hack surviving from the original mpath
implementation, and righteous is he who eradicates it). DeviceTree picks this
information up when populating itself. That is, if 'sda' and 'sdb' are
multipath member devices, DeviceTree gives them the MultipathMember format and
creates one MultipathDevice for them (we know its name from 'multipath -d').
We end up with::

    DiskDevice 'sda', format 'MultipathMember'
    DiskDevice 'sdb', format 'MultipathMember'
    MultipathDevice 'mpatha', parents are 'sda' and 'sdb'.

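The tagging and tree-building steps can be sketched as follows. This is a simplified model, not the actual MultipathTopology or DeviceTree code: it assumes the 'multipath -d' output has already been parsed into a {multipath name: [member names]} dictionary, and the `Device` tuple, `tag_members` and `populate` are hypothetical names. Only the 'ID_FS_TYPE' and 'ID_MPATH_NAME' keys come from this document.

```python
from collections import namedtuple

# Hypothetical, much simplified stand-in for Anaconda's device objects.
Device = namedtuple("Device", ["name", "type", "format", "parents"])


def tag_members(udev_info, topology):
    """Mark multipath members in their udev dictionaries (the 'hack')."""
    for mpath_name, members in topology.items():
        for member in members:
            udev_info[member]["ID_FS_TYPE"] = "multipath_member"
            udev_info[member]["ID_MPATH_NAME"] = mpath_name


def populate(udev_info):
    """Build a flat device list from (possibly tagged) udev dictionaries."""
    devices, mpaths = [], {}
    for name in sorted(udev_info):
        props = udev_info[name]
        if props.get("ID_FS_TYPE") == "multipath_member":
            devices.append(Device(name, "DiskDevice", "MultipathMember", ()))
            mpaths.setdefault(props["ID_MPATH_NAME"], []).append(name)
        else:
            devices.append(Device(name, "DiskDevice", props.get("ID_FS_TYPE"), ()))
    # One MultipathDevice per multipath name, with its members as parents.
    for mpath_name, members in sorted(mpaths.items()):
        devices.append(Device(mpath_name, "MultipathDevice", None, tuple(members)))
    return devices


if __name__ == "__main__":
    udev = {"sda": {}, "sdb": {}}
    tag_members(udev, {"mpatha": ["sda", "sdb"]})
    for dev in populate(udev):
        print(dev.type, repr(dev.name), dev.format, list(dev.parents))
```

Running it prints the same three devices as the listing above.
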
From then on, Anaconda only deals with the MultipathDevice and generally
leaves anything with the 'MultipathMember' format alone. (Note that this is an
inert format that is not really present on the disk; we use it just to mark
the device as "useless beyond being a multipath member", much like
MDRaidMember.)

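Leaving members alone mostly means hiding them wherever the user picks disks. A minimal sketch, assuming each device is given as a (name, format name) pair; `user_visible_disks` is a hypothetical helper, not an Anaconda function.

```python
def user_visible_disks(devices):
    """Return names of devices the UI should offer, hiding multipath paths.

    `devices` is an iterable of (name, format_name) pairs; anything whose
    format is 'MultipathMember' is only an inert marker and is skipped.
    """
    return [name for name, fmt in devices if fmt != "MultipathMember"]


if __name__ == "__main__":
    tree = [("sda", "MultipathMember"), ("sdb", "MultipathMember"),
            ("sdc", None), ("mpatha", None)]
    print(user_visible_disks(tree))  # ['sdc', 'mpatha']
```
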
Partitioning happens on the multipath device, and during the preinstallconfig
step /mnt/sysimage/etc/multipath.conf is created and filled with information
about the coalesced devices. This is handled in the Storage.write() method. It
is important that this file and /etc/multipath/wwids (autogenerated by the
mpath tools) make it into the sysimage before the dracut image is generated.

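A multipath.conf that pins each coalesced device to its name might be generated along these lines. This is a hedged sketch, not the text Storage.write() actually writes: it only uses the standard `defaults`, `multipaths`, `multipath`, `wwid` and `alias` keywords from multipath.conf(5), and `render_multipath_conf` with its {alias: wwid} input is an assumption.

```python
def render_multipath_conf(aliases, friendly_names=True):
    """Render a minimal multipath.conf.

    `aliases` maps a friendly name (e.g. 'mpatha') to its WWID. Each device
    gets a `multipath` entry pinning that WWID to that alias, so the booted
    system coalesces the same devices under the same names.
    """
    lines = [
        "defaults {",
        f"    user_friendly_names {'yes' if friendly_names else 'no'}",
        "}",
        "multipaths {",
    ]
    for alias, wwid in sorted(aliases.items()):
        lines += [
            "    multipath {",
            f"        wwid {wwid}",
            f"        alias {alias}",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_multipath_conf({"mpatha": "20090ef12700001d2"}), end="")
```

Whatever generates the file, the result (together with /etc/multipath/wwids) still has to land in /mnt/sysimage before dracut runs.
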
Debugging multipath bugs
------------------------

Unlike with iSCSI, reproducing a multipath bug does not require the same
specific hardware as the reporter has. Just find any box connected to a
multipathed SAN and you are set (at the moment, connecting to the same iSCSI
target through both its IPv4 and its IPv6 address also produces a
multipathed device).

On top of that, much of the necessary information is already included in the
anaconda logs or can easily be obtained from the reporter. The things to look
at in particular are:

- storage.log, the output around 'devices to scan for multipath' and 'devices
  post multipath scan'. The latter shows a triple of regular disks, disks
  comprising multipath devices, and partitions. This helps you quickly find
  out what the target system looks like.

- program.log, whose calls to 'multipath' [3] carry the same information. If
  mpath devices mysteriously appear or disappear between the filtering and
  partitioning screens, look at those. 'multipath -ll' is called to display
  the currently coalesced mpath devices; 'multipath -d' is called to show the
  mpath devices that would be coalesced if we ran 'multipath' now. The device
  filtering screen relies on the latter.

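A small helper that pulls out the storage.log lines mentioned above, with some surrounding context, can save scrolling through a long log. It is a minimal sketch: it searches only for the two marker phrases quoted above and assumes nothing else about the log format; `multipath_excerpts` and the default context width are arbitrary choices.

```python
import sys

MARKERS = ("devices to scan for multipath", "devices post multipath scan")


def multipath_excerpts(lines, context=2):
    """Yield (line_number, text) around every line containing a marker.

    Line numbers are 1-based; overlapping context windows are merged so
    each log line is yielded at most once.
    """
    wanted = set()
    for i, line in enumerate(lines):
        if any(marker in line for marker in MARKERS):
            wanted.update(range(max(0, i - context),
                                min(len(lines), i + context + 1)))
    for i in sorted(wanted):
        yield i + 1, lines[i].rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:
        for number, text in multipath_excerpts(log.readlines()):
            print(f"{number:6d}  {text}")
```

Run it as `python excerpts.py storage.log` (the script name is arbitrary).
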
Future of multipath in Anaconda
--------------------------------

Overall, as of RHEL 6.2, the shape of multipath in Anaconda is good and, more
importantly, it is flexible enough to sustain new RFEs and bugs. These are,
however, the bugs I expect to appear sometime soon:

- enable or disable mpath_friendly_names in kickstart. Disabling friendly
  names just means the mpath devices are called by their WWID,
  e.g. '/dev/mapper/360334332345343234', not '/dev/mapper/mpathc'. This is
  straightforward to implement.
- extend support for mpath devices in kickstart in general. Currently mpath
  devices should be accepted in most commands, but I am sure there will be
  corner cases. Difficulty medium.
- [rawhide] stop extending the udev info dictionary with 'ID_FS_TYPE' and
  'ID_MPATH_NAME'. Doing it this way is asking for trouble: if the dictionary
  of a particular mpath device is reloaded from udev without running it
  through the MultipathTopology object, it will miss those entries (and
  DeviceTree depends on them a lot). Difficulty hard, but includes a lot of
  pleasant refactoring.
- improve support for multipathing iSCSI devices. Someone might ask for it
  one day (in fact, with NIC bonding they already did), and it will make
  mpath debugging possible on any virtual machine with multiple virtual NICs.

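One possible shape for the [rawhide] item above, offered only as a sketch of the idea and not as a design the code base has adopted, is to keep the topology in its own lookup object that DeviceTree queries by device name, so reloading a udev dictionary cannot lose anything. All names here are hypothetical.

```python
class MultipathInfo:
    """Multipath topology kept apart from the udev dictionaries.

    Built once from the parsed 'multipath -d' output ({mpath: [members]})
    and queried by device name, so re-reading a device's udev info from
    udev cannot drop the multipath details.
    """

    def __init__(self, topology):
        self._mpath_of = {member: mpath
                          for mpath, members in topology.items()
                          for member in members}

    def is_member(self, device_name):
        return device_name in self._mpath_of

    def mpath_name(self, device_name):
        """Return the multipath device `device_name` belongs to, or None."""
        return self._mpath_of.get(device_name)


if __name__ == "__main__":
    info = MultipathInfo({"mpatha": ["sda", "sdb"]})
    print(info.is_member("sda"), info.mpath_name("sda"))  # True mpatha
    print(info.mpath_name("sdc"))                         # None
```
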
.. [1] http://akozumpl.fedorapeople.org/archive/Multipass.jpg
.. [2] http://christophe.varoqui.free.fr/
.. [3] ``man 8 multipath``