Migrating Existing System to RAID 1

Tue, 24 Sep 2024

Since moving to SSDs in my laptops (long, long ago), I’ve been uneasy about the abruptness of the SSD failure mode. When HDDs failed, you’d usually see an increase in SMART errors or hear strange sounds before complete failure. Even if you did get read errors, a call to ddrescue would typically succeed in cloning the drive.

SSDs, on the other hand, tend to go from working fine to completely dead in one step: you reboot and poof, everything is gone. No obvious warnings, no sounds, and ddrescue will not help you.

Of course, I always have backups. A script in /etc/network/if-up.d checks whether I’m connecting to my home network and triggers an rsync of my home folders. Loss of data isn’t a concern, but recovery time and latency are things I want to minimize.
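Such a hook can be as simple as the following sketch (the interface name, SSID, and paths are placeholders, not my actual setup),

#!/bin/sh
# /etc/network/if-up.d/home-backup (example name)
[ "$IFACE" = "wlan0" ] || exit 0
# Only back up when connected to the home network.
iwgetid -r "$IFACE" | grep -qx "home-ssid" || exit 0   # apt install wireless-tools
rsync -a /home/me/ backup-host:/backups/laptop/ &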

These days, even ultra-portables make room for two NVMe slots, and that means RAID! With RAID, if a drive goes poof, mdadm sends a very friendly DegradedArray email and the system keeps running. I order a new drive and install it at some future convenient time. No worries, no frantic rebuilding for my presentation in an hour; it is all just smooth sailing with zero recovery time and latency.

Goals

I have an existing drive in my machine with my system running on it. The existing drive is set up with three partitions: EFI, /boot, and an LVM group with a few volumes. I have a blank (new) drive. I want to end up with the same system running from a RAID 1 mirror spanning both drives, reshaping the partition layout and filesystems along the way.

The methods used below allow for all types of reshaping of the partitions or filesystem migrations.

This document is an adaptation of an ArchWiki article.

Conventions

I’ll use environment variables for each of the drives in these commands. Since this process involves rebooting and changing logins, make sure you remember to set these variables in each new session!

OLD=/dev/nvme0n1
NEW=/dev/nvme1n1

This process was done on a Debian 12 system. I would expect it to work unmodified for any Debian-based (or likely any other Linux) distribution. I’m not using anything particularly new or unstable so these instructions should work for a variety of older or newer systems. Whenever I use a non-standard command, I will include a comment nearby that says which package is needed for that command (e.g., # apt install less).

Disclaimer

These commands worked for me on my system. As such, various assumptions are baked into the commands. Read through the whole document before starting to make sure I’m doing something you actually want to do. Further,

THE DOCUMENT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THIS DOCUMENT OR THE USE OR OTHER DEALINGS IN THIS DOCUMENT.

Backup

Make sure you have backups of your system!

Install device and partition

First step: shut down and physically install the new drive. You’re on your own for this. When you reboot, you can run normally for this first part (no need for single-user mode or a recovery disk).

Set 4k mode on the new device

My new drives support configurable LBA sizes: 4096-byte native or 512-byte emulation. Since 4k is the future, we might as well jump right in. Most of this section comes from Jonathan Bisson.

To query whether your device supports 4k block sizes, use one of these commands:

$ sudo smartctl -c $NEW                 # apt install smartmontools
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         1
 1 -    4096       0         0

# alternative:
$ sudo nvme id-ns -H $NEW               # apt install nvme-cli
LBA Format  0 : Metadata Size: 0   bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better (in use)
LBA Format  1 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0 Best

If supported, you can select the 4k format by its ID (the Id / LBA Format number in the listings above). If you get an NVMe ACCESS_DENIED error, you may need to restart your machine first (it seems the drive refuses to change this setting once the OS may have read and cached it).

Reboot immediately if you modify this setting.

sudo nvme format --lbaf=1 $NEW          # This will DELETE EVERYTHING on that drive!

Set up your new partition layout

Our migration method is very flexible, so you can pretty much do whatever you want here. My system uses coreboot, so I’ll need a standard EFI partition. Then I set up the rest of the device as a single Linux MD-RAID partition.

This is a somewhat strange setup for a machine that plans to use btrfs, since btrfs has its own RAID implementation, but there are a few advantages to externalizing the RAID implementation.

The first benefit is that I can run LVM on top of MD-RAID. This allows multiple growable volumes of potentially different filesystem types; there is great freedom in running an LVM layer.

The other benefit is that MD-RAID has, in my opinion, better behavior in the presence of a drive failure. When you boot a system with MD-RAID and a drive has failed, the remaining degraded array will happily mount and run with just some warnings printed. You DO need to monitor your boot logs or (better) have a working email setup so that emails generated by mdadm will reach you.
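Getting those mails is just a matter of a working sendmail-compatible mailer and a MAILADDR line in mdadm’s configuration (the address below is a placeholder); mdadm can also fire a test alert so you can confirm delivery,

# /etc/mdadm/mdadm.conf (excerpt)
MAILADDR you@example.com

# Send a one-off TestMessage alert for each array:
sudo mdadm --monitor --scan --oneshot --test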

Btrfs filesystems will, by default, refuse to mount in the event of a failed disk. This is nice for getting your attention even if you don’t check your logs or have a working email setup, but not nice when your machine refuses to boot just before a presentation. You can still boot the system by editing the kernel boot flags from the grub interactive menu: append rootflags=degraded to get the root filesystem to mount. If you also have additional mounts (which are not tagged nofail in your fstab), the system still will not fully boot until you modify your fstab to include the degraded option on all required mounts. One could technically set the degraded flag all the time, but this is not recommended and I don’t like doing things that the developers of my filesystem recommend against.
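Concretely, that grub edit looks something like this (the kernel line is schematic, not copied from a real grub.cfg): highlight the boot entry, press “e”, find the line beginning with “linux”, and append the flag at the end,

linux ... root=/dev/mapper/tsalmoth_2024-root ro rootflags=degraded

then press Ctrl-x (or F10) to boot the edited entry.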

To create the partitions, I use gdisk (which has pretty much the same UI as fdisk). Read other sources or the gdisk man page for details; I just give a very brief summary here.

$ sudo gdisk $NEW                      # apt install gdisk
Command (? for help):

At the interactive prompt, create a “n”ew partition, number “1”, using the default start sector and some reasonable size (say 512M for EFI). The code for EFI partitions is EF00. Then I created a second partition consuming the rest of the disk, of type FD00 (Linux RAID). Type “p” at the prompt to print out the planned partition table.
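Creating the EFI partition, for example, goes roughly like this (prompts abbreviated; the exact sector ranges depend on your drive and gdisk version),

Command (? for help): n
Partition number (1-128, default 1): 1
First sector (..., default = 2048) or {+-}size{KMGTP}: <Enter>
Last sector (...) or {+-}size{KMGTP}: +512M
Hex code or GUID (L to show codes, Enter = 8300): EF00
Changed type of partition to 'EFI system partition'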

$ sudo gdisk $NEW                      # apt install gdisk
Command (? for help): p
Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048         1050623   512.0 MiB   EF00  EFI system partition
   2         1050624      3907028991   1.8 TiB     FD00  Linux RAID

Changes aren’t made until you commit them using the “w“rite command. If you mess up, type “q” to quit without saving the changes.

EFI partition mirroring

Some internet guides suggest mirroring the EFI partition itself with a RAID 1 that uses metadata version 0.9 (which lives at the end of the partition, so the firmware can still read the filesystem). This is not a good idea: it is too easy for something to modify one of the partitions while it is not assembled as a RAID device and silently break the mirror.

A much more robust strategy is to use a grub hook to keep the partitions in sync as suggested in the Debian Wiki which I’ll repeat with some modifications here.
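The idea is a small script that grub runs whenever it regenerates its configuration: it mounts the backup EFI partition and mirrors the primary one onto it. Here is a minimal sketch, assuming the backup partition gets the /boot/efi2 fstab entry shown later in this document (the script name, its placement in /etc/grub.d/, and the rsync options are my choices, not necessarily the wiki’s exact hook),

#!/bin/sh
# /etc/grub.d/90_sync_efi2 (example name); run by grub-mkconfig on every
# grub update. Anything printed to stdout would land in grub.cfg, so keep
# all output on stderr.
set -e
if ! mountpoint -q /boot/efi2; then
    mount /boot/efi2                 # uses the noauto,nofail entry in fstab
fi
rsync -a --del /boot/efi/ /boot/efi2/ 1>&2       # apt install rsync
umount /boot/efi2

Remember to mark the script executable (chmod +x), or grub-mkconfig will skip it.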

Any future Debian or Ubuntu update that modifies the EFI partition will also update grub and the grub hook will therefore automatically update your backup EFI partition as well.

Set up the new disk

Set up a degraded RAID

When creating the new array, mdadm will not have all devices available to it, so if your old drive is smaller than your new drive, you must specify the RAID size manually rather than rely on the automatic “max” size. The size is given in binary kilobytes (KiB) by default. Make sure the size you choose is no larger than the final partition size your old disk will have (not the full disk size!). Here I limit the RAID to about 931 GiB.

# apt install mdadm
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 --size=976261120 missing ${NEW}p2

Set up LVM and filesystems

LVM and filesystems have a lot of options. Feel free to go wild and shape your new drive however you like.

My laptop is called tsalmoth, so I name my LVM volume group accordingly (with the year appended) so that it doesn’t conflict with my old volume group, or with anything else if I ever need to read these drives from another system.

# apt install lvm2
sudo vgcreate tsalmoth_2024 /dev/md0

I like to keep separate OS and user (not /usr) partitions to contain potential disasters. I also create a swap on the LVM.

Yes, swap on the LVM also puts the swap on the RAID, which is admittedly quite odd. However, on modern systems with plenty of RAM and no HDD seek penalty, swap is really just a place to hold leaked memory, and perhaps for hibernation if that’s even still a thing. On my machines, I decrease the swappiness by setting vm.swappiness=40 in /etc/sysctl.d/local.conf. Any waste or slowdown from such a strange swap setup is therefore irrelevant for my situation, because swap just isn’t used for anything significant. If you need better swap handling, you’ll probably want to create dedicated swap partitions on each drive directly in gdisk above.
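That sysctl tweak is just a drop-in file,

# /etc/sysctl.d/local.conf
vm.swappiness = 40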

# apt install lvm2
sudo lvcreate --name swap      --size  10g tsalmoth_2024
sudo lvcreate --name root      --size 120g tsalmoth_2024
sudo lvcreate --name user      --size 800g tsalmoth_2024
sudo lvcreate --name docker    --size  20g tsalmoth_2024

Then create the filesystems,

sudo mkswap /dev/tsalmoth_2024/swap
# apt install btrfs-progs
sudo mkfs.btrfs --checksum xxhash /dev/tsalmoth_2024/root
sudo mkfs.btrfs --checksum xxhash /dev/tsalmoth_2024/user

If you use xxhash while it is not yet the default, you will need to make sure the xxhash module is included in your initramfs,

echo xxhash-generic | sudo tee -a /etc/initramfs-tools/modules

For the root partition, I create a subvolume for the actual “/” mountpoint and set it as the default mount target,

sudo mount /dev/tsalmoth_2024/root /mnt
sudo btrfs subvolume create /mnt/debian-root
sudo btrfs subvolume set-default /mnt/debian-root
sudo umount /mnt

In the user filesystem, I create some subvolumes based on backup and lifetime strategies so that the --one-file-system argument remains useful in tar,

sudo mount /dev/tsalmoth_2024/user /mnt
sudo btrfs subvolume create /mnt/home
sudo btrfs subvolume create /mnt/cache
sudo btrfs subvolume create /mnt/backup
sudo umount /mnt

Copy data to new drive

Now you need to reboot the system into a USB recovery drive or else boot into single-user mode so that important files don’t change while you transfer them to the new drive.
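On a systemd machine, single-user (rescue) mode is one command away; alternatively, add systemd.unit=rescue.target to the kernel command line at boot,

sudo systemctl isolate rescue.target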

If some or all of your partitions (or logical volumes) were created with identical sizes on the new drive, you can dd those over directly.

sudo dd if=/dev/tsalmoth_2010/docker of=/dev/tsalmoth_2024/docker bs=1M

If you cloned any btrfs partitions, regenerate the UUIDs on the new partitions in order to mount them (and update your fstab entries if you mount by UUID).

sudo btrfstune -u /dev/tsalmoth_2024/docker

If you did some more complicated reshaping of partitions or changed file system types, use rsync with options for maximal copy fidelity. For example,

sudo mkdir /mnt/A /mnt/B

# Old hierarchy (ext4)
sudo mount /dev/tsalmoth_2010/root  /mnt/A        -o ro
sudo mount ${OLD}p2                 /mnt/A/boot   -o ro
sudo mount /dev/tsalmoth_2010/home  /mnt/A/home   -o ro
sudo mount /dev/tsalmoth_2010/cache /mnt/A/cache  -o ro

# New hierarchy (btrfs)
sudo mount /dev/tsalmoth_2024/root /mnt/B         -o subvol=debian-root
sudo mount /dev/tsalmoth_2024/user /mnt/B/home    -o subvol=home
sudo mount /dev/tsalmoth_2024/user /mnt/B/cache   -o subvol=cache
sudo mount /dev/tsalmoth_2024/user /mnt/B/backup  -o subvol=backup

# Faithful copy
# apt install rsync
sudo rsync -a -AXUHS /mnt/A/ /mnt/B/

# Clean up the mounts
sudo umount /mnt/A/cache
...

Actually, for maximal fidelity, you’d typically include --del --one-file-system. However, --del isn’t necessary in the above example since /mnt/B is empty (it is new), and we do want to cross filesystems in /mnt/A this time.
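For reference, such a maximal-fidelity invocation restricted to a single filesystem would look something like,

sudo rsync -a -AXUHS --del --one-file-system /mnt/A/home/ /mnt/B/home/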

Prepare to boot to new drive

Unmount and hide old partitions

To avoid confusing any of the grub updates that we will be doing, I “hid” my old partitions by deleting them (just the partition info, not the data).

Unmount all mounts pointing to the old drive before continuing!

sudo gdisk -l $OLD >old.parts
sudo gdisk $OLD
   Command (? for help): d
   Partition number (1-3): 2
   Command (? for help): d
   Partition number (1-3): 3

I deleted only my old /boot and LVM partitions. I did not delete the EFI partition, that is to be preserved and updated by the following commands.

This is a convenient and only somewhat dangerous trick. When gdisk deletes a partition, it only changes the GPT partition information and does not erase, trim, or discard any of the partition data. If we need the data back, we need only recreate the deleted partitions exactly, using the partition data backed up in the first command. I am trusting, however, that there are no NVMe controllers out there that try to parse the GPT and “helpfully” discard the data in deleted partitions.
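Had I wanted a directly restorable backup, I could also have saved the GPT in binary form with sgdisk (same package as gdisk) before deleting anything,

sudo sgdisk --backup=old-gpt.bin $OLD      # apt install gdisk
# Restore later, if ever needed, with:
#   sudo sgdisk --load-backup=old-gpt.bin $OLD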

Mount the new partitions

The time has come to mount the new system. When doing this, I just mount everything as the new system would mount it – including partitions that aren’t really needed by grub and mkinitramfs.

NOTE: If you still have most partitions mounted at /mnt/B, you should either unmount them before mounting at /mnt or else adjust the following to mount at /mnt/B and skip over any duplicate mounts.

sudo mount /dev/tsalmoth_2024/root /mnt
sudo mount /dev/nvme0n1p1 /mnt/boot/efi
sudo mount /dev  /mnt/dev  -o rbind
sudo mount /sys  /mnt/sys  -o rbind
sudo mount /proc /mnt/proc -o rbind
sudo mount /run  /mnt/run  -o rbind
sudo mount /dev/tsalmoth_2024/user   /mnt/home   -o subvol=home
sudo mount /dev/tsalmoth_2024/user   /mnt/cache  -o subvol=cache
sudo mount /dev/tsalmoth_2024/user   /mnt/backup -o subvol=backup
sudo mount /dev/tsalmoth_2024/docker /mnt/var/lib/docker

Update fstab

Update the newly mounted /mnt/etc/fstab. For example,

/dev/mapper/tsalmoth_2024-root   /               btrfs   subvol=debian-root
/dev/mapper/tsalmoth_2024-root   /_btrfs         btrfs   subvolid=5
/dev/nvme0n1p1                   /boot/efi       vfat    defaults
/dev/nvme1n1p1                   /boot/efi2      vfat    defaults,nofail,noauto
/dev/mapper/tsalmoth_2024-docker /var/lib/docker btrfs   defaults
/dev/mapper/tsalmoth_2024-user   /home           btrfs   defaults,subvol=home
/dev/mapper/tsalmoth_2024-user   /cache          btrfs   defaults,subvol=cache
/dev/mapper/tsalmoth_2024-user   /_user          btrfs   defaults

I also use lazytime,discard=async on all the btrfs mounts but omitted them above to reduce line length.

I mount my btrfs main volumes at /_btrfs and /_user so I can take snapshots into the main volume.
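For example, a read-only snapshot of /home into the top-level user volume looks like this (the snapshot name is just an example),

sudo btrfs subvolume snapshot -r /home /_user/home-2024-09-24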

Now is also a good time to search through /mnt/etc for any occurrences of the old drive or old LVM volume group (e.g., sudo grep -r nvme0n1 /mnt/etc). You can ignore matches under lvm/archive and lvm/backup (those files generally include a “Hint only” comment).

Update boot

Now you must chroot into your new drive,

sudo chroot /mnt

The next several command blocks should be run from within the chroot.

The first action is to clean out obsolete EFI boot variables.

# You are now root user inside your machine as mounted at /mnt
# apt install efibootmgr
efibootmgr                                 # List EFI boots
efibootmgr --delete-bootnum --bootnum N    # Remove any obsolete bootables

Next regenerate the boot files and configuration.

# You are now root user inside your machine as mounted at /mnt
update-initramfs -c -k all
dpkg-reconfigure grub-efi-amd64

Boot to new system

Exit out of the chroot and reboot the system!

If all went well, your system should boot to your own desktop, just like nothing has changed. If you deleted the old partitions as recommended above, you are guaranteed to be running from the new drive. If you did not delete the old partitions, double-check that the system really did boot into the new drive and hasn’t mounted any partitions from the old drive.

If things didn’t work out so well, don’t panic! Keep a level head and examine the situation carefully. Gather information before attempting to make any change. Make sure you understand and keep a record of any commands you run.

If the system boots but some things aren’t working correctly, you can make your fixes in the running system.

If the system fails to boot, you may need to go back to your rescue disk, re-mount all the new partitions, and chroot into the new system as you did above before examining the system state and looking for configuration issues.

Commit

Repartition the old drive

Once your system is working correctly from the new drive, it is time to perform the destructive changes to the old drive. Start up gdisk again and create the necessary RAID partitions.

# apt install gdisk
sudo gdisk $OLD     # Create RAID partition(s)

Create RAID partition(s) matching the new drive and write the changes. gdisk just changes the partition table and leaves the data in place. Thus, it is possible that Linux will examine the RAID partition you just created and still see your old LVM volume group (or whatever partitions you had before). The presence of this stale metadata can create problems when you try to add the partition to your RAID, so blank it out with dd (make sure you get your partition (“p2”) information correct). Writing 100 MB to the front of the partition is usually sufficient.

sudo dd if=/dev/zero of=${OLD}p2 bs=1M count=100

Reboot the system one last time so the kernel forgets the stale partition metadata.

Add RAID device

Finally, we can complete our RAID array by adding the RAID partition on the old drive to the array. Make sure you get your partition (“p2”) information correct.

# apt install mdadm
sudo mdadm /dev/md0 -a ${OLD}p2

This will trigger an array sync. The progress can be tracked using,

cat /proc/mdstat

Once the rebuild completes, you are done!

Success!

Once finished, you can have a celebratory drink! Just don’t spill the drink on your laptop as that might take out both drives at once negating the benefit of RAID :)
