Migrating Existing System to RAID 1
Tue, 24 Sep 2024
- Goals
- Conventions
- Disclaimer
- Backup
- Install device and partition
- EFI partition mirroring
- Set up the new disk
- Copy data to new drive
- Prepare to boot to new drive
- Boot to new system
- Commit
- Success!
Since moving to SSDs in my laptops (long long ago), I’ve been uneasy due to the
abruptness of the SSD failure mode. When HDDs failed you’d usually, before complete
failure, see an increase in SMART errors or hear strange sounds. Even if you did
receive read errors, a call to ddrescue
would typically succeed in cloning the drive.
SSDs, on the other hand, go straight to complete failure. Poof, everything gone, no obvious warnings, no sounds, and ddrescue will not help you.
Of course, I always have backups. A script in /etc/network/if-up.d
checks whether
I’m connecting to my home network and triggers an rsync
of my home folders. Loss of
data isn’t a concern, but recovery time and latency are something to be minimized.
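For the curious, here is a minimal sketch of what such a hook could look like. The interface name, SSID, and backup host are placeholders, so adjust everything to your own setup:
#!/bin/sh
# /etc/network/if-up.d/backup-home -- illustrative sketch only
# ifupdown exports IFACE for the interface that just came up.
[ "$IFACE" = "wlan0" ] || exit 0
# Only sync when connected to the home SSID (placeholder name).
[ "$(iwgetid -r)" = "home-ssid" ] || exit 0   # apt install wireless-tools
# Run in the background so ifup isn't blocked; backuphost is a placeholder.
rsync -a /home/ backuphost:/backups/home/ &
exit 0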
These days, even ultra-portables make room for two NVMe slots, and that means RAID! With RAID, if a drive goes poof, mdadm sends a very friendly DegradedArray email and the system keeps running. I order a new drive and install it at some future convenient time. No worries, no frantic rebuilding for my presentation in an hour; it is all just smooth sailing with zero recovery time or latency.
Goals
I have an existing drive in my machine with my system on it. The existing drive is set up with three partitions: EFI, /boot, and an LVM group with a few volumes. I have a blank (new) drive. I want,
- RAID 1 the whole system (remove the separate /boot)
- Switch from ext4 to btrfs
- EFI partition mirroring
- No reinstall
- A reasonable amount of downtime
The methods used below allow for all types of reshaping of the partitions or filesystem migrations.
This document is an adaptation of an ArchWiki article.
Conventions
I’ll use environment variables for each of the drives in these commands. Since this process involves rebooting and changing logins, make sure you remember to set these variables in each new session!
OLD=/dev/nvme0n1
NEW=/dev/nvme1n1
This process was done on a Debian 12 system. I would expect it to work unmodified for
any Debian-based (or likely any other Linux) distribution. I’m not using anything
particularly new or unstable so these instructions should work for a variety of older
or newer systems. Whenever I use a non-standard command, I will include a comment nearby that says which package is needed for that command (e.g., # apt install less).
Disclaimer
These commands worked for me on my system. As such, various assumptions are baked into the commands. Read through the whole document before starting to make sure I’m doing something you actually want to do. Further,
THE DOCUMENT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THIS DOCUMENT OR THE USE OR OTHER DEALINGS IN THIS DOCUMENT.
Backup
Make sure you have backups of your system!
Install device and partition
First step: shut down and physically install the new drive. You’re on your own for this. When you reboot, you can run normally for this first part (no need for single-user mode or a recovery disk).
Set 4k mode on the new device
My new drives support configurable LBA sizes: native 4096-byte sectors or 512-byte emulation. Since 4k is the future, we might as well jump right in. Most of this section comes from Jonathan Bisson.
To query whether your device supports 4k block sizes, use one of these commands:
$ sudo smartctl -c $NEW # apt install smartmontools
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 1
1 - 4096 0 0
# alternative:
$ sudo nvme id-ns -H $NEW # apt install nvme-cli
LBA Format 0 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better (in use)
LBA Format 1 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
If supported, you can select the 4k format using its ID. If you get an NVMe
ACCESS_DENIED
error, you may need to restart your machine (it sounds like the drive
doesn’t like you to modify this setting if the OS might have accessed and cached the
setting).
Reboot immediately if you modify this setting.
sudo nvme format --lbaf=1 $NEW # This will DELETE EVERYTHING on that drive!
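After the reboot, it is worth double-checking that the format actually took effect:
# The 4096-byte format should now be the one reported as "(in use)"
sudo nvme id-ns -H $NEW | grep 'in use'   # apt install nvme-cli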
Set up your new partition layout
Our migration method will be very flexible, so you can pretty much do whatever you want here. My system uses coreboot, so I’ll need a standard EFI partition. Then I set up the rest of the device for Linux MD-RAID.
This is a somewhat strange setup for a machine that plans to use btrfs since btrfs has its own raid implementation, but there are a few advantages to externalizing the RAID implementation.
The first benefit is that I can run LVM on MD-RAID. This will allow multiple growable partitions of potentially different filesystem types. There is great freedom in running an LVM layer.
The other benefit is that MD-RAID has, in my opinion, better behavior in the presence of a drive failure. When you boot a system with MD-RAID and a drive has failed, the remaining degraded array will happily mount and run with just some warnings printed. You DO need to monitor your boot logs or (better) have a working email setup so that emails generated by mdadm will reach you.
Btrfs filesystems will, by default, refuse to mount in the event of a failed disk. This is nice for getting your attention even if you don’t check your logs or have a working email setup, but not nice when grub fails to boot your machine just before a presentation. You can still boot the system by editing the machine boot flags from the grub interactive menu: append rootflags=degraded to get the root filesystem to mount. If you also have additional mounts (which are not tagged nofail in your fstab), the system will still not fully boot until you modify your fstab to include the degraded option on all required mounts. One could technically set the degraded flag all the time, but this is not recommended and I don’t like doing things that the developers of my filesystem recommend against.
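Those friendly mdadm emails only arrive if monitoring can actually reach you, so here is a minimal sketch of the relevant bits. The address is a placeholder, and you still need a working mail transport (e.g. msmtp or postfix):
# In /etc/mdadm/mdadm.conf   # apt install mdadm
MAILADDR you@example.com
# Once an array exists, ask the monitor to send a test alert to confirm delivery
sudo mdadm --monitor --scan --oneshot --test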
To create the partitions, I use gdisk (which has pretty much the same UI as fdisk). Read some other sources or the gdisk man page for details; I just give a very brief summary here.
$ sudo gdisk $NEW # apt install gdisk
Command (? for help):
At the interactive prompt, create a “n”ew partition, number “1”, using the default start sector and some reasonable size (say 512M for EFI). The code for EFI partitions is EF00. Then, I created a new partition consuming the rest of the disk of type FD00 (Linux RAID). Type “p” at the prompt to print out the planned partition table.
$ sudo gdisk $NEW # apt install gdisk
Command (? for help): p
Number Start (sector) End (sector) Size Code Name
1 2048 1050623 512.0 MiB EF00 EFI system partition
2 1050624 3907028991 1.8 TiB FD00 Linux RAID
Changes aren’t made until you commit them using the “w”rite command. If you mess up, type “q” to quit without saving the changes.
EFI partition mirroring
Some internet guides suggest setting up RAID with metadata version 0.9 (metadata at end of partition) to achieve this. This is not a good idea because it is too easy for something to make modifications to one of the partitions when not mounted as a RAID device.
A much more robust strategy is to use a grub hook to keep the partitions in sync as suggested in the Debian Wiki which I’ll repeat with some modifications here.
- Create a FAT-32 filesystem:
$ sudo mkfs.fat -F 32 ${NEW}p1 # apt install dosfstools
- Create an fstab entry for the backup partition. It can mount anywhere, but /boot/efi2 is a reasonable choice. The fstab entry can be given the noauto option to keep it protected during normal machine use.
# in /etc/fstab
/dev/nvme1n1p1 /boot/efi2 vfat defaults,nofail,noauto 0 1
- Create a grub hook /etc/grub.d/90_copy_to_efi_backup (chmod 0755) containing:
#!/bin/sh
# https://wiki.debian.org/UEFI#RAID_for_the_EFI_System_Partition
if mountpoint --quiet --nofollow /boot/efi; then
    # mount to allow noauto fstab flag (ignore errors in case already mounted)
    mount /boot/efi2
    rsync --times --recursive --delete /boot/efi/ /boot/efi2/
    umount /boot/efi2
fi
exit 0
- For an immediate update, run sudo update-grub. The backup partition will be populated from the main EFI partition and then unmounted — remember this part when you look to see if it worked :)
Any future Debian or Ubuntu update that modifies the EFI partition will also update grub and the grub hook will therefore automatically update your backup EFI partition as well.
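To convince yourself the hook works, you can mount the backup by hand after an update-grub run and compare the two partitions:
sudo mount /boot/efi2
sudo diff -r /boot/efi /boot/efi2 && echo "ESPs are in sync"
sudo umount /boot/efi2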
Set up the new disk
Set up a degraded RAID
When creating the new array, mdadm will not have all devices available to it, so if your old drive is smaller than your new drive, you must manually specify the RAID size rather than rely on the automatic “max” size. Size is specified in binary kilobytes (KiB) by default. Make sure the size you choose is no more than the final partition size your old disk will have, not the full disk size! Here I limit the RAID to about 931 GiB.
# apt install mdadm
sudo mdadm --create /dev/md0 --level=1 -n 2 --size 976261120 missing ${NEW}p2
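A quick sanity check that the degraded array came up as expected:
cat /proc/mdstat                # should list md0 as raid1 with one member, e.g. [2/1] [_U]
sudo mdadm --detail /dev/md0    # State should read "clean, degraded" with one slot "removed"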
Set up LVM and filesystems
LVM and filesystems have a lot of options. Feel free to go wild and shape your new drive however you like.
My laptop is called tsalmoth, so I name my LVM volume group accordingly. That way it doesn’t conflict with my old volume group, or with anything else if I need to read these drives from another system at a future date.
# apt install lvm2
sudo vgcreate tsalmoth_2024 /dev/md0
I like to keep separate OS and user (not /usr) partitions to contain potential disasters. I also create a swap on the LVM.
Yes, swap on the LVM also puts the swap on the RAID, which is admittedly quite odd. However, on modern systems with plenty of RAM and no HDD seek penalty, swap is really just a place to hold leaked memory, and perhaps for hibernation if that’s even still a thing. On my machines, I decrease the swappiness by setting vm.swappiness=40 in /etc/sysctl.d/local.conf. Therefore, any waste or slowdown from such a strange swap setup is irrelevant for my situation because swap just isn’t used for anything significant. If you need better swap handling, you’ll probably want to create dedicated swap partitions on each drive directly in gdisk above.
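For reference, that swappiness tweak is just a one-line sysctl drop-in (40 is my preference, not a magic number):
# /etc/sysctl.d/local.conf
vm.swappiness=40
# Apply without rebooting
sudo sysctl --system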
# apt install lvm2
sudo lvcreate --name swap --size 10g tsalmoth_2024
sudo lvcreate --name root --size 120g tsalmoth_2024
sudo lvcreate --name user --size 800g tsalmoth_2024
sudo lvcreate --name docker --size 20g tsalmoth_2024
Then create the filesystems,
sudo mkswap /dev/tsalmoth_2024/swap
# apt install btrfs-progs
sudo mkfs.btrfs --checksum xxhash /dev/tsalmoth_2024/root
sudo mkfs.btrfs --checksum xxhash /dev/tsalmoth_2024/user
If you use xxhash while it is still not the default, then you will need to make sure the xxhash algorithm is included in your initramfs,
echo xxhash-generic | sudo tee -a /etc/initramfs-tools/modules
For the root partition, I create a subvolume for the actual “/” mountpoint and set it as the default mount target,
sudo mount /dev/tsalmoth_2024/root /mnt
sudo btrfs subvolume create /mnt/debian-root
sudo btrfs subvolume set-default /mnt/debian-root
sudo umount /mnt
In the user filesystem, I create some subvolumes based on backup and lifetime
strategies so that the --one-file-system
argument remains useful in tar,
sudo mount /dev/tsalmoth_2024/user /mnt
sudo btrfs subvolume create /mnt/home
sudo btrfs subvolume create /mnt/cache
sudo btrfs subvolume create /mnt/backup
sudo umount /mnt
Copy data to new drive
Now you need to reboot the system into a USB recovery drive or else boot into single-user mode so that important files don’t change while you transfer them to the new drive.
If some or all of your partitions were created identically on the new drive, then you
can dd
those partitions over to the new drive.
sudo dd if=/dev/tsalmoth_2010/docker of=/dev/tsalmoth_2024/docker bs=1M
If you cloned any btrfs partitions, regenerate UUIDs on the new partitions in order to mount them (also update your fstab entries if you mount by ID).
sudo btrfstune -u /dev/tsalmoth_2024/docker
If you did some more complicated reshaping of partitions or changed file system
types, use rsync
with options for maximal copy fidelity. For example,
sudo mkdir /mnt/A /mnt/B
# Old hierarchy (ext4)
sudo mount /dev/tsalmoth_2010/root /mnt/A -o ro
sudo mount ${OLD}p2 /mnt/A/boot -o ro
sudo mount /dev/tsalmoth_2010/home /mnt/A/home -o ro
sudo mount /dev/tsalmoth_2010/cache /mnt/A/cache -o ro
# New hierarchy (btrfs)
sudo mount /dev/tsalmoth_2024/root /mnt/B -o subvol=debian-root
sudo mount /dev/tsalmoth_2024/user /mnt/B/home -o subvol=home
sudo mount /dev/tsalmoth_2024/user /mnt/B/cache -o subvol=cache
sudo mount /dev/tsalmoth_2024/user /mnt/B/backup -o subvol=backup
# Faithful copy
# apt install rsync
sudo rsync -a -AXUHS /mnt/A/ /mnt/B/
# Clean up the mounts
sudo umount /mnt/A/cache
...
Actually, for maximal fidelity, you’d typically also include --del --numeric-ids --one-file-system. However, --del isn’t necessary in the above example since /mnt/B will be empty (it is new), --numeric-ids should only be necessary in strange situations, and we do want to cross filesystems in /mnt/A this time.
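For reference, the fully belt-and-braces invocation would look something like this (placeholder paths; trim the options per the caveats above):
sudo rsync -a -AXUHS --del --numeric-ids --one-file-system /path/to/src/ /path/to/dst/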
Prepare to boot to new drive
Unmount and hide old partitions
To avoid confusing any of the grub updates that we will be doing, I “hid” my old partitions by deleting them (just the partition info, not the data).
Unmount all mounts pointing to the old drive before continuing!
sudo gdisk -l $OLD >old.parts
sudo gdisk $OLD
Command (? for help): d
Partition number (1-3): 2
Command (? for help): d
Partition number (1-3): 3
I deleted only my old /boot and LVM partitions. I did not delete the EFI partition; it is preserved and will be updated by the following commands.
This is a convenient and only somewhat dangerous trick. When gdisk
deletes a
partition it only changes the GPT partition information and does not erase, trim, or
discard any of the partition data. If we need the data back we need only recreate the
deleted partitions exactly using the partition data backed up in the first command
— I am trusting, however, that there are not any NVME controllers out there that
try to parse GPT partition info and “helpfully” discard any deleted partitions.
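In addition to the plain-text listing saved with gdisk -l above, sgdisk can take a binary backup of the whole GPT that can be restored verbatim if you ever need those partitions back (ideally capture it before deleting anything):
# apt install gdisk (provides sgdisk)
sudo sgdisk --backup=old-gpt.bin $OLD
# ...and, only if recovery is ever needed:
# sudo sgdisk --load-backup=old-gpt.bin $OLD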
Mount the new partitions
The time has come to mount the new system. When doing this, I just mount everything as the new system would mount it – including partitions that aren’t really needed by grub and mkinitramfs.
NOTE: If you still have most partitions mounted at /mnt/B
, you should either
unmount them before mounting at /mnt
or else adjust the following to mount at
/mnt/B
and skip over any duplicate mounts.
sudo mount /dev/tsalmoth_2024/root /mnt
sudo mount /dev/nvme0n1p1 /mnt/boot/efi
sudo mount /dev /mnt/dev -o rbind
sudo mount /sys /mnt/sys -o rbind
sudo mount /proc /mnt/proc -o rbind
sudo mount /run /mnt/run -o rbind
sudo mount /dev/tsalmoth_2024/user /mnt/home -o subvol=home
sudo mount /dev/tsalmoth_2024/user /mnt/cache -o subvol=cache
sudo mount /dev/tsalmoth_2024/user /mnt/backup -o subvol=backup
sudo mount /dev/tsalmoth_2024/docker /mnt/var/lib/docker
Update fstab
Update the newly mounted /mnt/etc/fstab. For example,
/dev/mapper/tsalmoth_2024-root / btrfs subvol=debian-root
/dev/mapper/tsalmoth_2024-root /_btrfs btrfs subvolid=5
/dev/nvme0n1p1 /boot/efi vfat defaults
/dev/nvme1n1p1 /boot/efi2 vfat defaults,nofail,noauto
/dev/mapper/tsalmoth_2024-docker /var/lib/docker btrfs defaults
/dev/mapper/tsalmoth_2024-user /home btrfs defaults,subvol=home
/dev/mapper/tsalmoth_2024-user /cache btrfs defaults,subvol=cache
/dev/mapper/tsalmoth_2024-user /_user btrfs defaults
I also use lazytime,discard=async
on all the btrfs mounts but omitted them above to
reduce line length.
I mount my btrfs main volumes at /_btrfs
and /_user
so I can take snapshots into
the main volume.
Now is also a good time to search through /mnt/etc for any occurrences of the old drive or LVM volume groups (e.g., sudo grep -r nvme0n1 /mnt/etc). You can ignore matches in /etc/lvm/archive and /etc/lvm/backup (these generally include a “Hint only” comment).
Update boot
Now you must chroot into your new drive,
sudo chroot /mnt
The next several command blocks should be run from within the chroot.
The first action is to clean out obsolete EFI boot variables.
# You are now root user inside your machine as mounted at /mnt
# apt install efibootmgr
efibootmgr # List EFI boots
efibootmgr --delete-bootnum --bootnum N # Remove any obsolete bootables
Next regenerate the boot files and configuration.
# You are now root user inside your machine as mounted at /mnt
update-initramfs -c -k all
dpkg-reconfigure grub-efi-amd64
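Before leaving the chroot, a couple of quick sanity checks can save a trip back to the rescue disk. These are illustrative, not exhaustive; adjust the kernel version and names to your system:
# Still inside the chroot
grep -m1 tsalmoth_2024 /boot/grub/grub.cfg                             # grub should reference the new VG
lsinitramfs /boot/initrd.img-$(uname -r) | grep -E 'raid1|lvm|xxhash'  # RAID/LVM/checksum bits present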
Boot to new system
Exit out of the chroot and reboot the system!
If all went well, your system should boot to your own desktop, just like nothing has changed. If you deleted the old partitions as recommended above, you are guaranteed to be running from the new drive. If you did not delete the old partitions, double-check that the system really did boot into the new drive and hasn’t mounted any partitions from the old drive.
If things didn’t work out so well, don’t panic! Keep a level head and examine the situation carefully. Gather information before attempting to make any change. Make sure you understand and keep a record of any commands you run.
If the system boots but some things aren’t working correctly, you can make your fixes in the running system.
If the system fails to boot you may need to go back to your rescue disk, re-mount all the new partitions and chroot to the new system as you did above before examining the system state and looking for configuration issues.
Commit
Repartition the old drive
Once your system is working correctly from the new drive, it is time to perform the destructive changes to the old drive. Start up gdisk again and create the necessary RAID partitions.
# apt install gdisk
sudo gdisk $OLD # Create RAID partition(s)
Create RAID partitions matching the new drive and write the changes. Gdisk just changes the partition table and leaves the data in place. Thus, it is possible that Linux will examine the new RAID partition you created and see your old LVM volume group (or whatever partitions you had before). The presence of this metadata can create problems when you try to add the partition to your RAID, so you should blank out the metadata with dd (make sure you get your partition (“p2”) information correct). Writing 100 MB to the front of the partition is usually sufficient.
sudo dd if=/dev/zero of=${OLD}p2 bs=1M count=100
Reboot the system one last time to clear out the partition metadata.
Add RAID device
Finally, we can complete our RAID array by adding the RAID partition on the old drive to the array. Make sure you get your partition (“p2”) information correct.
# apt install mdadm
sudo mdadm /dev/md0 -a ${OLD}p2
This will trigger an array sync. The progress can be tracked using,
cat /proc/mdstat
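If you prefer a live view of the rebuild:
watch -n 10 cat /proc/mdstat    # refreshes the resync progress every 10 seconds
sudo mdadm --detail /dev/md0    # shows "Rebuild Status" and the rebuilding member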
Once the rebuild completes, you are done!
Success!
Once finished, you can have a celebratory drink! Just don’t spill the drink on your laptop, as that might take out both drives at once, negating the benefit of RAID :)