Migration Weirdness: An Odyssey
by Dysnomia
ALLEGED LOG DATE: April 11, 2024
LOG DESCRIPTION
This dusty mess of a post seems to be from some sort of blog. It's not immediately
clear what either the blog or the post must have been about, but it's probably safe to assume it's
much less important than it sounds.
------------------------------------------------------------
"It's said that you can't be sure of anything, except death and taxes.
I will now arrogantly declare 'trying to change too many things on your system and
ending up with a weird mess' the third member of what is now a trifecta.
But first a little background; I've been using a very old SSD of limited
capacity in one of my Gen2 machines for a long time, and while I've been thinking
of upgrading it for years at this point, it simply refused to give me an excuse
to toss it out the window and get a new one (disclaimer: I do not condone tossing any storage
device out the window). Maybe I was impressed by its resilience, its awe-inspiring
will to live, or maybe the fact that it didn't seem to care what a TBW even is, but
I couldn't give up on it if the thing believed in itself so much. I spent years moving
storage hogs (mainly older VMs) from it to an external HDD I had lying around each time I
needed the extra space for a fresh couple of VMs, git-lfs repositories, or flatpak pulling
10 terabytes of runtimes to install Vim. I tried, I really did, but in the end I decided
it's time to put the hardworking little guy to rest and replace it with something New(TM)
(and maybe Better(TM)?).
I was now on the hunt for an SSD that's: a) even more reliable, b) even faster,
c) even bigger storage-wise, but also d) smaller in-irl-wise. It didn't take me too long
to find the much celebrated-on-the-interwebs [PRODUCT NAME REMOVED DUE TO LAINWIRED POLICY],
with many fellow users of the Wired singing, or rather writing in this case, its praises.
Unfortunately this sent me down a slippery slope of "Well, I'm getting a faster SSD anyway,
so why not get a new such-and-such as well?" that lasted for months until I realized it's not
really worth the hassle of upgrading the entire (pretty old, mind you) system when there's no
need for it. After a lot of extremely serious theorizing that I've cut out for brevity, it was
finally settled; the old warehouse was to be abandoned for the new warehouse.
Only, it wasn't that simple of course. See, other ideas popped up in my head; this being one of my
older systems, it was the only one left on an MBR partition table, so surely there was no better time
to switch to a GPT (no, not that GPT) one and slap (U)EFI boot onto it as well while I'm at it.
As for the benefits, there weren't really any I could use, except for the fact that I'd have an
easier time ditching GRUB for an objectively much 1337er EFI stub (but more on that later). The
GPT part is quite simple, and can actually be achieved on a live system very trivially, but the EFI
part needed some reading up on as I hadn't really messed with anything like that on a hands-on system such
as Gentoo. Wearing my ignorance with pride, I decided to just get on with it because it sounded pretty
simple, and surely nothing could go wrong.
From here on it's going to get a little technical, so first a few things to keep in mind.
The devices I'll be mentioning below are these:
- /dev/sda (Quite-old-but-somehow-not-yet-dying-SSD)
|- /dev/sda1 (old system's /boot)
|- /dev/sda2 (old system's /)
- /dev/sdb (The latest and greatest in ESSESSDEE technology)
|- /dev/sdb1 (new system's slick and shiny /efi)
|- /dev/sdb2 (new system's slick and shiny /boot that was definitely not partitioned out of habit)
|- /dev/sdb3 (new system's slick and shiny /)
First a little disclaimer: If you somehow ended up here for anything other than entertainment, this follows the
process on a Gentoo system so some things may NOT be applicable on other distributions. The process SHOULD be
very similar though.
And so it begins. First, we'll just run fdi-
haha just kidding, the new disk isn't recognized. See,
due to (I assume) the system's age and the fact that it's an NVMe drive, I had to enable support for it in my
firmware settings. But that's not really a problem, just reboot into your motherboard's setup and enable support
for its M.2 slot. After another reboot, it would now be a good time to also enable NVMe support in the kernel if
it's not enabled already with:
~ # cd /usr/src/linux
~ # make menuconfig  # (or just change .config manually, what are you even afraid of)
Over there, you should make sure CONFIG_BLK_DEV_NVME is set to y.
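If you'd rather not squint at menuconfig, the same check can be sketched as a tiny shell function. This is only an illustration: the function name nvme_config_state is made up for this post, and it assumes the option shows up in .config the usual way (CONFIG_BLK_DEV_NVME=y or =m when enabled, a "# ... is not set" comment otherwise).

```shell
# Print the state of the NVMe block driver option found in a kernel .config.
# Prints "y" (built in), "m" (module) or "unset".
# nvme_config_state is a hypothetical helper name, not an existing tool.
nvme_config_state() {
    config_file="$1"
    # Grab whatever follows "CONFIG_BLK_DEV_NVME=" if the option is enabled.
    value=$(sed -n 's/^CONFIG_BLK_DEV_NVME=\(.*\)$/\1/p' "$config_file")
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    else
        # Disabled options only appear as "# CONFIG_... is not set" comments.
        printf 'unset\n'
    fi
}
```

Running nvme_config_state /usr/src/linux/.config should then print y once the option is enabled. If it prints m instead, the driver is a module, which is probably not what you want when / is going to live on that drive and there's no initramfs to load it.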
After that's done, since we're using GRUB, we're just going to run make -j69 (funny number), then
make modules_install and make install so the new kernel actually lands in /boot, and then
grub-mkconfig -o /boot/grub/grub.cfg or wherever grub.cfg is stored these days. Reboot
and, wouldn't you know, the new drive is now visible as you'd expect.
NOW is the point where we'll run fdisk (or any other tool really) for the pretty simple process of creating the
aforementioned (GPT) partitions on /dev/sdb and marking /dev/sdb1 as an ESP, and we're done. It's now as easy as
using another tool of your choice (tar, rsync, cp, dd, pv) and just copying old / over to new /.
Just kidding though, first you'll need a liveCD of your distribution of choice because copying over a live system
is asking for problems. That being said, the liveCD needs to support (U)EFI boot, as we're going to need efivars
for later.
At this point things are going quite well, and surely there won't be some sort of baffling user error that makes this
process longer than it needs to be, right? Well, in case you're wondering why we need efivars, it's so you
don't end up with a mess like I did when I forgot all about it and attempted to install GRUB to the efi directory from
my own BIOS-booted MBR system, complete with giving it incomplete/botched options. To GRUB's credit (or maybe discredit?),
it DID try to do something, but the result was a half-installed .efi file in my efi directory, and the removal of all
my kernels under the boot directory. I'm still trying to figure out how the latter happened, especially since grub-install
shouldn't really touch anything kernel-specific, but I assume it was caused by me playing around with target directories for
the installation.
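In hindsight, a two-second check of the boot mode before touching grub-install would have saved me the trouble. The kernel only exposes /sys/firmware/efi when the machine was booted via UEFI, so a sketch of that check could be as small as this (the function name and its optional argument are my own invention, the argument mostly exists so the check can be tried against another path):

```shell
# Succeed (exit status 0) if the running system was booted via UEFI.
# /sys/firmware/efi only exists on UEFI boots; the optional argument
# overrides the path and is there purely for experimentation.
# booted_via_uefi is a hypothetical helper name.
booted_via_uefi() {
    [ -d "${1:-/sys/firmware/efi}" ]
}
```

Something like booted_via_uefi || echo "legacy boot, efivars won't be there" right before the GRUB step would have been a decent tripwire.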
After a brief investigation that led nowhere, I reinstalled my kernels with make install, and thought it might be
interesting to test if the .efi file GRUB installed actually worked. Somehow thinking that booting into something
created by a command that threw multiple errors was going to work, I pointed the fstab on my new / at all my new
partitions, rebooted my machine, removed all legacy BIOS settings, and booted straight into my new NVMe drive. What
happened next can only be explained by dark magicks, as the computer DID boot, but on the wrong disk. Let me break
down the process to make this simpler:
a) boot into NVMe
b) GRUB menu shows up
c) ???
d) boot into /dev/sda2 somehow
e) /dev/sda1 does NOT mount, but /boot does, and it includes the old kernels (the ones that were deleted on the NVMe drive)
f) None of the NVMe partitions are mounted (which would make sense IF /boot was empty)
Spooky stuff.
At this point I came to the realization that I'd surely fucked something up, so a small retread could fix it. This is where I whipped out
my liveCD to rerun grub-install. My liveCD was NOT a Gentoo one, so I decided it'd be safer to chroot into my new /,
as my GRUB did come with the EFI USE flag by default. So after chrooting into my system with:
~ # mount /dev/sdb3 /mnt
~ # mount /dev/sdb2 /mnt/boot
~ # mount /dev/sdb1 /mnt/efi
~ # mount -t proc proc /mnt/proc
~ # mount -t sysfs sys /mnt/sys
~ # mount -o bind /dev /mnt/dev
~ # mount --bind /run /mnt/run
AND
~ # mount --bind /sys/firmware/efi/efivars /mnt/sys/firmware/efi/efivars
~ # chroot /mnt /bin/bash
It was finally time to try installing GRUB again, with grub-install --efi-directory=/efi --target=x86_64-efi.
But before I talk about the results of this, a little something about my original plan:
As I mentioned earlier, the idea was to get rid of GRUB altogether as well, since I didn't feel like there was a point in keeping it around.
I really wanted to try out an EFI stub and signing that for secure-boot, since I've not done it before and it sounded like an interesting process.
However, after a few hours of research as well as trial and error with the migration, I felt it was a better idea to save that for another time.
AAAAAAND we're back. No errors this time, so I unmounted everything, happily kissed my liveCD goodbye and rebooted my system yet again. And what
do you know, it actually worked this time. Sort of. It seems ext4 DID NOT like the new NVMe drive, so it started
complaining about bad superblocks. In addition, silly me had added /dev/sdb1's fstab entry as 'ext4', which is not
what you want your ESP to be mounted as (so I hear; vfat is the usual answer). So after quickly fixing that oopsie, it was time to tackle the
more concerning issue, filesystem corruption (or is it?). It wasn't time to retire my liveCD yet so I booted into it one last time to e2fsck
the poor thing, which seemed to do the job. After rebooting one last time, the error showed up again, so I plugged in my liveCD to run a
check ONE LAST TIME, and after that seemed to find absolutely nothing wrong with it this time around, I decided this would be my last reboot
(for real this time), and if the problem persisted I'd install everything from scratch. Lucky for me, that seemed to be it, no more complaining.
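For anyone wanting to catch the fstab oopsie before rebooting: the ESP is a FAT filesystem, so its fstab type should be vfat rather than ext4. A rough sketch of a sanity check follows, assuming a standard whitespace-separated fstab (device, mount point, type, ...). The fstab_fstype name is hypothetical, as is everything else about how you'd wire it in.

```shell
# Print the filesystem type an fstab file declares for a given mount point.
# Assumes the standard column order: device, mount point, type, options, ...
# fstab_fstype is a hypothetical helper name.
fstab_fstype() {
    fstab_file="$1"
    mount_point="$2"
    # Skip comment lines, then match on the second column (the mount point).
    awk -v mp="$mount_point" '$1 !~ /^#/ && $2 == mp { print $3 }' "$fstab_file"
}
```

With the layout from earlier, fstab_fstype /etc/fstab /efi should print vfat; if it says ext4, you've repeated my mistake.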
I let out a sigh of relief as my odyssey was now done and I could use my lightning fast system, which wasn't actually that much faster than my
old system, but hey. At least I'm future-proofed for X years now, right?"