Linux Software RAID (MDADM)

Linux-specific questions and information are gathered here. Most topics apply to the CentOS/Red Hat (RHEL)/Debian/Ubuntu/Gentoo distributions.

Linux Software RAID (MDADM)

Postby lik » Tue Apr 14, 2009 7:14 am

Managing Linux Software RAID

Marking a damaged disk as failed:
Code: Select all
mdadm --manage --set-faulty /dev/md1 /dev/sdd1

Removing the damaged disk from the array:
Code: Select all
mdadm /dev/md1 -r /dev/sdd1

Shut down your system, replace the disk, and bring the machine up in single-user mode. Partition and format your new disk, setting the partition type to Linux raid autodetect (fd); of course the new disk must be partitioned the same as the old one. Then bring the machine up in its default runlevel, either by rebooting or by running init 5 or init 2, depending on your Linux distribution. A quick way to clone the old layout onto the new disk is sketched right below.
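
If you don't want to repartition by hand, the same sfdisk trick used later in this thread can clone the layout of the surviving mirror onto the replacement disk. A minimal sketch, assuming the surviving disk is /dev/sdc and the replacement is /dev/sdd (the device names here are only an example; double-check them before running):
Code: Select all
# copy the partition table (including the fd / Linux raid autodetect type)
# from the surviving disk to the replacement disk
sfdisk -d /dev/sdc | sfdisk /dev/sdd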

Add your new disk to the array:
Code: Select all
mdadm /dev/md1 -a /dev/sdd1

Viewing the status of the array (the one shown here is repairing):
Code: Select all
cat /proc/mdstat

Personalities : [raid1]
md1 : active raid1 sdc1[2] sdd1[1]
142327744 blocks [2/1] [_U]
[==========>..........] recovery = 54.5% (77655168/142327744) finish=65.2min speed=16503K/sec
md0 : active raid1 sdb1[1] sda1[0]
142327744 blocks [2/2] [UU]

unused devices: <none>
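
To follow the rebuild without re-running the command, the watch utility (also used later in this thread) refreshes the output every two seconds; press CTRL+C to leave it:
Code: Select all
watch cat /proc/mdstat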

Showing status detail for a particular array:
Code: Select all
mdadm --detail /dev/md1

/dev/md1:
Version : 00.90.01
Creation Time : Mon Sep 8 09:27:41 2008
Raid Level : raid1
Array Size : 142327744 (135.73 GiB 145.74 GB)
Device Size : 142327744 (135.73 GiB 145.74 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Fri Sep 12 12:07:18 2008
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0


Number Major Minor RaidDevice State
0 8 33 0 active sync /dev/sdc1
1 8 49 1 active sync /dev/sdd1
UUID : 4f4ab348:d8fe0303:1a07dcc9:ff9ba005
Events : 0.190631

Viewing the details of a particular disk in an array:
Code: Select all
mdadm --examine /dev/sdd1

/dev/sdd1:
Magic : a92b4efc
Version : 00.90.00
UUID : 4f4ab348:d8fe0303:1a07dcc9:ff9ba005
Creation Time : Mon Sep 8 09:27:41 2008
Raid Level : raid1
Device Size : 142327744 (135.73 GiB 145.74 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1

Update Time : Fri Sep 12 12:05:33 2008
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : 852842d6 - correct
Events : 0.190583


Number Major Minor RaidDevice State
this 1 8 49 1 active sync /dev/sdd1
0 0 8 33 0 active sync /dev/sdc1
1 1 8 49 1 active sync /dev/sdd1
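
As a related management tip (not part of the original post): mdadm can also run as a monitor and mail you when a device fails, so a degraded array doesn't go unnoticed. A sketch, assuming you want mail sent to root and have a working local MTA:
Code: Select all
# watch all arrays listed in /proc/mdstat and mail root on failure events
mdadm --monitor --scan --mail=root --daemonise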

Create Linux Software RAID1 (MDADM)

Postby lik » Sat Oct 31, 2009 3:08 pm

How To Set Up Software RAID1 On A Running System

1 Preliminary Note

This is a CentOS 5.3 system with two hard drives, /dev/sda and /dev/sdb, which are identical in size. /dev/sdb is currently unused, and /dev/sda has the following partitions:

* /dev/sda1: /boot partition, ext3;
* /dev/sda2: swap;
* /dev/sda3: / partition, ext3

In the end we want to have the following situation:

* /dev/md0 (made up of /dev/sda1 and /dev/sdb1): /boot partition, ext3;
* /dev/md1 (made up of /dev/sda2 and /dev/sdb2): swap;
* /dev/md2 (made up of /dev/sda3 and /dev/sdb3): / partition, ext3

This is the current situation:
Code: Select all
df -h

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3             9.1G  1.1G  7.6G  12% /
/dev/sda1             190M   12M  169M   7% /boot
tmpfs                 252M     0  252M   0% /dev/shm


Code: Select all
fdisk -l

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          25      200781   83  Linux
/dev/sda2              26          90      522112+  82  Linux swap / Solaris
/dev/sda3              91        1305     9759487+  83  Linux

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table


2 Installing mdadm

The most important tool for setting up RAID is mdadm. Let's install it like this:
Code: Select all
yum install mkinitrd mdadm
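
The board description above also mentions Debian/Ubuntu; there, the roughly equivalent step (a sketch; package availability may vary by release) would be:
Code: Select all
# Debian/Ubuntu counterpart of the yum command above
apt-get install mdadm initramfs-tools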

Afterwards, we load a few kernel modules (to avoid a reboot):
Code: Select all
modprobe linear
modprobe multipath
modprobe raid0
modprobe raid1
modprobe raid5
modprobe raid6
modprobe raid10

Now run
Code: Select all
cat /proc/mdstat

The output should look as follows:

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>

3 Preparing /dev/sdb

To create a RAID1 array on our already running system, we must prepare the /dev/sdb hard drive for RAID1, then copy the contents of our /dev/sda hard drive to it, and finally add /dev/sda to the RAID1 array.

First, we copy the partition table from /dev/sda to /dev/sdb so that both disks have exactly the same layout:
Code: Select all
sfdisk -d /dev/sda > partition.txt

Code: Select all
vi partition.txt

and replace every occurrence of "/dev/sda" with "/dev/sdb".
Then write the edited partition table onto the new disk:
Code: Select all
sfdisk /dev/sdb < partition.txt
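
If you prefer to skip the manual edit, the substitution can be done on the fly with sed; this is just a compact variant of the two steps above (same caveat: make sure you really are writing to /dev/sdb):
Code: Select all
# dump sda's table, rewrite the device names, and feed it straight to sdb
sfdisk -d /dev/sda | sed 's|/dev/sda|/dev/sdb|g' | sfdisk /dev/sdb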

The command
Code: Select all
fdisk -l

should now show that both HDDs have the same layout.
Now we change the partition types on /dev/sdb to Linux raid autodetect and write the partition table again, so that the kernel re-reads it and creates the corresponding device nodes:
Code: Select all
fdisk /dev/sdb

The number of cylinders for this disk is set to 1305.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): <-- m
Command action
   a   toggle a bootable flag
   b   edit bsd disklabel
   c   toggle the dos compatibility flag
   d   delete a partition
   l   list known partition types
   m   print this menu
   n   add a new partition
   o   create a new empty DOS partition table
   p   print the partition table
   q   quit without saving changes
   s   create a new empty Sun disklabel
   t   change a partition's system id
   u   change display/entry units
   v   verify the partition table
   w   write table to disk and exit
   x   extra functionality (experts only)

Command (m for help): <-- t
Partition number (1-4): <-- 1
Hex code (type L to list codes): <-- L

 0  Empty           1e  Hidden W95 FAT1 80  Old Minix       bf  Solaris
 1  FAT12           24  NEC DOS         81  Minix / old Lin c1  DRDOS/sec (FAT-
 2  XENIX root      39  Plan 9          82  Linux swap / So c4  DRDOS/sec (FAT-
 3  XENIX usr       3c  PartitionMagic  83  Linux           c6  DRDOS/sec (FAT-
 4  FAT16 <32M      40  Venix 80286     84  OS/2 hidden C:  c7  Syrinx
 5  Extended        41  PPC PReP Boot   85  Linux extended  da  Non-FS data
 6  FAT16           42  SFS             86  NTFS volume set db  CP/M / CTOS / .
 7  HPFS/NTFS       4d  QNX4.x          87  NTFS volume set de  Dell Utility
 8  AIX             4e  QNX4.x 2nd part 88  Linux plaintext df  BootIt
 9  AIX bootable    4f  QNX4.x 3rd part 8e  Linux LVM       e1  DOS access
 a  OS/2 Boot Manag 50  OnTrack DM      93  Amoeba          e3  DOS R/O
 b  W95 FAT32       51  OnTrack DM6 Aux 94  Amoeba BBT      e4  SpeedStor
 c  W95 FAT32 (LBA) 52  CP/M            9f  BSD/OS          eb  BeOS fs
 e  W95 FAT16 (LBA) 53  OnTrack DM6 Aux a0  IBM Thinkpad hi ee  EFI GPT
 f  W95 Ext'd (LBA) 54  OnTrackDM6      a5  FreeBSD         ef  EFI (FAT-12/16/
10  OPUS            55  EZ-Drive        a6  OpenBSD         f0  Linux/PA-RISC b
11  Hidden FAT12    56  Golden Bow      a7  NeXTSTEP        f1  SpeedStor
12  Compaq diagnost 5c  Priam Edisk     a8  Darwin UFS      f4  SpeedStor
14  Hidden FAT16 <3 61  SpeedStor       a9  NetBSD          f2  DOS secondary
16  Hidden FAT16    63  GNU HURD or Sys ab  Darwin boot     fb  VMware VMFS
17  Hidden HPFS/NTF 64  Novell Netware  b7  BSDI fs         fc  VMware VMKCORE
18  AST SmartSleep  65  Novell Netware  b8  BSDI swap       fd  Linux raid auto
1b  Hidden W95 FAT3 70  DiskSecure Mult bb  Boot Wizard hid fe  LANstep
1c  Hidden W95 FAT3 75  PC/IX           be  Solaris boot    ff  BBT
Hex code (type L to list codes): <-- fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): <-- t
Partition number (1-4): <-- 2
Hex code (type L to list codes): <-- fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): <-- t
Partition number (1-4): <-- 3
Hex code (type L to list codes): <-- fd
Changed system type of partition 3 to fd (Linux raid autodetect)

Command (m for help): <-- w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.


To make sure that there are no remains from previous RAID installations on /dev/sdb, we run the following commands:
Code: Select all
mdadm --zero-superblock /dev/sdb1
mdadm --zero-superblock /dev/sdb2
mdadm --zero-superblock /dev/sdb3

If there are no remains of a previous RAID installation, each of the above commands will print an error (which is nothing to worry about); if there are remains, the commands will not display anything at all.
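
If you want to check first whether a partition carries an old md superblock at all, mdadm --examine (already used in the post above) will either print the superblock or complain that none exists, for example:
Code: Select all
# prints superblock details if one is present, otherwise an error
mdadm --examine /dev/sdb1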

4 Creating Our RAID1 Arrays

Now let's create our RAID arrays /dev/md0, /dev/md1, and /dev/md2. /dev/sdb1 will be added to /dev/md0, /dev/sdb2 to /dev/md1, and /dev/sdb3 to /dev/md2. /dev/sda1, /dev/sda2, and /dev/sda3 can't be added right now (because the system is currently running on them), therefore we use the placeholder missing in the following three commands:

Code: Select all
mdadm --create /dev/md0 --level=1 --raid-disks=2 missing /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-disks=2 missing /dev/sdb2
mdadm --create /dev/md2 --level=1 --raid-disks=2 missing /dev/sdb3
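
One caveat if you follow this on a newer distribution than CentOS 5: recent mdadm versions default to 1.x metadata, which the legacy GRUB used in this howto cannot always boot from. In that case the arrays can be created with the old 0.90 superblock explicitly; a hedged sketch (not needed on stock CentOS 5, whose mdadm already defaults to 0.90):
Code: Select all
# force 0.90 metadata so GRUB legacy can read /boot from the array
mdadm --create /dev/md0 --metadata=0.90 --level=1 --raid-devices=2 missing /dev/sdb1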

The command:
Code: Select all
cat /proc/mdstat

should now show that you have three degraded RAID arrays ([_U] or [U_] means that an array is degraded while [UU] means that the array is ok):

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sdb3[1]
9759360 blocks [2/1] [_U]

md1 : active raid1 sdb2[1]
522048 blocks [2/1] [_U]

md0 : active raid1 sdb1[1]
200704 blocks [2/1] [_U]

unused devices: <none>

Next we create filesystems on our RAID arrays (ext3 on /dev/md0 and /dev/md2 and swap on /dev/md1):
Code: Select all
mkfs.ext3 /dev/md0
mkswap /dev/md1
mkfs.ext3 /dev/md2


Next we create /etc/mdadm.conf as follows:
Code: Select all
mdadm --examine --scan > /etc/mdadm.conf

Display the contents of the file:
Code: Select all
cat /etc/mdadm.conf

In the file you should now see details about our three (degraded) RAID arrays:
Code: Select all
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=78d582f0:940fabb5:f1c1092a:04a55452
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=8db8f7e1:f2a64674:d22afece:4a539aa7
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=1baf282d:17c58efd:a8de6947:b0af9792


5 Adjusting The System To RAID1

Now let's mount /dev/md0 and /dev/md2 (we don't need to mount the swap array /dev/md1):
Code: Select all
mkdir /mnt/md0
mkdir /mnt/md2

Code: Select all
mount /dev/md0 /mnt/md0
mount /dev/md2 /mnt/md2

You should now find both arrays in the output of
Code: Select all
mount

# mount
/dev/sda3 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/md0 on /mnt/md0 type ext3 (rw)
/dev/md2 on /mnt/md2 type ext3 (rw)

Next we modify /etc/fstab. Replace LABEL=/boot with /dev/md0, LABEL=SWAP-sda2 with /dev/md1, and LABEL=/ with /dev/md2 so that the file looks as follows:
Code: Select all
vi /etc/fstab


/dev/md2 / ext3 defaults 1 1
/dev/md0 /boot ext3 defaults 1 2
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/md1 swap swap defaults 0 0

Next replace /dev/sda1 with /dev/md0 and /dev/sda3 with /dev/md2 in /etc/mtab:
Code: Select all
vi /etc/mtab


/dev/md2 / ext3 rw 0 0
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/md0 /boot ext3 rw 0 0
tmpfs /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0

Now on to the GRUB boot loader. Open /boot/grub/menu.lst and add fallback=1 right after default=0:
Code: Select all
vi /boot/grub/menu.lst

[...]
default=0
fallback=1
[...]
This means that if the first kernel stanza fails to boot (counting starts at 0, so the first stanza is number 0), GRUB will fall back to the second one (number 1).

In the same file, go to the bottom, where you should find some kernel stanzas. Copy the first one and paste it above the existing stanzas; in the copy, replace root=LABEL=/ with root=/dev/md2 and root (hd0,0) with root (hd1,0):

[...]
title CentOS (2.6.18-128.el5)
root (hd1,0)
kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/md2
initrd /initrd-2.6.18-128.el5.img

title CentOS (2.6.18-128.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-128.el5 ro root=LABEL=/
initrd /initrd-2.6.18-128.el5.img

The whole file should look something like this:

Code: Select all
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda3
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
fallback=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-128.el5)
        root (hd1,0)
        kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/md2
        initrd /initrd-2.6.18-128.el5.img

title CentOS (2.6.18-128.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-128.el5 ro root=LABEL=/
        initrd /initrd-2.6.18-128.el5.img

root (hd1,0) refers to /dev/sdb, which is already part of our RAID arrays. We will reboot the system in a few moments; it will then try to boot from our (still degraded) RAID arrays, and if that fails, it will boot from /dev/sda (thanks to fallback=1).

Next we adjust our ramdisk to the new situation:
Code: Select all
mv /boot/initrd-`uname -r`.img /boot/initrd-`uname -r`.img_orig
mkinitrd /boot/initrd-`uname -r`.img `uname -r`

Now we copy the contents of /dev/sda1 and /dev/sda3 to /dev/md0 and /dev/md2 (which are mounted on /mnt/md0 and /mnt/md2):
Code: Select all
cp -dpRx / /mnt/md2

Code: Select all
cd /boot
cp -dpRx . /mnt/md0
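
An alternative to cp for this copy step, if you prefer rsync (a sketch with equivalent semantics to cp -dpRx):
Code: Select all
# -a archive mode, -H preserve hard links, -x stay on one filesystem
rsync -aHx / /mnt/md2/
rsync -aHx /boot/ /mnt/md0/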


6 Preparing GRUB (Part 1)

Afterwards we must install the GRUB bootloader on both hard drives, including the new /dev/sdb, so the system can boot from either one:
Code: Select all
grub

On the GRUB shell, type in the following commands:
Code: Select all
root (hd0,0)

grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83

grub>

setup (hd0)

grub> setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.

grub>

root (hd1,0)

grub> root (hd1,0)
 Filesystem type is ext2fs, partition type 0xfd

grub>

setup (hd1)

grub> setup (hd1)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd1)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd1) (hd1)1+15 p (hd1,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.

grub>

quit

Now, back on the normal shell, we reboot the system and hope that it boots ok from our RAID arrays:
Code: Select all
reboot


7 Preparing /dev/sda

If all goes well, you should now find /dev/md0 and /dev/md2 in the output of
Code: Select all
df -h

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2              9.2G  1.1G  7.7G  12% /
/dev/md0              190M   14M  167M   8% /boot
tmpfs                 252M     0  252M   0% /dev/shm

The output of
Code: Select all
cat /proc/mdstat

should be as follows:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1]
200704 blocks [2/1] [_U]

md1 : active raid1 sdb2[1]
522048 blocks [2/1] [_U]

md2 : active raid1 sdb3[1]
9759360 blocks [2/1] [_U]

unused devices: <none>


Now we must change the partition types of our three partitions on /dev/sda to Linux raid autodetect as well:
Code: Select all
fdisk /dev/sda

The number of cylinders for this disk is set to 1305.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): <-- t
Partition number (1-4): <-- 1
Hex code (type L to list codes): <-- fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): <-- t
Partition number (1-4): <-- 2
Hex code (type L to list codes): <-- fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): <-- t
Partition number (1-4): <-- 3
Hex code (type L to list codes): <-- fd
Changed system type of partition 3 to fd (Linux raid autodetect)

Command (m for help): <-- w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
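
If you would rather not do this interactively, the sfdisk shipped with CentOS 5 can change partition types from the command line (an assumption about your sfdisk version; newer util-linux releases replaced --change-id with --part-type):
Code: Select all
# set partitions 1-3 on /dev/sda to type fd (Linux raid autodetect)
sfdisk --change-id /dev/sda 1 fd
sfdisk --change-id /dev/sda 2 fd
sfdisk --change-id /dev/sda 3 fd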



Now we can add /dev/sda1, /dev/sda2, and /dev/sda3 to the respective RAID arrays:
Code: Select all
mdadm --add /dev/md0 /dev/sda1
mdadm --add /dev/md1 /dev/sda2
mdadm --add /dev/md2 /dev/sda3

Now take a look at
Code: Select all
cat /proc/mdstat

... and you should see that the RAID arrays are being synchronized:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
200704 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
522048 blocks [2/2] [UU]

md2 : active raid1 sda3[2] sdb3[1]
9759360 blocks [2/1] [_U]
[====>................] recovery = 22.8% (2232576/9759360) finish=2.4min speed=50816K/sec

unused devices: <none>

You can run
Code: Select all
watch cat /proc/mdstat
to get continuously updated output. To leave watch, press CTRL+C.

Then adjust /etc/mdadm.conf to the new situation:
Code: Select all
mdadm --examine --scan > /etc/mdadm.conf

/etc/mdadm.conf should now look something like this:
Code: Select all
cat /etc/mdadm.conf

ARRAY /dev/md0 level=raid1 num-devices=2 UUID=78d582f0:940fabb5:f1c1092a:04a55452
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=8db8f7e1:f2a64674:d22afece:4a539aa7
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=1baf282d:17c58efd:a8de6947:b0af9792


8 Preparing GRUB (Part 2)

We are almost done. Now we must modify /boot/grub/menu.lst again. Right now it is configured to boot from /dev/sdb (hd1,0). Of course, we still want the system to be able to boot in case /dev/sdb fails, so we copy the first kernel stanza (which contains hd1), paste it below it, replace hd1 with hd0, and comment out all other kernel stanzas so that the file looks as follows:
Code: Select all
vi /boot/grub/menu.lst

Code: Select all
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda3
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
fallback=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-128.el5)
        root (hd1,0)
        kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/md2
        initrd /initrd-2.6.18-128.el5.img

title CentOS (2.6.18-128.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/md2
        initrd /initrd-2.6.18-128.el5.img

#title CentOS (2.6.18-128.el5)
#       root (hd0,0)
#       kernel /vmlinuz-2.6.18-128.el5 ro root=LABEL=/
#       initrd /initrd-2.6.18-128.el5.img


Afterwards, update your ramdisk:
Code: Select all
mv /boot/initrd-`uname -r`.img /boot/initrd-`uname -r`.img_orig2
mkinitrd /boot/initrd-`uname -r`.img `uname -r`

... and reboot the system:
Code: Select all
reboot

It should boot without problems.

That's it - you've successfully set up software RAID1 on your running CentOS 5.3 system!

9 Testing

Now let's simulate a hard drive failure. It doesn't matter whether you pick /dev/sda or /dev/sdb; in this example I assume that /dev/sdb has failed.
To simulate the failure, you can either shut down the system and physically remove /dev/sdb, or you can soft-fail and remove it like this:
Code: Select all
mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm --manage /dev/md1 --fail /dev/sdb2
mdadm --manage /dev/md2 --fail /dev/sdb3

Code: Select all
mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm --manage /dev/md1 --remove /dev/sdb2
mdadm --manage /dev/md2 --remove /dev/sdb3

Shut down the system:
Code: Select all
shutdown -h now

Then put in a new /dev/sdb drive (if you simulate a failure of /dev/sda, you should now put /dev/sdb in /dev/sda's place and connect the new HDD as /dev/sdb!) and boot the system. It should still start without problems.

Now run
Code: Select all
cat /proc/mdstat

and you should see that the arrays are degraded:
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0]
200704 blocks [2/1] [U_]

md1 : active raid1 sda2[0]
522048 blocks [2/1] [U_]

md2 : active raid1 sda3[0]
9759360 blocks [2/1] [U_]

unused devices: <none>

The output of
Code: Select all
fdisk -l

should look as follows:

# fdisk -l

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          25      200781   fd  Linux raid autodetect
/dev/sda2              26          90      522112+  fd  Linux raid autodetect
/dev/sda3              91        1305     9759487+  fd  Linux raid autodetect

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/md2: 9993 MB, 9993584640 bytes
2 heads, 4 sectors/track, 2439840 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md2 doesn't contain a valid partition table

Disk /dev/md1: 534 MB, 534577152 bytes
2 heads, 4 sectors/track, 130512 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Disk /dev/md0: 205 MB, 205520896 bytes
2 heads, 4 sectors/track, 50176 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table

Now we copy the partition table of /dev/sda to /dev/sdb:

Code: Select all
sfdisk -d /dev/sda | sfdisk /dev/sdb

(If you get an error, you can try the --force option:
sfdisk -d /dev/sda | sfdisk --force /dev/sdb)

# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 1305 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
/dev/sdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *        63    401624     401562  fd  Linux raid autodetect
/dev/sdb2        401625   1445849    1044225  fd  Linux raid autodetect
/dev/sdb3       1445850  20964824   19518975  fd  Linux raid autodetect
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)

Afterwards we remove any remains of a previous RAID array from /dev/sdb...
Code: Select all
mdadm --zero-superblock /dev/sdb1
mdadm --zero-superblock /dev/sdb2
mdadm --zero-superblock /dev/sdb3

... and add /dev/sdb to the RAID array:
Code: Select all
mdadm -a /dev/md0 /dev/sdb1
mdadm -a /dev/md1 /dev/sdb2
mdadm -a /dev/md2 /dev/sdb3

Now take a look at
Code: Select all
cat /proc/mdstat

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
200704 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
522048 blocks [2/2] [UU]

md2 : active raid1 sdb3[2] sda3[0]
9759360 blocks [2/1] [U_]
[=======>.............] recovery = 39.4% (3846400/9759360) finish=1.7min speed=55890K/sec

unused devices: <none>

Wait until the synchronization has finished:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
200704 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
522048 blocks [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
9759360 blocks [2/2] [UU]

unused devices: <none>
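
If you want a command that simply blocks until the resync is done (handy in scripts), mdadm has a --wait mode; the mdadm shipped with CentOS 5 should support it, but treat this as an assumption and check man mdadm on your system:
Code: Select all
# block until any resync/recovery on md2 has finished
mdadm --wait /dev/md2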

Then run
Code: Select all
grub

and install the bootloader on both HDDs:
Code: Select all
root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit

That's it. You've just replaced a failed hard drive in your RAID1 array.