These three new disks were supposed to join the existing disk to form a 4-disk RAID5 array, so I started the migration process.
After several attempts (each taking about 24 hours), the migration appeared to work, but left the NAS unresponsive.
At that point I reset the NAS. Everything went downhill from there:
> The NAS boots, but marks the first disk as failed and removes it from all arrays, leaving them degraded.
> I ran checks on the disk and couldn't find any problems (which would be odd anyway, since it's nearly new).
> The management interface doesn't offer any recovery options, so I figured I'd just do it manually.
I successfully rebuilt all of QNAP's internal RAID1 arrays using mdadm (/dev/md4, /dev/md13 and /dev/md9), leaving only the RAID5 array, /dev/md0:
I have now tried this several times, using these commands:
mdadm -w /dev/md0
(After /dev/sda3 was dropped from the array, the NAS had mounted the array read-only; the array can't be modified in RO mode.)
mdadm /dev/md0 --re-add /dev/sda3
After that, the array starts rebuilding.
It stalls at 99.9%, however, with the system extremely slow and/or unresponsive (logging in over SSH fails most of the time).
Current state of things:
[admin@nas01 ~]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md4 : active raid1 sdd2[2](S) sdc2[1] sdb2[0]
      530048 blocks [2/2] [UU]

md0 : active raid5 sda3[4] sdd3[3] sdc3[2] sdb3[1]
      8786092608 blocks super 1.0 level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
      [===================>.]  recovery = 99.9% (2928697160/2928697536) finish=0.0min speed=110K/sec

md13 : active raid1 sda4[0] sdb4[1] sdd4[3] sdc4[2]
      458880 blocks [4/4] [UUUU]
      bitmap: 0/57 pages [0KB], 4KB chunk

md9 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      530048 blocks [4/4] [UUUU]
      bitmap: 2/65 pages [8KB], 4KB chunk

unused devices: <none>
(It has now been stuck at 2928697160/2928697536 for hours.)
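For scale, the gap between the two counters on the recovery line is tiny. In the 1 KiB blocks that mdstat reports (a quick sanity check on the numbers above, nothing new):

```shell
# Counters copied from the md0 recovery line in /proc/mdstat
done_blocks=2928697160
total_blocks=2928697536
remaining=$((total_blocks - done_blocks))
echo "remaining: ${remaining} blocks"   # 376 blocks, i.e. ~376 KiB
```

At the reported 110 K/sec that is a few seconds of work, so the hours-long stall is the anomaly, not the amount of data left.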
[admin@nas01 ~]# mdadm -D /dev/md0
/dev/md0:
        Version : 01.00.03
  Creation Time : Thu Jan 10 23:35:00 2013
     Raid Level : raid5
     Array Size : 8786092608 (8379.07 GiB 8996.96 GB)
  Used Dev Size : 2928697536 (2793.02 GiB 2998.99 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Jan 14 09:54:51 2013
          State : clean, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

 Rebuild Status : 99% complete

           Name : 3
           UUID : 0c43bf7b:282339e8:6c730d6b:98bc3b95
         Events : 34111

    Number   Major   Minor   RaidDevice State
       4       8        3        0      spare rebuilding   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
After inspecting /mnt/HDA_ROOT/.logs/kmsg, it turns out the actual problem appears to be /dev/sdb3:
<6>[71052.730000] sd 3:0:0:0: [sdb] Unhandled sense code
<6>[71052.730000] sd 3:0:0:0: [sdb] Result: hostbyte=0x00 driverbyte=0x08
<6>[71052.730000] sd 3:0:0:0: [sdb] Sense Key : 0x3 [current] [descriptor]
<4>[71052.730000] Descriptor sense data with sense descriptors (in hex):
<6>[71052.730000]        72 03 00 00 00 00 00 0c 00 0a 80 00 00 00 00 01
<6>[71052.730000]        5d 3e d9 c8
<6>[71052.730000] sd 3:0:0:0: [sdb] ASC=0x0 ASCQ=0x0
<6>[71052.730000] sd 3:0:0:0: [sdb] CDB: cdb[0]=0x88: 88 00 00 00 00 01 5d 3e d9 c8 00 00 00 c0 00 00
<3>[71052.730000] end_request: I/O error, dev sdb, sector 5859367368
<4>[71052.730000] raid5_end_read_request: 27 callbacks suppressed
<4>[71052.730000] raid5:md0: read error not correctable (sector 5857246784 on sdb3).
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5:md0: read error not correctable (sector 5857246792 on sdb3).
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5:md0: read error not correctable (sector 5857246800 on sdb3).
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5:md0: read error not correctable (sector 5857246808 on sdb3).
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5:md0: read error not correctable (sector 5857246816 on sdb3).
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5:md0: read error not correctable (sector 5857246824 on sdb3).
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5:md0: read error not correctable (sector 5857246832 on sdb3).
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5:md0: read error not correctable (sector 5857246840 on sdb3).
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5:md0: read error not correctable (sector 5857246848 on sdb3).
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5:md0: read error not correctable (sector 5857246856 on sdb3).
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
<4>[71052.730000] raid5: some error occurred in a active device:1 of md0.
The above sequence repeats at a steady rate for various (random?) sectors in the 585724xxxx range.
My questions are:
> Why does it stall so close to the end, while still consuming so many resources that the system grinds to a halt (the md0_raid5 and md0_resync processes are still running)?
> Is there a way to see what is causing it to fail/stall? <- probably the sdb3 errors, by the looks of it.
> How can I complete the operation without losing the 3 TB of data? (E.g., skip the troublesome sectors on sdb3 but keep the data intact?)
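As a sanity check, the two sector numbers in each pair of log lines are consistent with one another: end_request reports a sector relative to the whole disk (sdb), while the raid5 lines report it relative to the partition (sdb3). Their difference should be constant, namely sdb3's start offset (inferred here from the log itself, not verified against the partition table):

```shell
# Sector numbers copied from the first pair of log lines above
disk_sector=5859367368   # end_request: I/O error, dev sdb
part_sector=5857246784   # raid5:md0: read error ... on sdb3
offset=$((disk_sector - part_sector))
echo "inferred sdb3 start: sector ${offset}"   # 2120584
```

Note also that sector 5857246784 × 512 bytes is roughly 2.999 TB into the ~3 TB member, i.e. right where a recovery stuck at 99.9% would be reading.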
Solution
All the data is (or should be) intact anyway, just on only 3 of the 4 disks.
You say it kicked the faulty disk out of the array, so it should still be running, albeit in degraded mode.
Can you mount it?
You can force the array to run by doing the following:
> Print out the details of the array: mdadm -D /dev/md0
> Stop the array: mdadm --stop /dev/md0
> Re-create the array and force md to accept it: mdadm -C /dev/md0 --assume-clean /dev/sd[abcd]3
That last step is perfectly safe, as long as:
> you do not write to the array, and
> you use exactly the same creation parameters as before.
The --assume-clean flag will prevent a rebuild and skip any integrity checks. You should then be able to mount the array and recover your data.
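Pulling every creation parameter out of the mdadm -D output quoted in the question (metadata 1.0, RAID5, 4 devices, 64K chunk, left-symmetric layout, device order sda3..sdd3), the re-create step would look roughly like the command assembled below. This is a sketch that prints the command rather than executing it; verify each value against your own mdadm -D output first, since mismatched parameters would scramble the on-disk layout:

```shell
# Parameters read off `mdadm -D /dev/md0` above:
#   Version 01.00.03      -> -e 1.0
#   Raid Level : raid5    -> -l 5
#   Raid Devices : 4      -> -n 4
#   Chunk Size : 64K      -> -c 64
#   Layout : left-symmetric
# Device order must follow the RaidDevice column: sda3 sdb3 sdc3 sdd3.
cmd="mdadm -C /dev/md0 -e 1.0 -l 5 -n 4 -c 64 --layout=left-symmetric --assume-clean /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3"
echo "$cmd"   # printed, not executed; run it manually after double-checking
```

Getting the member order wrong is the classic way this trick destroys data, which is why step 1 (saving the mdadm -D output) comes before stopping the array.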