A typical iostat -x output sometimes looks like this. The following is an extreme sample, but by no means a rare one:
- iostat (Oct 6, 2013)
- tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
- 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
- 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
- 16.00 0.00 156.00 9.75 21.89 288.12 36.00 57.60
- 5.50 0.00 44.00 8.00 48.79 2194.18 181.82 100.00
- 2.00 0.00 16.00 8.00 46.49 3397.00 500.00 100.00
- 4.50 0.00 40.00 8.89 43.73 5581.78 222.22 100.00
- 14.50 0.00 148.00 10.21 13.76 5909.24 68.97 100.00
- 1.50 0.00 12.00 8.00 8.57 7150.67 666.67 100.00
- 0.50 0.00 4.00 8.00 6.31 10168.00 2000.00 100.00
- 2.00 0.00 16.00 8.00 5.27 11001.00 500.00 100.00
- 0.50 0.00 4.00 8.00 2.96 17080.00 2000.00 100.00
- 34.00 0.00 1324.00 9.88 1.32 137.84 4.45 59.60
- 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
- 22.00 44.00 204.00 11.27 0.01 0.27 0.27 0.60
Let me give you some more information on the hardware: it is a Dell 1950 box running Debian, where uname -a reports the following:
- Linux xx 2.6.32-5-amd64 #1 SMP Fri Feb 15 15:39:52 UTC 2013 x86_64 GNU/Linux
The machine is a dedicated server hosting an online game and does not run any databases or I/O-heavy applications. The core application uses about 0.8 of the 8 GB of RAM, and the average CPU load is relatively low. The game itself, however, is quite sensitive to I/O latency, so our players experience massive in-game lag, which we would like to resolve as soon as possible.
- iostat:
- avg-cpu: %user %nice %system %iowait %steal %idle
- 1.77 0.01 1.05 1.59 0.00 95.58
- Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
- sdb 13.16 25.42 135.12 504701011 2682640656
- sda 1.52 0.74 20.63 14644533 409684488
Uptime is:
- 19:26:26 up 229 days, 17:26, 4 users, load average: 0.36, 0.37, 0.32
The disk controller:
- 01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
The hard drives:
- Array 1, RAID-1, 2x Seagate Cheetah 15K.5 73 GB SAS
- Array 2, RAID-1, 2x Seagate ST3500620SS Barracuda ES.2 500GB 16MB 7200RPM SAS
Partition information from df:
- Filesystem 1K-blocks Used Available Use% Mounted on
- /dev/sdb1 480191156 30715200 425083668 7% /home
- /dev/sda2 7692908 437436 6864692 6% /
- /dev/sda5 15377820 1398916 13197748 10% /usr
- /dev/sda6 39159724 19158340 18012140 52% /var
More data samples, generated with iostat -dx sdb 1 (Oct 11, 2013):
- Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
- sdb 0.00 15.00 0.00 70.00 0.00 656.00 9.37 4.50 1.83 4.80 33.60
- sdb 0.00 0.00 0.00 2.00 0.00 16.00 8.00 12.00 836.00 500.00 100.00
- sdb 0.00 0.00 0.00 3.00 0.00 32.00 10.67 9.96 1990.67 333.33 100.00
- sdb 0.00 0.00 0.00 4.00 0.00 40.00 10.00 6.96 3075.00 250.00 100.00
- sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 0.00 0.00 100.00
- sdb 0.00 0.00 0.00 2.00 0.00 16.00 8.00 2.62 4648.00 500.00 100.00
- sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 0.00 100.00
- sdb 0.00 0.00 0.00 1.00 0.00 16.00 16.00 1.69 7024.00 1000.00 100.00
- sdb 0.00 74.00 0.00 124.00 0.00 1584.00 12.77 1.09 67.94 6.94 86.00
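The samples above show a telling pattern: the device sits at 100% utilisation while completing almost no writes, and await climbs into the thousands of milliseconds. As a rough illustration (my own sketch, not part of the original post; the column positions and thresholds are assumptions), such "stalled queue" intervals can be picked out of iostat -dx output like this:

```python
# Sketch: flag iostat -dx samples where the device is pinned near 100% util
# while completing almost no I/O -- the "stalled queue" signature above.
# Assumed column layout: Device rrqm/s wrqm/s r/s w/s rsec/s wsec/s
#                        avgrq-sz avgqu-sz await svctm %util

def stalled_intervals(lines, util_min=99.0, iops_max=5.0):
    """Return (w/s, await, %util) tuples for suspicious samples."""
    flagged = []
    for line in lines:
        fields = line.split()
        if len(fields) != 12 or fields[0] == "Device:":
            continue  # skip headers and malformed lines
        r_s, w_s = float(fields[3]), float(fields[4])
        await_ms, util = float(fields[9]), float(fields[11])
        if util >= util_min and (r_s + w_s) <= iops_max:
            flagged.append((w_s, await_ms, util))
    return flagged

# Three of the sample lines from the question:
sample = [
    "sdb 0.00 15.00 0.00 70.00 0.00 656.00 9.37 4.50 1.83 4.80 33.60",
    "sdb 0.00 0.00 0.00 2.00 0.00 16.00 8.00 12.00 836.00 500.00 100.00",
    "sdb 0.00 74.00 0.00 124.00 0.00 1584.00 12.77 1.09 67.94 6.94 86.00",
]
print(stalled_intervals(sample))  # [(2.0, 836.0, 100.0)]
```

Only the second interval is flagged: 2 writes per second at 100% utilisation means the device spent the whole second on almost nothing.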
Characteristic plots generated with rrdtool can be found here:
iostat plot 1, 24 min interval: http://imageshack.us/photo/my-images/600/yqm3.png/
iostat plot 2, 120 min interval: http://imageshack.us/photo/my-images/407/griw.png/
Since we have a fairly large cache of 5.5 GB, we thought it might be a good idea to test whether the I/O wait spikes are perhaps caused by cache-miss events. So we did a sync and then flushed the caches and buffers:
- echo 3 > /proc/sys/vm/drop_caches
Right after that, I/O wait and service times practically went through the roof, and everything on the machine felt like slow motion. Over the next couple of hours the latency recovered and everything was as before: small-to-medium lags occurring at short, unpredictable intervals.
Now my question is: does anybody have an idea what might be causing this nasty behaviour? Is it the first sign of the disk array or the RAID controller dying, or something that can easily be fixed by a reboot? (At the moment, though, we are very reluctant to do the latter, because we are afraid the disks might not come back up again.)
Any help is greatly appreciated.
Thanks in advance,
Chris.
Edit to add: we do see one or two processes going to 'D' state in top, one of which seems to be kjournald rather frequently. If I am not mistaken, however, this does not indicate the processes causing the lag, but rather those affected by it (please correct me if I'm wrong). Does the information about the uninterruptibly sleeping processes help us in any way to narrow down the problem?
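For what it's worth, D-state (uninterruptible sleep) processes can be enumerated by reading /proc/&lt;pid&gt;/stat: the state is the first field after the closing parenthesis of the command name. A minimal sketch (my own illustration, not from the post):

```python
# Sketch: list processes currently in uninterruptible sleep ('D' state)
# by scanning /proc. Linux-only; the parser itself is portable.
import os

def state_from_stat(stat_line):
    """Extract (comm, state) from a /proc/<pid>/stat line."""
    # comm may itself contain spaces or parentheses, so split at the
    # LAST closing parenthesis, as proc(5) recommends.
    lpar = stat_line.index("(")
    rpar = stat_line.rindex(")")
    comm = stat_line[lpar + 1:rpar]
    state = stat_line[rpar + 1:].split()[0]
    return comm, state

def d_state_processes():
    """Return [(pid, comm)] for all processes currently in 'D' state."""
    hung = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/stat" % pid) as f:
                comm, state = state_from_stat(f.read())
        except OSError:  # process exited while we were scanning
            continue
        if state == "D":
            hung.append((pid, comm))
    return hung

# The parser can be checked against a synthetic stat line:
print(state_from_stat("1234 (kjournald) D 2 0 0 ..."))  # ('kjournald', 'D')
```

Run repeatedly (e.g. once a second), this shows which tasks are blocked on I/O at the moment a lag spike hits; as you say, those are the victims rather than the cause, but the pattern of victims (kjournald, the game process) can still narrow things down.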
@Andy Shinn requested smartctl data; here it is:
smartctl -a -d megaraid,2 /dev/sdb yields:
- smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
- Copyright (C) 2002-10 by Bruce Allen,http://smartmontools.sourceforge.net
- Device: SEAGATE ST3500620SS Version: MS05
- Serial number:
- Device type: disk
- Transport protocol: SAS
- Local Time is: Mon Oct 14 20:37:13 2013 CEST
- Device supports SMART and is Enabled
- Temperature Warning Disabled or Not Supported
- SMART Health Status: OK
- Current Drive Temperature: 20 C
- Drive Trip Temperature: 68 C
- Elements in grown defect list: 0
- Vendor (Seagate) cache information
- Blocks sent to initiator = 1236631092
- Blocks received from initiator = 1097862364
- Blocks read from cache and sent to initiator = 1383620256
- Number of read and write commands whose size <= segment size = 531295338
- Number of read and write commands whose size > segment size = 51986460
- Vendor (Seagate/Hitachi) factory information
- number of hours powered up = 36556.93
- number of minutes until next internal SMART test = 32
- Error counter log:
- Errors Corrected by Total Correction Gigabytes Total
- ECC rereads/ errors algorithm processed uncorrected
- fast | delayed rewrites corrected invocations [10^9 bytes] errors
- read: 509271032 47 0 509271079 509271079 20981.423 0
- write: 0 0 0 0 0 5022.039 0
- verify: 1870931090 196 0 1870931286 1870931286 100558.708 0
- Non-medium error count: 0
- SMART Self-test log
- Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ]
- Description number (hours)
- # 1 Background short Completed 16 36538 - [- - -]
- # 2 Background short Completed 16 36514 - [- - -]
- # 3 Background short Completed 16 36490 - [- - -]
- # 4 Background short Completed 16 36466 - [- - -]
- # 5 Background short Completed 16 36442 - [- - -]
- # 6 Background long Completed 16 36420 - [- - -]
- # 7 Background short Completed 16 36394 - [- - -]
- # 8 Background short Completed 16 36370 - [- - -]
- # 9 Background long Completed 16 36364 - [- - -]
- #10 Background short Completed 16 36361 - [- - -]
- #11 Background long Completed 16 2 - [- - -]
- #12 Background short Completed 16 0 - [- - -]
- Long (extended) Self Test duration: 6798 seconds [113.3 minutes]
smartctl -a -d megaraid,3 /dev/sdb yields:
- smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
- Copyright (C) 2002-10 by Bruce Allen,http://smartmontools.sourceforge.net
- Device: SEAGATE ST3500620SS Version: MS05
- Serial number:
- Device type: disk
- Transport protocol: SAS
- Local Time is: Mon Oct 14 20:37:26 2013 CEST
- Device supports SMART and is Enabled
- Temperature Warning Disabled or Not Supported
- SMART Health Status: OK
- Current Drive Temperature: 19 C
- Drive Trip Temperature: 68 C
- Elements in grown defect list: 0
- Vendor (Seagate) cache information
- Blocks sent to initiator = 288745640
- Blocks received from initiator = 1097848399
- Blocks read from cache and sent to initiator = 1304149705
- Number of read and write commands whose size <= segment size = 527414694
- Number of read and write commands whose size > segment size = 51986460
- Vendor (Seagate/Hitachi) factory information
- number of hours powered up = 36596.83
- number of minutes until next internal SMART test = 28
- Error counter log:
- Errors Corrected by Total Correction Gigabytes Total
- ECC rereads/ errors algorithm processed uncorrected
- fast | delayed rewrites corrected invocations [10^9 bytes] errors
- read: 610862490 44 0 610862534 610862534 20470.133 0
- write: 0 0 0 0 0 5022.480 0
- verify: 2861227413 203 0 2861227616 2861227616 100872.443 0
- Non-medium error count: 1
- SMART Self-test log
- Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ]
- Description number (hours)
- # 1 Background short Completed 16 36580 - [- - -]
- # 2 Background short Completed 16 36556 - [- - -]
- # 3 Background short Completed 16 36532 - [- - -]
- # 4 Background short Completed 16 36508 - [- - -]
- # 5 Background short Completed 16 36484 - [- - -]
- # 6 Background long Completed 16 36462 - [- - -]
- # 7 Background short Completed 16 36436 - [- - -]
- # 8 Background short Completed 16 36412 - [- - -]
- # 9 Background long Completed 16 36404 - [- - -]
- #10 Background short Completed 16 36401 - [- - -]
- #11 Background long Completed 16 2 - [- - -]
- #12 Background short Completed 16 0 - [- - -]
- Long (extended) Self Test duration: 6798 seconds [113.3 minutes]
Solution
The important thing is that your svctm (in the iostat output) is very high, which suggests a hardware problem with the RAID or the disks. svctm for a normal disk should be around 4 ms. It may be less, but not much higher.
Unfortunately, the smartctl output is not informative in your case. It shows corrected errors, but this may be normal. The long self-test seems to have completed without problems, but again that is inconclusive. The ST3500620SS appears to be a good old server/RAID-class disk, which should respond quickly on read errors (unlike desktop/non-RAID disks), so this may be a more complicated hardware problem than just bad sectors. Try to find something unusual (like high error rates) in the RAID statistics: http://hwraid.le-vert.net/wiki/LSIMegaRAIDSAS
My suggestion is that replacing the disks should be the next step.
更新:
svctm is the more important factor here; the high utilisation is just a consequence of svctm being abnormally high.
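The rule of thumb above (roughly 4 ms svctm for a healthy disk) can be turned into a simple check. This is only an illustrative sketch; the 10x "clearly abnormal" factor is my own assumption, not from the post:

```python
# Sketch: flag iostat samples whose svctm is far beyond the ~4 ms
# expected of a healthy disk, pointing at a disk/RAID hardware problem.

HEALTHY_SVCTM_MS = 4.0  # typical service time for a good disk
SUSPECT_FACTOR = 10     # 10x the healthy value is clearly abnormal

def abnormal_svctm(samples):
    """Return the svctm values that indicate a hardware-level problem."""
    return [s["svctm"] for s in samples
            if s["svctm"] > HEALTHY_SVCTM_MS * SUSPECT_FACTOR]

# Values taken from the iostat samples in the question:
samples = [
    {"svctm": 4.80, "util": 33.60},     # fine
    {"svctm": 500.00, "util": 100.00},  # two orders of magnitude too slow
    {"svctm": 6.94, "util": 86.00},     # fine
]
print(abnormal_svctm(samples))  # [500.0]
```

Note that the middle sample also shows 100% utilisation, but that follows from the svctm: a device that takes 500 ms to service each request is saturated by a trickle of I/O.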
I saw a similar problem with desktop disks installed in a Promise RAID. Desktop disks are designed to try to fix read errors with many long retries, which causes latency (those read errors may themselves be caused by some other factor, such as vibration, which is much stronger in a server room than on a desktop). Disks designed for RAID, by contrast, simply report any error quickly to the RAID controller, which can correct it using the RAID redundancy. In addition, server disks may be built to be more mechanically resistant to constant strong vibration. There is a common myth that server disks are the same as desktop ones, just more expensive; that is wrong, they really are different.
Q: Ah, one thing I'd like to ask: if this is a hardware problem, don't you think the problem should be visible constantly rather than disappearing for stretches of time? Do you happen to have an explanation for this effect?
A:
- The problem may always be there, but it only becomes noticeable under high load.
- Vibration levels can vary at different times of day (depending, for example, on what nearby servers are doing). If your problem is the disks being affected by vibration, it would certainly disappear and reappear. I saw similar behaviour when I ran into my "desktop disks" problem. (Of course, your disks are server disks and recommended for RAID, so it is not exactly the same issue. But it may be similar.)