2009/02/25

나의 X2200에 무슨 일이?

최근 x2200 장비에 로그인을 했더니, 매우 느리다는 느낌이 왔습니다.
이유를 확인해야 한다고 생각했는데, 바빠서 잊고 있었습니다.

그래서 오늘 짬을 내서 보기로 했습니다.

우선은 ssh로 로그인하는 데 느려진 것 같아서
일단 /etc/nsswitch.conf와 /etc/resolv.conf를 확인해보았습니다.
별 문제가 없는 데 왜 느려졌지 하는 생각이 드는 군요.

일단 vmstat로 시작을 해보았습니다.

 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 --   in   sy   cs us sy id
 0 0 0 2287196 2068940 26 2 152 3  3  0  0 15 -1 -0  0 1245 1719  632  0 24 75
 1 0 0 3477208 2840460 1 46  0  0  0  0  0  0  0  0  0  421  458  278  0 25 75
 1 0 0 3477104 2840356 0  4  0  0  0  0  0  0  0  0  0  446  806  353  0 26 74
 1 0 0 3477104 2840356 0  0  0  0  0  0  0  0  0  0  0  406  435  254  0 25 75
 1 0 0 3477104 2840356 0  0  0  0  0  0  0  0  0  0  0  416  437  276  0 25 75
 1 0 0 3477104 2840356 0  0  0  0  0  0  0  0  0  0  0  408  477  260  0 25 75
 1 0 0 3477104 2840356 0  0  0  0  0  0  0  0  0  0  0  410  436  265  0 25 75
 1 0 0 3477104 2840356 0  0  0  0  0  0  0  0  0  0  0  443  747  340  0 26 74
 1 0 0 3477104 2840356 0  0  0  0  0  0  0  0  0  0  0  418  487  265  0 25 75
 1 0 0 3477104 2840356 0  0  0  0  0  0  0  0  0  0  0  409  439  265  0 25 75
 1 0 0 3477104 2840356 0  0  0  0  0  0  0  0  0  0  0  415  436  257  0 25 75


아무것도 안하고 있는 이 장비가 뭔가 열쒸미 하고 있네요.. 왜 평균 CPU가 75%일까 ?
sys%가 25%인 것으로 봐서, 4개의 코어이니, 한 코어가 system 관련해서 cpu를 다쓰나 ?
mpstat를 보기로 했습니다.
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    3   340  131  168    0    1    0    0    71    0   0   0 100
  1    2   0    0    49    3   80    0    1    1    0   327    1   0   0  99
  2    0   0    0    16    0   28    0    1    0    0    67    0   0   0 100
  3    0   0    0    15   13    0    0    0    1    0     0    0 100   0   0
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0   355  140  181    0    6    0    0    93    0   0   0 100
  1    0   0    0    51    2  124    1    4    2    0   611    0   5   0  95
  2    0   0    0    26    4   43    0    1    0    0    97    1   0   0  99
  3    0   0    0     9    8    0    0    0    1    0     0    0 100   0   0
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0   340  132  170    0    2    0    0    75    0   1   0  99
  1    0   0    0    47    3   77    0    2    2    0   367    0   0   0 100
  2    0   0    0    17    0   32    0    4    0    0    68    0   0   0 100
  3    0   0    0    11   10    0    0    0    2    0     0    0 100   0   0
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0   345  134  169    0    4    0    0    78    0   0   0 100
  1    0   0    0    47    3   78    0    3    1    0   313    0   0   0 100
  2    0   0    0    16    0   30    0    2    0    0    69    0   0   0 100
  3    0   0    0     9    8    0    0    0    2    0     0    0 100   0   0


음.. 예상대로 cpu (3번) 하나가 syscall 때문에 죽어나고 있네요.
뭐 때문에 저럴까요 ?

dtrace를 해보기로 했습니다.
일단, 도대체 커널의 어떤 부분이 cpu 3번을 다 쓰고 있을까 궁금합니다.

$pfexec dtrace -n 'fbt:::entry{@[probemod]=count();}'
dtrace: description 'fbt:::entry' matched 33143 probes
^C

  devfs                                                             1
  FX                                                                8
  cpu_ms.AuthenticAMD.15                                            8
  elfexec                                                           8
  lofs                                                             11
  mntfs                                                            16
  kcf                                                              20
  fifofs                                                           24
  pset                                                             24
  swrand                                                           28
  cpu.generic                                                      48
  ptm                                                             112
  pts                                                             141
  namefs                                                          147
  ttcompat                                                        236
  ptem                                                            276
  dld                                                             400
  mac_ether                                                       491
  arp                                                             494
  mac                                                             520
  scsi                                                            559
  ldterm                                                          755
  rootnex                                                         783
  ata                                                             997
  doorfs                                                         1024
  tmpfs                                                          1030
  sata                                                           1199
  sha1                                                           1284
  ohci                                                           1437
  nfs                                                            1508
  dls                                                            1544
  nv_sata                                                        2061
  sd                                                             2098
  TS                                                             2639
  nge                                                            2779
  procfs                                                         2893
  tl                                                             3066
  specfs                                                         6696
  ip                                                             7521
  ehci                                                           8709
  sockfs                                                         9242
  bge                                                            9686
  ufs                                                           38891
  pcplusmp                                                      45652
  genunix                                                     1267865
  unix                                                       28401472


sys는 전부 커널 영역이라 fbt Provider를 지켜봤는데, unix가 많이 사용되고 있군요.
어떤 unix의 어떤 모듈이 많이 사용되는 지 궁금해지는 군요.
 $pfexec dtrace -n 'fbt:unix::entry/cpu==3/{@[probefunc]=count();}'
dtrace: description 'fbt:unix::entry' matched 1538 probes
^C

  cmt_balance                                                       1
  cpu_choose                                                        1
  cpu_resched                                                       1
  cpu_wakeup                                                        1
  disp_lowpri_cpu                                                   1
  hati_sync_pte_to_page                                             1
  page_get_pagecnt                                                  1
  poke_cpu                                                          1
  setbackdq                                                         1
  tsc_gethrtimeunscaled                                             1
  tsc_gethrtimeunscaled_delta                                       1
  tsc_scalehrtime                                                   1
  x86pte_access_pagetable                                           1
  x86pte_get                                                        1
  x86pte_release_pagetable                                          1
  av_dispatch_softvect                                              2
  av_set_softint_pending                                            2
  cbe_low_level                                                     2
  cbe_softint                                                       2
  cms_hdl_getcms                                                    2
  cms_hdl_getcmsdata                                                2
  cms_poll_ownermask                                                2
  dosoftint_epilog                                                  2
  dosoftint_prolog                                                  2
  gethrestime_sec                                                   2
  pc_gethrestime                                                    2
  put                                                               2
  gethrestime_lasttick                                              3
  hr_clock_lock                                                     3
  hr_clock_unlock                                                   3
  x86pte_inval_func                                                 3
  ip_ocsum                                                          4
  lock_set_spl                                                      4
  av_check_softint_pending                                          6
  call_func_ntv                                                    12
  checked_rdmsr                                                    12
  cmi_hdl_rdmsr                                                    12
  msri_lookup                                                      12
  ntv_rdmsr                                                        12
  ntv_rdmsr_xc                                                     12
  mutex_vector_enter                                               15
  cbe_fire                                                         18
  cbe_reprogram                                                    18
  default_lock_delay                                               22
  tsc_gethrtime_delta                                              38
  putnext                                                          46
  xc_serv                                                          51
  hilevel_intr_prolog                                              69
  hilevel_intr_epilog                                              70
  page_release                                                    105
  intr_thread_epilog                                              148
  intr_thread_prolog                                              148
  bcmp                                                            168
  av_dispatch_autovect                                            217
  hat_page_getshare                                               374
  hment_mapcnt                                                    374
  hat_page_clrattr                                                704
  page_io_trylock                                                 704
  page_io_unlock                                                  704
  page_lookup_nowait                                              704
  hat_page_setattr                                                705
  hat_pagesync                                                    907
  page_next_scan_large                                            914
  x86_hm_enter                                                   1184
  x86_hm_exit                                                    1184
  page_add                                                    3056448
  page_sub                                                    3059851
  page_unlock                                                 3060007
  page_trylock                                                4852320


page_trylock ? 뭔가가 lock을 하려고 하는데 안되나 보군요.
page_trylock을 유도하는 process가 뭔지 궁금해졌습니다.
$pfexec dtrace -n 'fbt:unix:page_trylock:entry/cpu==3/{@p[execname]=count();}'
dtrace: description 'fbt:unix:page_trylock:entry' matched 1 probe
^C

  fsflush                                                     8032633


아..... 하.. 그렇군요. fsflush가 계속해서 page lock을 시도했군요.
그러고나니, 아주 오래전에 제가 화일 시스템 옵션을 변경했던 적이 생각이 납니다.
ufs의 옵션을 변경했었고 nfs로 외부 화일 시스템을 하나 마운트했던 게 생각이 나는 군요..
그게 이런 결과를 초래한 줄은 생각도 못했군요. ㅡㅡ;

어느 쪽의 문제인지 확인이 필요하겠네요.

/etc/vfstab을 확인해봤습니다.
/var/run on swap read/write/setuid/devices/xattr/dev=4800003 on 목  2월 12 14:41:03 2009
/export/home on /dev/dsk/c1t0d0s7 read/write/setuid/devices/intr/largefiles/xattr/noatime/onerror=panic/dev=780007 on 목  2월 12 18:32:57 2009
/net/129.158.2.181/export/web on 129.158.2.181:/export/web remote/read/write/nosetuid/nodevices/xattr/dev=4a80002 on 목  2월 12 17:07:37 2009


nfs를 우선적으로 umount 해보려고 했더니 안되는 군요. ㅡㅡ; 그제서야 생각났습니다.
nfs 대상 서버가 임시로 올렸다가 죽인 서버였다는 것을... 
그렇다고... 이렇게 cpu를 많이 사용하는게 이해는 안되는 군요..
버그가 아닌가 생각이 드는군요. 패치를 찾아봐야 겠습니다.


댓글 없음: