  1. #1
    Just Joined!
    Join Date
    Mar 2009
    Location
    Peterborough, UK
    Posts
    2

    RHEL 5.2 and 5.3 - iSCSI Errors impacting database performance?

    Hi,

    Situation:
    We have a number of hosts running RHEL 5.2 (x86_64) for our Oracle
    database estate. A typical deployment comprises a Dell 1955 blade
    with RAID 1 local disks for O/S, swap and binaries, and
    iSCSI-attached SAN volumes for Oracle database files and disk
    backups. The blade has four NICs: two set up for "Public" traffic
    (192.168.**.**) from our domain and the other two set up on our SAN
    network (172.16.***.**). The four NICs are mated to the blade
    chassis, which has teamed NICs to our domain switches and SAN
    switches. Our SAN is DataCore SANmelody using two SM server nodes
    that manage two Dell MD 1000 arrays (deployed as a mirror). Each
    MD 1000 contains 3 groups of 5 disks, each group configured as
    RAID 5. These are presented as three separate storage groups.

    Issue:
    Since deploying this set-up last year, we repeatedly get errors in
    the host logs and on the SM server nodes (see "LOGS" later). I was
    hoping that the latest RHEL 5.3 kernel improvements would address
    most of these errors. I have deployed RHEL 5.3 (x86) onto one of our
    TEST boxes, but continue to see errors.

    Impact:
    I suspect that the connection hang-ups and disk I/O retries are
    causing cumulative database waits on some of our busier databases,
    resulting in degraded performance. I am concerned that the current
    situation will cause us further issues when we build our planned
    Oracle 11g RAC (5-node) system. Oracle RAC relies heavily on
    multiplexed voting and registry disks (shared volumes) to maintain
    cohesion within the RAC cluster. Slow disk I/O or time-outs can
    cause one or more database nodes to go off-line (and thus force an
    auto-restart of the impacted host's Oracle services).

    LOGS:

    From RHEL 5.2 x86_64 Host;

    Kernel:
    Linux MYHOST52.MYDOMAIN.com 2.6.18-92.el5 #1 SMP Fri May 23 23:40:43
    EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

    fstab:
    /dev/VolGroup00/LogVol00  /         ext3    defaults        1 1
    LABEL=/boot               /boot     ext3    defaults        1 2
    tmpfs                     /dev/shm  tmpfs   defaults        0 0
    devpts                    /dev/pts  devpts  gid=5,mode=620  0 0
    sysfs                     /sys      sysfs   defaults        0 0
    proc                      /proc     proc    defaults        0 0
    /dev/VolGroup00/LogVol01  swap      swap    defaults        0 0
    LABEL=data1               /U02      ext3    _netdev         0 0
    LABEL=data2               /U03      ext3    _netdev         0 0
    LABEL=data3               /U04      ext3    _netdev         0 0
    LABEL=data4               /U05      ext3    _netdev         0 0
    LABEL=data5               /U06      ext3    _netdev         0 0

    iscsiadm:
    iSCSI Transport Class version 2.0-724
    iscsiadm version 2.0-868
    Target: iqn.2000-08.com.datacore:sm2-3
    Current Portal: 172.16.200.9:3260,1
    Persistent Portal: 172.16.200.9:3260,1
    **********
    Interface:
    **********
    Iface Name: iface0
    Iface Transport: tcp
    Iface Initiatorname: iqn.1994-05.com.redhat:7fe2f44ea9de
    Iface IPaddress: 172.16.200.39
    Iface HWaddress: 00:14:22:0d:0a:fa
    Iface Netdev: default
    SID: 1
    iSCSI Connection State: LOGGED IN
    iSCSI Session State: Unknown
    Internal iscsid Session State: NO CHANGE
    ************************
    Negotiated iSCSI params:
    ************************
    HeaderDigest: None
    DataDigest: None
    MaxRecvDataSegmentLength: 131072
    MaxXmitDataSegmentLength: 262144
    FirstBurstLength: 0
    MaxBurstLength: 1048576
    ImmediateData: No
    InitialR2T: Yes
    MaxOutstandingR2T: 1
    ************************
    Attached SCSI devices:
    ************************
    Host Number: 1 State: running
    scsi1 Channel 00 Id 0 Lun: 0
    Attached scsi disk sdb State: running
    scsi1 Channel 00 Id 0 Lun: 1
    Attached scsi disk sde State: running
    scsi1 Channel 00 Id 0 Lun: 2
    Attached scsi disk sdf State: running
    Target: iqn.2000-08.com.datacore:sm2-4
    Current Portal: 172.16.200.10:3260,1
    Persistent Portal: 172.16.200.10:3260,1
    **********
    Interface:
    **********
    Iface Name: iface2
    Iface Transport: tcp
    Iface Initiatorname: iqn.1994-05.com.redhat:7fe2f44ea9de
    Iface IPaddress: 172.16.200.56
    Iface HWaddress: 00:14:22:b1:d6:a6
    Iface Netdev: default
    SID: 2
    iSCSI Connection State: LOGGED IN
    iSCSI Session State: Unknown
    Internal iscsid Session State: NO CHANGE
    ************************
    Negotiated iSCSI params:
    ************************
    HeaderDigest: None
    DataDigest: None
    MaxRecvDataSegmentLength: 131072
    MaxXmitDataSegmentLength: 262144
    FirstBurstLength: 0
    MaxBurstLength: 1048576
    ImmediateData: No
    InitialR2T: Yes
    MaxOutstandingR2T: 1
    ************************
    Attached SCSI devices:
    ************************
    Host Number: 2 State: running
    scsi2 Channel 00 Id 0 Lun: 0
    Attached scsi disk sdc State: running
    scsi2 Channel 00 Id 0 Lun: 1
    Attached scsi disk sdd State: running

    Log Errors;
    Mar 12 09:30:48 MYHOST52 last message repeated 2 times
    Mar 12 09:30:48 MYHOST52 iscsid: connection2:0 is operational after recovery (1 attempts)
    Mar 12 09:32:52 MYHOST52 kernel: ping timeout of 5 secs expired, last rx 19592296349, last ping 19592301349, now 19592306349
    Mar 12 09:32:52 MYHOST52 kernel: connection1:0: iscsi: detected conn error (1011)
    Mar 12 09:32:53 MYHOST52 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
    Mar 12 09:33:19 MYHOST52 iscsid: received iferror -38
    Mar 12 09:33:19 MYHOST52 last message repeated 2 times
    Mar 12 09:33:19 MYHOST52 iscsid: connection1:0 is operational after recovery (2 attempts)
    Mar 12 09:43:25 MYHOST52 kernel: ping timeout of 5 secs expired, last rx 19592929091, last ping 19592934091, now 19592939091
    Mar 12 09:43:25 MYHOST52 kernel: connection1:0: iscsi: detected conn error (1011)
    Mar 12 09:43:26 MYHOST52 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
    Mar 12 09:43:59 MYHOST52 iscsid: received iferror -38
    Mar 12 09:43:59 MYHOST52 last message repeated 2 times
    Mar 12 09:43:59 MYHOST52 iscsid: connection1:0 is operational after recovery (3 attempts)
    Mar 12 09:50:50 MYHOST52 kernel: connection2:0: iscsi: detected conn error (1011)
    Mar 12 09:50:50 MYHOST52 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
    Mar 12 09:50:53 MYHOST52 iscsid: received iferror -38
    Mar 12 09:50:53 MYHOST52 last message repeated 2 times
    Mar 12 09:50:53 MYHOST52 iscsid: connection2:0 is operational after recovery (1 attempts)
    Mar 12 09:54:06 MYHOST52 kernel: ping timeout of 5 secs expired, last rx 19593570520, last ping 19593575520, now 19593580520
    Mar 12 09:54:06 MYHOST52 kernel: connection1:0: iscsi: detected conn error (1011)
    Mar 12 09:54:07 MYHOST52 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
    Mar 12 09:54:34 MYHOST52 iscsid: received iferror -38
    Mar 12 09:54:34 MYHOST52 last message repeated 2 times
    Mar 12 09:54:34 MYHOST52 iscsid: connection1:0 is operational after recovery (2 attempts)
    Mar 12 10:00:54 MYHOST52 kernel: connection2:0: iscsi: detected conn error (1011)
    Mar 12 10:00:55 MYHOST52 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
    Mar 12 10:00:58 MYHOST52 iscsid: received iferror -38
    Mar 12 10:00:58 MYHOST52 last message repeated 2 times
    Mar 12 10:00:58 MYHOST52 iscsid: connection2:0 is operational after recovery (1 attempts)

    END
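
    To put numbers on how often each connection drops, the log excerpt
    above can be tallied with a short script. This is only a sketch in
    Python: the function name summarize_iscsi_log and the embedded SAMPLE
    are illustrative, and the regular expressions assume the message
    wording shown in this log (other kernel or iscsid versions may word
    them differently). The "last rx / last ping / now" values are
    assumed to be kernel ticks; the script only reports their
    differences.

```python
import re
from collections import Counter

# A few lines copied from the RHEL 5.2 log above, with wrapped lines rejoined.
SAMPLE = """\
Mar 12 09:32:52 MYHOST52 kernel: ping timeout of 5 secs expired, last rx 19592296349, last ping 19592301349, now 19592306349
Mar 12 09:32:52 MYHOST52 kernel: connection1:0: iscsi: detected conn error (1011)
Mar 12 09:33:19 MYHOST52 iscsid: connection1:0 is operational after recovery (2 attempts)
Mar 12 09:50:50 MYHOST52 kernel: connection2:0: iscsi: detected conn error (1011)
Mar 12 09:50:53 MYHOST52 iscsid: connection2:0 is operational after recovery (1 attempts)
"""

# Patterns match the wording seen in this thread's logs (an assumption for other versions).
CONN_ERR = re.compile(r"kernel: (connection\d+:\d+): iscsi: detected conn error \((\d+)\)")
RECOVERED = re.compile(r"iscsid: (connection\d+:\d+) is operational after recovery")
PING = re.compile(r"ping timeout of \d+ secs expired, last rx (\d+), last ping (\d+), now (\d+)")


def summarize_iscsi_log(text):
    """Count connection errors and recoveries per connection, and collect
    (ping - rx, now - ping) tick differences from ping-timeout lines."""
    errors = Counter()
    recoveries = Counter()
    ping_gaps = []
    for line in text.splitlines():
        m = CONN_ERR.search(line)
        if m:
            errors[m.group(1)] += 1
            continue
        m = RECOVERED.search(line)
        if m:
            recoveries[m.group(1)] += 1
            continue
        m = PING.search(line)
        if m:
            rx, ping, now = (int(g) for g in m.groups())
            ping_gaps.append((ping - rx, now - ping))
    return {"errors": errors, "recoveries": recoveries, "ping_gaps": ping_gaps}


result = summarize_iscsi_log(SAMPLE)
print(dict(result["errors"]))
print(dict(result["recoveries"]))
print(result["ping_gaps"])
```

    Run against the full /var/log/messages (read the file and pass its
    contents in place of SAMPLE), this would show whether the drops
    cluster on one connection. In the excerpt above, connection1:0 logs
    a ping timeout before each error while connection2:0 does not, which
    may be worth comparing between the two SAN paths.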

    From RHEL 5.3 x86 Host;

    Kernel:
    Linux MYHOST53.MYDOMAIN.com 2.6.18-128.el5 #1 SMP Wed Jan 21 07:58:05
    EST 2009 i686 i686 i386 GNU/Linux

    fstab;
    /dev/VolGroup00/LogVol00  /          ext3    defaults        1 1
    LABEL=/boot               /boot      ext3    defaults        1 2
    tmpfs                     /dev/shm   tmpfs   defaults        0 0
    devpts                    /dev/pts   devpts  gid=5,mode=620  0 0
    sysfs                     /sys       sysfs   defaults        0 0
    proc                      /proc      proc    defaults        0 0
    /dev/VolGroup00/LogVol01  swap       swap    defaults        0 0
    /dev/sdc1                 /sandisk1  ext3    _netdev         0 0

    iscsiadm;
    iSCSI Transport Class version 2.0-724
    iscsiadm version 2.0-868
    Target: iqn.2000-08.com.datacore:sm2-3
    Current Portal: 172.16.200.9:3260,1
    Persistent Portal: 172.16.200.9:3260,1
    **********
    Interface:
    **********
    Iface Name: default
    Iface Transport: tcp
    Iface Initiatorname: iqn.2005-03.com.redhat:01.406e5fd710e2
    Iface IPaddress: 172.16.200.69
    Iface HWaddress: default
    Iface Netdev: default
    SID: 1
    iSCSI Connection State: LOGGED IN
    iSCSI Session State: Unknown
    Internal iscsid Session State: NO CHANGE
    ************************
    Negotiated iSCSI params:
    ************************
    HeaderDigest: None
    DataDigest: None
    MaxRecvDataSegmentLength: 131072
    MaxXmitDataSegmentLength: 262144
    FirstBurstLength: 0
    MaxBurstLength: 1048576
    ImmediateData: No
    InitialR2T: Yes
    MaxOutstandingR2T: 1
    ************************
    Attached SCSI devices:
    ************************
    Host Number: 2 State: running
    scsi2 Channel 00 Id 0 Lun: 0
    Attached scsi disk sdc State: running

    Log Errors;
    Mar 11 18:12:03 MYHOST53 kernel: md: Autodetecting RAID arrays.
    Mar 11 18:12:03 MYHOST53 kernel: md: autorun ...
    Mar 11 18:12:03 MYHOST53 kernel: md: ... autorun DONE.
    Mar 11 18:12:03 MYHOST53 kernel: device-mapper: multipath: version 1.0.5 loaded
    Mar 11 18:12:03 MYHOST53 kernel: EXT3 FS on dm-0, internal journal
    Mar 11 18:12:03 MYHOST53 kernel: kjournald starting. Commit interval 5 seconds
    Mar 11 18:12:03 MYHOST53 kernel: EXT3 FS on sda1, internal journal
    Mar 11 18:12:03 MYHOST53 kernel: EXT3-fs: mounted filesystem with ordered data mode.
    Mar 11 18:12:03 MYHOST53 kernel: Adding 2031608k swap on /dev/VolGroup00/LogVol01. Priority:-1 extents:1 across:2031608k
    Mar 11 18:12:03 MYHOST53 kernel: IA-32 Microcode Update Driver: v1.14a <tig...@veritas.com>
    Mar 11 18:12:03 MYHOST53 kernel: microcode: CPU1 updated from revision 0x7 to 0xc, date = 04212005
    Mar 11 18:12:03 MYHOST53 kernel: microcode: CPU0 updated from revision 0x7 to 0xc, date = 04212005
    Mar 11 18:12:03 MYHOST53 kernel: Loading iSCSI transport class v2.0-724.
    Mar 11 18:12:03 MYHOST53 kernel: iscsi: registered transport (tcp)
    Mar 11 18:12:03 MYHOST53 kernel: iscsi: registered transport (iser)
    Mar 11 18:12:03 MYHOST53 kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready
    Mar 11 18:12:03 MYHOST53 kernel: e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
    Mar 11 18:12:03 MYHOST53 kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
    Mar 11 18:12:03 MYHOST53 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
    Mar 11 18:12:03 MYHOST53 kernel: e1000: eth1: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    Mar 11 18:12:03 MYHOST53 kernel: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
    Mar 11 18:12:03 MYHOST53 kernel: scsi2 : iSCSI Initiator over TCP/IP
    Mar 11 18:12:03 MYHOST53 kernel: Vendor: DataCore Model: SANmelody Rev: DCS
    Mar 11 18:12:03 MYHOST53 kernel: Type: Direct-Access ANSI SCSI revision: 04
    Mar 11 18:12:03 MYHOST53 kernel: SCSI device sdc: 41943040 512-byte hdwr sectors (21475 MB)
    Mar 11 18:12:03 MYHOST53 kernel: sdc: Write Protect is off
    Mar 11 18:12:03 MYHOST53 kernel: SCSI device sdc: drive cache: write back w/ FUA
    Mar 11 18:12:03 MYHOST53 kernel: SCSI device sdc: 41943040 512-byte hdwr sectors (21475 MB)
    Mar 11 18:12:03 MYHOST53 kernel: sdc: Write Protect is off
    Mar 11 18:12:03 MYHOST53 kernel: SCSI device sdc: drive cache: write back w/ FUA
    Mar 11 18:12:03 MYHOST53 kernel: sdc: sdc1
    Mar 11 18:12:03 MYHOST53 kernel: sd 2:0:0:0: Attached scsi disk sdc
    Mar 11 18:12:03 MYHOST53 kernel: sd 2:0:0:0: Attached scsi generic sg2 type 0
    Mar 11 18:12:03 MYHOST53 rpc.statd[2160]: Version 1.0.9 Starting
    Mar 11 18:12:03 MYHOST53 iscsid: received iferror -38
    Mar 11 18:12:03 MYHOST53 last message repeated 2 times
    Mar 11 18:12:03 MYHOST53 iscsid: connection1:0 is operational now
    Mar 11 18:12:04 MYHOST53 kdump: kexec: loaded kdump kernel
    Mar 11 18:12:04 MYHOST53 kdump: started up
    Mar 11 18:12:04 MYHOST53 kernel: symev_rh_ES_5_2.6.18_53.el5_i686: module license 'Proprietary' taints kernel.
    Mar 11 18:12:04 MYHOST53 symev: loaded (symev-rh-ES-5-2.6.18-53.el5-i686.ko)
    Mar 11 18:12:04 MYHOST53 symap: loaded (symap-rh-ES-5-2.6.18-53.el5-i686.ko)

    END

    Any help / suggestions gratefully received. I can change the config
    of the RHEL 5.3 x86 host on demand, but not the RHEL 5.2 x86_64 host
    (prod box).

    Many thanks,

    Andy.

  2. #2
    Just Joined!
    Join Date
    Apr 2009
    Posts
    3
    Did you find a resolution to this? We are experiencing a very similar situation in a similar setup.

    Thanks

  3. #3
    Just Joined!
    Join Date
    Aug 2009
    Posts
    1

    Same on Enterprise Linux 5.3 making RAC build impossible

    Experiencing the same issue on Oracle Enterprise Linux, running the same Red Hat 5 U3 kernel:
    2.6.18-128.el5xen x86_64

    This is a 2-node RAC cluster running OCFS2. I thought OCFS2 was the culprit, but it is iscsid and the kernel detecting errors on the iSCSI devices.

    Seeing this in /var/log/messages:

    iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
    iscsid: received iferror -38

    There are multiple entries on both nodes. I have to shut down the second node, otherwise it will reboot without warning, and at times the primary node will also reboot.
    With just a single node running there is no problem.

    The iSCSI devices are mounted at boot and I can write on either node at any time, so it looks like the iSCSI server node may be experiencing some form of network contention which panics the cluster.

    Using: Openfiler 2.3 as the iSCSI host
    Using: private IP addresses on gigabit Ethernet
    Can ping on the private network between all nodes and to the iSCSI server

    Any other troubleshooting tips on the iscsi / network side would be appreciated.
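
    One network-side check that goes a step beyond ping is to time
    repeated TCP connections to the iSCSI portal and look for stalls or
    failures. The following is a rough Python sketch, not a tested
    diagnostic: the function names probe_once, run_probes and
    summarize_probes are made up for illustration, and port 3260 is
    assumed because it is the portal port shown in the iscsiadm output
    earlier in this thread.

```python
import socket
import statistics
import time


def probe_once(host, port=3260, timeout=5.0):
    """Return the TCP connect time in seconds, or None on failure or timeout."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def run_probes(host, port=3260, count=60, interval=1.0):
    """Probe the portal `count` times, sleeping `interval` seconds between tries."""
    samples = []
    for _ in range(count):
        samples.append(probe_once(host, port))
        time.sleep(interval)
    return samples


def summarize_probes(samples):
    """Summarize probe results: how many failed, plus median and worst connect time."""
    ok = [s for s in samples if s is not None]
    return {
        "sent": len(samples),
        "failed": len(samples) - len(ok),
        "median_s": statistics.median(ok) if ok else None,
        "max_s": max(ok) if ok else None,
    }
```

    Calling summarize_probes(run_probes("<portal-ip>")) from each node
    while the cluster is under load, and comparing the failure counts
    and worst-case times with the timestamps of the 1011 errors, may
    help show whether the drops line up with network stalls. A clean
    result would not rule out the network, since this only exercises
    connection set-up and not sustained I/O.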

  4. #4
    Linux Guru Rubberman
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
    Posts
    8,974
    Since Oracle's Linux distribution is based on the RHEL 5 code base (currently 5.3), one would think they have tested (and perhaps fixed) such kernel driver issues. So, have you tried installing Oracle's Linux 5.3 on your test rig to see if that also has this problem? If so, then perhaps Oracle can resolve it?
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!
