  1. #1
    pascaltaf (Just Joined!)
    Join Date: Apr 2009
    Location: Paris suburb
    Posts: 8

    strange SCSI error while trying to mount a CDrom


    Hello everyone.
    I currently have a strange bug whose cause I have not managed to find.
    I am working on a Linux From Scratch based distribution with kernel 2.6.21.1, and my company has recently acquired a new SCSI adapter, an Adaptec 29320ALP.
    With this adapter, when I boot from the CD-ROM and the boot process tries to mount the CD-ROM filesystem, I get the following messages:

    sr 1:0:0:0: SCSI error: return code = 0x08000002
    sr 1:0:0:0: Sense Key : Medium Error [current]
    Info fld=0x10
    sr 1:0:0:0: Add. Sense: No seek complete
    end request: I/O error, dev sr0, sector 64
    isofs_fill_super: bread failed, dev=sr0, iso_blknum = 16, block = 16
    mount: Mounting /dev/sr0 on /mnt/cdrom1 failed: Invalid argument

    and the mount fails.
    I should point out that my CD-ROM drive is an IDE one; the SCSI adapter is only used for the tape drive.
    This behaviour never occurred before (with other SCSI adapters) and seems to happen only when the Adaptec card is plugged in.
    What is strange is that if I get a shell from my install CD after the mount failure, I can mount the CD-ROM from the command line. It seems that only the first attempt fails.
    Another annoying thing is that the bug is not systematic: it can happen on 10 successive reboots and then stop happening, so I cannot tell whether the workarounds I tried are working or not.
    It also sometimes happens when booting from disk, with the same kernel.

    Here is my configuration:
    Linux 2.6.21.1 compiled with libata
    module aic79xx loaded for the SCSI adapter
    driver aic7xxx compiled in the kernel (should not be used and doesn't seem to be)

    lsmod:

    Module Size Used by Not tainted
    isofs 33080 1 - Live 0xe00d7000
    sr_mod 16680 1 - Live 0xe0077000
    cdrom 36160 1 sr_mod, Live 0xe00cd000
    aic79xx 178776 0 - Live 0xe004a000
    r8169 27928 0 - Live 0xe0042000
    r8168 30760 0 - Live 0xe0039000
    i2c_i801 8336 0 - Live 0xe0024000
    i2c_core 20880 1 i2c_i801, Live 0xe0028000

    cat /proc/scsi/scsi:

    Attached devices:
    Host: scsi0 Channel: 00 Id: 00 Lun: 00
    Vendor: ATA Model: ST3500630AS Rev: 3.AA
    Type: Direct-Access ANSI SCSI revision: 05
    Host: scsi1 Channel: 00 Id: 00 Lun: 00
    Vendor: Optiarc Model: DVD RW AD-5170A Rev: 1.12
    Type: CD-ROM ANSI SCSI revision: 05
    Host: scsi2 Channel: 00 Id: 02 Lun: 00
    Vendor: TANDBERG Model: TS400 Rev: 0258
    Type: Sequential-Access ANSI SCSI revision: 03
    cat /proc/interrupts:

    CPU0
    0: 74304 XT-PIC-XT timer
    1: 533 XT-PIC-XT i8042
    2: 0 XT-PIC-XT cascade
    4: 1057 XT-PIC-XT serial
    8: 1 XT-PIC-XT rtc
    9: 0 XT-PIC-XT acpi
    10: 936 XT-PIC-XT eth0
    11: 165 XT-PIC-XT aic79xx
    12: 5 XT-PIC-XT i8042
    14: 3 XT-PIC-XT libata
    15: 67 XT-PIC-XT libata
    NMI: 0
    ERR: 0

    I am not a SCSI expert and would like someone to shed some light on this behaviour. What do these messages mean?

    Thank you for your help.

    Pascal

  2. #2
    Rubberman (Linux Guru)
    Join Date: Apr 2009
    Location: I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts: 11,594
    My guess is that the Adaptec controller is masking the IDE controller somehow. As for the SCSI errors, just remember that with libata, Linux presents SATA and IDE devices to the system as SCSI devices, and deals with the SATA, IDE, SCSI cruft under the covers. Anyway, see if there is a jumper or some other means to change the I/O (or DMA) address and/or interrupt used by the adapter, which could be interfering with the IDE controller.
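
For reference, the return code in the first log line can be split into its component bytes. The following is a minimal sketch, assuming the layout used by the Linux SCSI mid-layer in 2.6-era kernels (driver byte, host byte, message byte, status byte, from most to least significant). The function and table names are illustrative, not kernel APIs.

```python
# Sketch: split a 32-bit Linux SCSI result word into its four bytes.
# Assumed layout (2.6-era mid-layer): driver | host | message | status.

DRIVER_SENSE = 0x08  # driver byte flag: sense data is available
STATUS_NAMES = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY"}


def decode_scsi_result(result):
    """Return the four component bytes of a SCSI result code as a dict."""
    return {
        "driver": (result >> 24) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "status": result & 0xFF,
    }


# The value reported for sr 1:0:0:0 in the first post.
fields = decode_scsi_result(0x08000002)
print(fields)
print("sense data available:", bool(fields["driver"] & DRIVER_SENSE))
print("status:", STATUS_NAMES.get(fields["status"], hex(fields["status"])))
```

Under that assumption, 0x08000002 reads as: host byte 0 (no transport-level error recorded), status CHECK CONDITION, and sense data available, which is why the kernel goes on to print the sense key (Medium Error) and the additional sense (No seek complete). That suggests the error is being reported by the drive rather than invented by the SCSI layer, although it does not say why the drive reports it.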
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  3. #3
    pascaltaf

    Follow-up

    Hi Rubberman and others,
    This seems to be even more complicated. My colleague who works on the hardware is not convinced that it is related to interrupts in any way.
    He rather thinks that the aic79xx driver sees the CD-ROM drive as a removable device and tries to take control of it, thereby interfering with libata. The problem with this theory is the random factor, which should not exist.
    Anyway, I have some new information. First, simply doing an "eject -t /dev/sr0" before trying to mount is not sufficient. I supposed that operating on the device would "drain" the problem, but no, at least not with an "eject -t".
    Secondly, strange as it may seem, the bug only happens with CD-R media in the drive, never with CD-RW. My tests are clear on this point: I have copied many of the CD-Rs that show the bug onto my CD-RW, and the CD-RW never reproduced it.
    This may look strange, but it seems that the failure occurs somewhere in the allocation table of the CD, which differs between a CD-R and a CD-RW. That does not mean that the cause of the bug is related to the CD allocation table.
    Maybe these strange clues will give some of you ideas.
    Thank you for your previous, and maybe future, answers.
    Pascal
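
On where the read actually fails: the numbers in the original log can be cross-checked with a little arithmetic. This is a small sketch, assuming the usual 2048-byte ISO 9660 logical block and the 512-byte unit the kernel uses when it reports a sector number; the constant and function names are made up for illustration.

```python
# Sketch: relate "iso_blknum = 16" to "sector 64" in the kernel messages.
# Assumptions: 2048-byte ISO 9660 blocks, 512-byte kernel sector units.

ISO_BLOCK_SIZE = 2048     # bytes per ISO 9660 logical block on a CD
KERNEL_SECTOR_SIZE = 512  # bytes per sector in the kernel's I/O error report
FIRST_DESCRIPTOR_BLOCK = 16  # ISO 9660 volume descriptors start at block 16


def iso_block_to_sector(block):
    """Convert an ISO 9660 block number to a 512-byte sector number."""
    return block * ISO_BLOCK_SIZE // KERNEL_SECTOR_SIZE


print(iso_block_to_sector(FIRST_DESCRIPTOR_BLOCK))  # sector of block 16
print(int("0x10", 16))                               # "Info fld" as decimal
```

Block 16 corresponds to sector 64, and the "Info fld=0x10" value is also 16. Block 16 is where an ISO 9660 volume begins its volume descriptors, so the failing request appears to be the very first read that isofs makes when filling the superblock, rather than a read somewhere deeper in the disc's structures.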

  4. #4
    Rubberman
    Well, to put it succinctly, I am bumfuzzled! So, the problem seems to be related to the media present. Unfortunately, this is over my head. Time to contact Adaptec, I think. It could be that there is a bug in their Linux driver, or if they didn't provide one, then the kernel driver for the device.

  5. #5
    pascaltaf

    OK, I've got new information.

    I burnt a CD-R based on my install CD that does not launch the install but only starts a shell, so I can execute commands from a clean state: no modules loaded, no filesystems mounted, no devices created...
    I powered the machine off completely and started it with the CD in.
    I loaded the few modules needed to mount the CD, created /dev/sr0 and mounted it: it failed. Remounting it immediately afterwards succeeded.
    I rebooted (without cutting the power this time), did the same things again, and the first mount succeeded.
    Powering off and restarting reproduced the problem.
    So it seems the random factor is not so random when I do a minimal set of operations.
    Another thing: the aic79xx driver is not responsible for the problem, since it was not loaded with this CD and the failure still occurred.
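
Since only the first mount after a cold start fails, one possible stopgap while the root cause is unknown is to retry the mount a few times with a short pause. The sketch below is a workaround idea only, not a fix; the attempt count, the delay and the helper names are assumptions, and the device and mount point are the ones from the first post.

```python
# Sketch of a retry-on-first-failure workaround for the CD mount.
# Attempt count and delay are arbitrary illustrative values.
import subprocess
import time


def retry(action, attempts=3, delay=2.0, sleep=time.sleep):
    """Call action() until it returns True; return the attempt number, or 0."""
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            sleep(delay)  # give the drive a moment before trying again
    return 0


def mount_cdrom(device="/dev/sr0", mountpoint="/mnt/cdrom1"):
    """Try a single read-only ISO 9660 mount; True if mount exits with 0."""
    result = subprocess.run(
        ["mount", "-t", "iso9660", "-o", "ro", device, mountpoint]
    )
    return result.returncode == 0
```

Calling `retry(mount_cdrom)` as root would return 1 if the first mount worked, 2 if the second did, and so on. An install CD may well not ship Python, in which case the same loop is easy to express in the shell of the boot environment.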

  6. #6
    Rubberman
    What I was trying to say before is that this seems to be a hardware, not software/driver issue. What if you remove the raid controller from the system altogether?

  7. #7
    pascaltaf

    huh

    There is no RAID controller on this machine. The SCSI adapter is only for a tape drive.
    Or maybe you are talking about the RAID options in my kernel, but I don't think so, since you just said that you think it is a hardware bug, not a software one.
    I'm going to try Adaptec support, but I'm not sure of the result.
    I'm also trying to test with a 2.6.30-rc3 kernel: if it fails too, I can post on the LKML, but my first attempt to compile it was unsuccessful.

  8. #8
    Rubberman
    Sorry, you are correct. I meant tape controller. Doh!

  9. #9
    pascaltaf

    Follow-up

    Hi. As for your last suggestion, it is a bit hard to unplug the tape drive, and it seems unlikely to be related to the problem. Anyway, I'll try it if other things don't work.
    I mailed Adaptec support and got an answer, but it does not seem to match the problem exactly.
    You might find the thread on ask.adaptec.com: the question reference is #090424-000059 and the title is "pb managing devices". I have no exact link to provide yet; I will have to search for it.
    I have new information: the problem is still there with Linux kernel 2.6.30-rc3, which is the latest one at the moment.
    So this is either an unknown software bug or a BIOS/hardware bug.
    My colleague is going to see whether the BIOS can be updated.

  10. #10
    Rubberman
    That's a definite possibility. Probably won't hurt to update the BIOS. I know that has fixed some problems on various systems of mine in the past.

    If it ain't broke, don't fix it. If it is broke, find a hammer and PARTY!
