  1. #1
    Just Joined!
    Join Date
    May 2014
    Posts
    9

    Getting error on JNI call to readdir in Java 7 and 8 on SLES 11


    I am getting the following error trying to read directories on SLES 11. It occurs on several machines and any directory, so it is not a corrupted directory problem.


    Exception in thread "main" java.nio.file.DirectoryIteratorException: java.nio.file.FileSystemException: /var/opt/util/java8/jdk/jdk/src/share/classes/sun/util/cldr/resources/21_0_1/common/main: Unknown error 9572
    at sun.nio.fs.UnixDirectoryStream$UnixDirectoryIterator.readNextEntry(UnixDirectoryStream.java:172)
    at sun.nio.fs.UnixDirectoryStream$UnixDirectoryIterator.hasNext(UnixDirectoryStream.java:201)
    at build.tools.cldrconverter.CLDRConverter.readBundleList(CLDRConverter.java:256)
    at build.tools.cldrconverter.CLDRConverter.main(CLDRConverter.java:184)
    Caused by: java.nio.file.FileSystemException: /var/opt/util/java8/jdk/jdk/src/share/classes/sun/util/cldr/resources/21_0_1/common/main: Unknown error 9572
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
    at sun.nio.fs.UnixException.asIOException(UnixException.java:111)
    at sun.nio.fs.UnixDirectoryStream$UnixDirectoryIterator.readNextEntry(UnixDirectoryStream.java:171)
    ... 3 more

    It appears as if the final iterative call to readdir after all the directory elements have been read is causing an unknown exception that is being propagated back from the JNI call.

    The following is a program I wrote to test this that will cause the error; I wanted to try to isolate the issue.


    import java.nio.file.DirectoryStream;
    import java.nio.file.FileSystems;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class tstconv {

        public static void main(String[] args) throws Exception {
            tstconv.testit();
        }

        public static void testit() throws Exception {
            Path path = FileSystems.getDefault().getPath("/home/gch");

            try (DirectoryStream<Path> dirStr = Files.newDirectoryStream(path)) {
                for (Path entry : dirStr) {
                    String fileName = entry.getFileName().toString();
                    System.out.println(fileName);
                }
            }
        }
    }

    The above will print out all the files in the directory before failing.

    Strangely, I am not getting the error on one box, and the only seemingly relevant difference in its setup is that it has glibc package glibc-2.11.1-0.46.1, while the others all have glibc-2.11.1-0.52.1.

    I have not been able to find anything helpful out on the web thus far.
    Last edited by herzjanny; 05-28-2014 at 01:11 AM. Reason: extra info

  2. #2
    Linux Engineer
    Join Date
    Dec 2013
    Posts
    1,084
    Is it only the one directory that fails? Could a filename be causing the trouble, perhaps one with a control character? Have you checked the output (maybe a count of entries) against what the OS reports?

  3. #3
    Just Joined!
    Join Date
    May 2014
    Posts
    9
    No, it is not directory related. I used it on different directories, including on different partitions. I got the same error on two other SLES 11 boxes as well. The one where it did work, as I said, was one with an older package for glibc, which seems an odd coincidence.

  4. #4
    Just Joined!
    Join Date
    May 2014
    Posts
    9
    It appears the problem is due to a change to readdir_r in glibc and the way errors are returned, as it was modified between the two versions I mention above.

    The code diffs in the glibc-2.11.1/sysdeps/unix/readdir_r.c file:

    @@ -42,6 +42,7 @@ __READDIR_R (DIR *dirp, DIRENT_TYPE *ent
       DIRENT_TYPE *dp;
       size_t reclen;
       const int saved_errno = errno;
    +  int ret;

       __libc_lock_lock (dirp->lock);

    @@ -72,10 +73,10 @@ __READDIR_R (DIR *dirp, DIRENT_TYPE *ent
                       bytes = 0;
                       __set_errno (saved_errno);
                     }
    +              if (bytes < 0)
    +                dirp->errcode = errno;

                   dp = NULL;
    -              /* Reclen != 0 signals that an error occurred. */
    -              reclen = bytes != 0;
                   break;
                 }
               dirp->size = (size_t) bytes;
    @@ -108,18 +109,43 @@ __READDIR_R (DIR *dirp, DIRENT_TYPE *ent
           dirp->filepos += reclen;
     #endif

    -      /* Skip deleted files. */
    +#ifdef NAME_MAX
    +      if (reclen > offsetof (DIRENT_TYPE, d_name) + NAME_MAX + 1)
    +        {
    +          /* The record is very long. It could still fit into the
    +             caller-supplied buffer if we can skip padding at the
    +             end. */
    +          size_t namelen = _D_EXACT_NAMLEN (dp);
    +          if (namelen <= NAME_MAX)
    +            reclen = offsetof (DIRENT_TYPE, d_name) + namelen + 1;
    +          else
    +            {
    +              /* The name is too long. Ignore this file. */
    +              dirp->errcode = ENAMETOOLONG;
    +              dp->d_ino = 0;
    +              continue;
    +            }
    +        }
    +#endif
    +
    +      /* Skip deleted and ignored files. */
         }
       while (dp->d_ino == 0);

       if (dp != NULL)
    -    *result = memcpy (entry, dp, reclen);
    +    {
    +      *result = memcpy (entry, dp, reclen);
    +      ret = 0;
    +    }
       else
    -    *result = NULL;
    +    {
    +      *result = NULL;
    +      ret = dirp->errcode;
    +    }
       __libc_lock_unlock (dirp->lock);

    -  return dp != NULL ? 0 : reclen ? errno : 0;
    +  return ret;
     }

    It looks like it now returns a nonzero error code in instances where it used to return 0.

  5. #5
    Linux Engineer
    Join Date
    Dec 2013
    Posts
    1,084
    AFAIK it's always returned non-zero error codes. It does seem to have changed how that's done but that shouldn't cause a new error.

  6. #6
    Just Joined!
    Join Date
    May 2014
    Posts
    9
    Previously, if getdents returned 0 (presumably meaning there was nothing left to read), readdir_r returned 0 as well. Now the return value is taken from "dirp->errcode", which the function never sets when getdents returns 0. One would hope this field was initialized to 0 at some point, but who knows where it may be modified? The DIR structure is not passed to getdents, just a couple of its members. Within the function, errcode is assigned only when getdents returns a value < 0 (or when a name is too long and the entry is skipped), and neither should happen merely because the end of the data was reached.

    As I said, the system with the old glibc has no problem; the ones with the changes are not working.
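
    The difference in the end-of-directory return value can be sketched in isolation. This is a minimal model, not glibc code: struct mock_dir, old_return and new_return are hypothetical names, only the return logic from the diff above is reproduced, and the ENAMETOOLONG path is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for glibc's DIR; only the field added by the patch. */
struct mock_dir {
    int errcode;
};

/* Old logic: "return dp != NULL ? 0 : reclen ? errno : 0;"
   where reclen was set to (bytes != 0) on the no-more-data path. */
int old_return(int have_entry, long bytes, int errno_value)
{
    if (have_entry)
        return 0;
    int reclen = (bytes != 0);
    return reclen ? errno_value : 0;
}

/* New logic: errcode is stored only when bytes < 0, but it is returned
   whenever no entry is available, including a clean end of directory. */
int new_return(struct mock_dir *dirp, int have_entry, long bytes,
               int errno_value)
{
    assert(dirp != NULL);
    if (have_entry)
        return 0;
    if (bytes < 0)
        dirp->errcode = errno_value;
    return dirp->errcode;
}
```

    In this model, with errcode holding 0 both versions return 0 at end of directory (bytes == 0). With a stale value in errcode, such as the 9572 from the exception above, only the new logic reports an error.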

  7. #7
    Linux Engineer
    Join Date
    Dec 2013
    Posts
    1,084
    You could compile and run this to check and see:
    Code:
    #include <stdio.h>
    #include <dirent.h>

    int main (int argc, char* argv[]) {
            struct dirent nxt;
            struct dirent* resp;

            /* A directory name is required as the first argument. */
            if (argc < 2) {
                    printf ("Usage: %s dirname\n", argv[0]);
                    return 1;
            }

            DIR* dp = opendir (argv[1]);

            if (dp) {
                    while (1) {
                            /* At end of directory readdir_r should return 0
                               and set resp to NULL. */
                            int rv = readdir_r (dp, &nxt, &resp);

                            printf ("readdir_r return %d\n", rv);
                            if (resp == 0) {
                                    printf ("readdir_r response pointer NULL\n");
                                    break;
                            }

                            if (rv != 0) {
                                    break;
                            }

                            printf ("%s\n", nxt.d_name);
                    }

                    closedir (dp);
            } else {
                    printf ("Couldn't open %s\n", argv[1]);
            }

            printf ("Done\n");

            return 0;

    }
    I ran it on a system with glibc 2.19 and it worked as expected - it returned 0 and set the response pointer to NULL at the end of the directory. If it isn't doing that it's a bug in glibc.

    * it's expecting a dirname at the command line when it's executed.

  8. #8
    Just Joined!
    Join Date
    May 2014
    Posts
    9
    Thanks for the code. I compiled this and it works with the glibc I have.

    Now I'm thinking there is something in the Java native code that causes the problem, which did not show up with the old readdir_r. The DIR object used in the JNI call is not retrieved with opendir but is passed in as a referenced long int and converted to a DIR pointer, since Java code doesn't do pointers. What exactly the iterator passes once all the directory entries have been read is what I am trying to figure out. If I modify the JNI (jdk/src/solaris/native/sun/nio/fs/UnixNativeDispatcher.c) to call readdir instead of readdir_r and set errno to 0 before the call, it seems to work fine.
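
    The readdir-plus-errno pattern mentioned above looks roughly like this as a standalone function. It is only a sketch of the general technique, not the actual change made to UnixNativeDispatcher.c; count_entries is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Counts the entries in a directory using readdir instead of readdir_r.
   Returns the count, or -1 on error with the errno value in *err_out. */
long count_entries(const char *path, int *err_out)
{
    assert(path != NULL && err_out != NULL);

    DIR *dp = opendir(path);
    if (dp == NULL) {
        *err_out = errno;
        return -1;
    }

    long count = 0;
    for (;;) {
        /* readdir returns NULL both at end of directory and on error;
           clearing errno first is what lets the two be told apart. */
        errno = 0;
        struct dirent *ent = readdir(dp);
        if (ent == NULL)
            break;
        count++;
    }

    int err = errno;   /* 0 means a clean end of directory */
    closedir(dp);
    *err_out = err;
    return err == 0 ? count : -1;
}
```

    Because the end-of-directory result comes from errno rather than from the readdir_r return value, this path never looks at dirp->errcode.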

  9. #9
    Linux Engineer
    Join Date
    Dec 2013
    Posts
    1,084
    I don't see why it would fail but in looking at the code for Java 8 they have extended struct dirent thusly:
    Code:
    struct {
            struct dirent64 buf;
            char name_extra[PATH_MAX + 1 - sizeof result->d_name];
    } entry;
    struct dirent64* ptr = &entry.buf;
    On the system I'm using at the moment d_name is hard wired to char d_name[256] - it doesn't rely on limits.h for PATH_MAX.

    In the patch you posted I see this comment:

    Code:
    + /* The record is very long. It could still fit into the
    + caller-supplied buffer if we can skip padding at the
    + end. */
    They then set reclen to the offset of d_name plus namelen + 1. I presume namelen is calculated in the same manner as strlen, so the nul byte must be what the + 1 accounts for. So I guess they are using possible extra bytes available because of alignment in the struct.

    I don't see why this would affect anything though.
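
    To make the sizes concrete, here is a small sketch of the two calculations discussed above. It assumes a fixed-size d_name array as on the system described (char d_name[256]) and uses struct dirent rather than struct dirent64 so it builds without extra feature macros; extended_entry, patch_record_limit and trimmed_reclen are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stddef.h>

/* Shaped like the Java 8 buffer quoted above: a dirent followed by
   enough extra room for a name of up to PATH_MAX bytes. */
struct extended_entry {
    struct dirent buf;
    char name_extra[PATH_MAX + 1 - sizeof(((struct dirent *)0)->d_name)];
};

/* Threshold from the patch: longer records are trimmed or skipped. */
size_t patch_record_limit(void)
{
    return offsetof(struct dirent, d_name) + NAME_MAX + 1;
}

/* Length the patch copies for a name of namelen bytes (name plus nul). */
size_t trimmed_reclen(size_t namelen)
{
    assert(namelen <= NAME_MAX);
    return offsetof(struct dirent, d_name) + namelen + 1;
}
```

    Under that assumption the trimmed length never exceeds the threshold, so the copy done by the patch stays within a plain struct dirent and well within the larger Java buffer.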

  10. #10
    Just Joined!
    Join Date
    May 2014
    Posts
    9
    OK, I think I found the problem and it seems to be specific to the build I have. The errcode field was added to the DIR structure (__dirstream in this implementation) last year. In the __alloc_dir function in opendir.c, after the malloc is called for the new DIR the fields are initialized. In the build we have, the change to add the initialization of errcode was omitted. If there is garbage there, it will remain and then, when the final call to readdir_r is made and no results are returned, the return value of the function is set to whatever is in dirp->errcode.

    If the area malloc'd in __alloc_dir is zeroed out by happenstance on the malloc, this problem will not be noticed.

    Newer versions of glibc include the initialization of errcode to 0 and so have fixed the problem.
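
    The effect of the missing initialization can be imitated with a mock allocator. This is a sketch under stated assumptions, not the glibc source: struct mock_dir and its fields are hypothetical, and the memset with 0xA5 merely simulates recycled heap memory so that the example has defined behavior.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for __dirstream; not the real layout. */
struct mock_dir {
    int fd;
    size_t size;
    size_t offset;
    int errcode;
};

/* Mimics the allocation step: malloc, then field-by-field initialization.
   init_errcode == 0 models the build where the errcode line was omitted. */
struct mock_dir *alloc_dir(int fd, int init_errcode)
{
    struct mock_dir *dirp = malloc(sizeof *dirp);
    if (dirp == NULL)
        return NULL;

    /* Simulate stale heap contents; real malloc gives no guarantee. */
    memset(dirp, 0xA5, sizeof *dirp);

    dirp->fd = fd;
    dirp->size = 0;
    dirp->offset = 0;
    if (init_errcode)
        dirp->errcode = 0;   /* the initialization the fixed version adds */
    return dirp;
}

/* What readdir_r hands back when no entry is left to read. */
int end_of_dir_return(const struct mock_dir *dirp)
{
    assert(dirp != NULL);
    return dirp->errcode;
}
```

    With the initialization in place the end-of-directory return is 0. Without it, the caller gets whatever bytes happened to be in the allocation, which matches the seemingly random "Unknown error 9572".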
