  1. #1
    Linux Newbie | Join Date: Apr 2010 | Location: Novosibirsk, Russia | Posts: 145

    sYSMALLOc failed with program abort


    Good afternoon,

    I am developing an application and get the following error message while it runs:

    Code:
    malloc.c:3096: sYSMALLOc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed.
    Aborted
    I suppose this is not a kernel bug (I tested it on two or three different kernels); as far as I can see, the failure occurs inside a system library function when my application calls malloc(). I tried rewriting and simplifying the code that triggers the error, but that does not seem to be where the problem lies. In my latest version I call malloc() only once, and even that single call fails.


    Can you help me understand what this "malloc.c" message really means, and what can cause these fatal errors?

  2. #2
    Rubberman (Linux Guru) | Join Date: Apr 2009 | Location: I can be found either 40 miles west of Chicago, or in a galaxy far, far away. | Posts: 11,167
    This indicates that something has stepped on the memory that malloc() uses in the library code to track what has been allocated. In other words, there is a serious bug in your program before you even get to the malloc() call. My guess is that you are writing through a bogus pointer somewhere. You could run your program in the debugger and do a stack trace when it asserts like this. Find out what address is triggering the failure, then set a watchpoint on that memory and rerun your program; the debugger should stop when something alters that memory. When it is your code that does it, you know what is munging it.
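For illustration only (this is not from the original posts): one cheap way to catch the kind of overrun described above, before it reaches malloc()'s bookkeeping, is to pad each allocation with a few known "canary" bytes and check them later. The sketch below assumes hypothetical helpers named guarded_alloc(), guarded_ok() and guarded_free(); none of them exist in glibc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers for illustration; not part of glibc. */
#define CANARY_LEN  8
#define CANARY_BYTE 0xA5

/* Allocate `size` usable bytes followed by CANARY_LEN guard bytes. */
void *guarded_alloc(size_t size)
{
    if (size > SIZE_MAX - CANARY_LEN)
        return NULL;                      /* size + guard would overflow */

    unsigned char *p = malloc(size + CANARY_LEN);
    if (p == NULL)
        return NULL;

    memset(p + size, CANARY_BYTE, CANARY_LEN);
    return p;
}

/* Return 1 if the guard bytes after the buffer are intact, 0 otherwise. */
int guarded_ok(const void *buf, size_t size)
{
    assert(buf != NULL);

    const unsigned char *guard = (const unsigned char *)buf + size;
    for (size_t i = 0; i < CANARY_LEN; i++) {
        if (guard[i] != CANARY_BYTE)
            return 0;                     /* something wrote past the end */
    }
    return 1;
}

/* Check the guard, release the block, and return the check result. */
int guarded_free(void *buf, size_t size)
{
    int ok = guarded_ok(buf, size);
    free(buf);
    return ok;
}
```

Calling guarded_ok() at a few points in the program narrows down where the overrun happens. Tools such as Valgrind's memcheck or a build with -fsanitize=address do the same job far more thoroughly.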
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  3. #3
    Linux Newbie
    It seems to me that this is something like a race condition inside a kernel function...

    Code:
    ccgid: malloc.c:3097: sYSMALLOc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed.
    
    Program received signal SIGABRT, Aborted.
    [Switching to Thread 0xb5170b70 (LWP 25712)]
    0xffffe424 in __kernel_vsyscall ()
    (gdb) where
    #0  0xffffe424 in __kernel_vsyscall ()
    #1  0xb7e617ff in raise () from /lib/libc.so.6
    #2  0xb7e63140 in abort () from /lib/libc.so.6
    #3  0xb7ea4210 in __malloc_assert () from /lib/libc.so.6
    #4  0xb7ea625c in _int_malloc () from /lib/libc.so.6
    #5  0xb7ea862a in malloc () from /lib/libc.so.6
    #6  0xb7b7503c in ?? () from /lib/libcrypto.so.1.0.0
    #7  0xb7b756cc in CRYPTO_malloc () from /lib/libcrypto.so.1.0.0
    #8  0xb7be674e in lh_insert () from /lib/libcrypto.so.1.0.0
    #9  0xb7be8fc0 in ?? () from /lib/libcrypto.so.1.0.0
    #10 0xb7be8a9f in ?? () from /lib/libcrypto.so.1.0.0
    #11 0xb7bd40de in ERR_load_DSO_strings () from /lib/libcrypto.so.1.0.0
    #12 0xb7bea67a in ERR_load_crypto_strings () from /lib/libcrypto.so.1.0.0
    #13 0xb7d0d154 in SSL_load_error_strings () from /lib/libssl.so.1.0.0
    #14 0xb7e01709 in ?? () from /usr/lib/libcurl.so.4
    #15 0xb7e14c09 in ?? () from /usr/lib/libcurl.so.4
    #16 0xb7e09718 in curl_global_init () from /usr/lib/libcurl.so.4
    #17 0xb7e09885 in curl_easy_init () from /usr/lib/libcurl.so.4
    #18 0x0804deed in handle_post_request (
        http_request=0x806a8c8 "POST /add_db_entry.pl HTTP/1.1\r\nHost: localhost:10267\r\nUser-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.6) Gecko/20100626 SUSE/3.6.6-1.2 Firefox/3.6.6\r\nAccept: text/html,application/xhtml+"..., client_sock=18) at mod_nps.c:42
    #19 0x0804a155 in proceed_client_request (p_queued_client=0x806a600) at threads.c:65
    #20 0xb7fb0b25 in start_thread () from /lib/libpthread.so.0
    #21 0xb7f0946e in clone () from /lib/libc.so.6
    
    
    (gdb) info threads
    * 6 Thread 0xb57ffb70 (LWP 5367)  0xffffe424 in __kernel_vsyscall ()
      2 Thread 0xb79abb70 (LWP 4666)  0xffffe424 in __kernel_vsyscall ()
      1 Thread 0xb79aca20 (LWP 4663)  0xffffe424 in __kernel_vsyscall ()
    And the threads are always in __kernel_vsyscall () at the moment the program aborts. It really looks like a race condition when threads are being switched, am I correct? I have never dealt with "memory allocation race conditions" before. What can I do about it, if that is even the right direction?

  4. #4
    Linux Newbie
    No, that's not a race condition. Still searching. I think I know where the crash could be coming from. Three threads are alive at the moment of the crash: one of them is sleeping while it waits for network connections, and the other two sleep for a second at a time, waking up to check some conditions. Now I should go and check it.

  5. #5
    Rubberman (Linux Guru)
    Multiple threads are usually a good place to look for problems like this. The standard Linux allocator has to be one of the most reviewed pieces of code in the system, so I think that a race condition there is most unlikely. However, it is very common for multiple threads to try and access/modify a common pointer at the same time. So, one may be trying to read it while another is modifying/changing it. Oops... BOOM!
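A minimal sketch of that hazard and one common remedy, assuming a hypothetical shared string g_shared that several threads replace and read. The names shared_set() and shared_get() are made up for this example. The point is that the pointer is only swapped or read while a mutex is held, and that readers take a copy instead of keeping the pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared state for illustration. */
static char *g_shared = NULL;
static pthread_mutex_t g_shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* Replace the shared string. Returns 0 on success, -1 if out of memory. */
int shared_set(const char *value)
{
    assert(value != NULL);

    size_t len = strlen(value) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, value, len);

    pthread_mutex_lock(&g_shared_lock);
    char *old = g_shared;
    g_shared = copy;                 /* swap happens only under the lock */
    pthread_mutex_unlock(&g_shared_lock);

    free(old);                       /* readers never keep this pointer */
    return 0;
}

/* Copy the shared string into `out`. Returns 0 on success, -1 if unset. */
int shared_get(char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);

    int rc = -1;
    pthread_mutex_lock(&g_shared_lock);
    if (g_shared != NULL) {
        snprintf(out, out_len, "%s", g_shared);  /* copy while locked */
        rc = 0;
    }
    pthread_mutex_unlock(&g_shared_lock);
    return rc;
}
```

Without the lock, one thread could free the old string while another is still reading it, which is exactly the kind of access that can damage the heap long before malloc() notices.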

  6. #6
    Linux Newbie
    Yes, debugging led me to the same thought. I suppose I should try setting watchpoints on sYSMALLOc()'s local variables on the stack, am I right?

  7. #7
    Linux Newbie
    Oh, I am just losing my head. I see there is not much sense in watching sYSMALLOc()'s inner variables after the abort, because by then they are already corrupted. Anyway, as far as I can tell the assert() only checks the values of the variables themselves; it does not dereference any addresses stored in them.

    From glibc package, file malloc.c:
    Code:
      /*
         If not the first time through, we require old_size to be
         at least MINSIZE and to have prev_inuse set.
      */
    
      assert((old_top == initial_top(av) && old_size == 0) ||
    	 ((unsigned long) (old_size) >= MINSIZE &&
    	  prev_inuse(old_top) &&
    	  ((unsigned long)old_end & pagemask) == 0));
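To make the condition easier to read, here is a hedged restatement as a standalone function. The struct top_view and the function top_chunk_sane() are made-up names, and the fields are simplified stand-ins for glibc's internal state, not the real types. One detail worth noting: the expanded message in the first post shows prev_inuse(old_top) as ((old_top)->size & 0x1), so the check does read the size word stored in the heap at the top chunk, which is what an overrun from a neighbouring buffer would overwrite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the allocator state the assertion inspects.
   Hypothetical type for illustration; glibc's real structures differ. */
struct top_view {
    int       is_initial;  /* 1 if no heap has been set up yet */
    uintptr_t top_addr;    /* address of the top chunk */
    size_t    size_field;  /* raw size word of the top chunk, flags included */
};

/* Return 1 if the top chunk passes the same checks as the assertion. */
int top_chunk_sane(const struct top_view *t, size_t min_size, size_t page_size)
{
    assert(t != NULL);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    size_t    old_size = t->size_field & ~(size_t)0x7; /* strip flag bits */
    uintptr_t old_end  = t->top_addr + old_size;
    size_t    pagemask = page_size - 1;

    /* First call: top is still the placeholder and has size 0. */
    if (t->is_initial && old_size == 0)
        return 1;

    /* Later calls: big enough, prev_inuse bit set, ends on a page boundary. */
    return old_size >= min_size
        && (t->size_field & 0x1) != 0
        && (old_end & pagemask) == 0;
}
```

With a size word overwritten by stray data (for example 0x41414141, the bytes "AAAA"), the end address no longer falls on a page boundary and the check fails, which matches the symptom in this thread.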
    So I think I must set watchpoints on the thread's stack to catch whatever changes its variables. But how can I be sure that the thread stack will occupy the same memory region the next time my program starts? And I still don't understand how to retrieve the address of a thread's stack. If I call 'backtrace' in gdb, it shows me something like

    Code:
    #0  0xffffe424 in __kernel_vsyscall ()
    #1  0xb7e627ff in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
    #2  0xb7e64140 in abort () at abort.c:92
    #3  0xb7ea5210 in __malloc_assert (assertion=<value optimized out>,
    But what do these addresses really mean? Can I get the thread stack address from 'backtrace', or only from the ESP register? And what do I do with these addresses on the next run? I think the thread stack could be allocated in a different region of memory.

  8. #8
    Linux Newbie
    Hmm, there is another thing I remembered: my "threads" are created by fork(), which means their memory follows the "copy on write" rule, right? I have gone over my code carefully, and there are no bogus pointers. The watchpoints I tried to set in several ways do not trigger at all either. I suppose something corrupts the heap earlier in the program, because the values checked by the assert also come from outside structures. Still searching...
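A small sketch of what the copy-on-write rule means for the heap, assuming a hypothetical helper fork_keeps_parent_heap(). A process created by fork() gets its own address space, so a write in the child never reaches the parent's heap. If the workers really are separate processes, each one can only corrupt its own heap. The backtrace in post #3 does show start_thread () from libpthread, though, which appears to indicate that at least that code path runs in a real thread sharing the heap.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper.
   Returns 1 if the parent's buffer is unchanged after the child
   overwrote its own copy, 0 if it changed, -1 on error. */
int fork_keeps_parent_heap(void)
{
    char *buf = malloc(16);
    if (buf == NULL)
        return -1;
    strcpy(buf, "parent");

    pid_t pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        /* Child: this write lands in the child's private copy of the page. */
        strcpy(buf, "child");
        _exit(0);
    }

    /* Parent from here on. */
    assert(pid > 0);
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        free(buf);
        return -1;
    }

    int unchanged = (strcmp(buf, "parent") == 0);
    free(buf);
    return unchanged;
}
```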

  9. #9
    Linux Newbie
    I think I know what could be causing the heap crash. I think the memory is not freed correctly when the function returns. Say I allocate a buffer (4096 bytes starting at 0x4b276600) and save the pointer in the variable 'm_ptr':

    Code:
    void* m_ptr = malloc(4096);
    Inside the function this pointer changes its position, for example when I split the data using strtok_r(). If the pointer is left in the middle of the buffer when the function returns, the buffer cannot be freed correctly, right? I think that's the root of my problem, because strace shows that only a single thread is running at that point and nothing else is touching its memory.

  10. #10
    Rubberman (Linux Guru)
    Originally posted by Schmidt:
    I think I know what could be causing the heap crash. I think the memory is not freed correctly when the function returns. Say I allocate a buffer (4096 bytes starting at 0x4b276600) and save the pointer in the variable 'm_ptr':

    Code:
    void* m_ptr = malloc(4096);
    Inside the function this pointer changes its position, for example when I split the data using strtok_r(). If the pointer is left in the middle of the buffer when the function returns, the buffer cannot be freed correctly, right? I think that's the root of my problem, because strace shows that only a single thread is running at that point and nothing else is touching its memory.
    Well, if you change m_ptr with something like strtok_r() and then try to free it, yes, you will abort. Since you have linked with the debuggable system libraries, that may be why it asserts on you instead of just generating a segfault. Try commenting out the free() of this pointer to see if the problem persists. If it doesn't, then you need to change your code to save the original pointer in an immutable variable that you later pass to free(). That should minimize the number of code changes you make to fix this.
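The suggested fix could look roughly like this. It is a sketch, assuming a hypothetical helper count_tokens(). The original pointer lives in a const-qualified variable that is never reassigned, a separate pointer walks the tokens, and only the original pointer is handed to free().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the tokens in `input` that are separated by
   any character in `delims`. Returns -1 if memory cannot be allocated. */
int count_tokens(const char *input, const char *delims)
{
    assert(input != NULL && delims != NULL);

    size_t len = strlen(input) + 1;
    char *const base = malloc(len);   /* never reassigned; free() gets this */
    if (base == NULL)
        return -1;
    memcpy(base, input, len);         /* strtok_r() modifies its input */

    int count = 0;
    char *save = NULL;
    for (char *tok = strtok_r(base, delims, &save);
         tok != NULL;
         tok = strtok_r(NULL, delims, &save)) {
        count++;                      /* tok points into the middle of base */
    }

    free(base);                       /* the original pointer, not tok */
    return count;
}
```

Passing tok (or any pointer into the middle of the block) to free() instead of base is undefined behaviour and is one way to end up with a corrupted heap.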
