Find the answer to your Linux question:
Results 1 to 5 of 5
I have recently witten the code for a dynamic FIFO queue that has nodes that can have variable size. I have thoroughly tested the code and the init/add/remove operations work ...
  1. #1
    Just Joined!
    Join Date
    Oct 2009
    Posts
    7

    Memory Fragmentation with Dynamic FIFO Queue

    I have recently witten the code for a dynamic FIFO queue that has nodes that can have variable size. I have thoroughly tested the code and the init/add/remove operations work flawlessly. I have even run it agains valgrind to verify that there are no memory leaks when allocating and freeing nodes. I am currently running the program I have written on Debian 5.0 kernel 2.6.26 with the default OS settings. I have not performed any memory tweaks to tune for performance etc. All of the allocations for each node in the queue are handled using the default malloc() and free().

    The problem I have with my queue occurs when I add a large number of small items to the queue. I have been watching my memory usage via the System Monitor in Debian and after adding 4000 items of size 1518 each to the queue my memory usage jumps just as expected. However, when I move through my queue and remove each item the memory usage does not appear to go down. When I run this test again without restarting my program the system does not allocate any further memory than was allocated in the first run. It appears as though it reuses memory that was allocated previously. I ran a second test and instead of adding a large number of small nodes to the queue I just added 30 items containing 10MB each and then emptied my queue again. This time the memory went up as expected and came back down as expected. I have been researching this problem for a few days now and I believe I have discovered the problem but have been unable to decide on reasonable solution.

    Basically what I have determined is that when i call free() to free my nodes the memory is not being returned to the operating system. I found a good explaination but I can't paste the link because i dont have enough posts yet.

    It states:

    "The decision on whether to use brk() or mmap() requires one simple check. If the request is equal or larger than M_MMAP_THRESHOLD, the allocator uses mmap(). If it is smaller, the allocator calls brk(). By default, M_MMAP_THRESHOLD is 128KB, but you may freely change it by using mallopt().

    In the OOM context, how ptmalloc frees memory blocks is interesting. Blocks allocated via mmap() get freed via an unmap() call, and thus become completely released. Freeing blocks allocated via brk() means marking them as free, but they remain under the allocator's control."

    Because the small blocks I am allocating are smaller than the mmap threshold they are being allocated by brk() and brk does not release freed memory back to the OS right away. I am assuming it does this mostly for performance gains. Apparently it reuses the allocated memory for future requests.

    This is a big problem for me because in my software that uses my queue I also report on the amount of free memory the system currently has. Anomalies such as what I have just described above cause the memory usage updates sent from my program to give a completely inaccurate representaion of the amount of memory currently being used. In addition, what I have also noticed is that as I continue to add/remove more and more items I begin to notice a fairly large and rapid performance degredation with my software. I don't know if what I described above is actual "memory fragmentation". I have always thought that memory fragmentation occured as a result of blocks of allocated memory not being used at all by the software. I am using all of the blocks I allocate, they just dont appear to be freeing in the manner I would like them to. Furthermore, my queue is only a small portion of my software that will be performing these types of small memory allocations. I have other modules that will be performing very similar allocations and I expect to take even larger memory reporting inaccuracies and performance hits.

    I have thought of various solutions:

    1. Use a different memory allocator
    2. Allocate large chunks of memory ahead of time and manage the memory myself. (This is very difficult because my queue could become very large)
    3. Adjust the management parameters for malloc so that it does not use brk (have had no success with this)

    Any help or suggestions would be greatly appreciated.

  2. #2
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
    Posts
    8,974
    Actually, heap in Linux/Unix systems is not returned to the OS on free() as it is with Windoze systems. When you need more memory for malloc(), it will call sbrk() to allocate more heap from the OS. However, when you call free() it doesn't return it automatically since you are probably going to need it again. To do so, you need to use brk() as you found.

    Once of the reasons that memory isn't returned to the OS automatically is that usually there is fragmentation and returning memory still in use to the OS is a good way to dump core.
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  3. #3
    Just Joined!
    Join Date
    Oct 2009
    Posts
    7
    I am confused. In the second paragraph it states that brk does not release the memory back to the OS and mmap completely releases it. So shouldn't I be using mmap() to solve my problem?

  4. #4
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
    Posts
    8,974
    Why is it a problem? You don't want to use mmap() without fully understanding what you are doing. Allocated, bug unused system memory (in page-size chunks - about 4k), are swapped out to disc as necessary for the use of other processes, so physical RAM is still available for running processes. In any case, brk() will return memory to the system, if the address specified point to allocated memory in the process. sbrk() calls brk() after computing what the new end_data_segment address would be after adding the specified increment to the current break position. If you are going to use brk() to reduce the amount of heap allocated to the process, then you have to determine what the end of the used memory for the process may be. IE, you might need to implement your own allocator/deallocator in order to track that definitively, bearing in mind that you also have to keep track of page boundaries. Normally, this is not done. Long running processes that know about what their maximum memory utilization will be often preallocate that amount so that the OS gives them that much heap once.

    As for determination of exactly how much "live" memory a process is using, that is difficult. Note that brk()/sbrk() operations are relative expensive. You don't want to be doing them frequently. The behavior of malloc has been tuned with that in mind. When malloc needs more heap, it calls sbrk(), and then allocates from the memory thus returned by sbrk(), which is some multiple of the system page size.

    Note that there are some documents on The Linux Documentation Project that will explain in better detail how Linux and Linux applications deal with memory.
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  5. #5
    Just Joined!
    Join Date
    Oct 2009
    Posts
    7
    OK. I have run tests and what I have found out is that the 128KB threshold mentioned in the above quote is either inaccurate or outdated. The allocator only holds onto a chunk of allocated memory when it is has a total size that is less than or equal to 32672 bytes. This is essentially 4 pages worth of memory with a 96 byte gap which I am assuming is tied up in some sort of overhead with the allocator. I have tried this across two different platforms and versions of Linux and it seems to hold true. Based upon this I think I will just write a mask for my malloc() and free() methods that will keep track of the amount of memory my program has allocated. In these methods I can make a decision based upon the size of the chunk of data that is being allocated as to how it will be represented within the total memory used for my program ( i.e. whether or not the allocator will hold onto it or hand it directly back to the OS). I would rather not go through the trouble of managing my own allocator due to some current time constraints. Furthermore, it is not absolutely imperative that I know the exact memory usage but I would like to be within approximately 1%-5% accuracy and I think this method will achieve that goal. What are you thoughts on this approach?

    By the way, I ran into an interesting scenario when attempting to determine the 32672 threshold I mentioned above. My test involved storing 50,000 items of 32673 bytes each into a queue. Each item was allocated using malloc() of course. What I noticed is that this scenario brings one of my cpu cores to 100% usage and takes approximately 5-10 times longer to allocate all of the items than the same scenario with 50,000 items of 32672 bytes each. I am assuming this has to do with the fact that each item that is allocated is forced to use 2 pages rather than one in this scenario with only one byte of data being allocated in each of the second pages, but I don't know why that would slow it down. I also tried this scenario at 32674 and received the same results so I guess there is a tremendous amount of overhead when the allocator is forced to put a very few number of bytes in the last page required to fill the allocation. Any thoughts on this scenario?

    Thanks for all of your replies.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
...