  1. #1
    Just Joined!
    Join Date
    Jun 2008
    Location
    North East U.S.
    Posts
    30

    Shared library question


    I assume that Linux loads shared libraries into a reserved location in the 64-bit address space and that all processes then use the library from that same location. When new versions of shared libraries are used, they get loaded too, and both the old and the new versions of the libraries take up space within the special area used for holding shared libraries.

    On AIX there is a command, slibclean, that can be used to clear out the shared library load area of memory. This was more important to do every so often when we ran on 32-bit systems, but it can still be used on 64-bit systems. Is there anything equivalent on Linux to clear shared libraries out of system memory? We can always just reboot the system, but I would rather not have to do that.
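
    The closest thing I've come across on Linux is dropping the clean page cache, but I don't know whether our RHEL 4 kernel even has the knob, and it is not a true slibclean, so treat this as an untested sketch:

    Code:
    # Untested sketch: flush clean, unmapped file-backed pages (shared
    # library text included) from the page cache.  Requires root and a
    # kernel that provides /proc/sys/vm/drop_caches; pages still mapped
    # by running processes are left alone, so this is only a rough
    # analogue of slibclean.
    sync
    echo 3 > /proc/sys/vm/drop_caches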

    The reason I'm asking is that we have recently noticed some odd performance behavior in programs we develop, all of which are built as dynamically loaded shared libraries. When comparing a new version against the prior version, we noticed that the first time the new version ran, it was slower than the old version. The second time it was faster (not too unusual). The third and fourth times it continued to get faster (in CPU time, not just elapsed time), and after that it stayed at the CPU time of the fourth run, now 33% faster in CPU time than the first run. We put the dynamic modules in a different location and ran again: the first run was slow again, and it sped up on each successive run through the fourth. These are CPU-intensive runs that take over an hour of CPU time, and the I/O is less than a minute.
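
    To be concrete, the measurement was essentially of this form, recording user CPU time for each back-to-back run ("./model" below is just a stand-in for the real binary):

    Code:
    # Sketch of the experiment: run the same binary several times in a
    # row and record user CPU seconds per run (GNU time assumed).
    for i in 1 2 3 4 5; do
        /usr/bin/time -f "run $i: %U s user CPU" ./model > /dev/null
    done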

    I began to wonder whether the CPU-time improvements could somehow be due to the CPU learning to run the programs better over time - like what branch prediction tables are intended to do. Could it be that, because the application is in a shared library and so is always found at the same virtual address in every process that uses that module, branch tables in the processor can track and improve performance across processes?
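
    One thing we could check is whether the module really does end up at the same virtual address in different processes; something along these lines, where the module name and PIDs are placeholders:

    Code:
    # Compare where the same shared object is mapped in two running
    # processes; matching start addresses would at least be consistent
    # with the cross-process branch-prediction idea.
    grep mymodule.so /proc/1234/maps
    grep mymodule.so /proc/5678/maps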

    These runs were made on dual-core Opterons, so we tried running on Intel processors and found no such performance change between the first and subsequent runs of the applications. There were also no such improvements when run on AIX or Solaris. This appears to be something unique to AMD processors, but it is clearly keyed off the use of shared libraries. It would be nice if there were a way to clear out the shared libraries to test some of this without having to reboot the systems.

    We are running Red Hat Enterprise Linux WS release 4 for x86_64 (Nahant Update 6 or Nahant Update 4).

    Brion

  2. #2
    Linux Guru Rubberman
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,598
    I cannot explicitly explain this behavior, but my suspicion is that it is related to cache size, cache coherency, and whether the code sits in L1 or L2 cache (on-chip vs. off-chip). The more often the library is loaded on the Opteron, the more the CPU may tend to pull it into cache, compared with the first time it is loaded. Again, this is just a SWAG and may be partially (or wholly) incorrect. However, I do know that Intel and AMD chips behave very differently with respect to cache.

  3. #3
    Just Joined!
    Join Date
    Jun 2008
    Location
    North East U.S.
    Posts
    30
    Rubberman,

    It is definitely odd. If there were a means to flush all shared library modules out of system memory, as the slibclean command does on AIX, it would be easier to test what causes this behavior. It is interesting that when first run on the Opteron the program takes longer than when run on an Intel processor, but after it "learns" the module, it runs faster than on the Intel machine. Given that these machines were run with no other user processes and a load of 1 with this process running, it is hard to see how cache could have much to do with the 30% faster execution. This process covers data that is multiple GB in size (a chip logic model), so the data cannot possibly all fit in cache. I'm not aware of any cache-fill learning going on. The only learning I know of is for branch prediction, but even that seems far-fetched as the reason, given that the process runs for more than an hour. Why does it take four runs to reach the optimal performance point and stay there?

    In general, it is very difficult to get consistent CPU times when there are multiple processes contending for the CPU. We have seen 50% differences caused just by having additional processes running. All the cache contention and reloading counts against CPU time, not just elapsed time. Seeing this on an unloaded processor is even more confusing. I'm sure there is a reasonable explanation for it, but we will likely never know it.

  4. #4
    Linux Guru Rubberman
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,598
    These chips are so complex that the fact that they work at all (let alone so well) is a downright wonder, not to mention FM (farking magic)!

  5. #5
    Just Joined!
    Join Date
    Jun 2008
    Posts
    34
    Brion,
    Have you solved the mystery? If not, I would suggest using oprofile to profile the execution of your programs. Is it feasible for you to recompile your programs with -g? If not, then forget about my suggestion.
    By comparing the profile output of the first four runs, we might be able to get some hints.
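
    Roughly something like this for each of the first four runs (off the top of my head, so treat it as a sketch; opcontrol needs root, and "./model" is just a placeholder for your binary):

    Code:
    # Rough oprofile session using the legacy opcontrol interface that
    # ships with RHEL 4.  --separate=lib attributes samples to the
    # shared libraries; building with -g additionally lets opannotate
    # map samples back to source lines.
    opcontrol --no-vmlinux --separate=lib
    opcontrol --start
    ./model
    opcontrol --stop
    opreport --symbols ./model > run1.txt
    opcontrol --shutdown
    # repeat for runs 2-4 and diff the reports
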
    -Steve

  6. #6
    Just Joined!
    Join Date
    Jun 2008
    Location
    North East U.S.
    Posts
    30
    Steve,

    We'll try -g and see what happens. It may be a couple of days before we get a chance to try it. Since the optimized code runs for over an hour, profiling may take a while, but I like the idea.

    Brion

  7. #7
    Just Joined!
    Join Date
    Jun 2008
    Posts
    34
    Hi, Brion, any luck with the profiling results?
    -Steve

  8. #8
    Just Joined!
    Join Date
    Jun 2008
    Location
    North East U.S.
    Posts
    30
    Steve,

    Sorry, we've been on a forced vacation thanks to the economy and the responsible person can't get back to this right now. I will post something as soon as I have it. If that developer can't get to it soon, I'll try it myself.

    Brion
