  1. #1
    Just Joined!
    Join Date
    Mar 2011
    Posts
    19

    Allocation of memory usage


    Hi all,

    I'm running a fairly complex model, and a run takes me around a week to complete, while one of my colleagues says he can run it in one to two days. After looking through several forums to understand how this works, I checked memory usage (with the command ps -A --sort -rss -o comm,pmem), and the model run takes just 0.3% of the memory.

    I bought a laptop with 4 cores and 8 GB of RAM so that I could run the model fast, but I don't know how to allocate memory so that it is concentrated on this task. Is that possible?

    Thanks in advance, and sorry for my English.
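On Linux, ps's %MEM column (the pmem format specifier) is the process's resident set size divided by the machine's physical memory. Linux normally gives a process the memory it requests (within available RAM and any configured limits), so there is no general setting that reserves the unused part of the 8 GB to speed the model up; 0.3% of roughly 7.9 GB is about 24 MB, which suggests the model's data is small and the run is limited by the CPU rather than by the amount of RAM. The minimal C++ sketch below reproduces the %MEM calculation for the calling process, assuming a Linux /proc filesystem; the helper names are illustrative.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Read a "Key:   <number> kB" field from a /proc file such as /proc/meminfo
// or /proc/self/status. Returns the number (in kB), or -1 if the key is absent.
long read_kb_field(const std::string& path, const std::string& key) {
    std::ifstream in(path);
    const std::string prefix = key + ":";
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) == 0) {
            std::istringstream fields(line.substr(prefix.size()));
            long value = -1;
            fields >> value;
            return value;
        }
    }
    return -1;
}

// Approximate ps's %MEM for the calling process:
// 100 * resident set size (VmRSS) / physical memory (MemTotal).
// Returns -1.0 if either value cannot be read.
double percent_mem_self() {
    const long rss_kb = read_kb_field("/proc/self/status", "VmRSS");
    const long total_kb = read_kb_field("/proc/meminfo", "MemTotal");
    if (rss_kb < 0 || total_kb <= 0) return -1.0;
    return 100.0 * static_cast<double>(rss_kb) / static_cast<double>(total_kb);
}
```

The same two fields can be read for the model itself by replacing "self" with its PID.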

  2. #2
    Just Joined!
    Join Date
    Mar 2011
    Posts
    19
    I used the renice command to see if I could make the model run faster, but even after setting its nice value to -15 it still takes a long time to run. Any ideas would be much appreciated.

    Thanks in advance
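The nice value only decides how CPU time is shared between processes that are competing for it. If the model is already the only busy process, it already gets essentially a full core, so lowering its nice value to -15 cannot make it finish much sooner, and it does not let a single-threaded program use more than one core. A minimal C++ sketch of what renice does, using the POSIX getpriority()/setpriority() calls on the calling process; the helper names are illustrative.

```cpp
#include <cerrno>
#include <sys/resource.h>

// Current nice value of the calling process (-20 = highest priority,
// 19 = lowest). getpriority() may legitimately return -1, so errno is
// cleared first; 100 (outside the valid range) signals a real error.
int current_nice() {
    errno = 0;
    const int value = getpriority(PRIO_PROCESS, 0);
    return (value == -1 && errno != 0) ? 100 : value;
}

// Roughly what `renice -n <value> -p <own pid>` does. Returns 0 on success,
// otherwise the errno value. Lowering the nice value (e.g. to -15) normally
// needs root or CAP_SYS_NICE, so unprivileged callers usually get EACCES.
int try_set_nice(int value) {
    if (setpriority(PRIO_PROCESS, 0, value) == 0) return 0;
    return errno;
}
```

The privilege check is why renice to a negative value is usually run with sudo.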

  3. #3
    Linux Guru Rubberman
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,558
    If the model is single-threaded, then more cores are not going to help, and laptop CPUs are quite a bit slower than desktop models. There are other system architecture issues as well, which depend on the make/model of your CPU and on the memory bus / support chipset being used. FWIW, what software are you using? What kind of modeling are you doing? Is this using Monte Carlo routines to model/simulate a physical system?
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!
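One quick way to check whether the model is single-threaded is to look at its thread count: Linux exposes it as the Threads: field of /proc/<pid>/status (ps -o nlwp -p <pid> reports the same number). A minimal C++ sketch that reads that field, assuming a Linux /proc filesystem; thread_count is an illustrative name.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Count the threads of a process by reading the "Threads:" line of
// /proc/<pid>/status. Pass "self" for the calling process.
// Returns -1 if the file or the field cannot be read.
int thread_count(const std::string& pid) {
    std::ifstream in("/proc/" + pid + "/status");
    const std::string prefix = "Threads:";
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) == 0) {
            std::istringstream fields(line.substr(prefix.size()));
            int n = -1;
            fields >> n;
            return n;
        }
    }
    return -1;
}
```

A count of 1 for the model's PID while it runs would mean the run can only ever keep one of the four cores busy.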

  4. #4
    Just Joined!
    Join Date
    Mar 2011
    Posts
    19
    Thanks for answering, Rubberman.

    The model is not based on the Monte Carlo method. It takes climatic variables (more than 10 of them) and calculates the behaviour of carbon fluxes from different sources. It's a global model, so it has to repeat the calculations for more than 65000 grid cells. It also needs a 5000-year spin-up before the run itself, and that is the part that takes most of the time. The model is written in two languages, Fortran and C++, which were compiled together using a makefile. Now I'm making different runs using the executables that resulted from the compilation.
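Spreading a single run over several cores would require the code itself to split the work. Purely as an illustration, and assuming (hypothetically) that each grid cell's spin-up can be computed independently of the other cells, a minimal C++ sketch with std::thread could give each thread one contiguous chunk of cells. update_cell is a hypothetical stand-in for the model's per-cell, per-year calculation, not code from the model.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// HYPOTHETICAL stand-in for one year of the model's calculations on one
// grid cell; the real model's per-cell code is not shown in this thread.
double update_cell(double state) {
    return 0.5 * state + 1.0;
}

// Spin up every cell for `years` steps on a single core.
std::vector<double> spin_up_serial(std::vector<double> cells, int years) {
    for (double& c : cells)
        for (int y = 0; y < years; ++y) c = update_cell(c);
    return cells;
}

// Same work, split into contiguous chunks of cells, one chunk per thread.
// Only valid if cells never read each other's state (an assumption).
std::vector<double> spin_up_parallel(std::vector<double> cells, int years,
                                     unsigned n_threads) {
    if (n_threads == 0) n_threads = 1;
    const std::size_t chunk = (cells.size() + n_threads - 1) / n_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(cells.size(), begin + chunk);
        if (begin >= end) break;
        // Each thread writes only its own range, so there is no data race.
        workers.emplace_back([&cells, years, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                for (int y = 0; y < years; ++y) cells[i] = update_cell(cells[i]);
        });
    }
    for (std::thread& w : workers) w.join();
    return cells;
}
```

If cells do exchange information every year, each year would need its own synchronization step, which this sketch does not do. Without changing the code at all, running several of the independent runs mentioned above at the same time, one per core, is another way to keep all four cores busy. Older toolchains may need -pthread when linking.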

    I'm not sure whether that answers your questions... Anyway, in case it helps, this is what I got when I ran the free command:

    [ascotilla@lb-ascotilla-lap ~]$ free -m
                 total       used       free     shared    buffers     cached
    Mem:          7859       7792         66          0         84       6484
    -/+ buffers/cache:       1223       6636
    Swap:         4095          0       4095

    Thank you
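In this older free layout, the Mem: row counts the kernel's buffers and page cache as "used"; the -/+ buffers/cache row removes them to show what programs actually hold. A small C++ sketch of that arithmetic using the numbers posted above (MemLine is just an illustrative container):

```cpp
// One row of the older `free -m` output, in MB.
struct MemLine {
    long total_mb, used_mb, free_mb, shared_mb, buffers_mb, cached_mb;
};

// The Mem: row posted above.
const MemLine kObserved{7859, 7792, 66, 0, 84, 6484};

// "-/+ buffers/cache" used column: memory held by programs, i.e. "used"
// minus the buffers and page cache that the kernel can reclaim on demand.
long used_by_apps(const MemLine& m) {
    return m.used_mb - m.buffers_mb - m.cached_mb;
}

// "-/+ buffers/cache" free column: memory programs could still obtain.
long free_for_apps(const MemLine& m) {
    return m.free_mb + m.buffers_mb + m.cached_mb;
}
```

This gives 7792 - 84 - 6484 = 1224 MB used by programs and 66 + 84 + 6484 = 6634 MB effectively free, matching the printed 1223 and 6636 up to rounding (free works in kB before converting to MB). With 0 MB of swap in use, the machine was not short of memory when the snapshot was taken.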

  5. #5
    Linux Guru Rubberman
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,558
    Was that free output taken while your modeling tool was running? It doesn't indicate that you are hitting swap, but if it was taken before you started the run, the system might start swapping once the run is going, which would degrade performance significantly. How you build your application will also affect performance. The compilers have various optimization settings that can speed things up significantly, such as loop unrolling, CPU cache optimization, CPU-specific instructions, etc. I assume you are doing a lot of floating-point math, so using hardware floating-point (FP) acceleration is critical.
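One way to confirm which of these settings actually reached the compiler is to ask the compiled code. GCC and Clang predefine macros such as __OPTIMIZE__ (any -O level above -O0), __FAST_MATH__ (-ffast-math) and __AVX__/__AVX2__ (when the target, e.g. selected with -march=native, supports those instructions). A minimal sketch for the C++ side of the build; build_flags_summary is an illustrative name.

```cpp
#include <string>

// Report, from GCC/Clang predefined macros, which optimization-related
// settings were in effect when this particular file was compiled.
std::string build_flags_summary() {
    std::string s;
#ifdef __OPTIMIZE__
    s += "optimized (above -O0);";
#else
    s += "not optimized (-O0);";
#endif
#ifdef __OPTIMIZE_SIZE__
    s += " -Os;";
#endif
#ifdef __FAST_MATH__
    s += " -ffast-math;";
#endif
#ifdef __SSE2__
    s += " SSE2;";
#endif
#ifdef __AVX__
    s += " AVX;";
#endif
#ifdef __AVX2__
    s += " AVX2;";
#endif
#ifdef __FMA__
    s += " FMA;";
#endif
    return s;
}
```

On a 64-bit x86 CPU, floating-point math is always done in hardware (SSE2 is part of the x86-64 baseline), so the more relevant question is usually whether wider vector instructions such as AVX are enabled. Compiling this file with the same flags the makefile uses and printing the string shows what was in effect.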

  6. #6
    Just Joined!
    Join Date
    Mar 2011
    Posts
    19
    I was running the model when I executed the free command.

    For the compilation I just ran the makefile, so I'll need to have a look at those options (thanks for the hint!).

    The program has over 13000 lines of Fortran and around 4000 of C++, so it does a bit of everything ;P

    What do you mean by FP acceleration?

    Thank you for your quick answers!

  7. #7
    Linux Guru Rubberman
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,558
    You can see which optimization options your g++ (C++) compiler supports with the --help=optimizers option. On my SL6 (RHEL6) system, running g++ 4.4.4, here is the list of optimization options and what they do - it's a long list!

    Code:
    g++ --help=optimizers
    The following options control optimizations:
      -O<number>                  Set optimization level to <number>
      -Os                         Optimize for space rather than speed
      -falign-functions           Align the start of functions
      -falign-jumps               Align labels which are only reached by jumping
      -falign-labels              Align all labels
      -falign-loops               Align the start of loops
      -fargument-alias            Specify that arguments may alias each other and
                                  globals
      -fargument-noalias          Assume arguments may alias globals but not each
                                  other
      -fargument-noalias-anything Assume arguments alias no other storage
      -fargument-noalias-global   Assume arguments alias neither each other nor
                                  globals
      -fasynchronous-unwind-tables Generate unwind tables that are exact at each
                                  instruction boundary
      -fbranch-count-reg          Replace add, compare, branch with branch on count
                                  register
      -fbranch-probabilities      Use profiling information for branch probabilities
      -fbranch-target-load-optimize Perform branch target load optimization before
                                  prologue / epilogue threading
      -fbranch-target-load-optimize2 Perform branch target load optimization after
                                  prologue / epilogue threading
      -fbtr-bb-exclusive          Restrict target load migration not to re-use
                                  registers in any basic block
      -fcaller-saves              Save registers around function calls
      -fcommon                    Do not put uninitialized globals in the common
                                  section
      -fconserve-stack            Do not perform optimizations increasing
                                  noticeably stack usage
      -fcprop-registers           Perform a register copy-propagation optimization
                                  pass
      -fcrossjumping              Perform cross-jumping optimization
      -fcse-follow-jumps          When running CSE, follow jumps to their targets
      -fcse-skip-blocks           When running CSE, follow conditional jumps
      -fcx-fortran-rules          Complex multiplication and division follow
                                  Fortran rules
      -fcx-limited-range          Omit range reduction step when performing complex
                                  division
      -fdata-sections             Place data items into their own section
      -fdce                       Use the RTL dead code elimination pass
      -fdefer-pop                 Defer popping functions args from stack until
                                  later
      -fdelayed-branch            Attempt to fill delay slots of branch instructions
      -fdelete-null-pointer-checks Delete useless null pointer checks
      -fdse                       Use the RTL dead store elimination pass
      -fearly-inlining            Perform early inlining
      -fexceptions                Enable exception handling
      -fexpensive-optimizations   Perform a number of minor, expensive optimizations
      -ffinite-math-only          Assume no NaNs or infinities are generated
      -ffloat-store               Don't allocate floats and doubles in extended-
                                  precision registers
      -fforward-propagate         Perform a forward propagation pass on RTL
      -fgcse                      Perform global common subexpression elimination
      -fgcse-after-reload         Perform global common subexpression elimination
                                  after register allocation has finished
      -fgcse-las                  Perform redundant load after store elimination in
                                  global common subexpression elimination
      -fgcse-lm                   Perform enhanced load motion during global common
                                  subexpression elimination
      -fgcse-sm                   Perform store motion after global common
                                  subexpression elimination
      -fgraphite-identity         Enable Graphite Identity transformation
      -fguess-branch-probability  Enable guessing of branch probabilities
      -fhandle-exceptions         
      -fif-conversion             Perform conversion of conditional jumps to
                                  branchless equivalents
      -fif-conversion2            Perform conversion of conditional jumps to
                                  conditional execution
      -finline-functions          Integrate simple functions into their callers
      -finline-functions-called-once Integrate functions called once into their
                                  callers
      -finline-small-functions    Integrate simple functions into their callers
                                  when code size is known to not growth
      -fipa-cp                    Perform Interprocedural constant propagation
      -fipa-cp-clone              Perform cloning to make Interprocedural constant
                                  propagation stronger
      -fipa-matrix-reorg          Perform matrix layout flattening and transposing
                                  based on profiling information.
      -fipa-pta                   Perform interprocedural points-to analysis
      -fipa-pure-const            Discover pure and const functions
      -fipa-reference             Discover readonly and non addressable static
                                  variables
      -fipa-type-escape           Type based escape and alias analysis
      -fivopts                    Optimize induction variables on trees
      -fjump-tables               Use jump tables for sufficiently large switch
                                  statements
      -floop-block                Enable Loop Blocking transformation
      -floop-interchange          Enable Loop Interchange transformation
      -floop-strip-mine           Enable Loop Strip Mining transformation
      -fmath-errno                Set errno after built-in math functions
      -fmerge-all-constants       Attempt to merge identical constants and constant
                                  variables
      -fmerge-constants           Attempt to merge identical constants across
                                  compilation units
      -fmodulo-sched              Perform SMS based modulo scheduling before the
                                  first scheduling pass
      -fmove-loop-invariants      Move loop invariant computations out of loops
      -fnon-call-exceptions       Support synchronous non-call exceptions
      -fomit-frame-pointer        When possible do not generate stack frames
      -foptimize-register-move    Do the full register move optimization pass
      -foptimize-sibling-calls    Optimize sibling and tail recursive calls
      -fpack-struct               Pack structure members together without holes
      -fpack-struct=<number>      Set initial maximum structure member alignment
      -fpeel-loops                Perform loop peeling
      -fpeephole                  Enable machine specific peephole optimizations
      -fpeephole2                 Enable an RTL peephole pass before sched2
      -fpredictive-commoning      Run predictive commoning optimization.
      -fprefetch-loop-arrays      Generate prefetch instructions, if available, for
                                  arrays in loops
      -freg-struct-return         Return small aggregates in registers
      -fregmove                   Enables a register move optimization
      -frename-registers          Perform a register renaming optimization pass
      -freorder-blocks            Reorder basic blocks to improve code placement
      -freorder-blocks-and-partition Reorder basic blocks and partition into hot
                                  and cold sections
      -freorder-functions         Reorder functions to improve code placement
      -frerun-cse-after-loop      Add a common subexpression elimination pass after
                                  loop optimizations
      -freschedule-modulo-scheduled-loops Enable/Disable the traditional scheduling
                                  in loops that already passed modulo scheduling
      -frounding-math             Disable optimizations that assume default FP
                                  rounding behavior
      -frtl-abstract-sequences    Perform sequence abstraction optimization on RTL
      -frtti                      Generate run time type descriptor information
      -fsched-interblock          Enable scheduling across basic blocks
      -fsched-spec                Allow speculative motion of non-loads
      -fsched-spec-load           Allow speculative motion of some loads
      -fsched-spec-load-dangerous Allow speculative motion of more loads
      -fsched-stalled-insns       Allow premature scheduling of queued insns
      -fsched-stalled-insns-dep   Set dependence distance checking in premature
                                  scheduling of queued insns
      -fsched2-use-superblocks    If scheduling post reload, do superblock
                                  scheduling
      -fsched2-use-traces         If scheduling post reload, do trace scheduling
      -fschedule-insns            Reschedule instructions before register allocation
      -fschedule-insns2           Reschedule instructions after register allocation
      -fsection-anchors           Access data in the same section from shared
                                  anchor points
      -fsel-sched-pipelining      Perform software pipelining of inner loops during
                                  selective scheduling
      -fsel-sched-pipelining-outer-loops Perform software pipelining of outer loops
                                  during selective scheduling
      -fsel-sched-reschedule-pipelined Reschedule pipelined regions without
                                  pipelining
      -fselective-scheduling      Schedule instructions using selective scheduling
                                  algorithm
      -fselective-scheduling2     Run selective scheduling after reload
      -fshort-double              Use the same size for double as for float
      -fshort-enums               Use the narrowest integer type possible for
                                  enumeration types
      -fshort-wchar               Force the underlying type for "wchar_t" to be
                                  "unsigned short"
      -fsignaling-nans            Disable optimizations observable by IEEE
                                  signaling NaNs
      -fsigned-zeros              Disable floating point optimizations that ignore
                                  the IEEE signedness of zero
      -fsingle-precision-constant Convert floating point constants to single
                                  precision constants
      -fsplit-ivs-in-unroller     Split lifetimes of induction variables when loops
                                  are unrolled
      -fsplit-wide-types          Split wide types into independent registers
      -fstrict-aliasing           Assume strict aliasing rules apply
      -fthread-jumps              Perform jump threading optimizations
      -fno-threadsafe-statics     Do not generate thread-safe code for initializing
                                  local statics
      -ftoplevel-reorder          Reorder top level functions, variables, and asms
      -ftrapping-math             Assume floating-point operations can trap
      -ftrapv                     Trap for signed overflow in addition, subtraction
                                  and multiplication
      -ftree-builtin-call-dce     Enable conditional dead code elimination for
                                  builtin calls
      -ftree-ccp                  Enable SSA-CCP optimization on trees
      -ftree-ch                   Enable loop header copying on trees
      -ftree-coalesce-inlined-vars Permit SSA coalescing of inlined variables only
      -ftree-coalesce-vars        Permit SSA coalescing of all variables
      -ftree-copy-prop            Enable copy propagation on trees
      -ftree-copyrename           Replace SSA temporaries with better names in
                                  copies
      -ftree-cselim               Transform condition stores into unconditional ones
      -ftree-dce                  Enable SSA dead code elimination optimization on
                                  trees
      -ftree-dominator-opts       Enable dominator optimizations
      -ftree-dse                  Enable dead store elimination
      -ftree-fre                  Enable Full Redundancy Elimination (FRE) on trees
      -ftree-loop-distribution    Enable loop distribution on trees
      -ftree-loop-im              Enable loop invariant motion on trees
      -ftree-loop-ivcanon         Create canonical induction variables in loops
      -ftree-loop-linear          Enable linear loop transforms on trees
      -ftree-loop-optimize        Enable loop optimizations on tree level
      -ftree-lrs                  Perform live range splitting during the SSA-
                                  >normal pass
      -ftree-pre                  Enable SSA-PRE optimization on trees
      -ftree-reassoc              Enable reassociation on tree level
      -ftree-scev-cprop           Enable copy propagation of scalar-evolution
                                  information.
      -ftree-sink                 Enable SSA code sinking on trees
      -ftree-sra                  Perform scalar replacement of aggregates
      -ftree-switch-conversion    Perform conversions of switch initializations.
      -ftree-ter                  Replace temporary expressions in the SSA->normal
                                  pass
      -ftree-vect-loop-version    Enable loop versioning when doing loop
                                  vectorization on trees
      -ftree-vectorize            Enable loop vectorization on trees
      -ftree-vrp                  Perform Value Range Propagation on trees
      -funit-at-a-time            Compile whole compilation unit at a time
      -funroll-all-loops          Perform loop unrolling for all loops
      -funroll-loops              Perform loop unrolling when iteration count is
                                  known
      -funsafe-loop-optimizations Allow loop optimizations to assume that the loops
                                  behave in normal way
      -funsafe-math-optimizations Allow math optimizations that may violate IEEE or
                                  ISO standards
      -funswitch-loops            Perform loop unswitching
      -funwind-tables             Just generate unwind tables for exception handling
      -fvar-tracking              Perform variable tracking
      -fvar-tracking-assignments  Perform variable tracking by annotating
                                  assignments
      -fvar-tracking-assignments-toggle Toggle -fvar-tracking-assignments
      -fvar-tracking-uninit       Perform variable tracking and also tag variables
                                  that are uninitialized
      -fvariable-expansion-in-unroller Apply variable expansion when loops are
                                  unrolled
      -fvect-cost-model           Enable use of cost model in vectorization
      -fvpt                       Use expression value profiles in optimizations
      -fweb                       Construct webs and split unrelated uses of single
                                  variable
      -fwhole-program             Perform whole program optimizations
      -fwrapv                     Assume signed arithmetic overflow wraps around
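To see what a few of these options buy on a particular machine, one rough approach is to time the same floating-point loop in separate builds, e.g. g++ -O0, g++ -O2 and g++ -O3 -march=native. A minimal C++ timing sketch; the y = a*x + y loop and the name time_axpy are hypothetical stand-ins, not code from the model, so real gains depend on the model's own loops.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Time `repeats` passes of y[i] = a * x[i] + y[i] over n doubles and return
// the elapsed wall-clock seconds. -O3 (which enables -ftree-vectorize) can
// turn this loop into SIMD code, while -O0 leaves it as plain scalar code.
double time_axpy(std::size_t n, int repeats) {
    if (n == 0) return 0.0;
    std::vector<double> x(n, 1.0), y(n, 0.0);
    const double a = 0.5;

    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        for (std::size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    const auto stop = std::chrono::steady_clock::now();

    // Use the results so the optimizer cannot treat the loop as dead code.
    double checksum = 0.0;
    for (double v : y) checksum += v;
    volatile double sink = checksum;
    (void)sink;

    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_axpy(1000000, 100) from a small main() in each build and printing the result gives a first comparison between optimization levels on the same machine.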

  8. #8
    Just Joined!
    Join Date
    Mar 2011
    Posts
    19
    There are a lot indeed!
    I'm having some problems running it, so I'll try again tomorrow (I'm too tired now...)

    Thank you!

  9. #9
    Just Joined!
    Join Date
    Mar 2011
    Posts
    19
    Hi again,

    It seems the makefile already includes optimization options. I'll probably apply the "semper gumbi" motto to myself and wait until the model finishes... Maybe the other computer where the model is being run is simply better than mine (it has 8 cores, so that could be a good explanation...).

    Thanks a lot anyway, this has clarified a lot of things for me as a newbie.

    Cheers
