- 05-04-2011 #1 Just Joined!
- Join Date
- Mar 2011
- Posts
- 19
Allocation of memory usage
Hi all,
I'm running a fairly complex model, and a run takes me around a week to complete, while one of my colleagues says he can run it in one or two days. After looking through several forums to understand how this works, I checked how the memory is being used (with the command ps -A --sort -rss -o comm,pmem), and the model run takes just 0.3% of the memory.
I bought a laptop with 4 cores and 8 GB of RAM to be able to run the model fast, but I don't know how to allocate more of the memory to that task. Is that possible?
Thanks in advance, and sorry for my English.
- 05-05-2011 #2 Just Joined!
- Join Date
- Mar 2011
- Posts
- 19
I used the renice command to see if I could make the model run faster, but even after raising its priority to -15 it's still taking a long time to run. Any ideas would be much appreciated.
Thanks in advance
- 05-05-2011 #3 Linux Guru
- Join Date
- Apr 2009
- Location
- I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
- Posts
- 8,974
If the model is single-threaded, then more cores are not going to help, and laptop CPUs are quite a bit slower than desktop models. There are other system architecture issues as well, which depend on the make/model of CPU that you have, and on the memory bus / support chipset that is being used. FWIW, what is the software that you are using? What kind of modeling are you doing? Is this using Monte Carlo routines to model/simulate a physical system?
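One way to check the single-threaded question on Linux is to look at the Threads: field in /proc/<pid>/status for the running model. Here is a minimal sketch in Python; the function names and the sample text are made up for illustration, and the sample is not real output from the model.

```python
import re


def thread_count(status_text):
    """Return the value of the 'Threads:' field from /proc/<pid>/status text."""
    match = re.search(r"^Threads:\s+(\d+)$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Threads: field found")
    return int(match.group(1))


def thread_count_of(pid):
    """Read /proc/<pid>/status (Linux only) and return the thread count."""
    with open(f"/proc/{pid}/status") as handle:
        return thread_count(handle.read())


# Hand-written excerpt for illustration, not taken from the real model run.
sample = "Name:\tmodel\nState:\tR (running)\nThreads:\t1\n"
print(thread_count(sample))  # a count of 1 means the other cores sit idle
```

With the real process you would call thread_count_of() with the PID that ps reports for the model executable.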
Sometimes, real fast is almost as good as real time.
Just remember, Semper Gumbi - always be flexible!
- 05-05-2011 #4 Just Joined!
- Join Date
- Mar 2011
- Posts
- 19
Thanks for answering, Rubberman
The model is not based on the Monte Carlo method. It takes climatic variables (over 10 different variables) and calculates the behaviour of carbon fluxes from different sources. It's a global model, so it has to repeat the calculations for more than 65000 grid cells. It also needs a spin-up of 5000 years before the run itself, and that spin-up is what takes most of the time. The model is written in two different languages, Fortran and C++, which were compiled together using a makefile. Now I'm making different runs using the executables that resulted from the compilation.
I don't know if I answered your questions... Anyway, in case it helps, this is what I got when I executed the free command:
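To put those numbers in perspective, here is a rough back-of-envelope calculation in Python. It assumes exactly 65000 cells and 5000 years, and that the whole week of runtime mentioned in the first post goes into the spin-up, which is a simplification; the function names are only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def cell_years(grid_cells, years):
    """Number of (grid cell, year) combinations the model has to compute."""
    return grid_cells * years


def budget_ms_per_cell_year(run_days, grid_cells, years):
    """Average wall-clock milliseconds available per cell-year."""
    total_ms = run_days * SECONDS_PER_DAY * 1000.0
    return total_ms / cell_years(grid_cells, years)


# Assumed figures: 65000 cells, 5000 spin-up years, a 7-day run.
print(cell_years(65_000, 5_000))                            # 325000000
print(round(budget_ms_per_cell_year(7, 65_000, 5_000), 2))  # 1.86
```

So a one-week run works out to a little under 2 ms per cell per simulated year, which suggests the total is driven by the sheer number of repetitions rather than by any single slow step.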
[ascotilla@lb-ascotilla-lap ~]$ free -m
             total       used       free     shared    buffers     cached
Mem:          7859       7792         66          0         84       6484
-/+ buffers/cache:       1223       6636
Swap:         4095          0       4095
Thank you
- 05-05-2011 #5 Linux Guru
- Join Date
- Apr 2009
- Location
- I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
- Posts
- 8,974
Was the output of the free command taken while you were running your modeling tool? It doesn't indicate that you are hitting swap, but if it was taken before you started the run, swapping might still happen once the model is running, and that would degrade performance significantly. Also, how you build your application will impact performance. There are various optimization settings for the compilers that can speed things up significantly, such as loop unrolling, CPU cache optimization, CPU-specific instructions, etc. I assume you are doing a lot of floating point math, so using hardware fp acceleration is critical.
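The reading of the free output above can be checked with a little arithmetic. The sketch below, in Python, parses the old-style free -m layout posted earlier in the thread and works out how much memory programs actually hold once buffers and cache are excluded; the function names are my own and purely illustrative.

```python
def parse_free(text):
    """Parse the 'Mem:' and 'Swap:' rows of old-style `free -m` output."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("Mem:", "Swap:"):
            rows[fields[0].rstrip(":")] = [int(value) for value in fields[1:]]
    return rows


def summarize(text):
    """Return memory held by programs, memory reclaimable, and swap in use (MB)."""
    rows = parse_free(text)
    total, used, free, shared, buffers, cached = rows["Mem"]
    swap_total, swap_used, swap_free = rows["Swap"]
    return {
        "used_by_programs_mb": used - buffers - cached,
        "available_mb": free + buffers + cached,
        "swap_used_mb": swap_used,
    }


# The output posted earlier in this thread.
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:          7859       7792         66          0         84       6484
-/+ buffers/cache:       1223       6636
Swap:         4095          0       4095
"""
print(summarize(FREE_OUTPUT))
```

This gives about 1224 MB held by programs, about 6634 MB that the kernel can hand back on demand, and 0 MB of swap in use. The small differences from the 1223 and 6636 that free prints are presumably rounding. In other words, the machine is nowhere near running out of memory.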
- 05-05-2011 #6 Just Joined!
- Join Date
- Mar 2011
- Posts
- 19
I was running the model when I executed the free command.
For the compilation I just executed the makefile, so I'll need to have a look at those options (thanks for the hint!).
The program has over 13000 lines of Fortran and around 4000 of C++, so it does a lot of everything... ;P
What do you mean by fp acceleration?
Thank you for your quick answers!
- 05-05-2011 #7 Linux Guru
- Join Date
- Apr 2009
- Location
- I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
- Posts
- 8,974
You can see what optimizers are available for your g++ (c++) compiler with the --help=optimizers option. On my SL6 (RHEL6) system, running g++ 4.4.4, here is the list of optimization options and what they do - it's a long list!

Code:
g++ --help=optimizers
The following options control optimizations:
  -O<number>  Set optimization level to <number>
  -Os  Optimize for space rather than speed
  -falign-functions  Align the start of functions
  -falign-jumps  Align labels which are only reached by jumping
  -falign-labels  Align all labels
  -falign-loops  Align the start of loops
  -fargument-alias  Specify that arguments may alias each other and globals
  -fargument-noalias  Assume arguments may alias globals but not each other
  -fargument-noalias-anything  Assume arguments alias no other storage
  -fargument-noalias-global  Assume arguments alias neither each other nor globals
  -fasynchronous-unwind-tables  Generate unwind tables that are exact at each instruction boundary
  -fbranch-count-reg  Replace add, compare, branch with branch on count register
  -fbranch-probabilities  Use profiling information for branch probabilities
  -fbranch-target-load-optimize  Perform branch target load optimization before prologue / epilogue threading
  -fbranch-target-load-optimize2  Perform branch target load optimization after prologue / epilogue threading
  -fbtr-bb-exclusive  Restrict target load migration not to re-use registers in any basic block
  -fcaller-saves  Save registers around function calls
  -fcommon  Do not put uninitialized globals in the common section
  -fconserve-stack  Do not perform optimizations increasing noticeably stack usage
  -fcprop-registers  Perform a register copy-propagation optimization pass
  -fcrossjumping  Perform cross-jumping optimization
  -fcse-follow-jumps  When running CSE, follow jumps to their targets
  -fcse-skip-blocks  When running CSE, follow conditional jumps
  -fcx-fortran-rules  Complex multiplication and division follow Fortran rules
  -fcx-limited-range  Omit range reduction step when performing complex division
  -fdata-sections  Place data items into their own section
  -fdce  Use the RTL dead code elimination pass
  -fdefer-pop  Defer popping functions args from stack until later
  -fdelayed-branch  Attempt to fill delay slots of branch instructions
  -fdelete-null-pointer-checks  Delete useless null pointer checks
  -fdse  Use the RTL dead store elimination pass
  -fearly-inlining  Perform early inlining
  -fexceptions  Enable exception handling
  -fexpensive-optimizations  Perform a number of minor, expensive optimizations
  -ffinite-math-only  Assume no NaNs or infinities are generated
  -ffloat-store  Don't allocate floats and doubles in extended-precision registers
  -fforward-propagate  Perform a forward propagation pass on RTL
  -fgcse  Perform global common subexpression elimination
  -fgcse-after-reload  Perform global common subexpression elimination after register allocation has finished
  -fgcse-las  Perform redundant load after store elimination in global common subexpression elimination
  -fgcse-lm  Perform enhanced load motion during global common subexpression elimination
  -fgcse-sm  Perform store motion after global common subexpression elimination
  -fgraphite-identity  Enable Graphite Identity transformation
  -fguess-branch-probability  Enable guessing of branch probabilities
  -fhandle-exceptions
  -fif-conversion  Perform conversion of conditional jumps to branchless equivalents
  -fif-conversion2  Perform conversion of conditional jumps to conditional execution
  -finline-functions  Integrate simple functions into their callers
  -finline-functions-called-once  Integrate functions called once into their callers
  -finline-small-functions  Integrate simple functions into their callers when code size is known to not growth
  -fipa-cp  Perform Interprocedural constant propagation
  -fipa-cp-clone  Perform cloning to make Interprocedural constant propagation stronger
  -fipa-matrix-reorg  Perform matrix layout flattening and transposing based on profiling information.
  -fipa-pta  Perform interprocedural points-to analysis
  -fipa-pure-const  Discover pure and const functions
  -fipa-reference  Discover readonly and non addressable static variables
  -fipa-type-escape  Type based escape and alias analysis
  -fivopts  Optimize induction variables on trees
  -fjump-tables  Use jump tables for sufficiently large switch statements
  -floop-block  Enable Loop Blocking transformation
  -floop-interchange  Enable Loop Interchange transformation
  -floop-strip-mine  Enable Loop Strip Mining transformation
  -fmath-errno  Set errno after built-in math functions
  -fmerge-all-constants  Attempt to merge identical constants and constant variables
  -fmerge-constants  Attempt to merge identical constants across compilation units
  -fmodulo-sched  Perform SMS based modulo scheduling before the first scheduling pass
  -fmove-loop-invariants  Move loop invariant computations out of loops
  -fnon-call-exceptions  Support synchronous non-call exceptions
  -fomit-frame-pointer  When possible do not generate stack frames
  -foptimize-register-move  Do the full register move optimization pass
  -foptimize-sibling-calls  Optimize sibling and tail recursive calls
  -fpack-struct  Pack structure members together without holes
  -fpack-struct=<number>  Set initial maximum structure member alignment
  -fpeel-loops  Perform loop peeling
  -fpeephole  Enable machine specific peephole optimizations
  -fpeephole2  Enable an RTL peephole pass before sched2
  -fpredictive-commoning  Run predictive commoning optimization.
  -fprefetch-loop-arrays  Generate prefetch instructions, if available, for arrays in loops
  -freg-struct-return  Return small aggregates in registers
  -fregmove  Enables a register move optimization
  -frename-registers  Perform a register renaming optimization pass
  -freorder-blocks  Reorder basic blocks to improve code placement
  -freorder-blocks-and-partition  Reorder basic blocks and partition into hot and cold sections
  -freorder-functions  Reorder functions to improve code placement
  -frerun-cse-after-loop  Add a common subexpression elimination pass after loop optimizations
  -freschedule-modulo-scheduled-loops  Enable/Disable the traditional scheduling in loops that already passed modulo scheduling
  -frounding-math  Disable optimizations that assume default FP rounding behavior
  -frtl-abstract-sequences  Perform sequence abstraction optimization on RTL
  -frtti  Generate run time type descriptor information
  -fsched-interblock  Enable scheduling across basic blocks
  -fsched-spec  Allow speculative motion of non-loads
  -fsched-spec-load  Allow speculative motion of some loads
  -fsched-spec-load-dangerous  Allow speculative motion of more loads
  -fsched-stalled-insns  Allow premature scheduling of queued insns
  -fsched-stalled-insns-dep  Set dependence distance checking in premature scheduling of queued insns
  -fsched2-use-superblocks  If scheduling post reload, do superblock scheduling
  -fsched2-use-traces  If scheduling post reload, do trace scheduling
  -fschedule-insns  Reschedule instructions before register allocation
  -fschedule-insns2  Reschedule instructions after register allocation
  -fsection-anchors  Access data in the same section from shared anchor points
  -fsel-sched-pipelining  Perform software pipelining of inner loops during selective scheduling
  -fsel-sched-pipelining-outer-loops  Perform software pipelining of outer loops during selective scheduling
  -fsel-sched-reschedule-pipelined  Reschedule pipelined regions without pipelining
  -fselective-scheduling  Schedule instructions using selective scheduling algorithm
  -fselective-scheduling2  Run selective scheduling after reload
  -fshort-double  Use the same size for double as for float
  -fshort-enums  Use the narrowest integer type possible for enumeration types
  -fshort-wchar  Force the underlying type for "wchar_t" to be "unsigned short"
  -fsignaling-nans  Disable optimizations observable by IEEE signaling NaNs
  -fsigned-zeros  Disable floating point optimizations that ignore the IEEE signedness of zero
  -fsingle-precision-constant  Convert floating point constants to single precision constants
  -fsplit-ivs-in-unroller  Split lifetimes of induction variables when loops are unrolled
  -fsplit-wide-types  Split wide types into independent registers
  -fstrict-aliasing  Assume strict aliasing rules apply
  -fthread-jumps  Perform jump threading optimizations
  -fno-threadsafe-statics  Do not generate thread-safe code for initializing local statics
  -ftoplevel-reorder  Reorder top level functions, variables, and asms
  -ftrapping-math  Assume floating-point operations can trap
  -ftrapv  Trap for signed overflow in addition, subtraction and multiplication
  -ftree-builtin-call-dce  Enable conditional dead code elimination for builtin calls
  -ftree-ccp  Enable SSA-CCP optimization on trees
  -ftree-ch  Enable loop header copying on trees
  -ftree-coalesce-inlined-vars  Permit SSA coalescing of inlined variables only
  -ftree-coalesce-vars  Permit SSA coalescing of all variables
  -ftree-copy-prop  Enable copy propagation on trees
  -ftree-copyrename  Replace SSA temporaries with better names in copies
  -ftree-cselim  Transform condition stores into unconditional ones
  -ftree-dce  Enable SSA dead code elimination optimization on trees
  -ftree-dominator-opts  Enable dominator optimizations
  -ftree-dse  Enable dead store elimination
  -ftree-fre  Enable Full Redundancy Elimination (FRE) on trees
  -ftree-loop-distribution  Enable loop distribution on trees
  -ftree-loop-im  Enable loop invariant motion on trees
  -ftree-loop-ivcanon  Create canonical induction variables in loops
  -ftree-loop-linear  Enable linear loop transforms on trees
  -ftree-loop-optimize  Enable loop optimizations on tree level
  -ftree-lrs  Perform live range splitting during the SSA->normal pass
  -ftree-pre  Enable SSA-PRE optimization on trees
  -ftree-reassoc  Enable reassociation on tree level
  -ftree-scev-cprop  Enable copy propagation of scalar-evolution information.
  -ftree-sink  Enable SSA code sinking on trees
  -ftree-sra  Perform scalar replacement of aggregates
  -ftree-switch-conversion  Perform conversions of switch initializations.
  -ftree-ter  Replace temporary expressions in the SSA->normal pass
  -ftree-vect-loop-version  Enable loop versioning when doing loop vectorization on trees
  -ftree-vectorize  Enable loop vectorization on trees
  -ftree-vrp  Perform Value Range Propagation on trees
  -funit-at-a-time  Compile whole compilation unit at a time
  -funroll-all-loops  Perform loop unrolling for all loops
  -funroll-loops  Perform loop unrolling when iteration count is known
  -funsafe-loop-optimizations  Allow loop optimizations to assume that the loops behave in normal way
  -funsafe-math-optimizations  Allow math optimizations that may violate IEEE or ISO standards
  -funswitch-loops  Perform loop unswitching
  -funwind-tables  Just generate unwind tables for exception handling
  -fvar-tracking  Perform variable tracking
  -fvar-tracking-assignments  Perform variable tracking by annotating assignments
  -fvar-tracking-assignments-toggle  Toggle -fvar-tracking-assignments
  -fvar-tracking-uninit  Perform variable tracking and also tag variables that are uninitialized
  -fvariable-expansion-in-unroller  Apply variable expansion when loops are unrolled
  -fvect-cost-model  Enable use of cost model in vectorization
  -fvpt  Use expression value profiles in optimizations
  -fweb  Construct webs and split unrelated uses of single variable
  -fwhole-program  Perform whole program optimizations
  -fwrapv  Assume signed arithmetic overflow wraps around
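With a list this long, it can help to ask the compiler which options a given -O level actually switches on. GCC supports this by putting -Q before --help=optimizers, which replaces the descriptions with [enabled] / [disabled] markers. Below is a small Python sketch that filters such a report. The sample text is hand-written to show the layout and is not real output from any particular g++ version, and the function names are illustrative.

```python
import re
import subprocess

# Matches lines such as "  -funroll-loops    [disabled]".
STATUS_LINE = re.compile(r"^\s*(-\S+)\s+\[(enabled|disabled)\]\s*$")


def enabled_flags(report):
    """Return the options marked [enabled] in a `-Q --help=optimizers` report."""
    flags = []
    for line in report.splitlines():
        match = STATUS_LINE.match(line)
        if match and match.group(2) == "enabled":
            flags.append(match.group(1))
    return flags


def query_compiler(level="-O2", compiler="g++"):
    """Run the compiler and list what the given level enables (needs g++ installed)."""
    result = subprocess.run(
        [compiler, "-Q", level, "--help=optimizers"],
        capture_output=True, text=True, check=True,
    )
    return enabled_flags(result.stdout)


# Hand-written sample in the same layout, for illustration only.
SAMPLE = """\
The following options control optimizations:
  -falign-loops               \t\t[enabled]
  -funroll-loops              \t\t[disabled]
  -ftree-vectorize            \t\t[disabled]
"""
print(enabled_flags(SAMPLE))
```

On a machine with g++ installed, comparing query_compiler("-O2") with query_compiler("-O3") shows what the higher level adds.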
- 05-05-2011 #8 Just Joined!
- Join Date
- Mar 2011
- Posts
- 19
That's a lot indeed!
I'm having some problems running it, so I'll try again tomorrow (I'm too tired now...)
Thank you!
- 05-06-2011 #9 Just Joined!
- Join Date
- Mar 2011
- Posts
- 19
Hi again,
It seems that the optimization options were already set in the makefile. I'll probably apply the "semper gumbi" statement to myself and wait until the model finishes... Maybe it's just that the other computer where the model is being run is better than mine (it has 8 cores, so that could be a good explanation...)
Thanks a lot anyway, it has clarified a lot of things for me, as a newbie
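For anyone wanting to do the same check, here is a rough Python sketch that pulls candidate optimization flags out of a makefile. The makefile fragment is hypothetical and not the model's real one, and the prefix list is a crude filter: -f also matches flags that have nothing to do with optimization, so the result still needs reading by eye.

```python
# Prefixes that usually mark optimization-related compiler flags (rough filter).
PREFIXES = ("-O", "-march=", "-mtune=", "-f")


def candidate_flags(makefile_text):
    """Return flags that look optimization-related, in first-seen order."""
    found = []
    for line in makefile_text.splitlines():
        code = line.split("#", 1)[0]  # drop makefile comments
        for token in code.split():
            if token.startswith(PREFIXES) and token not in found:
                found.append(token)
    return found


# Hypothetical fragment for illustration; variable names follow common convention.
SAMPLE = """\
# hypothetical fragment, not the model's real makefile
FFLAGS = -O2 -funroll-loops
CXXFLAGS = -O2 -march=native
"""
print(candidate_flags(SAMPLE))
```

An empty result, or only -O0, would suggest the build is not optimized.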
Cheers