Rebuilding Perl from source to improve performance?
A friend who runs a small ERP system on an Intel Xeon server asked me about it: perl processes were using about 40% of the CPU (40% on perl??).
That sparked my "enthusiasm" to research how to reduce the CPU cycles consumed by the "perl" process on that system.
I downloaded Perl 5.8.8 (stable.tar.gz) from http://www.perl.com/download.csp and extracted it on my Red Hat Enterprise Linux AS release 4 (Nahant Update 2) Intel Xeon MP machine.
First I ran ./Configure to set up Perl's build configuration with gcc as the compiler, and built the perl binary and libraries. I ran "./perl test.pl" as the test workload (test.pl is attached to this thread) and used Intel VTune(TM) Performance Analyzer for Linux (from http://www3.intel.com/cd/software/pr.../eng/index.htm) to collect profiling data with "vtl activity perl_gcc -d 60 -c sampling -app "./perl, test.pl" run".
This produces a profiling data file (.tb5) under /root/VTune/Projects/Sampling, which I imported into the VTune Analyzer GUI. Results for the "perl" process:
Event "Clockticks" = 8,359,449,000
Event "Instructions Retired" = 5,340,216,000
Clockticks per Instructions Retired (CPI) = 1.565
Total duration = 5.81 s
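CPI is simply the ratio of the two event counts. A small Python check of the gcc figures, reading the clocktick count as 8,359,449,000 (the value consistent with the reported CPI of 1.565):

```python
def cpi(clockticks: int, instructions_retired: int) -> float:
    """Clockticks per Instructions Retired: average cycles spent per retired instruction."""
    return clockticks / instructions_retired

# Event counts reported by VTune for the gcc-built perl running test.pl
gcc_cpi = cpi(8_359_449_000, 5_340_216_000)
print(f"gcc CPI = {gcc_cpi:.3f}")  # prints "gcc CPI = 1.565"
```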
I found that the profile is flat: CPU time is spread fairly evenly across many functions, such as Perl_runops_standard(), Perl_pp_gvsv(), Perl_leave_scope(), Perl_sv_upgrade(), etc. That means optimizing any single function by hand would improve overall performance only slightly.
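Why a flat profile limits the payoff can be seen with Amdahl's law: if a function accounts for a fraction p of the run time and you make it s times faster, the whole program speeds up by only 1 / ((1 - p) + p / s). The 10% share below is a hypothetical number for illustration, since the per-function percentages are not listed here:

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of run time is made s times faster."""
    return 1.0 / ((1.0 - p) + p / s)

# Hypothetical: one function in a flat profile takes 10% of CPU time.
# Making it twice as fast gives only about 5% overall.
print(f"{amdahl_speedup(0.10, 2.0):.3f}")           # prints 1.053
# Even removing its cost entirely (s -> infinity) caps the gain at about 11%.
print(f"{amdahl_speedup(0.10, float('inf')):.3f}")  # prints 1.111
```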
I had heard that the Intel C++ compiler is strong at optimizing code, so I reconfigured Perl to use Intel icc as the compiler. However, rebuilding perl with icc options such as "-fast", "-ipo", "-parallel" and "-axP" did not give satisfactory results. Looking at perl's source code, I saw that it does limited data processing and consists of huge macro-defined functions with heavy branching inside deep loops. I realized I could use profile-guided optimization instead: first build perl with "-O2 -g -prof-gen", then run that perl several times so that the feedback (profiling) data is written into .dyn files. Then rebuild with icc using "-O2 -g -prof-use" to generate the new perl.
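Conceptually, the two-pass build works like the toy Python model below: an instrumented run records how often each path is taken (the role the .dyn files play), and the second compile uses those counts to put the hot paths first. This is only an illustration under simplified assumptions; icc's real profile format and optimizations (block ordering, inlining, branch layout) are far more involved, and the op names and trace are hypothetical.

```python
from collections import Counter

def train(op_trace):
    """Pass 1 (prof-gen analogue): count how often each op executes."""
    return Counter(op_trace)

def layout(profile):
    """Pass 2 (prof-use analogue): order handlers hottest-first."""
    return [op for op, _ in profile.most_common()]

def checks_needed(order, op_trace):
    """Total comparisons a linear if/else chain in this order would perform."""
    position = {op: i + 1 for i, op in enumerate(order)}
    return sum(position[op] for op in op_trace)

# Hypothetical training trace, loosely named after perl ops.
trace = ["gvsv"] * 6 + ["const"] * 3 + ["leave"] * 1
profile = train(trace)
hot_first = layout(profile)
print(hot_first)                                         # ['gvsv', 'const', 'leave']
print(checks_needed(["leave", "const", "gvsv"], trace))  # 25 (cold-first chain)
print(checks_needed(hot_first, trace))                   # 15 (hot-first chain)
```

The point of the model is that the profile changes the layout, not the program's behavior: both orders handle every op, but the profiled order does less work on the common path.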
I ran the new perl on test.pl again and got these results for the "perl" process:
Event "Clockticks" = 7,063,479,000 (vs. gcc's 8,359,449,000)
Event "Instructions Retired" = 5,139,120,000 (vs. gcc's 5,340,216,000)
Clockticks per Instructions Retired (CPI) = 1.374 (vs. gcc's 1.565)
Total duration = 5.53 s (vs. gcc's 5.81 s)
It seems I saved 0.28 s (about 5% of the total run time) running perl with test.pl.
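The saving follows directly from the two reported durations. A quick Python check:

```python
# Run times for test.pl, as reported above
gcc_seconds = 5.81   # gcc-built perl
icc_seconds = 5.53   # icc-built perl with -prof-gen / -prof-use

saved = gcc_seconds - icc_seconds
fraction_saved = saved / gcc_seconds
speedup = gcc_seconds / icc_seconds

print(f"saved {saved:.2f} s ({fraction_saved:.1%} of the gcc run time)")
# prints "saved 0.28 s (4.8% of the gcc run time)"
print(f"speedup {speedup:.3f}x")  # prints "speedup 1.051x"
```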
So if you want to build an optimized perl for your system, you may choose the Intel C++ compiler: use "-prof-gen" to build an instrumented application that collects performance data, run it on a representative workload, then use "-prof-use" to feed that data back to the compiler and generate your final application. Optimize your code for the instruction cache NOW!!!
Intel VTune Analyzer can show you the performance gain in detail.