- 11-27-2011 #1
Quick test
Hey people,
I'm doing my own OS, and I'm trying to figure out what the quickest memcpy is.
I've written a small test project you can run to help me find out.
The instructions are quite simple.
1) get the test from here: https://github.com/bemk/memcpytest
2) run make and post a reply with the exact output
3) (if you are running a 64-bit system) also run make CFLAGS=-m32 and post a reply with the exact output.
The output you should see is like the bit below. It's basically the time it takes to run the newly compiled binary, which itself is a test for one of the memcpy functions.
No. 5 is from the native library. Don't be surprised if that's the fastest one.
Code:
time -p ./1
real 0.24
user 0.24
sys 0.00
time -p ./2
real 0.96
user 0.95
sys 0.00
time -p ./3
real 0.21
user 0.20
sys 0.00
time -p ./4
real 0.13
user 0.12
sys 0.00
time -p ./5
real 0.05
user 0.04
sys 0.00
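The thread does not show the memcpy variants in the repository, so the following is only a rough sketch of the kind of candidates such a test might compare: a byte-at-a-time copy and a word-at-a-time copy, plus a small timing helper. The names `copy_bytes`, `copy_words`, `word_t` and `time_copy` are made up for illustration, the `may_alias` attribute is a GCC/Clang extension, and the timing uses `clock()` inside the process rather than `time -p` on the whole binary as the test above does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical word type. may_alias (GCC/Clang extension) lets us read and
 * write arbitrary buffers through it without breaking strict aliasing. */
typedef uintptr_t __attribute__((__may_alias__)) word_t;

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Variant A: copy one byte per iteration. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Variant B: copy one machine word per iteration when both pointers are
 * word-aligned, then finish the remaining bytes one at a time. */
void *copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((((uintptr_t)d | (uintptr_t)s) % sizeof(word_t)) == 0) {
        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;

        for (; n >= sizeof(word_t); n -= sizeof(word_t))
            *dw++ = *sw++;
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Run `rounds` copies of `n` bytes and return the CPU time used, in seconds. */
double time_copy(copy_fn fn, void *dst, const void *src, size_t n, int rounds)
{
    clock_t start = clock();
    int i;

    for (i = 0; i < rounds; i++)
        fn(dst, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_copy(copy_words, dst, src, n, rounds)` and `time_copy(memcpy, dst, src, n, rounds)` from a `main` and printing both values would give numbers comparable in spirit to the `user` times above, though the absolute values depend heavily on compiler flags and buffer size.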
- 11-28-2011 #2
crappy pc #1:
Code:
#
# uname -r
2.6.30
#
# cat /etc/redhat-release
Red Hat Enterprise Linux WS release 4 (Nahant Update 3)
#
# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 6
model name : AMD Athlon(tm) Proswssor
stepping : 2
cpu MHz : 1462.866
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow
bogomips : 2925.73
clflush size : 32
power management: ts
#
# gcc --version
gcc (GCC) 4.0.2 20051130 (Red Hat 4.0.2-14.EL4)
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# make
gcc -o 1 memcpy1.c
In file included from memcpy1.c:4:
vars.h:2:22: warning: no newline at end of file
memcpy1.c:41:2: warning: no newline at end of file
time -p ./1
real 0.98
user 0.98
sys 0.00
gcc -o 2 memcpy2.c
In file included from memcpy2.c:4:
vars.h:2:22: warning: no newline at end of file
memcpy2.c:29:2: warning: no newline at end of file
time -p ./2
real 2.30
user 2.30
sys 0.00
gcc -o 3 memcpy3.c
In file included from memcpy3.c:4:
vars.h:2:22: warning: no newline at end of file
memcpy3.c:27:2: warning: no newline at end of file
time -p ./3
real 2.71
user 2.71
sys 0.00
gcc -o 4 memcpy4.c
In file included from memcpy4.c:4:
vars.h:2:22: warning: no newline at end of file
memcpy4.c:38:2: warning: no newline at end of file
time -p ./4
real 2.35
user 2.35
sys 0.00
gcc -o 5 memcpy5.c
In file included from memcpy5.c:5:
vars.h:2:22: warning: no newline at end of file
memcpy5.c:19:2: warning: no newline at end of file
time -p ./5
real 0.83
user 0.83
sys 0.00
slightly less crappy pc #2:
Code:
#
# uname -r
2.6.38.6-26.rc1.fc15.i686.PAE
#
# cat /etc/redhat-release
Fedora release 15 (Lovelock)
#
# cat /proc/cpuinfo
processor : 0 (1 of 4)
vendor_id : GenuineIntel
cpu family : 6
model : 28
model name : Intel(R) Atom(TM) CPU D510 @ 1.66GHz
stepping : 10
cpu MHz : 1662.555
cache size : 512 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dts
bogomips : 3325.11
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
#
# gcc --version
gcc (GCC) 4.6.0 20110603 (Red Hat 4.6.0-10)
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# make
time -p ./1
real 0.56
user 0.56
sys 0.00
gcc -o 2 memcpy2.c
time -p ./2
real 2.02
user 2.01
sys 0.00
gcc -o 3 memcpy3.c
time -p ./3
real 1.80
user 1.79
sys 0.00
gcc -o 4 memcpy4.c
time -p ./4
real 1.80
user 1.79
sys 0.00
gcc -o 5 memcpy5.c
time -p ./5
real 0.09
user 0.08
sys 0.00
- 11-28-2011 #3
Thanks, this is actually a useful result.
First, I noticed your compiler was complaining about missing newlines (fixed in a new commit). Second, your results are quite different from mine, so my reason for asking here already seems valid.
As for the CPU specifications, I'm not really that interested. All I want to know is what your architecture is: 64-bit or 32-bit.
The rest can remain private.
- 11-28-2011 #4
32-bit Slackware
Code:
gcc -o 1 memcpy1.c
time -p ./1
real 0.40
user 0.40
sys 0.00
gcc -o 2 memcpy2.c
time -p ./2
real 1.24
user 1.22
sys 0.00
gcc -o 3 memcpy3.c
time -p ./3
real 0.27
user 0.26
sys 0.00
gcc -o 4 memcpy4.c
time -p ./4
real 0.18
user 0.17
sys 0.00
gcc -o 5 memcpy5.c
time -p ./5
real 0.11
user 0.10
sys 0.00
Jay
New users, read this first.
New Member FAQ
Registered Linux User #463940
I do not respond to Private Messages asking for Linux help. Please, keep it on the public boards.
- 11-28-2011 #5
Thx, but I was talking more about the CPU bus width than about what the OS supports, since it's not so much the OS that determines the relative times. The CPU bus is the bottleneck here. The cache could have some influence as well.
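To make the width question concrete, here is a small sketch that reports the pointer width a binary was built for and shows why a wider word needs fewer moves for the same buffer. The function names are invented for this example. Note that it reports what the compiled binary uses, not the physical bus width: a `-m32` build on a 64-bit CPU would typically report 32.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Width in bits of a pointer in this build (an illustrative proxy for the
 * register width the copy loops can use). */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Number of word-sized moves needed to copy n bytes, rounding up. */
size_t word_moves(size_t n, size_t word_bytes)
{
    return (n + word_bytes - 1) / word_bytes;
}

/* Print the build width and the move count for a 1 MiB copy. */
void print_build_width(void)
{
    size_t n = 1024u * 1024u;

    printf("pointer width: %u bits\n", pointer_bits());
    printf("moves for %lu bytes: %lu\n",
           (unsigned long)n,
           (unsigned long)word_moves(n, sizeof(void *)));
}
```

For a 1024-byte buffer, `word_moves(1024, 4)` is 256 and `word_moves(1024, 8)` is 128, which is one reason the same source can show different relative times when built with and without `-m32`.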
- 11-28-2011 #6
In that case, I'll just give you the whole spread.

Code:
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Pentium(R) Dual CPU T3400 @ 2.16GHz
stepping : 13
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts
bogomips : 4323.13
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Pentium(R) Dual CPU T3400 @ 2.16GHz
stepping : 13
cpu MHz : 2166.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts
bogomips : 4322.28
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
Jay
- 11-28-2011 #7
Thx, this makes things a lot more logical.
- 11-28-2011 #8
That's why I showed /proc/cpuinfo, explained here. Or do you mean the kernel architecture? Both are 32-bit, though you could probably have inferred that.
- 11-28-2011 #9
I basically meant that the bus size/register size alone is enough.
- 11-28-2011 #10
CPU: 64-bit, running 64-bit Debian
Code:
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 37
model name : Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz
stepping : 5
cpu MHz : 1200.000
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt aes lahf_lm ida arat tpr_shadow vnmi flexpriority ept vpid
bogomips : 6385.07
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 37
model name : Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz
stepping : 5
cpu MHz : 1200.000
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 2
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt aes lahf_lm ida arat tpr_shadow vnmi flexpriority ept vpid
bogomips : 6384.25
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 37
model name : Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz
stepping : 5
cpu MHz : 1200.000
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt aes lahf_lm ida arat tpr_shadow vnmi flexpriority ept vpid
bogomips : 6384.24
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 37
model name : Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz
stepping : 5
cpu MHz : 1200.000
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 2
apicid : 5
initial apicid : 5
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt aes lahf_lm ida arat tpr_shadow vnmi flexpriority ept vpid
bogomips : 6384.23
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
Results
Code:
make CFLAGS=-m32
time -p ./1
real 0.16
user 0.17
sys 0.00
time -p ./2
real 0.83
user 0.83
sys 0.00
time -p ./3
real 0.23
user 0.23
sys 0.00
time -p ./4
real 0.10
user 0.10
sys 0.00
time -p ./5
real 0.03
user 0.03
sys 0.00
If we hit that bullseye, the rest of the dominoes will fall like a house of cards. Checkmate! (Zapp Brannigan)
My new blog. It's probably not as good as I think it is.



