  1. #1
    Linux Enthusiast Bemk
    Join Date
    Sep 2008
    Location
    Oosterhout-NB, Netherlands
    Posts
    525

    Quick test


    Hey people,

    I'm doing my own OS, and I'm trying to figure out what the quickest memcpy is.

    I've written a small test project you can run to help me find out.

    The instructions are quite simple.

    1) get the test from here: https://github.com/bemk/memcpytest
    2) run make and post a reply with the exact output
    3) (if you are running a 64-bit system) run make CFLAGS=-m32 and post a reply with the exact output.

    The output you should see is like the bit below. It's basically the time it takes to run the newly compiled binary, which itself is a test for one of the memcpy functions.

    No. 5 is from the native library. Don't be surprised if that's the fastest one.
    Code:
    time -p ./1
    real 0.24
    user 0.24
    sys 0.00
    time -p ./2
    real 0.96
    user 0.95
    sys 0.00
    time -p ./3
    real 0.21
    user 0.20
    sys 0.00
    time -p ./4
    real 0.13
    user 0.12
    sys 0.00
    time -p ./5
    real 0.05
    user 0.04
    sys 0.00
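The sources in the repository are not quoted in this thread, so the following is only a sketch of what a test like this might look like, not the actual code behind `./1` to `./5`. It assumes two made-up candidates, `copy_bytes` (one byte per iteration) and `copy_words` (one `unsigned long` per iteration when both pointers are word-aligned), plus a made-up helper `time_copy` that measures CPU time with `clock()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical candidate: copy one byte per iteration. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical candidate: copy one machine word per iteration when both
 * pointers are word-aligned, then finish the tail byte by byte.
 * Assumes the buffers may legally be accessed as unsigned long
 * (for example, memory obtained from malloc). */
static void *copy_words(void *dst, const void *src, size_t n)
{
    const size_t w = sizeof(unsigned long);
    unsigned long *d;
    const unsigned long *s;

    if ((uintptr_t)dst % w != 0 || (uintptr_t)src % w != 0)
        return copy_bytes(dst, src, n);   /* unaligned: take the slow path */

    d = dst;
    s = src;
    for (; n >= w; n -= w)
        *d++ = *s++;
    copy_bytes(d, s, n);                  /* remaining 0..w-1 bytes */
    return dst;
}

typedef void *(*copy_fn)(void *, const void *, size_t);

/* CPU seconds spent performing `reps` copies of `n` bytes with `fn`. */
static double time_copy(copy_fn fn, void *dst, const void *src,
                        size_t n, int reps)
{
    clock_t start = clock();
    int i;

    for (i = 0; i < reps; i++)
        fn(dst, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Passing the library `memcpy` as `fn` gives the reference number to compare against. Note that at higher optimization levels a compiler may recognize simple copy loops and replace them with a call to `memcpy`, which would blur the comparison; the builds shown in this thread use no optimization flags.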
    Full time computer science student, spare time OS developer.
    @bemk92 on twitter.

  2. #2
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,353
    crappy pc #1:
    Code:
    #
    # uname -r
    2.6.30
    #
    # cat /etc/redhat-release
    Red Hat Enterprise Linux WS release 4 (Nahant Update 3)
    #
    # cat /proc/cpuinfo 
    processor       : 0
    vendor_id       : AuthenticAMD
    cpu family      : 6
    model           : 6
    model name      : AMD Athlon(tm) Processor
    stepping        : 2
    cpu MHz         : 1462.866
    cache size      : 256 KB
    fdiv_bug        : no
    hlt_bug         : no
    f00f_bug        : no
    coma_bug        : no
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 1
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow
    bogomips        : 2925.73
    clflush size    : 32
    power management: ts
    #
    # gcc --version
    gcc (GCC) 4.0.2 20051130 (Red Hat 4.0.2-14.EL4)
    Copyright (C) 2005 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    #
    # make
    gcc  -o 1 memcpy1.c
    In file included from memcpy1.c:4:
    vars.h:2:22: warning: no newline at end of file
    memcpy1.c:41:2: warning: no newline at end of file
    time -p ./1
    real 0.98
    user 0.98
    sys 0.00
    gcc  -o 2 memcpy2.c
    In file included from memcpy2.c:4:
    vars.h:2:22: warning: no newline at end of file
    memcpy2.c:29:2: warning: no newline at end of file
    time -p ./2
    real 2.30
    user 2.30
    sys 0.00
    gcc  -o 3 memcpy3.c
    In file included from memcpy3.c:4:
    vars.h:2:22: warning: no newline at end of file
    memcpy3.c:27:2: warning: no newline at end of file
    time -p ./3
    real 2.71
    user 2.71
    sys 0.00
    gcc  -o 4 memcpy4.c
    In file included from memcpy4.c:4:
    vars.h:2:22: warning: no newline at end of file
    memcpy4.c:38:2: warning: no newline at end of file
    time -p ./4
    real 2.35
    user 2.35
    sys 0.00
    gcc  -o 5 memcpy5.c
    In file included from memcpy5.c:5:
    vars.h:2:22: warning: no newline at end of file
    memcpy5.c:19:2: warning: no newline at end of file
    time -p ./5
    real 0.83
    user 0.83
    sys 0.00
    slightly less crappy pc #2:
    Code:
    #
    # uname -r
    2.6.38.6-26.rc1.fc15.i686.PAE
    #
    # cat /etc/redhat-release
    Fedora release 15 (Lovelock)
    #
    # cat /proc/cpuinfo
    processor       : 0 (1 of 4)
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 28
    model name      : Intel(R) Atom(TM) CPU D510   @ 1.66GHz
    stepping        : 10
    cpu MHz         : 1662.555
    cache size      : 512 KB
    physical id     : 0
    siblings        : 4
    core id         : 0
    cpu cores       : 2
    apicid          : 0
    initial apicid  : 0
    fdiv_bug        : no
    hlt_bug         : no
    f00f_bug        : no
    coma_bug        : no
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 10
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dts
    bogomips        : 3325.11
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 36 bits physical, 48 bits virtual
    power management:
    #
    # gcc --version
    gcc (GCC) 4.6.0 20110603 (Red Hat 4.6.0-10)
    Copyright (C) 2011 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    #
    # make
    time -p ./1
    real 0.56
    user 0.56
    sys 0.00
    gcc  -o 2 memcpy2.c
    time -p ./2
    real 2.02
    user 2.01
    sys 0.00
    gcc  -o 3 memcpy3.c
    time -p ./3
    real 1.80
    user 1.79
    sys 0.00
    gcc  -o 4 memcpy4.c
    time -p ./4
    real 1.80
    user 1.79
    sys 0.00
    gcc  -o 5 memcpy5.c
    time -p ./5
    real 0.09
    user 0.08
    sys 0.00

  3. #3
    Linux Enthusiast Bemk
    Join Date
    Sep 2008
    Location
    Oosterhout-NB, Netherlands
    Posts
    525
    Thanks, this is actually a useful result.

    First of all, I noticed your compiler was complaining about missing newlines (I fixed that in a new commit). Secondly, your results are quite different from mine, which already seems to justify my asking here.

    As for the CPU specifications, I'm not really that interested. All I want to know is what your architecture is: 64-bit or 32-bit.

    The rest can remain private.

  4. #4
    Administrator jayd512
    Join Date
    Feb 2008
    Location
    Kentucky
    Posts
    5,023
    32-bit Slackware
    Code:
    gcc  -o 1 memcpy1.c
    time -p ./1
    real 0.40
    user 0.40
    sys 0.00
    gcc  -o 2 memcpy2.c
    time -p ./2
    real 1.24
    user 1.22
    sys 0.00
    gcc  -o 3 memcpy3.c
    time -p ./3
    real 0.27
    user 0.26
    sys 0.00
    gcc  -o 4 memcpy4.c
    time -p ./4
    real 0.18
    user 0.17
    sys 0.00
    gcc  -o 5 memcpy5.c
    time -p ./5
    real 0.11
    user 0.10
    sys 0.00
    Jay

    New users, read this first.
    New Member FAQ
    Registered Linux User #463940
    I do not respond to private messages asking for Linux help. Please keep it on the public boards.

  5. #5
    Linux Enthusiast Bemk
    Join Date
    Sep 2008
    Location
    Oosterhout-NB, Netherlands
    Posts
    525
    Thanks, but I was talking about the CPU bus width rather than what the OS supports, since it's not so much the OS that determines the relative times. The CPU bus is the bottleneck here. The cache could exert some influence as well.
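A test binary cannot see the physical bus directly, but it can report the word size it was built for, which is what changes between a plain `make` and `make CFLAGS=-m32`. A minimal sketch, with the hypothetical helper names `pointer_bits` and `long_bits`:

```c
#include <limits.h>
#include <stddef.h>

/* Width in bits of a pointer in this build. On x86 Linux this is
 * typically 32 when compiled with -m32 and 64 for a native x86-64 build. */
static unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Width in bits of unsigned long, the type a word-at-a-time copy loop
 * would often use as its unit. */
static unsigned long_bits(void)
{
    return (unsigned)(sizeof(unsigned long) * CHAR_BIT);
}
```

These values describe the compiled program, not the hardware: a 32-bit build running on a 64-bit CPU still reports 32.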

  6. #6
    Administrator jayd512
    Join Date
    Feb 2008
    Location
    Kentucky
    Posts
    5,023
    In that case, I'll just give you the whole spread:
    Code:
    cat /proc/cpuinfo
    processor	: 0
    vendor_id	: GenuineIntel
    cpu family	: 6
    model		: 15
    model name	: Intel(R) Pentium(R) Dual  CPU  T3400  @ 2.16GHz
    stepping	: 13
    cpu MHz		: 1000.000
    cache size	: 1024 KB
    physical id	: 0
    siblings	: 2
    core id		: 0
    cpu cores	: 2
    apicid		: 0
    initial apicid	: 0
    fdiv_bug	: no
    hlt_bug		: no
    f00f_bug	: no
    coma_bug	: no
    fpu		: yes
    fpu_exception	: yes
    cpuid level	: 10
    wp		: yes
    flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts
    bogomips	: 4323.13
    clflush size	: 64
    cache_alignment	: 64
    address sizes	: 36 bits physical, 48 bits virtual
    
    
    processor	: 1
    vendor_id	: GenuineIntel
    cpu family	: 6
    model		: 15
    model name	: Intel(R) Pentium(R) Dual  CPU  T3400  @ 2.16GHz
    stepping	: 13
    cpu MHz		: 2166.000
    cache size	: 1024 KB
    physical id	: 0
    siblings	: 2
    core id		: 1
    cpu cores	: 2
    apicid		: 1
    initial apicid	: 1
    fdiv_bug	: no
    hlt_bug		: no
    f00f_bug	: no
    coma_bug	: no
    fpu		: yes
    fpu_exception	: yes
    cpuid level	: 10
    wp		: yes
    flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts
    bogomips	: 4322.28
    clflush size	: 64
    cache_alignment	: 64
    address sizes	: 36 bits physical, 48 bits virtual

  7. #7
    Linux Enthusiast Bemk
    Join Date
    Sep 2008
    Location
    Oosterhout-NB, Netherlands
    Posts
    525
    Thanks, this makes things a lot more logical.

  8. #8
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,353
    Originally posted by Bemk:
    As for the CPU specifications, I'm not really that interested. All I want to know is what your architecture is: 64-bit or 32-bit.
    That's why I showed /proc/cpuinfo, explained here. Or do you mean the kernel architecture? They're both 32-bit, though you could probably have inferred that.
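For reference, on x86 the `lm` ("long mode") entry in the `flags` line of /proc/cpuinfo marks a CPU that can run 64-bit code, regardless of which kernel is booted. In the listings above, the Athlon's flags lack `lm` while the Atom D510's include it. A small sketch of checking for it, using a hypothetical helper `has_cpu_flag` that matches whole tokens only (so `lahf_lm` does not count as `lm`):

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `flag` appears in `flags` as a whole token delimited by
 * spaces, tabs, a newline, or the ends of the string; otherwise 0. */
static int has_cpu_flag(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags;

    if (len == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[len] == '\0' || p[len] == ' ' ||
                   p[len] == '\t' || p[len] == '\n';

        if (starts && ends)
            return 1;
        p++;   /* substring of a longer token: keep searching */
    }
    return 0;
}
```

The function can be given the full `flags : ...` line as read from the file, since the space after the colon delimits the first flag.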

  9. #9
    Linux Enthusiast Bemk
    Join Date
    Sep 2008
    Location
    Oosterhout-NB, Netherlands
    Posts
    525
    I basically meant that the bus size/register size alone is enough.

  10. #10
    Penguin of trust elija
    Join Date
    Jul 2004
    Location
    Either at home or at work or down the pub
    Posts
    3,601
    CPU: 64-bit, running 64-bit Debian
    Code:
    cat /proc/cpuinfo
    processor	: 0
    vendor_id	: GenuineIntel
    cpu family	: 6
    model		: 37
    model name	: Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz
    stepping	: 5
    cpu MHz		: 1200.000
    cache size	: 4096 KB
    physical id	: 0
    siblings	: 4
    core id		: 0
    cpu cores	: 2
    apicid		: 0
    initial apicid	: 0
    fpu		: yes
    fpu_exception	: yes
    cpuid level	: 11
    wp		: yes
    flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt aes lahf_lm ida arat tpr_shadow vnmi flexpriority ept vpid
    bogomips	: 6385.07
    clflush size	: 64
    cache_alignment	: 64
    address sizes	: 36 bits physical, 48 bits virtual
    power management:
    
    processor	: 1
    vendor_id	: GenuineIntel
    cpu family	: 6
    model		: 37
    model name	: Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz
    stepping	: 5
    cpu MHz		: 1200.000
    cache size	: 4096 KB
    physical id	: 0
    siblings	: 4
    core id		: 2
    cpu cores	: 2
    apicid		: 4
    initial apicid	: 4
    fpu		: yes
    fpu_exception	: yes
    cpuid level	: 11
    wp		: yes
    flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt aes lahf_lm ida arat tpr_shadow vnmi flexpriority ept vpid
    bogomips	: 6384.25
    clflush size	: 64
    cache_alignment	: 64
    address sizes	: 36 bits physical, 48 bits virtual
    power management:
    
    processor	: 2
    vendor_id	: GenuineIntel
    cpu family	: 6
    model		: 37
    model name	: Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz
    stepping	: 5
    cpu MHz		: 1200.000
    cache size	: 4096 KB
    physical id	: 0
    siblings	: 4
    core id		: 0
    cpu cores	: 2
    apicid		: 1
    initial apicid	: 1
    fpu		: yes
    fpu_exception	: yes
    cpuid level	: 11
    wp		: yes
    flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt aes lahf_lm ida arat tpr_shadow vnmi flexpriority ept vpid
    bogomips	: 6384.24
    clflush size	: 64
    cache_alignment	: 64
    address sizes	: 36 bits physical, 48 bits virtual
    power management:
    
    processor	: 3
    vendor_id	: GenuineIntel
    cpu family	: 6
    model		: 37
    model name	: Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz
    stepping	: 5
    cpu MHz		: 1200.000
    cache size	: 4096 KB
    physical id	: 0
    siblings	: 4
    core id		: 2
    cpu cores	: 2
    apicid		: 5
    initial apicid	: 5
    fpu		: yes
    fpu_exception	: yes
    cpuid level	: 11
    wp		: yes
    flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt aes lahf_lm ida arat tpr_shadow vnmi flexpriority ept vpid
    bogomips	: 6384.23
    clflush size	: 64
    cache_alignment	: 64
    address sizes	: 36 bits physical, 48 bits virtual
    power management:
    Results:
    Code:
    make CFLAGS=-m32
    time -p ./1
    real 0.16
    user 0.17
    sys 0.00
    time -p ./2
    real 0.83
    user 0.83
    sys 0.00
    time -p ./3
    real 0.23
    user 0.23
    sys 0.00
    time -p ./4
    real 0.10
    user 0.10
    sys 0.00
    time -p ./5
    real 0.03
    user 0.03
    sys 0.00
    "I used to be with it, then they changed what it was.
    Now what was it isn't it, and what is it is weird and scary to me.
    It'll happen to you too."

    Grandpa Simpson



    The Fifth Continent
