Re: RDTSC performance on different x86 archs
Yes, similar results: 6 ticks on Athlon 64, 11 on Athlon, around 80 on
Intel P4, and 30 ticks on Intel P3. Looks OK. Since rdtsc isn't serializing,
it cannot be used to time very short instruction sequences anyway, so the
cost of the instruction itself does not matter much.
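
To illustrate the serialization point, here is a minimal sketch of fenced and
unfenced TSC reads. It assumes GCC or Clang on an x86 CPU with SSE2. The helper
names (rdtsc_plain, rdtsc_fenced, ticks_per_rdtsc) are made up for this
example, and using LFENCE as the fence is my assumption: Intel documents
LFENCE before RDTSC for this purpose, CPUID is the older alternative, and
behavior on other vendors' parts may differ.

```c
#include <stdint.h>

/* Plain RDTSC: the CPU may reorder it relative to nearby instructions. */
static inline uint64_t rdtsc_plain(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    /* EDX:EAX holds the 64-bit count; combine the halves with a shift. */
    return ((uint64_t)hi << 32) | lo;
}

/* RDTSC with LFENCE on both sides (assumed fence, see the note above):
   earlier instructions finish before the read, and later ones do not
   start until the read is done. */
static inline uint64_t rdtsc_fenced(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("lfence\n\t"
                         "rdtsc\n\t"
                         "lfence"
                         : "=a"(lo), "=d"(hi)
                         :
                         : "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Average cost, in TSC ticks, of n back-to-back unfenced RDTSCs.
   The figure includes loop overhead; n must be non-zero. */
static uint64_t ticks_per_rdtsc(unsigned n)
{
    uint64_t start = rdtsc_fenced();
    for (unsigned i = 0; i < n; i++)
        (void)rdtsc_plain();
    uint64_t end = rdtsc_fenced();
    return (end - start) / n;
}
```

Calling ticks_per_rdtsc(8192) and printing the result gives a figure
comparable to the quoted program below. The fences only order the two
bracketing reads; they do not remove the cost of rdtsc itself.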
Daniel
On Tue, 14 Aug 2007 04:36:10 +0200, <meta.x.gdb@gmail.com> wrote:
>
>
> Not sure where to go with this question. I saw your posting in
> comp.sys.intel.
>
> I'm seeing strange behavior with the RDTSC instruction.
>
> On Jul 10 1994, 12:19 am, g...@ichips.intel.com (Andy Glew) wrote:
>>
>>
>> (3) RDMSR(MSR=10h) versus RDTSC: yes, indeed, MSR=10h is the TimeStamp
>> Counter (TSC). However, accessing this via RDMSR and WRMSR is *not*
>> portable.
>> RDTSC is the *portable*, architectural, way of accessing the
>> timestamp counter. It's faster, and it has certain other conveniences.
>> Please avoid using RDMSR(MSR=10h).
>> There is no portable way of writing the TSC. WRMSR(MSR=10h) works
>> to a degree, but is non-portable. Moreover, arbitrary writeability is
>> *not* guaranteed - it may not be possible to write any arbitrary bit
>> pattern to the counter.
>>
>
>
> OK, I am having some unusual results from rdtsc
>
> I have a small C program with some inline assembly (in gnu style)
>
> #include <stdio.h>
>
> int main(void)
> {
> unsigned long long int t0, t1;
> int result;
> unsigned int ret0[2];
> unsigned int ret1[2];
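> /* Read the TSC before the loop: EDX:EAX holds the 64-bit count. */
> __asm__ __volatile__("rdtsc" : "=a"(ret0[0]), "=d"(ret0[1]));
> /* 512 iterations x 16 rdtsc = 8192 reads. The clobber list tells the
>    compiler that EAX, EDX, ECX and the flags are overwritten. */
> __asm__ __volatile__("xorl %%ecx, %%ecx \n\t"
> "1: \n\t"
> "rdtsc \n\t"
> "rdtsc \n\t"
> "rdtsc \n\t"
> "rdtsc \n\t"
> "rdtsc \n\t"
> "rdtsc \n\t"
> "rdtsc \n\t"
> "rdtsc \n\t"
> "rdtsc \n\t"
> "rdtsc \n\t"
> "rdtsc \n\t"
> "rdtsc \n\t"
> "rdtsc \n\t"
> "rdtsc \n\t"
> "rdtsc \n\t"
> "rdtsc \n\t"
> "addl $16, %%ecx \n\t"
> "cmpl $8192, %%ecx \n\t"
> "jne 1b"
> : : : "eax", "edx", "ecx", "cc");
> /* Read the TSC again after the loop. */
> __asm__ __volatile__("rdtsc" : "=a"(ret1[0]), "=d"(ret1[1]));
> /* Combine the halves with shifts instead of an aliasing pointer cast. */
> t0 = ((unsigned long long int)ret0[1] << 32) | ret0[0];
> t1 = ((unsigned long long int)ret1[1] << 32) | ret1[0];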
> result = (t1-t0)/8192;
> printf("ticks per rdtsc %d \n",result);
> return result;
> }
>
> This compiles and runs fine with both Intel and GNU compilers 3.3,
> 4.0, etc.
>
> When I compile this and execute it under Cygwin (running on Windows XP)
> on an AMD 4200+, I get
> ./a.exe
> ticks per rdtsc 6
>
> which isn't 1 or 2, but I can live with 6 clock ticks to process a
> seldom-called op.
>
> If I compile and run this under Mac OS X (new Apple MacBook Pro, Intel
> Core 2), I get 65 ?!?!
>
> If I compile and run this on SUSE Linux on a Xeon processor, I get
> 85 ?!?! (Intel or GNU compiler)
>
> I'm not even adding any serializing instructions. Does that look right to
> anyone? Can anyone verify they get the same results on their x86
> machines?
>
> Brian VS
>