08-14-2007, 03:43 AM
meta.x.gdb@gmail.com
 
RDTSC performance on different x86 archs



Not sure where to go with this question. I saw your listing in
comp.sys.intel.

I'm seeing strange behavior from the RDTSC instruction.

On Jul 10 1994, 12:19 am, g...@ichips.intel.com (Andy Glew) wrote:
>
>
> (3) RDMSR(MSR=10h) versus RDTSC: yes, indeed, MSR=10h is the TimeStamp
> Counter (TSC). However, accessing this via RDMSR and WRMSR is *not*
> portable.
> RDTSC is the *portable*, architectural, way of accessing the
> timestamp counter. It's faster, and it has certain other conveniences.
> Please avoid using RDMSR(MSR=10).
> There is no portable way of writing the TSC. WRMSR(MSR=10h) works
> to a degree, but is non-portable. Moreover, arbitrary writeability is
> *not* guaranteed - it may not be possible to write any arbitrary bit
> pattern to the counter.
>
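
For reference, a minimal sketch of the user-mode RDTSC read described in the quote above, written as a reusable helper in GNU-style inline assembly. The name `read_tsc` is made up for illustration, and the sketch assumes an x86 target and a GCC-compatible compiler.

```c
#include <stdint.h>

/* Hypothetical helper: read the 64-bit time-stamp counter with RDTSC.
   RDTSC returns the low 32 bits in EAX and the high 32 bits in EDX,
   so the two halves are bound to those registers and then combined. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
```

Subtracting two `read_tsc()` values gives elapsed ticks; no privileged RDMSR access is involved.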



OK, I am getting some unusual results from rdtsc.

I have a small C program with some inline assembly (in GNU style):

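#include <stdio.h>

int main(void)
{
    unsigned long long t0, t1;
    int result;
    unsigned int ret0[2];
    unsigned int ret1[2];

    /* Read the TSC before the loop: EAX = low 32 bits, EDX = high 32 bits. */
    __asm__ __volatile__("rdtsc" : "=a"(ret0[0]), "=d"(ret0[1]));

    /* 512 iterations x 16 rdtsc = 8192 rdtsc instructions.
       rdtsc overwrites EAX and EDX, and ECX is the loop counter, so all
       three are listed as clobbers. "1:" is a local label, which avoids
       duplicate-symbol errors if the compiler copies the block. */
    __asm__ __volatile__("xorl %%ecx, %%ecx \n\t"
        "1: \n\t"
        "rdtsc \n\t"
        "rdtsc \n\t"
        "rdtsc \n\t"
        "rdtsc \n\t"
        "rdtsc \n\t"
        "rdtsc \n\t"
        "rdtsc \n\t"
        "rdtsc \n\t"
        "rdtsc \n\t"
        "rdtsc \n\t"
        "rdtsc \n\t"
        "rdtsc \n\t"
        "rdtsc \n\t"
        "rdtsc \n\t"
        "rdtsc \n\t"
        "rdtsc \n\t"
        "addl $16, %%ecx \n\t"
        "cmpl $8192, %%ecx \n\t"
        "jne 1b"
        : : : "eax", "edx", "ecx", "cc");

    /* Read the TSC again after the loop. */
    __asm__ __volatile__("rdtsc" : "=a"(ret1[0]), "=d"(ret1[1]));

    /* Combine the halves explicitly instead of casting the array pointer,
       which breaks the strict-aliasing rules. */
    t0 = ((unsigned long long)ret0[1] << 32) | ret0[0];
    t1 = ((unsigned long long)ret1[1] << 32) | ret1[0];
    result = (int)((t1 - t0) / 8192);
    printf("ticks per rdtsc %d \n", result);
    return result;
}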
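
One way to make a measurement like this less sensitive to interrupts and context switches is to repeat it and keep the smallest result. A rough sketch follows, assuming an x86 target and a GCC-compatible compiler; `read_tsc`, `time_block` and `min_ticks_per_rdtsc` are hypothetical names. Unlike the unrolled loop above, the C loop here adds its own overhead to every read, so the figure may come out slightly higher.

```c
#include <stdint.h>

/* Hypothetical helper: read the TSC (EAX = low half, EDX = high half). */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Time `reads` back-to-back RDTSC instructions once, in TSC ticks. */
static uint64_t time_block(unsigned reads)
{
    uint64_t start = read_tsc();
    for (unsigned i = 0; i < reads; i++)
        (void)read_tsc();
    return read_tsc() - start;
}

/* Repeat the timing `trials` times and keep the minimum, on the
   assumption that the fastest run is the least disturbed one.
   Returns 0 if either argument is 0. */
static uint64_t min_ticks_per_rdtsc(unsigned trials, unsigned reads)
{
    uint64_t best = UINT64_MAX;
    if (trials == 0 || reads == 0)
        return 0;
    for (unsigned t = 0; t < trials; t++) {
        uint64_t ticks = time_block(reads);
        if (ticks < best)
            best = ticks;
    }
    return best / reads;
}
```

A call such as `min_ticks_per_rdtsc(8, 8192)` would repeat the 8192-read measurement eight times.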

This compiles and runs fine with both the Intel and GNU compilers
(GCC 3.3, 4.0, etc.).

When I compile this and execute it under Cygwin (running on Windows XP)
on an AMD 4200+, I get:
./a.exe
ticks per rdtsc 6

That isn't 1 or 2, but I can live with 6 clock ticks for a
seldom-called op.

If I compile and run this under Mac OS X (new Apple MacBook Pro, Intel
Core 2), I get 65 ?!?!

If I compile and run this on SUSE Linux on a Xeon processor, I get
85 ?!?! (Intel or GNU compiler)

I'm not even putting in any serializing instructions. Does that look
right to anyone? Can anyone verify whether they get the same results
on their x86 machines?
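
For comparison, a serialized read might look like the sketch below. It assumes CPUID as the serializing instruction (it is architecturally serializing on x86), an x86 target, and a GCC-compatible compiler; `read_tsc_serialized` is a hypothetical name. CPUID itself takes many cycles, so numbers measured this way should come out larger than the unserialized ones, not smaller.

```c
#include <stdint.h>

/* Hypothetical helper: execute CPUID (leaf 0) to drain earlier
   instructions, then read the TSC. CPUID overwrites EAX, EBX, ECX and
   EDX; EAX and EDX are the RDTSC outputs, so EBX and ECX are listed
   as clobbers. */
static inline uint64_t read_tsc_serialized(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__(
        "xorl %%eax, %%eax \n\t"   /* select CPUID leaf 0 */
        "cpuid \n\t"               /* serializing instruction */
        "rdtsc"                    /* EDX:EAX = time-stamp counter */
        : "=a"(lo), "=d"(hi)
        :
        : "ebx", "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

The difference between two `read_tsc_serialized()` calls then includes the cost of one CPUID plus one RDTSC.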

Brian VS
