Web Hosting Forum - Hosting Reviews, Web Hosting Discussion Forum

#1  08-14-2007, 02:43 AM
meta.x.gdb@gmail.com
RDTSC performance on different x86 archs



Not sure where to go with this question; I saw your listing in comp.sys.intel.

I'm seeing strange behavior from the RDTSC instruction.

On Jul 10 1994, 12:19 am, g...@ichips.intel.com (Andy Glew) wrote:
>
>
> (3) RDMSR(MSR=10h) versus RDTSC: yes, indeed, MSR=10h is the TimeStamp
> Counter (TSC). However, accessing this via RDMSR and WRMSR is *not*
> portable.
> RDTSC is the *portable*, architectural, way of accessing the
> timestamp counter. It's faster, and it has certain other conveniences.
> Please avoid using RDMSR(MSR=10h).
> There is no portable way of writing the TSC. WRMSR(MSR=10h) works
> to a degree, but is non-portable. Moreover, arbitrary writeability is
> *not* guaranteed - it may not be possible to write any arbitrary bit
> pattern to the counter.
>



OK, I am getting some unusual results from rdtsc.

I have a small C program with some inline assembly (GNU style):

#include <stdio.h>

int main(void)
{
    unsigned long long t0, t1, result;
    unsigned int lo0, hi0, lo1, hi1;

    __asm__ __volatile__("rdtsc" : "=a"(lo0), "=d"(hi0));
    /* 512 passes x 16 RDTSC = 8192 reads. In extended asm, registers are
       written %%reg, "1:"/"1b" is a local label that cannot clash with
       other symbols, and the clobber list tells the compiler that EAX,
       EDX, ECX and the flags are overwritten. */
    __asm__ __volatile__("xorl %%ecx, %%ecx \n\t"
                         "1: \n\t"
                         "rdtsc \n\t"
                         "rdtsc \n\t"
                         "rdtsc \n\t"
                         "rdtsc \n\t"
                         "rdtsc \n\t"
                         "rdtsc \n\t"
                         "rdtsc \n\t"
                         "rdtsc \n\t"
                         "rdtsc \n\t"
                         "rdtsc \n\t"
                         "rdtsc \n\t"
                         "rdtsc \n\t"
                         "rdtsc \n\t"
                         "rdtsc \n\t"
                         "rdtsc \n\t"
                         "rdtsc \n\t"
                         "addl $16, %%ecx \n\t"
                         "cmpl $8192, %%ecx \n\t"
                         "jne 1b"
                         : : : "eax", "edx", "ecx", "cc");
    __asm__ __volatile__("rdtsc" : "=a"(lo1), "=d"(hi1));
    /* RDTSC returns the high half in EDX and the low half in EAX. */
    t0 = ((unsigned long long)hi0 << 32) | lo0;
    t1 = ((unsigned long long)hi1 << 32) | lo1;
    result = (t1 - t0) / 8192;
    printf("ticks per rdtsc %llu \n", result);
    return 0;
}

This compiles and runs fine with both the Intel and GNU compilers (3.3, 4.0, etc.).

When I compile this and run it under Cygwin (on Windows XP) on an AMD 4200+, I get
./a.exe
ticks per rdtsc 6

That isn't 1 or 2, but I can live with 6 clock ticks for a seldom-called op.

If I compile and run this under Mac OS X on a new Apple MacBook Pro (Intel Core 2), I get 65?!?!

If I compile and run this on SUSE Linux on a Xeon processor, I get 85?!?! (with either the Intel or the GNU compiler)

I'm not even putting in any serializing instructions. Does that look right to anyone? Can anyone verify that they get the same results on their x86 machines?

Brian VS

#2  08-14-2007, 02:43 AM
Daniel Spångberg
Re: RDTSC performance on different x86 archs

Yes, similar results: 6 ticks on an Athlon 64, 11 on an Athlon, around 80 on an Intel P4 and 30 on an Intel P3. Looks OK. Since rdtsc isn't serializing, it can't be used to time very short instruction sequences anyway, so this doesn't matter much.
Daniel

On Tue, 14 Aug 2007 04:36:10 +0200, <meta.x.gdb@gmail.com> wrote:
> [...]

#3  08-15-2007, 02:40 AM
meta.x.gdb@gmail.com
Re: RDTSC performance on different x86 archs

On Aug 14, 12:13 am, Daniel Spångberg <dani...@mkem.uu.se> wrote:
> Yes, similar results. 6 ticks on athlon64, 11 on athlon, around 80 on
> intel p4 and 30 ticks on intel p3. Looks ok. Since rdtsc cannot be used to
> time very short instruction sequences anyway, since it isn't serializing,
> this does not matter much anyway.
> Daniel
>


What matters to me is the overall cost of instrumenting code with profilers. There are lots of examples on the web of people using rdtsc to time routines that take on the order of 200 cycles. In cases like that, the instrumented version of the code is significantly perturbed by the act of measuring, since a start/stop pair of reads on a P4 or Xeon can cost well over 100 ticks by itself.

It seems the P4 has a particularly pokey implementation of this instruction.

It's not a showstopper for me. It would be nice if someone with access to an IA64 chip could run a similar test. The Intel IA64 compiler doesn't support inline assembly, but it does have a built-in __rdtsc() intrinsic that should do the same thing.

It would also be nice if future Intel designs paid more attention to the needs of performance measurement.

It is frustrating that I need to run in privileged mode to read the hardware counters. Folks at national labs don't get to patch their kernels or run at privilege level 0 very often.

The IBM Power chip manages a similar instruction in 2 cycles. I was quite impressed.


Brian Van Straalen

All times are GMT -5.