08-15-2007, 02:40 AM
meta.x.gdb@gmail.com
 
Re: RDTSC performance on different x86 archs

On Aug 14, 12:13 am, Daniel Spångberg <dani...@mkem.uu.se> wrote:
> Yes, similar results. 6 ticks on athlon64, 11 on athlon, around 80 on
> intel p4 and 30 ticks on intel p3. Looks ok. Since rdtsc cannot be used to
> time very short instruction sequences anyway, since it isn't serializing,
> this does not matter much anyway.
> Daniel
>


At issue for me is the overall cost of having the code instrumented
with profilers. There are lots of examples on the web of people using
rdtsc to time routines that take on the order of 200 cycles. In cases
like this, the act of measuring significantly perturbs the
instrumented code: at roughly 80 ticks per read on a P4, a start/stop
pair of reads costs about 160 ticks against a 200-cycle routine.
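
To put a rough number on that, here is a minimal sketch in C. It assumes GCC or Clang on x86, where <x86intrin.h> provides __rdtsc(); other compilers ship the intrinsic in a different header. The names read_ticks, min_read_overhead and overhead_fraction are made up for illustration. The back-to-back difference is only an estimate of the cost of one read, since rdtsc is not serializing, and the non-x86 fallback returns nanoseconds rather than cycles.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime in the fallback path */
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>  /* GCC/Clang header that declares __rdtsc() */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Assumed fallback for non-x86 builds: monotonic clock in nanoseconds. */
static inline uint64_t read_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Smallest difference seen between two back-to-back counter reads.
 * Taking the minimum over many trials filters out interrupts and
 * cache misses; the result approximates the cost of one read. */
uint64_t min_read_overhead(int trials) {
    assert(trials > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < trials; i++) {
        uint64_t t0 = read_ticks();
        uint64_t t1 = read_ticks();
        uint64_t d = t1 - t0;
        if (d < best) best = d;
    }
    return best;
}

/* Share of the instrumented run time spent in the start/stop pair of
 * reads, given the per-read cost and the routine's own cycle count. */
double overhead_fraction(uint64_t read_cost, uint64_t routine_cycles) {
    double pair = 2.0 * (double)read_cost;
    return pair / ((double)routine_cycles + pair);
}
```

Calling overhead_fraction(min_read_overhead(100000), 200) gives the estimated share for a 200-cycle routine on the machine at hand. Plugging in the figures quoted above, 80 ticks per read on the P4 works out to about 44% of the instrumented run time, while 6 ticks on the Athlon64 is under 6%.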

It seems the P4 has a particularly pokey implementation of this
instruction.

It's not a show stopper for me. It would be nice if someone with
access to an IA64 chip could run a similar test. The Intel IA64
compiler doesn't support inline assembly, but it does have a built-in
intrinsic, __rdtsc(), that should do the same thing.

It would also be nice if future Intel designs were more aware of
measuring needs.

It is frustrating that I need to be running in privileged mode to
read the hardware counters. Folks in national labs don't often get
to patch their kernels or run at privilege level 0.

The IBM Power chip manages a similar instruction in 2 cycles. I was
quite impressed.


Brian Van Straalen
