
Thread: Access speeds in assembly

  1. #1
    Join Date
    Jun 2009
    Location
    0000:0400
    Beans
    Hidden!

    Access speeds in assembly

    Short version: I'm curious as to the order in which these access methods rank in assembly: registers, stack, and memory (via direct addresses). I have a funny feeling that's the hierarchy right there, but I'm still fairly new to assembly...

    Long version: I'm taking an x86 assembly class (much to my dismay, using MASM), and our last assignment was to write the first 24 terms of the Fibonacci sequence. We were asked to write it with registers, so I did, with gratuitous use of the xchg instruction.

    Then I had a thought: why not write it using a pointer in memory (essentially an array)? I thought I was really clever when I was able to do it without a single 'mov' instruction, using only arithmetic.

    After that, I figured that there had to be a way to abuse the stack to store values, but ditched the idea with the suspicion that I would just be rewriting the xchg instruction (and likely less efficiently).

    So I got to wondering which of these methods is the fastest, but found myself at a loss. M$ provides junk for gauging performance with MASM, and this is also a really small amount of code to be measuring execution time on. Does anyone with assembly experience know which access methods are the quickest?
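
    To give an idea of what I mean by the register version, something along these lines (a sketch only, not my actual submission; the display code is omitted and the register choices are just illustrative):

    Code:
    ; eax holds the current term, ebx the next one; ecx counts down the terms.
        mov  eax, 0            ; F(0)
        mov  ebx, 1            ; F(1)
        mov  ecx, 24           ; number of terms to produce
    FibLoop:
        ; (display or store eax here)
        add  eax, ebx          ; eax = F(n) + F(n+1) = F(n+2)
        xchg eax, ebx          ; eax <- F(n+1), ebx <- F(n+2)
        loop FibLoop           ; decrement ecx, repeat while ecx != 0

    The point is that everything stays in registers; the only "storage" used is the add/xchg pair itself.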

  2. #2
    Join Date
    May 2009
    Beans
    303

    Re: Access speeds in assembly

    Your hierarchy is correct. I'd suggest getting a (thin) book on (x86) assembly if you're planning to do anything with it in the future.

  3. #3
    Join Date
    Nov 2007
    Beans
    410
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Access speeds in assembly

    The access speeds depend on the operation, the operands, and the architecture. Typically, operations between two registers are very fast. The x86 architecture is an example of a pipelined architecture, so most operations will take more than one clock cycle. Compare that to the MIPS architecture, where operations typically complete in fewer clock cycles.

    For reference, I found this website that lists the x86 ASM operations. If you click on them it will tell you the number of clock cycles the operation takes to complete depending on what the operands are.
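
    To make the register-vs-memory distinction concrete, something like this (MASM-style sketch; 'value' is just an illustrative variable, not from this thread). The reference lists different timings for each form of the same add:

    Code:
    .data
    value DWORD 5              ; illustrative memory operand
    .code
        add  eax, ebx          ; register + register: the fast form
        add  eax, value        ; register + memory: needs a load from cache/RAM first
        add  value, eax        ; memory + register: read-modify-write, typically the slowest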

  4. #4
    Join Date
    Jun 2009
    Location
    0000:0400
    Beans
    Hidden!

    Re: Access speeds in assembly

    Quote Originally Posted by cszikszoy View Post
    For reference, I found this website that lists the x86 ASM operations. If you click on them it will tell you the number of clock cycles the operation takes to complete depending on what the operands are.
    That's excellent! Thank you!

  5. #5
    Join Date
    Mar 2005
    Beans
    947
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Access speeds in assembly

    The stack is in main memory. The stack instructions I suppose are quicker to decode than pointer-style instructions, but any RAM access is slow. Registers are much faster.

    cszikszoy, that's a nice reference, but it's very outdated. Most instructions on a modern x86 do not take more than one cycle; in fact, it's more like instructions per cycle than cycles per instruction now, what with simultaneous, partial, out-of-order execution.

    Cache hits are the big thing now. The cache is much faster than RAM, so you try to keep your algorithms running within the cache.
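
    As a rough illustration of staying cache-friendly (MASM-style sketch; 'buffer' and its size are made up): walking an array sequentially reuses each fetched cache line for several elements before moving on.

    Code:
    .data
    buffer DWORD 1024 DUP(1)       ; illustrative array
    .code
        xor  eax, eax              ; running sum
        mov  esi, OFFSET buffer
        mov  ecx, LENGTHOF buffer
    SumLoop:
        add  eax, [esi]            ; sequential reads: each cache line is fetched
                                   ; once and then serves the next few elements
        add  esi, 4                ; step to the next DWORD
        loop SumLoop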

  6. #6
    Join Date
    Jun 2009
    Location
    0000:0400
    Beans
    Hidden!

    Re: Access speeds in assembly

    Quote Originally Posted by wmcbrine View Post
    The stack is in main memory. The stack instructions I suppose are quicker to decode than pointer-style instructions, but any RAM access is slow. Registers are much faster.

    cszikszoy, that's a nice reference, but it's very outdated. Most instructions on a modern x86 do not take more than one cycle; in fact, it's more like instructions per cycle than cycles per instruction now, what with simultaneous, partial, out-of-order execution.

    Cache hits are the big thing now. The cache is much faster than RAM, so you try to keep your algorithms running within the cache.
    True, but the stack is treated separately and access to it generally seems to be faster (at least based on an 80486).

    Correct me if I'm wrong, but the general trend in assembly seems to be: the more rules and/or the more limited your access, the faster it is. You only have a few registers, your stack access is limited in the way you can access it (FILO) but storage is broader, and then RAM access is sort of do-what-you-want and in almost unlimited quantities.

    Is there a specific way to ensure instructions stay within the cache? Or, will instructions just default to the cache, and once they exceed the size, overflow to RAM?

  7. #7
    Join Date
    Aug 2007
    Location
    127.0.0.1
    Beans
    1,800
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Access speeds in assembly

    Quote Originally Posted by falconindy View Post
    True, but the stack is treated separately and access to it generally seems to be faster (at least based on an 80486).
    It's still memory. Accessing it will probably take more than one internal step, e.g. a load (like lw) plus a stack-pointer adjustment, unless the CPU provides a single instruction for it.
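
    For instance (a rough sketch, 32-bit Intel syntax): x86 does give you the single instruction, but internally it still amounts to a memory access plus a pointer update:

    Code:
        pop  eax               ; one instruction...

        ; ...but internally it is roughly:
        mov  eax, [esp]        ; load the value from the top of the stack (a memory access)
        add  esp, 4            ; then adjust the stack pointer (32-bit stack assumed)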

    Quote Originally Posted by falconindy View Post
    Correct me if I'm wrong, but the general trend in assembly seems to be: the more rules and/or the more limited your access, the faster it is. You only have a few registers, your stack access is limited in the way you can access it (FILO) but storage is broader, and then RAM access is sort of do-what-you-want and in almost unlimited quantities.
    Everything has a tradeoff. The ideal scenario would be to have everything you need in registers and process it without ever going to RAM, or even issuing an interrupt (no I/O, no storing anywhere).

    The tradeoff of having everything in registers is that you're limited, and you'll probably have to use more instructions to swap around data to make space for new things.
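
    A minimal sketch of that swapping cost (purely illustrative): once the registers run out, a value gets parked on the stack and reloaded later, which is extra memory traffic.

    Code:
        push eax               ; spill: save eax to the stack to free the register
        ; ... reuse eax for other work here ...
        pop  eax               ; reload the spilled value; the spill and reload
                               ; are two extra memory accesses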

    IMO, this kind of fine-grained optimization is utterly useless. Why would you spend so much thought on this, if eventually the scheduler will do a context switch and throw everything into memory anyway? In other words, there are system-wide operations that have a far bigger impact on how programs run than anything you can achieve by coding at this level. It's almost like fine-tuning a sports car, then putting it inside a truck and having it carried around to show off to your friends. It will be fine-tuned, but in the end it won't matter; the truck decides your ultimate speed.

    Have fun with assembly, but don't let it get over your head.
    Last edited by Can+~; October 5th, 2009 at 12:02 AM.
    "Just in terms of allocation of time resources, religion is not very efficient. There's a lot more I could be doing on a Sunday morning."
    -Bill Gates

  8. #8
    Join Date
    Jun 2009
    Location
    0000:0400
    Beans
    Hidden!

    Re: Access speeds in assembly

    Quote Originally Posted by Can+~ View Post
    The tradeoff of having everything in registers is that you're limited, and you'll probably have to use more instructions to swap around data to make space for new things.
    ...and out of my own self-interest, I'd like to understand the trade-offs.
    Quote Originally Posted by Can+~ View Post
    IMO, this kind of fine-grained optimization is utterly useless. Why would you spend so much thought on this, if eventually the scheduler will do a context switch and throw everything into memory anyway? [etc etc etc]
    And you're certainly entitled to your own opinion. With each new generation of processors, it's effectively harder and harder to write something in assembly that will be a burden on the CPU (without intentionally doing something to chew up cycles). I think of assembly as the ultimate programming puzzle: as many ways as there are to concoct a solution in a high-level language, there are far more in assembly because of the fine-grained control you're allowed. Again, out of self-interest, I choose to spend time on this "useless" optimization. You obviously enjoy coding for different reasons than I do.

    Quote Originally Posted by Can+~ View Post
    Have fun with assembly, but don't let it get over your head.
    Too late. I had a dream the other night... cut to the chase, I woke up at 2am to write assembly.

  9. #9
    Join Date
    Aug 2007
    Location
    127.0.0.1
    Beans
    1,800
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Access speeds in assembly

    Quote Originally Posted by falconindy View Post
    I think of assembly as the ultimate programming puzzle
    Really? It was the exact opposite for me: I thought of it almost as a by-product of a higher-level language (C, for instance), something so trivial that the computer could figure it out for itself and that wasn't worth spending much time thinking about.

    Quote Originally Posted by falconindy View Post
    as many ways as there are to concoct a solution in a high-level language, there are far more in assembly because of the fine-grained control you're allowed. Again, out of self-interest, I choose to spend time on this "useless" optimization. You obviously enjoy coding for different reasons than I do.
    Don't get me wrong, I also learnt assembly for pretty much the same reasons, plus it gives you a much deeper view of programming abstractions as a system. The thing is that I never sought ultimate speed, because I knew that problem is already solved by the compiler and assembler and their underlying optimizations.

    Quote Originally Posted by falconindy View Post
    Too late. I had a dream the other night... cut to the chase, I woke up at 2am to write assembly.
    I wonder what Freud would've said about that.
    "Just in terms of allocation of time resources, religion is not very efficient. There's a lot more I could be doing on a Sunday morning."
    -Bill Gates

  10. #10
    Join Date
    Nov 2007
    Beans
    410
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Access speeds in assembly

    Quote Originally Posted by wmcbrine View Post
    cszikszoy, that's a nice reference, but it's very outdated. Most instructions on a modern x86 do not take more than one cycle; in fact, it's more like instructions per cycle than cycles per instruction now, what with simultaneous, partial, out-of-order execution.
    Sorry, but that's just wrong. The x86 architecture is a perfect example of a pipelined architecture. It's true that many operations complete in one clock cycle; the commonly used operations are usually implemented with dedicated gates so that they do.

    Operations per cycle can't happen. With a pipelined architecture it is true that several operations may complete on the same clock cycle, but that depends on the stream of instructions. Still, the first stage of the pipeline is fetch and decode. That happens in exactly one clock cycle, and it happens sequentially (meaning the CPU doesn't fetch the next 10 instructions at the same time; it fetches them one by one).

    Furthermore, to the OP: you'll find a lot of people here who don't like, or don't really know how to use, ASM, because Python and the like are generally what's accepted and used in this forum. Speak to someone outside the computer-science realm and you'll see that ASM is incredibly useful and incredibly powerful. I work for a government contractor and use ASM extensively in systems where timing is absolutely critical. ASM is required in these situations because, if you have the datasheet for the particular CPU you're working with, you know exactly how long a particular set of instructions will take. Higher-level languages just won't work there.
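
    As a rough illustration of that kind of cycle counting (a sketch only; DELAY_COUNT is made up, and the fixed-cost assumption only holds on simple in-order parts whose datasheets document per-instruction timings):

    Code:
    DELAY_COUNT EQU 1000       ; illustrative value, derived from the per-iteration
                               ; cycle cost in the datasheet and the clock frequency
        mov  ecx, DELAY_COUNT
    DelayLoop:
        nop                    ; fixed, documented cost per iteration on such parts
        loop DelayLoop         ; repeat until ecx reaches zero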
