
Thread: Access speeds in assembly

  1. #1
    Join Date
    Jun 2009
    Location
    0000:0400
    Beans
    Hidden!

    Access speeds in assembly

    Short version: I'm curious as to the order in which these access methods rank in assembly: registers, stack, and memory (via direct addresses). I have a funny feeling that's the hierarchy right there, but I'm still fairly new to assembly...

    Long version: I'm taking an x86 assembly class (much to my dismay, using MASM), and our last assignment was to write the first 24 terms of the Fibonacci sequence. We were asked to write it with registers, so I did, with gratuitous use of the xchg instruction.

    Then I had a thought: why not write it using a pointer in memory (essentially an array)? I thought I was really clever when I was able to do it without a single 'mov' instruction, using only arithmetic.

    After that, I figured that there had to be a way to abuse the stack to store values, but ditched the idea with the suspicion that I would just be rewriting the xchg instruction (and likely less efficiently).

    So I got to wondering which of these methods is the fastest, but found myself at a loss. M$ provides junk for gauging performance with MASM, and this is also a really small amount of code to be measuring execution time on. Does anyone with assembly experience know which access methods are the quickest?
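
    To give an idea of what I mean by the register version, something along these lines (a sketch only, not my actual submission; the display code is omitted and the register choices are just illustrative):

    Code:
    ; eax holds the current term, ebx the next one; ecx counts down the terms.
        mov  eax, 0            ; F(0)
        mov  ebx, 1            ; F(1)
        mov  ecx, 24           ; number of terms to produce
    FibLoop:
        ; (display or store eax here)
        add  eax, ebx          ; eax = F(n) + F(n+1) = F(n+2)
        xchg eax, ebx          ; eax <- F(n+1), ebx <- F(n+2)
        loop FibLoop           ; decrement ecx, repeat while ecx != 0

    The point is that everything stays in registers; the only "storage" used is the add/xchg pair itself.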

  2. #2
    Join Date
    May 2009
    Beans
    303

    Re: Access speeds in assembly

    Your hierarchy is correct. I'd suggest getting a (thin) book on (x86) assembly if you're planning to do anything with it in the future.

  3. #3
    Join Date
    Nov 2007
    Beans
    410
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Access speeds in assembly

    The access speeds depend on the operation, the operands, and the architecture. Typically, operations between two registers are very fast. The x86 architecture is an example of a pipelined architecture, so most operations will take more than one clock cycle. Compare that to the MIPS architecture, where operations typically complete in fewer clock cycles.

    For reference, I found this website that lists the x86 ASM operations. If you click on them it will tell you the number of clock cycles the operation takes to complete depending on what the operands are.
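
    To make the register-vs-memory distinction concrete, something like this (MASM-style sketch; 'value' is just an illustrative variable, not from this thread). The reference lists different timings for each form of the same add:

    Code:
    .data
    value DWORD 5              ; illustrative memory operand
    .code
        add  eax, ebx          ; register + register: the fast form
        add  eax, value        ; register + memory: needs a load from cache/RAM first
        add  value, eax        ; memory + register: read-modify-write, typically the slowest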

  4. #4
    Join Date
    Jun 2009
    Location
    0000:0400
    Beans
    Hidden!

    Re: Access speeds in assembly

    Quote Originally Posted by cszikszoy View Post
    For reference, I found this website that lists the x86 ASM operations. If you click on them it will tell you the number of clock cycles the operation takes to complete depending on what the operands are.
    That's excellent! Thank you!

  5. #5
    Join Date
    Mar 2005
    Beans
    947
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Access speeds in assembly

    The stack is in main memory. The stack instructions I suppose are quicker to decode than pointer-style instructions, but any RAM access is slow. Registers are much faster.

    cszikszoy, that's a nice reference, but it's very outdated. Most instructions on a modern x86 do not take more than one cycle; in fact, it's more like instructions per cycle than cycles per instruction now, what with simultaneous, partial, out-of-order execution.

    Cache hits are the big thing now. The cache is much faster than RAM, so you try to keep your algorithms running within the cache.
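
    As a rough illustration of staying cache-friendly (MASM-style sketch; 'buffer' and its size are made up): walking an array sequentially reuses each fetched cache line for several elements before moving on.

    Code:
    .data
    buffer DWORD 1024 DUP(1)       ; illustrative array
    .code
        xor  eax, eax              ; running sum
        mov  esi, OFFSET buffer
        mov  ecx, LENGTHOF buffer
    SumLoop:
        add  eax, [esi]            ; sequential reads: each cache line is fetched
                                   ; once and then serves the next few elements
        add  esi, 4                ; step to the next DWORD
        loop SumLoop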

  6. #6
    Join Date
    Jun 2009
    Location
    0000:0400
    Beans
    Hidden!

    Re: Access speeds in assembly

    Quote Originally Posted by wmcbrine View Post
    The stack is in main memory. The stack instructions I suppose are quicker to decode than pointer-style instructions, but any RAM access is slow. Registers are much faster.

    cszikszoy, that's a nice reference, but it's very outdated. Most instructions on a modern x86 do not take more than one cycle; in fact, it's more like instructions per cycle than cycles per instruction now, what with simultaneous, partial, out-of-order execution.

    Cache hits are the big thing now. The cache is much faster than RAM, so you try to keep your algorithms running within the cache.
    True, but the stack is treated separately and access to it generally seems to be faster (at least based on an 80486).

    Correct me if I'm wrong, but the general trend in assembly seems to be: the more rules and/or the more limited your access, the faster it is. You only have a few registers, your stack access is limited in the way you can access it (FILO) but storage is broader, and then RAM access is sort of do-what-you-want and in almost unlimited quantities.

    Is there a specific way to ensure instructions stay within the cache? Or, will instructions just default to the cache, and once they exceed the size, overflow to RAM?

  7. #7
    Join Date
    Aug 2007
    Location
    127.0.0.1
    Beans
    1,800
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Access speeds in assembly

    Quote Originally Posted by falconindy View Post
    True, but the stack is treated separately and access to it generally seems to be faster (at least based on an 80486).
    It's still memory. Accessing it will probably take more than one internal step, e.g. a load (like lw) plus a stack-pointer adjustment, unless the CPU provides a single instruction for it.
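
    For instance (a rough sketch, 32-bit Intel syntax): x86 does give you the single instruction, but internally it still amounts to a memory access plus a pointer update:

    Code:
        pop  eax               ; one instruction...

        ; ...but internally it is roughly:
        mov  eax, [esp]        ; load the value from the top of the stack (a memory access)
        add  esp, 4            ; then adjust the stack pointer (32-bit stack assumed)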

    Quote Originally Posted by falconindy View Post
    Correct me if I'm wrong, but the general trend in assembly seems to be: the more rules and/or the more limited your access, the faster it is. You only have a few registers, your stack access is limited in the way you can access it (FILO) but storage is broader, and then RAM access is sort of do-what-you-want and in almost unlimited quantities.
    Everything has a tradeoff. The ideal scenario would be to have everything you need in registers and process it without ever going to RAM, or even issuing an interrupt (no I/O, no storing anywhere).

    The tradeoff of having everything in registers is that you're limited, and you'll probably have to use more instructions to swap around data to make space for new things.
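
    A minimal sketch of that swapping cost (purely illustrative): once the registers run out, a value gets parked on the stack and reloaded later, which is extra memory traffic.

    Code:
        push eax               ; spill: save eax to the stack to free the register
        ; ... reuse eax for other work here ...
        pop  eax               ; reload the spilled value; the spill and reload
                               ; are two extra memory accesses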

    IMO, this kind of fine-grained optimization is utterly useless. Why would you spend so much thought on this, if eventually the scheduler will do a context switch and throw everything into memory anyway? In other words, there are system-wide operations that have a far bigger impact on how programs run than anything you can achieve by coding at this level. It's almost like fine-tuning a sports car, then putting it inside a truck and having it carried around to show off to your friends. It will be fine-tuned, but in the end it won't matter; the truck decides your ultimate speed.

    Have fun with assembly, but don't let it get over your head.
    Last edited by Can+~; October 5th, 2009 at 12:02 AM.
    "Just in terms of allocation of time resources, religion is not very efficient. There's a lot more I could be doing on a Sunday morning."
    -Bill Gates

  8. #8
    Join Date
    Jun 2009
    Location
    0000:0400
    Beans
    Hidden!

    Re: Access speeds in assembly

    Quote Originally Posted by Can+~ View Post
    The tradeoff of having everything in registers is that you're limited, and you'll probably have to use more instructions to swap around data to make space for new things.
    ...and out of my own self-interest, I'd like to understand the trade-offs.
    Quote Originally Posted by Can+~ View Post
    IMO, this kind of fine-grained optimization is utterly useless. Why would you spend so much thought on this, if eventually the scheduler will do a context switch and throw everything into memory anyway? [etc etc etc]
    And you're certainly entitled to your own opinion. With each new generation of processors, it's effectively harder and harder to write something in assembly that will be a burden on the CPU (without intentionally doing something to chew up cycles). I think of assembly as the ultimate programming puzzle: as many ways as there are to concoct a solution in a high-level language, there are far more in assembly because of the fine-grained control you're allowed. Again, out of self-interest, I choose to spend time on this "useless" optimization. You obviously enjoy coding for different reasons than I do.

    Quote Originally Posted by Can+~ View Post
    Have fun with assembly, but don't let it get over your head.
    Too late. I had a dream the other night... cut to the chase, I woke up at 2am to write assembly.

  9. #9
    Join Date
    Aug 2007
    Location
    127.0.0.1
    Beans
    1,800
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Access speeds in assembly

    Quote Originally Posted by falconindy View Post
    I think of assembly as the ultimate programming puzzle
    Really? It was the exact opposite for me: I thought of it almost as a by-product of a higher-level language (C, for instance), something so trivial that the computer could figure it out for itself and that wasn't worth spending much time thinking about.

    Quote Originally Posted by falconindy View Post
    as many ways as there are to concoct a solution in a high-level language, there are far more in assembly because of the fine-grained control you're allowed. Again, out of self-interest, I choose to spend time on this "useless" optimization. You obviously enjoy coding for different reasons than I do.
    Don't get me wrong, I also learnt assembly for pretty much the same reasons, plus it gives you a much deeper view of programming abstractions as a system. The thing is that I never sought ultimate speed, because I knew that problem is already solved by the compiler and assembler and their underlying optimizations.

    Quote Originally Posted by falconindy View Post
    Too late. I had a dream the other night... cut to the chase, I woke up at 2am to write assembly.
    I wonder what Freud would've said about that.
    "Just in terms of allocation of time resources, religion is not very efficient. There's a lot more I could be doing on a Sunday morning."
    -Bill Gates

  10. #10
    Join Date
    Nov 2007
    Beans
    410
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Access speeds in assembly

    Quote Originally Posted by wmcbrine View Post
    cszikszoy, that's a nice reference, but it's very outdated. Most instructions on a modern x86 do not take more than one cycle; in fact, it's more like instructions per cycle than cycles per instruction now, what with simultaneous, partial, out-of-order execution.
    Sorry, but that's just wrong. The x86 architecture is a perfect example of a pipelined architecture. It's true that many operations complete in one clock cycle; the commonly used operations are usually implemented with dedicated gates so that they do.

    Operations per cycle can't happen. With a pipelined architecture it is true that several operations may complete on the same clock cycle, but that depends on the stream of instructions. Still, the first stage of the pipeline is fetch and decode. That happens in exactly one clock cycle, and it happens sequentially (meaning the CPU doesn't fetch the next 10 instructions at the same time; it fetches them one by one).

    Furthermore, to the OP: you'll find a lot of people here who don't like, or don't really know how to use, ASM, because Python and the like are generally what's accepted and used in this forum. Speak to someone outside the computer-science realm and you'll see that ASM is incredibly useful and incredibly powerful. I work for a government contractor and use ASM extensively in systems where timing is absolutely critical. ASM is required in these situations because, if you have the datasheet for the particular CPU you're working with, you know exactly how long a particular set of instructions will take. Higher-level languages just won't work there.
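
    As a rough illustration of that kind of cycle counting (a sketch only; DELAY_COUNT is made up, and the fixed-cost assumption only holds on simple in-order parts whose datasheets document per-instruction timings):

    Code:
    DELAY_COUNT EQU 1000       ; illustrative value, derived from the per-iteration
                               ; cycle cost in the datasheet and the clock frequency
        mov  ecx, DELAY_COUNT
    DelayLoop:
        nop                    ; fixed, documented cost per iteration on such parts
        loop DelayLoop         ; repeat until ecx reaches zero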
