be: Lower Perms using copy instead of swap by default.
For architectures without a swap instruction (all except the general-purpose register set on amd64 and ia32), this results in shorter code. In many cases (probably all except swapping two registers), it is also better on amd64/ia32, because the copies need fewer uops and modern processors eliminate register-to-register movs at register renaming.
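A minimal sketch of what lowering a Perm with copies involves. It is not the actual libFirm code. It models a Perm as a hypothetical `dict` that maps each destination register to its source register and assumes a free scratch register is available to break cycles. Chains are emitted as plain copies in a safe order. A cycle of length n costs n+1 copies, while a swap-based lowering needs n-1 swaps.

```python
from typing import Dict, List, Optional, Tuple

Copy = Tuple[str, str]  # (destination, source)


def lower_perm_with_copies(perm: Dict[str, str], scratch: str) -> List[Copy]:
    """Sequentialize a parallel copy `perm` (destination -> source) into
    ordinary register-to-register copies.

    Assumptions (hypothetical model, not libFirm's Perm node API):
    all destinations are distinct, all sources are distinct, and `scratch`
    is a free register that appears neither as a source nor a destination.
    """
    # Identity entries (r <- r) need no code at all.
    pending = {dst: src for dst, src in perm.items() if dst != src}
    copies: List[Copy] = []
    while pending:
        sources = set(pending.values())
        # A destination that no pending copy still reads can be overwritten
        # immediately; this drains chains without any extra register.
        ready = [dst for dst in pending if dst not in sources]
        if ready:
            for dst in ready:
                copies.append((dst, pending.pop(dst)))
            continue
        # Only cycles remain: save one register in the scratch register and
        # redirect its single reader there, turning the cycle into a chain.
        victim = next(iter(pending))
        copies.append((scratch, victim))
        reader = next(dst for dst, src in pending.items() if src == victim)
        pending[reader] = scratch
    return copies


def simulate(copies: List[Copy], regs: Dict[str, Optional[int]]) -> Dict[str, Optional[int]]:
    """Execute the copies one after another on a copy of the register file."""
    regs = dict(regs)
    for dst, src in copies:
        regs[dst] = regs[src]
    return regs


if __name__ == "__main__":
    # Three-register cycle: r0 <- r1, r1 <- r2, r2 <- r0.
    cycle = {"r0": "r1", "r1": "r2", "r2": "r0"}
    seq = lower_perm_with_copies(cycle, "tmp")
    print(seq)
    print(simulate(seq, {"r0": 0, "r1": 1, "r2": 2, "tmp": None}))
```

For the three-register cycle this emits four copies where a swap-based lowering would use two swaps. For a two-register cycle it is three copies against a single swap, which matches the caveat above about swapping two registers.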