amd64 backend TODO: Correctness: - SSE Division is not commutative and we have no neg-add style workaround like for the Sub node. So maybe we need finally need a must_be_same constraint. - stdarg.h/varargs va_start - compound return calling convention - Implement more builtins (libgcc lacks several of them that gcc provides natively on amd64 so cparser/libfirm when linking to the compilerlib fallback) - Builtins not implemented: clz, ctz, ffs, parity, popcount - Bitcast not implemented - Thread local storage not implemented - support for setjmp - Support ASM node - fail on long double - (Support 80bit long double with x87 instructions) - Finish PIC code implementation. This is mostly done now, usual accesses to functions and variables including address mode matching looks fine now. Jumptables are not accessed correctly in PIC yet and Improve Quality: - Immediate32 matching could be better and match SymConst, Add(SymConst, Const) combinations where possible. - Support Destination Address Mode - Match Immediate + Address mode for Cmp - Support Read-Modify-Store operations (aka destination address mode) - Leave out labels that are not jumped at (improves assembly readability, see ia32 backend output) - Align certain labels if beneficial (see ia32 backend, compare with clang/gcc) - Implement CMov/Set and announce this in mux_allowed callback - We always Spill/Reload 64bit, we should improve the spiller to allow smaller spills where possible. - Perform some benchmark comparison with clang/gcc and distill more issues to put on this list. - Support folding reloads into nodes (amd64_irn_ops: possible_memory_operand() perform_memory_operand()) - Report instruction costs (amd64_irn_ops: get_op_estimated_cost()) - Transform IncSP+Store/Load to Push/Pop peephole pass - Use stack red zone where possible to avoid IncSP at begin/end of function - Compare node inputs can be swapped if we remember this in the compare node attributes, this allows us to think of them as associative operations and for example swap inputs to enable load folding, or immediates. - Lea needs to support all address modes (base, index +shifts, symconsts) - Match RCPxx SSE instruction