1. P6 and later Intel CPUs have had register renaming and more internal registers for a long time now. The big problem with this approach is that the extra registers are invisible to the compiler, so it can't help you at all, and for performance you are completely dependent on the hardware's instruction reordering, branch prediction, and speculative execution. Still, I find it hard to agree that Intel was left behind in any large part because of the legacy register set it presents to the programmer - the P6 core was an excellent performer for a long time considering its roots.

2. CPU and PCI memory accesses are independent and mutually exclusive. Considering that the alternative to PCI DMA would be the CPU shuffling one word at a time to the board, I don't really see what the problem is here.
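
To make point 1 a bit more concrete, here is a toy sketch of what register renaming buys you. It is not how the P6 actually implements it; the `rename` function, the `pN` physical register names, and the three-instruction program are all made up for illustration. It only shows the idea that every write gets a fresh internal register, so reusing an architectural name does not create a dependency.

```python
from itertools import count

def rename(instructions, arch_regs=("eax", "ebx", "ecx", "edx")):
    """Map architectural registers to fresh physical registers (toy model).

    instructions: list of (dst, src_a, src_b) architectural register names.
    Returns the same instructions expressed in physical register names.
    """
    fresh = count()
    # Each architectural register starts out in its own physical register.
    mapping = {reg: f"p{next(fresh)}" for reg in arch_regs}
    renamed = []
    for dst, src_a, src_b in instructions:
        # Sources read whichever physical register currently holds the value.
        phys_a, phys_b = mapping[src_a], mapping[src_b]
        # Every write is given a brand-new physical register, so it never
        # has to wait for earlier readers or writers of the same name.
        mapping[dst] = f"p{next(fresh)}"
        renamed.append((mapping[dst], phys_a, phys_b))
    return renamed

program = [
    ("eax", "ebx", "ecx"),  # eax = ebx op ecx
    ("edx", "eax", "eax"),  # edx = eax op eax  (really needs the first result)
    ("eax", "ebx", "ebx"),  # eax = ebx op ebx  (only reuses the name eax)
]
renamed = rename(program)
for before, after in zip(program, renamed):
    print(before, "->", after)
```

After renaming, the third instruction writes a different physical register than the first, so the hardware is free to run it without waiting on the first two. The compiler still only sees the four names, which is the limitation described in point 1.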