You're absolutely wrong. First of all, a pure RISC processor needs a much larger instruction cache to avoid misses, because its instructions are usually fixed-width to keep the decoder simple, whereas a variable-width encoding lets the most common instructions be very short. You also need more memory bandwidth to fetch the larger instructions in the first place.
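
To put rough numbers on the code density point, here is a back-of-the-envelope sketch. The instruction mix, the 4-byte fixed width, and the cache size are all made-up assumptions for illustration, not measurements of any real ISA, and it assumes both encodings need the same number of instructions for the same work.

```python
# Hypothetical instruction mix: (percent of instructions, encoded size in bytes).
# These figures are invented for illustration, not measured from real code.
INSTRUCTION_MIX = [
    (50, 2),  # most common instructions get the shortest encodings
    (30, 3),
    (15, 5),
    (5, 8),
]

FIXED_WIDTH_BYTES = 4  # assumed fixed instruction width for the RISC case


def variable_bytes_per_100(mix):
    """Bytes needed to encode 100 instructions drawn from the mix."""
    return sum(percent * size for percent, size in mix)


def fixed_bytes_per_100(width):
    """Bytes needed to encode 100 fixed-width instructions."""
    return 100 * width


def instructions_that_fit(cache_bytes, bytes_per_100):
    """Whole instructions that fit in a cache of the given size."""
    return cache_bytes * 100 // bytes_per_100


if __name__ == "__main__":
    cache_bytes = 16 * 1024  # assumed 16 KiB instruction cache
    var_cost = variable_bytes_per_100(INSTRUCTION_MIX)
    fix_cost = fixed_bytes_per_100(FIXED_WIDTH_BYTES)
    print("variable-width bytes per 100 instructions:", var_cost)
    print("fixed-width bytes per 100 instructions:", fix_cost)
    print("variable-width instructions per cache:", instructions_that_fit(cache_bytes, var_cost))
    print("fixed-width instructions per cache:", instructions_that_fit(cache_bytes, fix_cost))
```

Under these assumed numbers the same cache holds more variable-width instructions, and every fetch of the fixed-width stream moves more bytes for the same instruction count. The real gap depends entirely on the actual instruction mix.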

Furthermore, the Pentium 4 has what Intel calls a "trace cache", which caches already-decoded instructions (micro-ops). The instruction decoder only gets involved when this cache misses.
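
The idea can be sketched with a toy model. This is not how the actual hardware is organized (a real trace cache has finite capacity and stores traces along the predicted execution path); the class name, the address-keyed dictionary, and the stand-in decoder are all hypothetical and only show why the decoder drops out of the hot path.

```python
class TraceCacheModel:
    """Toy model: decoded micro-ops are remembered per fetch address."""

    def __init__(self, decode):
        self._decode = decode   # hypothetical decoder callback
        self._traces = {}       # fetch address -> decoded micro-ops
        self.decoder_calls = 0  # how often the decoder had to run

    def fetch(self, address, raw_bytes):
        if address in self._traces:
            # Hit: return the cached micro-ops, the decoder is bypassed.
            return self._traces[address]
        # Miss: decode once and remember the result.
        self.decoder_calls += 1
        uops = self._decode(raw_bytes)
        self._traces[address] = uops
        return uops


def toy_decode(raw_bytes):
    # Stand-in decoder: one fake "micro-op" per byte.
    # Real x86 decoding is far more involved.
    return tuple(f"uop_{b:02x}" for b in raw_bytes)


if __name__ == "__main__":
    cache = TraceCacheModel(toy_decode)
    # Simulate a loop body at one address being executed many times.
    for _ in range(1000):
        cache.fetch(0x400000, b"\x01\xd8")
    print("decoder invocations:", cache.decoder_calls)
```

In this model a loop body that runs a thousand times is decoded once, which is the point: for hot code the decode cost is paid on the first pass only.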

You'd be hard-pressed to argue that instruction decoding has any significant overhead in Intel processors anymore.