Why x86 uses xor to zero out a register

I was just reading my colleague Lu Pan’s blog post about C++ exceptions when I saw this quote:

The reason why it uses xor eax eax to set a register to 0 is for efficiency reason. It produces shorter opcode and enables the processor to perform register renaming.

Lu Pan
C++ exception (1) — zero-cost exception handling

I found this rather baffling for a few reasons. First of all, isn’t zeroing out a register a common enough operation that it should be optimized?

Further, why would using xor be better for register renaming? Intuitively it seems like it might even be worse, since the two operand registers and the result register are all the same.

Instruction size

The obvious way to zero a register is with mov. In its 64-bit form, this instruction takes 7 bytes:

0:  48 c7 c0 00 00 00 00    mov    rax,0x0

It actually includes a full 4-byte immediate value, every byte of which is zero. That wastes quite a bit of space in the instruction.

With xor, the same operation takes only 3 bytes:

0:  48 31 c0                xor    rax,rax

It pretty much looks like the mov instruction but without the immediate (and a different opcode of course).
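
To make the byte counts above concrete, here is a minimal Python sketch that assembles both encodings by hand. The helper names (modrm, mov_r64_imm32, xor_r64_r64, xor_r32_r32) are made up for this illustration, and the sketch only handles registers 0 through 7, so it never needs the extra REX bits. The 32-bit xor is included as a side note: on x86-64, writing a 32-bit register also clears the upper half of the 64-bit register, so the shorter form zeroes rax as well.

```python
RAX = 0        # register number of rax/eax in the ModRM byte
REX_W = 0x48   # REX prefix with W=1: selects 64-bit operand size


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack a ModRM byte: mod (2 bits), reg (3 bits), rm (3 bits)."""
    return (mod << 6) | (reg << 3) | rm


def mov_r64_imm32(reg: int, imm: int) -> bytes:
    """mov r64, imm32 (sign-extended): REX.W, opcode C7 /0, 4-byte immediate."""
    head = bytes([REX_W, 0xC7, modrm(0b11, 0, reg)])
    return head + imm.to_bytes(4, "little", signed=True)


def xor_r64_r64(dst: int, src: int) -> bytes:
    """xor r/m64, r64: REX.W, opcode 31 /r. No immediate at all."""
    return bytes([REX_W, 0x31, modrm(0b11, src, dst)])


def xor_r32_r32(dst: int, src: int) -> bytes:
    """xor r/m32, r32: same opcode without the REX.W prefix."""
    return bytes([0x31, modrm(0b11, src, dst)])


mov_bytes = mov_r64_imm32(RAX, 0)
xor_bytes = xor_r64_r64(RAX, RAX)
xor32_bytes = xor_r32_r32(RAX, RAX)

for name, code in [("mov rax,0x0", mov_bytes),
                   ("xor rax,rax", xor_bytes),
                   ("xor eax,eax", xor32_bytes)]:
    print(f"{name}: {code.hex(' ')} ({len(code)} bytes)")
```

The first two results match the disassembly listings above byte for byte. The only difference in length comes from the four immediate bytes that mov has to carry.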

Register renaming

It turns out that Intel lists a few dependency-breaking idioms in its optimization manual:

Assembly/Compiler Coding Rule 36. (M impact, ML generality) Use dependency-breaking-idiom instructions to set a register to 0, or to break a false dependence chain resulting from re-use of registers.

Section 3.5.1.8 Clearing Registers and Dependency Breaking Idioms
Intel® 64 and IA-32 Architectures Optimization Reference Manual

So while it seems strange that xor would be better for register renaming despite referencing the register twice, this particular instruction is explicitly special-cased by Intel, which recommends it as the best way to zero a register. The CPU recognizes that xor of a register with itself always produces zero, whatever the register held before, so the instruction does not have to wait for the old value. On Intel microarchitectures since Sandy Bridge, the idiom is resolved in the register renamer itself and never occupies an execution unit.
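
The effect of breaking the dependency can be sketched with a toy dataflow model. This is not how any real CPU is implemented: the Instr type, the ready_times function, and the latencies (3 cycles for imul, 1 for everything else) are assumptions chosen for illustration, and the model ignores everything except data dependencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str
    dst: str
    srcs: tuple  # architectural registers the instruction reads


def is_zero_idiom(ins: Instr) -> bool:
    """xor of a register with itself: the result is 0 whatever the input."""
    return ins.op == "xor" and ins.srcs == (ins.dst, ins.dst)


def ready_times(program, latency, recognize_idioms):
    """Cycle at which each result is ready, limited only by data dependencies."""
    ready = {}  # register -> cycle at which its latest value is available
    times = []
    for ins in program:
        if recognize_idioms and is_zero_idiom(ins):
            done = 0  # toy assumption: handled at rename, nothing to wait for
        else:
            start = max((ready.get(r, 0) for r in ins.srcs), default=0)
            done = start + latency[ins.op]
        ready[ins.dst] = done
        times.append(done)
    return times


# Illustrative latencies, not measurements of any particular CPU.
LATENCY = {"imul": 3, "xor": 1, "add": 1}

program = [
    Instr("imul", "rax", ("rax", "rbx")),  # slow producer of rax
    Instr("xor", "rax", ("rax", "rax")),   # zero rax
    Instr("add", "rax", ("rax", "rcx")),   # uses the fresh zero
]

naive = ready_times(program, LATENCY, recognize_idioms=False)
with_idiom = ready_times(program, LATENCY, recognize_idioms=True)
print("without idiom recognition:", naive)
print("with idiom recognition:   ", with_idiom)
```

In the naive run, the xor and everything after it wait for the imul even though the xor's result does not depend on it. With idiom recognition, the xor and the add no longer sit behind the multiply, which is the false dependence chain that the coding rule above refers to.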