The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable program, often stored as a computer file and executed on the processor.
The x86 instruction set has been extended several times, introducing wider registers and datatypes as well as new functionality.[1]
Below is the full 8086/8088 instruction set of Intel (81 instructions total).[2] These instructions are also available in 32-bit mode, where they operate on 32-bit registers (eax, ebx, etc.) and values instead of their 16-bit (ax, bx, etc.) counterparts. The updated instruction set is grouped according to architecture (i186, i286, i386, i486, i586/i686) and is referred to as (32-bit) x86 and (64-bit) x86-64 (also known as AMD64).
This is the original instruction set. In the 'Notes' column, r means register, m means memory address and imm means immediate (i.e. a value).
The new instructions added in 80286 add support for x86 protected mode. Some but not all of the instructions are available in real mode as well.
The descriptors used by the LGDT, LIDT, SGDT and SIDT instructions consist of a 2-part data structure. The first part is a 16-bit value, specifying table size in bytes minus 1. The second part is a 32-bit value (64-bit value in 64-bit mode), specifying the linear start address of the table.
For LGDT and LIDT with a 16-bit operand size, the address is ANDed with 00FFFFFFh.
On Intel (but not AMD) CPUs, the SGDT and SIDT instructions with a 16-bit operand size are (as of Intel SDM revision 079, March 2023) documented to write a descriptor to memory with the last byte being set to 0. However, observed behavior is that bits 31:24 of the descriptor table address are written instead.[3]
The LGDT, LIDT, LLDT, LMSW and LTR instructions are serializing on Pentium and later processors.
On 80386 and later, the machine status word is the low 16 bits of the CR0 control register; the LMSW instruction can only modify the bottom 4 bits of this register and cannot clear bit 0. The inability to clear bit 0 means that LMSW can be used to enter but not leave x86 Protected Mode.
On the 80286, it is not possible to leave Protected Mode (neither with LMSW nor with LOADALL[4]) without a CPU reset. On 80386 and later, it is possible to leave Protected Mode, but this requires the use of the 80386-and-later MOV to CR0 instruction.
If CR4.UMIP=1 is set, then the SGDT, SIDT, SLDT, SMSW and STR instructions can only run in Ring 0.
The SMSW, SLDT and STR instructions always use an operand size of 16 bits when used with a memory argument. With a register argument on 80386 or later processors, wider destination operand sizes are available and behave as follows:
SMSW: Stores full CR0 in x86-64 long mode, undefined otherwise.
SLDT: Zero-extends 16-bit argument on Pentium Pro and later processors, undefined on earlier processors.
STR: Zero-extends 16-bit argument.
In 64-bit long mode, the ARPL instruction is not available; the 63 /r opcode has been reassigned to the 64-bit-mode-only MOVSXD instruction.
The ARPL instruction causes #UD in Real mode and Virtual 8086 Mode. Windows 95 and OS/2 2.x are known to make extensive use of this #UD to use the 63 opcode as a one-byte breakpoint to transition from Virtual 8086 Mode to kernel mode.[8][9]
0x00FFFF00.
On some Intel processors, the VERW instruction also flushes microarchitectural data buffers. This enables it to be used as part of workarounds for Microarchitectural Data Sampling security vulnerabilities.[11][12]
(A different variant of LOADALL with a different opcode and memory layout exists on 80386.)
The 80386 added support for 32-bit operation to the x86 instruction set. This was done by widening the general-purpose registers to 32 bits and introducing the concepts of OperandSize and AddressSize: most instruction forms that would previously take 16-bit data arguments were given the ability to take 32-bit arguments by setting their OperandSize to 32 bits, and instructions that could take 16-bit address arguments were given the ability to take 32-bit address arguments by setting their AddressSize to 32 bits. (Instruction forms that work on 8-bit data continue to be 8-bit regardless of OperandSize. Using a data size of 16 bits will cause only the bottom 16 bits of the 32-bit general-purpose registers to be modified; the top 16 bits are left unchanged.)
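The effect of the data size on a register write, as described in the parenthetical above, can be illustrated with a small model. This is only a sketch: the function name `write_gpr` is hypothetical, and the 8-bit case models a write to the low byte only (high-byte registers such as AH are not covered).

```python
def write_gpr(old_value: int, new_value: int, operand_size: int) -> int:
    """Model a write to a 32-bit general-purpose register on the 80386.

    Only the low `operand_size` bits are replaced; the bits above them
    keep their previous value.
    """
    if operand_size not in (8, 16, 32):
        raise ValueError("operand_size must be 8, 16 or 32")
    mask = (1 << operand_size) - 1
    # Keep the untouched upper bits, replace the lower bits.
    return (old_value & ~mask & 0xFFFFFFFF) | (new_value & mask)


eax = 0x12345678
eax_after_16 = write_gpr(eax, 0xABCD, 16)  # comparable to MOV AX, 0ABCDh
eax_after_32 = write_gpr(eax, 0xABCD, 32)  # comparable to MOV EAX, 0ABCDh
```

In this model the 16-bit write leaves the top half of EAX intact, while the 32-bit write replaces the whole register.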
The default OperandSize and AddressSize to use for each instruction are given by the D bit of the segment descriptor of the current code segment: D=0 makes both 16-bit, D=1 makes both 32-bit. Additionally, they can be overridden on a per-instruction basis with two new instruction prefixes that were introduced in the 80386:
66h: OperandSize override. Will change OperandSize from 16-bit to 32-bit if CS.D=0, or from 32-bit to 16-bit if CS.D=1.
67h: AddressSize override. Will change AddressSize from 16-bit to 32-bit if CS.D=0, or from 32-bit to 16-bit if CS.D=1.
The 80386 also introduced the two new segment registers FS and GS as well as the x86 control, debug and test registers.
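The interaction between the CS.D bit and the 66h/67h override prefixes described above can be sketched as follows. The function name `effective_sizes` is hypothetical, and the sketch assumes `prefixes` holds only the prefix bytes of a single instruction in 16-bit or 32-bit protected mode (64-bit mode is not modeled).

```python
def effective_sizes(cs_d: int, prefixes: bytes) -> tuple:
    """Return (OperandSize, AddressSize) in bits for one instruction.

    cs_d     -- the D bit of the current code segment descriptor (0 or 1)
    prefixes -- the prefix bytes that precede the opcode
    """
    default = 32 if cs_d else 16
    toggled = 16 if cs_d else 32
    # Each override prefix flips its size away from the segment default.
    operand_size = toggled if 0x66 in prefixes else default
    address_size = toggled if 0x67 in prefixes else default
    return operand_size, address_size


sizes_16bit_segment = effective_sizes(0, b"\x66")      # 32-bit data, 16-bit addresses
sizes_32bit_segment = effective_sizes(1, b"\x66\x67")  # 16-bit data, 16-bit addresses
```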
The new instructions introduced in the 80386 can broadly be subdivided into two classes:
Pre-existing opcodes that needed new mnemonics for their 32-bit OperandSize variants (e.g. CWDE, LODSD)
New opcodes that introduced new functionality (e.g. SHLD, SETcc)
For instruction forms where the operand size can be inferred from the instruction's arguments (e.g. ADD EAX,EBX can be inferred to have a 32-bit OperandSize due to its use of EAX as an argument), new instruction mnemonics are not needed and not provided.
EFLAGS.DF=1
and post-incremented by 4 otherwise.67
prefix.
For LODSD, STOSD, MOVSD, INSD and OUTSD, the REP prefix (F3) will repeat the instruction the number of times specified in rCX (CX or ECX, decided by AddressSize), decrementing rCX for each iteration (with rCX=0 resulting in a no-op and proceeding to the next instruction).
For CMPSD and SCASD, the REPE (F3) and REPNE (F2) prefixes are available, which will repeat the instruction but only as long as the flag condition (ZF=1 for REPE, ZF=0 for REPNE) holds true.
For the INSB/W/D instructions, the memory access rights for the ES:[rDI] memory address might not be checked until after the port access has been performed. If this check fails (e.g. page fault or other memory exception), then the data item read from the port is lost. As such, it is not recommended to use this instruction to access an I/O port that performs any kind of side effect upon read.
The CWDE instruction differs from the older CWD instruction in that CWD would sign-extend the 16-bit value in AX into a 32-bit value in the DX:AX register pair, whereas CWDE sign-extends AX into EAX.
For the E3 opcode (JCXZ/JECXZ), the choice of whether the instruction will use CX or ECX for its comparison (and consequently which mnemonic to use) is based on the AddressSize, not OperandSize. (OperandSize instead controls whether the jump destination should be truncated to 16 bits or not.)
The same applies to LOOPNE, LOOPE and LOOP (opcodes E0, E1 and E2, respectively); however, unlike JCXZ/JECXZ, these instructions have not been given new mnemonics for their ECX-using variants.
For PUSHA(D), the value of SP/ESP pushed onto the stack is the value it had just before the PUSHA(D) instruction started executing.
For POPA/POPAD, the stack item corresponding to SP/ESP is popped off the stack (performing a memory read), but not placed into SP/ESP.
The PUSHFD and POPFD instructions will cause a #GP exception if executed in Virtual-8086 mode if IOPL is not 3.
The PUSHF, POPF, IRET and IRETD instructions will cause a #GP exception if executed in Virtual-8086 mode if IOPL is not 3 and VME is not enabled.
If IRETD is used to return from kernel mode to user mode (which will entail a CPL change) and the user-mode stack segment indicated by SS is a 16-bit segment, then the IRETD instruction will only restore the low 16 bits of the stack pointer (ESP/RSP), with the remaining bits keeping whatever value they had in kernel code before the IRETD. This has necessitated complex workarounds on both Linux ("ESPFIX")[15] and Windows.[16] This issue also affects the later 64-bit IRETQ instruction.
For the BT, BTS, BTR and BTC instructions:
The BTS, BTC and BTR instructions accept the LOCK (F0) prefix when used with a memory argument; this results in the instruction executing atomically.
If the F3 prefix is used with the 0F BC /r opcode, then the instruction will execute as TZCNT on systems that support the BMI1 extension. TZCNT differs from BSF in that TZCNT but not BSF is defined to return operand size if the source operand is zero; for other source operand values, they produce the same result.
BSF and BSR set the EFLAGS.ZF flag to 1 if the source argument was all-0s and 0 otherwise.
If the F3 prefix is used with the 0F BD /r opcode, then the instruction will execute as LZCNT on systems that support the ABM or LZCNT extensions. LZCNT produces a different result from BSR for most input values.
For SHLD and SHRD, the shift-amount is masked: the bottom 5 bits are used for 16/32-bit operand size and 6 bits for 64-bit operand size.
SHLD and SHRD with 16-bit arguments and a shift-amount greater than 16 produce undefined results. (Actual results differ between different Intel CPUs, with at least three different behaviors known.[17])
The condition codes supported for the SETcc and Jcc near instructions (opcodes 0F 9x /0 and 0F 8x respectively, with the x nibble specifying the condition) are:
For SETcc, while the opcode is commonly specified as /0 (implying that bits 5:3 of the instruction's ModR/M byte should be 000), modern x86 processors (Pentium and later) ignore bits 5:3 and will execute the instruction as SETcc regardless of the contents of these bits.
For LFS
, LGS
and LSS
, the size of the offset part of the far pointer is given by operand size – the size of the segment part is always 16 bits. In 64-bit mode, using the REX.W
prefix with these instructions will cause them to load a far pointer with a 64-bit offset on Intel but not AMD processors.MOV
to/from the CRx
, DRx
and TRx
registers, the reg part of the ModR/M byte is used to indicate CRx/DRx/TRx
register and r/m part the general-register.
Uniquely for the MOV CRx/DRx/TRx opcodes, the top two bits of the ModR/M byte are ignored; these opcodes are decoded and executed as if the top two bits of the ModR/M byte are 11b
.CRx
and DRx
registers, the operand size is always 64 bits in 64-bit mode and 32 bits otherwise.CR0
would not serialize the instruction stream – in part for this reason, it is usually required to perform a far jump immediately after a MOV
to CR0
if such a MOV
is used to enable/disable protected mode and/or memory paging.MOV
to CR2
is architecturally listed as serializing, but has been reported to be non-serializing on at least some Intel Core-i7 processors.[18]MOV
to CR8
(introduced with x86-64) is not serializing.MOV TRx
instructions were discontinued from Pentium onwards.INT1
/ICEBP
(F1
) instruction is present on all known Intel x86 processors from the 80386 onwards,[19] but only fully documented for Intel processors from the May 2018 release of the Intel SDM (rev 067) onwards.[20] Before this release, mention of the instruction in Intel material was sporadic, e.g. AP-526 rev 001.[21]F1
(ICEBP
) opcode differs from the operation of the regular software interrupt opcode CD 01
in several ways:CD 01
will check CPL against the interrupt descriptor's DPL field as an access-rights check, while F1
will not.CD 01
will also check CPL against IOPL as an access-rights check, while F1
will not.CD 01
but not F1
.XBTS
and IBTS
instructions were discontinued with the B1 stepping of 80386.XBTS
instruction as part of its CPU detection if CPUID
is not present, and will refuse to boot if XBTS
is found to be working.[24]XBTS
and IBTS
, the r/m argument represents the data to extract/insert a bitfield from/to, the reg argument the bitfield to be inserted/extracted, AX/EAX a bit-offset and CL a bitfield length.[25]BSWAP
with 16-bit registers is not disallowed per se (it will execute without producing an #UD or other exceptions) but is documented to produce undefined results – it is reported to produce various different results on 486,[27] 586, and Bochs/QEMU.[28]CMPXCHG
instruction uses a different encoding - 0F A6 /r
for 8-bit variant, 0F A7 /r
for 16/32-bit variant. The 0F B0/B1
encodings are used on 80486 stepping B and later.[30][31]CMPXCHG
instruction sets EFLAGS
in the same way as a CMP
instruction that uses the accumulator (AL/AX/EAX/RAX) as its first argument would do.INVLPG
executes as no-operation if the m8 argument is invalid (e.g. unmapped page or non-canonical address).INVLPG
can be used to invalidate TLB entries for individual global pages.INVD
and WBINVD
instructions will invalidate all cache lines in the CPU's L1 caches. It is implementation-defined whether they will invalidate L2/L3 caches as well.INVD
instruction is not permitted and will cause a #GP(0) exception. (The PRM is needed for Intel SGX.)[32]F3
prefix is used with the 0F 09
opcode, then the instruction will execute as WBNOINVD
on processors that support the WBNOINVD extension – this will not invalidate the cache.Integer/system instructions that were not present in the basic 80486 instruction set, but were added in various x86 processors prior to the introduction of SSE. (Discontinued instructions are not included.)
WRMSR
instruction is also used to update the CPU microcode. This is done by writing the virtual address of the new microcode to upload to MSR 79h
on Intel CPUs and MSR C001_0020h
[34] on AMD CPUs.RSM
instruction were made available on non-SL variants of the Intel 486 only after the initial release of the Intel Pentium in 1993.CPUID
with a leaf index (EAX) greater than 0 may leave EBX and ECX unmodified, keeping their old values. For this reason, it is recommended to zero out EBX and ECX before executing CPUID
.CPUID
will set the top 32 bits of RAX, RBX, RCX and RDX to zero.CPUID
to ring 0. Such MSRs are documented for at least Ivy Bridge[42] and Denverton.[43]CPUID
to ring 0 also exists on AMD processors supporting the "CpuidUserDis" feature (Zen 4 "Raphael" and later).[44]CPUID
is also available on some Intel and AMD 486 processor variants that were released after the initial release of the Intel Pentium.CPUID
is not enabled by default and must be enabled through a Cyrix configuration register.CPUID
is only supported with some system BIOSes. On some NexGen CPUs that do support CPUID
, EFLAGS.ID is not supported but EFLAGS.AC is, complicating CPU detection.[45]CMPXCHG
instruction, the CMPXCHG8B
instruction does not modify any EFLAGS bits other than ZF.LOCK CMPXCHG8B
with a register operand (which is an invalid encoding) can cause hangs on some Intel Pentium CPUs (Pentium F00F bug).CMPXCHG8B
instruction is always supported, however its CPUID bit may be missing. This is a workaround for a bug in Windows NT.[46]RDTSC
and RDPMC
instructions are not ordered with respect to other instructions, and may sample their respective counters before earlier instructions are executed or after later instructions have executed. Invocations of RDPMC
(but not RDTSC
) may be reordered relative to each other even for reads of the same counter.LFENCE
or serializing instructions (e.g. CPUID
) are needed.[47]8000_0007:EDX[8]
).RDTSC
can be run outside Ring 0 only if CR4.TSD=0
.RDTSC
cannot be run in Virtual-8086 mode.[51] Later processors removed this restriction.RDPMC
can be run outside Ring 0 only if CR4.PCE=1
.RDPMC
instruction is not present in VIA processors prior to the Nano.CMOVcc
instruction (opcode 0F 4x /r
, with the x nibble specifying the condition) are:CMOVcc
with a 32-bit operand size will clear the upper 32 bits of the destination register even if the condition is false.CMOVcc
with a memory source operand, the CPU will always read the operand from memory – potentially causing memory exceptions and cache line-fills – even if the condition for the move is not satisfied. (The Intel APX extension defines a set of new EVEX-encoded variants of CMOVcc
that will suppress memory exceptions if the condition is false.)reg,reg
but not reg,[mem]
forms of the CMOVcc
instructions have been reported to be present as undocumented instructions.[52]For cases where there is a need to use more than 9 bytes of NOP padding, it is recommended to use multiple NOPs.
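The recommendation above to build long padding out of several NOPs can be sketched as a simple greedy split. The byte sequences in `LONG_NOPS` are the 1- to 9-byte forms recommended in the Intel SDM; they are not reproduced in this section and are included here as an assumption about which encodings an assembler would choose. The function name `nop_padding` is hypothetical.

```python
# Recommended NOP encodings by length (1..9 bytes), per the Intel SDM.
LONG_NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("6690"),
    3: bytes.fromhex("0f1f00"),
    4: bytes.fromhex("0f1f4000"),
    5: bytes.fromhex("0f1f440000"),
    6: bytes.fromhex("660f1f440000"),
    7: bytes.fromhex("0f1f8000000000"),
    8: bytes.fromhex("0f1f840000000000"),
    9: bytes.fromhex("660f1f840000000000"),
}


def nop_padding(length: int) -> bytes:
    """Return `length` bytes of padding built from NOPs of at most 9 bytes each."""
    if length < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while length > 0:
        chunk = min(length, 9)  # greedy: use the longest NOP that still fits
        out += LONG_NOPS[chunk]
        length -= chunk
    return bytes(out)


padding = nop_padding(20)  # 9 + 9 + 2 bytes
```

A greedy split is only one possible choice; real assemblers may prefer other splits for decoder-throughput reasons.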
0F 1F /0 as long-NOP was introduced in the Pentium Pro, but remained undocumented until 2006.[55]
The whole 0F 18..1F opcode range was NOP in Pentium Pro. However, except for 0F 1F /0, Intel does not guarantee that these opcodes will remain NOP in future processors, and has indeed assigned some of these opcodes to other instructions in at least some processors.[56]
While the 0F 0B
opcode was officially reserved as an invalid opcode from Pentium onwards, it only got assigned the mnemonic UD2
from Pentium Pro onwards.[59]UD2A
and UD2B
mnemonics for the 0F 0B
and 0F B9
opcodes since version 2.7.[60]UD2A
nor UD2B
originally took any arguments - UD2B
was later modified to accept a ModR/M byte, in Binutils version 2.30.[61]UD2
(0F 0B
) instruction will additionally stop subsequent bytes from being decoded as instructions, even speculatively. For this reason, if an indirect branch instruction is followed by something that is not code, it is recommended to place an UD2
instruction after the indirect branch.[62]0F 0B
, 0F B9
and 0F FF
- will cause an #UD exception on all x86 processors from the 80186 onwards (except NEC V-series processors), but did not get explicitly reserved for this purpose until P5-class processors.0F B9
opcode was officially reserved as an invalid opcode from Pentium onwards, it only got assigned its mnemonic UD1
much later – AMD APM started listing UD1
in its opcode maps from rev 3.17 onwards,[64] while Intel SDM started listing it from rev 061 onwards.[65]0F B9
and 0F FF
opcodes, different x86 implementations are known to differ regarding whether the opcodes accept a ModR/M byte.[66][67]0F FF
opcode, the OIO
mnemonic was introduced by Cyrix,[68] while the UD0
mnemonic (without arguments) was introduced by AMD and Intel at the same time as the UD1
mnemonic for 0F B9
.[64][65] Later Intel (but not AMD) documentation modified its description of UD0
to add a ModR/M byte and take two arguments.[69]SYSCALL
/SYSRET
instructions were available on Model 7 (250nm "Little Foot") and later, not on the earlier Model 6.[71]SYSCALL
and SYSRET
were made an integral part of x86-64 – as a result, the instructions are available in 64-bit mode on all x86-64 processors from AMD, Intel, VIA and Zhaoxin.SYSRET
differs slightly between AMD and Intel processors: non-canonical return addresses cause a #GP exception to be thrown in Ring 3 on AMD CPUs but Ring 0 on Intel CPUs. This has been known to cause security issues.[72]SYSRET
and SYSEXIT
instructions under x86-64, it is necessary to add the REX.W
prefix for variants that will return to 64-bit user-mode code.REX.W
prefix are used to return to 32-bit user-mode code. (Neither of these instructions can be used to return to 16-bit user-mode code.)SYSRET
, SYSENTER
and SYSEXIT
instructions are unavailable in Real mode. (SYSENTER
is, however, available in Virtual 8086 mode.)CPUID
flags that indicate support for SYSENTER
/SYSEXIT
are set on the Pentium Pro, even though the processor does not officially support these instructions.[73]SYSENTER
and SYSEXIT
instructions are not available in x86-64 long mode (#UD).SYSENTER
and SYSEXIT
instructions are only available with version 4.2 or higher of the Transmeta Code Morphing software.[76]SYSENTER
and SYSEXIT
are available only on stepping 8 and later.[77]These instructions can only be encoded in 64 bit mode. They fall in four groups:
MOVSXD
replacing ARPL
)SWAPGS
)JRCXZ
)
Most instructions with a 64-bit operand size encode this using a REX.W prefix; in the absence of the REX.W prefix, the corresponding instruction with 32-bit operand size is encoded. This mechanism also applies to most other instructions with 32-bit operand size. These are not listed here as they do not gain a new mnemonic in Intel syntax when used with a 64-bit operand size.
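The REX.W mechanism described above can be illustrated with a minimal sketch. The function name `operand_size_in_64bit_mode` is hypothetical, and the sketch assumes an instruction whose default operand size is 32 bits, with no legacy prefixes, so that a REX prefix (if any) is the first byte of the encoding.

```python
def operand_size_in_64bit_mode(encoding: bytes) -> int:
    """Return 64 if the encoding starts with a REX prefix that has W set, else 32.

    Assumes 64-bit mode, a default operand size of 32 bits and no legacy
    prefixes (66h, F2h, F3h, segment overrides) in front of the REX byte.
    """
    first = encoding[0]
    is_rex = (first & 0xF0) == 0x40   # REX prefixes occupy 40h..4Fh
    rex_w = bool(first & 0x08)        # bit 3 of the REX byte is W
    return 64 if is_rex and rex_w else 32


add_eax_ebx = bytes.fromhex("01d8")    # ADD EAX,EBX
add_rax_rbx = bytes.fromhex("4801d8")  # ADD RAX,RBX: same opcode with REX.W
size_without_rex_w = operand_size_in_64bit_mode(add_eax_ebx)
size_with_rex_w = operand_size_in_64bit_mode(add_rax_rbx)
```

Both encodings use the mnemonic ADD, which is why such instructions are not listed separately.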
CMPXCHG16B
must be 16-byte aligned.CMPXCHG16B
instruction was absent from a few of the earliest Intel/AMD x86-64 processors. On Intel processors, the instruction was missing from Xeon "Nocona" stepping D,[78] but added in stepping E.[79] On AMD K8 family processors, it was added in stepping F, at the same time as DDR2 support was introduced.[80]CMPXCHG16B
has its own CPUID flag, separate from the rest of x86-64.MOVSXD
without REX.W prefix are permitted but discouraged[81] – such encodings behave identically to 16/32-bit MOV
(8B /r
).Bit manipulation instructions. For all of the VEX-encoded instructions defined by BMI1 and BMI2, the operand size may be 32 or 64 bits, controlled by the VEX.W bit – none of these instructions are available in 16-bit variants.
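The notes in this section relate LZCNT to BSR and TZCNT to BSF, and describe how BEXTR treats out-of-range bit positions. A reference model of those results might look like the sketch below. The function names are hypothetical, `None` stands in for an undefined result, and flag updates are not modeled.

```python
from typing import Optional


def _mask(size: int) -> int:
    if size not in (16, 32, 64):
        raise ValueError("operand size must be 16, 32 or 64")
    return (1 << size) - 1


def bsr(src: int, size: int) -> Optional[int]:
    """Index of the highest set bit; undefined (None) for a zero source."""
    src &= _mask(size)
    return src.bit_length() - 1 if src else None


def lzcnt(src: int, size: int) -> int:
    """Number of leading zero bits; equals the operand size for a zero source."""
    src &= _mask(size)
    return size - src.bit_length()


def bsf(src: int, size: int) -> Optional[int]:
    """Index of the lowest set bit; undefined (None) for a zero source."""
    src &= _mask(size)
    return (src & -src).bit_length() - 1 if src else None


def tzcnt(src: int, size: int) -> int:
    """Same as BSF for non-zero sources; the operand size for a zero source."""
    result = bsf(src, size)
    return size if result is None else result


def bextr(src: int, start: int, length: int, size: int) -> int:
    """Extract `length` bits starting at bit `start`; start/length range over 0..255."""
    if size not in (32, 64):
        raise ValueError("BEXTR operand size is 32 or 64 bits")
    if not (0 <= start <= 255 and 0 <= length <= 255):
        raise ValueError("start and length are 8-bit fields")
    src &= (1 << size) - 1
    # Bits beyond the end of the operand read as 0, which the shift provides.
    return (src >> start) & ((1 << length) - 1)
```

For example, `lzcnt(1, 32)` gives 31 while `bsr(1, 32)` gives 0, which shows why running LZCNT code on a processor that decodes it as BSR yields different results for most inputs.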
On AMD CPUs, the "ABM" extension provides both POPCNT and LZCNT. On Intel CPUs, however, the CPUID bit for "ABM" is only documented to indicate the presence of the LZCNT instruction and is listed as "LZCNT", while POPCNT has its own separate CPUID feature bit.
However, all known processors that implement the "ABM"/"LZCNT" extensions also implement POPCNT and set the CPUID feature bit for POPCNT, so the distinction is theoretical only.
(The converse is not true: there exist processors that support POPCNT but not ABM, such as Intel Nehalem and VIA Nano 3000.)
The LZCNT instruction will execute as BSR on systems that do not support the LZCNT or ABM extensions. BSR computes the index of the highest set bit in the source operand, producing a different result from LZCNT for most input values.
The TZCNT instruction will execute as BSF on systems that do not support the BMI1 extension. BSF produces the same result as TZCNT for all input operand values except zero, for which TZCNT returns input operand size, but BSF produces undefined behavior (leaves destination unmodified on most modern CPUs).
For BEXTR, the start position and length are not masked and can take values from 0 to 255. If the selected bits extend beyond the end of the r/m argument (which has the usual 32/64-bit operand size), then the excess bits are read out as 0.
On AMD processors prior to Zen 3, the PEXT and PDEP instructions are quite slow[82] and exhibit data-dependent timing due to the use of a microcoded implementation (about 18 to 300 cycles, depending on the number of bits set in the mask argument). As a result, it is often faster to use other instruction sequences on these processors.[83][84]
Intel CET (Control-Flow Enforcement Technology) adds two distinct features to help protect against security exploits such as return-oriented programming: a shadow stack (CET_SS), and indirect branch tracking (CET_IBT).
The RDSSPD and RDSSPQ instructions act as NOPs on processors where shadow stacks are disabled or CET is not supported.
ENDBR32 and ENDBR64 act as NOPs on processors that don't support CET_IBT or where IBT is disabled.
The XSAVE instruction set extensions are designed to save/restore CPU extended state (typically for the purpose of context switching) in a manner that can be extended to cover new instruction set extensions without the OS context-switching code needing to understand the specifics of the new extensions. This is done by defining a series of state-components, each with a size and offset within a given save area, and each corresponding to a subset of the state needed for one CPU extension or another. The EAX=0Dh CPUID leaf is used to provide information about which state-components the CPU supports and what their sizes/offsets are, so that the OS can reserve the proper amount of space and set the associated enable-bits.
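How an OS might turn the per-component sizes and offsets into a save-area size can be sketched as follows. This models only the standard (non-compacted) layout. The offsets and sizes in `EXAMPLE_LAYOUT` are example values of the kind reported by CPUID leaf 0Dh on some processors and are assumptions here; real code would query the CPU rather than hard-code them. The names `EXAMPLE_LAYOUT` and `xsave_area_size` are hypothetical.

```python
# The first 512 bytes hold x87/SSE state (components 0 and 1), followed by a
# 64-byte header; extended components start after that.
LEGACY_AND_HEADER_SIZE = 512 + 64

# state-component bit -> (offset, size) in bytes. Example values only.
EXAMPLE_LAYOUT = {
    2: (576, 256),    # AVX (upper halves of YMM registers)
    5: (1088, 64),    # AVX-512 opmask registers
    6: (1152, 512),   # AVX-512 upper halves of ZMM0..15
    7: (1664, 1024),  # AVX-512 ZMM16..31
}


def xsave_area_size(enabled_mask: int, layout: dict) -> int:
    """Bytes needed for a standard-format save area covering the enabled components."""
    size = LEGACY_AND_HEADER_SIZE
    for bit, (offset, length) in layout.items():
        if (enabled_mask >> bit) & 1:
            # The area must reach the end of the furthest enabled component.
            size = max(size, offset + length)
    return size


size_x87_sse_avx = xsave_area_size(0b111, EXAMPLE_LAYOUT)
```

The point of the design is visible in the loop: the OS code never needs to know what a component contains, only where it lives and how large it is.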
The XSAVE* and XRSTOR* instructions cannot be encoded with the REX2 prefix.
On some processors, executing XGETBV with ECX=1 is permitted; this will not return XCR1 (no such register exists) but instead return XCR0 bitwise-ANDed with the current value of the "XINUSE" state-component bitmap (a bitmap of XSAVE state-components that are not known to be in their initial state).
The presence of this functionality of XGETBV is indicated by CPUID.(EAX=0Dh,ECX=1):EAX[bit 2].
The PREFETCH* instructions are hint instructions with effects only on performance, not program semantics. Providing an invalid address (e.g. address of an unmapped page or a non-canonical address) will cause the instruction to act as a NOP without any exceptions generated.
For the SFENCE, LFENCE and MFENCE instructions, the bottom 3 bits of the ModR/M byte are ignored, and any value of x in the range 0..7 will result in a valid instruction.
The SFENCE instruction ensures that all memory stores after the SFENCE instruction are made globally observable after all memory stores before the SFENCE. This imposes ordering on stores that can otherwise be reordered, such as non-temporal stores and stores to WC (Write-Combining) memory regions.[89]
SFENCE also acts as a reordering barrier on cache flushes/writebacks performed with the CLFLUSH, CLFLUSHOPT and CLWB instructions. (Older AMD CPUs require MFENCE to order CLFLUSH.)
SFENCE is not ordered with respect to LFENCE, and an SFENCE+LFENCE sequence is not sufficient to prevent a load from being reordered past a previous store.[90] To prevent such reordering, it is necessary to execute an MFENCE, LOCK or a serializing instruction.
The LFENCE instruction ensures that all memory loads after the LFENCE instruction are made globally observable after all memory loads before the LFENCE.
On Intel CPUs, the LFENCE instruction provides a stronger ordering guarantee:[91] it is dispatch-serializing, meaning that instructions after the LFENCE instruction are allowed to start executing only after all instructions before it have retired (which will ensure that all preceding loads but not necessarily stores have completed). The effect of dispatch-serialization is that LFENCE also acts as a speculation barrier and a reordering barrier for accesses to non-memory resources such as performance counters (accessed through e.g. RDTSC or RDPMC) and x2apic MSRs.
On AMD CPUs, LFENCE is not necessarily dispatch-serializing by default; however, on all AMD CPUs that support any form of non-dispatch-serializing LFENCE, it can be made dispatch-serializing by setting bit 1 of MSR C001_1029.[92]
The MFENCE instruction ensures that all memory loads, stores and cacheline-flushes after the MFENCE instruction are made globally observable after all memory loads, stores and cacheline-flushes before the MFENCE.
On Intel CPUs, MFENCE is not dispatch-serializing, and therefore cannot be used to enforce ordering on accesses to non-memory resources such as performance counters and x2apic MSRs. MFENCE is still ordered with respect to LFENCE, so if a memory barrier with dispatch serialization is needed, then it can be obtained by issuing an MFENCE followed by an LFENCE.[47]
On AMD CPUs, MFENCE is serializing.
The PAUSE instruction in 64-bit mode is, unlike NOP, unaffected by the presence of the REX.B prefix bit. Neither NOP nor PAUSE is affected by the other bits of the REX prefix. A few examples of opcode 90 with various prefixes in 64-bit mode are:
90 is NOP
41 90 is XCHG R8D,EAX
4E 90 is NOP
49 90 is XCHG R8,RAX
F3 90 is PAUSE
F3 41 90 is PAUSE
F3 4F 90 is PAUSE
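The example encodings listed above can be reproduced with a small classifier. This sketch assumes that the REX bit which turns opcode 90 into an XCHG is REX.B (bit 0 of the REX byte), as the 41 and 49 examples indicate, and that REX.W selects the 64-bit form. Only the F3 and REX prefixes are handled, and the function name `classify_opcode_90` is hypothetical.

```python
def classify_opcode_90(encoding: bytes) -> str:
    """Classify an optional F3 prefix, optional REX prefix and opcode 90 in 64-bit mode."""
    *prefixes, opcode = encoding
    if opcode != 0x90:
        raise ValueError("only opcode 90 is handled")
    # A REX prefix, if present, is the byte immediately before the opcode.
    rex = prefixes[-1] if prefixes and (prefixes[-1] & 0xF0) == 0x40 else 0
    if 0xF3 in prefixes:
        return "PAUSE"              # F3 90 is PAUSE regardless of REX bits
    if rex & 0x01:                  # REX.B extends the register in the opcode
        return "XCHG R8,RAX" if rex & 0x08 else "XCHG R8D,EAX"
    return "NOP"                    # other REX bits do not change NOP


results = {
    hex_bytes: classify_opcode_90(bytes.fromhex(hex_bytes))
    for hex_bytes in ("90", "4190", "4e90", "4990", "f390", "f34190", "f34f90")
}
```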
PAUSE
instruction is implementation-dependent.PAUSE
will execute as NOP.PAUSE
many times in a short time interval may cause a #VMEXIT. The number of PAUSE
executions and interval length that can trigger #VMEXIT are platform-specific.CLFLUSH
instruction was introduced together with SSE2, it has its own CPUID flag and may be present on processors not otherwise implementing SSE2 and/or absent from processors that otherwise implement SSE2. (E.g. AMD Geode LX supports CLFLUSH
but not SSE2.)MONITOR
and MWAIT
instructions were introduced at the same time as SSE3, they have their own CPUID flag that needs to be checked separately from the SSE3 CPUID flag (e.g. Athlon 64 X2 and VIA C7 supported SSE3 but not MONITOR.)MONITOR
and MWAIT
instructions, older Intel documentation[93] lists instruction mnemonics with explicit operands (MONITOR EAX,ECX,EDX
and MWAIT EAX,ECX
), while newer documentation omits these operands. Assemblers/disassemblers may support one or both of these variants.[94] MONITOR
, the DS: segment can be overridden with a segment prefix.MONITOR
instruction. As such, the instruction requires ECX=0 and ignores EDX.MONITOR
and MWAIT
to run in Ring 3.MWAIT
may be ended by system events other than a memory write (e.g. cacheline evictions, interrupts) – the exact set of events that can cause the wait to end is implementation-specific.MONITOR
before using MWAIT
to wait for memory writes again.MWAIT
in the ECX register are:MWAIT
in the EAX register are:The C-states are processor-specific power states, which do not necessarily correspond 1:1 to ACPI C-states.
GETSEC
instruction, the REX.W
prefix enables 64-bit addresses for the EXITAC leaf function only - REX prefixes are otherwise permitted but ignored for the instruction.GETSEC
(selected by EAX) are:Any unsupported value in EAX causes an #UD exception.
GETSEC
, most leaf functions are restricted to Ring 0, but the CAPABILITIES (EAX=0) and PARAMETERS (EAX=6) leaf functions are available in Ring 3.RDTSCP
and RDPID
is actually the TSC_AUX
MSR (MSR C000_0103h
). Whether this value actually corresponds to a processor ID is a matter of operating system convention.RDTSC
instruction, RDTSCP
will delay the TSC read until all previous instructions have retired, guaranteeing ordering with respect to preceding memory loads (but not stores). RDTSCP
is not ordered with respect to subsequent instructions, though.RDTSCP
can be run outside Ring 0 only if CR4.TSD=0
.RDTSCP
was added in stepping F of the AMD K8, and is not available on earlier steppings.POPCNT
instruction was introduced at the same time as SSE4.2, it is not considered to be a part of SSE4.2, but instead a separate extension with its own CPUID flag.INVPCID
(selected by register argument) are:Any unsupported value in the register argument causes a #GP exception.
INVLPG
instruction, INVPCID
will cause a #GP exception if the provided memory address is non-canonical. This discrepancy has been known to cause security issues.[100]PREFETCH
and PREFETCHW
instructions are mandatory parts of the 3DNow! instruction set extension, but are also available as a standalone extension on systems that do not support 3DNow!PREFETCH
and PREFETCHW
(0F 0D /r
) execute as NOPs on Intel CPUs from Cedar Mill (65nm Pentium 4) onwards, with PREFETCHW
gaining prefetch functionality from Broadwell onwards.PREFETCH
(0F 0D /0
) instruction is a 3DNow! instruction, present on all processors with 3DNow! but not necessarily on processors with the PREFETCHW extension.0F 0D /0
as well as opcodes 0F 0D /2../7
are all documented to be performing prefetch.0F 0D /2
being PREFETCHWT1 m8
on Xeon Phi only) – third party testing[102] indicates that some or all of these opcodes may be performing prefetch on at least some Intel Core CPUs.RDTSCP
instruction which can also be used to read the processor ID, user-mode RDPID
is not disabled by CR4.TSD=1
.MOVDIR64
, the destination address given by ES:reg must be 64-byte aligned.67h
prefix.WBNOINVD
instruction will execute as WBINVD
if run on a system that doesn't support the WBNOINVD extension.WBINVD
differs from WBNOINVD
in that WBINVD
will invalidate all cache lines after writeback.HWNT
and HST
are listed in early Willamette documentation only[104] - later Intel documentation lists the branch hint prefixes without assigning them a mnemonic.[105]2E
and 3E
prefixes are interpreted as branch hints only when used with the Jcc
conditional branch instructions (opcodes 70..7F
and 0F 80..8F
) - when used with other opcodes, they may take other meanings (e.g. for instructions with memory operands outside 64-bit mode, they will work as segment-override prefixes CS:
and DS:
, respectively). On processors that don't support branch hints, these prefixes are accepted but ignored when used with Jcc
.ENCLS
(selected by EAX) are:
Any unsupported value in EAX causes a #GP exception.
The functions of ENCLU (selected by EAX) are:
Any unsupported value in EAX causes a #GP exception.
The EENTER and ERESUME functions cannot be executed inside an SGX enclave; the other functions can only be executed inside an enclave.
ENCLU can only be executed in ring 3, not rings 0/1/2.
The functions of ENCLV (selected by EAX) are:
Any unsupported value in EAX causes a #GP exception.
The ENCLV instruction is only present on systems that support the EPC Oversubscription Extensions to SGX ("OVERSUB").
ENCLV is only available if Intel VMX operation is enabled with VMXON, and will produce #UD otherwise.
For PTWRITE, the write to the Processor Trace Packet will only happen if a set of enable-bits (the "TriggerEn", "ContextEn" and "FilterEn" bits of the RTIT_STATUS MSR and the "PTWEn" bit of the RTIT_CTL MSR) are all set to 1.
The PTWRITE instruction is indicated in the SDM to cause an #UD exception if the 66h instruction prefix is used, regardless of other prefixes.
The functions of PCONFIG (selected by EAX) are:
Any unsupported value in EAX causes a #GP(0) exception.
For CLDEMOTE, the cache level that it will demote a cache line to is implementation-dependent.
For UMONITOR, the operand size of the address argument is given by the address size, which may be overridden by the 67h prefix. The default segment used is DS:, which can be overridden with a segment prefix.
For the UMWAIT and TPAUSE instructions, the operating system can use the IA32_UMWAIT_CONTROL MSR to limit the maximum amount of time that a single UMWAIT/TPAUSE invocation is permitted to wait. The UMWAIT and TPAUSE instructions will set RFLAGS.CF to 1 if they reached the IA32_UMWAIT_CONTROL-defined time limit and 0 otherwise.
TPAUSE and UMWAIT can be run outside Ring 0 only if CR4.TSD=0.
For the UMWAIT and TPAUSE instructions, the following flag bits are supported:
Although CPUID and IRET are also serializing, these instructions perform additional functions, causing side-effects and reduced performance when stand-alone instruction serialization is needed. (CPUID additionally has the issue that it causes a mandatory #VMEXIT when executed under virtualization, which causes a very large overhead.) The SERIALIZE
instruction performs serialization only, avoiding these added costs.
HRESET is provided by CPUID.(EAX=20h,ECX=0):EBX.
The argument to SENDUIPI is an index to pick an entry from the UITT (User-Interrupt Target Table, a table specified by the new UINTR_TT and UINT_MISC MSRs).
UIRET instruction always sets UIF (User Interrupt Flag) to 1. On Sierra Forest and later processors, UIRET will set UIF to the value of bit 1 of the value popped off the stack for RFLAGS; this functionality is indicated by CPUID.(EAX=7,ECX=1):EDX[17].
The condition codes for the CMPccXADD instructions (opcode VEX.128.66.0F38 Ex /r, with the x nibble specifying the condition) are:
Although the CMPccXADD instructions perform a locked memory operation, they do not require or accept the LOCK (F0h) prefix; attempting to use this prefix results in #UD.
REX.R
prefix, e.g. 44 0F 20 07 (MOV RDI,CR8). However, the REX.R prefix is only available in 64-bit mode.
F0 (LOCK) prefix instead of REX.R – this provides access to CR8 outside 64-bit mode.
11b.
LOCK prefix with the REX.R prefix is not permitted and will cause an #UD exception.
For CLZERO, the address size and 67h prefix control whether to use AX, EAX or RAX as address. The default segment DS: can be overridden by a segment-override prefix. The provided address does not need to be aligned; hardware will align it as necessary.
The CLZERO
instruction is intended for recovery from otherwise-fatal Machine Check errors. It is non-cacheable, cannot be used to allocate a cache line without a memory access, and should not be used for fast memory clears.[114]
RDPRU does not necessarily match that of RDMSR/WRMSR.
RDPRU as of December 2022 are:
Unsupported values in ECX return 0.
If CR4.TSD=1, then the RDPRU instruction can only run in ring 0.

The x87 coprocessor, if present, provides support for floating-point arithmetic. The coprocessor provides eight data registers, each holding one 80-bit floating-point value (1 sign bit, 15 exponent bits, 64 mantissa bits). These registers are organized as a stack, with the top-of-stack register referred to as "st" or "st(0)", and the other registers referred to as st(1), st(2), ... st(7). It additionally provides a number of control and status registers, including "PC" (precision control, to control whether floating-point operations should be rounded to 24, 53 or 64 mantissa bits), "RC" (rounding control, to pick the rounding mode: round-to-zero, round-to-positive-infinity, round-to-negative-infinity or round-to-nearest-even) and a 4-bit condition code register "CC", whose four bits are individually referred to as C0, C1, C2 and C3. Not all of the arithmetic instructions provided by x87 obey PC and RC.
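The 80-bit register layout described above (1 sign bit, 15 exponent bits, 64 mantissa bits) can be illustrated with a small decoder. This is a minimal sketch rather than a reference implementation: the function names are hypothetical, and it assumes the usual little-endian 10-byte memory image, an exponent bias of 16383 and an explicit integer bit in bit 63 of the mantissa. It does not classify special encodings such as unnormals.

```python
from fractions import Fraction

EXPONENT_BIAS = 16383  # assumed bias of the 15-bit exponent field


def split_x87_extended(raw: bytes) -> tuple[int, int, int]:
    """Split a 10-byte little-endian x87 value into (sign, exponent, mantissa)."""
    if len(raw) != 10:
        raise ValueError("an x87 extended value is exactly 10 bytes")
    bits = int.from_bytes(raw, "little")
    mantissa = bits & ((1 << 64) - 1)  # low 64 bits, including the explicit integer bit
    exponent = (bits >> 64) & 0x7FFF   # next 15 bits
    sign = bits >> 79                  # top bit
    return sign, exponent, mantissa


def x87_extended_to_fraction(raw: bytes) -> Fraction:
    """Return the exact value of a finite x87 extended-precision number."""
    sign, exponent, mantissa = split_x87_extended(raw)
    if exponent == 0x7FFF:
        raise ValueError("infinity or NaN has no finite value")
    # An exponent field of 0 (zero/denormal) uses the same scale as a field value of 1.
    scale = max(exponent, 1) - EXPONENT_BIAS - 63
    magnitude = Fraction(mantissa) * Fraction(2) ** scale
    return -magnitude if sign else magnitude
```

For example, the byte string `00 00 00 00 00 00 00 80 FF 3F` has sign 0, exponent field 3FFFh and only the integer bit set in the mantissa, so it decodes to 1.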
WAIT instruction is executed.
For each x87 instruction whose mnemonic begins with FN, there exists a pseudo-instruction that has the same mnemonic except without the N. These pseudo-instructions consist of a WAIT instruction (opcode 9B) followed by the corresponding non-waiting x87 instruction. For example:
FNCLEX is an instruction with the opcode DB E2. The corresponding pseudo-instruction FCLEX is then encoded as 9B DB E2.
FNSAVE ES:[BX+6] is an instruction with the opcode 26 DD 77 06. The corresponding pseudo-instruction FSAVE ES:[BX+6] is then encoded as 9B 26 DD 77 06.
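The encoding rule for the waiting pseudo-instructions can be sketched in a few lines. The helper name `waiting_form` is hypothetical; it simply places the WAIT opcode (9B) in front of the complete non-waiting encoding, including any segment-override prefix, as in the two examples above.

```python
WAIT_OPCODE = 0x9B  # opcode of the WAIT instruction


def waiting_form(non_waiting: bytes) -> bytes:
    """Return the encoding of the waiting pseudo-instruction:
    WAIT followed by the unchanged non-waiting (FN...) encoding."""
    return bytes([WAIT_OPCODE]) + non_waiting


# FNCLEX (DB E2) becomes FCLEX
print(waiting_form(bytes.fromhex("DB E2")).hex(" ").upper())        # 9B DB E2
# FNSAVE ES:[BX+6] (26 DD 77 06) becomes FSAVE ES:[BX+6]
print(waiting_form(bytes.fromhex("26 DD 77 06")).hex(" ").upper())  # 9B 26 DD 77 06
```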
FLDENV, F(N)STENV, FRSTOR and F(N)SAVE exist in 16-bit and 32-bit variants. The 16-bit variants will load/store a 14-byte floating-point environment data structure to/from memory; the 32-bit variants will load/store a 28-byte data structure instead. (F(N)SAVE/FRSTOR will additionally load/store 80 bytes of FPU data register content after the FPU environment, for a total of 94 or 108 bytes.) The choice between the 16-bit and 32-bit variants is based on the CS.D bit and the presence of the 66h instruction prefix. On 8087 and 80287, only the 16-bit variants are available.
Using REX.W under x86-64 will cause the 32-bit variants to be used. Since these can only load/store the bottom 32 bits of FIP and FDP, it is recommended to use FXSAVE64/FXRSTOR64 instead if 64-bit operation is desired.
F(N)DISI and F(N)ENI instructions to set/clear the Interrupt Mask bit (bit 7) of the x87 Control Word,[117] to control the interrupt.
F(N)ENI and F(N)DISI instructions were kept for backwards compatibility, executing as NOPs that do not modify any x87 state.
FST/FSTP with an 80-bit destination (m80 or st(i)) and an sNaN source value will produce exceptions on AMD but not Intel FPUs.
FSTP ST(0) is a commonly used idiom for popping a single register off the x87 register stack.
On 8087 and 80287, FBSTP
and the load-constant instructions always use the round-to-nearest rounding mode. On the 80387 and later x87 FPUs, these instructions will use the rounding mode specified in the x87 RC register.
For the FADDP, FSUBP, FSUBRP, FMULP, FDIVP, FDIVRP, FCOM, FCOMP and FXCH instructions, x86 assemblers/disassemblers may recognize variants of the instructions with no arguments. Such variants are equivalent to variants using st(1) as their first argument.
FXCH is implemented as a register renaming rather than a true data move. This has no semantic effect, but enables zero-cycle-latency operation. It also allows the instruction to break data dependencies for the x87 top-of-stack value, improving attainable performance for code optimized for these processors.
The result of the FBLD instruction on non-BCD data is undefined.
FPREM and FPATAN.[124]
The FXAM instruction will set C0, C2 and C3 based on value type in st(0) as follows:
C1 is set to the sign-bit of st(0), regardless of whether st(0) is Empty or not.
For FXTRACT, if st(0) is zero or ±∞, then M is set equal to st(0). If st(0) is zero, E is set to 0 on 8087/80287 but -∞ on 80387 and later. If st(0) is ±∞, then E is set to +∞.
For FPREM, if the quotient Q is too large, then the remainder calculation may have been done only partially. In this case, the FPREM instruction will need to be run again in order to complete the remainder calculation. This is indicated by the instruction setting C2 to 1.
Otherwise, the instruction will set C2 to 0 and set the three bits {C0,C3,C1} to the bottom three bits of the quotient Q.
The FPREM instruction is always exact with no roundoff errors.
For the FSCALE instruction on 8087 and 80287, st(1) is required to be in the range . Also, its absolute value must be either 0 or at least 1. If these requirements are not satisfied, the result is undefined.
For FSCALE, rounding is only applied in the case of overflow, underflow or subnormal result.
For the FYL2X and FYL2XP1 instructions, the maximum error bound of ±1 ulp only holds for st(1)=1.0; for other values of st(1), the error bound is increased to ±1.35 ulps.
For FPATAN, the following adjustments are done as compared to just computing a one-argument arctangent of the ratio :
While FNOP is a no-op in the sense that it will leave the x87 FPU register stack unmodified, it may still modify FIP and CC, and it may fault if a pending x87 FPU exception is present.
The data structure used by the F(N)SAVE
, FRSTOR, FLDENV and F(N)STENV instructions has different formats in Real Mode and Protected Mode. On 80287, the F(N)SETPM instruction is required to communicate the real-to-protected mode transition to the FPU. On 80387 and later x87 FPUs, real↔protected mode transitions are communicated automatically to the FPU without the need for any dedicated instructions; therefore, on these FPUs, FNSETPM executes as a NOP that does not modify any FPU state.
For the FUCOM and FUCOMP instructions, x86 assemblers/disassemblers may recognize variants of the instructions with no arguments. Such variants are equivalent to variants using st(1) as their first argument.
The FPREM1 instruction differs from the older FPREM (D9 F8) instruction in that the quotient Q is rounded to integer with round-to-nearest-even rounding rather than the round-to-zero rounding used by FPREM. Like FPREM, FPREM1 always computes an exact result with no roundoff errors. Like FPREM, it may also perform a partial computation if the quotient is too large, in which case it must be run again.
FSIN, FCOS and FSINCOS is not precisely 1.0, but instead given by[125][126]
This argument reduction inaccuracy also affects the FPTAN
instruction.
The FCOMI, FCOMIP, FUCOMI and FUCOMIP instructions write their results to the ZF, CF and PF bits of the EFLAGS register. On Intel but not AMD processors, the SF, AF and OF bits of EFLAGS are also zeroed out by these instructions.
The FXSAVE and FXRSTOR instructions were added in the "Deschutes" revision of Pentium II, and are not present in the earlier "Klamath" revision.
The FXSAVE and FXRSTOR instructions will save/restore SSE state only on processors that support SSE. Otherwise, they will only save/restore x87 and MMX state.
The data structure of FXSAVE/FXRSTOR has a completely different layout than the data structure of the older F(N)SAVE/FRSTOR instructions, enabling faster save/restore by avoiding misaligned loads and stores.
When CR0.EM=1, FXSAVE(64) and FXRSTOR(64) are considered to be x87 instructions and will accordingly produce an #NM (device-not-available) exception. Other than WAIT, these are the only opcodes outside the D8..DF ESC opcode space that exhibit this behavior. (All opcodes in D8..DF will produce #NM if CR0.EM=1, even for undefined opcodes that would produce #UD otherwise.)
Unlike the F(N)SAVE instruction, FXSAVE will not initialize the FPU after saving its state to memory, but instead leave the x87 coprocessor state unmodified.

x86 also includes discontinued instruction sets which are no longer supported by Intel and AMD, and undocumented instructions which execute but are not officially documented.
INT1 on page 86.
The instruction brings down the upper word of the doubleword register without affecting its upper 16 bits.
internal (zero-)extending the value of a smaller (16-bit) register … applying the bswap to a 32-bit value "00 00 AH AL", … truncated to lower 16-bits, which are "00 00". … Bochs … bswap reg16 acts just like the bswap reg32 … QEMU … ignores the 66h prefix
CMPXCHG with 0F A6/A7 encodings.
CMPXCHG with 0F B0/B1 encodings.
CPUID instruction)
RDTSC instruction on p.1739 describes the instruction sequences required to order the RDTSC instruction with respect to earlier and later instructions.
UD2A and UD2B instruction mnemonics to GNU Binutils. Archived on 25 Jul 2023.
UD1/UD2B and added UD0. Archived on 25 Jul 2023.
0F 0B and 0F B9.
UD0 and page 415 and 419 for UD1.
UD1 (with ModR/M byte) and UD0 (without ModR/M byte) on page 4-687.
OIO ("Official invalid opcode") for the 0F FF opcode.
UD0 (with ModR/M byte) on page 4-683.
0F FF opcode without assigning it a mnemonic.
MONITOR and MWAIT with explicit operands. Archived on 9 May 2005.
MONITOR/MWAIT mnemonics. Archived on 6 Nov 2022.
HWNT/HST mnemonics for the branch hint prefixes. Archived from the original on 5 Feb 2005.
FSIN, FCOS, FSINCOS and FPTAN in volume 1, section 8.3.8