Floating Point

Floating Point Representation

Normalized Values

Numerical Form:

$(-1)^s \times M \times 2^E$

  • s is the sign bit; it determines whether the number is negative or positive.
  • The significand M is normally a fractional value in the range [1.0, 2.0).
  • The exponent E weights the value by a power of 2.

image-20240413161231483

Single precision: 32 bits
s: 1 bit, exp: 8 bits, frac: 23 bits
Double precision: 64 bits
s: 1 bit, exp: 11 bits, frac: 52 bits

E = Exp - Bias

  • Exp: unsigned value of the exp field.
  • Bias = $2^{k-1}-1$, where k is the number of bits in the exp field
    • Single precision: $2^7-1=127$
    • Double precision: $2^{10}-1=1023$

M = $1.xxxxxxx_2$; the leading 1 is implied and is not stored.
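The field layout and E = Exp - Bias can be checked directly in C. The following is a minimal sketch for single precision, assuming `float` is a 32-bit IEEE 754 value; the type `float_fields` and the functions `decode_float` and `unbiased_exponent` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw fields of a single-precision value (hypothetical helper type). */
typedef struct {
    uint32_t s;     /* sign bit */
    uint32_t exp;   /* 8-bit biased exponent field */
    uint32_t frac;  /* 23-bit fraction field */
} float_fields;

/* Split a float into its s, exp, and frac fields. */
float_fields decode_float(float f) {
    uint32_t bits;
    assert(sizeof f == sizeof bits);   /* assumption: float is 32 bits */
    memcpy(&bits, &f, sizeof bits);    /* copy the raw bytes of f */
    float_fields r;
    r.s    = bits >> 31;
    r.exp  = (bits >> 23) & 0xFF;
    r.frac = bits & 0x7FFFFF;
    return r;
}

/* E = Exp - Bias for a normalized value; Bias = 2^(8-1) - 1 = 127. */
int unbiased_exponent(float_fields r) {
    return (int)r.exp - 127;
}
```

For instance, `decode_float(15213.0f)` gives exp = 140 and frac = 0x6DB400, so E = 13 and M = $1.1101101101101_2$.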

Example

982fb48a9427e224a783fb7b9500e6e3

image-20240413163753467

Denormalized Values

Condition: exp = 000…0

E = 1 - Bias (instead of E = 0 - Bias)

The significand is coded with an implied leading 0: M = $0.xxxxxx_2$

Denormalized values represent the numbers closest to 0.0.

When exp = 0 and frac = 0, the value is zero.

Special Values

Condition: exp = 111…1

Case: exp = 111…1, frac = 0: infinity. E.g. 1.0/0.0 = -1.0/-0.0 = positive infinity, and 1.0/-0.0 = negative infinity.

Case: exp = 111…1, frac != 0: NaN (Not-a-Number).
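The conditions on exp and frac for normalized, denormalized, and special values can be summarized as one decision function. This is a sketch for the single-precision bit pattern only; the enum `fp_kind` and the function `classify_bits` are hypothetical names, not part of any library.

```c
#include <stdint.h>

/* Categories of a single-precision bit pattern (hypothetical names). */
typedef enum {
    KIND_ZERO,
    KIND_DENORMALIZED,
    KIND_NORMALIZED,
    KIND_INFINITY,
    KIND_NAN
} fp_kind;

/* Classify a 32-bit pattern by its exp and frac fields; the sign bit is ignored. */
fp_kind classify_bits(uint32_t bits) {
    uint32_t exp  = (bits >> 23) & 0xFF;
    uint32_t frac = bits & 0x7FFFFF;

    if (exp == 0)                      /* exp = 000...0 */
        return frac == 0 ? KIND_ZERO : KIND_DENORMALIZED;
    if (exp == 0xFF)                   /* exp = 111...1 */
        return frac == 0 ? KIND_INFINITY : KIND_NAN;
    return KIND_NORMALIZED;            /* everything else */
}
```

Note that both 0x00000000 and 0x80000000 classify as zero: the sign bit distinguishes +0.0 from -0.0.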

Round-To-Even

When a value is exactly halfway between two representable values, round so that the least significant digit of the result is even. Otherwise, round to the nearest representable value.

image-20240413190420585 image-20240413190549173
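Round-to-even on binary fractions can be sketched with integers: treat the value as a fixed-point number and drop its low n bits. The function name `round_shift_even` is hypothetical, and the sketch assumes 0 < n < 32.

```c
#include <assert.h>
#include <stdint.h>

/* Drop the low n bits of x, rounding to nearest with ties going to even. */
uint32_t round_shift_even(uint32_t x, unsigned n) {
    assert(n > 0 && n < 32);               /* assumption of this sketch */
    uint32_t q    = x >> n;                /* truncated result */
    uint32_t rem  = x & ((1u << n) - 1);   /* the bits being dropped */
    uint32_t half = 1u << (n - 1);         /* exactly halfway */

    /* Round up when above halfway, or when exactly halfway and q is odd. */
    if (rem > half || (rem == half && (q & 1)))
        q++;
    return q;
}
```

With five fractional bits rounded to two (n = 3): $10.00011_2$ rounds down to $10.00_2$, $10.00110_2$ rounds up to $10.01_2$, the tie $10.11100_2$ rounds up to $11.00_2$, and the tie $10.10100_2$ rounds down to $10.10_2$.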

Multiplication

$(-1)^{s_1} \times M_1 \times 2^{E_1} \times (-1)^{s_2} \times M_2 \times 2^{E_2}$

Exact Result: $(-1)^s \times M \times 2^E$

  • Sign s: $s_1$ ^ $s_2$ (XOR)
  • Significand M: $M_1 \times M_2$
  • Exponent E: $E_1 + E_2$

If M >= 2, shift M right and increment E.

If E is out of range, the result overflows.

Round M to fit frac precision
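The multiplication steps above can be sketched on raw single-precision bit patterns. This is an illustration under strong assumptions, not how hardware is implemented: both inputs and the result must be normalized (no zero, denormalized, infinite, NaN, overflow, or underflow cases), and the names `to_bits`, `from_bits`, and `fp_mul_sketch` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static uint32_t to_bits(float f) { uint32_t b; memcpy(&b, &f, sizeof b); return b; }
static float from_bits(uint32_t b) { float f; memcpy(&f, &b, sizeof f); return f; }

/* Multiply two normalized floats; assumes the result is normalized too. */
float fp_mul_sketch(float a, float b) {
    uint32_t x = to_bits(a), y = to_bits(b);

    uint32_t s = (x >> 31) ^ (y >> 31);                  /* s = s1 ^ s2 */
    int e = (int)((x >> 23) & 0xFF) - 127
          + (int)((y >> 23) & 0xFF) - 127;               /* E = E1 + E2 */

    /* M1 and M2 as 24-bit integers (implied leading 1), scaled by 2^23. */
    uint64_t m1 = (x & 0x7FFFFF) | 0x800000;
    uint64_t m2 = (y & 0x7FFFFF) | 0x800000;
    uint64_t m  = m1 * m2;                               /* M1 * M2, scaled by 2^46 */

    unsigned shift = 23;
    if (m >> 47) { shift = 24; e++; }                    /* M >= 2: shift right, E++ */

    /* Round M to 24 bits, ties to even. */
    uint64_t q    = m >> shift;
    uint64_t rem  = m & ((1ull << shift) - 1);
    uint64_t half = 1ull << (shift - 1);
    if (rem > half || (rem == half && (q & 1))) q++;
    if (q >> 24) { q >>= 1; e++; }                       /* rounding carried M up to 2.0 */

    return from_bits((s << 31) | ((uint32_t)(e + 127) << 23) | ((uint32_t)q & 0x7FFFFF));
}
```

For example, 1.5 * 1.5 produces M = 2.25 >= 2, so M is shifted right to 1.125 and E becomes 1, giving 2.25.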

Addition

$(-1)^{s_1} \times M_1 \times 2^{E_1} + (-1)^{s_2} \times M_2 \times 2^{E_2}$, assuming $E_1 > E_2$

Exact Result: $(-1)^s \times M \times 2^E$

  • E: $E_1$
  • M: line up the binary points and add $M_1$ and $M_2$

Floating-point addition is commutative, but it is not associative. In single precision:

(3.14 + 1e10) - 1e10 = 0,	3.14 + (1e10 - 1e10) = 3.14
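The grouping effect can be reproduced in C with single-precision values. In this sketch the function names `add_left_first` and `add_right_first` are hypothetical, and the `volatile` temporaries are there only to force each intermediate sum to be rounded to `float`.

```c
/* (a + b) + c, with the intermediate sum rounded to float. */
float add_left_first(float a, float b, float c) {
    volatile float t = a + b;
    return t + c;
}

/* a + (b + c), with the intermediate sum rounded to float. */
float add_right_first(float a, float b, float c) {
    volatile float t = b + c;
    return a + t;
}
```

With a = 3.14f, b = 1e10f, c = -1e10f, the first grouping loses 3.14 entirely because adjacent floats near 1e10 are 1024 apart, while the second grouping cancels the large terms first and keeps 3.14.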

(update date:2024/4/13)

Machine-Level Programming I: Basics

Assembly/Machine Code View

image-20240414153424136

PC: Program counter

  • Address of the next instruction
  • Called "RIP" on x86-64

Register file

  • Heavily used program data

Condition codes

  • Store status information about most recent arithmetic or logical operation
  • Used for conditional branching

Memory

  • Byte addressable array
  • Code and user data
  • Stack to support procedures

Assembly Characteristics

Data Types

Integer data of 1, 2, 4, or 8 bytes

  • Data values
  • Addresses(untyped pointers)

Floating point data of 4, 8, or 10 bytes

Code: byte sequences encoding a series of instructions

Arrays or structures: just contiguously allocated bytes in memory

Operations

image-20240414155550057

Example

C: store value t where designated by dest

*dest = t;

Assembly: move an 8-byte value to memory.

movq %rax, (%rbx)
t: Register %rax
dest: Register %rbx
*dest: Memory M[%rbx]

Object Code: 3-byte instruction stored at address 0x40059e

0x40059e: 48 89 03

Disassembly

We can use gdb to disassemble an executable.

gdb ./bomb

image-20240414162556158

We can also examine the bytes stored at an address. For example, the string at an address can be printed this way; the /s suffix tells gdb which format to use when interpreting the bytes.

image-20240414162745383

x86-64 Integer Registers

image-20240414162928535

Can reference low-order 4 bytes (also low-order 1 & 2 bytes)
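In C terms, reading the low-order bytes of a register corresponds to truncating a 64-bit value to a narrower type. The sketch below mirrors how %eax, %ax, and %al name the low-order 4, 2, and 1 bytes of %rax; the function names `low4`, `low2`, and `low1` are hypothetical.

```c
#include <stdint.h>

/* Low-order 4 bytes, like reading %eax out of %rax. */
uint32_t low4(uint64_t r) { return (uint32_t)r; }

/* Low-order 2 bytes, like reading %ax out of %rax. */
uint16_t low2(uint64_t r) { return (uint16_t)r; }

/* Low-order 1 byte, like reading %al out of %rax. */
uint8_t low1(uint64_t r) { return (uint8_t)r; }
```

For the value 0x1122334455667788, these return 0x55667788, 0x7788, and 0x88 respectively.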

Moving Data

movq Source, Dest

image-20240414163846893

Cannot do memory-memory transfer with a single instruction.

Example

image-20240414164248626

Most General Form

D(Rb,Ri,S): Mem[Reg[Rb] + S*Reg[Ri] + D]

D: Constant "displacement" of 1, 2, or 4 bytes

Rb: Base register: any of the 16 integer registers

Ri: Index register: any, except for %rsp

S: Scale: 1, 2, 4, or 8
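The address computation for D(Rb,Ri,S) can be modeled as a small C function. This is only an arithmetic sketch: `effective_address` is a hypothetical name, register contents are passed in as plain integers, and an omitted base or index is represented by passing 0.

```c
#include <assert.h>
#include <stdint.h>

/* Address denoted by D(Rb,Ri,S): Reg[Rb] + S*Reg[Ri] + D. */
uint64_t effective_address(int32_t d, uint64_t rb, uint64_t ri, unsigned s) {
    assert(s == 1 || s == 2 || s == 4 || s == 8);   /* only legal scale factors */
    /* d is sign-extended to 64 bits before the addition. */
    return rb + (uint64_t)s * ri + (uint64_t)(int64_t)d;
}
```

Taking %rdx = 0xf000 and %rcx = 0x0100 as sample register values: 0x8(%rdx) gives 0xf008, (%rdx,%rcx) gives 0xf100, (%rdx,%rcx,4) gives 0xf400, and 0x80(,%rdx,2) gives 0x1e080.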

Address Computation Instruction

leaq Src, Dst

  • Src is an address mode expression
  • Set Dst to the address denoted by the expression

C:

long m12(long x) {
    return x * 12;
}

Assembly:

leaq (%rdi,%rdi,2), %rax   # rax = rdi + 2*rdi = 3*rdi
salq $2, %rax              # rax = rax << 2 = rax*4 = 12*rdi
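The two-instruction sequence can be written back in C to check that it really computes x * 12. In this sketch the function name `m12_lea_sal` is hypothetical, and unsigned arithmetic is used so that the shift is well defined for negative inputs.

```c
/* Compute x * 12 the way the leaq/salq pair does. */
long m12_lea_sal(long x) {
    unsigned long ux = (unsigned long)x;
    unsigned long t = ux + 2 * ux;   /* leaq (%rdi,%rdi,2), %rax : t = 3*x */
    return (long)(t << 2);           /* salq $2, %rax            : t*4 = 12*x */
}
```

Here leaq is used purely for arithmetic: it computes an "address" of the form x + 2*x without ever accessing memory.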

Some Arithmetic Operations

image-20240414170745693

(update date: 2024/4/14)