Number and Data Encoding Formats

Creative Commons License

All files (HTML, CSS, images) included in EnHacklopedia are licensed under the Creative Commons Attribution-ShareAlike 3.0 License. All authors contributing to EnHacklopedia should be made aware of the license before contributing. If the author does not agree to the licensing, his or her contributions will not be accepted into the project.


  • 08-26-2007
    • Added abridged ASCII table. (dlong)
  • 04-10-2007
    • Added information on modulus
    • Fixed footer. (Viper187)
  • 04-09-2007
    • Spellchecked a bit. (Viper187)
    • Made endian sections subsections of Byte Order. (Viper187)
    • Updated the footer and cleaned up source. (Viper187)
  • 04-06-2007
    • Added a "TO DO" for ASCII table. (Parasyte)
    • Changed <abbr> to <acronym> where appropriate. (Parasyte)
    • Made section IDs lower case. (Parasyte)
    • Fixed header. (Parasyte)
  • 04-05-2007
    • Initial Release. (dlong)


Modulus

The modulus operation (represented by % in C-style languages and MOD in other places) is a fundamental operation that is not used very often outside of computer science and advanced math. All it does is return the remainder of a division. For example, 3 % 2 = 1, while 3 % 3 = 0. Modulus is often used to determine divisibility: if one number modulo a second is zero, then the first number is evenly divisible by the second. The modulus is also used in hashing algorithms.


Decimal

The decimal method of representing numbers is the most common method used in mathematics. Ten digits (0 - 9) exist, each of which occupies one decimal place. Decimal is also known as base-10, because it both consists of 10 different digits and bases the value of each decimal place on a power of ten. In a multiple digit number, each successive digit to the left has its value increased by a factor of ten, while each successive digit to the right has its value decreased by a factor of ten. For instance, the number 873 is equivalent to 8 * 10^2 + 7 * 10^1 + 3 * 10^0. This is an important concept, as all other base-n representations of numbers work in a similar fashion. Decimal numbers sometimes have a trailing "d" on them to make the base being used explicit. In this document, if there are no other base markers (trailing "b" or preceding "0x"), then the number is in base-10.


Binary

At the lowest level, a computer is just a series of switches with two settings - high voltage and low voltage. By convention, the number 1 represents the high (or on) setting and 0 represents the low (or off) setting. Just as decimal places are based on powers of ten, binary (or base-2) places are based on powers of two. For instance, the number 110 is equal to 1 * 10^2 + 1 * 10^1 + 0 * 10^0 in decimal and 1 * 2^2 + 1 * 2^1 + 0 * 2^0 in binary. While the binary representation does allow one to easily perform bitwise operations, it has a drawback: the length of the representation becomes massive quite quickly. A three digit decimal number requires a minimum of seven binary digits and a four digit decimal number requires a minimum of ten binary digits. By the number one million, twenty binary digits are required. Binary numbers are represented by a trailing b: 110 is a decimal number; 110b is binary. One binary digit is referred to as a "bit".


Hexadecimal

Hexadecimal (also called base-16 or hex) is used as a substitute for binary. Hexadecimal has many of the advantages that binary has, but removes binary's largest disadvantage: the massive size. Each hexadecimal digit takes one of 16 different values, 0-15; values 10 and up are represented by the letters A through F. Because each hexadecimal place is based on a power of 16 and because 16 is equal to 2^4, hexadecimal numbers are a quarter the length of their binary equivalents. Hexadecimal numbers are represented by a preceding "0x" or "$" or a trailing "h"; for instance, 10d is also 0xA, $A, or Ah. In this document, all hexadecimal numbers are represented with the preceding "0x". One hexadecimal digit is known as a nybble or half-byte, while two digits are called a byte (8 bits). Larger units are left somewhat ambiguous. A "word" is used to refer to the fundamental data type of a microprocessor. However, this can be 16, 32, or 64 bits, as well as some rather odd sizes on archaic processors, such as 39 bits. For the purposes of this document, a word is 32 bits, with a halfword (or short) being 16 and a doubleword (or dword) being 64. A quadword (or qword) will be 128 bits.


Base Conversion

Converting between bases is not that difficult a task, but can require a bit of practice. The easiest conversions are those between base-16 and base-2. For a base-16 number, all you have to do is convert each digit into its corresponding binary form and keep the order intact. The representations are as follows:
0x0 - 0000b
0x1 - 0001b
0x2 - 0010b
0x3 - 0011b
0x4 - 0100b
0x5 - 0101b
0x6 - 0110b
0x7 - 0111b
0x8 - 1000b
0x9 - 1001b
0xA - 1010b
0xB - 1011b
0xC - 1100b
0xD - 1101b
0xE - 1110b
0xF - 1111b

For instance, to convert 0xDEA2 to binary, you would expand "D" to "1101", "E" to "1110", "A" to "1010", and "2" to "0010", giving you 1101111010100010b.

Converting binary to hexadecimal is similar: each group of four bits is converted into its hexadecimal digit. It is important to group the bits starting from the right, as grouping from the left will give a wrong result unless the binary number has a number of bits that is a multiple of four. For this example, the number 1111010010b will be used. The lowest four bits are 0010b, which corresponds to 0x2. Then, 1101b, which is 0xD. Finally, the remaining 11b, padded to 0011b, which is 0x3. The base-16 representation is 0x3D2.

Converting binary and hexadecimal into decimal is easy as well, but a much different task. It is good practice to know the first few powers of 2 (1, 2, 4, 8, 16) and 16 (1, 16, 256, 4096, 65536). The so-called one's place in the number has a value of 1 (2^0 or 16^0) and as one moves to the left, the value of the place increases by a factor of 2 or 16 respectively. To convert these values to decimal, multiply the value of the digit in a specific binary or hexadecimal place by that place's value. Do the same for each digit, then add all the products up.

Sample binary to decimal conversion: 110101011101b = 1 * 2^0 + 0 * 2^1 + 1 * 2^2 + 1 * 2^3 + 1 * 2^4 + 0 * 2^5 + 1 * 2^6 + 0 * 2^7 + 1 * 2^8 + 0 * 2^9 + 1 * 2^10 + 1 * 2^11 = 1 + 4 + 8 + 16 + 64 + 256 + 1024 + 2048 = 3,421.

Sample hexadecimal to decimal conversion: 0xBEEF = 0xF * 16^0 + 0xE * 16^1 + 0xE * 16^2 + 0xB * 16^3 = 15 * 1 + 14 * 16 + 14 * 256 + 11 * 4,096 = 15 + 224 + 3,584 + 45,056 = 48,879.

Converting numbers from decimal to binary or hexadecimal is slightly more problematic. The best way to do this is to find the largest power of 2 (or 16) that is not greater than the decimal number. Divide the number by that power and the result is the highest digit in the converted number. Divide the power by 2 (or 16) and repeat the process with that power and the remainder of the division.

Sample decimal to binary conversion: 132. 128 is 27. 132 / 128 gives 1 with a remainder of 4. Dividing 128 by 2 gives 64. 4 / 64 is 0 with a remainder of 4. Dividing 64 by 2 gives 32. 4 / 32 is 0 with a remainder of 4. Dividing 32 by 2 gives 16. 4 / 16 is 0 with a remainder of 4. Dividing 16 by 2 gives 8. 4 / 8 is 0 with a remainder of 4. Dividing 8 by 2 gives 4. 4 / 4 is 1 with a remainder of 0. At this point we can stop the math. However, we still need to 0-fill the remaining bits. The 2's-place and 1's place still haven't been taken care of, which means two more zeros need to be appended. Our final result is 10000100b.

Sample decimal to hexadecimal conversion: 327. 256 is 16^2. 327 / 256 is 1 with a remainder of 71. 256 / 16 is 16. 71 / 16 is 4 with a remainder of 7. 16 / 16 is 1. 7 / 1 is 7. The converted number is 0x147.


AND

One of the fundamental logical and bitwise operations is known as the AND. AND takes in two inputs and produces one output. At its most fundamental level, true is returned if both inputs are true, and false otherwise. Bitwise AND (represented by &) ANDs each pair of corresponding bits of two numbers, and the corresponding bit of the output is the result of ANDing the two input bits. For instance, take 11011111b & 10101101b. Bit n of the output is equal to bit n of input 1 & bit n of input 2. Bit 7 of the output will be 1 & 1, or 1. Bit 6 will be 1 & 0, or 0. Bit 5 will be 0 & 1, or 0. Bit 4 will be 1 & 0, or 0. Bit 3 will be 1 & 1, or 1. Bit 2 will be 1 & 1, or 1. Bit 1 will be 1 & 0, or 0. And bit 0 will be 1 & 1, or 1. The final result is 10001101b.


OR

OR (represented by |) is another fundamental bitwise operation. On the one-bit level, OR returns 1 if either input is 1 and 0 if both inputs are 0. For multiple bit numbers, the same bit-by-bit process used to AND numbers can be used to OR them. Example: 10011001b | 11001010b = 11011011b.


XOR

XOR (represented by ^) is a slightly more advanced bitwise operation. It is similar to OR and takes two inputs. However, on the one-bit level, the output is 1 when exactly one input is 1. If both inputs are 0 or both are 1, the output is 0. Example: 10011001b ^ 11001010b = 01010011b.


NOT

NOT (represented by ~) is a fundamental bitwise operation. However, unlike AND, OR, and XOR, it only takes in one input. NOT inverts a bit. That is, if the input is 0, the output is 1, and if the input is 1, the output is 0. When performing a NOT, it is important to keep in mind implicit zeros. ~1101b will not be 10b. It will be 11110010b, 1111111111110010b, or 11111111111111111111111111110010b depending on whether the data size is 8, 16, or 32 bits.


Shifts

A shift does what it implies: it shifts data to the left or to the right; that is, each bit is moved one place over per shift. There are left shifts and right shifts and two varieties of each.

The logical shift (represented by << for left and >> for right) shifts every bit to the left or the right a specified amount. Vacant bit positions are filled with zeros while the high bits (for left shift; low for right) are discarded. For example: 01001101b << 2 = 00110100b.

The arithmetic shift is similar to the logical shift. In fact, the left arithmetic shift is identical to the left logical shift. The difference is what occurs in the right shift. An arithmetic right shift fills the vacated high bits with copies of the original MSB (the sign bit) instead of zeros, which preserves the sign of two's complement numbers. Because there is no difference between left logical and arithmetic shifts, << is used to represent both. The right arithmetic shift is represented by >>>. Example: 10111000b >>> 3 = 11110111b.


Rotates

A rotate is identical to a shift except that instead of the edge bit being thrown away, it is moved to the other end of the number. On a left rotate, the MSB becomes the LSB, and on a right rotate, the LSB becomes the MSB.

Two's Complement

A major problem in the early years of computer science was how to represent negative numbers. After a lot of other formats were tried, the method eventually settled upon was Two's Complement. To turn a negative number into its two's complement representation, you invert the bits of the number's positive representation and then add 1. For instance (assuming an 8-bit data size), -12 would be calculated as follows:
12 = 00001100
~12 = 11110011
-12 = ~12 + 1 = 11110100
Determining what a particular binary representation means is not difficult. If the highest bit is 0, the number is positive and can be converted as normal. If the highest bit is 1, then the number is negative. To determine the negative number's magnitude, subtract one and then invert the number, then convert to decimal as normal. The only exception is when the sign bit is set and every other bit is not. To determine the value of this representation, set the LSB, determine what that number is, then subtract one.
Example: 11111110b
The sign bit is set, so the number is negative. Subtracting gives 11111101b. Inverting gives 10b, which is 2. The number is -2.
Example: 00000110b
The sign bit is not set, so the number is positive. Conversion can be performed as normal. The number is 6.
Example: 10000000b
The sign bit is set and all other bits are not. Adding one gives 10000001b. This number represents -127. Subtracting one from that gives -128.

Zero- and Sign-Extensions

Zero- and Sign-Extensions are the two different methods used to extend a value to a larger size, for example an 8-bit value to 16-bits. Zero-extension simply involves adding the appropriate number of zeros to the beginning of the number. For instance, zero-extending 10011111b to 16-bits would result in 0000000010011111b. Sign-extension is slightly different. If the MSB is zero, then the conversion is the same as in zero-extension. However, if the MSB is one, then the appropriate number of ones are added to the beginning of the number instead of zeros. Using the same example, sign-extending 10011111b to 16-bits would result in 1111111110011111b. Sign-extension is used because it allows two's complement numbers to keep their value when extended, whereas zero-extending the number would not.


IEEE-754 Floating Point

IEEE-754 is the nearly-universal standard used for representing floating point values. Nearly any computer that contains an FPU will be using this format, and this includes consoles. The smallest floating point size is 32 bits; floating point values of this size are called single-precision floating point numbers (keyword float in C-like languages). Of these 32 bits, 1 is set aside for the sign, 8 bits are for the exponent, and 23 bits are for the mantissa. When the sign bit is 1, the number is negative; when it is 0, it is positive. The exponent bits store the exponent with 127 (the bias) added to it. The mantissa stores the significant binary digits of the number with the binary point removed; the exponent keeps track of where the binary point belongs. Normalized numbers have a leading significand bit of one; because it is always one, it is left implicit and not actually stored in the mantissa. There are also denormalized numbers, indicated by an exponent field of all zeros, whose implicit leading bit is zero; their effective exponent is fixed at -126 (that is, the exponent is stored with 126 added to it instead of 127).

Example: -118.625
The number is negative so the sign bit is set.
118.625 in binary is 1110110.101b.
Normalized, the number is 1.110110101b * 2^6.
The exponent is 6. Adding 127, the exponent field is 133 or 10000101b.
Dropping the implicit leading 1, the mantissa is 11011010100000000000000b.
The floating point value is 11000010111011010100000000000000b, or 0xC2ED4000.

Double-precision floating point values also exist (keyword double in C-like languages). The size is doubled to 64 bits. The exponent is expanded to 11 bits with a bias of 1,023, and the mantissa size is more than doubled, to 52 bits. Finally, quadruple-precision floating point values expand the size to 128 bits, with a 112-bit mantissa and a 15-bit exponent (with a 16,383 bias); the keyword long double in C-like languages maps to this format on some platforms, though on others it is a different extended format.

IEEE-754 also allows for signed infinities and NaNs. An infinity is represented by the maximum possible exponent with a mantissa of zero, while a NaN is represented by the maximum possible exponent with a non-zero mantissa.

Float Convert is a tool that allows one to convert between decimal floating point numbers and IEEE-754 formatted floating point numbers. However, it only works with single-precision values.

Some basic C code can be used to convert between decimal and IEEE-754 double-precision values. This code should work on any platform where double is a 64-bit IEEE-754 value; memcpy is used rather than a pointer cast to avoid strict-aliasing problems, and the standard "%016llX" format is used rather than compiler-specific ones. To convert a double to hex:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        double val = 1.0; /* Substitute your value here. */
        unsigned long long bits;
        memcpy(&bits, &val, sizeof bits);
        printf("%016llX\n", bits);
        return 0;
    }
Converting from hex to double is easy as well:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        unsigned long long bits = 0x3FF0000000000000ULL; /* Substitute your value here. */
        double val;
        memcpy(&val, &bits, sizeof val);
        printf("%G\n", val);
        return 0;
    }

Binary Coded Decimal (BCD)

BCD is an archaic method of representing decimal values. It was used on systems without an FPU, such as the SNES. Each nybble in a BCD value represents a specific decimal place; because of this, only nybble values 0-9 are legal. The size of a BCD value is arbitrary, as is the location of the decimal point.

Example: 0x9837. Assuming that the decimal point exists in-between the bytes, this halfword represents the BCD value of 98.37.

Byte Order

Big Endian

Big endian is a method of storing data. The "big end" of a word is stored first. For instance, the word 0x0A0B0C0D stored at 0x08000000 would have 0x0A stored at 0x08000000, 0x0B at 0x08000001, 0x0C at 0x08000002, and 0x0D at 0x08000003.

Little Endian

Little endian is the most common method of storing data. The "little end" of a word is stored first. For instance, the word 0x0A0B0C0D stored at 0x08000000 would have 0x0D stored at 0x08000000, 0x0C stored at 0x08000001, 0x0B stored at 0x08000002, and 0x0A stored at 0x08000003. Many systems, including Nintendo's handhelds, store data this way.


ASCII

ASCII is the most commonly used format for encoding characters. Originally a 7-bit encoding, it was later extended to 8 bits to double the number of characters. In ASCII, the letters 'A' through 'Z' are represented by 0x41 through 0x5A and 'a' through 'z' are 0x61 through 0x7A. '0' through '9' are 0x30 through 0x39, and a space is 0x20. While many games encode their text into ASCII, a lot of games use a modified ASCII, where the letters, while still contiguous, start at a different point. For instance, the capital letters may begin at 0x00 instead of their standard ASCII location. A relative search can help find text in games that use this. A relative searcher searches based on differences between bytes instead of looking for a specific series of bytes. For instance, if one were to search for "ABD", the relative searcher would search for three consecutive bytes where the difference between the first and second was one and the difference between the second and third was two. SearchR3 is a program that automates this process. Below is an abridged ASCII table, which contains some of the more common characters used. Full ASCII tables are available on other websites, such as asciitable.com.

Character Dec Hex
NULL 0 0x00
LF (Newline) 10 0x0A
Space 32 0x20
! 33 0x21
" 34 0x22
# 35 0x23
$ 36 0x24
'(' - ')' 40 - 41 0x28 - 0x29
. 46 0x2E
'0' - '9' 48 - 57 0x30 - 0x39
'A' - 'Z' 65 - 90 0x41 - 0x5A
'a' - 'z' 97 - 122 0x61 - 0x7A