Data – Chapter 3 | Painless Programming

All data inside a computer is ultimately stored and processed as binary (base 2) — sequences of 0s and 1s. Students must understand how numbers, characters, images and sound are represented in binary, how file sizes are calculated, and how data can be protected through encryption.

3.1

Binary

Binary Representation of All Data (3.1.1)

Every piece of data stored and processed by a computer is represented in binary (base 2) using only the digits 0 and 1, called bits. This includes numbers, text characters, images, sound, and even program instructions themselves. The reason computers use binary is that electronic circuits can reliably represent two states: on (1) and off (0).

An 8-bit binary number (one byte) — the position values are powers of 2:

Place value	128	64	32	16	8	4	2	1
Bit	1	0	1	1	0	1	0	1

The value above is 128+32+16+4+1 = 181 in denary.
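As a rough illustration of the place-value idea, the sketch below adds up the powers of 2 for every position that holds a 1. The function name `binary_to_denary` is made up for this example and is not part of any syllabus or library.

```python
def binary_to_denary(bits: str) -> int:
    """Add the place value of every position that holds a 1."""
    bits = bits.replace(" ", "")  # allow spaced nibbles such as "1011 0101"
    total = 0
    # The rightmost bit has place value 2**0, the next 2**1, and so on.
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total


print(binary_to_denary("1011 0101"))  # 181
```

Python's built-in `int("10110101", 2)` gives the same answer; the loop is only there to make the place values visible.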

Representing Integers (3.1.2)

Unsigned integers — represent zero and positive numbers only (no sign bit). An 8-bit unsigned integer can store values 0 to 255 (2⁸ − 1).
Sign and magnitude — leftmost bit = sign (0=positive,1=negative). Two representations of zero. Range for 8-bit: −127 to +127.
Two’s complement — MSB has negative weight. To negate: invert bits then add 1. Single zero. Range for 8-bit: −128 to +127.
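The two's complement rules above can be sketched in a few lines. This is a minimal example assuming 8-bit patterns written as strings; the helper names `negate` and `from_twos_complement` are illustrative only.

```python
def negate(pattern: str) -> str:
    """Negate a two's complement pattern: invert every bit, then add 1."""
    inverted = "".join("1" if bit == "0" else "0" for bit in pattern)
    # Keep only the original number of bits (any carry out is discarded).
    result = (int(inverted, 2) + 1) % (2 ** len(pattern))
    return format(result, f"0{len(pattern)}b")


def from_twos_complement(pattern: str) -> int:
    """Read a pattern whose most significant bit has negative weight."""
    value = int(pattern, 2)
    if pattern[0] == "1":
        value -= 2 ** len(pattern)  # MSB counts as -128 in 8 bits, not +128
    return value


print(negate("00000101"))                # 11111011
print(from_twos_complement("11111011"))  # -5
print(from_twos_complement("10000000"))  # -128
```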

Binary ↔ Denary Conversion (3.1.3)

Must convert between binary and denary for values 0–255 (8 bits).

Binary to denary: add place values of positions with 1.

E.g. 1010 0110 = 128+32+4+2 = 166

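Denary to binary: repeatedly divide by 2 and record each remainder, then read the remainders from last to first. Alternatively, subtract the largest place values that fit, as in the example below.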

E.g. 200 = 128+64+8 = 1100 1000
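The repeated-division method can be sketched as follows. The function name and the default of 8 bits are assumptions made for this example.

```python
def denary_to_binary(value: int, bits: int = 8) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if not 0 <= value < 2 ** bits:
        raise ValueError(f"{value} does not fit in {bits} unsigned bits")
    remainders = []
    while value > 0:
        remainders.append(str(value % 2))  # remainder is the next bit, right to left
        value //= 2
    # The last remainder found is the leftmost bit, so reverse, then pad with 0s.
    return "".join(reversed(remainders)).rjust(bits, "0")


print(denary_to_binary(200))  # 11001000
```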

Binary Arithmetic (3.1.4)

Binary addition: 0+0=0, 0+1=1, 1+0=1, 1+1=10 (0 carry 1), 1+1+1=11 (1 carry 1).

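Logical shift left: moves bits left by n places, fills with 0s on the right → multiplies by 2ⁿ (unless 1s are shifted out).

Logical shift right: moves bits right by n places, fills with 0s on the left → divides by 2ⁿ, discarding any remainder.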

Overflow: result too large for available bits — carry bit lost.
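To see addition, shifting and overflow together, here is a small sketch that limits every result to 8 bits. The constants `BITS` and `MASK` and the function names are assumptions for this example, not a standard API.

```python
BITS = 8
MASK = 2 ** BITS - 1  # 1111 1111, used to keep only the lowest 8 bits


def add_8bit(a: int, b: int) -> tuple[int, bool]:
    """Add two unsigned 8-bit values; report whether a carry was lost."""
    total = a + b
    return total & MASK, total > MASK


def shift_left(value: int, places: int) -> int:
    """Logical shift left: bits pushed past the left end are lost."""
    return (value << places) & MASK


def shift_right(value: int, places: int) -> int:
    """Logical shift right: bits pushed past the right end are lost."""
    return value >> places


print(add_8bit(200, 100))                    # (44, True): 300 needs 9 bits
print(shift_left(0b00010110, 2))             # 88 = 22 × 4
print(shift_right(0b00010110, 1))            # 11 = 22 ÷ 2
print(shift_left(0b11000000, 1))             # 128: the leading 1 was shifted out
```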

Hexadecimal (3.1.5)

Hexadecimal (base 16) uses digits 0–9 and letters A–F (A=10, B=11, C=12, D=13, E=14, F=15). Each hex digit = 4 binary bits (a nibble). Hex is compact and human-readable.

Binary to hex: split into groups of 4 from right, convert each group. E.g. 1010 1100 → A C = AC₁₆
Hex to binary: replace each hex digit with its 4-bit binary equivalent. E.g. 3F → 0011 1111
Uses: colour codes (#FF5733), memory addresses, MAC addresses, IPv6.
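The two conversion rules above can be tried out with the sketch below, which works one nibble at a time. The function names are invented for this example.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits: str) -> str:
    """Split into groups of 4 from the right and convert each group."""
    bits = bits.replace(" ", "")
    # Pad on the left so the length is a multiple of 4.
    bits = bits.zfill(-(-len(bits) // 4) * 4)
    return "".join(HEX_DIGITS[int(bits[i:i + 4], 2)] for i in range(0, len(bits), 4))


def hex_to_binary(hex_digits: str) -> str:
    """Replace each hex digit with its 4-bit binary equivalent."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_digits)


print(binary_to_hex("1010 1100"))  # AC
print(hex_to_binary("3F"))         # 0011 1111
```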

3.2

Data Representation

Character Encoding — ASCII and Unicode (3.2.1)

ASCII — 7-bit, 128 characters (English letters, digits, punctuation). Extended ASCII uses 8 bits for 256 characters.
Unicode — represents characters from all world writing systems. UTF-8 uses 1–4 bytes per character. Backwards compatible with ASCII.
Why Unicode? ASCII cannot represent Chinese, Arabic, Hindi, emoji, or thousands of other characters.
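File size: Unicode text can take more storage than ASCII text because a character may need more than one byte. In UTF-8, text made only of ASCII characters is the same size as in ASCII.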
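A quick way to check how many bytes a character needs is to encode it with Python's built-in `str.encode`. The sample characters and the helper name `utf8_length` are chosen only for illustration.

```python
def utf8_length(character: str) -> int:
    """Number of bytes the character occupies when stored as UTF-8."""
    return len(character.encode("utf-8"))


for ch in ["A", "é", "中", "😀"]:
    # ord() gives the Unicode code point of the character.
    print(f"U+{ord(ch):04X} uses {utf8_length(ch)} byte(s)")

# ASCII characters have identical bytes in ASCII and UTF-8.
print("A".encode("ascii") == "A".encode("utf-8"))  # True
```

The four characters need 1, 2, 3 and 4 bytes respectively, which is why a UTF-8 file grows only when it contains non-ASCII characters.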

Bitmap Images (3.2.2)

Pixel — smallest element. Each pixel stores a colour as binary.
Resolution — width × height in pixels. Higher resolution = larger file.
Colour depth — bits per pixel. 1-bit = 2 colours, 8-bit = 256 colours, 24-bit = 16.7M colours (true colour).
File size (bits) = width × height × colour depth
E.g. 100×100 image, 24-bit depth = 100×100×24 = 240,000 bits = 30,000 bytes
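The bitmap formula can be checked with a short sketch. The function names are assumptions for this example, and the calculation ignores any file header or metadata that a real image file would carry.

```python
def bitmap_size_bits(width: int, height: int, colour_depth: int) -> int:
    """Raw pixel data size in bits: width × height × colour depth."""
    return width * height * colour_depth


def colours_available(colour_depth: int) -> int:
    """Each extra bit per pixel doubles the number of possible colours."""
    return 2 ** colour_depth


bits = bitmap_size_bits(100, 100, 24)
print(bits, bits // 8)        # 240000 30000
print(colours_available(24))  # 16777216
```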

Sound Representation (3.2.3)

Sound is analogue — must be sampled for digital storage. Measurements of amplitude are taken at regular intervals and stored as binary values.

Sampling frequency (Hz) — samples per second. CD quality = 44,100 Hz.
Bit depth — bits per sample. CD quality = 16 bits.
File size (bits) = sampling frequency (Hz) × bit depth × duration (seconds) × channels
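As a worked example of the sound formula, the sketch below sizes one minute of stereo audio at the CD-quality settings listed above. The one-minute duration and two channels are assumptions chosen for the example; the result is uncompressed sample data only.

```python
def sound_size_bits(sample_rate: int, bit_depth: int, seconds: int, channels: int) -> int:
    """Uncompressed size in bits: frequency × bit depth × duration × channels."""
    return sample_rate * bit_depth * seconds * channels


bits = sound_size_bits(44_100, 16, 60, 2)
print(bits)       # 84672000 bits
print(bits // 8)  # 10584000 bytes
```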

Limitations of Binary Representation (3.2.4)

Low sample rate → missing high frequencies, aliasing.
Low bit depth → quantisation noise, distortion.
Low colour depth → banding, loss of gradient detail.
Trade-off: higher quality = larger file size.

3.3

Data Storage and Compression

Data Units (3.3.1)

Unit	Binary (IEC)	Exact value
Bit	–	0 or 1
Nibble	–	4 bits
Byte	–	8 bits
Kibibyte (KiB)	2¹⁰	1,024 bytes
Mebibyte (MiB)	2²⁰	1,048,576 bytes
Gibibyte (GiB)	2³⁰	1,073,741,824 bytes

Unit	Decimal (SI)	Exact value
Kilobyte (kB)	10³	1,000 bytes
Megabyte (MB)	10⁶	1,000,000 bytes
Gigabyte (GB)	10⁹	1,000,000,000 bytes
Terabyte (TB)	10¹²	1,000,000,000,000 bytes

Compression (3.3.2)

Lossless — original perfectly reconstructed. Used for text, code, ZIP. Algorithms: RLE, Huffman.
Lossy — permanently removes data that people are least likely to notice. Original cannot be exactly restored. Used for images (JPEG), audio (MP3), video.

Run-Length Encoding (RLE) (3.3.3)

RLE is a lossless algorithm. Replaces consecutive runs of the same value with (count, value).

Original: A A A A A B B C C C C = 11 chars
RLE: 5A 2B 4C = 6 chars (45% smaller)

Most effective when data has long repeated runs (simple graphics, fax). Ineffective for random data.
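A minimal RLE sketch is shown below, using the count-then-value text format from the example above. It assumes the input contains no digit characters, since digits would be confused with the counts; the function names are illustrative.

```python
import re
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of identical characters with count + character."""
    return "".join(f"{len(list(run))}{value}" for value, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Expand each count + character pair back into a run."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(value * int(count) for count, value in pairs)


print(rle_encode("AAAAABBCCCC"))  # 5A2B4C
print(rle_decode("5A2B4C"))       # AAAAABBCCCC
print(rle_encode("ABCD"))         # 1A1B1C1D (longer than the original)
```

The last line shows why RLE is a poor fit for data without repeated runs: every character gains a count, so the output doubles in length.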

Calculating File Sizes (3.3.4)

Image size (bits) = width × height × colour depth. Sound size (bits) = sample rate × bit depth × duration × channels. Divide by 8 for bytes, then by 1024 for KiB.
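The unit conversion step can be sketched like this, reusing the 240,000-bit image from earlier in the chapter as input. The function names are assumptions for this example.

```python
def bits_to_bytes(bits: int) -> float:
    """There are 8 bits in a byte."""
    return bits / 8


def bytes_to_kib(size_bytes: float) -> float:
    """1 KiB = 1,024 bytes (binary unit)."""
    return size_bytes / 1024


def bytes_to_kb(size_bytes: float) -> float:
    """1 kB = 1,000 bytes (decimal unit)."""
    return size_bytes / 1000


size_bytes = bits_to_bytes(240_000)
print(size_bytes)                # 30000.0
print(bytes_to_kib(size_bytes))  # 29.296875
print(bytes_to_kb(size_bytes))   # 30.0
```

The same file is about 29.3 KiB but exactly 30 kB, so it matters which unit a question asks for.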

3.4

Encryption

Why Encrypt Data? (3.4.1)

Encryption converts plaintext to unreadable ciphertext using an algorithm and a key. Only those with the correct key can decrypt.

Protects data in transit (HTTPS, online banking, email)
Protects data at rest (encrypted drives, password databases)
Encryption does NOT prevent theft; it makes stolen data unreadable without the key

Encryption Algorithms (3.4.2)

Caesar Cipher

Each letter shifted by fixed number (key). Shift 3: A→D, B→E. Easy to break by trying all 25 shifts (brute force).
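Here is one possible sketch of a Caesar cipher and the brute-force attack described above. It assumes only the 26 English letters are shifted and everything else is left unchanged; the function names are made up for the example.

```python
def caesar(text: str, shift: int) -> str:
    """Shift each English letter by `shift` places, wrapping round after Z."""
    result = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            result.append(ch)  # spaces, digits and punctuation are not shifted
    return "".join(result)


def brute_force(ciphertext: str) -> dict[int, str]:
    """Try all 25 possible shifts and return every candidate plaintext."""
    return {shift: caesar(ciphertext, -shift) for shift in range(1, 26)}


print(caesar("HELLO", 3))       # KHOOR
print(caesar("KHOOR", -3))      # HELLO
print(brute_force("KHOOR")[3])  # HELLO
```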

Vigenère Cipher

Polyalphabetic — uses keyword repeated to match message length. Each letter shifted by corresponding keyword letter. Much harder to crack than Caesar.
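The keyword-repeat idea can be sketched as below. This version assumes the message and keyword contain only uppercase letters A to Z; the function name and the sample message and keyword are illustrative.

```python
def vigenere(message: str, keyword: str, decrypt: bool = False) -> str:
    """Shift each letter by the matching keyword letter (A=0, B=1, ... Z=25)."""
    result = []
    for index, ch in enumerate(message):
        # The keyword repeats, so wrap round with the modulo operator.
        shift = ord(keyword[index % len(keyword)]) - ord("A")
        if decrypt:
            shift = -shift
        result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
    return "".join(result)


print(vigenere("ATTACKATDAWN", "LEMON"))                # LXFOPVEFRNHR
print(vigenere("LXFOPVEFRNHR", "LEMON", decrypt=True))  # ATTACKATDAWN
```

Notice that the two Ts in ATTACK become X and F: the same plaintext letter is shifted by different amounts, which is what defeats a simple 25-shift brute force.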

Pigpen Cipher

Substitution cipher replacing letters with symbols taken from 3×3 grids and X-shapes. Visual — no letter shifting.

Rail Fence Cipher

Transposition cipher — rearranges letters without changing them. Message written in zigzag across “rails” then read row by row. Key = number of rails.
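The zigzag can be modelled by working out which rail each character lands on. The sketch below is one way to do it, with hypothetical function names and a commonly used sample message.

```python
def rail_pattern(length: int, rails: int) -> list[int]:
    """Rail number for each position, e.g. 0,1,2,1,0,1,2,1,... for 3 rails."""
    if rails < 2:
        return [0] * length
    cycle = list(range(rails)) + list(range(rails - 2, 0, -1))
    return [cycle[i % len(cycle)] for i in range(length)]


def rail_fence_encrypt(message: str, rails: int) -> str:
    """Write the message in a zigzag, then read it off rail by rail."""
    pattern = rail_pattern(len(message), rails)
    return "".join(ch for rail in range(rails)
                   for ch, r in zip(message, pattern) if r == rail)


def rail_fence_decrypt(ciphertext: str, rails: int) -> str:
    """Work out where each ciphertext letter came from and put it back."""
    pattern = rail_pattern(len(ciphertext), rails)
    # Original positions in the order they were read out (rail 0 first).
    order = sorted(range(len(ciphertext)), key=lambda i: pattern[i])
    plaintext = [""] * len(ciphertext)
    for ch, position in zip(ciphertext, order):
        plaintext[position] = ch
    return "".join(plaintext)


secret = rail_fence_encrypt("WEAREDISCOVEREDFLEEATONCE", 3)
print(secret)                         # WECRLTEERDSOEEFEAOCAIVDEN
print(rail_fence_decrypt(secret, 3))  # WEAREDISCOVEREDFLEEATONCE
```

The ciphertext contains exactly the same letters as the plaintext, only reordered, which is the defining feature of a transposition cipher.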

Substitution vs Transposition Ciphers

Substitution ciphers — replace characters with different characters. Letters stay in same positions but are replaced. Examples: Caesar, Vigenère, Pigpen.

Transposition ciphers — rearrange characters without changing them. Same letters, different order. Example: Rail Fence.

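Modern symmetric ciphers such as AES combine multiple rounds of both substitution and transposition. Public-key algorithms such as RSA work differently: they rely on mathematical problems that are hard to reverse, such as factorising very large numbers.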