Text files

Characters

A character is a single letter, or number, or space, or symbol.

A text file contains nothing but a series of numbers, where each number represents a character. The number is the character code, which is a mapping (for example, 65 for A, 66 for B).

To decode the numbers in a text file, you must know the mappings between those numbers and the characters they represent. Today there is effectively one standard code for this, called Unicode. The first 126 characters in that code are very common and, historically, were defined first as the ASCII code.

Some characters have special “non-printable” meanings and represent something other than a shape you can see: these are called control characters.

Accessing character codes programmatically

Most general programming languages let you access a character’s underlying character code.

In Python

In Python, use the ord() function:

letter = "A"
ord(letter)   # => 65

In Java

In Java, the underlying value of the primitive type char is its character code. So casting it explicitly as an integer will give that code. For example:

char letter  = 'A';
int char_code = letter;   // => 65

In JavaScript

In JavaScript, use string’s charCodeAt() with the index of the character within the string (but for full Unicode support, use codePointAt() instead because charCodeAt() is only good for codes up to 65536):

let letters = "ABCZ";
letters.charCodeAt(0);   // A => 65
letters.charCodeAt(3);   // Z => 90