Text files

Newlines

One of the most significant control characters is the newline. It’s used to indicate the line ending in the file. On Unix systems, this is character 10 (“line feed”, or LF).

A text file is really simply long sequence of characters. It might be a very long sequence, but nonetheless it’s just a sequence of one character followed by another.

For example, a text file containing different fruit on each line might look like this. Notice how there is a ↵ character (that you don’t normally see) at the end of every line:

a97
p112
p112
l108
e101
s115
b98
a97
n110
a97
n110
a97
a97
 32
c99
h104
e101
r114
r113
y121
d100
u117
r114
i105
a97
n110
m109
a97
n110
g103
o111
 

Click or hover on the characters to see their underlying codes.

Only when you display it does the computer start a new line after each newline character. So really the content of the file is more like this:

a97
p112
p112
l108
e101
s115
b98
a97
n110
a97
n110
a97
a97
 32
c99
h104
e101
r114
r113
y121
d100
u117
r114
i105
a97
n110
m109
a97
n110
g103
o111
 

This shows how the line ending is effectively a separator. It serves to separate items in the file. If the file consists of data records, this means the line ending is operating as a record separator.

This is why the newline at the very end of the file is not always necessary — unlike the others, it’s not separating anything.

A blank line is what you get when two line endings are adjacent to each other.

Different line endings

Different operating systems use different characters to mark the end of a line. Two characters are especially relevant. They are sometimes abbreviated as LF and CR:

ASCII code Abbreviation Meaning Operating system
10 LF Line feed Unix
13 & 10 CR LF Carriage return & line feed Windows

Early Apple computers just used CR but you’re unlikely to see that today (MacOS uses Unix line endings now).

Here’s the same fruit file, but with Windows line endings:

a97
p112
p112
l108
e101
s115
b98
a97
n110
a97
n110
a97
a97
 32
c99
h104
e101
r114
r113
y121
d100
u117
r114
i105
a97
n110
m109
a97
n110
g103
o111
 

You don’t see the codes

When you look at the contents of the text file, if there are newlines in it, you’ll simply see the text starting on a new line.

However, this can catch you out if you have problems with newlines. Depending on where the file came from (for example, was it made on Windows or Unix?) — this display might go wrong if the file is using the “other” system’s line endings. On Unix systems, the extra character from Windows will sometimes appear as ^M. On Windows systems, the file might display as one long line.

It’s important to know this because sometimes it will explain why you’re having problems with a file or program that otherwise is working fine.

Changing line endings

If you need to change the line endings (that is, switch between LF and CR+LF) you can usually find an option (maybe when you choose “Save as…”) to force the line endings to be the ones you want. By default, your editor will be using the correct line endings for your operating system.

Some systems — such as git (version control) — can be configured to change line endings automatically to stop them being mixed in the same project.

Having both types of line endings in the same file is always wrong. This can happen if different programmers are editing the same file on different systems.

In programs

This can be a problem when you write programs: you probably want your Python program to work with both kinds of file (maybe when you’re reading or writing a data file). In general, you can use \n to indicate “newline” and the program will do the right thing depending on what operating system it’s running on. That is, \n can be a little “magic” and represent both kinds of line ending.

A little history

On paper printers (“line printers”) line feed advances the paper up one line. Carriage return returns the carriage the print head is riding on to the beginning of the line.

So the combined action of line feed and carriage return describe the control characters for starting a newline.

In fact the situation is a little more complicated because historically there are other characters which can indicate line (or record) endings too. Unicode does support these even though they are uncommon. They include vertical tab (VT), line feed (LF), and form feed (FF). Today you’re unlikely to encounter them unless you’re working with old files.