Using Vim to change character hexadecimal values beyond standard ASCII values
Last updated: Jan 23, 2023
I needed to insert the character with hexadecimal value 0xFF into a text file. Problem was, every time I did this using vim and xxd, it didn’t ‘stick’. Turned out I needed to open the file for editing using the ‘-b’ flag:
vim -b
Longer story and example.
vim test.txt
Text in the file:
123
In normal mode, type the following and press Enter:
:%!xxd
This is what you will see. On the left is the offset of the first byte in the row (00000000, counted in hexadecimal). In the middle is the hexadecimal value of each byte, two hex digits per character. On the right is the text. The final '.' is the end-of-line character, 0x0a; xxd prints a '.' for any byte that is not a printable character.
00000000: 3132 330a 123.
Fair enough. ASCII 0x31 is indeed 1, 0x32 is 2 and 0x33 is 3. Now I use the text editor to change the '1' to character 0xFF, which gives:
00000000: FF32 330a 123.
Convert back to text using:
:%!xxd -r
I see:
ÿ23
Success! Tea and cakes. The funky ÿ symbol surely means character 0xFF? But, but, but: when I switched back to hexadecimal using :%!xxd, the 'FF' had not stuck:
:%!xxd
00000000: c3bf 3233 0a ..23.
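What is this crazy 0xc3bf now in my file instead of a lovely 0xff? It turns out that this happens whenever I try to insert any value above decimal 127 = 0x7F, the last value in the standard ASCII table. Opened normally, the file is treated as text in vim's current encoding, typically UTF-8, and 0xc3bf is the UTF-8 encoding of ÿ (Unicode U+00FF). UTF-8 stores only the values 0x00 to 0x7F as a single byte; every character above that takes two or more bytes.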
The solution is, as already mentioned, to open the file in binary mode using:
vim -b test.txt
I switch to the hexadecimal view again using
:%!xxd
Now I adjust the text to be:
00000000: ff32 330a .23.
Back to text view using
:%!xxd -r
This time I see:
23
Save the file and display it using 'more test.txt', 'less test.txt' or 'cat test.txt'. We don't judge which is your preferred method of displaying files. Now I get:
�23
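The funky � symbol is the Unicode replacement character, which a UTF-8 terminal shows for bytes that do not form valid UTF-8. A lone 0xFF byte is never valid UTF-8, and I get the same symbol for other values above 0x7F that appear on their own.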
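Why am I doing this? I needed to test that a C program was correctly detecting the end of file (EOF). Strictly speaking, EOF is not a character stored in the file: it is a negative int value, commonly -1, that functions such as getc() return when there is no more data to read. In 8-bit two's complement, -1 is 0xFF, so if a program stores the result in a char rather than an int, a genuine 0xFF byte can look just like EOF. I was asked to check that the C code would not confuse character 0xFF with EOF.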