Hidden Linux : File mysteries
Here's a little mystery for you. Imagine you have three files on a Windows machine called:
HTML_File.htm
PDF_File.pdf
Text_File.txt
Windows will have no problem opening the appropriate program when you double-click them because of those three-letter extensions. If however you drop the extensions or mix them up, you'll have problems.
Now copy the three files to a Linux machine. You can see how the operating system perceives them by typing the file command, so file * will list them all ...
HTML_File.htm: HTML document text
PDF_File.pdf: PDF document, version 1.3
Text_File.txt: ASCII text
Okay, let's try mixing up the extensions ...
HTML_File.txt: HTML document text
PDF_File.htm: PDF document, version 1.3
Text_File.pdf: ASCII text
How about dropping them altogether?
HTML_File: HTML document text
PDF_File: PDF document, version 1.3
Text_File: ASCII text
Still no difference. So how does the system know what's what?
Linux uses file to determine a file's type by the use of 'magic numbers' -- specific bytes stored in particular locations, typically near the beginning of the file.
Actually, file performs three checks. First it looks to see if the file is empty or is some sort of special file like a directory or a link. Then it checks for known magic numbers. If that fails it checks if the file is plain text, and if so what type -- ASCII, for example, or ISO-8859-x, non-ISO 8-bit extended-ASCII, or UTF-8-encoded Unicode, etc. If all those checks fail, the file is reported as being 'data'.
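To make those three checks concrete, here is a minimal Python sketch of the same idea. It is not how file itself is implemented: the function name guess_type and the five-entry signature table are assumptions made up for illustration, and the real magic database is far larger and looks at much more than the first few bytes. The text check is also cruder than the real one (it only distinguishes ASCII and UTF-8, so an ISO-8859 file falls through to 'data').

```python
import codecs
import os
import tempfile

# A tiny, hand-picked signature table (an assumption for illustration;
# the real magic database is far larger and more nuanced).
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"GIF87a", "GIF image data"),
    (b"GIF89a", "GIF image data"),
    (b"\x1f\x8b", "gzip compressed data"),
    (b"PK\x03\x04", "Zip archive data"),
]


def guess_type(path):
    """Roughly mimic the three checks described in the article."""
    # Check 1: special files and empty files.
    if os.path.islink(path):
        return "symbolic link"
    if os.path.isdir(path):
        return "directory"
    if os.path.getsize(path) == 0:
        return "empty"

    with open(path, "rb") as handle:
        head = handle.read(4096)

    # Check 2: known magic numbers at the start of the file.
    for signature, description in SIGNATURES:
        if head.startswith(signature):
            return description

    # Check 3: does it look like plain text, and in which encoding?
    if b"\x00" not in head:
        try:
            head.decode("ascii")
            return "ASCII text"
        except UnicodeDecodeError:
            pass
        try:
            # The incremental decoder tolerates a multi-byte character
            # that happens to be cut off at the end of the 4096-byte read.
            codecs.getincrementaldecoder("utf-8")().decode(head)
            return "UTF-8 Unicode text"
        except UnicodeDecodeError:
            pass

    # Nothing matched.
    return "data"


if __name__ == "__main__":
    # Deliberately misleading extensions, as in the experiment above.
    samples = {
        "PDF_File.htm": b"%PDF-1.3\n",
        "Text_File.pdf": b"Just some notes\n",
        "Mystery": b"\x00\x01\x02\x03",
    }
    with tempfile.TemporaryDirectory() as folder:
        for name, content in samples.items():
            path = os.path.join(folder, name)
            with open(path, "wb") as handle:
                handle.write(content)
            print(f"{name}: {guess_type(path)}")
```

Note that the file name never enters into the decision, which is why renaming the files earlier made no difference.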
A simple file call can tell you quite a lot about a file's contents, such as in the following examples:
Backup.zip: Zip archive data, at least v2.0 to extract
Bike Ride.mpg: MPEG sequence, v2, program multiplex
Help.rtf: Rich Text Format data, version 1, ANSI
myzip.tar.gz: gzip compressed data, from Unix, last modified: Thu Jun 11 02:30:36 2009
Notes: ASCII text
Pictures: symbolic link to `/home/geoff/Pictures'
print.gif: GIF image data, version 89a, 560 x 174
Shorts.avi: RIFF (little-endian) data, AVI, 320 x 240, ~30 fps, video: XviD, audio: MPEG-1 Layer 3 (stereo, 22050 Hz)
Ski Jump.mov: ISO Media, Apple QuickTime movie
Turino.wmv: Microsoft ASF
Video.mov: ISO Media, Apple QuickTime movie
Web Notes: UTF-8 Unicode English text, with very long lines
yuk!.exe: MS-DOS executable PE for MS Windows (GUI) Intel 80386 32-bit
Add the -s parameter and you can look at special files, such as the device files for hard disks! (Note: you need admin privileges for this, hence the 'sudo'.)
sudo file -s /dev/sda
/dev/sda: x86 boot sector; partition 2: ID=0x83, starthead 254, startsector 954646560,
21880530 sectors; partition 3: ID=0x83, starthead 254, startsector
566419770, 388226790 sectors; partition 4: ID=0x5, starthead 1,
startsector 63, 566419707 sectors, code offset 0x4, Bytes/sector 1766,
sectors/cluster 87, reserved sectors 36434, FATs 192, root entries
64763, sectors 191 (volumes <=32 MB) , Media descriptor 0x6,
sectors/FAT 185, heads 165, hidden sectors 1568, sectors 3141645394
(volumes > 32 MB) , physical drive 0xaa, physical drive 0x2a,
reserved 0x55, dos < 4.0 BootSector (0x31)
You can also look at individual partitions ...
sudo file -s /dev/sda{1,2,3,4,5}
/dev/sda1: Linux rev 1.0 ext3 filesystem data
/dev/sda2: x86 boot sector; partition 2: ID=0x5, starthead 254, startsector 29302560, 204941205 sectors, extended partition table
/dev/sda3: Linux rev 1.0 ext3 filesystem data (needs journal recovery) (large files)
/dev/sda4: ERROR: cannot open `/dev/sdb4' (No such file or directory)
/dev/sda5: Linux/i386 swap file (new style) 1 (4K pages) size 487973 pages
You'll find a list of magic numbers in /usr/share/file/magic. You can add your own file types in /etc/magic (to make them system-wide) or in $HOME/.magic (for your own account only). The format is described -- with no offence to feminists intended -- in man magic.
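As a rough illustration of what such an entry can look like, here is a one-line sketch for $HOME/.magic. The format name MYFMT and its description are hypothetical, invented for this example. Each entry gives an offset into the file, the type of data to read there, the value to test against, and the message to print on a match. Check man magic on your own system before relying on it.

```text
# offset  type    test    message
0         string  MYFMT   My custom format data
```

With an entry like this in place, a file whose first five bytes are MYFMT would be expected to show up as 'My custom format data', whatever its name or extension.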

PC World is New Zealand’s top selling computing and technology magazine.
Comments
Just one correction:
"Linux uses file to determine a file's type..."
It doesn't use "file"; file managers use "libmagic", the same library the "file" command uses. Otherwise, far too many child processes would be spawned.
Posted by: Guest | June 29, 2009 7:27 AM