
June 28, 2009

Hidden Linux : File mysteries

Here's a little mystery for you. Imagine you have three files on a Windows machine called

HTML_File.htm
PDF_File.pdf
Text_File.txt


Windows will have no problem opening the appropriate program when you double-click them because of those three-letter extensions. If, however, you drop the extensions or mix them up, you'll have problems.

Now copy the three files to a Linux machine. You can see how the operating system perceives them by typing the file command, so file * will list them all ...

HTML_File.htm:   HTML document text
PDF_File.pdf:    PDF document, version 1.3
Text_File.txt:   ASCII text

Okay, let's try mixing up the extensions ...

HTML_File.txt:   HTML document text
PDF_File.htm:    PDF document, version 1.3
Text_File.pdf:   ASCII text

How about dropping them altogether?

HTML_File:       HTML document text
PDF_File:        PDF document, version 1.3
Text_File:       ASCII text

Still no difference. So how does the system know what's what?

The file command ignores the name and works out a file's type from its 'magic numbers' -- specific bytes stored in particular locations, typically near the beginning of the file.

Actually, file performs three checks. First it looks to see if the file is empty or is some sort of special file like a directory or a link. Then it checks for known magic numbers. If that fails it checks if the file is plain text, and if so what type -- ASCII, for example, or ISO-8859-x, non-ISO 8-bit extended-ASCII, or UTF-8-encoded Unicode, etc. If all those checks fail, the file is reported as being 'data'.
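To make those three checks a little more concrete, here's a minimal Python sketch of the same idea. It is not how file is actually implemented: the signature table and the guess_type function are hypothetical names of my own, the table holds only a handful of well-known signatures, and the special-file check (directories, links) is left out because the sketch only looks at bytes.

```python
# Hypothetical, heavily simplified signature table: (offset, bytes, description).
# The real magic database holds far more entries and far richer tests.
MAGIC_SIGNATURES = [
    (0, b"%PDF-", "PDF document"),
    (0, b"GIF89a", "GIF image data, version 89a"),
    (0, b"PK\x03\x04", "Zip archive data"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"{\\rtf", "Rich Text Format data"),
]


def guess_type(data: bytes) -> str:
    """Guess a type from content alone, loosely following file's order of checks."""
    # Check 1: is there anything to look at? (Special files are out of scope here.)
    if not data:
        return "empty"

    # Check 2: look for a known magic number at its expected offset.
    for offset, signature, description in MAGIC_SIGNATURES:
        if data.startswith(signature, offset):
            return description

    # Check 3: is it plain text, and if so in which encoding?
    # A NUL byte is treated as a sign of binary content (an assumption of this sketch).
    if b"\x00" not in data:
        for encoding, description in (("ascii", "ASCII text"),
                                      ("utf-8", "UTF-8 Unicode text")):
            try:
                data.decode(encoding)
                return description
            except UnicodeDecodeError:
                continue

    # Nothing matched, so fall back to the catch-all answer.
    return "data"
```

Notice that the file's name never enters into it, which is why renaming PDF_File.pdf to PDF_File.htm makes no difference to the result.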

A simple file call can tell you quite a lot about a file's contents, such as in the following examples:

Backup.zip:         Zip archive data, at least v2.0 to extract
Bike Ride.mpg:      MPEG sequence, v2, program multiplex
Help.rtf:           Rich Text Format data, version 1, ANSI
myzip.tar.gz:       gzip compressed data, from Unix, last modified: Thu Jun 11 02:30:36 2009
Notes:              ASCII text
Pictures:           symbolic link to `/home/geoff/Pictures'
print.gif:          GIF image data, version 89a, 560 x 174
Shorts.avi:         RIFF (little-endian) data, AVI, 320 x 240, ~30 fps, video: XviD, audio: MPEG-1 Layer 3 (stereo, 22050 Hz)
Ski Jump.mov:       ISO Media, Apple QuickTime movie
Turino.wmv:         Microsoft ASF
Video.mov:          ISO Media, Apple QuickTime movie
Web Notes:          UTF-8 Unicode English text, with very long lines
yuk!.exe:           MS-DOS executable PE  for MS Windows (GUI) Intel 80386 32-bit

Add the -s parameter and you can look at special files, such as hard disk formats! (Note, you need admin privileges for this, hence the 'sudo'.)

sudo file -s /dev/sda

/dev/sda: x86 boot sector; partition 2: ID=0x83, starthead 254, startsector 954646560, 21880530 sectors; partition 3: ID=0x83, starthead 254, startsector 566419770, 388226790 sectors; partition 4: ID=0x5, starthead 1, startsector 63, 566419707 sectors, code offset 0x4, Bytes/sector 1766, sectors/cluster 87, reserved sectors 36434, FATs 192, root entries 64763, sectors 191 (volumes <=32 MB) , Media descriptor 0x6, sectors/FAT 185, heads 165, hidden sectors 1568, sectors 3141645394 (volumes > 32 MB) , physical drive 0xaa, physical drive 0x2a, reserved 0x55, dos < 4.0 BootSector (0x31)


You can also look at individual partitions ...

sudo file -s /dev/sda{1,2,3,4,5}

/dev/sda1: Linux rev 1.0 ext3 filesystem data
/dev/sda2: x86 boot sector; partition 2: ID=0x5, starthead 254, startsector 29302560, 204941205 sectors, extended partition table
/dev/sda3: Linux rev 1.0 ext3 filesystem data (needs journal recovery) (large files)
/dev/sda4: ERROR: cannot open `/dev/sda4' (No such file or directory)
/dev/sda5: Linux/i386 swap file (new style) 1 (4K pages) size 487973 pages

You'll find a list of magic numbers in /usr/share/file/magic. You can add your own file types in /etc/magic (to make them system-wide) or in $HOME/.magic (for your own account only). The format is described -- with no offence to feminists intended -- in man magic.




June 22, 2009

The Open Data Catalogue

Whether you're looking for maritime charts, national broadband maps or details of market rentals in your area, the place to go is the Open Data Catalogue. The website -- "an attempt to collate the many different datasets available through the New Zealand Government Departments and Local Bodies" -- has only been up a month but already has a number of useful and interesting links to information that I didn't realise was available online.

The aims of the site are to:
  • List all of the datasets available to members of the public.
  • Provide a place for people to comment on the datasets (what’s good about them, bad, uses they have.)
  • Make it easy for people to find the information they are after and who they need to contact.
  • Provide a voice for the data using community, both professional and casual.
Each dataset carries a brief description of its content, availability, license and price (if any), the formats it can be downloaded in, and of course a link to the relevant site. You can see what data's available by government department and even suggest other datasets to add.


June 16, 2009

Hidden Linux : The perfect backup


rdiff-backup is a great little utility. At its simplest it backs up one directory to another. The command

rdiff-backup dir1 dir2

will mirror the first directory onto the second, preserving everything -- subdirectories, hard links, permissions, ownership info, modification times and all extended attributes. Install it on both client and server and the command

rdiff-backup dir1 user@system::/dir2

will do so over a network.

Because backups are mirrors of the source, you can use regular tools like find and locate without having to wade through zipped archives. Accidentally deleted a source file? Simply drag and drop it from backup. What's more, if the backups are on a different drive (which they should be!) and your source drive crashes, you can simply mount the backup disk in its place and carry on!

But wait, there's more!

Once it's done the initial backup, future backups are simply done on diffs -- which is to say file differences. Not only does that make backups blindingly fast (unless you regularly change lots of files!), but it also records incremental changes.

To appreciate what that means, imagine you have a cron job set up to perform an rdiff-backup every hour. At 3.00 in the afternoon you realise you've been on the wrong track for the last couple of hours and want to go back to the version of the document you were working on after lunch. The command

rdiff-backup -r 2h /dir2/file /dir1/file

will do just that -- picking up the version it saved two (h)ours ago. (Other useful interval characters are s, m, h, D, W, M, or Y indicating seconds, minutes, hours, days, weeks, months, or years respectively.)
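If you're wondering how a string like 2h turns into a point in time, here's a rough Python sketch. The function name interval_to_seconds is my own invention, not part of rdiff-backup, and the fixed lengths for months (30 days) and years (365 days) are an assumption based on my reading of the man page, which also allows several number-and-letter pairs in a row.

```python
import re

# Seconds per interval character. Assumption: months are always 30 days
# and years are always 365 days, with no calendar arithmetic.
UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 60 * 60,
    "D": 24 * 60 * 60,
    "W": 7 * 24 * 60 * 60,
    "M": 30 * 24 * 60 * 60,
    "Y": 365 * 24 * 60 * 60,
}


def interval_to_seconds(interval: str) -> int:
    """Convert an interval such as '2h' or '1h30m' into a number of seconds."""
    pairs = re.findall(r"(\d+)([smhDWMY])", interval)
    # Reject empty input and anything with characters left over between pairs.
    if not pairs or "".join(number + unit for number, unit in pairs) != interval:
        raise ValueError(f"not a valid interval: {interval!r}")
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in pairs)
```

Subtract the result from the current time and you have the moment whose saved version you're asking for. Note that the letters are case-sensitive: m is minutes but M is months.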

What it also means is that unless you use the --remove-older-than option at some point, you can effectively restore anything since you first started doing backups. Even files you deleted months or even years ago.

There's a whole lot more to rdiff-backup than that, including the ability to include and exclude files and file types, but you probably just want to get started and have a play. Because of its flexibility and features, its man page looks a little daunting, so try the rdiff-backup examples page instead.






June 10, 2009

Corrupted files. Get 'em while they're hot!


Whether it's Word, Excel or PowerPoint, Corrupted-Files.com have the file for you! For just US$3.95 (reduced from US$5.95 till the end of June), you can buy a selection of corrupted files that includes 2, 5, 10, 20, 30 and 40 page documents.

Hang on. Why would anyone pay for a file containing scrambled and unrecoverable data? To buy time, of course!

How it Works!

Step 1: After purchasing a file, rename the file e.g. Mike_Final-Paper.

Step 2: Email the file to your professor along with your "here's my assignment" email.

Step 3: It will take your professor several hours if not days to notice your file is "unfortunately" corrupted. Use the time this website just bought you wisely and finish that paper!!!

Note: The only difference between each Word file is its file size, because it will look a bit odd if your 10 page term paper is only 1k in size! Yes, we thought of everything! We guarantee and stand by our product!

Don't you just love the internet?


June 7, 2009

Adventures in spam land


I won a quarter of a million pounds the other day. At least that's what the message on my mobile phone said:

From: 62455556
priz@sapo.pt

Congrats! Your Tel.No. won 250,000 GB pounds in SAMSUNG JUMBO.
To claim email: prize@sapo.pt call +447024062979

Several things struck me as odd though -- not least the fact that I'd won a contest without actually even entering.

First off the message has two slightly different email addresses, priz@... and prize@... That's confusing.

Second, they ask me to either email a Portuguese address (the .pt top-level domain name) or phone a UK number (the 44 prefix). Odd.

Third, a visit to www.sapo.pt reveals the site belongs to Serviço de Apontadores Portugueses, a search engine subsidiary of the Portugal Telecom Group. Odder still.

So I sent them an email saying I'd received their message -- after a quick visit to mail.com where I set up an email account just for this purpose. A day later I received this reply:


Good to hear from you regarding the SAMSUNG MOBILE JUMBO.
In commemoration of our anniversary we rolled out over £3,000.000.00 (Three Million Great Britain Pounds) for our Anniversary Draws . All participants were selected through a computer ballot system drawn from mobile numbers from the 45 mobile networks from Australia, New Zealand, North America, South America, Europe, Asia and Africa as part of our International Promotions Program.

This promotion is a bid to increase our market share in the fast growing mobile telecommunication industry. Samsung has a teeming population of customers spread all over the world, and the promo is targeted at promoting our brand, and creating awareness amongst Samsung users and none users.

This promo is approved by the British Gaming Board and also licensed by the International Association of Gaming Regulators (IAGR). This promo is the first of its kind and we intend to sensitize the public. To begin the process of claiming your prize, Fill and send back the claims form below. The address is where your international Bank Draft will be delivered.

Your private data is secure and will not be accessed by any third party.

Section A

**Personal

Prefix (Mr., Mrs., Ms., Dr., etc.):

First name:

Middle name:

Last name:

Date of birth (yyyy-mm-dd):

Gender:

Male Female

Occupation

Address:


City /State/ Province :

Country:

Section B

**Winning Information

Winning mobile number

Network Name

Amount won (in figure)

£

Amount won (in words)


Section C

**Internet

Contact email address(s)

1.
2.

Internet connection

Home  Work  Public

 

The reply came from <info@samsungjumbo.co.uk> but samsungjumbo.co.uk don't have a proper website yet ...




You'd think they'd at least sort that out before giving away "over £3,000.000.00 (Three Million Great Britain Pounds)".

You'd also think they'd get their links right. Although the message carries a Samsung logo, if you click the "SAMSUNG MOBILE JUMBO" link you'll actually get sent to Nokia UK. Whoops!

Since this promotion has been "approved by the British Gaming Board and also licensed by the International Association of Gaming Regulators (IAGR)" I thought I'd check with them. It turns out the former is frequently referenced "in unsolicited emails that appear to be fraudulent or bogus" and the latter warn, "We receive many inquiries referencing email notifications about prizes and lotteries which claim to be licensed by IAGR. IAGR is not affiliated with any company awarding any type of prize or money."

Nevertheless, I've replied to the email -- using my specially created fictitious email account and ... ahem ... some slightly fictitious personal details. I'll let you know what happens ...

June 2, 2009

Open season for open source?


Late last week, months of talks between Microsoft and our very own State Services Commission collapsed in a steaming heap. The talks, aimed at extending the cosy three-year contract whereby the government licensed Microsoft software, ended with the SSC saying, "We didn't feel we got the appropriate levels of benefit from the negotiations."

Sam Varghese, writing for ITWire, put it bluntly in a piece entitled "Kiwis give Microsoft the finger":

One of the money milking machines in Microsoft's stable has just gone dry. It's a little teat but significant in that it shows the way for other, bigger teats to be pulled out of the Redmond suction pump.

New Zealand's government-wide deal to purchase Microsoft products has fallen apart. The fact that the country has won awards for its work towards adopting open standards and open source means that it has the resources to look at other options for its software.

These deals have always been "commercial-in-confidence" arrangements. Given the absence of actual competitors, I've always assumed they had to be kept secret lest other Microsoft consumers find out how much they were really being gouged. (Schools apparently get MS Office for nix. Corporate largesse? Or so they can churn out loads of MS-demanding clones who know nothing of the glories of free software?)

Some in the open source world are lauding the announcement. Personally, I'm sceptical. I tend to side with Rob O'Neill, writing in ComputerWorld:

It's hard to escape the conclusion that Microsoft got just what it wanted out of the government's failed three-year software licence negotiations, G2009 ...

Microsoft will now negotiate individually with each agency. It will get to talk a lot more with government CIOs and rebuild the direct relationships that were so strong in the 1990s, but that in the era of all-of-government buying were threatened.

Divide and conquer.

Expect a lot more Microsoft schmoozing and CIO wining and dining. The "free lunch" is about the only freedom Microsoft readily embraces.
