- 01-02-2012 #1 Just Joined!
- Join Date: Dec 2010
- Posts: 2
Checking file types, not extensions
I know there is a file command, but having read the man page, I don't think there is a simple way for a script to determine the file type, since it outputs the result as a sentence meant for a human to read.
I don't want to rely on file extensions because they can be changed. Perhaps I'm being too paranoid?
EDIT: I forgot to say this is in bash.
- 01-02-2012 #2
It depends on what you are trying to do. If you are trying to tell exactly what programming language something is written in, that is fairly complicated, and there is no guarantee that "file" will know. However, if you're looking to tell whether it's a certain type of binary file, there are options.
In the general case, "file" does have some advice:
Code:
The type printed will usually contain one of the words text (the file contains only printing characters and a few common control characters and is probably safe to read on an ASCII terminal), executable (the file contains the result of compiling a program in a form understandable to some UNIX kernel or another), or data meaning anything else (data is usually “binary” or non-printable).
If you know that the file is binary and you want to know exactly what type of file it is, you can look at the magic number. This is usually the first few bytes of a binary file. For example, PDF files start with the ASCII bytes "%PDF".
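A minimal bash sketch of checking a magic number, assuming the standard od and tr utilities are available. The helper names magic_hex and is_pdf are hypothetical, made up for illustration. The bytes are compared as a hex string rather than read into a shell variable, because bash variables cannot hold NUL bytes, which are common in binary headers.

```shell
#!/usr/bin/env bash

# magic_hex FILE COUNT [OFFSET]  (hypothetical helper)
# Print COUNT bytes of FILE, starting OFFSET bytes in (default 0),
# as one lowercase hex string with no spaces, e.g. "25504446".
# od does the byte reading, so NUL bytes are handled correctly.
magic_hex() {
    od -An -tx1 -j "${3:-0}" -N "$2" -- "$1" 2>/dev/null | tr -d ' \n'
}

# is_pdf FILE  (hypothetical helper)
# Succeed if FILE starts with the ASCII bytes "%PDF" (0x25 0x50 0x44 0x46).
is_pdf() {
    [ "$(magic_hex "$1" 4)" = "25504446" ]
}
```

Usage: is_pdf report.pdf && echo "looks like a PDF". Any file that merely starts with "%PDF" will pass, so this is a quick filter rather than real validation.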
If you only care about specific file types, you could keep track of the magic numbers that are important to you. If you are writing a general file type detector, well, that's really what "file" is for, right?
DISTRO=Arch
Registered Linux User #388732
- 01-03-2012 #3 Just Joined!
- Join Date: Dec 2010
- Posts: 2
- 01-04-2012 #4
That's where things get interesting.
tar files do not have a magic number. As per tar(5) ("man 5 tar"), the tar file format is simply a series of entries, each a header (the first 100 bytes of which are the file name) followed by that file's contents.
bzip2 and gzip do have magic numbers, but from those alone you cannot tell whether a file is specifically a .tar.bz2 (or .tar.gz) or just any bzip2 (or gzip) file. For bzip2, the first three bytes are "BZh" (as per Wikipedia); for gzip, the first two bytes are 0x1f 0x8b (as per RFC 1952).
I don't know much about zip, but I suspect it has a magic number as well.
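The signatures mentioned above can be checked the same way. This is only a sketch; magic_hex and compression_type are hypothetical helper names. The zip signature used here, "PK\003\004" (hex 50 4b 03 04), is the local file header signature from PKWARE's ZIP specification; an empty zip archive starts with a different signature and is not recognised by this sketch. As noted above, a gzip or bzip2 match says nothing about whether there is a tar archive inside.

```shell
#!/usr/bin/env bash

# magic_hex FILE COUNT [OFFSET]  (hypothetical helper)
# Print COUNT bytes of FILE starting at OFFSET (default 0) as a hex string.
magic_hex() {
    od -An -tx1 -j "${3:-0}" -N "$2" -- "$1" 2>/dev/null | tr -d ' \n'
}

# compression_type FILE  (hypothetical helper)
# Print gzip, bzip2, zip or unknown, based only on the leading bytes:
#   gzip : 1f 8b        (RFC 1952)
#   bzip2: 42 5a 68     ("BZh")
#   zip  : 50 4b 03 04  ("PK\003\004", local file header)
compression_type() {
    local bytes
    bytes=$(magic_hex "$1" 4)
    case "$bytes" in
        1f8b*)    echo gzip ;;
        425a68*)  echo bzip2 ;;
        504b0304) echo zip ;;
        *)        echo unknown ;;
    esac
}
```

Usage: compression_type backup.tar.bz2 would print bzip2 for a real bzip2 file, whatever the extension says.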
- 01-04-2012 #5
Some time back, while dealing with file magic signatures, I found this link useful: File Signatures
- Lakshmipathi.G
-------------------
FOSS India Award winning ext3fs Undelete tool and tutorials www.giis.co.in
First they criticize you, then they laugh at you, then they fight with you, then you win. - M.K. Gandhi
-------------------
- 01-04-2012 #6 Linux Engineer
- Join Date: Apr 2006
- Location: Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
- Posts: 1,117
Hi.
Using the link that Lakshmipathi provided, I found that some versions of tar do write a signature into the tar file:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate ustar, tar, POSIX, GNU signature.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C tar

FILE=${1-data1}

# Create tar file.
pl " Characteristics of test files:"
pe x > f1
tar cf f2 f1
ls -lgG f1 f2

pl " Identification by command \"file\":"
file f1 f2

pl " Results:"
od -cx f2 | sed -n '/^0000400/,$p' | head -n 2 | cut -c1-36

exit 0
producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny)
GNU bash 3.2.39
tar (GNU tar) 1.20

-----
 Characteristics of test files:
-rw-r----- 1     2 Jan  4 05:54 f1
-rw-r----- 1 10240 Jan  4 05:54 f2

-----
 Identification by command "file":
f1: ASCII text
f2: POSIX tar archive (GNU)

-----
 Results:
0000400  \0   u   s   t   a   r
        7500 7473 7261 2020 6400 6e6
It's not at the beginning of the file, but rather embedded [257 byte offset].
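Building on that 257-byte offset, here is a sketch that recognises ustar-style archives and also peeks inside gzip or bzip2 files by decompressing only the first 512-byte header block. The function names are hypothetical. Assumptions: gzip and bzip2 are installed (if they are not, those branches simply fail), and the archive was written in a format that includes the "ustar" magic; old-style (v7) tar archives have no such field and will be reported as unknown. Any file that happens to contain "ustar" at offset 257 will also match.

```shell
#!/usr/bin/env bash

# has_ustar_magic  (hypothetical helper)
# Read a stream on stdin and succeed if bytes 257-261 are "ustar"
# (75 73 74 61 72). POSIX ustar writes "ustar\0" and GNU tar writes
# "ustar  \0", so only the first five bytes are compared.
has_ustar_magic() {
    [ "$(od -An -tx1 -j 257 -N 5 2>/dev/null | tr -d ' \n')" = "7573746172" ]
}

# tar_kind FILE  (hypothetical helper)
# Print tar, tar+gzip, tar+bzip2 or unknown. Compressed input is only
# decompressed as far as the first 512-byte tar header (head -c 512).
tar_kind() {
    if has_ustar_magic < "$1"; then
        echo tar
    elif gzip -dc -- "$1" 2>/dev/null | head -c 512 | has_ustar_magic; then
        echo tar+gzip
    elif bzip2 -dc -- "$1" 2>/dev/null | head -c 512 | has_ustar_magic; then
        echo tar+bzip2
    else
        echo unknown
    fi
}
```

Usage: tar_kind something.tgz prints tar+gzip for a gzipped ustar/GNU archive, which answers the ".tar.gz or just any gzip file?" question from post #4.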
Best wishes ... cheers, drl
Welcome - get the most out of the forum by reading forum basics and guidelines: click here.
90% of questions can be answered by using man pages, Quick Search, Advanced Search, Google search, Wikipedia.
We look forward to helping you with the challenge of the other 10%.
( Mn, 2.6.n, AMD-64 3000+, ASUS A8V Deluxe, 1 GB, SATA + IDE, Matrox G400 AGP )
- 01-04-2012 #7
I don't know much about the magic.mgc file that the "file" command uses, but you could potentially parse it to get the information directly. I wouldn't be surprised if there were a library out there to make this easier.
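For what it's worth, "file" itself is a front end to libmagic, which reads that compiled magic database, so a script can often skip the parsing and ask "file" for terse output instead of a sentence. A minimal sketch: -b (--brief) omits the file name and --mime-type prints only a MIME type such as application/pdf. The function names and categories are hypothetical, and the exact MIME strings can differ between versions of file (gzip, for example, is reported as application/gzip by some releases and application/x-gzip by others), so treat the patterns as assumptions to check against your own system.

```shell
#!/usr/bin/env bash

# mime_of FILE  (hypothetical helper)
# Print just the MIME type that file(1) detects.
#   -b / --brief : omit the leading "FILE: " prefix
#   --mime-type  : print a MIME type instead of a human-readable sentence
mime_of() {
    file -b --mime-type -- "$1"
}

# classify FILE  (hypothetical helper)
# Map the MIME type onto a few coarse categories; the category names
# and the MIME strings matched here are illustrative only.
classify() {
    case "$(mime_of "$1")" in
        application/pdf)                     echo pdf ;;
        application/gzip|application/x-gzip) echo gzip ;;
        application/x-bzip2)                 echo bzip2 ;;
        application/x-tar)                   echo tar ;;
        application/zip)                     echo zip ;;
        text/*)                              echo text ;;
        *)                                   echo other ;;
    esac
}
```

Because the result no longer depends on the wording of the human-readable sentence, a case statement like this is much less fragile than grepping the default output of "file".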