On Mar 5, 2009, at 4:57 AM, Balazs Horvath wrote:
http://gaarai.com/2009/02/14/generating-mime-type-in-php-is-not-magic/
My summary:
"None of the hosting providers I use has the right fileinfo
software, nor can I install it."
(Hey man, how do you know that? I couldn't find that info in any documentation!)
The man page for the "file" command has that documentation. I guess
I am just quite familiar with that command.
I've used PHP's exec() to call the file command for file type
detection since PHP 3 (which kinda dates me).
I simply matched up the file command flags to the predefined
constants for the fileinfo options
http://www.php.net/manual/en/fileinfo.constants.php
A few sample files and some sample PHP code produced the information
I posted earlier.
I just tried .docx, .odt, and .ods -
Using options "1046" :
  docx : application/xml compressed-encoding=application/zip
  ods : text/plain charset=us-ascii compressed-encoding=application/octet-stream
  odt : text/plain charset=us-ascii compressed-encoding=application/vnd.oasis.opendocument.text
Using options "38" :
  docx : XML document text (Zip archive data, at least v2.0 to extract)
  ods : ASCII text, with no line terminators (OpenDocument Spreadsheet)
  odt : ASCII text, with no line terminators (OpenDocument Text)
Those return strings seem to identify the file types fairly
conclusively.
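
In case the option numbers look like magic: they are just the fileinfo
flag bits OR'ed together. Here is a rough Python sketch that breaks a
number back into flag names. The bit values in the table are my
assumption of what the constants page lists (they match libmagic as far
as I know), and decode_options is just a name I made up, so check them
against your own install before relying on this.

```python
# Flag values assumed from the PHP fileinfo constants page; verify locally.
FLAGS = {
    "FILEINFO_SYMLINK": 2,
    "FILEINFO_COMPRESS": 4,
    "FILEINFO_MIME_TYPE": 16,
    "FILEINFO_CONTINUE": 32,
    "FILEINFO_MIME_ENCODING": 1024,
}


def decode_options(options):
    """Return (flag names set in options, any bits not in the table)."""
    names = [name for name, bit in FLAGS.items() if options & bit]
    leftover = options & ~sum(FLAGS.values())
    return names, leftover


for value in (1040, 1046, 38):
    names, leftover = decode_options(value)
    print(value, "=", " | ".join(names), "(unknown bits: %d)" % leftover)
```

Read that way, "1046" is the usual MIME output plus the compress and
symlink flags, and "38" is the plain-text description with the same two
flags plus "continue".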
If you find the file type is a zip file using the "normal" 1040
options, poke at it again with different options (such as 1046 or 38).
I find that opening the magic file with no options (finfo_open() with
no flags) allows you to probe the file multiple times using different
options, but you have to remember to pass the options on each probe
(each finfo_file() call) instead of assuming the options you want have
been set globally.
Maybe because I got burned on file type issues in the past, I am
sensitive to it (and was forced to learn about it in detail).
Looking at the upstream file mailing list, newer versions might be
able to better determine Office 2007 file types.
http://mx.gw.com/pipermail/file/2009/000311.html
However my test of a Fedora 11 rpm rebuilt on F10 didn't show any
improvement.
Another interesting thread -
http://mx.gw.com/pipermail/file/2008/000283.html
BTW :
OpenOffice.org uses a standard file format :
http://www.idpf.org/ocf/ocf1.0/download/ocf10.htm
This is the same format used by Adobe for Mars files.
If you explode the zip, there is a "mimetype" file at the root level
with the mime-type inside.
The fileinfo library can see that in an odt but for some reason not in
an ods (or a mars).
Not sure if code to peek inside the zip for that mimetype file is
worthwhile.
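
For what it's worth, peeking is not much code. A minimal sketch in
Python using the standard zipfile module (PHP's ZipArchive could do the
same job); peek_package_mimetype is a made-up name, and I am assuming
the entry is called exactly "mimetype" at the root, as described above.

```python
import zipfile


def peek_package_mimetype(path_or_file):
    """Return the contents of a root-level 'mimetype' entry, or None.

    None means the input is not a readable zip, or it has no such entry.
    """
    try:
        with zipfile.ZipFile(path_or_file) as archive:
            try:
                raw = archive.read("mimetype")
            except KeyError:
                # A zip, but without a root-level mimetype file.
                return None
    except zipfile.BadZipFile:
        # Not a zip archive at all.
        return None
    return raw.decode("ascii", errors="replace").strip()
```

This only tells you what the package claims to be, so I would treat it
as one more hint alongside the fileinfo result, not a replacement for it.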
MS uses something similar but not the same, <snarky>typical of MS</snarky>
http://en.wikipedia.org/wiki/Open_Packaging_Convention
If the user sends something bogus by playing with the extension,
who cares?
I think passing the security buck to some other part of the system
isn't good practice. If you look at the OWASP site at all, the
preferred way is to validate and test all input _and_ output.