[RCD] mime problem - excel file as text/plain
chasd at silveroaks.com
Thu Mar 5 22:27:21 CET 2009
On Mar 5, 2009, at 4:57 AM, Balazs Horvath wrote:
My summary:
"None of the hosting providers I use has the right fileinfo
software, nor can I install it."
> (Hey man, how do
> you know that? I couldn't find that info in any documentation!)
The man page for the "file" command has that documentation; I guess
I am quite familiar with that command.
I've used PHP's exec() to call the file command for file type detection
since PHP 3 (which kinda dates me).
I simply matched up the file command flags to the predefined
constants for the fileinfo options.
A few sample files and some sample PHP code produced the information
I posted earlier.
I just tried .docx, .odt, and .ods -
Using options "1046" :
docx : application/xml compressed-encoding=application/zip
ods : text/plain charset=us-ascii compressed-encoding=application/
odt : text/plain charset=us-ascii compressed-encoding=application/
Using options "38" :
docx : XML document text ( Zip archive data, at least v2.0 to extract)
ods : ASCII text, with no line terminators (OpenDocument Spreadsheet)
odt : ASCII text, with no line terminators (OpenDocument Text)
Those return strings seem to identify the file types fairly well.
If you find the file type is a zip file using "normal" 1040 options,
poke at it again with different options.
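For anyone wondering what the numbers 1040, 1046 and 38 stand for, here is a minimal sketch that splits an options value into its flag bits. The bit values are the ones from libmagic's magic.h as I know them; that the fileinfo constants carry the same numbers as the flags in the tests above is an assumption, and decode_options() is just a name made up for this illustration.

```python
# Flag bits as defined in libmagic's magic.h. Assumption: the fileinfo
# option constants used in the tests above carry these same values.
MAGIC_FLAGS = {
    0x0001: "DEBUG",
    0x0002: "SYMLINK",
    0x0004: "COMPRESS",
    0x0008: "DEVICES",
    0x0010: "MIME_TYPE",
    0x0020: "CONTINUE",
    0x0040: "CHECK",
    0x0080: "PRESERVE_ATIME",
    0x0100: "RAW",
    0x0200: "ERROR",
    0x0400: "MIME_ENCODING",
}


def decode_options(value):
    """Return the names of the flag bits set in a numeric options value."""
    return [name for bit, name in sorted(MAGIC_FLAGS.items()) if value & bit]


for options in (1040, 1046, 38):
    print(options, "=", " | ".join(decode_options(options)))
# 1040 = MIME_TYPE | MIME_ENCODING
# 1046 = SYMLINK | COMPRESS | MIME_TYPE | MIME_ENCODING
# 38 = SYMLINK | COMPRESS | CONTINUE
```

Read this way, 1046 is the "normal" 1040 plus the symlink and compress bits, and 38 drops the MIME bits in favour of "continue", which would explain the descriptive strings instead of MIME types.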
I find that opening the magic file with no options allows you to
probe the file multiple times using different options, but you have
to remember to specify the options at probe time instead of assuming
the options you want have been globally specified.
Maybe because I got burned on file type issues in the past, I am
sensitive to it (and was forced to learn about it in detail).
Looking at the upstream fileinfo mail list, newer versions might be
able to better determine Office 2007 file types.
However, my test of a Fedora 11 rpm rebuilt on F10 didn't show any improvement.
Another interesting thread -
OpenOffice.org uses a standard file format :
This is the same format used by Adobe for Mars files.
If you explode the zip, there is a "mimetype" file at the root level
with the mime-type inside.
The fileinfo library can see that in an odt but for some reason not in an
ods (or a mars).
Not sure if code to peek inside the zip for that mimetype file is
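Peeking at that "mimetype" entry yourself is not much code. A rough sketch, using Python's standard zipfile module rather than anything from fileinfo; read_declared_mimetype() is a hypothetical helper name, and the in-memory archive is only a stand-in for a real .ods file:

```python
import io
import zipfile


def read_declared_mimetype(source):
    """Return the text of the root-level "mimetype" entry, or None.

    `source` may be a path or a file-like object. None is returned when
    the data is not a zip archive or has no "mimetype" entry.
    """
    try:
        with zipfile.ZipFile(source) as archive:
            return archive.read("mimetype").decode("ascii", "replace").strip()
    except (KeyError, zipfile.BadZipFile):
        return None


# Build a tiny stand-in archive in memory so the sketch runs on its own.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/vnd.oasis.opendocument.spreadsheet")
    archive.writestr("content.xml", "<placeholder/>")
buffer.seek(0)

print(read_declared_mimetype(buffer))
```

Keep in mind the value is whatever the sender put in the archive, so it is a hint for a second probe and not something to trust blindly.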
MS uses something similar, but not the same, <snarky>typical of MS</snarky>.
> If the user sends something bogus by playing with the extension,
> who cares?
I think passing the security buck to some other part of the system
isn't good practice. If you look at the OWASP site at all, the
preferred way is to validate and test all input _and_ output.
System Admin - Silver Oaks Communications
824 17th Street, Moline IL 61265
List info: http://lists.roundcube.net/dev/