[RCD] mime problem - excel file as text/plain

chasd chasd at silveroaks.com
Thu Mar 5 22:27:21 CET 2009


On Mar 5, 2009, at 4:57 AM, Balazs Horvath wrote:

> http://gaarai.com/2009/02/14/generating-mime-type-in-php-is-not-magic/

My summary:
"None of the hosting providers I use has the right fileinfo
software, nor can I install it."

> (Hey man, how do
> you know that? I couldn't find that info in any documentation!)

The man page for the "file" command has that documentation. I guess
I am just quite familiar with that command: I have been using PHP's
exec() to call it for file type detection since PHP 3 (which kinda
dates me).

I simply matched up the file command flags to the predefined  
constants for the fileinfo options
<http://www.php.net/manual/en/fileinfo.constants.php>

A few sample files and some sample PHP code produced the information  
I posted earlier.
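
For anyone decoding option numbers such as 1040, 1046 and 38, here is a minimal sketch that splits an options integer into flag names. It is in Python rather than PHP only to keep it short and standalone. The flag values in the table are an assumption taken from the constants page above (as I understand it they mirror libmagic's MAGIC_* values), so check them against your PHP version. decode_options is a made-up helper name.

```python
# Flag values as I understand the PHP fileinfo constants page (assumption:
# they mirror libmagic's MAGIC_* values). Verify against your PHP version.
FILEINFO_FLAGS = {
    2: "FILEINFO_SYMLINK",
    4: "FILEINFO_COMPRESS",
    8: "FILEINFO_DEVICES",
    16: "FILEINFO_MIME_TYPE",
    32: "FILEINFO_CONTINUE",
    128: "FILEINFO_PRESERVE_ATIME",
    256: "FILEINFO_RAW",
    1024: "FILEINFO_MIME_ENCODING",
}


def decode_options(options):
    """Return the names of the single-bit flags set in an options integer."""
    names = []
    remaining = options
    for bit in sorted(FILEINFO_FLAGS):
        if options & bit:
            names.append(FILEINFO_FLAGS[bit])
            remaining &= ~bit  # clear the bit we just accounted for
    if remaining:
        # Bits that are not in the table above are reported, not dropped.
        names.append("unknown bits: %d" % remaining)
    return names
```

With that table, decode_options(1046) gives SYMLINK, COMPRESS, MIME_TYPE and MIME_ENCODING, and decode_options(38) gives SYMLINK, COMPRESS and CONTINUE.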

I just tried .docx, .odt, and .ods -

Using options "1046" :
docx : application/xml compressed-encoding=application/zip
ods : text/plain charset=us-ascii compressed-encoding=application/octet-stream
odt : text/plain charset=us-ascii compressed-encoding=application/vnd.oasis.opendocument.text

Using options "38" :
docx : XML document text ( Zip archive data, at least v2.0 to extract)
ods : ASCII text, with no line terminators (OpenDocument Spreadsheet)
odt : ASCII text, with no line terminators (OpenDocument Text)

Those return strings seem to identify the file types fairly  
conclusively.
If you find the file type is a zip file using "normal" 1040 options,  
poke at it again with different options.

I find that opening the magic file with no options allows you to  
probe the file multiple times using different options, but you have  
to remember to specify the options at probe time instead of assuming  
the options you want have been globally specified.
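
The "poke at it again" idea can be sketched with the file command itself, which is roughly what an exec() from PHP would do. This is Python, not PHP, and only a sketch under assumptions: file_command, looks_like_zip and probe are made-up names, and the long options (--brief, --dereference, --mime, --uncompress, --keep-going) are the ones I know from the file(1) man page. Note that the flags are passed on every call, nothing is assumed to be set globally.

```python
import subprocess


def file_command(path, mime=False, uncompress=False, keep_going=False):
    """Build an argument list for file(1); options are chosen per call."""
    cmd = ["file", "--brief", "--dereference"]
    if mime:
        cmd.append("--mime")
    if uncompress:
        cmd.append("--uncompress")
    if keep_going:
        cmd.append("--keep-going")
    # "--" ends option parsing so odd file names are not read as flags.
    cmd += ["--", path]
    return cmd


def looks_like_zip(description):
    """True if a file(1) result mentions a zip container."""
    text = description.lower()
    return "application/zip" in text or "zip archive" in text


def run_file(cmd):
    """Run file(1) and return its trimmed output."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def probe(path):
    """Probe once for the MIME type; probe zip containers a second time."""
    first = run_file(file_command(path, mime=True))
    if not looks_like_zip(first):
        return [first]
    second = run_file(file_command(path, uncompress=True, keep_going=True))
    return [first, second]
```

probe() returns one string for ordinary files and two for zip containers. It needs the file command on the PATH, and what the second probe reports depends on the installed version of file.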

Maybe because I got burned on file type issues in the past I am  
sensitive to it ( and was forced to learn about it in detail ).
Judging by the upstream fileinfo mailing list, newer versions might be
better at identifying Office 2007 file types.
<http://mx.gw.com/pipermail/file/2009/000311.html>
However, my test of a Fedora 11 rpm rebuilt on F10 didn't show any
improvement.
Another interesting thread -
<http://mx.gw.com/pipermail/file/2008/000283.html>

BTW :

OpenOffice.org uses a standard file format :

<http://www.idpf.org/ocf/ocf1.0/download/ocf10.htm>

This is the same format used by Adobe for Mars files.
If you explode the zip, there is a "mimetype" file at the root level  
with the mime-type inside.
The fileinfo library can see that in an .odt but for some reason not
in an .ods (or a Mars file).
Not sure if code to peek inside the zip for that mimetype file is  
worthwhile.
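
For what it's worth, peeking at that mimetype file is not much code. A minimal sketch in Python (not PHP), assuming the upload is already in memory as bytes; ocf_mimetype is a made-up name, and the code only reads the root-level "mimetype" entry without validating anything else about the container.

```python
import io
import zipfile


def ocf_mimetype(data):
    """Return the text of the root-level 'mimetype' entry, or None.

    data is the raw bytes of the uploaded file. None means the data is
    not a readable zip, or the zip has no 'mimetype' entry at its root.
    """
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            try:
                raw = archive.read("mimetype")
            except KeyError:
                # A zip, but without the entry we are looking for.
                return None
    except zipfile.BadZipFile:
        # Not a zip container at all.
        return None
    return raw.decode("ascii", errors="replace").strip()
```

The returned string is whatever the file claims about itself, so it is a hint to combine with other checks, not proof of the content.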

MS uses something similar, but not the same, <snarky>typical of MS</snarky>

<http://en.wikipedia.org/wiki/Open_Packaging_Convention>
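
A similar peek can be sketched for the MS packages, again in Python and again under assumptions. OPC packages carry a "[Content_Types].xml" entry at the root; the main part names in the table below are the ones Office normally writes, but the convention allows others, so this is a heuristic and not a parser. guess_ooxml_type and OPC_MAIN_PARTS are made-up names.

```python
import io
import zipfile

# Heuristic table (assumption): conventional main part -> MIME type.
# A strict reader would parse [Content_Types].xml instead.
OPC_MAIN_PARTS = {
    "word/document.xml":
        "application/vnd.openxmlformats-officedocument"
        ".wordprocessingml.document",
    "xl/workbook.xml":
        "application/vnd.openxmlformats-officedocument"
        ".spreadsheetml.sheet",
    "ppt/presentation.xml":
        "application/vnd.openxmlformats-officedocument"
        ".presentationml.presentation",
}


def guess_ooxml_type(data):
    """Guess the MIME type of an Office 2007 file held in memory as bytes.

    Returns None if the data is not a zip, is not an OPC package, or has
    none of the conventional main parts listed above.
    """
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = set(archive.namelist())
    except zipfile.BadZipFile:
        return None
    if "[Content_Types].xml" not in names:
        return None
    for part, mime in OPC_MAIN_PARTS.items():
        if part in names:
            return mime
    return None
```

A None result just means "could not tell"; it still makes sense to fall back to fileinfo or the file command in that case.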

> If the user sends something bogus by playing with the extension,  
> who cares?

I think passing the security buck to some other part of the system  
isn't good practice. If you look at the OWASP site at all, the  
preferred way is to validate and test all input _and_ output.


-- 
Charles Dostale
System Admin - Silver Oaks Communications
http://www.silveroaks.com/
824 17th Street, Moline  IL  61265

_______________________________________________
List info: http://lists.roundcube.net/dev/


