File Format Identification, Characterization, Validation and Transformation
- AbiWord - free, open source word processing program, with considerable support for reading and converting between various office file formats
http://www.abisource.com/
http://www.abisource.com/download/plugins.phtml
- activePDF - several products to deal with PDF creation and conversion, including PrimoPDF, Server, DocConverter, WebGrabber, Printer and Portfolio
http://www.activepdf.com/
- Antiword – “converts the binary files from Word 2, 6, 7, 97, 2000, 2002 and 2003 to plain text and to PostScript” or PDF
[-s switch allows for extraction of hidden data (comments)]
http://www.winfield.demon.nl/
- AONS (Automated Obsolescence Notification System), Australian National University and National Library of Australia
http://www.apsr.edu.au/publications/aons_report.pdf (see also AONS II - http://pilot.apsr.edu.au/wiki/index.php/AONS_II)
- Apache POI - Java API To Access Microsoft Format Files
http://poi.apache.org/
- Audio/Video to WAV Converter
http://www.archive.org/details/tucows_369301_Audio_Video_To_Wav_Converter
- catdoc – “reads one or more Microsoft word files and outputs text” (comes with xls2csv, which exports Excel spreadsheets as comma-separated values, and catppt, which extracts text from PowerPoint files)
http://vitus.wagner.pp.ru/software/catdoc/
- CDS Convert (CERN Document Server Software Consortium) - suite of tools that allow conversion of documents, presentations and images between different software formats
http://cdsware.cern.ch/convert/index.html
- Chilkat - various file parsing and file transformation tools, with a strong emphasis on email
http://www.chilkatsoft.com/
- compare (ImageMagick command) - "mathematically and visually annotate the difference between an image and its reconstruction"
http://www.imagemagick.org/script/compare.php
- Conversion and Recommendation of Digital Object Formats (CRiB) (University of Minho, Department of Information Systems) - "Service Oriented Architecture (SOA) designed to assist cultural heritage institutions in the implementation of migration-based preservation interventions"
http://crib.dsi.uminho.pt/ (see also the demonstration version, the Migration Workbench - http://digitarq.di.uminho.pt/MigrationWorkbench/)
- convert (ImageMagick command) - converts between image formats
http://www.imagemagick.org/script/convert.php
- DANS (Data Archiving and Networked Services) DBF - "Java library for reading and writing...dBase and its dialects"
http://dans-dbf-lib.sourceforge.net/
- dBpowerAMP Music Converter (Illustrate)
http://www.dbpoweramp.com/dmc.htm
- Digital Scholar's Workbench - web application that converts suitably structured word processing documents into DocBook XML and then into XHTML for onscreen viewing and into PDF for printing
http://www.apsr.edu.au/Open_Repositories_2006/barnes_yeadon.ppt
- DROID (Digital Record Object Identification), National Archives (UK) - automated batch identification of file formats
http://www.nationalarchives.gov.uk/aboutapps/pronom/droid.htm
- Easy CD-DA Extractor (Poikosoft) – conversion between audio file formats
http://www.poikosoft.com/
- Electronic Document Conversion - National Library of Medicine
http://docmorph.nlm.nih.gov/docmorph/ - includes DocMorph, http://docmorph.nlm.nih.gov/docmorph/docmorph.htm and MyMorph, http://docmorph.nlm.nih.gov/docmorph/mymorph.htm
- Emailchemy (Weird Kid Software) – “email conversion, email migration and management of email archives”
http://www.weirdkid.com/products/emailchemy/
- FFmpeg - converts Flash videos to MPEG
http://ffmpeg.mplayerhq.hu/download.html
- FileAlyzer – “allows a basic analysis of files (showing file properties and file contents in hex dump form) and is able to interpret common file contents like resources structures (like text, graphics, HTML, media and PE)”
http://www.safer-networking.org/en/filealyzer/
- File (Unix command)
- Fileformat.info - numerous online tools and links to tools
http://www.fileformat.info/
- Fine Free file command
http://www.darwinsys.com/file/
- Gnumeric ssconvert - conversion between spreadsheet formats
http://linuxcommand.org/man_pages/ssconvert1.html
- Hashkeeper (National Drug Intelligence Center, U.S. Department of Justice) - "quickly eliminates known operating system files and focuses on electronic files created by the user/subject of the investigation" [normally available only to law enforcement, military and other government agencies, but others can reportedly obtain a copy by filing Freedom of Information Act (FOIA) requests]
http://www.usdoj.gov/ndic/domex/hashkeeper.htm
- identify (ImageMagick command) - identifies the formats of image files and characterizes them
http://www.imagemagick.org/script/identify.php
- Jakarta POI - Java API for accessing and manipulating Microsoft format files
http://jakarta.apache.org/poi/
- JHOVE - JSTOR/Harvard Object Validation Environment - "format-specific identification, validation, and characterization of digital objects"
http://hul.harvard.edu/jhove/index.html
- KOffice - free, open source office suite, which provides filters for converting between various formats
http://www.koffice.org/filters/
- LibMagic – "library for the file utility that can classify files according to magic number tests"
http://packages.debian.org/unstable/libdevel/libmagic-dev
ftp://ftp.astron.com/pub/file/
- libsharedmime - "reads the Shared Mime Info database and returns you the MIME-TYPE of a file"
http://www.memecode.com/libsharedmime.php
- LuraDocument PDF Compressor Desktop - claims to convert TIFF, JPEG, BMP or PNM to PDF/A
http://www.luratech.com/products/luradocument/pdf/compressor/index.jsp?OnlineShopId=26F2242CD8E5AB172C2B99F08E32B5C8
- Microsoft Office Compatibility Pack for Word, Excel, and PowerPoint 2007 File Formats
http://www.microsoft.com/downloads/details.aspx?familyid=941B3470-3AE9-4AEE-8F43-C6BB74CD1466
- OpenDocument Fellowship - Conversion Software
http://opendocumentfellowship.org/applications#convert
- Open Office - supports batch conversions
http://www.openoffice.org/
- Open Video Converter - "video conversion, splitting and editing"
http://www.archive.org/details/tucows_371200_Open_Video_Converter
- OpenXML Translator (ODF Add-in for Word)
http://sourceforge.net/projects/odf-converter/
- Outside In (Oracle) – “suite of software development kits (SDKs) that provides developers with a comprehensive solution to access, transform and control the contents of nearly 500 unstructured file formats”
http://www.oracle.com/technology/products/content-management/oit/oit_all.html
- Reference Data Set, National Software Reference Library (NIST) – collection of millions of hash values for “known, traceable software applications”
http://www.nsrl.nist.gov/
- TrID File Identifier - utility designed to identify file types from their binary signatures
http://mark0.net/soft-trid-e.html
- TubeSock (Stinkbot) - for downloading and converting videos from Flash (e.g. from YouTube)
http://www.stinkbot.com/Tubesock/index.html
- Typed Object Model (TOM)
http://tom.library.upenn.edu/
- W3C Markup Validation Service - World Wide Web Consortium
http://validator.w3.org/
- WMDecode (Biblet Computer Services) - "for extracting files from winmail.dat mail messages (files named winmail.dat or ATT00001.dat)," which are associated with email stored in Microsoft Outlook
http://www.biblet.freeserve.co.uk/
- wvWare – various tools for extracting content from Microsoft Word files; most features are now incorporated into AbiWord
http://www.abisource.com/projects/
- xlrd - Python library for extracting information from Microsoft Excel spreadsheets
http://www.lexicon.net/sjmachin/xlrd.htm
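
Illustration: some of the identification tools listed above, such as LibMagic and TrID, are described as classifying files by magic numbers or binary signatures. The following is a minimal Python sketch of that general idea, not the method of any listed tool. The SIGNATURES table and the function names identify_bytes and identify_file are hypothetical, and the table covers only a handful of well-known leading byte sequences.

```python
# Minimal sketch of signature-based ("magic number") format identification.
# SIGNATURES is a hypothetical, deliberately tiny table; real tools ship
# far larger signature databases and also test bytes at other offsets.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"II*\x00", "TIFF image (little-endian)"),
    (b"MM\x00*", "TIFF image (big-endian)"),
    (b"PK\x03\x04", "ZIP container"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file"),
]


def identify_bytes(head):
    """Return a label for the first signature that matches the leading bytes."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"


def identify_file(path, read_size=16):
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(read_size))
```

A call such as identify_file("report.doc") (hypothetical file name) ignores the extension and looks only at the leading bytes. A match on the ZIP or OLE2 signature identifies only the container, so distinguishing, for example, a Word file from an Excel file requires inspecting the container's contents.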