Recently I had to write some sample code to display Word and PDF files on a Web page in a 2-page view. We couldn’t rely on a plugin or application on the user’s machine to do so. The solution was to export each page of a document to an image and then display those images in whatever layout we needed, which meant our application needed its own document export code.
Word to Images
Converting Word files (.DOC and .DOCX) to images was fairly simple, although I probably took a longer route than necessary. The problem is that the older DOC format and the newer DOCX format have different APIs to work with. Rather than handle each format separately, I exported both to XPS and then used the XPS API to retrieve an image for each page. This last part was not obvious to me until I found the one line of code required to do it on a forum. I apologize for not linking to it, as I can’t find the link anymore. The credit for that part is wholly the original author’s.
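As a rough illustration of the first half of this approach (Word to XPS), here is a minimal Python sketch. The sample attached to this post uses the .NET Office interop assemblies instead; this version assumes Windows, the pywin32 package, and a local Word installation that can save as XPS. The helper names `xps_path_for` and `export_to_xps` are made up for the example, and 18 is the value of Word's `wdFormatXPS` save format. The XPS-to-image step is not shown.

```python
import os

WD_FORMAT_XPS = 18  # Word's WdSaveFormat.wdFormatXPS constant


def xps_path_for(doc_path):
    """Return the output path for a Word file: same name, .xps extension."""
    base, _ext = os.path.splitext(doc_path)
    return base + ".xps"


def export_to_xps(doc_path):
    """Open a .doc or .docx file in Word and save it as XPS.

    Assumes Windows, an installed copy of Word, and the pywin32 package.
    """
    # Imported here so the path helper above works without pywin32.
    import win32com.client

    src = os.path.abspath(doc_path)  # Word expects absolute paths
    dst = xps_path_for(src)
    word = win32com.client.Dispatch("Word.Application")
    try:
        doc = word.Documents.Open(src)
        try:
            doc.SaveAs(dst, FileFormat=WD_FORMAT_XPS)
        finally:
            doc.Close(False)  # close without saving changes to the source
    finally:
        word.Quit()
    return dst
```

The same call handles both .doc and .docx because Word itself opens the file, which is the reason for going through XPS in the first place.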
PDF to Images
Converting PDF files to images was a much bigger problem. There is no direct way of doing this. Many third-party components can do it, but most of them are expensive. Some free ones, like PDFSharp, can iterate over pages, but they offer no way to export a complete page to an image short of walking through the entire structure of the page and redrawing everything.
This is where the GFLAX library came in. It also requires GhostScript for Windows to be installed on the machine. Once you register the DLL, you can reference it from your .NET code.
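To give a feel for what the GhostScript side of this does, here is a minimal Python sketch that calls the GhostScript command-line tool directly to write one PNG per page. This is not the GFLAX API and not the code in the attached sample; it is a swapped-in alternative for illustration. The function names, the default output pattern, and the executable name `gswin32c` (the usual console binary on 32-bit Windows installs) are assumptions.

```python
import subprocess


def build_gs_command(pdf_path, out_pattern, dpi=96, gs_exe="gswin32c"):
    """Build a GhostScript command that writes one PNG per PDF page.

    out_pattern should contain a %d-style placeholder for the page number,
    for example "page-%03d.png". gs_exe is an assumed executable name.
    """
    return [
        gs_exe,
        "-dBATCH",           # exit after processing the input file
        "-dNOPAUSE",         # do not prompt between pages
        "-dSAFER",           # restrict file access from within the PDF
        "-sDEVICE=png16m",   # 24-bit colour PNG output device
        "-r%d" % dpi,        # rendering resolution in dots per inch
        "-sOutputFile=" + out_pattern,
        pdf_path,
    ]


def pdf_to_images(pdf_path, out_pattern="page-%03d.png", dpi=96):
    """Run GhostScript; raises CalledProcessError if it fails."""
    subprocess.run(build_gs_command(pdf_path, out_pattern, dpi), check=True)
```

Calling `pdf_to_images("input.pdf")` would write page-001.png, page-002.png, and so on into the current directory, assuming GhostScript is on the PATH.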
I’m attaching the entire code sample to this post as a download. The code is released under the open source BSD license. All external components (Word and Office interop assemblies, GFL, etc.) remain the copyright of their respective owners, and their license terms must be adhered to.
Once you download the attachment and extract it, open it in Visual Studio 2008. Make sure you’ve installed GhostScript and run “regsvr32 GFLAX.dll” to register the GFLAX library. Then add a reference to Microsoft.Office.Interop.Word on your machine from the .NET tab (and remove the marked lines from Web.config), and a reference to GFLAX from the COM tab.
Run the application and upload .doc, .docx, or .pdf files. You can then view them directly in the browser.
Two-page display of an uploaded Word file. This works with PDF files too.
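Once every page exists as an image, the two-page view comes down to grouping the images in pairs. A minimal Python sketch of that grouping follows; the function name and the choice to pair pages 1-2, 3-4, and so on (rather than showing the first page alone as a cover) are assumptions, not necessarily what the attached sample does.

```python
def two_page_spreads(pages):
    """Group page images into (left, right) pairs.

    If the page count is odd, the last spread has None on the right.
    """
    spreads = []
    for i in range(0, len(pages), 2):
        left = pages[i]
        right = pages[i + 1] if i + 1 < len(pages) else None
        spreads.append((left, right))
    return spreads
```

Each pair can then be rendered as two images side by side, with the right-hand slot left empty when it is None.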