Convert a PDF document to a DOCX document in C#

Blog category: PDF, Office, .NET

April 12, 2024

A PDF document contains a complete description of a fixed layout of document elements on a page, including text, fonts, graphics, and other information needed to display the document. One advantage of a PDF document is that it always looks the same, regardless of the device. Another advantage is that the content of each page is stored separately, so you can, for example, render and view the last page of a 1000-page PDF document without rendering any of the other pages. The disadvantage of a PDF document is that its content is difficult to edit.

A DOCX document is a Microsoft Word Open XML document that contains text, images, graphics, and more. The advantage of a DOCX document is that its content is simple and intuitive to edit. The disadvantage is that the document content must be laid out before it can be broken into pages. In other words, for a 1000-page DOCX document, all pages have to be laid out and rendered even if you only need to look at the last one.

Given these advantages and disadvantages, a PDF file is convenient for viewing and storing a document, while a DOCX file is convenient for creating and editing one.

VintaSoft Imaging .NET SDK allows you to edit the content of a PDF document; you can read about that here.

VintaSoft Imaging .NET SDK can also convert a PDF document to a DOCX document so that it can be edited further in a suitable text editor, e.g. Microsoft Office Word or OpenOffice Writer.

In addition, VintaSoft Imaging .NET SDK can convert a DOCX document back to a PDF document.

Here is C# code that converts a PDF document to a DOCX document:
/// <summary>
/// Converts a PDF document to a DOCX document.
/// </summary>
public static void ConvertPdfToDocx(string pdfFileName, string docxFileName)
{
    // create an image collection
    using (Vintasoft.Imaging.ImageCollection imageCollection = new Vintasoft.Imaging.ImageCollection())
    {
        // add PDF document to the image collection
        imageCollection.Add(pdfFileName);

        // save images of image collection (PDF pages) to a DOCX file
        imageCollection.SaveSync(docxFileName);

        // dispose images
        imageCollection.ClearAndDisposeItems();
    }
}

Here is C# code that converts a DOCX document to a PDF document:
/// <summary>
/// Converts a DOCX document to a PDF document.
/// </summary>
public static void ConvertDocxToPdf(string docxFileName, string pdfFileName)
{
    // create an image collection
    using (Vintasoft.Imaging.ImageCollection imageCollection = new Vintasoft.Imaging.ImageCollection())
    {
        // add DOCX document to the image collection
        imageCollection.Add(docxFileName);

        // create PdfEncoder
        using (Vintasoft.Imaging.Codecs.Encoders.PdfEncoder pdfEncoder = 
            new Vintasoft.Imaging.Codecs.Encoders.PdfEncoder(true))
        {
            // set compression for image resources in PDF document
            pdfEncoder.Settings.Compression = Vintasoft.Imaging.Codecs.Encoders.PdfImageCompression.Jpeg;

            // save images of image collection (DOCX pages) to a PDF document
            imageCollection.SaveSync(pdfFileName, pdfEncoder);
        }

        // dispose images
        imageCollection.ClearAndDisposeItems();
    }
}
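
Both conversion methods above take an input path and an output path, so in practice it can help to wrap them in a small helper that checks the input file and derives the output name. The following is only a minimal sketch: the ConversionHelper class and its GetOutputPath and ConvertWithChecks methods are hypothetical names introduced for illustration and are not part of VintaSoft Imaging .NET SDK. The helper uses only standard .NET classes and receives the actual converter as a delegate.

```csharp
using System;
using System.IO;

/// <summary>
/// Hypothetical helper (not part of the SDK) that wraps a file-to-file converter.
/// </summary>
public static class ConversionHelper
{
    /// <summary>
    /// Builds the output path by replacing the extension of the input file,
    /// for example "report.pdf" becomes "report.docx".
    /// </summary>
    public static string GetOutputPath(string inputFileName, string newExtension)
    {
        return Path.ChangeExtension(inputFileName, newExtension);
    }

    /// <summary>
    /// Checks that the input file exists, runs the supplied converter
    /// and returns the path of the created file.
    /// </summary>
    public static string ConvertWithChecks(
        string inputFileName,
        string newExtension,
        Action<string, string> convert)
    {
        if (convert == null)
            throw new ArgumentNullException(nameof(convert));
        if (!File.Exists(inputFileName))
            throw new FileNotFoundException("Input document not found.", inputFileName);

        // derive the output file name from the input file name
        string outputFileName = GetOutputPath(inputFileName, newExtension);

        // run the actual conversion (for example, ConvertPdfToDocx)
        convert(inputFileName, outputFileName);

        return outputFileName;
    }
}
```

Assuming the ConvertPdfToDocx and ConvertDocxToPdf methods from this article are in scope, a call such as ConversionHelper.ConvertWithChecks("report.pdf", ".docx", ConvertPdfToDocx) would write report.docx next to the source file, and passing ".pdf" together with ConvertDocxToPdf would perform the reverse conversion.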