Work with large PDF documents in .NET

February 5, 2021

The PDF specification limits a PDF document of version 1.0-1.4 to 9.3 GB (9,999,999,999 bytes).
A PDF document of version 1.5 or higher has no file size limit when it uses a compressed cross-reference table.

VintaSoft Imaging .NET SDK can create, open, change and save a PDF document of version 1.0-1.4 if the file size does not exceed 9.3 GB (9,999,999,999 bytes).
The SDK can also create, open, change and save a PDF document of version 1.5 or higher that uses a compressed cross-reference table, if the file size does not exceed 256 TB (281,474,976,710,655 bytes).
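
The byte counts above follow from simple arithmetic, which a short sketch can make explicit. A classic cross-reference table stores each object's byte offset in a 10-digit decimal field, so the largest offset it can express is 9,999,999,999, and the 256 TB figure equals 2^48 - 1. The class and member names below are hypothetical and are not part of VintaSoft Imaging .NET SDK. The sketch also assumes that "GB" and "TB" in the text mean binary units (2^30 and 2^40 bytes), which is what the rounded figures imply.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of VintaSoft Imaging .NET SDK.
public static class PdfSizeLimits
{
    // Largest value that fits in the 10-digit offset field of a classic cross-reference entry.
    public const long MaxClassicXrefFileSize = 9_999_999_999L;

    // Limit quoted in the text for documents that use a compressed cross-reference table.
    public const long MaxCompressedXrefFileSize = 281_474_976_710_655L;

    // Converts a byte count to binary gigabytes (2^30 bytes).
    public static double ToGiB(long bytes) => bytes / (double)(1L << 30);

    // Converts a byte count to binary terabytes (2^40 bytes).
    public static double ToTiB(long bytes) => bytes / (double)(1L << 40);
}

public static class PdfSizeLimitsDemo
{
    public static void Main()
    {
        CultureInfo inv = CultureInfo.InvariantCulture;

        // The classic limit is exactly ten decimal digits long.
        Console.WriteLine(PdfSizeLimits.MaxClassicXrefFileSize.ToString(inv).Length); // prints 10

        // The compressed-table limit is the largest 48-bit unsigned value.
        Console.WriteLine(PdfSizeLimits.MaxCompressedXrefFileSize == (1L << 48) - 1); // prints True

        // Rounded sizes in binary units, matching the figures in the text.
        Console.WriteLine(PdfSizeLimits.ToGiB(PdfSizeLimits.MaxClassicXrefFileSize).ToString("F2", inv));    // prints 9.31
        Console.WriteLine(PdfSizeLimits.ToTiB(PdfSizeLimits.MaxCompressedXrefFileSize).ToString("F2", inv)); // prints 256.00
    }
}
```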

The PDF specification does not limit the size of a file attachment stored in a PDF document.
VintaSoft Imaging .NET SDK can add a file attachment to a PDF document. In general, the SDK can add a file attachment if its size does not exceed 2 GB (2,147,483,647 bytes). The SDK can also add a file attachment that is larger than 2 GB but does not exceed 931 GB (999,999,999,999 bytes), provided that the PDF document is not encrypted and the resource is added with None or ZIP compression.
The same size restrictions apply when retrieving a resource from a PDF document.
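
The attachment rules above can be checked before any PDF work starts. The following is a minimal sketch that only encodes the limits stated in this post. The enum, class and method names are hypothetical and are not part of VintaSoft Imaging .NET SDK, and the caller is assumed to already know whether the document is encrypted and which compression will be used.

```csharp
using System;
using System.IO;

// Hypothetical classification of an attachment by the size rules described above.
public enum AttachmentKind
{
    Regular,     // up to 2,147,483,647 bytes
    Large,       // above 2 GB, up to 999,999,999,999 bytes, with extra conditions
    Unsupported  // outside the limits stated in the text
}

// Hypothetical helper, not part of VintaSoft Imaging .NET SDK.
public static class AttachmentSizeCheck
{
    public const long RegularLimit = int.MaxValue;     // 2,147,483,647 bytes
    public const long LargeLimit = 999_999_999_999L;   // 931 GB figure from the text

    public static AttachmentKind Classify(
        long sizeInBytes, bool documentIsEncrypted, bool usesNoneOrZipCompression)
    {
        if (sizeInBytes < 0)
            throw new ArgumentOutOfRangeException(nameof(sizeInBytes));

        // The general case: any attachment that does not exceed 2 GB.
        if (sizeInBytes <= RegularLimit)
            return AttachmentKind.Regular;

        // The large case: unencrypted document and None or ZIP compression only.
        if (sizeInBytes <= LargeLimit && !documentIsEncrypted && usesNoneOrZipCompression)
            return AttachmentKind.Large;

        return AttachmentKind.Unsupported;
    }

    // Convenience overload that reads the size from the file system.
    public static AttachmentKind Classify(
        string path, bool documentIsEncrypted, bool usesNoneOrZipCompression)
    {
        return Classify(new FileInfo(path).Length, documentIsEncrypted, usesNoneOrZipCompression);
    }
}
```

A caller might use the result to choose between the two samples later in this post: `Regular` corresponds to the AddAttachment sample and `Large` to the AddLargeAttachment sample.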


Here is C# code that demonstrates how to add an attachment to a PDF document and extract attachments from it when the attachment size exceeds 2 GB:
/// <summary>
/// Adds the large file attachment to PDF document.
/// </summary>
/// <param name="pdfFilename">The PDF filename.</param>
/// <param name="attachmentFilename">The attachment filename.</param>
public static void AddLargeAttachment(string pdfFilename, string attachmentFilename)
{
    // open PDF document
    using (Vintasoft.Imaging.Pdf.PdfDocument document = new Vintasoft.Imaging.Pdf.PdfDocument(pdfFilename))
    {
        if (document.EmbeddedFiles == null)
            document.EmbeddedFiles = new Vintasoft.Imaging.Pdf.Tree.PdfEmbeddedFileSpecificationDictionary(document);

        // open attachment file
        using (System.IO.Stream attachmentStream = System.IO.File.OpenRead(attachmentFilename))
        {
            // set ZIP compression level to 2 (fast)
            Vintasoft.Imaging.Pdf.PdfCompressionSettings compressionSettings = new Vintasoft.Imaging.Pdf.PdfCompressionSettings();
            compressionSettings.ZipCompressionLevel = 2;

            // create PDF embedded file
            Vintasoft.Imaging.Pdf.Tree.PdfEmbeddedFile embeddedFile = new Vintasoft.Imaging.Pdf.Tree.PdfEmbeddedFile(
                document, attachmentStream, false, Vintasoft.Imaging.Pdf.PdfCompression.Zip, compressionSettings);
            
            // create PDF embedded file specification
            Vintasoft.Imaging.Pdf.Tree.PdfEmbeddedFileSpecification fileSpecification =
                 new Vintasoft.Imaging.Pdf.Tree.PdfEmbeddedFileSpecification(System.IO.Path.GetFileName(attachmentFilename), embeddedFile);

            // add PDF embedded file specification to PDF document
            document.EmbeddedFiles.Add(fileSpecification);

            // save changes in PDF document (file attachment will be encoded during saving of PDF document)
            document.SaveChanges();
        }
    }
}

/// <summary>
/// Extracts the file attachments of PDF document in specified folder.
/// </summary>
/// <param name="pdfFilename">The PDF filename.</param>
/// <param name="attachmentOutputDir">The attachment output directory.</param>
public static void ExtractFileAttachments(string pdfFilename, string attachmentOutputDir)
{
    // open PDF document
    using (Vintasoft.Imaging.Pdf.PdfDocument document = new Vintasoft.Imaging.Pdf.PdfDocument(pdfFilename))
    {
        // if PDF document has embedded files
        if (document.EmbeddedFiles != null)
        {
            // for each file embedded in PDF document
            foreach (Vintasoft.Imaging.Pdf.Tree.PdfEmbeddedFileSpecification fileSpecification in document.EmbeddedFiles.Values)
            {
                if (fileSpecification.EmbeddedFile != null)
                {
                    // save embedded file resource to a file in output directory
                    string filename = System.IO.Path.GetFileName(fileSpecification.Filename);
                    fileSpecification.EmbeddedFile.Save(System.IO.Path.Combine(attachmentOutputDir, filename));
                }
            }
        }
    }
}


Here is C# code that demonstrates how to add an attachment to a PDF document and extract attachments from it when the attachment size does not exceed 2 GB:
/// <summary>
/// Adds the file attachment to PDF document.
/// </summary>
/// <param name="pdfFilename">The PDF filename.</param>
/// <param name="attachmentFilename">The attachment filename.</param>
public static void AddAttachment(string pdfFilename, string attachmentFilename)
{
    // open PDF document
    using (Vintasoft.Imaging.Pdf.PdfDocument document = new Vintasoft.Imaging.Pdf.PdfDocument(pdfFilename))
    {
        if (document.EmbeddedFiles == null)
            document.EmbeddedFiles = new Vintasoft.Imaging.Pdf.Tree.PdfEmbeddedFileSpecificationDictionary(document);

        // create PDF embedded file (file attachment will be encoded in constructor of PdfEmbeddedFile class)
        Vintasoft.Imaging.Pdf.Tree.PdfEmbeddedFile embeddedFile = new Vintasoft.Imaging.Pdf.Tree.PdfEmbeddedFile(
            document, attachmentFilename, Vintasoft.Imaging.Pdf.PdfCompression.Zip);

        // create PDF embedded file specification
        Vintasoft.Imaging.Pdf.Tree.PdfEmbeddedFileSpecification fileSpecification =
             new Vintasoft.Imaging.Pdf.Tree.PdfEmbeddedFileSpecification(System.IO.Path.GetFileName(attachmentFilename), embeddedFile);

        // add PDF embedded file specification to PDF document
        document.EmbeddedFiles.Add(fileSpecification);

        // save PDF document
        document.SaveChanges();
    }
}

/// <summary>
/// Extracts the file attachments of PDF document in specified folder.
/// </summary>
/// <param name="pdfFilename">The PDF filename.</param>
/// <param name="attachmentOutputDir">The attachment output directory.</param>
public static void ExtractFileAttachments(string pdfFilename, string attachmentOutputDir)
{
    // open PDF document
    using (Vintasoft.Imaging.Pdf.PdfDocument document = new Vintasoft.Imaging.Pdf.PdfDocument(pdfFilename))
    {
        // if PDF document has embedded files
        if (document.EmbeddedFiles != null)
        {
            // for each file embedded in PDF document
            foreach (Vintasoft.Imaging.Pdf.Tree.PdfEmbeddedFileSpecification fileSpecification in document.EmbeddedFiles.Values)
            {
                if (fileSpecification.EmbeddedFile != null)
                {
                    // save embedded file resource to a file in output directory
                    string filename = System.IO.Path.GetFileName(fileSpecification.Filename);
                    fileSpecification.EmbeddedFile.Save(System.IO.Path.Combine(attachmentOutputDir, filename));
                }
            }
        }
    }
}