Compressing PDF Documents in C# and VB.NET

Blog category: PDF.NET

2024/07/12

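VintaSoft PDF .NET Plug-in can compress and optimize PDF documents, and it can also remove metadata from them. A smaller PDF file reduces network traffic when the file is transferred and takes up less storage space. This is especially useful for archiving, for sending documents by email, and for using PDF documents in web applications.

To optimize a PDF document, VintaSoft PDF .NET Plug-in can perform the following operations: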

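Packing a PDF document

A PDF document may contain unused resources. VintaSoft PDF .NET Plug-in can detect and remove unused resources from a PDF document.
A PDF document may also contain its revision history. VintaSoft PDF .NET Plug-in can remove the revision history from a PDF document.
The resources of a PDF document may be compressed with a non-optimal compression algorithm. VintaSoft PDF .NET Plug-in can recompress those resources with a better algorithm.
A PDF file of version 1.4 or earlier stores its cross-reference table uncompressed. VintaSoft PDF .NET Plug-in can save the document as PDF 1.5 or later with a compressed cross-reference table.

The following C# code shows how to load an existing PDF document, remove its unused resources, recompress the remaining resources with the maximum Flate compression level, and save the document as PDF 1.7 if its version is lower: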
/// <summary>
/// Packs the PDF document.
/// </summary>
/// <param name="inPdfFilename">The input PDF filename.</param>
/// <param name="outPdfFilename">The output PDF filename.</param>
public static void PackDocument(string inPdfFilename, string outPdfFilename)
{
    // create compressor with empty compression settings
    Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand compressor =
        Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand.CreateEmptyCompressor();

    // specify that compressor must use maximum Flate compression level (best compression)
    compressor.FlateCompressionLevel = 9;
    // specify that compressor must recompress all resources that use None, LZW or Flate compression with Flate compression
    compressor.RecompressFlateCompression = true;
    compressor.UseFlateInsteadLzwCompression = true;
    compressor.UseFlateInsteadNoneCompression = true;

    // specify that compressor must remove incremental update info and unused objects
    compressor.PackDocument = true;

    // if version of PDF document is lower than 1.7
    if (GetPdfDocumentVersion(inPdfFilename) < 17)
    {
        // set output format to PDF 1.7
        compressor.DocumentPackFormat = Vintasoft.Imaging.Pdf.PdfFormat.Pdf_17;
    }

    // compress PDF document
    compressor.Compress(inPdfFilename, outPdfFilename);
}

/// <summary>
/// Returns the PDF document version.
/// </summary>
/// <param name="pdfFilename">The PDF filename.</param>
/// <returns>The version number in dual-digit format (10,11,12,13,14,15,16,17,20,...).</returns>
private static int GetPdfDocumentVersion(string pdfFilename)
{
    using (Vintasoft.Imaging.Pdf.PdfDocument document = new Vintasoft.Imaging.Pdf.PdfDocument(pdfFilename))
        return document.Format.VersionNumber;
}
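
GetPdfDocumentVersion above opens the whole document just to read its version. As a lightweight pre-check, the version can also be read from the "%PDF-M.N" header at the start of the file. The sketch below is an illustration, not part of the VintaSoft API: the class name PdfHeaderVersion and the 1024-byte scan window are assumptions. A document catalog may override the header with its own /Version entry, so the value reported by the library can differ from the one found here.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of VintaSoft): reads the version from the PDF file header.
public static class PdfHeaderVersion
{
    // Returns the version in dual-digit format (14 for PDF 1.4), or -1 if no valid header is found.
    public static int Read(string pdfFilename)
    {
        using (FileStream stream = File.OpenRead(pdfFilename))
            return Read(stream);
    }

    public static int Read(Stream stream)
    {
        // Read at most the first 1024 bytes (assumed scan window for the header).
        byte[] buffer = new byte[1024];
        int count = 0;
        int read;
        while (count < buffer.Length &&
               (read = stream.Read(buffer, count, buffer.Length - count)) > 0)
        {
            count += read;
        }

        // ASCII decoding is enough: the header consists of ASCII characters only.
        string head = Encoding.ASCII.GetString(buffer, 0, count);
        const string marker = "%PDF-";
        int index = head.IndexOf(marker, StringComparison.Ordinal);
        int digits = index + marker.Length;
        if (index < 0 || digits + 2 >= head.Length)
            return -1;

        char major = head[digits];
        char dot = head[digits + 1];
        char minor = head[digits + 2];
        bool valid = major >= '0' && major <= '9' && dot == '.' && minor >= '0' && minor <= '9';
        if (!valid)
            return -1;

        // Combine major and minor digits: "1.4" becomes 14.
        return (major - '0') * 10 + (minor - '0');
    }
}
```

Such a check only decides whether a format upgrade is worth requesting; the upgrade itself is still performed by the compressor through DocumentPackFormat.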


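Optimizing fonts in a PDF document

Some font glyphs in a PDF document are never used to render text. VintaSoft PDF .NET Plug-in can remove unused glyphs from the fonts of a PDF document.

The following C# code shows how to optimize the fonts of a PDF document using the Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand class: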
/// <summary>
/// Subsets fonts in PDF document.
/// </summary>
/// <param name="inPdfFilename">The input PDF filename.</param>
/// <param name="outPdfFilename">The output PDF filename.</param>
public static void SubsetFonts(string inPdfFilename, string outPdfFilename)
{
    // create compressor with empty compression settings
    Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand compressor =
       Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand.CreateEmptyCompressor();

    // specify that compressor must subset fonts in PDF document
    compressor.SubsetFonts = true;

    // compress PDF document
    compressor.Compress(inPdfFilename, outPdfFilename);
}


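Compressing images in a PDF document

Many PDF documents contain images. VintaSoft PDF .NET Plug-in can reduce the resolution and the color bit depth of images in a PDF document, which reduces the size of the PDF file.

The following C# code shows how to reduce the resolution and the color bit depth of image resources in a PDF document using the Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand class: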
/// <summary>
/// Detects the "real color depth" of PDF image resources and compresses the PDF document for viewing at 150 DPI.
/// </summary>
/// <param name="inPdfFilename">The input PDF filename.</param>
/// <param name="outPdfFilename">The output PDF filename.</param>
public static void CompressToViewIn150DPI(string inPdfFilename, string outPdfFilename)
{
    // create compressor that will compress PDF document using lossy compression algorithms
    Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand compressor =
       Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand.CreateLossyCompressor(150, false, false, false);

    // specify that compressor must use JPEG compression for color images
    compressor.ColorImagesCompression = Vintasoft.Imaging.Pdf.PdfCompression.Jpeg;
    // specify that compressor must set JPEG quality to 70
    compressor.ColorImagesCompressionSettings.JpegQuality = 70;

    // specify that compressor must detect if image is bitonal image and use optimal compression for bitonal image
    compressor.DetectBitonalImageResources = true;
    // specify that compressor must detect if image is black-white image and use optimal compression for black-white image
    compressor.DetectBlackWhiteImageResources = true;
    // specify that compressor must detect if image is grayscale image and use optimal compression for grayscale image
    compressor.DetectGrayscaleImageResources = true;

    // compress PDF document
    compressor.Compress(inPdfFilename, outPdfFilename);
}


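Cleaning up the contents of a PDF document

A PDF document may contain unused objects, for example resources, pages, fonts, images, names, and content operators. It may also contain duplicate resources, such as copies of an image or copies of a font. VintaSoft PDF .NET Plug-in can detect and remove unused objects and duplicate resources from a PDF document.

The following C# code shows how to remove unused objects and duplicate resources from a PDF document using the Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand class: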
/// <summary>
/// Removes unused and duplicated resources in the PDF document.
/// </summary>
/// <param name="inPdfFilename">The input PDF filename.</param>
/// <param name="outPdfFilename">The output PDF filename.</param>
public static void RemoveUnusedResources(string inPdfFilename, string outPdfFilename)
{
    // create compressor with empty compression settings
    Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand compressor =
       Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand.CreateEmptyCompressor();

    // specify that compressor must remove duplicate resources from PDF document
    compressor.RemoveDuplicateResources = true;
    // specify that compressor must remove unused named resources from PDF document
    compressor.RemoveUnusedNamedResources = true;
    // specify that compressor must remove unused names from PDF document
    compressor.RemoveUnusedNames = true;
    // specify that compressor must remove unused pages from PDF document
    compressor.RemoveUnusedPages = true;
    // specify that compressor must remove invalid bookmarks from PDF document
    compressor.RemoveInvalidBookmarks = true;
    // specify that compressor must remove invalid links from PDF document
    compressor.RemoveInvalidLinks = true;

    // compress PDF document
    compressor.Compress(inPdfFilename, outPdfFilename);
}


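Removing metadata and other elements from a PDF document

A PDF document may contain objects that do not affect how its pages are displayed, for example metadata, bookmarks, embedded files, an interactive form, page thumbnails, a structure tree, and document information. VintaSoft PDF .NET Plug-in can remove such unneeded objects from a PDF document.

The following C# code shows how to remove these objects from a PDF document using the Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand class: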
/// <summary>
/// Removes metadata, bookmarks, document information, embedded files, embedded thumbnails, interactive form and structure tree of the PDF document.
/// </summary>
/// <param name="inPdfFilename">The input PDF filename.</param>
/// <param name="outPdfFilename">The output PDF filename.</param>
public static void RemoveObjects(string inPdfFilename, string outPdfFilename)
{
    // create compressor with empty compression settings
    Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand compressor =
       Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand.CreateEmptyCompressor();

    // specify that compressor must remove metadata from PDF document
    compressor.RemoveMetadata = true;
    // specify that compressor must remove bookmarks from PDF document
    compressor.RemoveBookmarks = true;
    // specify that compressor must remove document information from PDF document
    compressor.RemoveDocumentInformation = true;
    // specify that compressor must remove embedded files from PDF document
    compressor.RemoveEmbeddedFiles = true;
    // specify that compressor must remove embedded thumbnails from PDF document
    compressor.RemoveEmbeddedThumbnails = true;
    // specify that compressor must remove interactive form from PDF document
    compressor.RemoveInteractiveForm = true;
    // specify that compressor must remove structure tree from PDF document
    compressor.RemoveStructureTree = true;

    // compress PDF document
    compressor.Compress(inPdfFilename, outPdfFilename);
}


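Removing annotations from a PDF document

If annotations are not needed on the PDF pages, VintaSoft PDF .NET Plug-in can remove them. If annotations must remain visible on the pages but users should not be able to interact with them, VintaSoft PDF .NET Plug-in can instead convert the annotations to graphics (flatten the annotations).

The following C# code shows how to flatten the annotations of a PDF document and remove its interactive form using the Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand class: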
/// <summary>
/// Flattens annotations and removes the interactive form of the PDF document.
/// </summary>
/// <param name="inPdfFilename">The input PDF filename.</param>
/// <param name="outPdfFilename">The output PDF filename.</param>
public static void FlattenAnnotations(string inPdfFilename, string outPdfFilename)
{
    // create compressor with empty compression settings
    Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand compressor =
       Vintasoft.Imaging.Pdf.Processing.PdfDocumentCompressorCommand.CreateEmptyCompressor();

    // specify that compressor must remove interactive form from PDF document
    compressor.RemoveInteractiveForm = true;
    // specify that compressor must flatten annotations in PDF document
    compressor.FlattenAnnotations = true;

    // compress PDF document
    compressor.Compress(inPdfFilename, outPdfFilename);
}
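
Because the goal of every method in this article is a smaller file, it can be useful to measure what each step actually saves. The following is a minimal sketch that uses only System.IO. The class name CompressionReport and the method MeasureReduction are hypothetical and not part of the VintaSoft API, and the sketch assumes the input and output paths are different files.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of VintaSoft): reports how much a compression step saved.
public static class CompressionReport
{
    // Runs the given compression step and returns the size reduction in percent.
    // A negative result means that the output file is larger than the input file.
    public static double MeasureReduction(
        Action<string, string> compress, string inPdfFilename, string outPdfFilename)
    {
        // Run the step, for example PackDocument or SubsetFonts from this article.
        compress(inPdfFilename, outPdfFilename);

        long before = new FileInfo(inPdfFilename).Length;
        long after = new FileInfo(outPdfFilename).Length;

        // Avoid division by zero for an empty input file.
        if (before == 0)
            return 0.0;

        return 100.0 * (before - after) / before;
    }
}
```

Any method above with the (string, string) signature can be passed directly, for example CompressionReport.MeasureReduction(PackDocument, "in.pdf", "out.pdf"), where the file names are placeholders.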