VintaSoft Imaging .NET SDK v8.6
    PDF: Remove content from PDF page. Redaction marks.

    Content removal should be used to permanently remove sensitive information from a document page, for example, to remove private data before sharing the document with third parties. The same functionality can also be used to remove selected text.

    Important! After removing content, the PDF document must be packed (PdfDocument.Pack); otherwise the removed data can be restored and retrieved!



    Remove content from PDF page programmatically.

    VintaSoft PDF .NET Plug-in allows you to:
    1. Blackout image or image regions on PDF page:
      • Blackout a rectangle on the image-resource or embedded image (PdfImageResource.ClearRect).
      • Blackout a graphical path on the image-resource or embedded image (PdfImageResource.ClearPath).
      • Blackout image regions on PDF page (PdfPage.ClearImages).
      • Blackout an image without changing parameters of image compression.
      • Blackout image-resources and embedded images.

    2. Remove text from PDF page:
      • Remove the specified text from PDF page (PdfPage.RemoveText).
      • Remove text from the specified regions of PDF page (PdfPage.RemoveText).
      • The text removal algorithm preserves the formatting of the surrounding text.

    3. Remove vector graphics from PDF page:

    4. Remove annotations from PDF page:

    5. Remove any content (text, images, vector graphics, annotations) from PDF page (PdfPage.RemoveContentAndBlackOutResources):
      • Remove PDF page content.
      • Blackout all image-resources used on PDF page.
      • Remove content and blackout images in all forms used on PDF page.
      • Remove all annotations and annotation appearances from PDF page.
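
    As a rough sketch of item 5, the code below removes everything inside the central area of each page and then packs the document to a new file. PdfDocument, PdfPage.MediaBox and PdfDocument.Pack are used as in the other examples of this topic. The overload of RemoveContentAndBlackOutResources that accepts an array of rectangles is an assumption that should be checked against the API reference, and the class and method names are hypothetical.

```csharp
// The project, which uses this code, must have references to the following assemblies:
// - Vintasoft.Imaging.Pdf

public static class RemoveAllContentSketch
{
    /// <summary>
    /// Removes all content from the central area of every page and packs the document.
    /// The output file name is expected to differ from the input file name.
    /// </summary>
    public static void RemoveCenterOfAllPages(string inputPdfFilename, string outputPdfFilename)
    {
        // open document
        using (Vintasoft.Imaging.Pdf.PdfDocument document =
            new Vintasoft.Imaging.Pdf.PdfDocument(inputPdfFilename))
        {
            // for each page
            foreach (Vintasoft.Imaging.Pdf.Tree.PdfPage page in document.Pages)
            {
                // take the central area of the page: half of the width and half of the height
                System.Drawing.RectangleF rect = page.MediaBox;
                rect.Inflate(-rect.Width / 4, -rect.Height / 4);

                // ASSUMPTION: an overload that accepts an array of rectangles exists;
                // verify the exact signature in the API reference
                page.RemoveContentAndBlackOutResources(new System.Drawing.RectangleF[] { rect });
            }

            // pack the document, otherwise the removed data could be restored
            document.Pack(outputPdfFilename);
        }
    }
}
```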

    Here is an example that demonstrates how to find text on PDF page and remove text from page:
    ' The project, which uses this code, must have references to the following assemblies:
    ' - Vintasoft.Imaging.Pdf
    
    ''' <summary>
    ''' Searches and removes specified text on all pages of PDF document.
    ''' </summary>
    ''' <param name="inputPdfFilename">The name of input PDF file.</param>
    ''' <param name="outputPdfFilename">The name of output PDF file.</param>
    ''' <param name="textToRemove">The text to remove.</param>
    Public Shared Sub TestFindAndRemoveTextOnAllPages(inputPdfFilename As String, outputPdfFilename As String, ParamArray textToRemove As String())
        ' open document
        Using document As New Vintasoft.Imaging.Pdf.PdfDocument(inputPdfFilename)
            ' if there is a text to remove
            If textToRemove.Length > 0 Then
                ' create a list that contains text regions to remove
                Dim textRegions As New System.Collections.Generic.List(Of Vintasoft.Imaging.Pdf.Content.TextExtraction.PdfTextRegion)()
    
                ' for each page
                For Each page As Vintasoft.Imaging.Pdf.Tree.PdfPage In document.Pages
                    ' clear a list of text regions to remove
                    textRegions.Clear()
    
                    ' for all text strings that must be removed
                    For i As Integer = 0 To textToRemove.Length - 1
                        ' search text string on PDF page
                        Dim searchedText As Vintasoft.Imaging.Pdf.Content.TextExtraction.PdfTextRegion() = SimpleTextSearchOnPdfPage(page, textToRemove(i))
                        ' if text is found
                        If searchedText IsNot Nothing AndAlso searchedText.Length > 0 Then
                            ' add searched text to a list of text for removing
                            textRegions.AddRange(searchedText)
                        End If
                    Next
    
                    ' if PDF page contains text regions with text to remove
                    If textRegions.Count > 0 Then
                        ' remove text regions from PDF page
                        page.RemoveText(textRegions.ToArray())
                    End If
                Next
            End If
    
            ' if names of source and destination files are the same
            If inputPdfFilename = outputPdfFilename Then
                ' pack PDF document
                document.Pack()
            Else
                ' if names of source and destination files are different
                ' pack source PDF document to specified file
                document.Pack(outputPdfFilename)
            End If
        End Using
    End Sub
    
    ''' <summary>
    ''' Searches a text string on PDF page.
    ''' </summary>
    ''' <param name="page">PDF page where text should be searched.</param>
    ''' <param name="text">Text to search.</param>
    ''' <returns>An array of text regions on PDF page where text was found.</returns>
    Public Shared Function SimpleTextSearchOnPdfPage(page As Vintasoft.Imaging.Pdf.Tree.PdfPage, text As String) As Vintasoft.Imaging.Pdf.Content.TextExtraction.PdfTextRegion()
        Dim textRegions As New System.Collections.Generic.List(Of Vintasoft.Imaging.Pdf.Content.TextExtraction.PdfTextRegion)()
    
        Dim textRegion As Vintasoft.Imaging.Pdf.Content.TextExtraction.PdfTextRegion = Nothing
        Dim startIndex As Integer = 0
        Do
            ' search text
            textRegion = page.TextRegion.FindText(text, startIndex, False)
            ' if text is found
            If textRegion IsNot Nothing Then
                ' add searched text to a result
                textRegions.Add(textRegion)
                ' shift start index
                startIndex += textRegion.TextContent.Length
            End If
        Loop While textRegion IsNot Nothing
    
        Return textRegions.ToArray()
    End Function
                  
    
    // The project, which uses this code, must have references to the following assemblies:
    // - Vintasoft.Imaging.Pdf
    
    /// <summary>
    /// Searches and removes specified text on all pages of PDF document.
    /// </summary>
    /// <param name="inputPdfFilename">The name of input PDF file.</param>
    /// <param name="outputPdfFilename">The name of output PDF file.</param>
    /// <param name="textToRemove">The text to remove.</param>
    public static void TestFindAndRemoveTextOnAllPages(
        string inputPdfFilename,
        string outputPdfFilename,
        params string[] textToRemove)
    {
        // open document
        using (Vintasoft.Imaging.Pdf.PdfDocument document = new Vintasoft.Imaging.Pdf.PdfDocument(inputPdfFilename))
        {
            // if there is a text to remove
            if (textToRemove.Length > 0)
            {
                // create a list that contains text regions to remove
                System.Collections.Generic.List<Vintasoft.Imaging.Pdf.Content.TextExtraction.PdfTextRegion> textRegions = 
                    new System.Collections.Generic.List<Vintasoft.Imaging.Pdf.Content.TextExtraction.PdfTextRegion>();
    
                // for each page
                foreach (Vintasoft.Imaging.Pdf.Tree.PdfPage page in document.Pages)
                {
                    // clear a list of text regions to remove
                    textRegions.Clear();
    
                    // for all text strings that must be removed
                    for (int i = 0; i < textToRemove.Length; i++)
                    {
                        // search text string on PDF page
                        Vintasoft.Imaging.Pdf.Content.TextExtraction.PdfTextRegion[] searchedText = SimpleTextSearchOnPdfPage(page, textToRemove[i]);
                        // if text is found
                        if (searchedText != null && searchedText.Length > 0)
                            // add searched text to a list of text for removing
                            textRegions.AddRange(searchedText);
                    }
    
                    // if PDF page contains text regions with text to remove
                    if (textRegions.Count > 0)
                        // remove text regions from PDF page
                        page.RemoveText(textRegions.ToArray());
                }
            }
    
            // if names of source and destination files are the same
            if (inputPdfFilename == outputPdfFilename)
                // pack PDF document
                document.Pack();
            // if names of source and destination files are different
            else
                // pack source PDF document to specified file
                document.Pack(outputPdfFilename);
        }
    }
    
    /// <summary>
    /// Searches a text string on PDF page.
    /// </summary>
    /// <param name="page">PDF page where text should be searched.</param>
    /// <param name="text">Text to search.</param>
    /// <returns>An array of text regions on PDF page where text was found.</returns>
    public static Vintasoft.Imaging.Pdf.Content.TextExtraction.PdfTextRegion[] SimpleTextSearchOnPdfPage(
        Vintasoft.Imaging.Pdf.Tree.PdfPage page, string text)
    {
        System.Collections.Generic.List<Vintasoft.Imaging.Pdf.Content.TextExtraction.PdfTextRegion> textRegions = 
            new System.Collections.Generic.List<Vintasoft.Imaging.Pdf.Content.TextExtraction.PdfTextRegion>();
    
        Vintasoft.Imaging.Pdf.Content.TextExtraction.PdfTextRegion textRegion = null;
        int startIndex = 0;
        do
        {
            // search text
            textRegion = page.TextRegion.FindText(text, ref startIndex, false);
            // if text is found
            if (textRegion != null)
            {
                // add searched text to a result
                textRegions.Add(textRegion);
                // shift start index
                startIndex += textRegion.TextContent.Length;
            }
        } while (textRegion != null);
    
        return textRegions.ToArray();
    }
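
    The search loop in SimpleTextSearchOnPdfPage follows the common "find, record, advance the start index" pattern. The sketch below shows the same pattern on a plain string with String.IndexOf, so it can be run without the SDK. The class and method names are hypothetical, and the guard against an empty search string is an addition that is not part of the example above.

```csharp
using System;
using System.Collections.Generic;

public static class FindAllDemo
{
    /// <summary>
    /// Returns the start index of every non-overlapping, case-sensitive
    /// occurrence of 'text' in 'source'.
    /// </summary>
    public static List<int> FindAll(string source, string text)
    {
        List<int> indexes = new List<int>();

        // an empty search string would never advance the start index
        if (string.IsNullOrEmpty(text))
            return indexes;

        int startIndex = 0;
        while (true)
        {
            // search text starting from the current index
            int found = source.IndexOf(text, startIndex, StringComparison.Ordinal);
            // if text is not found, the search is finished
            if (found < 0)
                break;

            // add the found position to the result
            indexes.Add(found);
            // shift start index past the found text
            startIndex = found + text.Length;
        }

        return indexes;
    }

    public static void Main()
    {
        List<int> hits = FindAll("name: Ann; name: Bob", "name");
        Console.WriteLine(string.Join(",", hits)); // prints "0,11"
    }
}
```

    Because the start index moves past each match, overlapping occurrences are not reported: searching "aaa" for "aa" yields a single match.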
                    
    


    Remove content from PDF page visually.

    VintaSoft PDF .NET Plug-in allows you to remove content visually using redaction marks. A redaction mark is a visual object that defines an area on a PDF page from which content should be removed.

    Here is a list of supported types of redaction marks:

    Visual tool PdfRemoveContentTool/WpfPdfRemoveContentTool allows you to:

    Here is an example that shows how to define redaction mark appearance and apply redaction marks programmatically:
    ' The project, which uses this code, must have references to the following assemblies:
    ' - Vintasoft.Imaging
    ' - Vintasoft.Imaging.Pdf
    ' - Vintasoft.Imaging.Pdf.UI
    
    ''' <summary>
    ''' Creates the redaction mark with custom appearance and applies the redaction mark
    ''' to PDF page.
    ''' </summary>
    Public Shared Sub TestRedactionMarkAppearance(viewer As Vintasoft.Imaging.UI.ImageViewer)
        ' if image viewer does not have image
        If viewer.Image Is Nothing Then
            Throw New System.InvalidOperationException()
        End If
    
        ' get PDF page associated with the image; fail if image viewer does not contain a PDF page
        Dim page As Vintasoft.Imaging.Pdf.Tree.PdfPage = Vintasoft.Imaging.Pdf.PdfDocumentController.GetPageAssociatedWithImage(viewer.Image)
        If page Is Nothing Then
            Throw New System.InvalidOperationException()
        End If
    
        ' create and set PdfRemoveContentTool as current tool of image viewer
        Dim removeContentTool As New Vintasoft.Imaging.Pdf.UI.PdfRemoveContentTool()
        viewer.VisualTool = removeContentTool
    
        ' create the redaction mark
        Dim mark As New Vintasoft.Imaging.Pdf.UI.RedactionMark(viewer.Image)
        ' specify that redaction mark must remove all PDF content
        mark.MarkType = Vintasoft.Imaging.Pdf.UI.RedactionMarkType.RemoveAll
        ' calculate and specify the redaction mark rectangle
        Dim rect As System.Drawing.RectangleF = page.MediaBox
        rect.Inflate(-rect.Width / 4, -rect.Height / 4)
        mark.SelectedRect = rect
    
        ' add the redaction mark to a list of redaction marks of visual tool
        removeContentTool.Add(mark)
    
        ' create redaction mark appearance
        Dim textBox As New Vintasoft.Imaging.Pdf.Drawing.GraphicsFigures.TextBoxFigure(New Vintasoft.Imaging.Pdf.Drawing.PdfBrush(System.Drawing.Color.Red), "TOP SECRET", page.Document.FontManager.GetStandardFont(Vintasoft.Imaging.Pdf.Tree.Fonts.PdfStandardFontType.Helvetica), 0)
        textBox.TextAlignment = Vintasoft.Imaging.Pdf.Drawing.PdfContentAlignment.Center
        textBox.Brush = New Vintasoft.Imaging.Pdf.Drawing.PdfBrush(System.Drawing.Color.Black)
        textBox.AutoFontSize = True
        removeContentTool.RedactionMarkAppearance = textBox
    
        ' apply redaction marks
        removeContentTool.ApplyRedactionMarks()
    End Sub
                  
    
    // The project, which uses this code, must have references to the following assemblies:
    // - Vintasoft.Imaging
    // - Vintasoft.Imaging.Pdf
    // - Vintasoft.Imaging.Pdf.UI
    
    /// <summary>
    /// Creates the redaction mark with custom appearance and applies the redaction mark
    /// to PDF page.
    /// </summary>
    public static void TestRedactionMarkAppearance(Vintasoft.Imaging.UI.ImageViewer viewer)
    {
        // if image viewer does not have image
        if (viewer.Image == null)
            throw new System.InvalidOperationException();
    
        // get PDF page associated with the image; fail if image viewer does not contain a PDF page
        Vintasoft.Imaging.Pdf.Tree.PdfPage page = 
            Vintasoft.Imaging.Pdf.PdfDocumentController.GetPageAssociatedWithImage(viewer.Image);
        if (page == null)
            throw new System.InvalidOperationException();
    
        // create and set PdfRemoveContentTool as current tool of image viewer
        Vintasoft.Imaging.Pdf.UI.PdfRemoveContentTool removeContentTool = 
            new Vintasoft.Imaging.Pdf.UI.PdfRemoveContentTool();
        viewer.VisualTool = removeContentTool;
    
        // create the redaction mark
        Vintasoft.Imaging.Pdf.UI.RedactionMark mark = 
            new Vintasoft.Imaging.Pdf.UI.RedactionMark(viewer.Image);
        // specify that redaction mark must remove all PDF content
        mark.MarkType = Vintasoft.Imaging.Pdf.UI.RedactionMarkType.RemoveAll;
        // calculate and specify the redaction mark rectangle
        System.Drawing.RectangleF rect = page.MediaBox;
        rect.Inflate(-rect.Width / 4, -rect.Height / 4);
        mark.SelectedRect = rect;
    
        // add the redaction mark to a list of redaction marks of visual tool
        removeContentTool.Add(mark);
    
        // create redaction mark appearance
        Vintasoft.Imaging.Pdf.Drawing.GraphicsFigures.TextBoxFigure textBox = 
            new Vintasoft.Imaging.Pdf.Drawing.GraphicsFigures.TextBoxFigure(
                new Vintasoft.Imaging.Pdf.Drawing.PdfBrush(System.Drawing.Color.Red),
                "TOP SECRET",
                page.Document.FontManager.GetStandardFont(Vintasoft.Imaging.Pdf.Tree.Fonts.PdfStandardFontType.Helvetica),
                0);
        textBox.TextAlignment = Vintasoft.Imaging.Pdf.Drawing.PdfContentAlignment.Center;
        textBox.Brush = new Vintasoft.Imaging.Pdf.Drawing.PdfBrush(System.Drawing.Color.Black);
        textBox.AutoFontSize = true;
        removeContentTool.RedactionMarkAppearance = textBox;
    
        // apply redaction marks
        removeContentTool.ApplyRedactionMarks();
    }
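
    The redaction mark rectangle in the example above is computed by shrinking the page media box toward its center. The sketch below isolates that arithmetic with System.Drawing.RectangleF only, so it does not need the SDK. The page size (612 x 792 points) and the class and method names are illustrative assumptions.

```csharp
using System;
using System.Drawing;

public static class CenteredRectDemo
{
    /// <summary>
    /// Shrinks 'box' toward its center so that the result keeps
    /// half of the width and half of the height.
    /// </summary>
    public static RectangleF CenterHalf(RectangleF box)
    {
        RectangleF rect = box;
        // Inflate with negative values moves each edge inward:
        // a quarter of the width on the left and right,
        // a quarter of the height on the top and bottom
        rect.Inflate(-rect.Width / 4, -rect.Height / 4);
        return rect;
    }

    public static void Main()
    {
        // hypothetical page box: 612 x 792 points
        RectangleF r = CenterHalf(new RectangleF(0, 0, 612, 792));
        Console.WriteLine($"{r.X} {r.Y} {r.Width} {r.Height}"); // prints "153 198 306 396"
    }
}
```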
                    
    



    Pdf Editor Demo application.

    The Pdf Editor Demo and WPF Pdf Editor Demo applications include an example of using the PdfRemoveContentTool visual tool, which allows you to:

    An example of content removal using the PdfRemoveContentTool visual tool. Nothing appears in place of the removed content:







    An example of content removal using the PdfRemoveContentTool visual tool. A black rectangle with red text appears in place of the removed content: