Using Rectangle Annotation To Read text from PDF

Questions, comments and suggestions concerning VintaSoft Annotation .NET Plug-in.

Moderator: Alex

Milos1980
Posts: 3
Joined: Fri Feb 28, 2020 3:11 am

Using Rectangle Annotation To Read text from PDF

Post by Milos1980 »

Hi,

I am evaluating the VintaSoft libraries (at the moment I am using annotations for PDF). Our goal is to let users mark text in a PDF with rectangle annotations and then read the marked text using those annotations. This is the method I found in the documentation:

Code:

public string GetRegionTextPage(
    Vintasoft.Imaging.Pdf.Tree.PdfPage page,
    Vintasoft.Imaging.UI.ImageViewer imageViewer,
    System.Drawing.Rectangle selectedRegion)
{
    // convert the rectangle from the control coordinates to the image coordinates
    System.Drawing.RectangleF imageCoordinateSystemRectangle =
        imageViewer.RectangleToImage(selectedRegion);

    // get left-top point of the rectangle
    System.Drawing.PointF pdfPageCoordinateSystemPoint1 = imageCoordinateSystemRectangle.Location;
    // get right-bottom point of the rectangle
    System.Drawing.PointF pdfPageCoordinateSystemPoint2 =
        new System.Drawing.PointF(imageCoordinateSystemRectangle.Right, imageCoordinateSystemRectangle.Bottom);
    // get resolution of the image
    Vintasoft.Imaging.Resolution resolution = imageViewer.Image.Resolution;
    // convert points from the image coordinate space to the page coordinate space
    page.PointToUnit(ref pdfPageCoordinateSystemPoint1, resolution);
    page.PointToUnit(ref pdfPageCoordinateSystemPoint2, resolution);

    // create rectangle in the page's coordinate space
    System.Drawing.RectangleF rectangle = new System.Drawing.RectangleF(
        pdfPageCoordinateSystemPoint1,
        new System.Drawing.SizeF(
            pdfPageCoordinateSystemPoint2.X - pdfPageCoordinateSystemPoint1.X,
            pdfPageCoordinateSystemPoint2.Y - pdfPageCoordinateSystemPoint1.Y));
    // get text region of the page
    Vintasoft.Imaging.Text.TextRegion textRegion = page.TextRegion.GetSubregion(
        rectangle,
        Vintasoft.Imaging.Text.TextSelectionMode.Rectangle);

    string textContent = string.Empty;
    // if text region is found
    if (textRegion != null)
        textContent = textRegion.TextContent;

    return textContent;
}
and I call it like this:

Code:

var annotation = _annotationViewer.AnnotationDataCollection[0];
GetRegionTextPage(page, _annotationViewer, new Rectangle(
    new Point((int)annotation.Location.X, (int)annotation.Location.Y),
    new Size((int)annotation.Size.Width, (int)annotation.Size.Height)));
where _annotationViewer.AnnotationDataCollection[0] is the rectangle annotation I use to mark text in the PDF. The method always returns text that lies below and to the left of the text I want to read. Does anyone know what I am missing? Thanks.
Alex
Site Admin
Posts: 2305
Joined: Thu Jul 10, 2008 2:21 pm

Re: Using Rectangle Annotation To Read text from PDF

Post by Alex »

Hi,

The "selectedRegion" parameter of the GetRegionTextPage method must be specified in the image viewer coordinate space.

In your code:

Code:

GetRegionTextPage(page, _annotationViewer, new Rectangle(new Point((int)_annotationViewer.AnnotationDataCollection[0].Location.X, (int)_annotationViewer.AnnotationDataCollection[0].Location.Y), ...
you are specifying the "selectedRegion" parameter in the annotation coordinate space.

Please read about the annotation coordinate space here: https://www.vintasoft.com/docs/vsimagin ... _Data.html
Then convert the "selectedRegion" parameter from the annotation coordinate space to the image viewer coordinate space, and your code will work correctly.
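
As a minimal sketch of one part of that conversion: assuming, as the GetTextByAnnotation example later in this thread does, that an annotation's Location is its center (not its top-left corner) in the annotation's DIP coordinate space, the top-left corner of the annotation rectangle is Location minus half of Size. The AnnotationGeometry class and CenterToTopLeft method are hypothetical and not part of the VintaSoft API, and mapping the result into image viewer coordinates is a separate step that this sketch does not cover.

```csharp
using System.Drawing;

// Hypothetical helper, not part of the VintaSoft API.
public static class AnnotationGeometry
{
    // Assumption: "center" is the annotation's Location, treated as the
    // center of the annotation, and "size" is its width and height.
    // Both values must be in the same coordinate space (annotation DIP space).
    public static RectangleF CenterToTopLeft(PointF center, SizeF size)
    {
        // Shift the center by half the width and half the height
        // to get the top-left corner; the size itself is unchanged.
        return new RectangleF(
            center.X - size.Width / 2f,
            center.Y - size.Height / 2f,
            size.Width,
            size.Height);
    }
}
```

For example, an annotation centered at (200, 150) with a size of 120 x 40 yields a rectangle whose top-left corner is (140, 130). Treating Location as the top-left corner, as the original call does, offsets the region by half the annotation's width and height, in addition to the coordinate-space mismatch described above.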

Best regards, Alexander
Milos1980
Posts: 3
Joined: Fri Feb 28, 2020 3:11 am

Re: Using Rectangle Annotation To Read text from PDF

Post by Milos1980 »

Hi Alex,

Thanks for the reply. Could you please explain how to convert from the annotation coordinate space to the image viewer coordinate space? I thought the first line of the GetRegionTextPage method does this by calling imageViewer.RectangleToImage(selectedRegion). If I am wrong, how should I convert selectedRegion from the annotation coordinate space to the image viewer coordinate space before passing it to the method? Thanks again for the reply.
Alex
Site Admin
Posts: 2305
Joined: Thu Jul 10, 2008 2:21 pm

Re: Using Rectangle Annotation To Read text from PDF

Post by Alex »

Here is an example that shows how to get the PDF text under the focused annotation in the annotation viewer:

Code:

...
MessageBox.Show(GetTextByAnnotation(annotationViewer1.Image, (RectangleAnnotationData)annotationViewer1.FocusedAnnotationData));
...

/// <summary>
/// Extracts the text that is located in the specified rectangle.
/// </summary>
/// <param name="image">The image.</param>
/// <param name="rect">The rectangle, in DIP (Device Independent Pixels) space.</param>
public static string ExtractText(
    Vintasoft.Imaging.VintasoftImage image,
    RectangleF rect)
{
    // get text region of image
    Vintasoft.Imaging.Text.TextRegion textRegion = image.Metadata.TextRegion;
    if (textRegion == null)
        return "";

    // transform rect to TextRegion space
    rect = Vintasoft.Imaging.Utils.GraphicsUtils.TransformRect(rect, textRegion.TrasformFromDipSpace);

    // get text sub region for specified rect
    textRegion = textRegion.GetSubregion(rect, Vintasoft.Imaging.Text.TextSelectionMode.Rectangle);

    // return text, or an empty string if no text region is found in rect
    if (textRegion == null)
        return "";
    return textRegion.TextContent;
}

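/// <summary>
/// Returns the text that is located under the specified rectangle annotation.
/// </summary>
/// <param name="image">The image.</param>
/// <param name="annotation">The rectangle annotation.</param>
public static string GetTextByAnnotation(
    Vintasoft.Imaging.VintasoftImage image,
    Vintasoft.Imaging.Annotation.RectangleAnnotationData annotation)
{
    // annotation Location is the center of the annotation, so subtract
    // half of the size to get the top-left corner of the rect in DIP space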
    System.Drawing.RectangleF rect = new System.Drawing.RectangleF(
        annotation.Location.X - annotation.Size.Width / 2,
        annotation.Location.Y - annotation.Size.Height / 2,
        annotation.Size.Width,
        annotation.Size.Height);
    // extract text
    return ExtractText(image, rect);
}
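
The rect argument of ExtractText is described as being in DIP (device-independent pixels) space. Assuming the usual definition of one DIP as 1/96 inch, the sketch below illustrates why DIP values cannot be used directly as image pixel coordinates: they have to be scaled by the image resolution divided by 96. The DipUtils class and DipToPixels method are hypothetical; this only illustrates the unit conversion and is not a substitute for textRegion.TrasformFromDipSpace, which ExtractText uses to transform rect into the text region's space.

```csharp
using System.Drawing;

// Hypothetical helper, not part of the VintaSoft API.
public static class DipUtils
{
    // Assumption: one DIP (device-independent pixel) is 1/96 inch.
    public const float DipsPerInch = 96f;

    // Converts a rectangle from DIP space to image pixel space,
    // given the image's horizontal and vertical resolution in DPI.
    public static RectangleF DipToPixels(RectangleF dipRect, float horizontalDpi, float verticalDpi)
    {
        // Pixels per DIP along each axis.
        float scaleX = horizontalDpi / DipsPerInch;
        float scaleY = verticalDpi / DipsPerInch;
        return new RectangleF(
            dipRect.X * scaleX,
            dipRect.Y * scaleY,
            dipRect.Width * scaleX,
            dipRect.Height * scaleY);
    }
}
```

For an image at 300 DPI, a DIP rectangle at (96, 48) with a size of 192 x 24 covers pixels starting at (300, 150) with a size of 600 x 75; only at 96 DPI do the two spaces coincide.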
Best regards, Alexander
Milos1980
Posts: 3
Joined: Fri Feb 28, 2020 3:11 am

Re: Using Rectangle Annotation To Read text from PDF

Post by Milos1980 »

Hi Alex,

It is working. Thank you.