
Using Rectangle Annotation To Read text from PDF

Posted: Fri Feb 28, 2020 3:18 am
by Milos1980
Hi,

I am evaluating the VintaSoft libraries (at the moment I am using annotations for PDF). Our goal is to let users mark text in a PDF with rectangle annotations and then read the marked text from those rectangles. This is the method that I found in the documentation pages:

Code:

public string GetRegionTextPage(
    Vintasoft.Imaging.Pdf.Tree.PdfPage page,
    Vintasoft.Imaging.UI.ImageViewer imageViewer,
    System.Drawing.Rectangle selectedRegion)
{
    // convert the rectangle from the control coordinates to the image coordinates
    System.Drawing.RectangleF imageCoordinateSystemRectangle =
        imageViewer.RectangleToImage(selectedRegion);

    // get left-top point of the rectangle
    System.Drawing.PointF pdfPageCoordinateSystemPoint1 = imageCoordinateSystemRectangle.Location;
    // get right-bottom point of the rectangle
    System.Drawing.PointF pdfPageCoordinateSystemPoint2 =
        new System.Drawing.PointF(imageCoordinateSystemRectangle.Right, imageCoordinateSystemRectangle.Bottom);
    // get resolution of the image
    Vintasoft.Imaging.Resolution resolution = imageViewer.Image.Resolution;
    // convert points from the image coordinate space to the page coordinate space
    page.PointToUnit(ref pdfPageCoordinateSystemPoint1, resolution);
    page.PointToUnit(ref pdfPageCoordinateSystemPoint2, resolution);

    // create rectangle in the page's coordinate space
    System.Drawing.RectangleF rectangle = new System.Drawing.RectangleF(new System.Drawing.PointF(pdfPageCoordinateSystemPoint1.X, pdfPageCoordinateSystemPoint1.Y),
        new System.Drawing.SizeF(
            pdfPageCoordinateSystemPoint2.X - pdfPageCoordinateSystemPoint1.X,
            pdfPageCoordinateSystemPoint2.Y - pdfPageCoordinateSystemPoint1.Y));
    // get text region of the page           
    Vintasoft.Imaging.Text.TextRegion textRegion = page.TextRegion.GetSubregion(
        rectangle,
        Vintasoft.Imaging.Text.TextSelectionMode.Rectangle);

    string textContent = string.Empty;
    // if text region is found
    if (textRegion != null)
        textContent = textRegion.TextContent;

    return textContent;
}
and I am calling this method like this:

Code:

GetRegionTextPage(page, _annotationViewer, new Rectangle(new Point((int)_annotationViewer.AnnotationDataCollection[0].Location.X, (int)_annotationViewer.AnnotationDataCollection[0].Location.Y), new Size((int)_annotationViewer.AnnotationDataCollection[0].Size.Width, (int)_annotationViewer.AnnotationDataCollection[0].Size.Height)));
where _annotationViewer.AnnotationDataCollection[0] is the rectangle annotation that I am using to mark text in the PDF. The method always returns text that is below and to the left of the text I want to read. Does anyone know what I am missing here? Thanks.

Re: Using Rectangle Annotation To Read text from PDF

Posted: Fri Feb 28, 2020 9:02 am
by Alex
Hi,

The "selectedRegion" parameter of the GetRegionTextPage method must be specified in the image viewer coordinate space.

In your code:

Code:

GetRegionTextPage(page, _annotationViewer, new Rectangle(new Point((int)_annotationViewer.AnnotationDataCollection[0].Location.X, (int)_annotationViewer.AnnotationDataCollection[0].Location.Y), ...
you are specifying the "selectedRegion" parameter in the annotation coordinate space.

Please read about the annotation coordinate space here: https://www.vintasoft.com/docs/vsimagin ... _Data.html,
then convert the "selectedRegion" parameter from the annotation coordinate space to the image viewer coordinate space, and your code will work correctly.

Best regards, Alexander
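
To illustrate why the coordinate space matters, here is a minimal sketch of a unit conversion. It assumes, as the later example in this thread indicates, that annotation coordinates are expressed in DIP (device-independent pixels, defined as 1/96 inch). DipUnits and DipToPixels are hypothetical names used only for this illustration; they are not part of the VintaSoft API, and the sketch does not account for viewer zoom or scrolling.

```csharp
using System.Drawing;

public static class DipUnits
{
    // One DIP (device-independent pixel) is defined as 1/96 of an inch.
    public const float DipsPerInch = 96f;

    // Hypothetical helper: scales a rectangle from DIP units to pixels of an
    // image with the given horizontal and vertical resolution (dots per inch).
    public static RectangleF DipToPixels(RectangleF dipRect, float dpiX, float dpiY)
    {
        float scaleX = dpiX / DipsPerInch;
        float scaleY = dpiY / DipsPerInch;
        return new RectangleF(
            dipRect.X * scaleX,
            dipRect.Y * scaleY,
            dipRect.Width * scaleX,
            dipRect.Height * scaleY);
    }
}
```

For a page rendered at 300 DPI, a rectangle at (96, 48) with size 192 x 24 in DIP maps to (300, 150) with size 600 x 75 in pixels, so passing DIP values where pixel values are expected selects a different area of the page.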

Re: Using Rectangle Annotation To Read text from PDF

Posted: Fri Feb 28, 2020 4:17 pm
by Milos1980
Hi Alex,

Thanks for the reply. Can you please explain how to convert from the annotation coordinate space to the image viewer coordinate space? I thought the first line of the GetRegionTextPage method does this by calling imageViewer.RectangleToImage(selectedRegion). If that is not the case, can you please tell me how to do the conversion before passing selectedRegion to the method? Thanks once again for the reply.

Re: Using Rectangle Annotation To Read text from PDF

Posted: Fri Feb 28, 2020 5:28 pm
by Alex
Here is an example that shows how to get the PDF text under the focused annotation in the annotation viewer:

Code:

...
MessageBox.Show(GetTextByAnnotation(annotationViewer1.Image, (RectangleAnnotationData)annotationViewer1.FocusedAnnotationData));
...

/// <summary>
/// Extracts the text that is located in the specified rectangle.
/// </summary>
/// <param name="image">The image.</param>
/// <param name="rect">The rectangle, in DIP (device-independent pixels) space.</param>
public static string ExtractText(
    Vintasoft.Imaging.VintasoftImage image,
    RectangleF rect)
{
    // get text region of image
    Vintasoft.Imaging.Text.TextRegion textRegion = image.Metadata.TextRegion;
    if (textRegion == null)
        return "";

    // transform rect to TextRegion space
    rect = Vintasoft.Imaging.Utils.GraphicsUtils.TransformRect(rect, textRegion.TrasformFromDipSpace);

    // get text sub region for specified rect
    textRegion = textRegion.GetSubregion(rect, Vintasoft.Imaging.Text.TextSelectionMode.Rectangle);
    // if no text region is found for the rect
    if (textRegion == null)
        return "";

    // return text
    return textRegion.TextContent;
}

public static string GetTextByAnnotation(
    Vintasoft.Imaging.VintasoftImage image,
    Vintasoft.Imaging.Annotation.RectangleAnnotationData annotation)
{
    // rect in DIP space
    System.Drawing.RectangleF rect = new System.Drawing.RectangleF(
        annotation.Location.X - annotation.Size.Width / 2,
        annotation.Location.Y - annotation.Size.Height / 2,
        annotation.Size.Width,
        annotation.Size.Height);
    // extract text
    return ExtractText(image, rect);
}
Best regards, Alexander
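
GetTextByAnnotation above subtracts half of the annotation size from annotation.Location, which suggests that Location is the center of the annotation and not its top-left corner. The following minimal sketch isolates that step with plain System.Drawing types; AnnotationGeometry and RectangleFromCenter are hypothetical names for illustration and are not part of the VintaSoft API.

```csharp
using System.Drawing;

public static class AnnotationGeometry
{
    // Hypothetical helper: builds a rectangle anchored at its top-left corner
    // from a center point and a size given in the same units.
    public static RectangleF RectangleFromCenter(PointF center, SizeF size)
    {
        return new RectangleF(
            center.X - size.Width / 2f,
            center.Y - size.Height / 2f,
            size.Width,
            size.Height);
    }
}
```

For a center of (200, 100) and a size of 100 x 40, the rectangle starts at (150, 80). Treating the center as the top-left corner, as the call in the first post does, would instead produce a rectangle starting at (200, 100), that is, shifted by half the width and half the height.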

Re: Using Rectangle Annotation To Read text from PDF

Posted: Fri Feb 28, 2020 8:19 pm
by Milos1980
Hi Alex,

It is working. Thank you.