sample to read text content in pdf

Posted: Wed Feb 27, 2013 5:35 pm
by marco.malizia
Hi

I need to read the text content of a particular area on a particular page of a PDF file, from a VB.NET application.
I tried the sample below, but it does not work correctly.
I have trouble determining the exact coordinates. I tried the PdfReaderDemo sample, but the coordinates and resolutions do not correspond.
Any suggestions? Is there a recommended way to read text from a particular area of a PDF?


Thanks.

Code:

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

    Dim x1 As Int16, x2 As Int16, y1 As Int16, y2 As Int16
    Dim vTesto As String = ""
    Dim vArea As SizeF
    Dim vRect As RectangleF

    Try

        _fileStream = New FileStream(edPdf1.Text, FileMode.Open, FileAccess.Read)
        _document = PdfDocumentController.OpenDocument(_fileStream)

        vTesto = _document.Pages(0).TextRegion.TextContent
        edTxt1.Text = vTesto

        vArea = _document.Pages(0).GetPageSizeInPixels(_document.Pages(0).DefaultResolution)

        x1 = Convert.ToInt16(edX1.Text)
        x2 = Convert.ToInt16(edX2.Text)
        y1 = Convert.ToInt16(edY1.Text)
        y2 = Convert.ToInt16(edY2.Text)

        If x1 <> 0 Or x2 <> 0 Or y1 <> 0 Or y2 <> 0 Then
            vRect = New RectangleF(x1, y1, x2, y2)

            vTesto = _document.Pages(0).TextRegion.GetSubregion(vRect).TextContent

        End If


    Catch ex As Exception
        ' Note: any exception is silently ignored here.
    End Try

End Sub

Re: sample to read text content in pdf

Posted: Thu Feb 28, 2013 11:57 am
by Alex
Hello,

Your code has two logical mistakes.

First, you need to convert the coordinates from the image space to the page space before getting the text content.

Second, you need to specify the width and height of the rectangle as the third and fourth parameters of the RectangleF constructor, not the coordinates of the second corner.

Here is the corrected code:

Code:

...
x1 = Convert.ToInt16(edX1.Text)
x2 = Convert.ToInt16(edX2.Text)
y1 = Convert.ToInt16(edY1.Text)
y2 = Convert.ToInt16(edY2.Text)

Dim points As Single() = {x1, x2, y1, y2}
_document.Pages(0).PointsToUnits(points, _document.Pages(0).DefaultResolution)

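' Note: declare x1, x2, y1, y2 As Single for this step;
' Int16 variables would round the converted values.
x1 = points(0)
x2 = points(1)
y1 = points(2)
y2 = points(3)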

If x1 <> 0 Or x2 <> 0 Or y1 <> 0 Or y2 <> 0 Then
    vRect = New RectangleF(x1, y1, x2 - x1, y2 - y1)

    vTesto = _document.Pages(0).TextRegion.GetSubregion(vRect).TextContent
End If
...
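
To illustrate the two points above without the SDK, here is a minimal standalone sketch. The helper names PixelsToPoints and CornersToRect are hypothetical and are not part of any library. It assumes the page space is measured in PDF points (1/72 inch), and it only scales the values: it does not account for any axis flip or page rotation that the SDK's own conversion may handle, so treat it as an illustration of the arithmetic rather than a replacement for PointsToUnits.

```vbnet
Imports System
Imports System.Drawing

Module RegionHelper

    ' Hypothetical helper: converts a length in pixels to PDF points
    ' (1/72 inch) at the given resolution in DPI.
    ' Assumption: the page space uses points as its unit.
    Function PixelsToPoints(ByVal pixels As Single, ByVal dpi As Single) As Single
        Return pixels * 72.0F / dpi
    End Function

    ' Hypothetical helper: builds a RectangleF from two opposite corners.
    ' RectangleF.FromLTRB takes left, top, right, bottom, so the width and
    ' height are derived from the corners instead of being passed directly.
    Function CornersToRect(ByVal x1 As Single, ByVal y1 As Single,
                           ByVal x2 As Single, ByVal y2 As Single) As RectangleF
        Return RectangleF.FromLTRB(Math.Min(x1, x2), Math.Min(y1, y2),
                                   Math.Max(x1, x2), Math.Max(y1, y2))
    End Function

    Sub Main()
        Dim dpi As Single = 96.0F

        ' Corner coordinates picked on a 96 DPI rendering of the page.
        Dim rect As RectangleF = CornersToRect(
            PixelsToPoints(96, dpi), PixelsToPoints(192, dpi),
            PixelsToPoints(288, dpi), PixelsToPoints(240, dpi))

        Console.WriteLine(rect)
    End Sub

End Module
```

Using FromLTRB (or subtracting x1 from x2 and y1 from y2, as in the code above) avoids the original mistake of passing the second corner where the constructor expects a width and a height.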
Best regards, Alexander