sample to read text content in pdf

Questions, comments and suggestions concerning VintaSoft PDF .NET Plug-in.

Moderator: Alex

marco.malizia
Posts: 1
Joined: Wed Feb 27, 2013 5:13 pm

sample to read text content in pdf

Post by marco.malizia » Wed Feb 27, 2013 5:35 pm

Hi,

I need to read the text content of a particular area on a particular page of a PDF file, from a VB.NET application.
I tried the sample below, but it does not work correctly.
I am having trouble finding the exact coordinates. I tried the PdfReaderDemo sample, but the coordinates and resolutions do not correspond well.
Any suggestions? Is there a recommended way to read text from a particular area of a PDF page?


Thanks.

Code:

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

    Dim x1 As Int16, x2 As Int16, y1 As Int16, y2 As Int16
    Dim vTesto As String = ""
    Dim vArea As SizeF
    Dim vRect As RectangleF

    Try

        _fileStream = New FileStream(edPdf1.Text, FileMode.Open, FileAccess.Read)
        _document = PdfDocumentController.OpenDocument(_fileStream)

        vTesto = _document.Pages(0).TextRegion.TextContent
        edTxt1.Text = vTesto

        vArea = _document.Pages(0).GetPageSizeInPixels(_document.Pages(0).DefaultResolution)

        x1 = Convert.ToInt16(edX1.Text)
        x2 = Convert.ToInt16(edX2.Text)
        y1 = Convert.ToInt16(edY1.Text)
        y2 = Convert.ToInt16(edY2.Text)

        If x1 <> 0 Or x2 <> 0 Or y1 <> 0 Or y2 <> 0 Then
            vRect = New RectangleF(x1, y1, x2, y2)

            vTesto = _document.Pages(0).TextRegion.GetSubregion(vRect).TextContent

        End If

    Catch ex As Exception
        ' Exceptions are silently ignored here.
    End Try

End Sub

Alex
Site Admin
Posts: 1445
Joined: Thu Jul 10, 2008 2:21 pm

Re: sample to read text content in pdf

Post by Alex » Thu Feb 28, 2013 11:57 am

Hello,

Your code has two logical mistakes.

First, you need to convert the coordinates from image space to page space before getting the text content.

Second, you need to pass the width and height of the rectangle as the third and fourth parameters of the RectangleF constructor.

Here is the corrected code:

Code:

...
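' Read the image-space corner coordinates from the text boxes.
x1 = Convert.ToInt16(edX1.Text)
x2 = Convert.ToInt16(edX2.Text)
y1 = Convert.ToInt16(edY1.Text)
y2 = Convert.ToInt16(edY2.Text)

' Convert the coordinates from image space to page space.
' The converted values are read back from the same array below.
Dim points As Single() = {x1, x2, y1, y2}
_document.Pages(0).PointsToUnits(points, _document.Pages(0).DefaultResolution)

' Note: if x1..y2 are still declared As Int16, these assignments round the
' converted values to whole numbers; declare them As Single to avoid that.
x1 = points(0)
x2 = points(1)
y1 = points(2)
y2 = points(3)

If x1 <> 0 Or x2 <> 0 Or y1 <> 0 Or y2 <> 0 Then
    ' RectangleF takes (x, y, width, height), not two corner points.
    vRect = New RectangleF(x1, y1, x2 - x1, y2 - y1)

    vTesto = _document.Pages(0).TextRegion.GetSubregion(vRect).TextContent
End If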
...
Best regards, Alexander
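
The two fixes described above can also be illustrated without the plug-in. The following is only a sketch of the arithmetic and is not how the plug-in implements PointsToUnits. It assumes that page units are 1/72 inch, that the conversion is a plain scale by the resolution (no axis flip or page offset), and that the resolution is 96 DPI. The names RegionHelper, PixelsToUnits and RectangleFromCorners are hypothetical and do not exist in the plug-in. In a real application the conversion should still be done with the page's own PointsToUnits method.

```vbnet
Imports System
Imports System.Drawing

Module RegionHelper

    ' Hypothetical helper: scales a pixel coordinate to 1/72-inch units.
    ' Assumption: a pure scale, with no axis flip and no page offset.
    Function PixelsToUnits(ByVal pixels As Single, ByVal dpi As Single) As Single
        Return pixels * 72.0F / dpi
    End Function

    ' Hypothetical helper: builds a RectangleF from two opposite corners.
    ' RectangleF expects (x, y, width, height), so the size is computed
    ' from the corner difference; Min/Abs make the corner order irrelevant.
    Function RectangleFromCorners(ByVal x1 As Single, ByVal y1 As Single,
                                  ByVal x2 As Single, ByVal y2 As Single) As RectangleF
        Return New RectangleF(Math.Min(x1, x2), Math.Min(y1, y2),
                              Math.Abs(x2 - x1), Math.Abs(y2 - y1))
    End Function

    Sub Main()
        ' Assumed resolution of the image the coordinates were measured on.
        Dim dpi As Single = 96.0F

        ' Corners (96, 192) and (288, 384) in pixels, kept as Single
        ' so that the converted values are not rounded.
        Dim rect As RectangleF = RectangleFromCorners(
            PixelsToUnits(96.0F, dpi), PixelsToUnits(192.0F, dpi),
            PixelsToUnits(288.0F, dpi), PixelsToUnits(384.0F, dpi))

        Console.WriteLine(rect)
    End Sub

End Module
```

With these inputs the rectangle starts at (72, 144) and is 144 units wide and 144 units high. Passing the second corner directly as the third and fourth constructor arguments would instead give a rectangle 216 units wide and 288 units high, which is the second mistake pointed out above.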
