XpdfText is a very affordable programmer's toolkit that makes it easy to extract plain text from PDF files. The PDF file can be on disk or in memory, and the extracted text can likewise be written to memory or directly to disk.
XpdfText can be used in different ways:
Convert entire PDF files or individual pages to plain text
Extract text from a specified rectangle on a page
Convert pages into word lists – for each word, you can retrieve:
The extracted text can be converted to a wide range of standard encodings, including UTF-8 Unicode, ISO-8859-1 (Latin-1), 7-bit ASCII, and various other language-specific encodings.
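The choice of output encoding matters because the same text becomes different bytes in each encoding, and some encodings cannot represent every character. The sketch below illustrates this with Python's built-in codecs. It does not use the XpdfText API, and the sample string is purely illustrative.

```python
# Illustration only: Python's built-in codecs, not XpdfText.
# Shows how one piece of extracted text maps to bytes under three
# of the encodings listed above.
text = "Café déjà vu"

encodings = {
    # UTF-8 stores each accented letter here as two bytes.
    "UTF-8": text.encode("utf-8"),
    # Latin-1 stores each of these accented letters in a single byte.
    "ISO-8859-1": text.encode("latin-1"),
    # 7-bit ASCII has no accented letters; errors="replace" substitutes "?".
    "ASCII": text.encode("ascii", errors="replace"),
}

for name, data in encodings.items():
    print(f"{name:10} {len(data):2d} bytes  {data!r}")
```

Here UTF-8 produces 15 bytes and Latin-1 produces 12, both of which preserve the accents, while 7-bit ASCII also produces 12 bytes but replaces each accented letter with "?".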
The XpdfText toolkit also includes all the functionality of the XpdfInfo toolkit.
If you need to convert to XML instead of plain text, consider the PDFdeconstruct product.