Apache PDFBox 1.8.11, the Java PDF processing library, released

jopen, 8 years ago

Apache PDFBox 1.8.11 has been released. It is an incremental bug-fix release that contains a large number of bug fixes and improvements.

It is available for download at:

http://pdfbox.apache.org/download.cgi

Main changes:

Bug fixes:

  • [PDFBOX-962] - All sort of Problems when importing Xfdf files into PDFs -> damaged pdfs and NPEs
  • [PDFBOX-2508] - Text extraction getting zero font height, bad widths, and ? for text in this PDF with Type 3 Fonts
  • [PDFBOX-2693] - OutOfMemoryError at org.apache.fontbox.cff.IndexData.initData(IndexData.java:95)
  • [PDFBOX-2816] - PDFBox makes disallowed changes when signing a signed document
  • [PDFBOX-2845] - Error parsing PDF
  • [PDFBOX-2901] - High CPU load and OutOfMemoryError when rendering shading
  • [PDFBOX-2903] - ClassCastException at PDFParser:667
  • [PDFBOX-2909] - NullPointerException when rendering shading with no function
  • [PDFBOX-2911] - Merge does not close input streams
  • [PDFBOX-2914] - java.lang.NegativeArraySizeException in org.apache.pdfbox.pdmodel.graphics.color.PDDeviceGray.createColorModel
  • [PDFBOX-2916] - ArrayIndexOutOfBoundsException in CmapSubtable.processSubtype6
  • [PDFBOX-2923] - CFFParser parser treats CIDFont's charset data as SID
  • [PDFBOX-2924] - ClassCastException when doing PDFSplit
  • [PDFBOX-2925] - EmptyStackException in PDFStreamEngine.getColorSpaces
  • [PDFBOX-2935] - Problem while extracting font from PDFontSetting (used in PDExtendedGraphicsState)
  • [PDFBOX-2940] - ClassCastException in FDF export
  • [PDFBOX-2958] - TIFF-Predictor with 1 bit per component not supported
  • [PDFBOX-2964] - Checkbox getOnValue() throws NPE
  • [PDFBOX-2965] - NPE in PDAcroForm.getField() if the /Fields entry is missing
  • [PDFBOX-2976] - java.util.zip.DataFormatException: incorrect data check
  • [PDFBOX-2982] - fix ClassCastExceptions in operator methods
  • [PDFBOX-2985] - Potential NPE in PDMarkedContent#getMCID()
  • [PDFBOX-2986] - Potential resource leak in TTFParser's use of RAFDataStream
  • [PDFBOX-2987] - NPE in PDTrueTypeFont.extractCMaps
  • [PDFBOX-2988] - Infinite recursion in ExtractImages 1.8.11-SNAPSHOT
  • [PDFBOX-2989] - LZW decode filter shouldn't throw IndexOutOfBoundsException
  • [PDFBOX-2990] - PDDocument.load fails to load a PDF document.
  • [PDFBOX-2996] - StackOverflow in Quicksort
  • [PDFBOX-3002] - PDF files not closed after load fails
  • [PDFBOX-3022] - Maven repos should be https
  • [PDFBOX-3034] - Newly created XRef stream has direct root objects
  • [PDFBOX-3035] - Files with missing xref table must fail
  • [PDFBOX-3041] - Wrong default type in Xref stream W0 element
  • [PDFBOX-3087] - Metadata stream should not be compressed
  • [PDFBOX-3097] - ClassCastException in Axial / Radial shading when object reference in extends
  • [PDFBOX-3110] - Extract by beads doesn't work
  • [PDFBOX-3114] - Visible signatures in different pages changes previous revision
  • [PDFBOX-3153] - Direct JPEG extraction results in invalid images in 2.0.0 releases.
  • [PDFBOX-3155] - org.apache.pdfbox.util.PDFTextStripper class initialization throws NumberFormatException with recent Verona-enabled Java 9 JVMs
  • [PDFBOX-3157] - PDOutputIntent has N=3 (RGB) hardcoded
  • [PDFBOX-3173] - Signature dictionary is not decrypted in encrypted files
  • [PDFBOX-3190] - Links don't work in firefox
  • [PDFBOX-3193] - New NPE in PDFBox 1.8.11-rc1 in Acroform PDCheckbox's isChecked()

Improvements:

  • [PDFBOX-1621] - Add setModifiedDate(Calendar c) to PDAnnotation
  • [PDFBOX-2891] - Use animal sniffer maven plugin to detect non java 5 api usage within the 1.8 branch
  • [PDFBOX-2952] - Log statement on level 'severe' while nothing else indicates error
  • [PDFBOX-2962] - Handle TIFF predictor for bpc 2 and 4 / optimize existing predictor code
  • [PDFBOX-3007] - Preflight cookbook example is inefficient
  • [PDFBOX-3176] - Add a removeRegion method in PDFTextStripperByArea class

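PDFBox is a Java library for working with PDF documents. It can create PDF documents, manipulate them, and extract their content, and it also ships with several command-line utilities.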

Key features include:

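  • Extracting text from PDFs
  • Merging PDF documents
  • Encrypting and decrypting PDF documents
  • Integration with the Lucene search engine
  • Filling in PDF/XFDF form data
  • Creating PDF documents from text files
  • Creating images from PDF pages
  • Printing PDF documents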
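As a rough illustration of two of these features (creating a PDF containing text, and extracting text from a PDF), here is a minimal sketch that writes one line into a new single-page document in memory and then reads it back with `PDFTextStripper`. It assumes PDFBox 1.8.x (for example 1.8.11) is on the classpath and uses the 1.8-era package layout (`org.apache.pdfbox.util.PDFTextStripper`, `org.apache.pdfbox.pdmodel.edit.PDPageContentStream`). These classes moved to other packages in PDFBox 2.0, so the imports would need adjusting there. The class name `PdfRoundTrip` and its helper methods are hypothetical, chosen only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream; // 1.8.x location (assumption: PDFBox 1.8 on classpath)
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.util.PDFTextStripper;             // 1.8.x location

// Hypothetical helper class: round-trips a line of text through an in-memory PDF.
public class PdfRoundTrip {

    /** Writes one line of text onto a new single-page PDF and returns the saved bytes. */
    static byte[] createPdf(String text) throws Exception {
        PDDocument document = new PDDocument();
        try {
            PDPage page = new PDPage(); // default page size
            document.addPage(page);

            PDPageContentStream content = new PDPageContentStream(document, page);
            content.beginText();
            content.setFont(PDType1Font.HELVETICA, 12);  // one of the standard 14 fonts
            content.moveTextPositionByAmount(72, 700);   // 72 pt = 1 inch from the left edge
            content.drawString(text);
            content.endText();
            content.close();

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            document.save(out);
            return out.toByteArray();
        } finally {
            document.close();
        }
    }

    /** Loads a PDF from bytes and returns the text of all its pages. */
    static String extractText(byte[] pdf) throws Exception {
        PDDocument document = PDDocument.load(new ByteArrayInputStream(pdf));
        try {
            return new PDFTextStripper().getText(document);
        } finally {
            document.close(); // always release the document, even if extraction fails
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractText(createPdf("Hello PDFBox")));
    }
}
```

Running `main` should print the extracted line; the exact whitespace and line separators around it depend on `PDFTextStripper`'s settings. Each document is closed in a `finally` block because `PDDocument` holds resources that are not released until `close()` is called.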

Source: http://www.oschina.net//news/69983/pdfbox-1-8-11