JSearch: a Java library for extracting text from documents (Office, PDF, HWP)

jopen, 9 years ago

JSearch is a Java library for extracting text from documents (Office, PDF, HWP).

Download & Installation

JSearch.jar
Add JSearch.jar to your project's classpath.

Requirement

It should work with various types of documents, e.g. HWP, PDF and Office files.
It should support extracting text and quickly finding keywords in documents.
It is distributed as a jar library.
All functions are synchronous.
The result of an extraction contains the full text as a string.
The result of a search contains the word count.
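The word-count requirement can be illustrated with a small sketch. This is not JSearch code: the class name KeywordCountSketch and the method countOccurrences are hypothetical, and the methods listed for JSearch in this article return a boolean or a file list rather than a count. The sketch only shows one plausible way to count non-overlapping occurrences of a keyword in text that has already been extracted.

```java
// Hypothetical helper, not part of JSearch: counts keyword occurrences
// in a string that has already been extracted from a document.
final class KeywordCountSketch {

    /** Returns the number of non-overlapping occurrences of keyword in text. */
    static int countOccurrences(String text, String keyword) {
        // Treat missing input or an empty keyword as "no matches".
        if (text == null || keyword == null || keyword.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        // indexOf returns -1 once no further match exists.
        while ((from = text.indexOf(keyword, from)) != -1) {
            count++;
            from += keyword.length(); // skip past this match
        }
        return count;
    }

    public static void main(String[] args) {
        String extracted = "pdf and hwp and office";
        System.out.println(countOccurrences(extracted, "and")); // prints 2
    }
}
```

The match is case-sensitive and works on raw substrings, so "and" would also match inside a longer word; a real implementation might tokenize the text first.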

Class

public class JSearch

JSearch supports various document types by using open source engines.
The library provides three families of functions: extract...(), isContainsKeyword...() and getFileList...().

HWP, DOC, PPT, EXCEL, TEXT, PDF and UNKNOWN are supported.

Modifier and Type	Method	Description
static java.lang.String	extractContentsFromFile(java.io.File target)	Extracts the file's contents as a string.
static java.lang.String	extractContentsFromFile(java.lang.String filePath)	Extracts the contents of the file at the given path as a string.
static java.util.List	getFileListContainsKeywordFromDirectory(java.lang.String dirPath, java.lang.String keyword)	Returns a list of the files in the directory that contain the keyword.
static java.util.List	getFileListContainsKeywordFromDirectory(java.lang.String dirPath, java.lang.String keyword, boolean recursive)	Returns a list of the files in the directory that contain the keyword; the recursive flag is passed explicitly.
static boolean	isContainsKeywordFromFile(java.io.File file, java.lang.String keyword)	Returns true if the file contains the keyword, false otherwise.
static boolean	isContainsKeywordFromFile(java.lang.String filePath, java.lang.String keyword)	Returns true if the file at the given path contains the keyword, false otherwise.
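To make the relationship between the three function families concrete, here is a minimal stand-in that mirrors the method names from the table but handles plain UTF-8 text files only. It is a sketch under stated assumptions, not JSearch's implementation: the class PlainTextSearchSketch is hypothetical, the IOException declarations and the List<File> element type are assumptions (the table shows a raw java.util.List and no exception information), and the reading of recursive as "descend into subdirectories" is a guess from the parameter name.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in, NOT the JSearch class: plain UTF-8 text files only.
final class PlainTextSearchSketch {

    /** extract...(): returns the full file contents as one string. */
    static String extractContentsFromFile(File target) throws IOException {
        return new String(Files.readAllBytes(target.toPath()), StandardCharsets.UTF_8);
    }

    /** isContainsKeyword...(): extraction followed by a substring test. */
    static boolean isContainsKeywordFromFile(File file, String keyword) throws IOException {
        return extractContentsFromFile(file).contains(keyword);
    }

    /** getFileList...(): applies the keyword test to each regular file. */
    static List<File> getFileListContainsKeywordFromDirectory(
            String dirPath, String keyword, boolean recursive) throws IOException {
        // Depth 1 visits only direct children; assumed meaning of recursive=false.
        int depth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(Paths.get(dirPath), depth)) {
            return paths
                    .filter(Files::isRegularFile)
                    .map(Path::toFile)
                    .filter(f -> {
                        try {
                            return isContainsKeywordFromFile(f, keyword);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PlainTextSearchSketch <directory> <keyword>
        if (args.length == 2) {
            for (File match : getFileListContainsKeywordFromDirectory(args[0], args[1], true)) {
                System.out.println(match.getPath());
            }
        }
    }
}
```

In this sketch each family builds on the previous one: the keyword test calls the extractor, and the directory listing calls the keyword test. Whether JSearch itself is layered this way is not stated in the source.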

Project homepage: http://www.open-open.com/lib/view/home/1439124196411
