DocFetcher 1.1.1 Released: Desktop Full-Text Search for Linux

openkk, 12 years ago
   <p><a href="/misc/goto?guid=4958199064721712114" target="_blank">DocFetcher</a> is a desktop full-text search tool for Linux (it also runs on Windows and Mac OS X) that quickly searches the folders you specify for given keywords. Features:</p>    <ul>     <li><strong>A portable version</strong>: There is a portable version of DocFetcher that runs on Windows, Linux <em>and</em> Mac OS X. How this is useful is described in more detail further down this page.</li>     <li><strong>64-bit support</strong>: Both 32-bit and 64-bit operating systems are supported.</li>     <li><strong>Unicode support</strong>: DocFetcher comes with rock-solid Unicode support for all major formats, including Microsoft Office, OpenOffice.org, PDF, HTML, RTF and plain text files. The only exception is CHM, for which we don't have Unicode support yet.</li>     <li><strong>Archive support</strong>: DocFetcher supports the following archive formats: zip, 7z, rar, and the whole tar.* family. The file extensions for zip archives can be customized, allowing you to add more zip-based archive formats as needed. Also, DocFetcher can handle unlimited nesting of archives (e.g. a zip archive containing a 7z archive containing a rar archive... and so on).</li>     <li><strong>Search in source code files</strong>: The file extensions by which DocFetcher recognizes plain text files can be customized, so you can use DocFetcher for searching in any kind of source code and other text-based file formats. (This works quite well in combination with the customizable zip extensions, e.g. for searching in Java source code inside Jar files.)</li>     <li><strong>Outlook PST files</strong>: DocFetcher allows searching for Outlook emails, which Microsoft Outlook typically stores in PST files.</li>     <li><strong>Detection of HTML pairs</strong>: By default, DocFetcher detects pairs of HTML files (e.g. a file named "foo.html" and a folder named "foo_files"), and treats the pair as a single document. 
This feature may seem rather useless at first, but it turns out to dramatically increase the quality of the search results when you're dealing with HTML files, since all the "clutter" inside the HTML folders disappears from the results.</li>     <li><strong>Regex-based exclusion of files from indexing</strong>: You can use regular expressions to exclude certain files from indexing. For example, to exclude Microsoft Excel files, you can use a regular expression like this: <code>.*\.xls</code></li>     <li><strong>Mime-type detection</strong>: You can use regular expressions to turn on "mime-type detection" for certain files, meaning that DocFetcher will try to detect their actual file types not just by looking at the filename, but also by peeking into the file contents. This comes in handy for files that have the wrong file extension.</li>     <li><strong>Powerful query syntax</strong>: In addition to basic constructs like <code>OR</code>, <code>AND</code> and <code>NOT</code>, DocFetcher also supports, among other things: wildcards, phrase search, fuzzy search ("find words that are similar to..."), proximity search ("these two words should be at most 10 words away from each other"), and boosting ("increase the score of documents containing...").</li>    </ul>    <p>Supported document formats:</p>    <ul>     <li>Microsoft Office (doc, xls, ppt)</li>     <li>Microsoft Office 2007 and newer (docx, xlsx, pptx, docm, xlsm, pptm)</li>     <li>Microsoft Outlook (pst)</li>     <li>OpenOffice.org (odt, ods, odg, odp, ott, ots, otg, otp)</li>     <li>Portable Document Format (pdf)</li>     <li>HTML (html, xhtml, ...)</li>     <li>Plain text (customizable)</li>     <li>Rich Text Format (rtf)</li>     <li>AbiWord (abw, abw.gz, zabw)</li>     <li>Microsoft Compiled HTML Help (chm)</li>     <li>Microsoft Visio (vsd)</li>     <li>Scalable Vector Graphics (svg)</li>    </ul>    <p>DocFetcher 1.1.1 has been released. This version fixes crashes that occurred when reading PDF and OpenOffice files.</p>    <div style="text-align:center;" id="img">    <a 
href="https://simg.open-open.com/show/bda3e55b11f0b2b39405ead35f365851.png"><img style="width:500px;height:375px;" alt="DocFetcher 1.1.1 released: Linux desktop search" src="https://simg.open-open.com/show/318167a6ffe4c343a2f90f8b640cbd44.png" width="820" height="615" /></a>    </div>
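   <p>Two of the features listed above, regex-based exclusion and detection of HTML pairs, can be illustrated with a short Python sketch. This is not DocFetcher's own code (DocFetcher is a Java application). The function names <code>is_excluded</code> and <code>group_html_pairs</code> are hypothetical, and the sketch assumes that an exclusion pattern must match the whole file name and that only the extensions html, htm and xhtml count as HTML.</p>

```python
import re
from typing import Dict, List, Optional

# Assumption: these extensions mark a file as HTML for pairing purposes.
HTML_EXTENSIONS = ("html", "htm", "xhtml")


def is_excluded(filename: str, patterns: List[str]) -> bool:
    """Return True if the whole file name matches any exclusion regex.

    Assumption: patterns are matched against the complete file name,
    not a substring of it and not the full path.
    """
    return any(re.fullmatch(p, filename) for p in patterns)


def group_html_pairs(files: List[str], folders: List[str]) -> Dict[str, Optional[str]]:
    """Map each HTML file to its companion "<stem>_files" folder, or None."""
    folder_set = set(folders)
    pairs: Dict[str, Optional[str]] = {}
    for name in files:
        stem, dot, ext = name.rpartition(".")
        if dot and ext.lower() in HTML_EXTENSIONS:
            companion = stem + "_files"
            pairs[name] = companion if companion in folder_set else None
    return pairs


files = ["report.xls", "budget.xlsx", "foo.html", "notes.txt"]
folders = ["foo_files", "misc"]

# Drop files matching the example pattern from the feature list.
to_index = [f for f in files if not is_excluded(f, [r".*\.xls"])]
print(to_index)                             # ['budget.xlsx', 'foo.html', 'notes.txt']
print(group_html_pairs(to_index, folders))  # {'foo.html': 'foo_files'}
```

   <p>Under the whole-name assumption, the pattern <code>.*\.xls</code> excludes report.xls but not budget.xlsx. A pattern such as <code>.*\.xlsx?</code> would cover both. Whether DocFetcher itself matches against the file name or the full path is not stated in this post.</p>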