dtSearch Product Line Features - dtSearch Forensics

mhtml:file://D:\_To_USB_HDD\_ SW\DtSearch Engine v7.74.8154\dtSearch Product ... 21-Oct-13

Indexing and Searching Features of Special Interest to Forensics Users

Optimizing Indexing of Large Collections of Data
Distributed/Federated Searching
Spider-Assisted Searching
Adobe Reader X and XI Users
Automatic Recognition of Dates, Email Addresses, and Credit Card Numbers
Forensics Filtering Features
Email Support
Fuzzy Searching
International Language Support
Chinese, Japanese and Korean Text With No Word Breaks
Language Group Identification
Hidden Content
Search for List of Words or Concepts
View Log of Encrypted Files
Index Encrypted PDFs
Copying Retrieved Files
Making Available Retrieved Files on CD/DVD or Other Portable Media

Optimizing Indexing of Large Collections of Data

This article acts as a forensics supplement to the article on tips for indexing of large collections of data. Topics in that article include: document storage and the NTFS file system, general indexing strategy, index and document location, indexing resources and efficient text processing.

dtSearch can index over a terabyte of text in a single index, with search times typically less than a second. There are no limits on the number of indexes dtSearch can build and simultaneously search. Please see optimizing indexing of large collections of data for additional information on using the terabyte indexer.

dtSearch does not alter the original files, including hash values, in indexing, searching and display of documents.

Distributed/Federated Searching

A single terabyte-data index can span multiple local and remote locations. For example, a single index can include data from hard drives, local area networks, Exchange servers (see Outlook/Exchange topic below), Intranet servers and public Web sites (see Spider topic below). (For indexing SQL-type databases, please see the Databases and Field Searching topic on the developer Selected Articles by Subject page.)

dtSearch can rank federated or distributed indexed search results collectively by relevance, displaying all local and remote files with highlighted hits. A scrolling "word wheel" display in dtSearch Desktop includes all words in an index covering local and remote locations. dtSearch can also output all indexed words to a file.
dtSearch Desktop: Click Index > List Index Contents.
dtSearch Developer API: Use ListIndexJob (.NET) or DListIndexJob (C++).

Spider-Assisted Searching

The dtSearch Spider supports searching of static browser-ready content (HTML, PDF, XML/XSL); dynamic browser-ready content (MS CMS, SharePoint, ASP.NET, etc.); as well as browser-incompatible content (MS Office files, OpenOffice files, etc.). The Spider can even index and search web-accessible data on platforms that dtSearch does not directly support, like Mac and UNIX.

The Spider supports public sites as well as password accessible, forms-based authentication, and other secure content access. Indexing with the Spider involves simply selecting a URL or URLs and indicating how many vertical and horizontal links to follow. The Spider automatically figures out the format of the data, so there is no need to tell the Spider whether a retrieved web page contains, for example, an MS Office document or a PDF file.

The dtSearch Spider displays static and dynamic browser-ready content WYSIWYG, including display of images, formatting and links, with the addition of highlighted hits. The Spider converts browser-incompatible content (such as MS Office or OpenOffice) "on the fly" to HTML for browser display with highlighted hits. More information (basic article); more information (advanced article)

For convenient offline access, the dtSearch Spider also includes a caching option, to store the full spidered content along with the index. (Without caching, the Spider has to return to the relevant URL to display the full content with highlighted hits.)

dtSearch Desktop: To enable caching, use the Create Index (Advanced) dialog box.
dtSearch developer API: To enable caching, set the caching flags in IndexJob.IndexingFlags.

Adobe Reader X and XI Users

Adobe Reader X and XI require a plug-in to support highlighting of hits after a search. More information

Automatic Recognition of Dates, Email Addresses, and Credit Card Numbers

dtSearch can automatically recognize dates, email addresses, and credit card numbers, and search for these items by type. Through this feature, dtSearch can, for example, search for a credit card number regardless of how it may be formatted, or search for a range of dates even if the dates are expressed in different text formats (January 15, 2005, through 2/19/07). dtSearch can also extract all dates, emails and credit card numbers from a collection of documents. More information

Forensics Filtering Features

dtSearch offers a Unicode filtering feature for automatic recovery of text from corrupt forensically-recovered documents and large data blocks, such as data recovered through an "undelete" process, from unallocated computer space, or from partially recovered file fragments. The filtering algorithm can scan recovered data blocks using multiple Unicode and other text encoding detection methods. More information
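The idea of scanning one recovered block under more than one encoding can be pictured with a small Python sketch. This is only an illustration, not dtSearch's filtering algorithm: the function name `extract_text_runs`, the `min_chars` threshold, and the choice of just two encodings (ASCII and UTF-16LE) are all assumptions made for the example.

```python
import re

def extract_text_runs(block, min_chars=4):
    """Return printable runs found by scanning a byte block as ASCII and as UTF-16LE.

    Hypothetical helper for illustration only; real forensic filters
    consider many more encodings and heuristics.
    """
    runs = []
    # Pass 1: runs of printable single-byte ASCII characters.
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_chars)
    runs += [m.group().decode("ascii") for m in ascii_pattern.finditer(block)]
    # Pass 2: printable ASCII characters stored as UTF-16LE (character byte + NUL byte).
    utf16_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_chars)
    runs += [m.group().decode("utf-16-le") for m in utf16_pattern.finditer(block)]
    return runs

# A made-up "recovered" block: binary noise around one ASCII string and one UTF-16LE string.
sample = b"\x00\x01secret\xff\xfe\x00" + "hidden".encode("utf-16-le") + b"\x02\x03"
recovered = extract_text_runs(sample)
```

Running both passes over the same bytes matters because text stored in a two-byte encoding is invisible to a single-byte scan: the interleaved NUL bytes break every run.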

dtSearch Desktop: Click Options > Preferences > Filtering Options, and check the "Filter text" option under "Binary files" to enable filtering of binary files.
dtSearch developer API: Set Options.BinaryFiles = dtsoFilterBinaryU

Email Support

dtSearch includes multiple ways to index Outlook or Exchange messages, contacts, tasks, and notes. All methods include indexing and searching of underlying messages, including all meta data, as well as the full text of all attachments. dtSearch will highlight hits in both messages and attachments, including ZIP and other nested attachments.

(i) Starting with Version 7.67, dtSearch supports native PST files, bypassing the need to go through MAPI or pre-convert the messages to .msg, as described below.

(ii) In the second approach, dtSearch indexes "live" content in an Outlook profile. In addition to display of search results in dtSearch with highlighted hits, dtSearch supports launching a message, contact, task, or note in the native application. For example, you can search for a message in dtSearch, launch the message in Outlook, and then reply to the message using Outlook.

(iii) For Exchange data, as well as for certain archiving and forensic applications, dtSearch supports extracting Outlook and Exchange data to .msg files. The .msg conversion approach in dtSearch works through a command-line tool to extract Outlook items in bulk from larger volumes of PST or Exchange data. The converted .msg files will include all properties of the original Outlook item, including any attachments. Following conversion, dtSearch can index the resulting .msg files, including highlighting hits in messages and attachments. More information

Normally, dtSearch indexes each .eml file and each .msg file as a single document. Attachments are recursively unpacked and appended to the message body, so no matter how many attachments there are, a single document is indexed for each message. Using the File Types table, you can set up rules to require each message to be treated as a container, with the message body and attachments each indexed as a separate document in the container. More information
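The default "one document per message" behavior described above can be sketched with Python's standard `email` package. This is an illustration of the general idea only, not dtSearch code: `flatten_message` is a hypothetical helper, and the sample messages are invented for the example.

```python
from email.message import EmailMessage

def flatten_message(msg):
    """Concatenate every text part of a message, descending into nested attachments."""
    texts = []
    # walk() visits the message itself and all nested parts, depth-first.
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            texts.append(part.get_content().strip())
    return "\n".join(texts)

# Invented sample: an outer message carrying a nested message and a text attachment.
inner = EmailMessage()
inner["Subject"] = "Forwarded note"
inner.set_content("inner body")

outer = EmailMessage()
outer["Subject"] = "Report"
outer.set_content("outer body")
outer.add_attachment(inner)                                   # nested message attachment
outer.add_attachment("attached notes", filename="notes.txt")  # plain-text attachment

single_document = flatten_message(outer)
```

The container alternative would instead keep each part as its own entry (for example, a list of separate strings) rather than joining them into one text.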

The above discussion applies to Outlook and Exchange data. dtSearch can index Outlook Express .dbx files just like any other supported file type.

dtSearch also supports Thunderbird (MBOX/EML), including nested email attachments.

Fuzzy Searching

Fuzzy searching uses a proprietary algorithm to find search terms even if they are misspelled. dtSearch recommends fuzzy searching for searching email, OCRed text, or any other text that may contain misspellings.
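Because the dtSearch algorithm is proprietary, the sketch below uses plain Levenshtein edit distance as a stand-in to show what tolerance for misspellings can look like. The names `edit_distance`, `fuzzy_matches`, and `max_edits` are assumptions for this example, and `max_edits` is not the same thing as the dtSearch fuzziness setting.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]

def fuzzy_matches(term, words, max_edits):
    """Return the words within max_edits edits of term, ignoring case."""
    return [w for w in words if edit_distance(term.lower(), w.lower()) <= max_edits]

words = ["alphaqet", "alpkaqet", "omega"]
close = fuzzy_matches("alphabet", words, max_edits=1)
looser = fuzzy_matches("alphabet", words, max_edits=2)
```

Since the tolerance is applied when the comparison is made rather than stored with the word list, it can be changed from one search to the next without rebuilding anything.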

Search fuzziness adjusts from 0 to 10 so you can fine-tune fuzziness to the level of OCR or typographical errors in your files. A search for alphabet with a fuzziness of 1 would find alphaqet; with a fuzziness of 3, it would find both alphaqet and alpkaqet. Fuzziness is not built into the index, so you can vary fuzziness at the time of each search. More information on fuzzy and other search options

International Language Support

dtSearch includes Unicode-compatible file parsing, to convert input data to Unicode. dtSearch automatically recognizes all Unicode-supported encodings, representing hundreds of international languages.

The following dtSearch search options work automatically on text in any international language: phrase; Boolean; proximity and directed proximity; wildcard; macro; numeric range; fielded data / metadata search options; fuzzy searching (adjustable from 0 to 10 to account for typographical or OCR errors); and relevancy-ranked searching (including natural language vector-space ranking, positional scoring options, general variable term weighting, variable term weighting in fields, and other API-based document classification and sorting options). More information

Chinese, Japanese and Korean Text With No Word Breaks

Some Chinese, Japanese, and Korean text does not include word breaks. Instead, the text appears as lines of characters with no spaces between the words. Because there are no spaces separating the words on each line, dtSearch sees each line of text as a single long word. To make this type of text searchable, enable automatic insertion of word breaks around Chinese, Japanese, and Korean characters, so each character will be treated as a single word.

dtSearch Desktop: In Options > Preferences > Letters and Words, check the box to Insert word breaks between Chinese, Japanese, and Korean characters in text.

dtSearch Developer API: set dtsoTfAutoBreakCJK in Options.TextFla

Note: this setting will only affect text identified as Unicode Chinese, Japanese, or Korean text; it will not affect text identified as other Unicode characters.

Language Group Identification

For documents in certain formats that do not include encoding information, such as single-byte text files, dtSearch provides a proprietary language recognition algorithm for detecting text in a large variety of languages (Western European, other European, Middle-Eastern, etc.). This algorithm is enabled by default.

Hidden Content

A search in dtSearch will always include white-on-white text and similar "invisible" text in files. dtSearch also includes options for searching embedded objects in Microsoft Office documents, and normally hidden content in HTML.

While HTML comments, scripts, links, and styles are not by default included in indexing, dtSearch has an option to include these.

dtSearch Desktop: Click Options > Preferences > Indexing Options, and check the box to "Index HTML scripts, styles, links and comments."
dtSearch developer API: Set Options.FieldFlags to a combination of the following flags: dtsoFfHtmlShowLinks, dtsoFfHtmlShowImgSrc, dtsoFfHtmlShowComments, dtsoFfHtmlShowScripts, dtsoFfHtmlShowStylesheets, and dtsoFfHtmlShowMetatags.

A similar option searches hidden content (such as macros or other embedded objects) in Microsoft Office files.
dtSearch Desktop: Click Options > Preferences > Indexing Options, and check the box to "Index Hidden content in Office documents."
dtSearch developer API: This option is set by default. To disable it, set dtsoFfOfficeSkipHiddenContent in Options.FieldFlags.

Search for List of Words or Concepts

dtSearch provides an option to search for a list of words. Under this option, a special dialog box provides a way to search for a long list of words, and create a list of matching files, in a single step. This option can work with the full range of dtSearch search features (Boolean, fuzzy, natural language, etc.). More information

For expanding a search for a specific word or set of words to a user-defined set of concepts or synonyms, dtSearch also offers a user-defined thesaurus in addition to the comprehensive English-language thesaurus included with dtSearch.

dtSearch Desktop: Click Options > Preferences > Search Options > User Thesaurus to add a list of synonym rings to specific terms.

View Log of Encrypted Files; Index Encrypted PDFs

After an index update completes, click "View Log" to see a report that will include information on any encrypted or unreadable files that the indexer could not process. This report can be accessed at any time in the index folder in the file Index_LastUpdateErrors.html. The report indicates which files were (a) encrypted, (b) corrupt, (c) partially encrypted, and (d) partially corrupt. Partially encrypted or corrupt files are files that could be indexed in part but that include some encrypted or corrupt data (for example, an email with an encrypted attachment).

To index encrypted PDFs, make a temporary, decrypted copy of the encrypted files, index the decrypted copy, and then replace the temporary decrypted copies with the encrypted versions. This one-time unencryption is sufficient for dtSearch operation. dtSearch does not need to unencrypt the PDF files to search and display them with highlighted hits once the original index is complete.

Copying Retrieved Files

dtSearch's Edit Copy file function lets you copy all or selected documents retrieved from a search to a folder. You can optionally preserve the full path and filename in the copy, and you can preserve creation and last access times as well as the last modified date. More information.

Making Available Retrieved Files on CD/DVD or Other Portable Media

The dtSearch Publish product can quickly publish forensically retrieved (or e-discovery retrieved) documents to CD, DVD or other portable media. The resulting product provides instant search and display access to the document set. The CD, DVD or other portable media can run with zero footprint, requiring no installation on the end-user's computer. Please see Mirroring Searchable Web Content on Portable Media for an overview of how dtSearch Publish works.

Link to dtSearch Search Features page
