
Name

nlpc-vrfy - verifies nlpc database entry authenticity

Synopsis

nlpc-vrfy [-dhv] [-f path] [-p path]

Description

The nlpc-vrfy utility verifies fetched nlpc database entries for authenticity. Specifically, it makes sure that the language and encoding are valid. nlpc-vrfy is part of the nlpcrawl(1) series of tools. The arguments are as follows:

-d
Enable debugging. Use multiple times for more verbosity.

-f path
The cache file path. See the nlpcrawl(1) FILES section.

-h
Print a help message and exit.

-p path
The database environment path. See the nlpcrawl(1) FILES section.

-v
Print version information and exit.
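The short options follow the usual getopt(3) conventions. The following is a minimal sketch of how they might be parsed, assuming a POSIX getopt(3); the structure `vrfy_opts` and the function `vrfy_parse_opts` are hypothetical names, not taken from the nlpc-vrfy source. Each repeated -d raises the debug level by one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h> /* getopt(3), optarg, optind */

/* Hypothetical container for the parsed short options. */
struct vrfy_opts {
	int         debug;   /* incremented once per -d */
	int         help;    /* -h was given */
	int         version; /* -v was given */
	const char *cache;   /* -f path: cache file */
	const char *dbenv;   /* -p path: database environment */
};

/*
 * Parse "-dhv -f path -p path".  Returns 0 on success, or -1 on an
 * unknown option or missing argument (getopt prints the diagnostic).
 */
int
vrfy_parse_opts(int argc, char *argv[], struct vrfy_opts *o)
{
	int c;

	assert(o != NULL);
	*o = (struct vrfy_opts){ 0 };
	optind = 1; /* start scanning at argv[1] */

	while ((c = getopt(argc, argv, "dhvf:p:")) != -1) {
		switch (c) {
		case 'd':
			o->debug++; /* -dd or -d -d: more verbosity */
			break;
		case 'h':
			o->help = 1;
			break;
		case 'v':
			o->version = 1;
			break;
		case 'f':
			o->cache = optarg;
			break;
		case 'p':
			o->dbenv = optarg;
			break;
		default: /* '?': unknown option or missing argument */
			return -1;
		}
	}
	return 0;
}
```

With this sketch, `nlpc-vrfy -dd -f cache -p env` yields a debug level of 2, and the two paths point directly into `argv`.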

In addition, the following long arguments may be used:

--filter-charset string
For relevant protocols, if the server provides a charset, match it against string. If the charset does not match, discontinue fetching the page. Multiple charsets may be separated by spaces. Example: --filter-charset 'utf-8 utf-16'.

--filter-dict file
Word database for dictionary matching. If specified, file is read into a hashtable against which blocks of text are matched. If the percentage of matched words reaches the threshold set by --filter-dict-pct, the page is considered acceptable. Entries in the file must be one per line (with Unix end-of-line markers), lowercase, and UTF-8 encoded. Empty lines are ignored.

--filter-dict-pct num
Set the percentage of words that must match the --filter-dict dictionary for a page to be accepted. Defaults to 10 (percent).

--use-interval num
Set the wait period, in seconds, between read scans. Defaults to 120. Must be greater than 1.

--use-charset string
Set the default charset to use if none can be deduced. Defaults to ISO-8859-1, as recommended by the W3C.
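The charset options above combine a space-separated allow list (--filter-charset) with a fallback (--use-charset). A minimal sketch of that logic follows; it assumes case-insensitive comparison (charset names are case-insensitive in MIME) and that an absent filter accepts every charset. The function names are hypothetical and not taken from nlpc-vrfy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strncasecmp(3) */

/* Built-in fallback when nothing else is known (cf. --use-charset). */
#define DEFAULT_CHARSET "ISO-8859-1"

/* Prefer the server's charset, then the configured one, then the default. */
const char *
charset_effective(const char *server, const char *use_charset)
{
	if (server != NULL && *server != '\0')
		return server;
	return use_charset != NULL ? use_charset : DEFAULT_CHARSET;
}

/*
 * Return 1 if charset equals one of the space-separated names in
 * filter (e.g. "utf-8 utf-16"), ignoring case; otherwise 0.
 * Assumption: a NULL or empty filter means "no filtering".
 */
int
charset_allowed(const char *filter, const char *charset)
{
	const char *p = filter, *end;
	size_t want, len;

	assert(charset != NULL);
	if (filter == NULL || *filter == '\0')
		return 1;
	want = strlen(charset);
	while (*p != '\0') {
		while (*p == ' ')
			p++; /* skip separators */
		end = strchr(p, ' ');
		len = end != NULL ? (size_t)(end - p) : strlen(p);
		if (len > 0 && len == want && strncasecmp(p, charset, len) == 0)
			return 1;
		p += len;
	}
	return 0;
}
```

Comparing lengths before `strncasecmp` keeps "utf-16" from matching a filter entry of "utf-1".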
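The --filter-dict and --filter-dict-pct behaviour can be illustrated with a small hashtable. This is a sketch under stated assumptions, not the nlpc-vrfy implementation: it uses a fixed-size chained table with the djb2 hash, splits text on ASCII whitespace, lowercases only ASCII letters, and accepts a page when matched * 100 >= pct * total. All names (`dict_add`, `dict_accept`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024 /* hypothetical fixed table size */

struct word {
	char        *text;
	struct word *next;
};

struct dict {
	struct word *bucket[NBUCKETS];
};

/* djb2 string hash, reduced to a bucket index. */
static unsigned long
hash(const char *s)
{
	unsigned long h = 5381;

	while (*s != '\0')
		h = h * 33 + (unsigned char)*s++;
	return h % NBUCKETS;
}

/* Add one dictionary line; empty lines are ignored. Returns -1 on ENOMEM. */
int
dict_add(struct dict *d, const char *w)
{
	struct word *e;
	size_t len = strlen(w);

	if (len == 0)
		return 0;
	if ((e = malloc(sizeof(*e))) == NULL || (e->text = malloc(len + 1)) == NULL) {
		free(e);
		return -1;
	}
	memcpy(e->text, w, len + 1);
	e->next = d->bucket[hash(w)];
	d->bucket[hash(w)] = e;
	return 0;
}

int
dict_has(const struct dict *d, const char *w)
{
	const struct word *e;

	for (e = d->bucket[hash(w)]; e != NULL; e = e->next)
		if (strcmp(e->text, w) == 0)
			return 1;
	return 0;
}

/*
 * Return 1 if at least pct percent of the words in text are in the
 * dictionary (the documented default for pct is 10).  Words longer
 * than 255 bytes are truncated in this sketch.
 */
int
dict_accept(const struct dict *d, const char *text, unsigned pct)
{
	char buf[256];
	size_t n = 0, total = 0, matched = 0;
	const char *p;

	for (p = text; ; p++) {
		char c = *p;
		if (c == '\0' || c == ' ' || c == '\t' || c == '\n' || c == '\r') {
			if (n > 0) { /* end of a word */
				buf[n] = '\0';
				total++;
				matched += (size_t)dict_has(d, buf);
				n = 0;
			}
			if (c == '\0')
				break;
		} else if (n < sizeof(buf) - 1) {
			buf[n++] = (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
		}
	}
	return total > 0 && matched * 100 >= (size_t)pct * total;
}

void
dict_free(struct dict *d)
{
	size_t i;
	struct word *e, *next;

	for (i = 0; i < NBUCKETS; i++) {
		for (e = d->bucket[i]; e != NULL; e = next) {
			next = e->next;
			free(e->text);
			free(e);
		}
		d->bucket[i] = NULL;
	}
}
```

With a dictionary of "the" and "cat", the text "The dog saw a cat" matches 2 of 5 words (40%), so it is accepted at the default 10% but rejected at 50%.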

Media Types

At this time, the nlpc-vrfy utility is capable of parsing HTML (text/html) and XML (text/xml, application/xml, application/xhtml+xml) pages. Plain text is processed directly, without markup scanning.
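The media-type rules amount to a dispatch on the base type of a Content-Type value. A minimal sketch, assuming parameters such as "; charset=utf-8" are ignored and names compare case-insensitively; the enum and function names are hypothetical, and the XHTML entry is written as the registered type application/xhtml+xml.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strncasecmp(3) */

enum media { MEDIA_UNKNOWN, MEDIA_HTML, MEDIA_XML, MEDIA_TEXT };

/* Classify a Content-Type value such as "text/html; charset=utf-8". */
enum media
media_classify(const char *ctype)
{
	static const struct {
		const char *name;
		enum media  kind;
	} map[] = {
		{ "text/html",             MEDIA_HTML },
		{ "text/xml",              MEDIA_XML },
		{ "application/xml",       MEDIA_XML },
		{ "application/xhtml+xml", MEDIA_XML },
		{ "text/plain",            MEDIA_TEXT }, /* no markup scanning */
	};
	size_t i, len;

	assert(ctype != NULL);
	while (*ctype == ' ')
		ctype++;
	len = strcspn(ctype, "; "); /* base type ends before parameters */
	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (strlen(map[i].name) == len &&
		    strncasecmp(ctype, map[i].name, len) == 0)
			return map[i].kind;
	return MEDIA_UNKNOWN;
}
```

A caller would then route MEDIA_HTML and MEDIA_XML to their parsers, hand MEDIA_TEXT straight to word matching, and skip MEDIA_UNKNOWN.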

See Also

nlpcrawl(1)

Caveats

Incorrect encoding conventions (for example, the bogus encoding of a non-breaking space as ISO-8859-1 instead of UTF-8) lead to words containing blank spaces, which cannot match any entry in the hashtable. Also, a large dictionary file takes considerable time to load.

The nlpc-vrfy utility expects the en_GB.UTF-8 locale to be installed and functional. If the locale cannot be set with setlocale(3), the utility will not start.
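The start-up requirement above can be sketched as a single setlocale(3) call whose failure aborts initialisation. This assumes the whole locale (LC_ALL) is set; the real program may set only specific categories such as LC_CTYPE, and the function name `vrfy_init_locale` is hypothetical.

```c
#include <locale.h>
#include <stdio.h>

/*
 * Select en_GB.UTF-8 for all locale categories.  Returns 0 on
 * success, or -1 after printing a diagnostic if the locale is not
 * installed, in which case the caller should refuse to start.
 */
int
vrfy_init_locale(void)
{
	if (setlocale(LC_ALL, "en_GB.UTF-8") == NULL) {
		fprintf(stderr, "nlpc-vrfy: cannot set locale en_GB.UTF-8\n");
		return -1;
	}
	return 0;
}
```

On glibc systems, `locale -a` lists the installed locales, so a missing en_GB.UTF-8 can be spotted before the utility is run.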

