Table of Contents
nlpc-vrfy - verifies nlpc database entry authenticity
nlpc-vrfy [-dhv] [-f path] [-p path]
The nlpc-vrfy utility verifies fetched nlpc database entries for authenticity.
Specifically, it makes sure that the language and encoding are valid.
nlpc-vrfy is part of the nlpcrawl(1)
series of tools. The arguments are as follows:
- -d
- Enable debugging. Use multiple times for more verbosity.
- -f path
- The cache file path. See nlpcrawl(1).
- -h
- Print a help message and exit.
- -p path
- The database environment path. See nlpcrawl(1).
- -v
- Print version information and exit.
In addition, the following long arguments may be used:
- --filter-charset string
For relevant protocols, if a charset is provided by the server,
match it now against string. If the charset is not matched, discontinue
fetching the page. Multiple charsets may be space-separated.
For example: --filter-charset ‘utf-8 utf-16’.
- --filter-dict file
Word database for dictionary matching. If specified, file is
read into a hashtable against which blocks of text are matched.
If the required percentage of matched elements is reached, the page is
considered acceptable. Entries in the file must be one per line
(with Unix end-of-line markers), lowercase, and in UTF-8 encoding.
Empty lines will be ignored.
- --filter-dict-pct num
Tune the percentage of matched --filter-dict words required for
success. Defaults to 10 (percent).
- --use-interval num
Define the wait period between read scans. Defaults to 120 (seconds).
Must be greater than 1.
- --use-charset string
Define the default charset if none could be deduced. This
defaults to ISO-8859-1 as recommended by the W3C.
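The charset and dictionary filters described above can be sketched in a few lines of Python. This is only an illustration, not the utility's actual implementation: the function names are hypothetical, whitespace tokenization is an assumption, and so is treating "reached" as "greater than or equal to" the percentage.

```python
def parse_charsets(spec):
    """Split a space-separated charset list into a lowercase set."""
    return {name.lower() for name in spec.split()}


def charset_allowed(server_charset, spec):
    """True if the server-provided charset matches one of the listed charsets."""
    return server_charset.strip().lower() in parse_charsets(spec)


def load_dictionary(lines):
    """Build the lookup table: one lowercase word per line, empty lines ignored."""
    return {line.rstrip("\n") for line in lines if line.strip()}


def dict_match_pct(text, words):
    """Percentage of whitespace-separated tokens found in the dictionary.

    Assumption: tokens are lowercased and split on whitespace; the real
    utility may segment blocks of text differently.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in words)
    return 100.0 * hits / len(tokens)


def acceptable(text, words, pct=10):
    """Apply the threshold; 10 mirrors the documented --filter-dict-pct default."""
    return dict_match_pct(text, words) >= pct
```

With a dictionary containing "the" and "cat", the text "The cat sat down" matches two of four tokens (50 percent), which passes the default threshold of 10.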
At this time, the nlpc-vrfy utility is capable of parsing HTML
(text/html) and XML (text/xml, application/xml, application/xml+html)
pages. Plain text is processed directly, without scanning.
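The content-type handling above can be pictured as a small dispatch table. The following Python sketch is illustrative only: handler_for is a hypothetical name, the media types are copied verbatim from the text, and mapping "plain text" to text/plain is an assumption.

```python
HTML_TYPES = {"text/html"}
XML_TYPES = {"text/xml", "application/xml", "application/xml+html"}


def handler_for(content_type):
    """Return 'html', 'xml', 'plain', or None for an unsupported type."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in HTML_TYPES:
        return "html"
    if mime in XML_TYPES:
        return "xml"
    if mime == "text/plain":  # assumption: "plain text" means text/plain
        return "plain"
    return None
```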
Incorrect encoding conventions (for example, the bogus nesting of
as ISO-8859-1 instead of UTF-8) lead to words containing blank spaces
(which obviously don’t validate against the hashtable). Also, a large
dictionary file takes considerable time to load.
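The blank-space effect is easy to reproduce. In the Python sketch below (misdecode is a hypothetical helper, not part of nlpc-vrfy), UTF-8 bytes are decoded as ISO-8859-1; the second byte of "à" (0xA0) becomes a no-break space, so the resulting word can no longer match a dictionary entry.

```python
def misdecode(word):
    """Encode as UTF-8, then wrongly decode the bytes as ISO-8859-1."""
    return word.encode("utf-8").decode("iso-8859-1")


dictionary = {"voilà"}
garbled = misdecode("voilà")  # "à" is the bytes C3 A0 in UTF-8

# The garbled word now ends in U+00C3 followed by U+00A0 (no-break space).
contains_blank = any(ch.isspace() for ch in garbled)
still_matches = garbled in dictionary
```

Pure ASCII words survive the mistake unchanged, which is why such pages tend to fail only partially.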
The nlpc-vrfy utility expects the en_GB.UTF-8 locale to be installed and
functional. If it cannot be set with setlocale(3), the system will