LinkChecker 5.2 command-line options
USAGE linkchecker [options] [file-or-url]…
Options:
-h, --help show this help message and exit
General options:
-f FILENAME, --config=FILENAME
Use FILENAME as configuration file. By default
LinkChecker first searches
/etc/linkchecker/linkcheckerrc and then
~/.linkchecker/linkcheckerrc (under Windows
<path-to-program>\linkcheckerrc).
-I, --interactive Ask for a URL if none is given on the command line.
-t NUMBER, --threads=NUMBER
Generate no more than the given number of threads.
Default number of threads is 10. To disable threading
specify a non-positive number.
--priority Run with normal thread scheduling priority. By
default LinkChecker runs with low thread priority to
be suitable as a background job.
-V, --version Print version and exit.
--allow-root Do not drop privileges when running as root user on
Unix systems.
--stdin Read list of whitespace-separated URLs to check from
stdin.
Output options:
-v, --verbose Log all URLs. Default is to log only errors and
warnings.
--complete Log all URLs, including duplicates. Default is to log
duplicate URLs only once.
--no-warnings Don’t log warnings. Default is to log warnings.
-W REGEX, --warning-regex=REGEX
Define a regular expression; a warning is printed if
it matches any content of the checked link. This
applies only to valid pages, since only their content
can be retrieved.
Use this to check for pages that contain some form of
error message, for example ‘This page has moved’ or
‘Oracle Application Server error’.
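As a rough illustration of the kind of pattern -W expects, the Python sketch below searches page content for the two error messages mentioned above. The helper name content_warning and the choice of re.search are assumptions made for this example; LinkChecker's internal matching may differ.

```python
import re

# One pattern covering both example messages; the same string could be passed to -W.
WARNING_REGEX = r"This page has moved|Oracle Application Server error"


def content_warning(content, pattern=WARNING_REGEX):
    """Return the matched text if the pattern occurs anywhere in content, else None.

    Hypothetical helper for illustration only, not part of LinkChecker.
    """
    match = re.search(pattern, content)
    return match.group(0) if match else None
```

On the command line, the same pattern would be quoted so the shell does not interpret the `|`: linkchecker -W "This page has moved|Oracle Application Server error" http://www.example.org/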
--warning-size-bytes=NUMBER
Print a warning if content size info is available and
exceeds the given number of bytes.
--check-html Check syntax of HTML URLs with local library (HTML
tidy).
--check-html-w3 Check syntax of HTML URLs with W3C online validator.
--check-css Check syntax of CSS URLs with local library
(cssutils).
--check-css-w3 Check syntax of CSS URLs with W3C online validator.
--scan-virus Scan content of URLs with ClamAV virus scanner.
-q, --quiet Quiet operation, an alias for ‘-o none’. This is only
useful with -F.
-o TYPE[/ENCODING], --output=TYPE[/ENCODING]
Specify output as ‘xml’, ‘none’, ‘gml’, ‘text’,
‘blacklist’, ‘html’, ‘gxml’, ‘sql’, ‘csv’, ‘dot’.
Default output type is text. The ENCODING specifies
the output encoding, the default is that of your
locale. Valid encodings are listed at
http://docs.python.org/lib/standard-encodings.html.
-F TYPE[/ENCODING][/FILENAME], --file-output=TYPE[/ENCODING][/FILENAME]
Output to a file linkchecker-out.TYPE,
$HOME/.linkchecker/blacklist for ‘blacklist’ output,
or FILENAME if specified. The ENCODING specifies the
output encoding, the default is that of your locale.
Valid encodings are listed at
http://docs.python.org/lib/standard-encodings.html.
The FILENAME and ENCODING parts are ignored for the
‘none’ output type. For all other types, an existing
file is overwritten. Valid file output types are ‘xml’,
‘none’, ‘gml’, ‘text’, ‘blacklist’, ‘html’, ‘gxml’,
‘sql’, ‘csv’, ‘dot’. You can specify this option
multiple times to output to more than one file.
Default is no file output. Note that you can suppress
all console output with the option ‘-o none’.
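To make the TYPE[/ENCODING][/FILENAME] syntax concrete, here is a minimal Python sketch of how such an argument could be split and how the default file names described above would be filled in. The function parse_file_output is hypothetical and only mirrors the rules stated in this help text; it is not LinkChecker's own parser.

```python
import os


def parse_file_output(spec):
    """Split TYPE[/ENCODING][/FILENAME] into (type, encoding, filename).

    Hypothetical helper: encoding None stands for "use the locale default".
    """
    # Split at most twice so that FILENAME may itself contain "/".
    parts = spec.split("/", 2)
    out_type = parts[0]
    if out_type == "none":
        # ENCODING and FILENAME are ignored for the 'none' output type.
        return out_type, None, None
    encoding = parts[1] if len(parts) > 1 and parts[1] else None
    if len(parts) > 2 and parts[2]:
        filename = parts[2]
    elif out_type == "blacklist":
        filename = os.path.join(os.path.expanduser("~"), ".linkchecker", "blacklist")
    else:
        filename = "linkchecker-out." + out_type
    return out_type, encoding, filename
```

Under these assumptions, `-F html` would write linkchecker-out.html in the locale encoding, while `-F csv/utf-8/out/result.csv` would write UTF-8 CSV to out/result.csv.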
--no-status Do not print check status messages.
-D STRING, --debug=STRING
Print debugging output for the given logger. Available
loggers are ‘all’, ‘thread’, ‘checking’, ‘gui’,
‘cache’, ‘cmdline’, ‘dns’. Specifying ‘all’ is an
alias for specifying all available loggers. The option
can be given multiple times to debug with more than
one logger.
For accurate results, threading will be disabled
during debug runs.
--trace Print tracing information.
--profile Write profiling data into a file named
linkchecker.prof in the current working directory. See
also --viewprof.
--viewprof Print out previously generated profiling data. See
also --profile.
Checking options:
-r NUMBER, --recursion-level=NUMBER
Recursively check all links up to the given depth. A
negative depth will enable infinite recursion. Default
depth is infinite.
--no-follow-url=REGEX
Check but do not recurse into URLs matching the given
regular expression. This option can be given multiple
times.
--ignore-url=REGEX Only check syntax of URLs matching the given regular
expression. This option can be given multiple times.
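The difference between the two filter options can be sketched in Python as a three-way classification. The function classify, its return labels, the use of re.search, and the rule that an ignore pattern takes precedence over a no-follow pattern are all assumptions made for illustration; the help text does not specify how LinkChecker resolves overlaps.

```python
import re


def classify(url, ignore_patterns=(), no_follow_patterns=()):
    """Return how a URL would be treated under the two filter options.

    Hypothetical helper; precedence of ignore over no-follow is an assumption.
    """
    if any(re.search(p, url) for p in ignore_patterns):
        return "syntax-only"        # matched an --ignore-url pattern: only the URL syntax is checked
    if any(re.search(p, url) for p in no_follow_patterns):
        return "check-no-recurse"   # matched a --no-follow-url pattern: checked, links not followed
    return "check-and-recurse"      # default: checked and recursed into
```

Because both options can be given multiple times, the sketch accepts a sequence of patterns for each.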
-C, --cookies Accept and send HTTP cookies according to RFC 2109.
Only cookies which are sent back to the originating
server are accepted. Sent and accepted cookies are
provided as additional logging information.
--cookiefile=FILENAME
Read a file with initial cookie data. The cookie data
format is explained below.
-a, --anchors Check HTTP anchor references. Default is not to check
anchors. This option enables logging of the warning
‘url-anchor-not-found’.
--no-anchor-caching
This option is deprecated and does nothing. It will be
removed in a future release.
-u STRING, --user=STRING
Try the given username for HTTP and FTP authorization.
For FTP the default username is ‘anonymous’. For HTTP
there is no default username. See also -p.
-p STRING, --password=STRING
Try the given password for HTTP and FTP authorization.
For FTP the default password is ‘anonymous@’. For HTTP
there is no default password. See also -u.
--timeout=NUMBER Set the timeout for connection attempts in seconds.
The default timeout is 60 seconds.
-P NUMBER, --pause=NUMBER
Pause the given number of seconds between two
subsequent connection requests to the same host.
Default is no pause between requests.
-N STRING, --nntp-server=STRING
Specify an NNTP server for ‘news:…’ links. Default
is the environment variable NNTP_SERVER. If no host is
given, only the syntax of the link is checked.
--no-proxy-for=REGEX
This option is deprecated and does nothing. It will be
removed in a future release.
EXAMPLES
The most common use checks the given domain recursively, plus any
single URL pointing outside of the domain:
linkchecker http://www.example.org/
Beware that this checks the whole site, which can have several hundred
thousand URLs. Use the -r option to restrict the recursion depth.
Don’t connect to mailto: hosts, only check their URL syntax. All other
links are checked as usual:
linkchecker --ignore-url=^mailto: http://www.example.org
Checking local HTML files on Unix:
linkchecker ../bla.html subdir/blubber.html
Checking a local HTML file on Windows:
linkchecker c:\temp\test.html
You can skip the "http://" URL part if the domain starts with "www.":
linkchecker www.example.de
You can skip the "ftp://" URL part if the domain starts with "ftp.":
linkchecker -r0 ftp.example.org
OUTPUT TYPES
Note that by default only errors and warnings are logged.
You should use the --verbose option to see valid URLs,
and --complete when outputting a sitemap graph format.
text Standard text output, logging URLs in keyword: argument fashion.
html Log URLs in keyword: argument fashion, formatted as HTML.
Additionally has links to the referenced pages. Invalid URLs have
HTML and CSS syntax check links appended.
csv Log check result in CSV format with one URL per line.
gml Log parent-child relations between linked URLs as a GML sitemap
graph.
dot Log parent-child relations between linked URLs as a DOT sitemap
graph.
gxml Log check result as a GraphXML sitemap graph.
xml Log check result as machine-readable XML.
sql Log check result as SQL script with INSERT commands. An example
script to create the initial SQL table is included as create.sql.
blacklist
Suitable for cron jobs. Logs the check result into a file
~/.linkchecker/blacklist which only contains entries with invalid
URLs and the number of times they have failed.
none Logs nothing. Suitable for debugging or checking the exit code.
REGULAR EXPRESSIONS
Only Python regular expressions are accepted by LinkChecker.
See http://www.amk.ca/python/howto/regex/ for an introduction to
regular expressions.
The only addition is that a leading exclamation mark negates
the regular expression.
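The negation rule can be pictured with a short Python sketch. The helper url_matches is hypothetical, and the use of re.search (match anywhere in the string) is an assumption; only the leading-exclamation-mark behavior comes from the text above.

```python
import re


def url_matches(pattern, url):
    """Apply a pattern to a URL, treating a leading "!" as negation.

    Hypothetical helper for illustration, not LinkChecker's own code.
    """
    negate = pattern.startswith("!")
    if negate:
        pattern = pattern[1:]       # strip the "!" before compiling
    found = re.search(pattern, url) is not None
    return found != negate          # invert the result when negated
```

With this reading, `!^mailto:` selects every URL that does not start with "mailto:". In a shell, such a pattern usually needs single quotes so that "!" is not treated as history expansion.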
COOKIE FILES
A cookie file contains standard RFC 822 header data with the following
possible names:
Scheme (optional)
Sets the scheme the cookies are valid for; default scheme is ‘http’.
Host (required)
Sets the domain the cookies are valid for.
Path (optional)
Gives the path the cookies are valid for; default path is ‘/’.
Set-cookie (optional)
Set cookie name/value. Can be given more than once.
Multiple entries are separated by a blank line.
The example below will send two cookies to all URLs starting with
‘http://example.org/hello/’ and one to all URLs starting
with ‘https://example.com/’:
Host: example.org
Path: /hello
Set-cookie: ID="smee"
Set-cookie: spam="egg"

Scheme: https
Host: example.com
Set-cookie: baggage="elitist"; comment="hologram"
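For readers who want to generate or validate such a file, the following Python sketch parses the format described above using the standard library's email header parser. The function parse_cookie_file and the dictionary layout it returns are assumptions made for this example; LinkChecker's own reader may behave differently.

```python
import re
from email import message_from_string

# The example cookie file from above, with a blank line between the two entries.
COOKIE_FILE = '''\
Host: example.org
Path: /hello
Set-cookie: ID="smee"
Set-cookie: spam="egg"

Scheme: https
Host: example.com
Set-cookie: baggage="elitist"; comment="hologram"
'''


def parse_cookie_file(text):
    """Parse blank-line-separated header blocks into a list of dicts.

    Hypothetical helper; defaults follow the field descriptions above.
    """
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        headers = message_from_string(block)   # header names are case-insensitive
        host = headers["Host"]
        if host is None:
            raise ValueError("cookie entry is missing the required Host header")
        entries.append({
            "scheme": headers.get("Scheme", "http"),    # default scheme
            "host": host,
            "path": headers.get("Path", "/"),           # default path
            "cookies": headers.get_all("Set-cookie", []),
        })
    return entries
```

Calling parse_cookie_file(COOKIE_FILE) yields two entries: one for example.org with two cookies under /hello, and one for example.com with the https scheme and the default path.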