aboutsummaryrefslogtreecommitdiff
path: root/doc/pdftohtml.cat
diff options
context:
space:
mode:
authorCalvin Morrison <calvin@pobox.com>2023-04-05 14:13:39 -0400
committerCalvin Morrison <calvin@pobox.com>2023-04-05 14:13:39 -0400
commit835e373b3eeaabcd0621ed6798ab500f37982fae (patch)
treedfa16b0e2e1b4956b38f693220eac4e607802133 /doc/pdftohtml.cat
xpdf-no-select-disableHEADmaster
Diffstat (limited to 'doc/pdftohtml.cat')
-rw-r--r--doc/pdftohtml.cat144
1 files changed, 144 insertions, 0 deletions
diff --git a/doc/pdftohtml.cat b/doc/pdftohtml.cat
new file mode 100644
index 0000000..5ddbfa0
--- /dev/null
+++ b/doc/pdftohtml.cat
@@ -0,0 +1,144 @@
+pdftohtml(1) General Commands Manual pdftohtml(1)
+
+
+
+NAME
+ pdftohtml - Portable Document Format (PDF) to HTML converter (version
+ 4.04)
+
+SYNOPSIS
+ pdftohtml [options] PDF-file HTML-dir
+
+DESCRIPTION
+ Pdftohtml converts Portable Document Format (PDF) files to HTML.
+
+ Pdftohtml reads the PDF file, PDF-file, and places an HTML file for
+ each page, along with auxiliary images in the directory, HTML-dir. The
+ HTML directory will be created; if it already exists, pdftohtml will
+ report an error.
+
+CONFIGURATION FILE
+ Pdftohtml reads a configuration file at startup. It first tries to
+ find the user's private config file, ~/.xpdfrc. If that doesn't exist,
+ it looks for a system-wide config file, typically /etc/xpdfrc (but this
+ location can be changed when pdftohtml is built). See the xpdfrc(5)
+ man page for details.
+
+OPTIONS
+ Many of the following options can be set with configuration file com-
+ mands. These are listed in square brackets with the description of the
+ corresponding command line option.
+
+ -f number
+ Specifies the first page to convert.
+
+ -l number
+ Specifies the last page to convert.
+
+ -z number
+ Specifies the initial zoom level. The default is 1.0, which
+ means 72dpi, i.e., 1 point in the PDF file will be 1 pixel in
+ the HTML. Using '-z 1.5', for example, will make the initial
+ view 50% larger.
+
+ -r number
+ Specifies the resolution, in DPI, for background images. This
+ controls the pixel size of the background image files. The ini-
+ tial zoom level is controlled by the '-z' option. Specifying a
+ larger '-r' value will allow the viewer to zoom in farther with-
+ out upscaling artifacts in the background.
+
+ -vstretch number
+ Specifies a vertical stretch factor. Setting this to a value
+ greater than 1.0 will stretch each page vertically, spreading
+ out the lines. This also stretches the background image to
+ match.
+
+ -embedbackground
+ Embeds the background image as base64-encoded data directly in
+ the HTML file, rather than storing it as a separate file.
+
+ -nofonts
+ Disable extraction of embedded fonts. By default, pdftohtml
+ extracts TrueType and OpenType fonts. Disabling extraction can
+ work around problems with buggy fonts.
+
+ -embedfonts
+ Embeds any extracted fonts as base64-encoded data directly in
+ the HTML file, rather than storing them as separate files.
+
+ -skipinvisible
+ Don't draw invisible text. By default, invisible text (commonly
+ used in OCR'ed PDF files) is drawn as transparent (alpha=0) HTML
+ text. This option tells pdftohtml to discard invisible text
+ entirely.
+
+ -allinvisible
+ Treat all text as invisible. By default, regular (non-invisi-
+ ble) text is not drawn in the background image, and is instead
+ drawn with HTML on top of the image. This option tells pdfto-
+ html to include the regular text in the background image, and
+ then draw it as transparent (alpha=0) HTML text.
+
+ -formfields
+ Convert AcroForm text and checkbox fields to HTML input ele-
+ ments. This also removes text (e.g., underscore characters) and
+ erases background image content (e.g., lines or boxes) in the
+ field areas.
+
+ -table Use table mode when performing the underlying text extraction.
+ This will generally produce better output when the PDF content
+ is a full-page table. NB: This does not generate HTML tables;
+ it just changes the way text is split up.
+
+ -opw password
+ Specify the owner password for the PDF file. Providing this
+ will bypass all security restrictions.
+
+ -upw password
+ Specify the user password for the PDF file.
+
+ -verbose
+ Print a status message (to stdout) before processing each page.
+ [config file: printStatusInfo]
+
+ -q Don't print any messages or errors. [config file: errQuiet]
+
+ -cfg config-file
+ Read config-file in place of ~/.xpdfrc or the system-wide config
+ file.
+
+ -v Print copyright and version information.
+
+ -h Print usage information. (-help and --help are equivalent.)
+
+BUGS
+ Some PDF files contain fonts whose encodings have been mangled beyond
+ recognition. There is no way (short of OCR) to extract text from these
+ files.
+
+EXIT CODES
+ The Xpdf tools use the following exit codes:
+
+ 0 No error.
+
+ 1 Error opening a PDF file.
+
+ 2 Error opening an output file.
+
+ 3 Error related to PDF permissions.
+
+ 99 Other error.
+
+AUTHOR
+ The pdftohtml software and documentation are copyright 1996-2022 Glyph
+ & Cog, LLC.
+
+SEE ALSO
+ xpdf(1), pdftops(1), pdftotext(1), pdfinfo(1), pdffonts(1), pdfde-
+ tach(1), pdftoppm(1), pdftopng(1), pdfimages(1), xpdfrc(5)
+ http://www.xpdfreader.com/
+
+
+
+ 18 Apr 2022 pdftohtml(1)