From 835e373b3eeaabcd0621ed6798ab500f37982fae Mon Sep 17 00:00:00 2001 From: Calvin Morrison Date: Wed, 5 Apr 2023 14:13:39 -0400 Subject: xpdf-no-select-disable --- doc/pdftohtml.cat | 144 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 144 insertions(+) create mode 100644 doc/pdftohtml.cat (limited to 'doc/pdftohtml.cat') diff --git a/doc/pdftohtml.cat b/doc/pdftohtml.cat new file mode 100644 index 0000000..5ddbfa0 --- /dev/null +++ b/doc/pdftohtml.cat @@ -0,0 +1,144 @@ +pdftohtml(1) General Commands Manual pdftohtml(1) + + + +NAME + pdftohtml - Portable Document Format (PDF) to HTML converter (version + 4.04) + +SYNOPSIS + pdftohtml [options] PDF-file HTML-dir + +DESCRIPTION + Pdftohtml converts Portable Document Format (PDF) files to HTML. + + Pdftohtml reads the PDF file, PDF-file, and places an HTML file for + each page, along with auxiliary images in the directory, HTML-dir. The + HTML directory will be created; if it already exists, pdftohtml will + report an error. + +CONFIGURATION FILE + Pdftohtml reads a configuration file at startup. It first tries to + find the user's private config file, ~/.xpdfrc. If that doesn't exist, + it looks for a system-wide config file, typically /etc/xpdfrc (but this + location can be changed when pdftohtml is built). See the xpdfrc(5) + man page for details. + +OPTIONS + Many of the following options can be set with configuration file com- + mands. These are listed in square brackets with the description of the + corresponding command line option. + + -f number + Specifies the first page to convert. + + -l number + Specifies the last page to convert. + + -z number + Specifies the initial zoom level. The default is 1.0, which + means 72dpi, i.e., 1 point in the PDF file will be 1 pixel in + the HTML. Using '-z 1.5', for example, will make the initial + view 50% larger. + + -r number + Specifies the resolution, in DPI, for background images. This + controls the pixel size of the background image files. The ini- + tial zoom level is controlled by the '-z' option. Specifying a + larger '-r' value will allow the viewer to zoom in farther with- + out upscaling artifacts in the background. + + -vstretch number + Specifies a vertical stretch factor. Setting this to a value + greater than 1.0 will stretch each page vertically, spreading + out the lines. This also stretches the background image to + match. + + -embedbackground + Embeds the background image as base64-encoded data directly in + the HTML file, rather than storing it as a separate file. + + -nofonts + Disable extraction of embedded fonts. By default, pdftohtml + extracts TrueType and OpenType fonts. Disabling extraction can + work around problems with buggy fonts. + + -embedfonts + Embeds any extracted fonts as base64-encoded data directly in + the HTML file, rather than storing them as separate files. + + -skipinvisible + Don't draw invisible text. By default, invisible text (commonly + used in OCR'ed PDF files) is drawn as transparent (alpha=0) HTML + text. This option tells pdftohtml to discard invisible text + entirely. + + -allinvisible + Treat all text as invisible. By default, regular (non-invisi- + ble) text is not drawn in the background image, and is instead + drawn with HTML on top of the image. This option tells pdfto- + html to include the regular text in the background image, and + then draw it as transparent (alpha=0) HTML text. + + -formfields + Convert AcroForm text and checkbox fields to HTML input ele- + ments. This also removes text (e.g., underscore characters) and + erases background image content (e.g., lines or boxes) in the + field areas. + + -table Use table mode when performing the underlying text extraction. + This will generally produce better output when the PDF content + is a full-page table. NB: This does not generate HTML tables; + it just changes the way text is split up. + + -opw password + Specify the owner password for the PDF file. Providing this + will bypass all security restrictions. + + -upw password + Specify the user password for the PDF file. + + -verbose + Print a status message (to stdout) before processing each page. + [config file: printStatusInfo] + + -q Don't print any messages or errors. [config file: errQuiet] + + -cfg config-file + Read config-file in place of ~/.xpdfrc or the system-wide config + file. + + -v Print copyright and version information. + + -h Print usage information. (-help and --help are equivalent.) + +BUGS + Some PDF files contain fonts whose encodings have been mangled beyond + recognition. There is no way (short of OCR) to extract text from these + files. + +EXIT CODES + The Xpdf tools use the following exit codes: + + 0 No error. + + 1 Error opening a PDF file. + + 2 Error opening an output file. + + 3 Error related to PDF permissions. + + 99 Other error. + +AUTHOR + The pdftohtml software and documentation are copyright 1996-2022 Glyph + & Cog, LLC. + +SEE ALSO + xpdf(1), pdftops(1), pdftotext(1), pdfinfo(1), pdffonts(1), pdfde- + tach(1), pdftoppm(1), pdftopng(1), pdfimages(1), xpdfrc(5) + http://www.xpdfreader.com/ + + + + 18 Apr 2022 pdftohtml(1) -- cgit v1.2.3