- 03-20-2010 #1 Just Joined!
- Join Date: Oct 2009
- Posts: 11
pdftohtml - local characters
Hi, I wanted to convert a few .pdf files to .epub. I use ebook-convert (from calibre), which first converts the PDF to HTML with pdftohtml. However, pdftohtml doesn't work as expected: some of the local (language-specific) characters are there, but others are completely gone, with just a blank space in their place. I googled and found that I should run pdftohtml manually with the parameter "-enc Latin2". Then pdftohtml says:

Code:
Error: Couldn't find unicodeMap file for the 'Latin2' encoding

I googled more, but found nothing useful, only that I should install a package {They do not allow me to post links, so google for "xpdf latin2" and take the first link}. I did so, but it keeps throwing the same error. Some relevant files follow:

/etc/xpdf/xpdfrc-latin2:
Code:
#----- begin Latin2 support package (2002-oct-22)
unicodeMap Latin2 /usr/share/xpdf/latin2/Latin2.unicodeMap
#----- end Latin2 support package

/etc/xpdf/xpdfrc:
Code:
#========================================================================
#
# System-wide xpdfrc file
#
# The Xpdf tools look for a config file in two places:
# 1. ~/.xpdfrc
# 2. /etc/xpdf/xpdfrc
#
# Note that if ~/.xpdfrc exists, Xpdf will NOT read the system
# configuration file /etc/xpdf/xpdfrc. You may wish to include it
# from your ~/.xpdfrc using:
# include /etc/xpdf/xpdfrc
# and then add additional settings.
#
# For complete details on config file syntax and available options,
# please see the xpdfrc(5) man page.
#
#
#
#========================================================================

#----- display fonts

# These map the Base-14 fonts to the Type 1 fonts that ship with
# ghostscript (gsfonts package).

displayFontT1 Times-Roman /usr/share/fonts/type1/gsfonts/n021003l.pfb
displayFontT1 Times-Italic /usr/share/fonts/type1/gsfonts/n021023l.pfb
displayFontT1 Times-Bold /usr/share/fonts/type1/gsfonts/n021004l.pfb
displayFontT1 Times-BoldItalic /usr/share/fonts/type1/gsfonts/n021024l.pfb
displayFontT1 Helvetica /usr/share/fonts/type1/gsfonts/n019003l.pfb
displayFontT1 Helvetica-Oblique /usr/share/fonts/type1/gsfonts/n019023l.pfb
displayFontT1 Helvetica-Bold /usr/share/fonts/type1/gsfonts/n019004l.pfb
displayFontT1 Helvetica-BoldOblique /usr/share/fonts/type1/gsfonts/n019024l.pfb
displayFontT1 Courier /usr/share/fonts/type1/gsfonts/n022003l.pfb
displayFontT1 Courier-Oblique /usr/share/fonts/type1/gsfonts/n022023l.pfb
displayFontT1 Courier-Bold /usr/share/fonts/type1/gsfonts/n022004l.pfb
displayFontT1 Courier-BoldOblique /usr/share/fonts/type1/gsfonts/n022024l.pfb
displayFontT1 Symbol /usr/share/fonts/type1/gsfonts/s050000l.pfb
displayFontT1 ZapfDingbats /usr/share/fonts/type1/gsfonts/d050000l.pfb

# If you need to display PDF files that refer to non-embedded fonts,
# you should add one or more fontDir options to point to the
# directories containing the font files. Xpdf will only look at .pfa,
# .pfb, and .ttf files in those directories (other files will simply
# be ignored).

#fontDir /usr/local/fonts/bakoma

#----- PostScript output control

# Set the default PostScript file or command.

psFile "|lpr"

# Set the default PostScript paper size -- this can be letter, legal,
# A4, or A3. You can also specify a paper size as width and height
# (in points). Xpdf uses the paper size in /etc/papersize by default.

#psPaperSize letter

#----- text output control

# Choose a text encoding for copy-and-paste and for pdftotext output.
# The Latin1, ASCII7, and UTF-8 encodings are built into Xpdf. Other
# encodings are available in the language support packages.

#textEncoding UTF-8

# Choose the end-of-line convention for multi-line copy-and-past and
# for pdftotext output. The available options are unix, mac, and dos.

#textEOL unix

#----- misc settings

# Enable Type 1 font rasterizing with t1lib. Default "yes".

#enableT1lib no

# Enable TrueType and Type 1 font rasterizing with FreeType. Default "yes".

#enableFreeType no

# Enable anti-aliasing of fonts. Default "yes".

#antialias no

# Set the command used to run a web browser when a URL hyperlink is
# clicked.

urlCommand "sensible-browser '%s'"

# Include the language configuration file list generated by update-xpdfrc
include /etc/xpdf/includes

/etc/xpdf/includes:
Code:
# This file was automatically generated by /usr/sbin/update-xpdfrc.
# Instead, add or remove files in /etc/xpdf/ then run
# /usr/sbin/update-xpdfrc to regenerate this file.
include /etc/xpdf/xpdfrc-arabic
include /etc/xpdf/xpdfrc-turkish
include /etc/xpdf/xpdfrc-cyrillic
include /etc/xpdf/xpdfrc-greek
include /etc/xpdf/xpdfrc-hebrew
include /etc/xpdf/xpdfrc-latin2
include /etc/xpdf/xpdfrc-thai

Code:
0ebb395634bd1e23113dcf69f19930f7 /usr/share/xpdf/latin2/Latin2.unicodeMap

edit: The README said that the contents of "add-to-xpdfrc" should be added to the system-wide configuration file, which here is /etc/xpdf/xpdfrc (not the path given in the README). Even after adding them there it does not work; I still get the same error.
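For anyone who wants to repeat the check on the files quoted above, here is a minimal shell sketch. It reads one xpdfrc-style file, picks out every `unicodeMap <name> <path>` directive (the format seen in /etc/xpdf/xpdfrc-latin2), and reports whether the map file is readable. The function name `check_unicode_maps` is made up for illustration and is not part of xpdf. The sketch assumes paths without spaces or quotes, and it does not follow `include` lines, so each included file has to be checked separately. It also cannot tell you whether the pdftohtml binary that calibre calls actually reads this configuration at all.

```shell
#!/bin/sh
# Sketch only: check_unicode_maps is a hypothetical helper, not an xpdf tool.
# Usage: check_unicode_maps <xpdfrc-style file>
# Prints one line per unicodeMap directive and returns 1 if any map file
# is missing or unreadable, 0 otherwise.
check_unicode_maps() {
    rc_file=$1
    status=0
    # Assumed directive format: unicodeMap <encoding-name> <path>
    # The "|| [ -n ... ]" part keeps a final line without a trailing newline.
    while read -r keyword name path _rest || [ -n "$keyword" ]; do
        # Skip comments, blank lines, and every other directive.
        [ "$keyword" = "unicodeMap" ] || continue
        if [ -r "$path" ]; then
            echo "ok      $name -> $path"
        else
            echo "MISSING $name -> $path"
            status=1
        fi
    done < "$rc_file"
    return $status
}
```

With the function loaded in a shell, `check_unicode_maps /etc/xpdf/xpdfrc-latin2` would be the call for the fragment above. If it reports `ok` and the error persists, the map file itself is probably not the problem, and the next thing to look at is which configuration file the program really loads.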


