Text on some PDFs is not highlightable
Printed From: www.exp-systems.com
Category: PDF reDirect
Forum Name: Using PDF reDirect
Forum Description: Questions and Comments on using PDF reDirect Freeware and Pro
URL: http://www.exp-systems.com/Forum_exp/forum_posts.asp?TID=1177
Printed Date: 21 Nov 24 at 7:51PM
Topic: Text on some PDFs is not highlightable
Posted By: sludge7051-x
Subject: Text on some PDFs is not highlightable
Date Posted: 17 Mar 14 at 1:30PM
I'm on Win 8, using Firefox 27.0.1, the latest version.
I have been using PDF reDirect v2.5.2 to print web pages to PDF.
Afterwards, I like to have Natural Reader read the PDF. This is a text-to-speech program . . . http://www.naturalreaders.com/download.php
The problem is that the text is not always usable when I print through Firefox (it always is when I print through IE 10).
When it is not usable, it's one of two things:
1.) Dragging with the mouse, you can see the selection rectangle trying to lasso the text, but it fails and highlights nothing.
2.) You can highlight the text, but there is weird spacing between the letters and words, and when Natural Reader tries to read it, the speech is garbled.
I get a good PDF with Firefox on some pages, like this page, the Lincoln Trillion bill . . . http://store.livingwaters.com/index.php?page=shop.product_details&flypage=flypage.tpl&product_id=477&category_id=8&option=com_virtuemart&Itemid=199&lang=en
But not on this page, How to Create a Macro in Excel 2010 . . . http://www.ehow.com/how_8037720_create-macro-excel-2010.html
I tried this Add-on: Print pages to Pdf 0.1.9.3 . . . and it was not able to make a good PDF of the Create a Macro page, either.
I went into Firefox's about:config . . . and set javascript.enabled to false . . . but that didn't do anything. (Disabling JavaScript is what you normally do on a page like Snopes, where Natural Reader cannot read the text on the web page because JavaScript protects it from scraping.)
I think there's something up with the web page coding, but I don't know what, or why Firefox can't deal with it.
If I print the Create a Macro page through IE 10, though, it makes a good PDF. IE always makes a good PDF. I had always used IE to make PDFs before, because . . .
I tried using Firefox to make PDFs a year or so ago, with the above results (and never got a good PDF). I thought something might have changed by now, so I tried again. It looked like it was fixed, but then I got a bad PDF of the Create a Macro page. Any idea what's going on? TY
|
Replies:
Posted By: yorkshire_lad
Date Posted: 18 Mar 14 at 4:45AM
I'm sure Michael will be along shortly to comment. However (as an ordinary user), I have experienced similar behaviour with different browsers. IE usually produces pdfs that contain text (that can be manipulated as text) whereas Firefox sometimes seems to print as a graphic. So it's down to the way the browser prints, not PDF reDirect. I've never found a solution, and if I really want to create a pdf as text, I'll use IE (I mostly use FF).
|
Posted By: sludge7051-x
Date Posted: 18 Mar 14 at 7:48AM
Hello,
Yes, Firefox seems to print some pages as a graphic . . . I wonder what it's doing when it prints with weird spacing? eBay statements come out like that. In that case, I notice they use a tiny font, so it may have to do with whether the font uses proportional spacing or not.
It seems like EXP must use something related to printing that is within each browser, and that's why results vary. I wonder what it is.
I have posted this on the Firefox support forum as well. Their suggestion so far is just to use one of the current print-to-PDF add-ons. I tried a couple, with no change:
Print to PDF - Text in the PDF is not always good for Text-to-Speech | Firefox Support Forum . . . https://support.mozilla.org/en-US/questions/989439
|
Posted By: sludge7051-x
Date Posted: 18 Mar 14 at 10:23AM
CleanPrint makes a good PDF of the "Create a Macro" page:
http://www.formatdynamics.com/bookmarklets/
I think it goes to a server to do it, though, so it's not going through Firefox?
|
Posted By: Michel_K17
Date Posted: 18 Mar 14 at 9:53PM
Hi there,
Unfortunately, the cause is not PDF reDirect. All that PDF reDirect does is create a PDF file of the printout as specified by the browser.
If browser X says to print it one way or another, then PDF reDirect will follow the instructions.
Cheers!
Michel
------------- Michel Korwin-Szymanowski
EXP Systems LLC
|
Posted By: sludge7051-x
Date Posted: 19 Mar 14 at 7:48AM
I think I figured out what's going on with EXP, and other print-to-PDF programs, where the text is weird . . . it's JavaScript.
* * * * * * * * * * * * * * * * * * * * * * * *
Using Firefox . . . NoScript is installed / Allow Scripts Globally (dangerous)
* * * * * * * * * * * * * * * * * * * * * * * *
Go to: http://nypost.com/2012/08/18/feds-move-to-strike-lewd-details-from-homeland-security-sexual-discrimination-lawsuit/
The text on this page is highlightable, and text-to-speech can read it. If you make a PDF of it with EXP, though, you can't highlight any of the text. It's like it's an image.
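One rough way to test a PDF for this without opening a viewer is to look for text-drawing operators inside the file. The sketch below is only a heuristic, written under several assumptions: the function names are made up for illustration, it only handles page content that is uncompressed or Flate-compressed, it does not handle encrypted files, and it says nothing about the second problem (text that is present but badly spaced). It treats a PDF as "has text" if any stream contains a BT ... Tj/TJ ... ET sequence, which are the standard PDF operators for showing text.

```python
import re
import zlib

# Raw bytes between "stream" and "endstream" (skipping the "stream" in "endstream").
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A text object: BT, then a show-text operator (Tj or TJ), then ET.
TEXT_RE = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b.*?\bET\b", re.DOTALL)


def decoded_streams(pdf_bytes):
    """Yield each stream body, Flate-decompressed when possible."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates a slightly truncated stream.
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not Flate data (for example an image); inspect it as-is.
            yield raw


def has_text_operators(pdf_bytes):
    """Heuristic: True if any stream appears to draw text."""
    return any(TEXT_RE.search(data) is not None
               for data in decoded_streams(pdf_bytes))
```

For a file on disk, something like has_text_operators(open("page.pdf", "rb").read()) would do, where page.pdf is a placeholder name. A result of False suggests the page was printed as an image, which matches the "can't highlight anything" case.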
* * * * * * * * * * * * * * * * * * * * * * * *
If I then do this: NoScript installed / Forbid Scripts Globally . . . now you get a good PDF
* * * * * * * * * * * * * * * * * * * * * * * *
View Page Info / there are many occurrences of the word "javascript"
. . .
NoScript / Allow Scripts Globally (dangerous) . . . and now you're back to getting a PDF where you can't even lasso the text
* * * * * * * * * * * * * * * * * * * * * * * *
about:config / javascript.enabled . . . double-click to make it false . . . go back to that page, and refresh . . . and now you can get a good PDF
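The same preference can also be set from a user.js file in the Firefox profile folder, which Firefox reads at startup, so the change only takes effect after a restart. Here is a minimal sketch of a script that writes or updates that one line. The function name and the idea of passing in the profile folder are assumptions for illustration; the location of your own profile folder is something you would have to look up (about:support shows it).

```python
import re
from pathlib import Path

# Matches an existing javascript.enabled line in user.js, true or false.
PREF_RE = re.compile(
    r'^[ \t]*user_pref\("javascript\.enabled",[ \t]*(?:true|false)\);[ \t]*$',
    re.MULTILINE,
)


def set_javascript_enabled(profile_dir, enabled):
    """Write or update the javascript.enabled line in <profile_dir>/user.js."""
    user_js = Path(profile_dir) / "user.js"
    line = 'user_pref("javascript.enabled", %s);' % ("true" if enabled else "false")
    text = user_js.read_text(encoding="utf-8") if user_js.exists() else ""
    if PREF_RE.search(text):
        # Replace the existing setting in place.
        text = PREF_RE.sub(line, text)
    else:
        # Append the setting, keeping any other prefs already in the file.
        if text and not text.endswith("\n"):
            text += "\n"
        text += line + "\n"
    user_js.write_text(text, encoding="utf-8")
    return line
```

Calling set_javascript_enabled(profile_folder, False) before printing and set_javascript_enabled(profile_folder, True) afterwards would mirror the manual double-click, with a Firefox restart in between each time.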
* * * * * * * * * * * * * * * * * * * * * * * *
It looks like this is due to javascript protecting the text.
Is there a way for EXP to automatically disable javascript before it prints a PDF? IE must be doing this itself.
* * * * * * * * * * * * * * * * * * * * * * * *
Another example, but a little different . . .
Make sure this is back to enabled . . . about:config / javascript.enabled . . . double-click to make it "true"
On this page: http://www.enviroreporter.com/2014/03/china-syndrome-town/
NoScript installed / Allow Scripts Globally (dangerous) . . . and you can't even highlight any of the text on the web page . . . but the PDF comes out fine
NoScript installed / Forbid Scripts Globally . . . now the text on the web page is highlightable, and can be read by text-to-speech
|
Posted By: yorkshire_lad
Date Posted: 19 Mar 14 at 11:31AM
I like that tip: I may try it when I next need to avoid a graphic print. TYVM for posting the feedback.
|
Posted By: sludge7051-x
Date Posted: 19 Mar 14 at 2:44PM
|