Messing about with PDFs

Started by Cez, Jul 05, 2023, 09:17 AM

Cez

Need help from the group.

I have PDFs of about 16-20 MB each, and I need to crop every page. I extract the pages as JPGs using PDF24, then do the cropping using Fotosizer. When I put the images back together using Acrobat, the resulting PDF is much bigger, say 180 MB. I know there is compression that can be applied, but do you have any advice on how I can get it back down to 20 MB without losing too much quality? Hope you can help.

Peter

The only PDFs I have seen at that size are OCR versions. That, or they are crap quality.

If they are OCR versions, then when you export the pages as JPGs the software has to convert the text information into image format, which always means a larger file.

The only way to recompile it into a similar size file is to use a program to create, you guessed it, an OCR version. That or apply a hideous amount of compression for a substandard file.

kitsunebi

PDFs have their place, particularly when your goal is to display text - they're called "true PDFs" and they AREN'T made up of full-page images. Each page is made up of multiple (often dozens of) highly compressed, puzzle-piece-like fragments of whatever images are on the page, and the text is stored as pure font data (TrueType, for example), all of which gets compiled by the PDF reader to be seen as a single image. This allows the text to be razor sharp at any viewing size, since the reader can just scale the font size, and the filesize will be very small since the non-text images are low-res and compressed. This is the format of choice for pretty much all magazine publishers who digitally release their products.

Comics fans and digital mag collectors are a bit more demanding and want the images to be high res as well, so PDFs are not the format of choice for those markets.  They require high res jpgs of every page, a format that suits CBRs better. 

What's probably happening to you (as others have mentioned) is that you're dealing with true PDFs and are not actually extracting the files within the PDF - if you did (and you can, with other programs), you'd end up with an unreadable mess of fragmented images. What you're actually doing is exporting the pages as images, which creates jpgs that don't exist in the PDF at all. You're essentially exporting an image of what the compiled page looks like in your PDF reader. These created jpgs are typically going to be larger than the original contents of the PDF.
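
To put rough numbers on why full-page jpgs get so big, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration, not something measured from the files in this thread: an 80-page magazine, A4 pages (8.27 x 11.69 in), a 300 dpi export, and roughly 2 bits per pixel for a high-quality jpg. Real jpg sizes vary a lot with content and quality settings.

```python
def rendered_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the size in bytes of one page exported as a full-page image."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel / 8


def estimate_pdf_mb(pages, width_in, height_in, dpi, bits_per_pixel):
    """Estimate the size in MB of a PDF built purely from full-page images."""
    per_page = rendered_page_bytes(width_in, height_in, dpi, bits_per_pixel)
    return pages * per_page / 1_000_000


if __name__ == "__main__":
    # Hypothetical magazine: 80 A4 pages, ~2 bits per pixel after jpg compression.
    for dpi in (300, 200, 150):
        size = estimate_pdf_mb(80, 8.27, 11.69, dpi, 2)
        print(f"{dpi} dpi: about {size:.0f} MB")
```

Under those assumptions the 300 dpi export lands in the same ballpark as the 180 MB mentioned above, and since size scales with the square of the dpi, halving the resolution cuts it to roughly a quarter.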

Since you want to crop the images, I'd say you're screwed regarding filesize bloat.  There's no way to crop the images without turning them into a jpg image of the entire page, at which point, there's no way for you to turn them back into "true PDF" data, at least, none that I'm aware of.
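
One possibility that might sidestep the jpg round trip, offered as a sketch rather than a tested recipe: every PDF page has a crop box, which is the region a reader displays, and narrowing it hides the margins without re-encoding anything. The example below assumes the third-party pypdf library is installed; the file names and the 36-point margins are hypothetical. The hidden content stays in the file, so the size should not shrink, but it should not balloon either.

```python
def shrink_box(box, left=0, bottom=0, right=0, top=0):
    """Return (x0, y0, x1, y1) moved inward by the given margins, in PDF points."""
    x0, y0, x1, y1 = box
    new_box = (x0 + left, y0 + bottom, x1 - right, y1 - top)
    if new_box[0] >= new_box[2] or new_box[1] >= new_box[3]:
        raise ValueError("margins remove the whole page")
    return new_box


def crop_pdf(src, dst, **margins):
    """Write a copy of src whose pages show only the area inside the margins."""
    # Imported here so shrink_box works even without pypdf (assumed installed).
    from pypdf import PdfReader, PdfWriter
    from pypdf.generic import RectangleObject

    reader = PdfReader(src)
    writer = PdfWriter()
    for page in reader.pages:
        mb = page.mediabox
        box = (float(mb.left), float(mb.bottom), float(mb.right), float(mb.top))
        # Only the visible region changes; images and text are left untouched.
        page.cropbox = RectangleObject(list(shrink_box(box, **margins)))
        writer.add_page(page)
    with open(dst, "wb") as fh:
        writer.write(fh)


if __name__ == "__main__":
    # US Letter page (612 x 792 points) with half-inch margins trimmed.
    print(shrink_box((0, 0, 612, 792), left=36, bottom=36, right=36, top=36))
    # crop_pdf("magazine.pdf", "magazine_cropped.pdf", left=36, right=36)
```

The margins are in points (72 per inch), measured from the page's media box, so different page sizes in one file are each trimmed by the same amount.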

slider1983

Quote from: kitsunebi on Jul 06, 2023, 09:04 PM
PDFs have their place, particularly when your goal is to display text - they're called "true PDFs" and they AREN'T made up of full-page images. Each page is made up of multiple (often dozens of) highly compressed, puzzle-piece-like fragments of whatever images are on the page, and the text is stored as pure font data (TrueType, for example), all of which gets compiled by the PDF reader to be seen as a single image. This allows the text to be razor sharp at any viewing size, since the reader can just scale the font size, and the filesize will be very small since the non-text images are low-res and compressed. This is the format of choice for pretty much all magazine publishers who digitally release their products.

I've had a similar issue with New Age Gamer downloads from their website. I wanted to use Scan to PDF to join the supplements back into the magazine, but when I import any of the downloaded PDFs they come up, as you say, as fragmented, unreadable images. Is there anything I can do to get them to read properly?