All images are included in split up documents #314
Hi, we face a problem when splitting up a PDF into several "smaller" PDF files.

    var filenameBase = Path.GetFileNameWithoutExtension(inputFile);
    var pageRanges = new List<PageRange>() {
        new PageRange() { LowerBound = 1, UpperBound = 2 },
        new PageRange() { LowerBound = 3, UpperBound = 4 },
        ...
    };
    using var inputDocument = PdfReader.Open(inputFile, PdfDocumentOpenMode.Import);
    foreach (var range in pageRanges.Select((pageRange, index) => new { Pages = pageRange, Index = index }))
    {
        var outputDocument = new PdfDocument();
        outputDocument.Options.UseFlateDecoderForJpegImages = PdfUseFlateDecoderForJpegImages.Automatic;
        outputDocument.Options.FlateEncodeMode = PdfFlateEncodeMode.BestCompression;
        outputDocument.Options.EnableCcittCompressionForBilevelImages = true;
        outputDocument.Options.CompressContentStreams = true;
        outputDocument.Options.NoCompression = false;
        outputDocument.Version = inputDocument.Version;
        var pageInfo = $"Pages {range.Pages.LowerBound}-{range.Pages.UpperBound}";
        outputDocument.Info.Title = $"{pageInfo} of {inputDocument.Info.Title}";
        outputDocument.Info.Creator = inputDocument.Info.Creator;
        // Page ranges are 1-based, the Pages collection is 0-based.
        for (var pageNumber = range.Pages.LowerBound; pageNumber <= range.Pages.UpperBound; pageNumber++)
        {
            outputDocument.AddPage(inputDocument.Pages[pageNumber - 1]);
        }
        var outputPath = Path.Combine(targetDirectory, $"{filenameBase}_part_{range.Index + 1}.pdf");
        outputDocument.Save(outputPath);
    }

This creates several documents that contain only the pages of the given page ranges. However, some of the documents are basically as large as the original document. Is this intentional behavior? Thank you!
Replies: 1 comment
The creators of that PDF document took a simple approach: there is only a single resource catalog that lists all images, and that catalog is used for all pages.
PDFsharp does not analyze the contents of a page; it includes all images listed in the required resources of the page.
Normally, PDF pages should only list the resources they actually use. In that case, PDFsharp will only include the images needed by the imported pages.
Other tools may be smarter and analyze the page contents. PDFsharp does not do that yet, and it is not planned for the near future.
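
To check whether a given input file is affected, it may help to look at how many XObjects each page lists in its resources. The following is only a minimal diagnostic sketch, not part of PDFsharp itself: `ResourceReport`, `CountListedXObjects`, and `Print` are hypothetical names, and the code assumes the page's `/Resources` entry is reachable through `page.Elements`. If every page reports the same large number, the document most likely shares one resource list across all pages.

```csharp
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Hypothetical helper, for illustration only.
static class ResourceReport
{
    // Counts the entries in the page's /Resources -> /XObject dictionary.
    // XObjects cover images as well as form XObjects, so this is an upper
    // bound on the number of images. Returns 0 if nothing is listed.
    public static int CountListedXObjects(PdfPage page)
    {
        var resources = page.Elements.GetDictionary("/Resources");
        var xObjects = resources?.Elements.GetDictionary("/XObject");
        return xObjects?.Elements.Count ?? 0;
    }

    // Prints the number of listed XObjects for every page of the file.
    public static void Print(string inputFile)
    {
        using var document = PdfReader.Open(inputFile, PdfDocumentOpenMode.Import);
        for (var i = 0; i < document.Pages.Count; i++)
        {
            var count = CountListedXObjects(document.Pages[i]);
            Console.WriteLine($"Page {i + 1}: {count} XObject(s) listed");
        }
    }
}
```

Calling `ResourceReport.Print(inputFile)` on the original document before splitting shows what each page declares. It does not tell you which of those resources the page content actually draws.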