Reconstructing PDF -> HTML -> PDF

anubha16 · February 12, 2021, 11:09pm

Hi,

I have a PDF file that I need to convert to HTML so that I can translate its content. The conversion produces an HTML file and a number of JPEG files. I modify the HTML, replace the original HTML file with the modified one, and then try to rebuild the PDF. However, the new PDF renders the HTML content and the JPEG images on separate pages.
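To make the "modify the HTML" step concrete, here is a minimal sketch of rewriting only the text between tags while copying every tag (including each img tag and its src attribute) through unchanged. It uses only the Java standard library and no Aspose calls. The class name HtmlTextRewriter, the method rewriteText, and the translate callback are hypothetical names for illustration, and the regex is a simplification that does not treat script or style content specially.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlTextRewriter {
    // Alternatives: a complete tag, a run of text between tags (group 1), or a stray '<'.
    private static final Pattern TOKEN = Pattern.compile("<[^>]*>|([^<]+)|<");

    /** Applies translate to non-blank text runs; tags and stray '<' are copied as-is. */
    public static String rewriteText(String html, UnaryOperator<String> translate) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            String text = m.group(1);
            if (text != null && !text.isBlank()) {
                out.append(translate.apply(text)); // text node: hand to the translator
            } else {
                out.append(m.group());             // tag, whitespace, or stray '<': keep
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p class=\"a\">Hello</p><img src=\"x.jpeg\">";
        // Upper-casing stands in for a real translation call.
        System.out.println(rewriteText(html, String::toUpperCase));
    }
}
```

Because the tags are never touched, the image references in the rewritten HTML stay identical to those in the exported file, which removes one variable when comparing the original and rebuilt PDFs.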

atir.tahir · February 13, 2021, 2:06pm

@anubha16

Are you using this app for PDF to HTML conversion, or our stand-alone API? Could you please share more details on this scenario, along with the problematic files?

anubha16 · February 17, 2021, 5:45pm

I am using the Java APIs. Attached are the pdf, docx (SDAsposePDFWord.zip) and html (SDAsposeHTML.zip) files.

SDAsposeHTML.zip (90.6 KB)
SDAsposePDFWord.zip (298.6 KB)

First I convert PDF to Word
Then convert Word to HTML

public static void convertPDFToWord() {
    try {
        // Load source PDF file (Aspose.PDF)
        com.aspose.pdf.Document doc = new com.aspose.pdf.Document("SD_Aspose.pdf");
        // Save as DOCX; the enum is qualified to avoid a clash with com.aspose.words.SaveFormat
        doc.save("SD_Aspose.docx", com.aspose.pdf.SaveFormat.DocX);
    } catch (Exception ex) {
        System.out.println(ex);
    }
}

public static void convertWordHTML() {
    try {
        // Load the DOCX produced above (Aspose.Words)
        com.aspose.words.Document doc = new com.aspose.words.Document("SD_Aspose.docx");
        String dataDir = "SDAspose/";
        String outHtmlFile = "SD_Aspose.html";
        // Save the output file
        doc.save(dataDir + outHtmlFile, com.aspose.words.SaveFormat.HTML);
    } catch (Exception ex) {
        System.out.println(ex);
    }
}
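Since the HTML export writes the images as separate files next to the HTML, one thing worth ruling out before converting back is whether every img reference in the edited HTML still points to a file that exists. The following is only a diagnostic sketch using the Java standard library, not an Aspose API and not a confirmed cause of the layout problem. The class ImageRefCheck and its methods are hypothetical names; the default path simply reuses the file names from the code above, and the regex is a simplification rather than a full HTML parser.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImageRefCheck {
    // Captures the quoted src value of an <img> tag (group 2); case-insensitive.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    /** Returns every src value found in img tags, in document order. */
    public static List<String> imageSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(2));
        }
        return sources;
    }

    /** Returns the local image references that do not exist under baseDir. */
    public static List<String> missingImages(String html, Path baseDir) {
        List<String> missing = new ArrayList<>();
        for (String src : imageSources(html)) {
            // Embedded (data:) and remote (scheme://) images are not checked.
            if (src.startsWith("data:") || src.contains("://")) {
                continue;
            }
            if (!Files.exists(baseDir.resolve(src))) {
                missing.add(src);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // Assumed default location, taken from the snippet above.
        Path htmlFile = Path.of(args.length > 0 ? args[0] : "SDAspose/SD_Aspose.html");
        String html = Files.readString(htmlFile);
        Path baseDir = htmlFile.toAbsolutePath().getParent();
        System.out.println("Missing images: " + missingImages(html, baseDir));
    }
}
```

An empty list means all local image references resolve relative to the HTML file. Note that percent-encoded file names in src values are compared literally here, so they would be reported as missing.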

anubha16 · February 17, 2021, 6:13pm

I have to convert to docx first because I need to translate the text and I am converting to html because I need to display it in a browser.

atir.tahir · February 17, 2021, 6:34pm

This topic has been moved to the related forum: Reconstructing PDF -> HTML -> PDF - Free Support Forum - aspose.com
