Free Support Forum - aspose.app

Wrong comparison of html files

Hi,
I want to use the Aspose.Words compare feature to compare HTML documents in my product, but the online demo tool reports extra differences that I don't want. Aspose also flags font/casing differences. For example, it shows a difference between “FORM” and “Form”. I want to ignore such differences. Can you please let me know if there is a flag I can set to achieve this?

I am attaching my input documents, their comparison, and a snapshot of the issue.

The tool should not report “FORM” vs “Form” as a difference, as both are the same word.

Wrong Comparision.zip (11.2 KB)

@Muzna_Tariq

We are investigating this scenario in the online application. Your investigation ticket ID is WORDSAPP-93. We’ll notify you in case of any update.

Hi, any update on this?

@Muzna_Tariq

This issue is still under investigation.

@Muzna_Tariq

Please have a look at this screenshot.png (211.7 KB). We have added a new option, “Ignore case sensitive”, to the Comparison app. Let us know if the issue persists.
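
For readers who want to see what a case-insensitive comparison does, here is a minimal sketch using only Python's standard library. It is not the Aspose.Words API or the Comparison app's implementation; `diff_words` and its `ignore_case` parameter are hypothetical names made up for this illustration. The idea is to match words on a case-folded key while still reporting the original words.

```python
import difflib

def diff_words(old_text, new_text, ignore_case=False):
    """Return (tag, old_words, new_words) for every span that differs.

    Hypothetical helper for illustration only; not part of any Aspose API.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    if ignore_case:
        # Match on case-folded keys so "FORM" and "Form" count as equal.
        old_keys = [w.casefold() for w in old_words]
        new_keys = [w.casefold() for w in new_words]
    else:
        old_keys, new_keys = old_words, new_words

    matcher = difflib.SequenceMatcher(None, old_keys, new_keys, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Report the original words, not the case-folded keys.
            changes.append((tag, old_words[i1:i2], new_words[j1:j2]))
    return changes

print(diff_words("FORM 10-K", "Form 10-K"))
print(diff_words("FORM 10-K", "Form 10-K", ignore_case=True))
```

With `ignore_case=False` the first word is reported as a replacement; with `ignore_case=True` no changes are reported for this input.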

Thanks a lot. This worked!

However, I’m facing another issue: there is no difference in the text, but Aspose shows a difference.
The documents and a snapshot of the issue are attached. No Difference.zip (592.1 KB)

@Muzna_Tariq

We’re further investigating this scenario. You’ll be notified in case of any update.

Any update on this?

@Muzna_Tariq

We are still working on it.

@Muzna_Tariq

In 10-K.html, the text “For the transition from to” is inside a table cell. In 10-K (FY 2019).html, the same text is in a paragraph between the tables. Text located in different structural parts of a document is never matched as identical, even when the wording is the same. So this is the expected behavior.
These HTML files have different structures: one has the text in a table and the other has it in a paragraph outside the table. We compare not only the text but the structure too.
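
To illustrate the distinction between structure and text, here is a rough sketch of a text-only comparison using Python's standard `html.parser`. It is not how Aspose.Words compares documents and not a supported workaround; `TextExtractor` and `visible_words` are hypothetical names, and the two HTML snippets are simplified stand-ins for the attached files. It throws away all markup and compares only the words a reader would see.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> content.

    Hypothetical helper for illustration only.
    """
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

def visible_words(html):
    """Return the document's visible words with all structure discarded."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Assumption: joining chunks with a space keeps text from adjacent
    # cells apart, at the cost of splitting a word that spans inline tags.
    return " ".join(parser.chunks).split()

# Simplified stand-ins for the two attached files (assumed markup).
in_table = "<table><tr><td>For the transition from to</td></tr></table>"
in_paragraph = "<p>For the transition from to</p>"

print(visible_words(in_table) == visible_words(in_paragraph))
```

The two word lists are equal here because the table and paragraph markup is dropped before comparing. The trade-off is that real structural changes, such as a row moved to another table, are no longer detected either.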

But for a user reading the document, this is not a change. They would never know why we have highlighted or marked this as a difference. Is there any flag to ignore HTML structure differences?

@Muzna_Tariq

We’ll further investigate it.

Any update on this?

@Muzna_Tariq

This scenario is still under investigation. You’ll be notified in case of any progress update.

We tried to find a possible workaround. Unfortunately, there is no solution without changing the HTML structure. Even Microsoft Word works in the same way.

Can’t the HTML structure be ignored so that only the text inside is compared?

@Muzna_Tariq

I’m afraid this is not possible.

We are unable to use your product for this reason. Let us know if this is something you can plan for the future.

@Muzna_Tariq

We’ll look into the possibility and let you know. However, we cannot promise anything at the moment.