2021年3月6日星期六

VBA: Copy all text from Word to Single Cell in Excel preserving formatting

Goal

Using VBA copy all text from MS Word file to a single cell in MS Excel, while preserving the text formatting.

Caveats:

  • Text may contain return carriage / linefeed characters (see Section below on Return Carriage /Linefeed)
  • Character formatting must be preserved as derived from the MS Word Compare function including: red strikethrough for deleted text, green underline for added text.
  • Documents do not contain: Section breaks, headers/footers, tables (inline & floating), footnotes, endnotes, indents, tabs, etc.
  • Page layouts and paragraph formatting does not need to be preserved.
  • I know there are other solutions outside of VBA for text comparison, but the end users are desperate to have this in Excel. Save those for another question. Answers outside of VBA will be down voted.

Carriage Return / Linefeed: MS Products don't play nice

In short, Word uses CHAR(13), or ^p if searching via Find within Word. Whereas Excel uses CHAR(10) for CrLf within a string. If text is copied from Word and pasted to Excel, and Excel sees CHAR(13) in the pasted text it is interpreted as "at this point move to the next cell and then continue pasting text." When run in a loop this starts to cause issues.

First Iteration of the Loop

The end result is that only the first paragraph, i.e. anything before the first CHAR(13), of the first text comments ends up in the intended cell. Each subsequent paragraph ends up in the same column but down one row.

rec_num Comments
1 R1.Para1
R1.Para2
...
R1.ParaN

After all iterations of the Loop

On the next iteration of the loop, the first paragraph is written in the intended cell (same column but one row down from cell for the previous iteration) but overwrites the second paragraph from the previous iteration of the loop. This goes on for all comment comparisons in the loop. Hence, only the first paragraph is ever captured to the intended cell.

rec_num Comments
1 Rec1.Para1
2 Rec2.Para1
... ...
N-1 RecN-1.Para1
N RecN.Para1
RecN-1.Para3
...
RecN-1.ParaK

Half Fix

You can use the Replace() function in VBA, or = SUBSTITUTE() in an excel formula to resolve this, but then text formatting is lost (see below for VBA Code examples)

rec_num Comments
1 Rec1.Para1
Rec1.Para2
...
Rec1.ParaK
... ...
N RecN.Para1
RecN.Para2
...
RecN.ParaK

For further details see Preserve line breaks when inserting/pasting into a single Excel cell

Attempted Solutions

The following attempted solution preserves text formatting, but generates the CrLf issue:

' Lots of previous code. most important part is below    ' Run the Compare function in MS Word between doc1 and doc2 to get marked-up text in Word  objDoc.Activate  objDoc.Compare  Name:=objDoc2.Name, CompareTarget:=wdCompareTargetCurrent  objDoc.Save    objDoc2.Close    ' Copy compare text from doc1  objWord.Selection.WholeStory  objWord.Selection.Copy  ' Pate formatted text from doc1 to a cell in Excel  ActiveSheet.Paste Destination:=ws.Range("F" & row_num)    objDoc.Close        

This attempted solution solves the CrLf issue, but does not preserve text formatting:

' Lots of previous code. most important part is below    ' Run the Compare function in MS Word between doc1 and doc2 to get marked-up text in Word  objDoc.Activate  objDoc.Compare  Name:=objDoc2.Name, CompareTarget:=wdCompareTargetCurrent  objDoc.Save    objDoc2.Close    ' Copy compare text from doc1 and paste the text into a singe cell in Excel  If objWord.ActiveDocument.Paragraphs.Count >= 1 Then      objWord.Selection.WholeStory      ' Yes I did try objWord.Selection.FormattedText without success :(      str1 = Replace(objWord.Selection.Text, Chr(13), Chr(10))      ws.Range(col & row_num).Value = str1  End If  

Partial Answers from StackOverflow

https://stackoverflow.com/questions/66498369/vba-copy-all-text-from-word-to-single-cell-in-excel-preserving-formatting March 06, 2021 at 03:16AM

没有评论:

发表评论