Sunday, March 7, 2021

I would like to replace every occurrence of a string that appears between > and <, that is, in the text between tags. For example, center in the excerpt >is the sun the center of the universe?:< should be replaced by foo, but center in the excerpt <...center;"> should not be replaced.

I am using the following command:

perl -pi -w -e 's/center/foo/g;' file.html

So I tried to adapt the approach from "replace all "foo" between two HTML tags using REGEX (PHP code)", ending up with this:

perl -pi -w -e 's/(?<![\w$<])\$\(center\)(?![\w$>])/foo/g;' file.html

but it doesn't do what I want. I searched the web, and the closest matches to what I need are "Perl string replace: Match, but not replace, part of regex", "Perl Regex - Search and replace between tags only if string is in-between" and "Replace text in string with exceptions". But I still can't work out how to replace only the occurrences of center that are not inside a tag, that is, not between < and >.

fragment_html_code:

</td></tr><tr><th colspan="2" class="" style="text-align:center;">is the sun the center of the universe?:</th></tr><tr class=""><td colspan="2" class="" style="text-align:center;">  center </td></tr>  

EDIT UPDATE:

About Lordadmira's solution:

The code fails whenever there is a line break between an opening tag <> and its closing tag </>. For example, it fails when the word to be replaced comes after a line break, as in (line break here) center </>. What could be happening? See below for an example of the context:

</td></tr><tr><th colspan="2" class="" style="text-align:center;">     (there is a line break here; this is where Lordadmira's solution fails and no replacement occurs) ----> is the sun the center of the universe?:      </th></tr><tr class=""><td colspan="2" class="" style="text-align:center;">          center </td></tr>

EDIT UPDATE 01:

I modified Lordadmira's initial solution to perl -0777 -pi -w -e 's{>\K[^<]*?\K.foo[^<]*(?=<).}{ bar }g;' file.html or perl -0777 -pi -w -e 's{>\K[^<]*?\K.foo.[^<]*(?=<).}{ bar }g;' file.html. This works across line breaks, but it erases everything that comes after foo. I tried several ways to keep the text after foo from being erased, but I have not found a solution. If I manage to resolve this, the question will be fully answered.

EDIT UPDATE 02:

I have now changed my modification of Lordadmira's solution from EDIT UPDATE 01 to perl -0777 -pi -w -e 's{>\K[^<]*?\K.foo.[^<](?!=<)}{ bar }g;' so that the text after foo is no longer deleted. But this now erases the first character of the string after foo, which I still need to resolve. For example, in

> "lorem    foo ipsum "<   

when foo is replaced the result is not as expected: I get >" lorem bar psum "<, that is, the "i" of ipsum is deleted.


The solution below fixes the problem of a character after foo being deleted with each replacement. For now, across the cases I have tried, this has been the most functional adaptation of Lordadmira's initial solution.

perl -0777 -pi -w -e 's{>\K[^<]*?\K.foo[^<](?!=<)}{ bar }g;'

https://stackoverflow.com/questions/66469833/perl-regex-replace-string-only-if-it-is-not-between-and March 04, 2021 at 02:39PM
