"Multiline pattern matching" with sed

I love sed. And sometimes awk, too. Both share one problem, though: there is no simple(!) way to search and replace multi-line patterns. At least that is what I thought until just now. But you never stop learning: "hold space" is the magic word…

But we can optimize this. Besides pattern space, sed also provides the so-called hold space. This is just a temporary buffer that sed never touches on its own. We can use it to collect the whole input first and then replace the pattern space with the contents of hold space. That looks like this:

sed -n '1h; 1!H; ${ g; s/foo\nbar/bla\nblub/; p; }'

That first copies the first line from pattern space into hold space (1h), replacing whatever hold space currently contains. Then every line except line 1 is appended to hold space (1!H). The reason we cannot simply use 1,$H is that H always inserts a newline before the appended text, so appending the first line to the still-empty hold space would leave a blank line at the beginning. As soon as the last line of input is reached (address $), a command block is opened which copies the contents of hold space into pattern space (g) and does the replacement. Without further measures, sed would also print the pattern space automatically at the end of every cycle, so the output would contain the input lines in addition to the edited text. To avoid this, the option -n (no automatic output) is set and the edited final text is printed explicitly with the p command from within the block.

This method works remarkably well, but you should note that it can be much slower if the stream/file is very long. One advantage of sed over many other tools is that it reads line by line, so it doesn't need more memory when working on long input. That advantage is lost with this method, because the whole input is held in memory at once. Keep that in mind.
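To see the command and the blank-line pitfall described above in action, here is a small sketch. The function names replace_multiline and slurp_with_H_only and the sample input are made up for illustration, and the first function assumes GNU sed, since \n in the replacement part of s is not portable to every sed implementation.

```shell
# Slurp all of stdin into hold space, then substitute across the line break.
# Assumption: GNU sed ("\n" in the replacement is not portable).
replace_multiline() {
  sed -n '1h; 1!H; ${ g; s/foo\nbar/bla\nblub/; p; }'
}

# Variant that uses H on every line, including the first one.
# H puts a newline in front of the appended text, so the result
# starts with an empty line.
slurp_with_H_only() {
  sed -n 'H; ${ g; p; }'
}

# "foo" and "bar" sit on consecutive lines of the sample input.
printf 'start\nfoo\nbar\nend\n' | replace_multiline
# prints:
# start
# bla
# blub
# end

printf 'a\nb\n' | slurp_with_H_only
# prints an empty line, then a, then b
```

Note that s without the g flag only replaces the first match in the collected text; s/foo\nbar/bla\nblub/g would replace every occurrence.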

Thanks to Janek Bevendorff, who really made my IT life easier with this post 😉