Using diff when the same text is on different lines

I recently needed to diff two files to determine what text was different, but I did not want text to be marked as different merely because it appeared on different lines. For example, given the two files below, I wanted diff to return nothing:

$ cat file1
Some text
Blah
$ cat file2
Blah
Some text

To my surprise, when I ran diff it returned:

$ diff file1 file2
1d0
< Some text
2a2
> Some text

Looking at the diff man page, the only relevant option I could find was --ignore-all-space, but this flag had no impact on the result of the command. The option did not help because it only ignores whitespace differences within corresponding lines of the two files; it does not match up lines that appear in a different order. For example:

$ cat file3
Some text
$ cat file4
S o me tex t
$ diff file3 file4
1c1
< Some text
---
> S o me tex t
$ diff --ignore-all-space file3 file4
$

Now, you may be wondering why I wanted to diff two files with the same text on different lines. I was writing a configuration synchronization script, and I learned that the configuration output differs depending on the order in which commands were executed on the device. In my case, the two devices I was comparing had initially been configured differently, so the lines did not always match up.
So how do you compare two files where the text is the same, but the text appears on different lines?

It turns out the solution is easier than expected: run your files through sort before passing them to diff. Remember, though, that diff expects file names as its arguments, not the contents of the files. This means the following command will not work:

$ diff `sort file1` `sort file2`
diff: extra operand `blah'
diff: Try `diff --help' for more information.
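To see why the backtick form fails, it can help to count the arguments the shell actually hands to a command. The sketch below is only an illustration: count_args is a hypothetical helper, and the file is recreated in a temporary directory with the two-line contents used earlier.

```shell
# Hypothetical helper: print how many arguments it was given.
count_args() { echo "$#"; }

# Recreate the example file in a scratch directory (assumed contents).
workdir=$(mktemp -d)
printf 'Some text\nBlah\n' > "$workdir/file1"

# Unquoted backticks: the sorted output is split on whitespace,
# so every word becomes a separate argument.
count_args `sort "$workdir/file1"`   # prints 3
```

With two such substitutions, diff receives the words of both files as operands rather than two file names, which is why it complains about an extra operand.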

Instead, the diff command should use process substitution, which shells such as bash support:

$ diff <(sort file1) <(sort file2)
$
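The <(...) syntax is not available in every shell. As a rough sketch of the same idea for a plain sh script, the sorted output can be written to temporary files first. The function name diff_sorted is my own invention, and the sketch assumes mktemp is installed.

```shell
# Hypothetical helper: diff two files while ignoring line order.
# Sorts each file into a temporary file, then diffs the sorted copies.
diff_sorted() {
    tmp1=$(mktemp) || return 2
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 2; }
    sort "$1" > "$tmp1"
    sort "$2" > "$tmp2"
    # Keep diff's exit status (0 = same, 1 = different) for the caller.
    status=0
    diff "$tmp1" "$tmp2" || status=$?
    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

Calling diff_sorted file1 file2 should then behave like the command above. Either way, keep in mind that any line numbers diff reports refer to the sorted copies, not to the original files.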

© 2011, Steve Flanders. All rights reserved.
