On our last episode I talked about using EXIFtool to analyze pre-Office 2007 documents for metadata, in order to harvest usernames to brute force against other services. One of the problems with EXIFtool is that it can't analyze metadata in documents created with Office 2007. Why? Microsoft adopted a new standard for storing metadata: XML (in the Office Open XML format) replaced FlashPix in the new format.
So I figured, this will be easy! XML should be real easy to parse out:
$ strings TestingMetadata2007.docx
I was surprised when the output did not turn up any XML. Back to the drawing board. A little more research on Office Open XML revealed that the document format is zipped. Ok, now we can get somewhere.
$ unzip -e TestingMetadata2007.docx
This command unzips the Word document and gives us a whole bunch of XML files in several directories. I examined each one and determined that the metadata useful for finding usernames to brute force lives in core.xml, in the docProps directory inside the Word document. We can extract just that one file into the current directory, dropping its path with the -j option:
$ unzip -e -j TestingMetadata2007.docx docProps/core.xml
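If you would rather poke at the container from a script than from the shell, the same two steps (listing what is inside the ZIP, then pulling out docProps/core.xml) can be sketched in Python with the standard zipfile module. This is only a rough sketch and not part of the original toolchain; the function names list_docx_parts and read_core_xml are made up for illustration.

```python
import zipfile

# Path of the core properties part inside an Office 2007 document.
CORE_PROPS = "docProps/core.xml"


def list_docx_parts(path):
    """Return the names of the files stored inside the document's ZIP container."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a ZIP container")
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def read_core_xml(path):
    """Return the raw bytes of docProps/core.xml without writing anything to disk."""
    with zipfile.ZipFile(path) as archive:
        # Raises KeyError if the document has no core properties part.
        return archive.read(CORE_PROPS)
```

Calling read_core_xml("TestingMetadata2007.docx") should hand back the same XML that the unzip command above drops into the current directory, just without leaving a core.xml file lying around.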
Now that we have our handy little XML file with good metadata, we need to extract some specific XML elements, namely cp:lastModifiedBy and dc:creator. I tried to do this with various Unix text processing tools, but no matter how I sliced it, I failed miserably. So, on Paul's advice, I turned to Perl. I used XML::DOM to write a little Perl script to extract the two XML elements. Introducing 2007XMLextract.pl! Get it here! This script extracts the two XML elements that contain good metadata for determining usernames from the file named on the command line:
$ perl ./2007XMLextract.pl core.xml
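For readers without XML::DOM handy, here is a rough Python equivalent of the extraction step. It is a sketch under assumptions, not a port of 2007XMLextract.pl: it assumes the author lives in the Dublin Core dc:creator element and the last editor in cp:lastModifiedBy, both as direct children of the root of core.xml, and the function name extract_usernames is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in docProps/core.xml.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def extract_usernames(core_xml):
    """Return the creator and last-modified-by values found in core.xml content."""
    root = ET.fromstring(core_xml)
    names = []
    for tag in ("dc:creator", "cp:lastModifiedBy"):
        element = root.find(tag, NAMESPACES)
        # Skip elements that are missing or empty.
        if element is not None and element.text and element.text.strip():
            names.append(element.text.strip())
    return names
```

To try it against the extracted file, something like extract_usernames(open("core.xml", "rb").read()) should do. Matching on the namespace URI rather than the literal prefix is what keeps this from breaking if a document happens to declare different prefixes.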
So, let's put it all together to get us some good usernames, one per line. We just need to add the magic of Unix command processing:
$ unzip -e -j TestingMetadata2007.docx docProps/core.xml && perl ./2007XMLextract.pl core.xml | tr '[:space:]' '\n' | sort | uniq > 2007users.txt
At this point, this method only analyzes one document at a time. I need to modify the process, and the Perl script, to handle multiple files!
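One possible way to handle a pile of documents at once is to skip the temporary core.xml entirely and read it straight out of each archive. The following is a minimal Python sketch of that idea, not the modified Perl script; the function name usernames_from_documents is hypothetical, and it assumes the same dc:creator and cp:lastModifiedBy elements as above. Note that, unlike the tr step in the pipeline, it keeps a name such as "John Smith" as a single entry.

```python
import sys
import xml.etree.ElementTree as ET
import zipfile

CORE_PROPS = "docProps/core.xml"
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}
USER_TAGS = ("dc:creator", "cp:lastModifiedBy")


def usernames_from_documents(paths):
    """Return the sorted, de-duplicated usernames found across several documents."""
    found = set()
    for path in paths:
        try:
            with zipfile.ZipFile(path) as archive:
                root = ET.fromstring(archive.read(CORE_PROPS))
        except (OSError, KeyError, zipfile.BadZipFile, ET.ParseError) as err:
            # Unreadable, non-ZIP, or metadata-free files are reported and skipped.
            print(f"skipping {path}: {err}", file=sys.stderr)
            continue
        for tag in USER_TAGS:
            element = root.find(tag, NAMESPACES)
            if element is not None and element.text and element.text.strip():
                found.add(element.text.strip())
    return sorted(found)
```

Printing "\n".join(usernames_from_documents(sys.argv[1:])) and redirecting the output to 2007users.txt would give the same one-name-per-line list as before, with the set doing the job of sort and uniq.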