Restoring a file from an unattached content database, no web app required!

The scenario: User needs a business-critical file restored from two days ago and the document library does not have versioning turned on. This is in an on-premise SharePoint 2013 farm. The content databases get backed up nightly, with the two most recent backups kept on disk.

The solutions:

  1. To recover the file from the SQL backup, follow this process:
    1. Restore SQL database (with a different name) from the backup you have to a SQL Server.
    2. Follow the instructions in https://technet.microsoft.com/en-us/library/hh269602.aspx?f=255&MSPPError=-2147217396 or http://www.akspug.org/Blog/Post/40/How-to-Recover-A-Single-Document-In-SharePoint up through the point where you have selected the library that contains the file and extracted the library to a network share. Note that the network share must be writeable by the SharePoint Farm account (a service account) that SharePoint will use for doing the extract. Skip creating a web application to restore the content to.
    3. At this point, you have a pile of .CMP files in a network share. The first one is called export.cmp and the rest are export<some number>.cmp. There is also a log file.
      1. You know the backup completed by looking in Central Admin for the status of the export job or at the export log file that is generated in the network share.
      2. All the .CMP files are actually .CAB files. Rename them so you can inspect them. You can rename them using this line of PowerShell:

        dir *.cmp  foreach { Rename-Item $($_.FullName) $($($_.FullName).Replace(“.cmp”,“.cab”)) }

      3. The last cab file, in my example export96.cab, holds xml files with the metadata about the list. The second to last cab file, in my example export95.cab, holds manifest.xml and a bunch of .dat files. This manfiest.xml file is critical to figuring out where to find the file you want to recover. All the other files (export.cab through export94.cab) hold only a collection of .dat files with sequential numerical filenames. The numbering system is hexadecimal (hex, where a single digit goes 0 through F instead of 0-9). Filenames go from 00000000.dat to FFFFFFFF.dat if needed. That’s enough digits to identify over four BILLION files (4,294,967,295), far beyond the 30 Million files or folders you can store in a document library [source: Software Boundaries and Limits support article, retrieved 8/4/2017]. That’s because these .cmp files can be used to export entire sites and site collections, not just a list or library.
      4. Copy the manifest.xml file out of its .cab file, and open it in a text editor or code editor of your choice. Search for the filename of the file that you want to recover. The XML node you find should look like this:

<SPObject
Id=a8acd897-014f-495b-91fc-081dd901753a
ObjectType=SPFile
ParentId=971d53e7-732f-4f22-90ab-ab43c47eb5a6
ParentWebId=5adad535-0f3d-4482-9862-e7caac20d9bf

ParentWebUrl=/sites/lorem
Url=/sites/lorem/Shared Documents/folder1/folderN/lipsum.xlsx>

<File
Url=Shared Documents/folder1/folderN/lipsum.xlsx
Id=a8acd897-014f-495b-91fc-081dd901753a

ParentWebId=5adad535-0f3d-4482-9862-e7caac20d9bf
ParentWebUrl=/sites/lorem
Name=lipsum.xlsx
ListItemIntId=3288
ListId=7cf605ce-99a9-4d66-8583-cf7c6398cfe9
ParentId=971d53e7-732f-4f22-90ab-ab43c47eb5a6
TimeCreated=2017-05-17T21:26:03

TimeLastModified=2017-07-29T17:26:01
Version=1.0
FileValue=00000088.dat
Author=38
ModifiedBy=178>

<Properties>…</Properties>

</File>

</SPObject>

Structurally, what you are seeing is that an SPObject has a lot of attributes and a child File object. That File object has a bunch of attributes and a child Properties collection. We can ignore the Properties, which is why I have collapsed the node and greyed it out. In my example, the file I am trying to recover is an Excel spreadsheet called lipsum.xlsx. In our SharePoint intranet website, it was in the Lorem site, in the Shared Documents library, in the folderN folder within the folder1 folder. Knowing that, I see that the URL of the SPObject and the File URL and the File Name all correspond to that. You can see I have highlighted that in yellow in the XML. I then look at the FileValue (highlighted in teal) , in this case 00000088.dat. That is the .dat file I am looking for.

Which .cab is the 00000088.dat file in? I manually employ a binary search algorithm – that is, I split the search pool approximately in half each time. Specifically, I have 98 cab files that contain .dat files, but they are all in order. I don’t happen to know how many files can be in a cab file, but I know they are all in order. I open cab file #51, export50.cab. Inside I see the first file is, for example, 0000048A.dat. Hex 48A (Decimal 1162) is much larger than Hex 88 (decimal 136), so I know I am looking for a .cab file between export.cab (think of this as export0.cab) and export50.cab, not export50.cab and export98.cab. I split the pool of files again, checking out the contents of Export25.cab. Eventually, I locate the cab that contains 00000088.dat, the 136th file (because 88 is hex, which is 136 in decimal). I pull 00000088.dat out of its .cab file, rename it to ipsum.xlsx, and open it in Excel. Sure enough, it opens and it’s a spreadsheet alright! I send it to the user and the user is very happy. They can confirm it’s the right file and upload it to SharePoint, where it will replace the bad version that is there.

A key point here is that I could locate and recover the user’s spreadsheet without having to stand up a new web application in the on-premise SharePoint Farm. Web Applications are conceptually a very large object, since they can have one or more content databases and many site collections. Standing up a web application to recover a single file would feel like building a whole house just to see if a chair would work in a particular room.

I want to thank MVPs Paul Schaeflein, Amit Vasu, Sean McDonough, and Todd Klindt for responding to me in the Office 365 MVP slack team, with extra thanks to Todd for answering the bulk of my questions. If you are a SharePoint farm admin, you should be listening to Todd’s podcast.

Another footnote: The structure of the CMP files is defined as part of the PRIME API, a content deployment and content migration API that SharePoint inherited in 2007 when it replaced Microsoft Content Management 2002 server. Sean says that the chunk size of the CMP files can be configured. Documentation on this form of content migration is at https://msdn.microsoft.com/en-us/library/office/bb249963(v=office.14).aspx

  1. To provide an easier way to restore previous versions of files going forward, turn versioning on. In the document library, go to library settings.
    1. Under General Settings on the library settings page (listedit.aspx), click on “Versioning settings”

      In Version settings (LstSetng.aspx), pick either the second or third choice depending on your business need.

      Click the OK button to save your changes.

    2. You can use PowerShell to turn versioning on in every library or list in your tenant. See this post by Jack Fruh at http://basementjack.com/sharepoint-2/enable-versions-on-every-sharepoint-site/

–Michael