My document is not being crawled!!!

OK – here’s one that I struggled with.

I was installing SharePoint Search Server 2008. The objective was to index, and crawl, several file shares. This would allow the users to “find” all those documents that they had been hoarding in the file shares over several years.

As well as the standard Office documents, there were also a lot of PDF documents. As you may already know, out of the box, SharePoint doesn’t have an iFilter that will allow it to crawl PDFs. You have to install a third-party iFilter.

Adobe used to offer a separate download that could be installed. Since version 8 of Adobe Reader, the iFilter is included as part of the application. At the time I was doing this project, Adobe Reader 9 was the latest version, so I installed it.

In the beginning I thought that all I had to do was install the application, and then ensure that PDFs were added to the File Types that SharePoint would crawl.

But no… there is some fancy work that has to be done in the registry. Although several excellent articles and posts on the Internet discuss this, I found that even when I followed them to the letter, my PDFs remained happily “undiscovered”.

I know that we are now in the age of SharePoint 2010, and that this topic has, as just mentioned, been documented to the nth degree, but for posterity’s sake I have recorded here the process that worked for me. (There is also a list of useful links at the end of this post.)

Here’s what I did:

  1. Installed Adobe Reader on the same server SharePoint was installed on.
  2. Added the PDF icon (instructions for this can be found in any of the articles/posts mentioned above).
  3. Added PDF as a new File Type (also a standard SharePoint procedure).
  4. Opened Regedit, and navigated to
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf]
  5. If necessary, changed the existing Multi-string value to:
    {E8978DA6-047F-4E3D-9C78-CDBE46041603}
  6. Navigated to the following key:
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf]  (note that this is different from Step 4)
  7. If necessary, changed the existing Multi-string value to:
    {E8978DA6-047F-4E3D-9C78-CDBE46041603}
  8. Navigated to the following key:
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\Filters\.pdf]
  9. If not already present, added the following entries:

Name: Default
Type: REG_SZ
Data: (value not set)

Name: Extension
Type: REG_SZ
Data: pdf

Name: FileTypeBucket
Type: REG_DWORD
Data: 0x00000001 (1)

Name: MimeTypes
Type: REG_SZ
Data: application/pdf
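If you would rather not eyeball Regedit, the step 9 check can be scripted. What follows is only a rough sketch in Python, not something I ran during the original project. It assumes Python is available on the server, it writes the key path with the backslash separators that Regedit shows, and the names FILTER_KEY, EXPECTED, read_values and find_problems are my own inventions for illustration. It reads the key from step 8 and reports anything that is missing or different, without changing anything.

```python
"""Compare the registry values from step 9 against what a server actually has."""

# Key from step 8, relative to HKEY_LOCAL_MACHINE.
FILTER_KEY = (
    r"SOFTWARE\Microsoft\Shared Tools\Web Server Extensions"
    r"\12.0\Search\Setup\Filters\.pdf"
)

# Values listed in step 9. The Default value is left unset, so it is not checked.
EXPECTED = {
    "Extension": "pdf",
    "FileTypeBucket": 1,
    "MimeTypes": "application/pdf",
}


def read_values(subkey):
    """Return {name: data} for a key under HKLM. Works on Windows only."""
    import winreg  # imported here so the rest of the script loads anywhere

    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
        value_count = winreg.QueryInfoKey(key)[1]  # second item is the value count
        for index in range(value_count):
            name, data, _type = winreg.EnumValue(key, index)
            values[name] = data
    return values


def find_problems(actual, expected=EXPECTED):
    """List the differences between the values found and the values wanted."""
    problems = []
    for name, wanted in expected.items():
        if name not in actual:
            problems.append(f"missing: {name}")
        elif actual[name] != wanted:
            problems.append(f"wrong: {name} = {actual[name]!r}, expected {wanted!r}")
    return problems


if __name__ == "__main__":
    for line in find_problems(read_values(FILTER_KEY)) or ["all values present"]:
        print(line)
```

The comparison is kept separate from the registry read so that it can be tried out with a plain dictionary on any machine before it goes anywhere near a production server.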

This whole process seems pretty straightforward, but it cost me a lot of pain and many lost hours.

One more thing to be aware of: if you install SP2 for SharePoint after setting this up, you will need to go back and change those GUIDs again. Chris Even, a giant in the Search world, points this out in his blog. (I strongly recommend having a look at it.)
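Since a service pack can quietly undo the GUID change, a quick re-check after patching is worth having. Here is a hedged Python sketch of one. It assumes that the Multi-string value from steps 5 and 7 is the default value of each key, which is worth confirming in Regedit first, and the names EXTENSION_KEYS, points_at_adobe and read_default_value are made up for this example. It only reads the registry and never writes to it.

```python
"""Check that both Extension\\.pdf keys still hold the Adobe iFilter GUID."""

ADOBE_IFILTER_GUID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"

# Keys from steps 4 and 6, relative to HKEY_LOCAL_MACHINE.
EXTENSION_KEYS = [
    r"SOFTWARE\Microsoft\Shared Tools\Web Server Extensions"
    r"\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
    r"SOFTWARE\Microsoft\Office Server"
    r"\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
]


def points_at_adobe(data):
    """True if a registry value (a string or a multi-string list) holds the GUID."""
    entries = [data] if isinstance(data, str) else list(data or [])
    return any(entry.strip().upper() == ADOBE_IFILTER_GUID for entry in entries)


def read_default_value(subkey):
    """Read a key's default value under HKLM. Works on Windows only."""
    import winreg  # imported here so the rest of the script loads anywhere

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
        # Assumption: the multi-string lives in the default (unnamed) value.
        data, _type = winreg.QueryValueEx(key, "")
    return data


if __name__ == "__main__":
    for subkey in EXTENSION_KEYS:
        status = "ok" if points_at_adobe(read_default_value(subkey)) else "CHANGED"
        print(f"{status}: {subkey}")
```

Any key reported as CHANGED is one to revisit by hand in Regedit, as described in steps 4 to 7.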

Useful Links

  • Chris Even on SharePoint Search
  • http://support.microsoft.com/kb/2018558
  • http://support.microsoft.com/kb/944447
  • http://blogs.officezealot.com/zaidi/default.aspx

Of course, there are plenty more – just check with your favorite search engine :O)

http://sharepointsearch.com/cs/blogs/notorioustech/archive/2009/07/28/moss-and-wss-sp2-can-break-pdf-searching.aspx
