The WorkerThread Blog

we know stuff so you don't have to

Add PDF Result Type to MOSS 2007 Advanced Search

Posted by workerthread on June 10, 2009

Originally this post was going to be a lot longer. The plan was to go into lots of detail about how to customise the SharePoint 2007 or Search Server Advanced Search page and extend the Result Type drop-down to allow selection of Acrobat PDF files. Then I saw a post over at Get the Point which links to several very good articles from Ben Curry and Bill English describing how to customise the SharePoint 2007 Search Centre.

One of their articles, which you can download as a PDF file, is called Customize the Advanced Search Page in an Office SharePoint Server 2007 Search Center Site and it covers the initial customisation options for Advanced Search web parts very well.  So all I need to do is explain how to use their techniques to add PDF documents as a Result Type.

If we create a document in Microsoft Word 2007, we can use the standard document properties to create searchable metadata like this:

[Screenshot: Word Document Properties]

And once the document has been uploaded to SharePoint and successfully crawled, those properties become searchable through the Advanced Search page.

Equally, an Acrobat PDF document may contain properties.  Here is the same document converted to PDF, this time showing the properties in Adobe Reader:

[Screenshot: PDF Document Properties]

So how do we make it easier for our SharePoint users to search only within PDF files, and for specific document properties?  Well, as you will see if you download and read the article mentioned above, the trick is to add a new ResultType for PDF files in the properties XML.  NOTE – of course all of this assumes that your SharePoint server is already successfully crawling and indexing Acrobat PDF files…

The “properties” section which determines what can be selected for Result Type “Word Documents” looks like this:

<ResultType DisplayName="Word Documents" Name="worddocuments">
    <Query>FileExtension='doc' Or FileExtension='docx' Or FileExtension='dot'</Query>
    <PropertyRef Name="Author" />
    <PropertyRef Name="DocComments"/>
    <PropertyRef Name="Description" />
    <PropertyRef Name="DocKeywords"/>
    <PropertyRef Name="FileName" />
    <PropertyRef Name="Size" />
    <PropertyRef Name="DocSubject"/>
    <PropertyRef Name="Path" />
    <PropertyRef Name="Created" />
    <PropertyRef Name="Write" />
    <PropertyRef Name="CreatedBy" />
    <PropertyRef Name="ModifiedBy" />
    <PropertyRef Name="Title"/>
    <PropertyRef Name="Manager" />
    <PropertyRef Name="Company"/>
</ResultType>

Now, not all of the same properties can be found in a PDF document.  In the test PDF document I created using the Office 2007 PDF add-in, the properties I was able to successfully search in SharePoint were Keywords, Name, Size, Subject, Created Date, Last Modified Date and Title.  And this is the ResultType I added to my Properties XML to make this work:

<ResultType DisplayName="PDF Documents" Name="acrobatpdf">
    <Query>FileExtension='pdf'</Query>
    <PropertyRef Name="DocKeywords"/>
    <PropertyRef Name="FileName" />
    <PropertyRef Name="Size" />
    <PropertyRef Name="DocSubject"/>
    <PropertyRef Name="Created" />
    <PropertyRef Name="Write" />
    <PropertyRef Name="Title"/>
</ResultType>

All of this needs to be added above the </ResultTypes> closing tag at the end of the Properties XML.  Once this was in place, I could select the PDF Documents result type and search by property, like so:

[Screenshot: PDF Result Type]
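
If you would rather not hand-edit the XML, the insertion described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the SAMPLE_PROPERTIES text is a hypothetical, cut-down stand-in for the real Properties XML (which contains much more than this), and the function name add_pdf_result_type is my own. Note also that ElementTree re-serialises the whole text, so comments, layout and unused namespace declarations may not survive; compare the output against your original before pasting anything back into the web part.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for the web part's Properties XML.
# The real text has many more elements; only <ResultTypes> matters here.
SAMPLE_PROPERTIES = """<root>
  <ResultTypes>
    <ResultType DisplayName="Word Documents" Name="worddocuments">
      <Query>FileExtension='doc' Or FileExtension='docx' Or FileExtension='dot'</Query>
      <PropertyRef Name="Author" />
      <PropertyRef Name="Title" />
    </ResultType>
  </ResultTypes>
</root>"""

# Property names taken from the PDF ResultType shown in the post.
PDF_PROPERTIES = ["DocKeywords", "FileName", "Size", "DocSubject",
                  "Created", "Write", "Title"]


def add_pdf_result_type(properties_xml: str) -> str:
    """Return the Properties XML with a PDF ResultType appended to <ResultTypes>."""
    root = ET.fromstring(properties_xml)
    result_types = root if root.tag == "ResultTypes" else root.find(".//ResultTypes")
    if result_types is None:
        raise ValueError("no <ResultTypes> element found")

    # Leave the text untouched if a PDF entry is already present.
    for existing in result_types.findall("ResultType"):
        if existing.get("Name") == "acrobatpdf":
            return properties_xml

    # Appending as the last child places the new entry just above </ResultTypes>.
    pdf = ET.SubElement(result_types, "ResultType",
                        {"DisplayName": "PDF Documents", "Name": "acrobatpdf"})
    ET.SubElement(pdf, "Query").text = "FileExtension='pdf'"
    for name in PDF_PROPERTIES:
        ET.SubElement(pdf, "PropertyRef", {"Name": name})

    return ET.tostring(root, encoding="unicode")


updated = add_pdf_result_type(SAMPLE_PROPERTIES)
print(updated)
```

Running the function a second time on its own output returns the text unchanged, so it should be safe to re-run if you are not sure whether the PDF entry has already been added.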

One word of caution if you plan to edit the Properties XML in the web part: the text editor you get in the web part is pretty rudimentary, as you can see here:

[Screenshot: Advanced Search Properties]

So if you aren’t familiar with XML and don’t have another XML editor to paste the text into, you might want to copy and paste the original text into a Notepad file before you start, so that you have a backup to restore from.
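
If you have Python to hand but no XML editor, a quick well-formedness check on the edited text can catch a lost closing tag before you paste it back. This is a minimal sketch, not a full validation: it only confirms that the text parses as XML, it says nothing about whether SharePoint will accept the property names, and the function name check_properties_xml is my own invention.

```python
import xml.etree.ElementTree as ET


def check_properties_xml(text: str) -> str:
    """Return 'ok' if text is well-formed XML, else describe the first error."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return f"not well-formed at line {line}, column {column}: {err}"
    return "ok"


# A tiny fragment standing in for the edited Properties XML.
good = ("<ResultTypes><ResultType Name='acrobatpdf'>"
        "<Query>FileExtension='pdf'</Query>"
        "</ResultType></ResultTypes>")

# Simulate a closing tag lost while editing in the web part.
bad = good.replace("</ResultType>", "")

print(check_properties_xml(good))
print(check_properties_xml(bad))
```

The first call should report that the fragment is fine, and the second should point at the place where the parser found the mismatched tag.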

By the way, if your documents are currently stored in Office format, and you’re looking for a solution which manages conversion to PDF and transfers metadata across, take a look at the Office to PDF Conversion Solution Accelerator from Bamboo Solutions.

Update 12 June 09 – Another SharePoint add-on worth a look if you are converting Office documents to PDFs is Muhimbi PDF Converter for SharePoint.  Right now it looks like you can convert to PDF from various Word document versions as well as RTF and plain text, although I understand more file types will follow.  They also have a feature which lets you do the PDF conversion from a SharePoint Designer workflow, which looks like it could be useful.  Muhimbi also tell me you get the option to copy metadata across when you create the PDF. I plan to try this out and hopefully post about it soon.

3 Responses to “Add PDF Result Type to MOSS 2007 Advanced Search”

  1. Kevin Dube said

    this doesn’t seem to work in sharepoint 2010? trying it and can’t get any pdf results to show up in my advanced search…

    • Hi Kevin

      I haven’t tried this yet with SharePoint 2010. I did notice however that in 2010 the XML tag for the file extension is KeywordQuery whereas it was Query in 2007. Have you tried that?

      Derek

  2. Paul Wetzel said

    Kevin,
    there is a minor change needed for SP2010. Try this:

    FileExtension="pdf"

    Thank you,
    Paul
