Help With Automating iFilter and PDF Indexing Support in SharePoint

July 23, 2009

The most popular post on this blog is still the one about Configuring PDF iFilter for WSS 3.0.  Today I saw a post on Notes for .Net which may help to automate this multi-step process.  What’s more, the post says it works with a variety of iFilters, including Foxit and Adobe.

The Notes for .Net team have posted their solution, which adds a new stsadm command, on CodePlex here.  I haven’t tested it myself yet but will do on my next VM build.


Add PDF Result Type to MOSS 2007 Advanced Search

June 10, 2009

Originally this post was going to be a lot longer – the plan was to go into lots of detail about how to customise the SharePoint 2007 or Search Server Advanced Search page and extend the Result Type drop-down to allow selection of Acrobat PDF Files.  Then I saw this post over at Get the Point which contains links to several very good articles from Ben Curry and Bill English describing how to customise the SharePoint 2007 Search Centre. 

One of their articles, which you can download as a PDF file, is called Customize the Advanced Search Page in an Office SharePoint Server 2007 Search Center Site and it covers the initial customisation options for Advanced Search web parts very well.  So all I need to do is explain how to use their techniques to add PDF documents as a Result Type.

If we create a document in Microsoft Word 2007, we can use the standard document properties to create searchable metadata like this:

Word Document Properties

And once the document has been uploaded to SharePoint and successfully crawled, those properties become searchable through the Advanced Search page.

Equally, an Acrobat PDF document may contain properties.  Here is the same document converted to PDF, this time showing the properties in Adobe Reader:

PDF Document Properties

So how do we make it easier for our SharePoint users to search only within PDF files, and for specific document properties?  Well, as you will see if you download and read the article mentioned above, the trick is to add a new ResultType for PDF files in the properties XML.  NOTE – of course all of this assumes that your SharePoint server is already successfully crawling and indexing Acrobat PDF files…

The “properties” section which determines what can be selected for Result Type “Word Documents” looks like this:

<ResultType DisplayName="Word Documents" Name="worddocuments">
    <Query>FileExtension='doc' Or FileExtension='docx' Or FileExtension='dot'</Query>
    <PropertyRef Name="Author" />
    <PropertyRef Name="DocComments"/>
    <PropertyRef Name="Description" />
    <PropertyRef Name="DocKeywords"/>
    <PropertyRef Name="FileName" />
    <PropertyRef Name="Size" />
    <PropertyRef Name="DocSubject"/>
    <PropertyRef Name="Path" />
    <PropertyRef Name="Created" />
    <PropertyRef Name="Write" />
    <PropertyRef Name="CreatedBy" />
    <PropertyRef Name="ModifiedBy" />
    <PropertyRef Name="Title"/>
    <PropertyRef Name="Manager" />
    <PropertyRef Name="Company"/>
</ResultType>

Now, not all of the same properties can be found in a PDF document.  In the test PDF document I created using the Office 2007 PDF add-in, the properties I was able to successfully search in SharePoint were Keywords, Name, Size, Subject, Created Date, Last Modified Date and Title.  And this is the ResultType I added to my Properties XML to make this work:

<ResultType DisplayName="PDF Documents" Name="acrobatpdf">
    <Query>FileExtension='pdf'</Query>
    <PropertyRef Name="DocKeywords"/>
    <PropertyRef Name="FileName" />
    <PropertyRef Name="Size" />
    <PropertyRef Name="DocSubject"/>
    <PropertyRef Name="Created" />
    <PropertyRef Name="Write" />
    <PropertyRef Name="Title"/>
</ResultType>

All of this needs to be added just above the </ResultTypes> closing tag at the end of the properties XML.  Once this was in place, I could select the PDF Result Type and search by property, like so:

PDF Result Type
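If you would rather script the change than hand-edit it, here is a minimal Python sketch of the same step. It assumes you have copied the properties XML out of the web part into a string. The sample XML below is a trimmed-down stand-in rather than the full web part content, and the function name add_pdf_result_type is my own invention. Note that ElementTree may drop unused namespace declarations when it writes the XML back out, so compare the result with your original before pasting it in.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the web part's properties XML (assumption:
# the real XML has more sections; only <ResultTypes> matters here).
PROPERTIES_XML = """<root>
  <ResultTypes>
    <ResultType DisplayName="Word Documents" Name="worddocuments">
      <Query>FileExtension='doc' Or FileExtension='docx' Or FileExtension='dot'</Query>
      <PropertyRef Name="Author" />
      <PropertyRef Name="Title" />
    </ResultType>
  </ResultTypes>
</root>"""

# The properties that were searchable in the test PDF described above.
PDF_PROPERTIES = ["DocKeywords", "FileName", "Size", "DocSubject",
                  "Created", "Write", "Title"]


def add_pdf_result_type(properties_xml: str) -> str:
    """Append a PDF ResultType as the last child of <ResultTypes>."""
    root = ET.fromstring(properties_xml)
    result_types = root if root.tag == "ResultTypes" else root.find(".//ResultTypes")
    if result_types is None:
        raise ValueError("No <ResultTypes> element found")

    # Leave the XML untouched if the PDF type has already been added.
    for existing in result_types.findall("ResultType"):
        if existing.get("Name") == "acrobatpdf":
            return properties_xml

    pdf = ET.SubElement(result_types, "ResultType",
                        {"DisplayName": "PDF Documents", "Name": "acrobatpdf"})
    ET.SubElement(pdf, "Query").text = "FileExtension='pdf'"
    for name in PDF_PROPERTIES:
        ET.SubElement(pdf, "PropertyRef", {"Name": name})
    return ET.tostring(root, encoding="unicode")


updated = add_pdf_result_type(PROPERTIES_XML)
print(updated)
```

Appending the new element as the last child of ResultTypes has the same effect as pasting it just above the closing tag by hand.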

One word of caution if you plan to edit the Properties XML in the web part – the text editor you get in the web part is pretty rudimentary, as you can see here:

Advanced Search Properties

So if you aren’t familiar with XML and don’t have another XML editor to paste the text into, you might want to copy and paste the original text into a Notepad file before you start, so that you have a backup to restore if the edit goes wrong.
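If you have Python to hand, a quick sanity check before pasting edited XML back into the web part might look like the sketch below. The helper names xml_problem and has_curly_quotes are hypothetical. The first only confirms that the text is well-formed XML; it says nothing about whether SharePoint will accept the property names. The second is there because text copied from a web page can pick up typographic quotes, which are still well-formed XML but which I would not expect to work as the string delimiters in a Query.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def xml_problem(text: str) -> Optional[str]:
    """Return None if text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


def has_curly_quotes(text: str) -> bool:
    """Flag typographic quotes that may have replaced plain ' or " characters."""
    return any(ch in text for ch in "\u2018\u2019\u201c\u201d")


good = "<ResultTypes><ResultType Name=\"acrobatpdf\" /></ResultTypes>"
broken = "<ResultTypes><ResultType Name=\"acrobatpdf\"></ResultTypes>"
pasted = "<Query>FileExtension=\u2019pdf\u2019</Query>"

print(xml_problem(good))         # well-formed, so no message
print(xml_problem(broken))       # unclosed ResultType, so a parser message
print(has_curly_quotes(pasted))  # typographic quotes present
```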

By the way, if your documents are currently stored in Office format, and you’re looking for a solution which manages conversion to PDF and transfers metadata across, take a look at the Office to PDF Conversion Solution Accelerator from Bamboo Solutions – more info here.

Update 12 June 09 – Another SharePoint add-on worth a look if you are converting Office documents to PDFs is Muhimbi PDF Converter for SharePoint.  Right now it looks like you can convert to PDF from various Word document versions as well as RTF and plain text, although I understand more file types will follow.  They also have a feature which lets you do the PDF conversion from a SharePoint Designer Workflow, which looks like it could be useful.  Muhimbi also tell me you get the option to copy metadata across when you create the PDF.  I plan to try this out and hopefully post about it soon.


Bing vs Google

June 4, 2009

Since the launch of Microsoft’s Bing “decision engine” I’ve noticed quite a few Bing URLs cropping up in the referrer stats for this blog.  I thought it would be interesting to take one of these referred searches, in this case “add pdf ifilter to sharepoint”, and compare the results from Bing and from Google.

Blackdog has a neat split-screen page where you can type your query into one place and compare the Bing and Google results side by side.  So I tried this -

The Bing search returned one of my blog posts in fourth place, as you can see here:

Bing Search Result

And Google returned it in third place:

Google Search Result

Interestingly though, I’m currently seeing more referrals from Bing.  Maybe Bing users need more PDF iFilter Info…


Important SharePoint Service Pack 2 Info

May 22, 2009

It seems there is a problem with SharePoint 2007 SP2 whereby during installation a product expiration date is improperly activated, and the result is that your SharePoint installation will expire as though it were a trial version after 180 days – more details available on the SharePoint Team blog here.

Apparently there will be a hotfix available soon, and if you have already installed SP2 on MOSS 2007 then you can fix the problem by re-entering your Product ID.  If you have already applied SP2 to Search Server Express, you need to wait for the hotfix, and you will definitely need to install it once it is available.

If you haven’t already installed SP2, it sounds like it’s definitely best to wait…


64-Bit PDF iFilter Now Available from Adobe

December 10, 2008

Excellent news, picked up via From the Field: Adobe have released a proper 64-bit iFilter 9 in response to customer requests, complete with instructions for setup on SharePoint Server 2007, Exchange and SQL Server.  You can download the installer here, as well as various instruction documents.

Don’t forget: if you already have a lot of PDF documents in SharePoint libraries, they won’t be indexed until you run a full crawl.

Update 25 March 09 – here are some additional links people might find useful -

Following Adobe Instructions can cause problems

Installing 64 bit iFilter on Windows Server 2008

SharePoint Solutions – Installing 64 bit iFilter
