The WorkerThread Blog

we know stuff so you don't have to

Archive for the ‘Search’ Category

Help With Automating iFilter and PDF Indexing Support in SharePoint

Posted by workerthread on July 23, 2009

The most popular post on this blog is still the one about Configuring PDF iFilter for WSS 3.0.  Today I saw a post on Notes for .Net which may help to automate this multi-step process.  What’s more, the post says it works with a variety of iFilters, including Foxit and Adobe.

The Notes for .Net team have posted their solution, which adds a new stsadm command, on CodePlex here.  I haven’t tested it myself yet but will do on my next VM build.

Posted in Microsoft SharePoint, Search | Comments Off

Add PDF Result Type to MOSS 2007 Advanced Search

Posted by workerthread on June 10, 2009

Originally this post was going to be a lot longer – the plan was to go into lots of detail about how to customise the SharePoint 2007 or Search Server Advanced Search page and extend the Result Type drop-down to allow selection of Acrobat PDF Files.  Then I saw this post over at Get the Point which contains links to several very good articles from Ben Curry and Bill English describing how to customise the SharePoint 2007 Search Centre. 

One of their articles, which you can download as a PDF file, is called Customize the Advanced Search Page in an Office SharePoint Server 2007 Search Center Site and it covers the initial customisation options for Advanced Search web parts very well.  So all I need to do is explain how to use their techniques to add PDF documents as a Result Type.

If we create a document in Microsoft Word 2007, we use standard document properties to create searchable metadata like this:

Word Document Properties

And once the document has been uploaded to SharePoint and successfully crawled, those properties become searchable through the Advanced Search page.

Equally, an Acrobat PDF document may contain properties.  Here is the same document converted to PDF, this time showing the properties in Adobe Reader:

PDF Document Properties

So how do we make it easier for our SharePoint users to search only within PDF files, and for specific document properties?  Well, as you will see if you download and read the article mentioned above, the trick is to add a new ResultType for PDF files in the properties XML.  NOTE – of course all of this assumes that your SharePoint server is already successfully crawling and indexing Acrobat PDF files…

The “properties” section which determines what can be selected for Result Type “Word Documents” looks like this:

<ResultType DisplayName="Word Documents" Name="worddocuments">
    <Query>FileExtension='doc' Or FileExtension='docx' Or FileExtension='dot'</Query>
    <PropertyRef Name="Author" />
    <PropertyRef Name="DocComments"/>
    <PropertyRef Name="Description" />
    <PropertyRef Name="DocKeywords"/>
    <PropertyRef Name="FileName" />
    <PropertyRef Name="Size" />
    <PropertyRef Name="DocSubject"/>
    <PropertyRef Name="Path" />
    <PropertyRef Name="Created" />
    <PropertyRef Name="Write" />
    <PropertyRef Name="CreatedBy" />
    <PropertyRef Name="ModifiedBy" />
    <PropertyRef Name="Title"/>
    <PropertyRef Name="Manager" />
    <PropertyRef Name="Company"/>
</ResultType>

Now, not all of the same properties can be found in a PDF document.  In the test PDF document I created using the Office 2007 PDF add-in, the properties I was able to successfully search in SharePoint were Keywords, Name, Size, Subject, Created Date, Last Modified Date and Title.  And this is the ResultType I added to my Properties XML to make this work:

<ResultType DisplayName="PDF Documents" Name="acrobatpdf">
    <Query>FileExtension='pdf'</Query>
    <PropertyRef Name="DocKeywords"/>
    <PropertyRef Name="FileName" />
    <PropertyRef Name="Size" />
    <PropertyRef Name="DocSubject"/>
    <PropertyRef Name="Created" />
    <PropertyRef Name="Write" />
    <PropertyRef Name="Title"/>
</ResultType>

All of this needs to be added just above the </ResultTypes> closing tag at the end of the Properties XML.  Once this was in place, I could select the PDF Documents Result Type and search by property, like so:

PDF Result Type
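If you would rather not hand-edit the XML, the same change can be sketched in Python with the standard library.  This is only an illustration under assumptions: the sample XML is a trimmed stand-in for the real Properties XML (the `<root>` wrapper and the `add_pdf_result_type` name are hypothetical), and ElementTree may drop comments or unused namespace declarations when it writes the result back out, so compare the output with your original before pasting it into the web part.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the web part's Properties XML (assumption: the real
# XML has more sections; only the <ResultTypes> element matters here).
SAMPLE_PROPERTIES = """<root>
  <ResultTypes>
    <ResultType DisplayName="Word Documents" Name="worddocuments">
      <Query>FileExtension='doc' Or FileExtension='docx' Or FileExtension='dot'</Query>
      <PropertyRef Name="Author" />
    </ResultType>
  </ResultTypes>
</root>"""

# The PDF properties listed in the post, in the same order.
PDF_PROPERTIES = ["DocKeywords", "FileName", "Size", "DocSubject",
                  "Created", "Write", "Title"]


def add_pdf_result_type(properties_xml: str) -> str:
    """Append a PDF ResultType as the last child of <ResultTypes>."""
    root = ET.fromstring(properties_xml)
    # iter() also matches the root itself, so a bare <ResultTypes> works too.
    result_types = next(root.iter("ResultTypes"), None)
    if result_types is None:
        raise ValueError("no <ResultTypes> element found")
    pdf = ET.SubElement(result_types, "ResultType",
                        {"DisplayName": "PDF Documents", "Name": "acrobatpdf"})
    ET.SubElement(pdf, "Query").text = "FileExtension='pdf'"
    for name in PDF_PROPERTIES:
        ET.SubElement(pdf, "PropertyRef", {"Name": name})
    return ET.tostring(root, encoding="unicode")
```

Calling `add_pdf_result_type(SAMPLE_PROPERTIES)` returns the XML as a string with the new element sitting immediately before the `</ResultTypes>` closing tag.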

One word of caution if you plan to edit the Properties XML in the web part – the text editor you get in the web part is pretty rudimentary, as you can see here:

Advanced Search Properties

So if you aren’t familiar with XML and don’t have another XML editor to paste the text into, you might want to copy and paste the original text into a Notepad file before you start, so that you can restore it if an edit goes wrong.
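If you have Python to hand, a quick well-formedness check on the edited text before pasting it back can catch a missing closing tag.  A minimal sketch using only the standard library; the `check_well_formed` name is hypothetical, and the check only confirms that the XML parses, not that SharePoint will accept the property names:

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str) -> str:
    """Return "OK" if xml_text parses, else a message with the error position."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        return f"Not well-formed at line {line}, column {column}: {err}"
    return "OK"
```

Paste the edited Properties XML into a string (or read it from your saved file) and pass it to `check_well_formed` before updating the web part.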

By the way, if your documents are currently stored in Office format, and you’re looking for a solution which manages conversion to PDF and transfers metadata across, take a look at the Office to PDF Conversion Solution Accelerator from Bamboo Solutions – more info here.

Update 12 June 09 – Another SharePoint add-on worth a look if you are converting Office documents to PDFs is Muhimbi PDF Converter for SharePoint.  Right now it looks like you can convert to PDF from various Word document versions as well as RTF and plain text, although I understand more file types will follow.  They also have a feature which lets you do the PDF conversion from a SharePoint Designer Workflow, which looks like it could be useful.  Muhimbi also tell me you get the option to copy metadata across when you create the PDF.  I plan to try this out and hopefully post about it soon.

Posted in Microsoft SharePoint, Office, Search | 3 Comments »

Bing vs Google

Posted by workerthread on June 4, 2009

Since the launch of Microsoft’s Bing “decision engine” I’ve noticed quite a few Bing URLs cropping up in the referrer stats for this blog.  I thought it would be interesting to take an example of one of these referred searches, in this case “add pdf ifilter to sharepoint”, and see how it fares in the results from Bing and from Google.

Blackdog has a neat split-screen page where you can type your query into one place and compare the Bing and Google results side by side.  So I tried this.

The Bing search returned one of my blog posts in fourth place, as you can see here:

Bing Search Result

And Google returned it in third place:

Google Search Result

Interestingly though, I’m currently seeing more referrals from Bing.  Maybe Bing users need more PDF iFilter info…

Posted in Search, web development | 1 Comment »