The WorkerThread Blog

we know stuff so you don't have to

Archive for June, 2009

Add PDF Result Type to MOSS 2007 Advanced Search

Posted by workerthread on June 10, 2009

Originally this post was going to be a lot longer – the plan was to go into lots of detail about how to customise the SharePoint 2007 or Search Server Advanced Search page and extend the Result Type drop-down to allow selection of Acrobat PDF Files.  Then I saw this post over at Get the Point which contains links to several very good articles from Ben Curry and Bill English describing how to customise the SharePoint 2007 Search Centre. 

One of their articles, which you can download as a PDF file, is called Customize the Advanced Search Page in an Office SharePoint Server 2007 Search Center Site and it covers the initial customisation options for Advanced Search web parts very well.  So all I need to do is explain how to use their techniques to add PDF documents as a Result Type.

If we create a document in Microsoft Word 2007, we use standard document properties to create searchable metadata like this:

[Image: Word Document Properties]

And once the document has been uploaded to SharePoint and successfully crawled, those properties become searchable through the Advanced Search page.

Equally, an Acrobat PDF document may contain properties.  Here is the same document converted to PDF, this time showing the properties in Adobe Reader:

[Image: PDF Document Properties]

So how do we make it easier for our SharePoint users to search only within PDF files, and for specific document properties?  As you will see if you download and read the article mentioned above, the trick is to add a new ResultType for PDF files to the Properties XML of the Advanced Search web part.  NOTE: all of this assumes that your SharePoint server is already successfully crawling and indexing Acrobat PDF files, which in turn requires a PDF IFilter to be installed.

The section of the Properties XML which determines what can be selected for the Result Type “Word Documents” looks like this:

<ResultType DisplayName="Word Documents" Name="worddocuments">
    <Query>FileExtension='doc' Or FileExtension='docx' Or FileExtension='dot'</Query>
    <PropertyRef Name="Author" />
    <PropertyRef Name="DocComments"/>
    <PropertyRef Name="Description" />
    <PropertyRef Name="DocKeywords"/>
    <PropertyRef Name="FileName" />
    <PropertyRef Name="Size" />
    <PropertyRef Name="DocSubject"/>
    <PropertyRef Name="Path" />
    <PropertyRef Name="Created" />
    <PropertyRef Name="Write" />
    <PropertyRef Name="CreatedBy" />
    <PropertyRef Name="ModifiedBy" />
    <PropertyRef Name="Title"/>
    <PropertyRef Name="Manager" />
    <PropertyRef Name="Company"/>
</ResultType>

Now, not all of the same properties can be found in a PDF document.  In the test PDF document I created using the Office 2007 PDF add-in, the properties I was able to successfully search in SharePoint were Keywords, Name, Size, Subject, Created Date, Last Modified Date and Title.  And this is the ResultType I added to my Properties XML to make this work:

<ResultType DisplayName="PDF Documents" Name="acrobatpdf">
    <Query>FileExtension='pdf'</Query>
    <PropertyRef Name="DocKeywords"/>
    <PropertyRef Name="FileName" />
    <PropertyRef Name="Size" />
    <PropertyRef Name="DocSubject"/>
    <PropertyRef Name="Created" />
    <PropertyRef Name="Write" />
    <PropertyRef Name="Title"/>
</ResultType>

The whole ResultType element needs to be added immediately above the </ResultTypes> closing tag at the end of the Properties XML.  Once this was in place, I could select the PDF Documents Result Type and search by property, like so:

[Image: PDF Result Type]
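
For orientation, here is a minimal sketch of where the new element sits.  Only the ResultTypes wrapper and the PDF entry from this post are shown.  The comment stands in for the Result Types already present in your own web part, which should be left untouched.  It also assumes that each PropertyRef name matches a property already defined earlier in the same Properties XML, so check this against your own copy before saving.

```xml
<ResultTypes>
    <!-- Existing ResultType elements (All Results, Documents,
         Word Documents and so on) stay here unchanged. -->

    <!-- New entry: goes last, directly above the closing ResultTypes tag. -->
    <ResultType DisplayName="PDF Documents" Name="acrobatpdf">
        <Query>FileExtension='pdf'</Query>
        <PropertyRef Name="DocKeywords"/>
        <PropertyRef Name="FileName" />
        <PropertyRef Name="Size" />
        <PropertyRef Name="DocSubject"/>
        <PropertyRef Name="Created" />
        <PropertyRef Name="Write" />
        <PropertyRef Name="Title"/>
    </ResultType>
</ResultTypes>
```

Note the straight single quotes around pdf in the Query element.  Curly quotes picked up when copying from a web page are not the same characters and are likely to stop the filter from working.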

One word of caution if you plan to edit the Properties XML in the web part: the text editor you get in the web part is pretty rudimentary, as you can see here:

[Image: Advanced Search Properties]

So if you aren’t familiar with XML and don’t have another XML editor to paste the text into, you might want to copy and paste the original text into a Notepad file before you start, so that you have a known-good copy to restore if an edit goes wrong.

By the way, if your documents are currently stored in Office format, and you’re looking for a solution which manages conversion to PDF and transfers metadata across, take a look at the Office to PDF Conversion Solution Accelerator from Bamboo Solutions – more info here.

Update 12 June 09 – Another SharePoint add-on worth a look if you are converting Office documents to PDF is Muhimbi PDF Converter for SharePoint.  Right now it looks like you can convert to PDF from various Word document versions as well as RTF and plain text, although I understand more file types will follow.  They also have a feature which lets you do the PDF conversion from a SharePoint Designer workflow, which looks like it could be useful.  Muhimbi also tell me you get the option to copy metadata across when you create the PDF.  I plan to try this out and hopefully post about it soon.

Posted in Microsoft SharePoint, Office, Search

Bing vs Google

Posted by workerthread on June 4, 2009

Since the launch of Microsoft’s Bing “decision engine” I’ve noticed quite a few bing URLs cropping up in the referrer stats for this blog.  I thought it would be interesting to take an example of one of these referred searches, in this case “add pdf ifilter to sharepoint” and see how it fares with results from Bing and from Google.

Blackdog has a neat split-screen page where you can type your query in one place and compare the Bing and Google results side by side.  So I tried this.

The Bing search returned one of my blog posts in fourth place, as you can see here:

[Image: Bing Search Result]

And Google returned it in third place:

[Image: Google Search Result]

Interestingly though, I’m currently seeing more referrals from Bing.  Maybe Bing users need more PDF iFilter info…

Posted in Search, web development

Learn SharePoint Designer Workflow with Bjørn

Posted by workerthread on June 3, 2009

Until the day when we have a direct USB connection to the brain (USB 4.0?  5.0?) we still need to set aside frequent chunks of time to learn new things, especially if we are working in the SharePoint environment.  Since SharePoint Designer 2007 became a free download, there has been an increased interest in its various uses, both for site customisation and solution development.

Now it’s true to say SPD has its critics, and it’s also true that unless handled with care it can be a dangerous tool.  There have been some interesting posts about its role over at End User SharePoint recently, like this one from Lori Gowin and this from Paul Galvin.

SharePoint Designer workflows provide a code-free method for developing process-based solutions on your site.  They can be developed quickly and I’ve used them many times myself both for prototypes and full solutions.  Of course they do have some drawbacks which can prove painful if you don’t take them into consideration – here are a few:

– No easy way to loop through selected items in a list or library

– Workflows are tied to the list or library you develop them against

– An SPD workflow runs in the context of the user who started the workflow (and respects their permissions)

– You can set a workflow to run when a list item changes, but it doesn’t know what the previous value was

There are various ways around these issues, with many workarounds and open source solutions available online, but it would be difficult to recommend a lot of these in a production environment.  So you need to consider the limitations before you commit too much time to developing that killer solution.

If you want to spend some of your “learning new things time” learning more about SharePoint Designer workflows, you should invest $14.95 in the latest edition of Understanding SharePoint from Bjørn Furuknap.  As well as getting a PDF issue of the journal, you can also download lots of screencasts demonstrating the topics covered (in both .wmv and iPod .m4v format).  Bjørn’s presenting style is very relaxed and humorous, but with lots of attention to detail.  I get the impression he spent a great deal of time putting the screencasts together, and this shows in the quality of the material.

Prior to releasing the full package, Bjørn created some taster screencasts at EnduserSharePoint.com, so you can take a look there if you want more convincing.  By the way, you also get a “bonus issue” covering Nintex Workflow, one of several third-party code-free workflow solutions that I’m hearing lots of good things about.

Update – Bjørn has written a review of Nintex Workflow 2007 for SharePoint Magazine.

Posted in Microsoft SharePoint, Training