Tuesday, November 30, 2010

Discovering Discovery: DSpace + Solr tips & tricks


DSpace 1.7.0, which is due for release on December 17th, will include a new module called "DSpace Discovery", contributed by the fine folk at @mire.

Discovery adds three things: the ability to use Apache Solr for search; an XMLUI aspect that replaces (most of) the old 'ArtifactBrowser' aspect and enables easy navigation through configurable facets; and a service that allows external sites to perform searches. In future releases, searching will get even easier as autocompletion is added to search boxes.

It's incredibly easy to set up, and because the Solr index exists alongside your traditional plain-old-Lucene search indices, you can switch back and forth without any hassle: no rebuilds, no re-indexing; just enabling/disabling the relevant XMLUI aspects.

You may have seen similar interfaces on other sites: Solr is used for generic discovery interfaces like Blacklight, as a full-text search module in Drupal, and as a custom solution for in-house sites.

You've also possibly seen DSpace Discovery in action at Dryad, an international biosciences data repository.

You can read more, including the official documentation and the development roadmap, on the DSpace wiki.

I've installed DSpace 1.7, now what do I do?

The Discovery Configuration guide in the DSpace documentation/wiki will get you up and running in no time.

I want to create some custom facets/filters. They don't exist as fields in my metadata registries, so I can't easily configure them in dspace-solr-search.cfg. Can I configure Solr directly?

Yes! Let me give you an example:

(note: please excuse and ignore my horrible usage of qualified DC -- it's just an example!)

I've been working on a new repository/archive for the Archive of Māori and Pacific Music at The University of Auckland Library. We had a few pieces of metadata we wanted to treat differently for the purposes of navigation: a 3-tier "location" for each recording, which we wanted to combine into a single "Place" facet, and fields for both "iwi of the performer" and "iwi of the composer", which we wanted to combine into a single "Iwi" facet.

(for those outside New Zealand, iwi means 'people', and in this case refers to Māori tribal affiliation, e.g. Ngāti Porou or Tainui)

Here's how the Solr schema for DSpace Discovery is configured for faceted/filtered search:

* Defines a dspaceFilter type, which is a fairly simple Solr field type that converts values to lowercase and preserves the entire string as a single token (i.e. no splitting on spaces, commas, etc.)

* Copies every metadata value into a dynamic field named [schema.element.qualifier]_filter, e.g. dc.title_filter or dc.identifier.issn_filter
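To make those two behaviours concrete, here is a small Python model of them. It is only a sketch: the real analysis happens inside Solr according to schema.xml, and both function names are hypothetical helpers, not part of DSpace.

```python
from typing import List, Optional


def filter_field_name(schema: str, element: str, qualifier: Optional[str] = None) -> str:
    """Build the dynamic filter field name described above, e.g. dc.identifier.issn_filter.
    Hypothetical helper: DSpace does this naming internally."""
    parts = [schema, element] + ([qualifier] if qualifier else [])
    return ".".join(parts) + "_filter"


def dspace_filter_tokens(value: str) -> List[str]:
    """Approximate the dspaceFilter type as described: lowercase the value and
    keep the whole string as one token (no splitting on spaces or commas)."""
    return [value.lower()]


print(filter_field_name("dc", "title"))                 # dc.title_filter
print(filter_field_name("dc", "identifier", "issn"))    # dc.identifier.issn_filter
print(dspace_filter_tokens("Hawkes Bay, New Zealand"))  # ['hawkes bay, new zealand']
```

Because the whole string survives as a single lowercase token, a value like "hawkes bay" is counted and filtered as one unit, which is presumably why facet values show up in lowercase.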

So we have three tiers of location data that might look something like:

dc.coverage.spatial_country: "New Zealand"
dc.coverage.spatial_region: "Hawkes Bay"
dc.coverage.spatial_locality: "Waipukurau"

Now, we edit [dspace]/solr/search/conf/schema.xml and add the following new field definitions beneath the definitions for internal fields like "search.resourceid":

<field name="spatial_filter" type="dspaceFilter" indexed="true" stored="true" multiValued="true"/>
<copyField source="dc.coverage.*" dest="spatial_filter"/>

This will take all values from qualified dc.coverage fields (i.e. any field whose name begins with "dc.coverage.") and copy them into a new spatial_filter field, which can then be referenced in dspace-solr-search.cfg when configuring your facets/filters.

Note that this particular example would also copy dc.coverage.temporal values, if any existed. The narrower pattern dc.coverage.spatial* is strictly better for this example, but element-level wildcards like dc.coverage.* are what most use cases call for (e.g. dc.subject.*, dc.identifier.*, dc.contributor.*, dc.title.*).
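A quick way to see what a wildcard source picks up is to simulate copyField in Python. This sketch assumes an item is a dict mapping Solr field names to raw values; it uses fnmatch, which is more permissive than Solr (Solr only allows a single leading or trailing *), and it models the destination's dspaceFilter type as plain lowercasing. The title and dc.coverage.temporal values are hypothetical, added only to show the difference between the two patterns.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, List

Doc = Dict[str, List[str]]


def copy_field_terms(doc: Doc, source_glob: str,
                     analyze: Callable[[str], str] = str.lower) -> List[str]:
    """Return the terms a multiValued copyField destination would index:
    raw values from every field whose name matches source_glob, each passed
    through the destination field type's analysis (lowercasing here)."""
    terms: List[str] = []
    for field, values in doc.items():
        if fnmatchcase(field, source_glob):
            terms.extend(analyze(v) for v in values)
    return terms


item: Doc = {
    "dc.title": ["Example recording"],              # hypothetical title
    "dc.coverage.spatial_country": ["New Zealand"],
    "dc.coverage.spatial_region": ["Hawkes Bay"],
    "dc.coverage.spatial_locality": ["Waipukurau"],
    "dc.coverage.temporal": ["1960s"],              # hypothetical, to show the overlap
}

print(copy_field_terms(item, "dc.coverage.*"))
# ['new zealand', 'hawkes bay', 'waipukurau', '1960s']
print(copy_field_terms(item, "dc.coverage.spatial*"))
# ['new zealand', 'hawkes bay', 'waipukurau']
```

Neither pattern matches an unqualified dc.coverage field, since both require the text after "dc.coverage" to start with a dot.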

Now all that's left is to add our new "spatial" field to our lists of facets and filters in [dspace]/config/dspace-solr-search.cfg, rebuild our Discovery index (I recommend deleting and rebuilding it whenever you alter schema.xml), and create some new i18n labels to display in XMLUI.
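Before writing the i18n labels, it can be worth checking that the rebuilt index actually contains the new field by asking Solr for facet counts directly. The sketch below uses only standard Solr request parameters (q, rows, facet, facet.field, facet.limit, facet.mincount, wt). The URL http://localhost:8080/solr/search is an assumption; point it at wherever your Discovery Solr core is deployed. The helper names are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: adjust to the address of your Discovery Solr core.
SOLR_SELECT_URL = "http://localhost:8080/solr/search/select"


def facet_query_url(field: str, base: str = SOLR_SELECT_URL, limit: int = 10) -> str:
    """Build a query that returns no documents, only facet counts for `field`."""
    params = {
        "q": "*:*",
        "rows": 0,              # we only want facet counts, not documents
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
        "facet.mincount": 1,    # skip values with no matching documents
        "wt": "json",
    }
    return base + "?" + urlencode(params)


def parse_facet_counts(response: Dict[str, Any], field: str) -> List[Tuple[str, int]]:
    """Solr's default JSON facet output is a flat [term, count, term, count, ...] list."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


def fetch_facets(field: str) -> List[Tuple[str, int]]:
    """Query Solr over HTTP and return (value, count) pairs for `field`."""
    with urlopen(facet_query_url(field)) as resp:
        return parse_facet_counts(json.load(resp), field)
```

Calling fetch_facets("spatial_filter") should return (value, count) pairs such as ("new zealand", n); an empty list usually means the copyField is not matching anything or the index has not been rebuilt.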

DSpace Discovery will now surface our new "Place" facet, which we've created without touching our stored metadata or legacy browse/search indices. Check it out:

If we select "new zealand" and "waikato" to filter our results, the Place facet will now show us only places within "Waikato, New Zealand".
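Under the hood, each selected value roughly corresponds to a Solr filter query (for example fq=spatial_filter:"new zealand" plus fq=spatial_filter:"waikato"), and facet counts are then computed only over the documents that pass every filter. The in-memory Python sketch below mimics that drill-down; the records are hypothetical and the helper is not part of DSpace.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, List[str]]


def drill_down(records: List[Record], field: str,
               selected: Iterable[str]) -> Tuple[List[Record], Counter]:
    """Keep records whose `field` contains every selected value (like stacking
    one filter query per selection), then count `field` values among them."""
    wanted = set(selected)
    matches = [r for r in records if wanted <= set(r.get(field, []))]
    counts: Counter = Counter()
    for r in matches:
        # dict.fromkeys de-duplicates values within a record, keeping their order
        counts.update(dict.fromkeys(r.get(field, []), 1))
    return matches, counts


records: List[Record] = [  # hypothetical records
    {"spatial_filter": ["new zealand", "waikato", "hamilton"]},
    {"spatial_filter": ["new zealand", "waikato", "raglan"]},
    {"spatial_filter": ["new zealand", "hawkes bay", "waipukurau"]},
]

matches, counts = drill_down(records, "spatial_filter", ["new zealand", "waikato"])
print(len(matches))                                                # 2
print(counts["hamilton"], counts["raglan"], counts["waipukurau"])  # 1 1 0
```

Solr itself would also report the selected values ("new zealand" and "waikato") with a count equal to the number of matches; whether those are shown again is up to the interface.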


And that's all! The data does most of the work for us, and DSpace Discovery handles the rest.

In DSpace 1.6.x, I could export a CSV containing item metadata from my search results. Is that possible in DSpace Discovery?

Yes, sort of -- I've written an updated CSV exporter for XMLUI to work with Discovery, but it wasn't written in time for 1.7. It should be in the next release, and I will put a patch up on JIRA shortly for those who wish to use it with 1.7.0.

You mentioned the ability for external sites to query DSpace Discovery -- tell me more!


I'd love to, but I haven't played around with it quite enough to feel like I could do this topic justice -- watch this space!

If you have any questions or tips to share about DSpace Discovery or Apache Solr, please send me an email or leave a comment, or hop over to the DSpace Mailing Lists.

Comments:

  1. Thank you very much, Kim. I have just added two facets based on your knowledge.