Wednesday, June 20, 2012

Interact Intranet Thesaurus - under the covers

If you go into the thesaurus section of Interact's site admin area and see a blank list, you can easily be tricked into thinking that there simply aren't any entries.  Try searching for a two-letter combination such as DD and you'll see that there are in fact quite a lot of entries.

Some of the default pairings make perfect sense, others are colloquial, and some are a bit more mysterious.

Examples (word = synonym):
Trimorphodon = genus Trimorphodon
Myrciaria cauliflora = jaboticaba tree
Second Earl of Guilford = north
charity toss = foul shot
Anthophyta = class Angiospermae
half-wit = thicko
hebdomad = week

How it is Used

The thesaurus is used to display a list of terms similar to what the user has searched for.  So if you search for "Second Earl of Guilford", it will display an area that says "Did you mean? north".
From my testing, the search results themselves don't appear to be influenced by the thesaurus in any way, but I can't say that with 100% certainty.

Additionally, the thesaurus is used to build the list of keywords that users are prompted with when they are creating documents, categories, and sections, to help them make their content easier to find.

I appreciate why this approach was taken, and it certainly fits the term thesaurus.  In my previous intranet experience, thesaurus lists have been used to add to search results by executing what is essentially an OR-based search.  For example, in our organization we use HBWW as an acronym for Healthy Babies are Worth the Wait.  Our content is inconsistent about which of those is used in titles/descriptions, so under the Interact thesaurus system users need to search for both, and the result set would be different for each.  With an OR-based search on the same terms, the result set would be identical and inclusive.
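
To make the difference concrete, here is a minimal Python sketch of an OR-based search. It is not how Interact works; the synonym table, the sample documents, and the function names (expand, or_search) are all made up for illustration, and real search engines match far more cleverly than a substring test.

```python
# Hypothetical synonym pairs; in practice these would come from the thesaurus.
SYNONYMS = {
    "hbww": ["healthy babies are worth the wait"],
    "healthy babies are worth the wait": ["hbww"],
}

# Made-up documents standing in for intranet content.
DOCUMENTS = {
    1: "HBWW campaign toolkit",
    2: "Healthy Babies are Worth the Wait: program overview",
    3: "Cafeteria menu",
}


def expand(term):
    """Return the search term plus any synonyms, all lowercased."""
    key = term.strip().lower()
    return [key] + SYNONYMS.get(key, [])


def or_search(term):
    """Return IDs of documents that contain the term OR any of its synonyms."""
    variants = expand(term)
    return sorted(
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if any(variant in text.lower() for variant in variants)
    )


print(or_search("HBWW"))
print(or_search("Healthy Babies are Worth the Wait"))
```

Both calls return the same two documents, which is the "identical and inclusive" behavior I'm describing: it no longer matters which form the author happened to use in the title.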

Extracting the Synonym List: Database Query

If you want to examine the out-of-the-box thesaurus list, you'll need to dig into the SQL database.  There are 13 tables dedicated to the thesaurus, yet only 2 of them are populated with any content.  I've added a few thesaurus entries, but the empty tables still haven't changed, so I'll continue to look for how they are used.  It could be that the other tables populate through site usage.

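 -- Each row pairs a word with one of its synonyms.
 -- THES_WORDS is joined twice to resolve both IDs to their text.
 SELECT SYN.WordID, WORDS1.Word AS Word, SYN.SynonymID, WORDS2.Word AS SynonymWord
  FROM Interact.dbo.THES_SYNONYMS AS SYN
    INNER JOIN Interact.dbo.THES_WORDS AS WORDS1
        ON SYN.WordID = WORDS1.WordID
    INNER JOIN Interact.dbo.THES_WORDS AS WORDS2
        ON SYN.SynonymID = WORDS2.WordID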
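
If the double join on THES_WORDS looks odd, this small Python sketch reproduces the idea against an in-memory SQLite database. The table layout and column names are only what I can infer from the query above, not Interact's actual schema, and the IDs and aliases (W1, W2) are made up.

```python
import sqlite3

# Throwaway in-memory database; nothing here touches a real Interact install.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE THES_WORDS (WordID INTEGER PRIMARY KEY, Word TEXT);
    CREATE TABLE THES_SYNONYMS (WordID INTEGER, SynonymID INTEGER);
    INSERT INTO THES_WORDS VALUES
        (1, 'hebdomad'), (2, 'week'), (3, 'half-wit'), (4, 'thicko');
    INSERT INTO THES_SYNONYMS VALUES (1, 2), (3, 4);
""")

# THES_SYNONYMS only stores ID pairs, so THES_WORDS is joined once per ID
# to turn each side of the pair back into readable text.
rows = conn.execute("""
    SELECT W1.Word, W2.Word
    FROM THES_SYNONYMS AS SYN
        INNER JOIN THES_WORDS AS W1 ON SYN.WordID = W1.WordID
        INNER JOIN THES_WORDS AS W2 ON SYN.SynonymID = W2.WordID
    ORDER BY W1.Word
""").fetchall()

for word, synonym in rows:
    print(word, "=", synonym)
```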

Deleting the Full List

I'm waiting on verification from Interact on how to delete the entire list, in case we need to.  My assumption is that I could delete the full contents of the tables, but there might be a catch in how the IDs are generated/incremented, and I don't want to screw anything up.

My Advice

If you are moving from an existing intranet to Interact, you might want to compare your top 100 search terms against the thesaurus.
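
One way to do that comparison is sketched below in Python. It assumes you can get your top search terms out of your old intranet's search logs and a plain list of thesaurus words out of the Word column in THES_WORDS; the function name split_by_thesaurus and the sample data are made up for illustration.

```python
def split_by_thesaurus(search_terms, thesaurus_words):
    """Split search terms into those the thesaurus knows and those it lacks."""
    # Normalize case and whitespace so "Week " and "week" count as the same word.
    known = {word.strip().lower() for word in thesaurus_words}
    covered, missing = [], []
    for term in search_terms:
        (covered if term.strip().lower() in known else missing).append(term)
    return covered, missing


# Made-up sample data for illustration only.
top_terms = ["HBWW", "week", "benefits"]
thesaurus = ["hebdomad", "week", "half-wit", "thicko"]

covered, missing = split_by_thesaurus(top_terms, thesaurus)
print("Already in thesaurus:", covered)
print("Candidates to add:", missing)
```

The "candidates to add" list is the interesting one: those are the terms your users actually search for that the default thesaurus knows nothing about.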
