CFLib.org – Common Function Library Project

solrClean(input)

Last updated October 02, 2012

Author: Sami Hoda

Version: 2 | Requires: CF9 | Library: UtilityLib

Description:
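Like VerityClean, massages text input to make it Solr compatible: commas become OR, Lucene query-syntax special characters are replaced with spaces, and runs of whitespace are collapsed. NOTE: requires the uCaseWordsForSolr UDF.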

Return Values:
Returns a string.

Example:

<cfset cleanSolrSearchText = solrClean(userSearchText) />
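The one-line example above can be extended into a small search flow. This is a minimal sketch, not part of the original entry: the form field `form.q` and the collection name `myCollection` are hypothetical, and it assumes both solrClean and uCaseWordsForSolr are already loaded on the page.

```cfml
<!--- Assumption: form.q holds the raw user input (hypothetical field name) --->
<cfparam name="form.q" default="" />

<!--- Clean the text before handing it to Solr --->
<cfset cleanSolrSearchText = solrClean(form.q) />

<!--- Input made up only of special characters cleans down to an empty string, so skip the search --->
<cfif len(cleanSolrSearchText)>
    <!--- "myCollection" is a hypothetical Solr collection name --->
    <cfsearch name="searchResults" collection="myCollection" criteria="#cleanSolrSearchText#" />
</cfif>
```

Checking the length first avoids passing empty criteria to the search when the input contained nothing but stripped characters.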

Parameters:

Name     Description              Required
input    String to run against    Yes

Full UDF Source:

/**
 * Like VerityClean, massages text input to make it Solr compatible.
 * v1.0 by Sami Hoda
 * v2.0 by Daria Norris to deal with wildcard characters used as the first letter of the search
 * v2.1 by Paul Alkema - updated list of characters to escape
 * v2.2 by Adam Cameron - Merge Paul's & Daria's versions of the function, improve some regexes, fix logic error with input argument (was both required and had a default), converted wholly to script
 * 
 * @param input      String to run against (Required)
 * @return Returns a string. 
 * @author Sami Hoda (sami@bytestopshere.com) 
 * @version 2.2, October 2, 2012 
 */
string function solrClean(required string input){
    var cleanText = trim(arguments.input);
    // List of bad characters: + - && || ! ( ) { } [ ] ^ " ~ * ? : \
    // http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Escaping Special Characters
    var reBadChars = "\+|-|&&|\|\||!|\(|\)|{|}|\[|\]|\^|\""|\~|\*|\?|\:|\\";
    
    // Replace comma with OR
    cleanText = replace(cleanText, "," , " or " , "all");

    // Strip bad characters
    cleanText = reReplace(cleanText, reBadChars, " ", "all");

    // Clean up sequences of space characters
    cleanText = reReplace(cleanText, "\s+", " ", "all");

    // Remove wildcard characters at the start of the string.
    // Redundant while reBadChars above also strips * and ?, but harmless.
    cleanText = reReplace(cleanText, "(^[\*\?]{1,})", "");

    // Upper-case operator keywords (and -> AND, etc.) and lower-case the rest.
    // Solr treats a mixed-case keyword as case-sensitive.
    cleanText = uCaseWordsForSolr(cleanText);
    return trim(cleanText);
}
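The function above depends on uCaseWordsForSolr, which is not shown on this page. The following is a rough sketch of what such a helper might do, based only on the comment in the source (operators upper-cased, everything else lower-cased). The name `upperCaseSolrOperators` and the operator list `and,or,not,to` are assumptions for illustration; this is not the actual CFLib uCaseWordsForSolr source.

```cfml
<cfscript>
// Hypothetical stand-in, NOT the CFLib uCaseWordsForSolr UDF.
// Assumption: the operator keywords are and, or, not, to.
string function upperCaseSolrOperators(required string input){
    var operators = "and,or,not,to";
    // Lower-case everything first so mixed-case words become plain terms,
    // then split on spaces (empty elements are dropped by default)
    var words = listToArray(lCase(arguments.input), " ");
    var i = 0;

    for (i = 1; i <= arrayLen(words); i++){
        // listFind is case-sensitive, which is fine because words are lower-cased
        if (listFind(operators, words[i])){
            words[i] = uCase(words[i]);
        }
    }
    return arrayToList(words, " ");
}
</cfscript>
```

With this sketch, `upperCaseSolrOperators("Cats And dogs")` yields `cats AND dogs`. For production use, the published uCaseWordsForSolr UDF is the one solrClean expects.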
