[Drupal] How to filter the text in the comments in a Drupal website?

Drupal has a number of contributed modules for filtering the text in comments. One of the most notable is Mollom, which can be used to separate spam comments from genuine ones on a Drupal site. However, if we need an extra level of control over the filtering options, we need to write a custom module. Read on to learn how to filter the text in the comments on a Drupal website.

One of our Drupal clients came up with the following requirements for their Drupal site.

  • Identify posts that are not in English and block/flag them for review.
  • Identify posts with more than 3 links and block/flag them for review.

These are the steps we followed to filter the text in the comments:

  1. To filter out comments that are not in English, we needed a third-party library to detect the language. For that we downloaded the PEAR Language Detection library (Text_LanguageDetect).
  2. Next we renamed the folder to libs and placed it in our custom module.
  3. Since we renamed the directory, we needed to change the data path in the file LanguageDetect.php accordingly, as shown below.
    // Original line:
    return __DIR__ . '/../data/' . $fname;
    // Changed to:
    return drupal_get_path('module', 'module_name') . '/libs/languagedetect/Text' . '/../data/' . $fname;
  4. Next we implemented hook_form_alter() in our custom module to register a new submit handler for the comment form. The submit handler contains the code that filters the comment text, as shown below.
    require_once('libs/languagedetect/Text/LanguageDetect.php');

    /**
     * Implementation of hook_form_alter().
     */
    function module_name_form_alter(&$form, $form_state, $form_id) {
      // The handler name must match the function defined below.
      $form['#submit'][] = 'module_name_comment_form_submit';
    }

    /**
     * Submit handler: sends comments that are not in English, or that
     * contain more than three links, to moderation.
     */
    function module_name_comment_form_submit($form, &$form_state) {
      $flag = 0;
      if ($form['_author']['#post']['form_id'] == 'comment_form') {
        $comment = $form['_author']['#post']['comment'];

        // Check whether the comment includes more than 3 links.
        preg_match_all('/\b(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[-A-Z0-9+&@#\/%=~_|$?!:,.]*[A-Z0-9+&@#\/%=~_|$]/i', $comment, $links, PREG_PATTERN_ORDER);
        if (count($links[0]) > 3) {
          $flag = 1;
        }

        try {
          // Check whether the comment is in English.
          $l = new Text_LanguageDetect();
          $l->setNameMode(2); // Return 2-letter language codes only.
          $result = $l->detect($comment, 4);

          // The first key of the result is the best-scoring language.
          reset($result);
          $lan = key($result);

          if ($lan != 'en') {
            $flag = 1;
          }
        }
        catch (Text_LanguageDetect_Exception $e) {
          // Detection failed; leave the flag unchanged.
        }

        // If the comment is not in English or has more than three links,
        // send it to comment moderation.
        if ($flag == 1) {
          // This handler runs after the comment has been saved, so the newest
          // row is assumed to be the comment that was just submitted.
          $cid = db_result(db_query('SELECT cid FROM {comments} ORDER BY cid DESC LIMIT 1'));
          // In Drupal 6, status 1 means "not published".
          db_query('UPDATE {comments} SET status = 1 WHERE cid = %d', $cid);
        }
      }
    }
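
The two checks in the submit handler can also be tried outside Drupal. The following is a minimal sketch in plain PHP, not part of the module above: the function names (example_count_links, example_top_language, example_needs_moderation) are hypothetical, and the language scores are passed in as an array of language code => score with the best match first, which is the shape the handler above assumes detect() returns.

```php
<?php
// Hypothetical helpers for illustration only; they are not part of Drupal
// or of the Text_LanguageDetect library.

/**
 * Counts URL-like substrings in $text, using the same pattern as the
 * submit handler.
 */
function example_count_links($text) {
  $pattern = '/\b(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[-A-Z0-9+&@#\/%=~_|$?!:,.]*[A-Z0-9+&@#\/%=~_|$]/i';
  // preg_match_all() returns the number of full matches, or FALSE on error.
  $count = preg_match_all($pattern, $text, $matches);
  return $count === FALSE ? 0 : $count;
}

/**
 * Returns the first key of a detection result, or NULL if it is empty.
 *
 * Assumption: $scores maps language codes to scores, best match first.
 */
function example_top_language($scores) {
  if (!is_array($scores) || empty($scores)) {
    return NULL;
  }
  reset($scores);
  return key($scores);
}

/**
 * Decides whether a comment should go to moderation.
 *
 * An empty detection result is treated as "not English" here, which is a
 * choice made for this sketch.
 */
function example_needs_moderation($text, $scores, $max_links = 3) {
  if (example_count_links($text) > $max_links) {
    return TRUE;
  }
  return example_top_language($scores) !== 'en';
}
```

For example, example_needs_moderation('Bonjour', array('fr' => 0.5)) returns TRUE, because the best-scoring language is not 'en'. Keeping the checks in small functions like these would make it possible to test the thresholds without saving a comment.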

References:

  1. Download PEAR Language Detection library
  2. How to Detect Language for a String in PHP
  3. http://stackoverflow.com/questions/2720805/php-regular-expression-to-ge…