Find and extract proper nouns from text

The proper nouns class finds and extracts proper nouns from a given text using heuristics based on syntactic clues, such as an uppercase first letter and the word's position in the sentence.

It can also combine proper nouns through conjunctions to find multi-word proper nouns.

The class is customizable, so it can be applied to other languages whose grammar follows the same heuristics.
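
To make the two clues named above concrete (uppercase first letter, position in the sentence), here is a minimal, self-contained sketch. It is not the code of the proper nouns class; `naive_proper_nouns()` is a hypothetical helper written only for illustration, and it assumes English-style sentences ending in ".", "?" or "!".

```php
<?php
// Simplified illustration only; NOT the implementation of the proper_nouns class.
// naive_proper_nouns() is a hypothetical helper name.
function naive_proper_nouns(string $text): array
{
    $found = array();
    // Split into sentences after ".", "?" or "!" followed by whitespace.
    $sentences = preg_split('/(?<=[.?!])\s+/', trim($text));
    foreach ($sentences as $sentence) {
        // Keep only words made of letters and apostrophes.
        preg_match_all("/[A-Za-z']+/", $sentence, $m);
        foreach ($m[0] as $i => $word) {
            // The first word of a sentence is capitalized anyway,
            // so its capital letter is not a useful clue.
            if ($i === 0) {
                continue;
            }
            if (ctype_upper($word[0])) {
                $found[$word] = true; // array keys remove duplicates
            }
        }
    }
    return array_keys($found);
}

print_r(naive_proper_nouns("We visited London last year. Then Jane moved to England."));
```

With this naive rule the call above collects London, Jane and England. Words such as "I", or abbreviations like "Mr." that contain a dot, would confuse it, which is the kind of case the class's configuration options are meant to handle.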

Example code

<?php
//sample text
$text = "My dear Mr. Bennet, said his lady to him one day,
have you heard that Netherfield Park in London is let at last? 
Mr. Bennet replied that he had not.
But it is, returned she for Mrs. Long has just been here, 
and she told me and Jane all about it.
Mr. Bennet made no answer. His wife cried impatiently. 
Even the kind Dr. Smith knew better.  Mr. Bennet was so odd a mixture of quick
parts, sarcastic humour, reserve, and caprice, 
that the experience of three-and-twenty years living in England had been insufficient to
make his wife understand his character. 
Her mind, like her sister Lizzy's, was less difficult to develop.";

include("./proper_nouns.php");

//create instance
$pn = new proper_nouns();

//get array with proper nouns
$arr = $pn->get($text);

echo "<pre>";
//output text
echo $text."\n";
//print result
print_r($arr);
echo "</pre>";
?>


Constructor

Method name: new proper_nouns()
Description: Create an instance of the class

Get proper nouns

Method name: get($text)
Description: Extract proper nouns from the provided text. Returns an array of the proper nouns found in the text
Input parameters

string $text - text from which to extract proper nouns

Example input
get('My name is Arturs Sosins');
Example output
//depends on configuration
Array
(
    [0] => Arturs Sosins
)

Set conjunctions

Method name: set_conjunctions($arr, $type = "start")
Description: Provide words that can connect the parts of a proper noun, like 'Mr' in 'Mr John Smith' or 'of' in 'Kingdom of Great Britain'
Default values used in class

"start" => array("the", "mr", "mrs", "ms", "dr", "mstr", "miss", "sir")

"middle" => array("of", "the", "and")

"dot" => array("mr", "mrs", "ms", "dr")

Input parameters

array $arr - array with conjunction words

string $type - type of conjunction; there are currently 3 types:

  • start - words that can appear at the beginning of a proper noun, like "Mr."
  • middle - words that must be enclosed by proper nouns, i.e. appear between them, for example the word "of".
  • dot - words that may be followed by a dot that does not mark the end of a sentence, for example the word "Mr.".

Example input
set_conjunctions(array('mr', 'ms', 'mrs', 'dr'), 'start');
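
The purpose of the "dot" type can be illustrated with a small sentence splitter. The sketch below is an assumption about how such a list might be used, not the class's own code; `split_sentences()` is a hypothetical helper, and the dot words are expected in lowercase, as in the defaults above.

```php
<?php
// Illustration only; NOT the implementation of the proper_nouns class.
// split_sentences() is a hypothetical helper name.
function split_sentences(string $text, array $dot_words): array
{
    $sentences = array();
    $current = '';
    // Walk through the whitespace-separated tokens of the text.
    foreach (preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY) as $token) {
        $current = ($current === '') ? $token : $current . ' ' . $token;
        $ends_with_mark = preg_match('/[.?!]$/', $token) === 1;
        // "Mr." ends with a dot, but "mr" is in the dot list,
        // so the dot does not close the sentence.
        $is_abbreviation = in_array(strtolower(rtrim($token, '.')), $dot_words, true);
        if ($ends_with_mark && !$is_abbreviation) {
            $sentences[] = $current;
            $current = '';
        }
    }
    // Keep any trailing text that has no closing punctuation.
    if ($current !== '') {
        $sentences[] = $current;
    }
    return $sentences;
}

$dot = array("mr", "mrs", "ms", "dr");
print_r(split_sentences("Mr. Bennet made no answer. His wife cried impatiently.", $dot));
```

Without the dot list, "Mr." would be treated as a one-word sentence and "Bennet" would look like a sentence-initial word, which weakens the capitalization clue.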

Set symbol filter

Method name: set_symbols($arr)
Description: Set the array of symbols to filter out of the text, so that only words are left
Default values used in class

'/','\\','\'','"',"'",',','.', '<','>','?',';',':','[',']','{','}', '|','=','+','-','_',')','(','*','&', '^','%','$','#','@','!','~','`','.', '0','1','2','3','4','5','6','7','8','9'

Input parameters

array $arr - array with symbols that need to be filtered
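
As a rough idea of what such a filter does, the following sketch replaces every listed symbol with a space and then splits the text into words. Whether the class replaces symbols with spaces or simply deletes them is not stated here, so treat that detail as an assumption; `strip_symbols()` is a hypothetical helper.

```php
<?php
// Illustration only; NOT the implementation of the proper_nouns class.
// strip_symbols() is a hypothetical helper name.
function strip_symbols(string $text, array $symbols): array
{
    // Replace every listed symbol with a space (assumed behaviour).
    $clean = str_replace($symbols, ' ', $text);
    // Split on runs of whitespace and drop empty pieces.
    return preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
}

$symbols = array('(', ')', '.', '!', '2', '3');
print_r(strip_symbols("Jane (aged 23) met Mr. Bennet!", $symbols));
```

Only the words Jane, aged, met, Mr and Bennet remain after filtering.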

Set symbols to be ignored

Method name: set_ignore($arr)
Description: Set the array of symbols that might appear between the end of one sentence and the beginning of another
Default values used in class

" ", "\n", "\t", "\r", "\r\n"

Input parameters

array $arr - array with symbols that need to be ignored

Set punctuation

Method name: set_punctuation($arr)
Description: Set the array of symbols that might appear at the end or beginning of a sentence
Default values used in class

".", "?", "!", "'", '"'

Input parameters

array $arr - array with symbols that may mark the end or beginning of a sentence

Ignore words from text

Method name: stop_words($arr)
Description: Set an array of words that will not be included in the result
Default values used in class

none

Input parameters

array $arr - array with words that should not be included in result
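
A stop-word list amounts to a filter over the result array. The sketch below shows one way to do it; the case-insensitive comparison is an assumption, since it is not documented how the class compares words, and `remove_stop_words()` is a hypothetical helper.

```php
<?php
// Illustration only; NOT the implementation of the proper_nouns class.
// remove_stop_words() is a hypothetical helper name.
function remove_stop_words(array $nouns, array $stop_words): array
{
    // Compare in lowercase so "Even" and "even" are treated alike (assumption).
    $stop = array_map('strtolower', $stop_words);
    $kept = array_filter($nouns, function ($noun) use ($stop) {
        return !in_array(strtolower($noun), $stop, true);
    });
    // Re-index so the result is a plain list again.
    return array_values($kept);
}

print_r(remove_stop_words(array('London', 'Even', 'Jane'), array('even')));
```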

Include acronyms

Method name: acronyms($bool)
Description: Include acronyms in the array of found proper nouns
Default value

true - acronyms are included by default

Input parameters

bool $bool - should acronyms be included
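
The page does not define what counts as an acronym. The sketch below assumes "a word of two or more uppercase letters" purely for illustration; `is_acronym()` and `filter_acronyms()` are hypothetical helpers, not methods of the class.

```php
<?php
// Illustration only; NOT the implementation of the proper_nouns class.
// Assumed definition: an acronym is a word of two or more uppercase letters.
function is_acronym(string $word): bool
{
    return preg_match('/^[A-Z]{2,}$/', $word) === 1;
}

// Mirrors the idea of acronyms($bool): keep or drop acronyms from a result list.
function filter_acronyms(array $nouns, bool $include): array
{
    if ($include) {
        return $nouns;
    }
    return array_values(array_filter($nouns, function ($noun) {
        return !is_acronym($noun);
    }));
}

print_r(filter_acronyms(array('NASA', 'London', 'UK'), false));
```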

Include possible proper nouns

Method name: possible($bool)
Description: Include words that could possibly be proper nouns but cannot be determined for certain, for example a word that only appears at the beginning of a sentence
Default value

false - possible words are not included in result by default

Input parameters

bool $bool - should possible proper nouns be included
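
The difference between a certain and a possible proper noun can be sketched as follows. This is an illustrative simplification, not the class's own logic; `classify_capitalized()` is a hypothetical helper that expects the text already split into sentences.

```php
<?php
// Illustration only; NOT the implementation of the proper_nouns class.
// classify_capitalized() is a hypothetical helper name.
function classify_capitalized(array $sentences): array
{
    $certain = array();
    $possible = array();
    foreach ($sentences as $sentence) {
        preg_match_all('/[A-Za-z]+/', $sentence, $m);
        foreach ($m[0] as $i => $word) {
            if (!ctype_upper($word[0])) {
                continue;
            }
            if ($i === 0) {
                // Capitalized only because it starts the sentence? Unknown yet.
                $possible[$word] = true;
            } else {
                // Capitalized in mid-sentence: a strong clue.
                $certain[$word] = true;
            }
        }
    }
    // A word confirmed in mid-sentence is no longer merely "possible".
    $possible = array_diff_key($possible, $certain);
    return array(
        'certain'  => array_keys($certain),
        'possible' => array_keys($possible),
    );
}

print_r(classify_capitalized(array(
    "London is large.",
    "Jane likes London.",
    "Even Jane agrees.",
)));
```

Here London and Jane are confirmed by a mid-sentence occurrence, while "Even" is seen only at the start of a sentence and so stays in the possible group.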

Generate multiple word proper nouns

Method name: multi_words($bool)
Description: Generate multi-word proper nouns using the provided conjunctions. Any two proper nouns that are adjacent, or separated only by a conjunction word, will be combined
Default value

true - words are combined to multiple word proper nouns by default

Input parameters

bool $bool - should words be combined to multiple word proper nouns
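
The combining step described above can be sketched as a single pass over the words of a text. This is a simplified guess at the idea, not the class's actual algorithm; `combine_proper_nouns()` is a hypothetical helper, and it assumes the single-word proper nouns have already been detected.

```php
<?php
// Illustration only; NOT the implementation of the proper_nouns class.
// combine_proper_nouns() is a hypothetical helper name.
// $words:  the text as a list of words, in order
// $proper: single words already detected as proper nouns
// $start / $middle: lowercase conjunction lists, as in set_conjunctions()
function combine_proper_nouns(array $words, array $proper, array $start, array $middle): array
{
    $result = array();
    $phrase = array();
    $count = count($words);
    for ($i = 0; $i < $count; $i++) {
        $word = $words[$i];
        $lower = strtolower($word);
        $next_is_proper = $i + 1 < $count && in_array($words[$i + 1], $proper, true);
        if (in_array($word, $proper, true)) {
            // Adjacent proper nouns extend the current phrase.
            $phrase[] = $word;
        } elseif ($phrase === array() && $next_is_proper && in_array($lower, $start, true)) {
            // A "start" word opens a phrase only if a proper noun follows.
            $phrase[] = $word;
        } elseif ($phrase !== array() && $next_is_proper && in_array($lower, $middle, true)) {
            // A "middle" word must be enclosed by proper nouns on both sides.
            $phrase[] = $word;
        } elseif ($phrase !== array()) {
            // Any other word closes the current phrase.
            $result[] = implode(' ', $phrase);
            $phrase = array();
        }
    }
    if ($phrase !== array()) {
        $result[] = implode(' ', $phrase);
    }
    return $result;
}

$words  = array('the', 'old', 'Kingdom', 'of', 'Great', 'Britain', 'welcomed', 'Mr', 'Smith');
$proper = array('Kingdom', 'Great', 'Britain', 'Smith');
print_r(combine_proper_nouns($words, $proper, array('the', 'mr'), array('of', 'the', 'and')));
```

The sample produces the two phrases "Kingdom of Great Britain" and "Mr Smith". Note that with "and" in the middle list, "Jane and Mary" would also be merged into one phrase, so the conjunction lists are worth tuning for your text.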

Strict search

Method name: strict($bool)
Description: Use a stricter search for proper nouns. For example, only words whose first letter is uppercase throughout the whole text will appear in the results
Default value

false - strict mode is not used by default

Input parameters

bool $bool - should strict mode be used
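
The options documented above can be set one after another before calling get(). The sketch below only uses method names and argument shapes taken from this page; `configure_extractor()` is a hypothetical wrapper, the chosen values are arbitrary examples, and it is an assumption that set_conjunctions() replaces the default list rather than appending to it.

```php
<?php
// configure_extractor() is a hypothetical helper; it simply calls the
// documented setters on whatever proper_nouns instance it is given.
function configure_extractor($pn)
{
    // Pass full lists, in case the defaults are replaced (assumption).
    $titles = array('mr', 'mrs', 'ms', 'dr', 'prof');
    $pn->set_conjunctions($titles, 'start');
    $pn->set_conjunctions($titles, 'dot');
    // Words that should never be reported.
    $pn->stop_words(array('Even', 'But'));
    $pn->acronyms(false);    // leave acronyms out
    $pn->possible(false);    // skip uncertain, sentence-initial words
    $pn->multi_words(true);  // combine into multi-word proper nouns
    $pn->strict(true);       // stricter matching
    return $pn;
}

// Typical use, assuming proper_nouns.php is in the same directory:
//   include("./proper_nouns.php");
//   $pn = configure_extractor(new proper_nouns());
//   print_r($pn->get($text));
```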

Latest changes

None for now


Support

PHP classes support forum or comments below

Awards

The proper nouns class was nominated for the June Innovation Award and took 7th place. Thank you for your support.

