Find and extract proper nouns from text
The proper_nouns class finds and extracts proper nouns from a given text using heuristics based on syntactic clues, such as a capitalized first letter and the word's position in the sentence.
It can also combine proper nouns using conjunctions to find multi-word proper nouns.
The class is configurable, so it can be applied to other languages whose grammar follows the same heuristics.
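To make the capitalization and word-position clues concrete, here is a minimal standalone sketch. It is not the class's actual implementation: the function name capitalized_mid_sentence is hypothetical, and the sentence splitting is deliberately naive (an abbreviation such as "Mr." would be treated as the end of a sentence).

```php
<?php
// Hypothetical helper, NOT part of the proper_nouns class.
// Returns capitalized words that do not open a sentence.
function capitalized_mid_sentence(string $text): array
{
    $found = [];
    // Naive sentence split: break after ., ? or ! followed by whitespace.
    $sentences = preg_split('/(?<=[.?!])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($sentences as $sentence) {
        // Collect the words of the sentence (letters only), in order.
        preg_match_all('/\p{L}+/u', $sentence, $m);
        foreach ($m[0] as $i => $word) {
            // Skip the first word: it is capitalized in any case.
            if ($i > 0 && preg_match('/^\p{Lu}/u', $word)) {
                $found[$word] = true; // array key keeps each word once
            }
        }
    }
    return array_keys($found);
}

// Yields Arturs, Sosins and Riga; "My" and "He" open a sentence and are skipped.
print_r(capitalized_mid_sentence('My name is Arturs Sosins. He lives in Riga.'));
```

A word that only ever appears at the start of a sentence cannot be classified by this clue alone, which is the case the possible() option described below is concerned with.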
Example code
<?php
//sample text
$text = "My dear Mr. Bennet, said his lady to him one day, have you heard that Netherfield Park in London is let at last? Mr. Bennet replied that he had not. But it is, returned she for Mrs. Long has just been here, and she told me and Jane all about it. Mr. Bennet made no answer. His wife cried impatiently. Even the kind Dr. Smith knew better. Mr. Bennet was so odd a mixture of quick parts, sarcastic humour, reserve, and caprice, that the experience of three-and-twenty years living in England had been insufficient to make his wife understand his character. Her mind, like her sister Lizzy's, was less difficult to develop.";

include("./proper_nouns.php");

//create instance
$pn = new proper_nouns();

//get array with proper nouns
$arr = $pn->get($text);

echo "<pre>";
//output text
echo $text."\n";
//print result
print_r($arr);
echo "</pre>";
?>
Method list
- Constructor
- Get proper nouns
- Set conjunctions
- Set symbol filter
- Set symbols that need to be ignored
- Set punctuation
- Ignore words from text
- Include acronyms
- Include possible proper nouns
- Generate multiple word proper nouns
- Strict search
Constructor
| Method name | new proper_nouns() |
| Description | Create instance of class |
Get proper nouns
| Method name | get($text) |
| Description | Extract proper nouns from provided text. Returns array of proper nouns found in text |
| Input parameters | string $text - text from which to extract proper nouns |
| Example input | get('My name is Arturs Sosins'); |
| Example output (depends on configuration) | Array ( [0] => Arturs Sosins ) |
Set conjunctions
| Method name | set_conjunctions($arr, $type = "start") |
| Description | Provide words that can be used to connect proper nouns, like 'Mr' in 'Mr John Smith' or 'of' in 'Kingdom of Great Britain' |
| Default values used in class | "start" => array("the", "mr", "mrs", "ms", "dr", "mstr", "miss", "sir"); "middle" => array("of", "the", "and"); "dot" => array("mr", "mrs", "ms", "dr") |
| Input parameters | array $arr - array of conjunction words; string $type - type of conjunction, currently one of "start", "middle" or "dot" (defaults to "start") |
| Example input | set_conjunctions(array('mr', 'ms', 'mrs', 'dr'), 'start'); |
Set symbol filter
| Method name | set_symbols($arr) |
| Description | Set array of symbols to filter out of text, so only words are left |
| Default values used in class | '/','\\','\'','"',"'",',','.', '<','>','?',';',':','[',']','{','}', '|','=','+','-','_',')','(','*','&', '^','%','$','#','@','!','~','`','.', '0','1','2','3','4','5','6','7','8','9' |
| Input parameters | array $arr - array of symbols that need to be filtered out |
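As a rough illustration of what filtering symbols out of text looks like, the following sketch replaces each listed symbol with a space and splits the remainder into words. The function name strip_symbols and the short symbol list are assumptions made for this example; the class may filter differently internally.

```php
<?php
// Hypothetical helper illustrating symbol filtering;
// the class's own implementation may differ.
function strip_symbols(string $text, array $symbols): array
{
    // Replace every listed symbol with a space so adjacent words stay separate.
    $clean = str_replace($symbols, ' ', $text);
    // Split on runs of whitespace and drop empty pieces.
    return preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
}

// Yields Hello, Mr, Bennet, of, Longbourn
print_r(strip_symbols('Hello, Mr. Bennet (of Longbourn)!', [',', '.', '(', ')', '!']));
```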
Set symbols that need to be ignored
| Method name | set_ignore($arr) |
| Description | Set an array of symbols that might appear between the end of one sentence and the beginning of the next |
| Default values used in class | " ", "\n", "\t", "\r", "\r\n" |
| Input parameters | array $arr - array of symbols that need to be ignored |
Set punctuation
| Method name | set_punctuation($arr) |
| Description | Set an array of symbols that might appear at the end or beginning of a sentence |
| Default values used in class | ".", "?", "!", "'", '"' |
| Input parameters | array $arr - array of symbols that may mark the end or beginning of a sentence |
Ignore words from text
| Method name | stop_words($arr) |
| Description | Set an array of words that will be excluded from the result |
| Default values used in class | none |
| Input parameters | array $arr - array with words that should not be included in result |
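The effect of a stop-word list can be pictured with the sketch below. The function name remove_stop_words is hypothetical, and the case-insensitive comparison is an assumption; the source does not say how the class compares words.

```php
<?php
// Hypothetical helper, NOT the class's implementation.
// Assumption: stop words are matched case-insensitively.
function remove_stop_words(array $found, array $stopWords): array
{
    $stop = array_map('strtolower', $stopWords);
    $kept = array_filter($found, function ($noun) use ($stop) {
        return !in_array(strtolower($noun), $stop, true);
    });
    // Re-index so the result is a plain list.
    return array_values($kept);
}

// Yields London, Jane
print_r(remove_stop_words(['London', 'Monday', 'Jane'], ['monday']));
```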
Include acronyms
| Method name | acronyms($bool) |
| Description | Include acronyms in found proper nouns array |
| Default value | true - acronyms are included by default |
| Input parameters | bool $bool - should acronyms be included |
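For illustration only, an acronym could be detected as a word made up of two or more uppercase letters. Both that definition and the function name find_acronyms are assumptions; the source does not describe how the class recognizes acronyms.

```php
<?php
// Hypothetical helper. Assumption: an acronym is a standalone
// run of two or more uppercase letters.
function find_acronyms(string $text): array
{
    preg_match_all('/\b\p{Lu}{2,}\b/u', $text, $m);
    // Keep each acronym once, in order of first appearance.
    return array_values(array_unique($m[0]));
}

// Yields NASA, ESA
print_r(find_acronyms('NASA and the ESA signed it; NASA confirmed.'));
```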
Include possible proper nouns
| Method name | possible($bool) |
| Description | Include words that could possibly be proper nouns but cannot be determined for certain, for example a word that only appears at the beginning of a sentence |
| Default value | false - possible words are not included in result by default |
| Input parameters | bool $bool - should possible proper nouns be included |
Generate multiple word proper nouns
| Method name | multi_words($bool) |
| Description | Generate multi-word proper nouns using the provided conjunctions. Any two proper nouns that are next to each other, or separated only by a conjunction word, will be combined |
| Default value | true - words are combined to multiple word proper nouns by default |
| Input parameters | bool $bool - should words be combined to multiple word proper nouns |
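The combining step can be sketched as follows. This is a simplified illustration, not the class's code: the function name combine_proper_nouns is hypothetical, a word counts as a proper noun here simply because it starts with an uppercase letter, and only a single conjunction between two proper nouns is handled.

```php
<?php
// Hypothetical sketch; names and logic are illustrative only.
function combine_proper_nouns(array $words, array $middle): array
{
    // Simplification: any capitalized word is treated as a proper noun.
    $isProper = function (string $w): bool {
        return preg_match('/^\p{Lu}/u', $w) === 1;
    };
    $result = [];
    $current = [];
    $count = count($words);
    for ($i = 0; $i < $count; $i++) {
        $word = $words[$i];
        if ($isProper($word)) {
            $current[] = $word;
            continue;
        }
        // A conjunction joins two proper nouns only when it sits between them.
        $joins = $current
            && in_array(strtolower($word), $middle, true)
            && $i + 1 < $count
            && $isProper($words[$i + 1]);
        if ($joins) {
            $current[] = $word;
            continue;
        }
        // Any other word closes the current group.
        if ($current) {
            $result[] = implode(' ', $current);
            $current = [];
        }
    }
    if ($current) {
        $result[] = implode(' ', $current);
    }
    return $result;
}

$words = ['the', 'Kingdom', 'of', 'Great', 'Britain', 'and', 'its', 'capital', 'London'];
// Yields "Kingdom of Great Britain" and "London"
print_r(combine_proper_nouns($words, ['of', 'the', 'and']));
```

Note that "and" is not absorbed in this example because the word after it is not a proper noun.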
Strict search
| Method name | strict($bool) |
| Description | Stricter search for proper nouns. For example, only words that start with an uppercase letter everywhere they appear in the text will be included in the results |
| Default value | false - strict mode is not used by default |
| Input parameters | bool $bool - should strict mode be used |
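Putting the options together, a configuration might look like the sketch below. It uses only the methods documented above; the sample sentence and the stop word are made up for illustration, and the output is not shown because it depends on the class's behavior with these settings. It assumes proper_nouns.php from the package sits next to the script.

```php
<?php
// Assumes proper_nouns.php from the package is in the same directory.
$classFile = __DIR__ . '/proper_nouns.php';
if (!is_file($classFile)) {
    exit("proper_nouns.php not found; download the package first.\n");
}
include $classFile;

$pn = new proper_nouns();

// Words that may precede a proper noun, as in 'Mr John Smith'.
$pn->set_conjunctions(array('mr', 'mrs', 'ms', 'dr'), 'start');
// Illustrative stop word; choose your own.
$pn->stop_words(array('Monday'));
// Leave acronyms out, but include uncertain candidates.
$pn->acronyms(false);
$pn->possible(true);
// Keep multi-word combining on and strict mode off (the defaults).
$pn->multi_words(true);
$pn->strict(false);

print_r($pn->get('Mr Bennet met Jane in London on Monday.'));
```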
Latest changes
None for now
Rate us
Try it out and rate it on PHPclasses.org
Support
PHP classes support forum or comments below
Awards
The proper_nouns class was nominated for the June Innovation Award and took 7th place. Thank you for your support.