Natural Language Processing with JavaScript using Compromise
Basic Natural Language Processing (NLP) in JavaScript/Node.js with compromise, a library dedicated to NLP that covers a wide range of tasks.
Introduction
Natural Language Processing (NLP) is one of the most widely used and discussed areas of computing. In programming, languages such as Python, R, and Java are the usual choices for NLP because of their large libraries and communities. Today we will show how to apply NLP concepts using JavaScript.
Installation
If you are setting up a Node project, install it with npm:
npm install compromise
Add the following field to package.json if you get a module/import error. It tells Node to treat .js files as ES modules, which the import syntax used below requires.
"type": "module"
On the client side, load it from a CDN instead:
<script src="https://unpkg.com/compromise"></script>
Setup
Now let's import the library into the project.
import nlp from "compromise";
Create a document by passing the demo string to nlp().
const text = "I was there, suddenly raining starts. I am still sick!";
const doc = nlp(text);
nlp() returns a document object for the input string, which the examples below work with.
Examples
Tenses
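Convert the document's sentences to different tenses, e.g. past, present, and future. These methods change the document in place, so to get each result below from the original text, create a fresh document with nlp(text) before each call.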
console.log(`Past Tense: ${doc.sentences().toPastTense().text()}`);
// Past Tense: I was there, suddenly raining starts. I was still sick!
console.log(`Present Tense: ${doc.sentences().toPresentTense().text()}`);
// Present Tense: I am there, suddenly raining starts. I am still sick!
console.log(`Future Tense: ${doc.sentences().toFutureTense().text()}`);
// Future Tense: I will be there, suddenly raining starts. I will be still sick!
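compromise works out which words are verbs before conjugating them, which is what makes these calls general. For intuition only, here is a toy sketch in plain JavaScript that reproduces the three outputs above. The helper toTense and its BE_FORMS table are hypothetical: they only swap the first-person forms of "to be" found in the demo string and are not how compromise works internally.

```javascript
// Hypothetical toy helper: swaps first-person forms of "to be" only.
// Real conjugation needs to know which words are verbs and who the subject is.
const BE_FORMS = { past: "was", present: "am", future: "will be" };

function toTense(text, tense) {
  const target = BE_FORMS[tense];
  if (target === undefined) throw new Error(`Unknown tense: ${tense}`);
  // "will be" is listed first so the two-word form is matched as a whole.
  return text.replace(/\b(?:will be|was|am)\b/g, target);
}

const text = "I was there, suddenly raining starts. I am still sick!";
console.log(toTense(text, "past"));    // I was there, suddenly raining starts. I was still sick!
console.log(toTense(text, "present")); // I am there, suddenly raining starts. I am still sick!
console.log(toTense(text, "future"));  // I will be there, suddenly raining starts. I will be still sick!
```

Like compromise's output above, the sketch leaves "starts" untouched, because it only knows about "to be".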
Negative Statement
Convert regular or positive statements to negative statements.
console.log(doc.sentences().toNegative().text())
// I was not there, suddenly raining starts. I am not still sick!
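A simple way to picture what happened: a "not" was inserted after each form of "to be". The sketch below is a hypothetical, rule-based toNegative helper that handles only that case (compromise covers many more verb patterns); it reproduces the output above.

```javascript
// Hypothetical toy helper: negates by inserting "not" after forms of "to be".
// Words with punctuation attached (e.g. "is.") are not recognised.
const COPULAS = new Set(["am", "is", "are", "was", "were"]);

function toNegative(text) {
  return text
    .split(" ")
    .flatMap((word) => (COPULAS.has(word.toLowerCase()) ? [word, "not"] : [word]))
    .join(" ");
}

console.log(toNegative("I was there, suddenly raining starts. I am still sick!"));
// I was not there, suddenly raining starts. I am not still sick!
```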
Sentence Metadata
Let's look at the metadata of each sentence. json() returns an array with one object per sentence, containing the text string, the sentence details (subject, verb, predicate, etc.), and a terms array.
console.log(doc.sentences().json())
// [
// {
// text: "I was there, suddenly raining starts.",
// terms: [[Object], [Object], [Object], [Object], [Object], [Object]],
// sentence: { subject: "i", verb: "was", predicate: "there starts" },
// },
// {
// text: "I am still sick!",
// terms: [[Object], [Object], [Object], [Object]],
// sentence: { subject: "i", verb: "am still", predicate: "sick" },
// }
// ]
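The first step behind this output is splitting the text into sentences. As a rough illustration, here is a hypothetical splitSentences helper that cuts after runs of ".", "!" or "?". It is a naive stand-in, not compromise's tokenizer: a real one also has to cope with abbreviations such as "Dr." and decimals such as the "1.5" used later in this article.

```javascript
// Hypothetical toy splitter: cuts after ".", "!" or "?" (or at the end of text).
// It would wrongly split inside "Dr. Smith" or "$1.5".
function splitSentences(text) {
  const matches = text.match(/[^.!?]+(?:[.!?]+|$)/g) || [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

console.log(splitSentences("I was there, suddenly raining starts. I am still sick!"));
// [ 'I was there, suddenly raining starts.', 'I am still sick!' ]
```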
Metadata Details
As we saw above, every sentence has an array of objects named terms. Let's look inside it: each entry describes one word with attributes such as its text, the pre/post punctuation and whitespace, tags, normalized form, index, id, chunk, and a dirty flag.
console.log(doc.sentences().json()[1].terms);
// [
// {
// text: 'I',
// pre: '',
// post: ' ',
// tags: [ 'Noun', 'Pronoun' ],
// normal: 'i',
// index: [ 1, 0 ],
// id: 'i|00700100W',
// chunk: 'Noun',
// dirty: true
// },
// {
// text: 'am',
// pre: '',
// post: ' ',
// tags: [ 'Verb', 'Copula', 'PresentTense' ],
// normal: 'am',
// index: [ 1, 1 ],
// id: 'am|00800101O',
// chunk: 'Verb',
// dirty: true
// },
// {
// text: 'still',
// pre: '',
// post: ' ',
// tags: [ 'Adverb' ],
// normal: 'still',
// index: [ 1, 2 ],
// id: 'still|00900102C',
// dirty: true,
// chunk: 'Verb'
// },
// {
// text: 'sick',
// pre: '',
// post: '!',
// tags: [ 'Adjective' ],
// normal: 'sick',
// index: [ 1, 3 ],
// id: 'sick|00A00103B',
// dirty: true,
// chunk: 'Adjective'
// }
// ]
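The pre and post fields are what let a document be turned back into text: punctuation and spaces are stored there rather than in text. The sketch below takes the four terms above (trimmed to a few fields) and rebuilds the sentence with a small hypothetical rebuild helper; it only uses the data shown, not the compromise API.

```javascript
// Terms copied (trimmed to a few fields) from the json() output above.
const terms = [
  { text: "I", pre: "", post: " ", tags: ["Noun", "Pronoun"] },
  { text: "am", pre: "", post: " ", tags: ["Verb", "Copula", "PresentTense"] },
  { text: "still", pre: "", post: " ", tags: ["Adverb"] },
  { text: "sick", pre: "", post: "!", tags: ["Adjective"] },
];

// Hypothetical helper: glue pre + text + post back together for each term.
function rebuild(terms) {
  return terms.map((t) => t.pre + t.text + t.post).join("");
}

console.log(rebuild(terms)); // I am still sick!
```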
Adjectives
Find the adjectives in the text.
console.log(doc.adjectives().text());
// sick
Adverbs
Find the adverbs that describe the adjectives.
console.log(doc.adjectives().adverbs().text());
// still
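Since every term carries tags, a simplified mental model of adjectives() and adverbs() is a filter over those tags. The hypothetical byTag helper below works on the terms shown earlier; it is only a model, not compromise's implementation, and unlike adjectives().adverbs() it ignores which adjective an adverb belongs to.

```javascript
// A few terms (text and tags only) copied from the json() output above.
const terms = [
  { text: "I", tags: ["Noun", "Pronoun"] },
  { text: "am", tags: ["Verb", "Copula", "PresentTense"] },
  { text: "still", tags: ["Adverb"] },
  { text: "sick", tags: ["Adjective"] },
];

// Hypothetical helper: return the text of every term carrying the given tag.
function byTag(terms, tag) {
  return terms.filter((t) => t.tags.includes(tag)).map((t) => t.text);
}

console.log(byTag(terms, "Adjective")); // [ 'sick' ]
console.log(byTag(terms, "Adverb"));    // [ 'still' ]
```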
Adjectives Metadata
Like sentences, adjectives have metadata that can be retrieved with json(), although the result has a different shape.
console.log(doc.adjectives().json());
// [
// {
// text: 'sick!',
// terms: [ [Object] ],
// normal: 'sick!',
// adjective: {
// adverb: 'sickly',
// noun: 'sickness',
// superlative: 'sickest',
// comparative: 'sicker'
// }
// }
// ]
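For regular adjectives like "sick", the forms listed above follow simple suffix rules. The hypothetical inflectAdjective helper below applies those rules; it is a toy that ignores irregular forms (good/better) and consonant doubling (big/bigger), which a library like compromise has to handle separately.

```javascript
// Hypothetical toy helper for regular adjectives such as "sick" or "happy".
// Irregular forms and consonant doubling are not handled.
function inflectAdjective(adj) {
  // Consonant + "y" becomes "i" before a suffix: "happy" -> "happi".
  const stem = /[^aeiou]y$/.test(adj) ? adj.slice(0, -1) + "i" : adj;
  return {
    adverb: stem + "ly",
    noun: stem + "ness",
    superlative: stem + "est",
    comparative: stem + "er",
  };
}

const forms = inflectAdjective("sick");
console.log(forms.comparative, forms.superlative); // sicker sickest
console.log(forms.adverb, forms.noun);             // sickly sickness
```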
Pre Post Sentences
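Here we add a symbol at the start (/) and end (\) of each sentence. Note that post() replaces the existing trailing punctuation and space, so the periods and the exclamation mark are lost, and that the "/" added by pre() still appears in the second output because both calls modify the same document.
console.log(doc.pre("/").text())
// /I was there, suddenly raining starts. /I am still sick!
console.log(doc.post("\\ ").text())
// /I was there, suddenly raining starts\ /I am still sick\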
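The two outputs above can be reproduced with a toy term model: set the pre of each sentence's first term and overwrite the post of its last term. The helpers tokenize, setPre, setPost, and render below are hypothetical and only mimic the behaviour shown; compromise's internals may differ.

```javascript
// Hypothetical toy tokenizer: each word becomes { pre, text, post }, with the
// punctuation and whitespace that follow it stored in post. Anything before
// the first word is ignored, which is fine for this demo string.
function tokenize(text) {
  const sentences = [[]];
  for (const [, word, post] of text.matchAll(/(\w+)(\W*)/g)) {
    sentences[sentences.length - 1].push({ pre: "", text: word, post });
    // A post containing ".", "!" or "?" closes the current sentence.
    if (/[.!?]/.test(post)) sentences.push([]);
  }
  return sentences.filter((s) => s.length > 0);
}

// Set the pre of each sentence's first term (like doc.pre("/")).
function setPre(sentences, str) {
  for (const s of sentences) s[0].pre = str;
}

// Overwrite the post of each sentence's last term (like doc.post("\\ ")).
function setPost(sentences, str) {
  for (const s of sentences) s[s.length - 1].post = str;
}

// Turn the terms back into a string.
function render(sentences) {
  return sentences.flat().map((t) => t.pre + t.text + t.post).join("");
}

const sentences = tokenize("I was there, suddenly raining starts. I am still sick!");
setPre(sentences, "/");
console.log(render(sentences)); // /I was there, suddenly raining starts. /I am still sick!
setPost(sentences, "\\ ");
// The next line ends with the trailing space that comes from "\\ ".
console.log(render(sentences)); // /I was there, suddenly raining starts\ /I am still sick\
```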
Whitespace
Replace the whitespace between words with hyphens using the built-in hyphenate() method.
console.log(doc.hyphenate().text())
// I-was-there-suddenly-raining-starts. I-am-still-sick!
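Notice that the comma after "there" disappears. One way to reproduce that exact output is to keep each word's trailing punctuation separately (like the post field of a term), join a sentence's words with "-", and keep only the last word's post. The hypothetical hyphenate function below does that; it is a sketch of the observed behaviour, not compromise's code.

```javascript
// Hypothetical toy version of hyphenation, assuming each word's trailing
// punctuation/whitespace is kept separately (like the post field of a term).
function hyphenate(text) {
  const words = [...text.matchAll(/(\w+)(\W*)/g)].map(([, word, post]) => ({ word, post }));
  let out = "";
  let current = [];
  for (const w of words) {
    current.push(w.word);
    // End of sentence: join its words with "-" and keep only the final post,
    // which is why inner punctuation such as the comma is dropped.
    if (/[.!?]/.test(w.post) || w === words[words.length - 1]) {
      out += current.join("-") + w.post;
      current = [];
    }
  }
  return out;
}

console.log(hyphenate("I was there, suddenly raining starts. I am still sick!"));
// I-was-there-suddenly-raining-starts. I-am-still-sick!
```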
Number Game
const str = "Price of an Apple is $1.5, per KG may around 6 to 7.5 USD";
const numDoc = nlp(str);
Parse the numbers in the string and get their details, including prefixes, suffixes, and units.
console.log(numDoc.numbers().parse());
// [
// { prefix: "$", num: 1.5, suffix: "", hasComma: false, unit: "" },
// { prefix: "", num: 6, suffix: "", hasComma: false, unit: "" },
// { prefix: "", num: 7.5, suffix: "", hasComma: false, unit: "usd" },
// ];
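To see what "prefix", "num", and "unit" mean for this particular string, here is a hypothetical regex-based parseNumbers helper. It only recognises an optional "$" prefix, a decimal number, and an optional "usd" unit right after it, so it is far narrower than compromise's parser and is meant only to make the fields above concrete.

```javascript
// Hypothetical regex-based extractor: optional "$" prefix, a decimal number,
// and an optional "usd" unit right after it (case-insensitive).
function parseNumbers(str) {
  const re = /(\$)?(\d+(?:\.\d+)?)(?:\s*(usd)\b)?/gi;
  // Unmatched optional groups are undefined, so they default to "".
  return [...str.matchAll(re)].map(([, prefix = "", num, unit = ""]) => ({
    prefix,
    num: Number(num),
    unit: unit.toLowerCase(),
  }));
}

const str = "Price of an Apple is $1.5, per KG may around 6 to 7.5 USD";
console.log(parseNumbers(str).map((n) => n.num)); // [ 1.5, 6, 7.5 ]
```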
Increment/Decrement numbers in a sentence.
console.log(numDoc.numbers().increment().text());
// $2.5, 7 8.5
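Conceptually, incrementing adds 1 to each parsed value and re-attaches its prefix. The hypothetical shift helper below does this on the values from the parse() output above, with a negative step for decrementing; unlike compromise, it does not keep the comma from the original text.

```javascript
// Values trimmed from the parse() output above.
const numbers = [
  { prefix: "$", num: 1.5 },
  { prefix: "", num: 6 },
  { prefix: "", num: 7.5 },
];

// Hypothetical helper: add `step` to every number (-1 decrements)
// and put the prefix back in front.
function shift(numbers, step) {
  return numbers.map(({ prefix, num }) => prefix + String(num + step));
}

console.log(shift(numbers, 1).join(" "));  // $2.5 7 8.5
console.log(shift(numbers, -1).join(" ")); // $0.5 5 6.5
```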
Convert numbers written as digits into words.
console.log(numDoc.numbers().toText().text());
// one point five dollars, six seven point five
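The wording above reads the integer part as a word and the decimals digit by digit after "point". The hypothetical numberToWords helper below follows that pattern for plain numbers from 0 to 99; it does not add currency words such as "dollars", which compromise derives from the "$" prefix.

```javascript
// Hypothetical helper: 0-99 in words, decimals read digit by digit.
const ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
  "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
  "eighteen", "nineteen"];
const TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"];

function intToWords(n) {
  if (n < 20) return ONES[n];
  const tens = TENS[Math.floor(n / 10)];
  return n % 10 === 0 ? tens : `${tens}-${ONES[n % 10]}`;
}

function numberToWords(num) {
  const s = String(num);
  // Accept only plain decimals such as "7.5" (no exponent, sign, or 3+ digits).
  if (!/^\d{1,2}(\.\d+)?$/.test(s)) throw new RangeError(`Unsupported number: ${s}`);
  const [intPart, decPart] = s.split(".");
  let words = intToWords(Number(intPart));
  if (decPart) {
    words += " point " + [...decPart].map((d) => ONES[Number(d)]).join(" ");
  }
  return words;
}

console.log([1.5, 6, 7.5].map(numberToWords).join(", ")); // one point five, six, seven point five
```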
Conclusion
Thanks for reading. In this article, we got a basic idea of how to process text for NLP in JavaScript using the compromise library. These were only very basic uses of the library; you can explore its documentation further. If you enjoyed the article, give it a thumbs up, subscribe, and stay tuned for more.