Natural Language Processing with JavaScript using Compromise

Basic natural language processing (NLP) in JavaScript/Node.js with Compromise, a library dedicated to NLP that covers a wide range of functionality.

Featured on Hashnode

Introduction

Natural language processing (NLP) is a widely used and widely discussed concept. Languages such as Python, R, and Java are the usual choices for it because of their extensive library support and large communities. Today we will show how to apply NLP using JavaScript.

Installation

If you are setting up a Node.js project, install the library with npm.

npm install compromise

Add the following top-level field to package.json if you face a module/import error.

"type": "module"

On the client side, load the library from a CDN.

<script src="https://unpkg.com/compromise"></script>

Setup

Now let's import the library into the project.

import nlp from "compromise";

Initialise an nlp document with a demo string.

const text = "I was there, suddenly raining starts. I am still sick!";
const doc = nlp(text);

This returns a document object for the input string.

Examples

Tenses

Convert the document's sentences to a different tense, e.g. past, present, or future.

console.log(`Past Tense: ${doc.sentences().toPastTense().text()}`);
// Past Tense: I was there, suddenly raining starts. I was still sick!

console.log(`Present Tense: ${doc.sentences().toPresentTense().text()}`);
// Present Tense: I am there, suddenly raining starts. I am still sick!

console.log(`Future Tense: ${doc.sentences().toFutureTense().text()}`);
// Future Tense: I will be there, suddenly raining starts. I will be still sick!

Negative Statement

Convert regular or positive statements to negative statements.

console.log(doc.sentences().toNegative().text())
// I was not there, suddenly raining starts. I am not still sick!

Sentence Metadata

Let's look at the details, or metadata, of each sentence. The call returns an array with one object per sentence, which includes the text string, the sentence details (such as subject, verb, and predicate), and a terms array.

console.log(doc.sentences().json())
// [
//   {
//     text: "I was there, suddenly raining starts.",
//     terms: [[Object], [Object], [Object], [Object], [Object], [Object]],
//     sentence: { subject: "i", verb: "was", predicate: "there starts" },
//   },
//   {
//     text: "I am still sick!",
//     terms: [[Object], [Object], [Object], [Object]],
//     sentence: { subject: "i", verb: "am still", predicate: "sick" },
//   }
// ]
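
To give a rough idea of how this result might be consumed, here is a minimal sketch that turns each entry into a one-line summary. It works on a hand-copied version of the output above (with the terms arrays left out) rather than on a live document, and `summarize` is a hypothetical helper name, not part of compromise.

```javascript
// Hand-copied from the output above; the terms arrays are omitted for brevity.
const sentences = [
  {
    text: "I was there, suddenly raining starts.",
    sentence: { subject: "i", verb: "was", predicate: "there starts" },
  },
  {
    text: "I am still sick!",
    sentence: { subject: "i", verb: "am still", predicate: "sick" },
  },
];

// Hypothetical helper: build one "subject | verb | predicate" line per sentence.
function summarize(items) {
  return items.map(
    ({ sentence }) => `${sentence.subject} | ${sentence.verb} | ${sentence.predicate}`
  );
}

console.log(summarize(sentences));
// [ 'i | was | there starts', 'i | am still | sick' ]
```

With a real document, the same helper should accept the array returned by doc.sentences().json(), assuming it has the shape shown above.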

Metadata Details

As we saw above, every sentence has an array of objects named terms, so let's look inside it. It holds the details of every word and its attributes: text, pre/post symbols, tags, normalized form, index, id, chunk, and dirty flag.

console.log(doc.sentences().json()[1].terms);
// [
//     {
//       text: 'I',
//       pre: '',
//       post: ' ',
//       tags: [ 'Noun', 'Pronoun' ],
//       normal: 'i',
//       index: [ 1, 0 ],
//       id: 'i|00700100W',
//       chunk: 'Noun',
//       dirty: true
//     },
//     {
//       text: 'am',
//       pre: '',
//       post: ' ',
//       tags: [ 'Verb', 'Copula', 'PresentTense' ],
//       normal: 'am',
//       index: [ 1, 1 ],
//       id: 'am|00800101O',
//       chunk: 'Verb',
//       dirty: true
//     },
//     {
//       text: 'still',
//       pre: '',
//       post: ' ',
//       tags: [ 'Adverb' ],
//       normal: 'still',
//       index: [ 1, 2 ],
//       id: 'still|00900102C',
//       dirty: true,
//       chunk: 'Verb'
//     },
//     {
//       text: 'sick',
//       pre: '',
//       post: '!',
//       tags: [ 'Adjective' ],
//       normal: 'sick',
//       index: [ 1, 3 ],
//       id: 'sick|00A00103B',
//       dirty: true,
//       chunk: 'Adjective'
//     }
// ]
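
As a small sketch of how this metadata might be used, the helper below picks out the words that carry a given tag. It runs on a trimmed, hand-copied version of the terms output above, not on a live document, and `wordsWithTag` is a hypothetical helper name rather than a compromise method.

```javascript
// Trimmed copy of the terms output shown above (only the fields used here).
const terms = [
  { text: "I", tags: ["Noun", "Pronoun"], normal: "i" },
  { text: "am", tags: ["Verb", "Copula", "PresentTense"], normal: "am" },
  { text: "still", tags: ["Adverb"], normal: "still" },
  { text: "sick", tags: ["Adjective"], normal: "sick" },
];

// Hypothetical helper: return the normalized form of every term carrying `tag`.
function wordsWithTag(termList, tag) {
  return termList
    .filter((term) => term.tags.includes(tag))
    .map((term) => term.normal);
}

console.log(wordsWithTag(terms, "Verb")); // [ 'am' ]
console.log(wordsWithTag(terms, "Adjective")); // [ 'sick' ]
```

Filtering on plain JSON like this is only one option; it is handy when the terms have already been serialized, for example before sending them to another service.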

Adjectives

Find the adjectives in the text.

console.log(doc.adjectives().text());
// sick

Adverbs

Find the adverbs that describe those adjectives.

console.log(doc.adjectives().adverbs().text());
// still

Adjectives Metadata

Like sentences, adjectives also have metadata, which can be retrieved as shown below. The two results use different formats.

console.log(doc.adjectives().json());
// [
//     {
//       text: 'sick!',
//       terms: [ [Object] ],
//       normal: 'sick!',
//       adjective: {
//         adverb: 'sickly',
//         noun: 'sickness',
//         superlative: 'sickest',
//         comparative: 'sicker'
//       }
//     }
// ]

Pre Post Sentences

Here we will add a specific symbol at the start (/) and at the end (\) of each sentence.

console.log(doc.pre("/").text())
console.log(doc.post("\\ ").text())

// /I was there, suddenly raining starts. /I am still sick!
// /I was there, suddenly raining starts\ /I am still sick\

Whitespace

Replace the white space between words with hyphens using the built-in hyphenate method.

console.log(doc.hyphenate().text())
// I-was-there-suddenly-raining-starts. I-am-still-sick!

Number Game

const str = "Price of an Apple is $1.5, per KG may around 6 to 7.5 USD";
const numDoc = nlp(str);

Parse the numbers in a given string and get their details, including prefixes and suffixes.

console.log(numDoc.numbers().parse());
// [
//   { prefix: "$", num: 1.5, suffix: "", hasComma: false, unit: "" },
//   { prefix: "", num: 6, suffix: "", hasComma: false, unit: "" },
//   { prefix: "", num: 7.5, suffix: "", hasComma: false, unit: "usd" },
// ];
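
The parsed entries are plain objects, so they are easy to post-process. The sketch below rebuilds a short display label for each entry and finds the smallest and largest value. It uses a hand-copied version of the output above, and `label` is a hypothetical helper name, not something the library provides.

```javascript
// Hand-copied from the parse() output above.
const parsed = [
  { prefix: "$", num: 1.5, suffix: "", hasComma: false, unit: "" },
  { prefix: "", num: 6, suffix: "", hasComma: false, unit: "" },
  { prefix: "", num: 7.5, suffix: "", hasComma: false, unit: "usd" },
];

// Hypothetical helper: rebuild a short display label from one parsed entry.
function label({ prefix, num, suffix, unit }) {
  const unitPart = unit ? ` ${unit.toUpperCase()}` : "";
  return `${prefix}${num}${suffix}${unitPart}`;
}

const values = parsed.map((entry) => entry.num);

console.log(parsed.map(label)); // [ '$1.5', '6', '7.5 USD' ]
console.log(Math.min(...values)); // 1.5
console.log(Math.max(...values)); // 7.5
```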

Increment or decrement the numbers in a sentence.

console.log(numDoc.numbers().increment().text());
// $2.5, 7 8.5

Convert numbers written as digits to their text form.

console.log(numDoc.numbers().toText().text());
// one point five dollars, six seven point five

Conclusion

Thanks for reading. In this article, we got a basic idea of how to process textual data for NLP with JavaScript using the compromise library. These were only very basic uses of the library; you can review its documentation to go further. If you enjoyed the article, give it a thumbs up, subscribe, and stay tuned for more.
