NLP Lecture 1

by Javantea
Oct 5, 2020

KNN Pipeline

I'm starting to work on Amazon Web Services (AWS)'s Natural Language Processing (NLP) course through Machine Learning University. This blog post wrote itself, not literally. My aims are far more practical. I want to be able to classify text for projects that build up a set of information that people would be able to use to create interesting solutions for their group. Would you like to input this blog to a python script and find out what I'm talking about? That may become a reality.

I'm writing this blog in a Jupyter notebook. I realize that you might resist the idea of installing a piece of software to run my examples, but I hope you will sandbox it and see what happens. Jupyter is surprisingly well-designed even if it takes time to learn how to use it properly. Since I realize you might not want to run Jupyter, I've also released my code as python scripts. Jupyter allows export to executable scripts and I highly recommend using this feature.

Lecture 1 of the NLP course is a mess. It impressed me to see what terrible code they could release as a course. But after providing a serious demotivation, it was able to reverse that. I am now motivated to show the shortcomings of this course and how easily they can be fixed.

MLA-NLP-Lecture1-KNN.ipynb is our first mess.

The stemmer converts sensible sentences into jibberish losing the original meaning of the sentence. How could this result in sentiment analysis? The answer is that it focuses on words that when stemmed coincide with sentiment. That is -- any nuance in your speech is lost and the classifier simply counts how many times you say "great" or "good" or "bad". If you use this system to classify nuanced speech, it will not work. At best you get 50% chance of correct classification of 2-class prediction. At worst, you classify text incorrectly. Let's take a quick look.

In [1]:
import nltk, re
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer
from nltk.tokenize import word_tokenize

# Let's get a list of stop words from the NLTK library
stop = stopwords.words('english')

# These words are important for our problem. We don't want to remove them.
excluding = ['against', 'not', 'don', "don't",'ain', 'aren', "aren't", 'couldn', "couldn't",
             'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 
             'haven', "haven't", 'isn', "isn't", 'mightn', "mightn't", 'mustn', "mustn't",
             'needn', "needn't",'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', 
             "weren't", 'won', "won't", 'wouldn', "wouldn't"]

# New stop word list
stop_words = [word for word in stop if word not in excluding]

snow = SnowballStemmer('english')

def process_text(texts): 
    final_text_list=[]
    for sent in texts:
        
        # Check if the sentence is a missing value
        if isinstance(sent, str) == False:
            sent = ""
            
        filtered_sentence=[]
        
        sent = sent.lower() # Lowercase 
        sent = sent.strip() # Remove leading/trailing whitespace
        sent = re.sub(r'\s+', ' ', sent) # Collapse whitespace (spaces, tabs, newlines) into single spaces
        sent = re.compile('<.*?>').sub('', sent) # Remove HTML tags/markup
        
        for w in word_tokenize(sent):
            # We are applying some custom filtering here, feel free to try different things
            # Check if it is not numeric and its length>2 and not in stop words
            if(not w.isnumeric()) and (len(w)>2) and (w not in stop_words):  
                # Stem and add to filtered list
                filtered_sentence.append(snow.stem(w))
        final_string = " ".join(filtered_sentence) #final string of cleaned words
 
        final_text_list.append(final_string)
        
    return final_text_list
In [5]:
blog_part1 = process_text(["I'm starting to work on Amazon Web Services (AWS)'s Natural Language Processing (NLP) course through Machine Learning University.", 
"This blog post wrote itself, not literally.", 
"My aims are far more practical.", 
"I want to be able to classify text for projects that build up a set of information that people would be able to use to create interesting solutions for their group.", 
"Would you like to input this blog to a python script and find out what I'm talking about?", 
"That may become a reality.", 

"I'm writing this blog in a Jupyter notebook.", 
"I realize that you might resist the idea of installing a piece of software to run my examples, but I hope you will sandbox it and see what happens.", 
"Jupyter is surprisingly well-designed even if it takes time to learn how ot use it properly.", 
"Since I realize you might not want to run Jupyter, I've also released my code as python scripts.", 
"Jupyter allows export to executable scripts and I highly recommend using this feature.", 

"Lecture 1 of the NLP course is a mess.", 
"It impressed me to see what terrible code they could release as a course.", 
"But after providing a serious demotivation, it was able to reverse that.", 
"I am now motivated to show the shortcomings of this course and how easily they can be fixed.", 

"MLA-NLP-Lecture1-KNN.ipynb is our first mess.", 

"The stemmer converts sensible sentences into jibberish losing the original meaning of the sentence.", 
"How could this result in sentiment analysis?", 
"The answer is that it focuses on words that when stemmed coincide with sentiment.", 
"That is -- any nuance in your speech is lost and the classifier simply counts how many times you say \"great\" or \"good\" or \"bad\".", 
"If you use this system to classify nuanced speech, it will not work.", 
"At best you get 50% chance of correct classification of 2-class prediction.", 
"At worst, you classify text incorrectly.", 
"Let's take a quick look."])
print(blog_part1)
['start work amazon web servic aw natur languag process nlp cours machin learn univers', 'blog post wrote not liter', 'aim far practic', 'want abl classifi text project build set inform peopl would abl use creat interest solut group', 'would like input blog python script find talk', 'may becom realiti', 'write blog jupyt notebook', 'realiz might resist idea instal piec softwar run exampl hope sandbox see happen', 'jupyt surpris well-design even take time learn use proper', 'sinc realiz might not want run jupyt ve also releas code python script', 'jupyt allow export execut script high recommend use featur', 'lectur nlp cours mess', 'impress see terribl code could releas cours', 'provid serious demotiv abl revers', 'motiv show shortcom cours easili fix', 'mla-nlp-lecture1-knn.ipynb first mess', 'stemmer convert sensibl sentenc jibberish lose origin mean sentenc', 'could result sentiment analysi', 'answer focus word stem coincid sentiment', 'nuanc speech lost classifi simpli count mani time say great good bad', 'use system classifi nuanc speech not work', 'best get chanc correct classif 2-class predict', 'worst classifi text incorrect', 'let take quick look']

As you can see, most of the nuance is lost. How could we possibly fix this? Well, we could actually skip stemming. The reason they have chosen stemming is to get more information out of a small corpus. If you walked through the KNN notebook, you found the data has shape (70000, 6), which means 70,000 rows and 6 columns. 70,000 rows of text seems like a lot when you are considering Amazon reviews, but as a corpus, 70,000 reviews is small. The table is 33MB, which works out to an average of about 470 bytes per review, and that is enough raw text to build a good classifier from. But most of that data never gets counted toward sentiment; it's thrown away. That's right, our second bug in the KNN lecture is the CountVectorizer.
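Before we get to that: skipping the stemmer really is a one-line change to process_text. Here is a sketch of a no-stemming variant (it reuses re, word_tokenize, and stop_words from the cell above; this is my variant, not something from the course):

def process_text_no_stem(texts):
    """Same cleanup as process_text, but keeps whole words instead of stems."""
    final_text_list = []
    for sent in texts:
        # Replace missing values with an empty string
        if not isinstance(sent, str):
            sent = ""
        sent = sent.lower().strip()
        sent = re.sub(r'\s+', ' ', sent)          # collapse whitespace
        sent = re.compile('<.*?>').sub('', sent)  # strip HTML tags
        words = [w for w in word_tokenize(sent)
                 if (not w.isnumeric()) and len(w) > 2 and w not in stop_words]
        final_text_list.append(" ".join(words))
    return final_text_list

Swapping this in keeps words like "incorrectly" and "surprisingly" intact instead of reducing them to stems.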

In [6]:
import pandas as pd

df = pd.read_csv('../data/examples/AMAZON-REVIEW-DATA-CLASSIFICATION.csv')

print('The shape of the dataset is:', df.shape)
The shape of the dataset is: (70000, 6)
In [7]:
df["isPositive"].value_counts()
Out[7]:
1.0    43692
0.0    26308
Name: isPositive, dtype: int64
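As a quick sanity check on the 470-bytes-per-review estimate above, the average length of the raw review text can be computed directly. A one-liner (my addition, not part of the original notebook):

# Mean character length of the raw reviews; missing reviews are skipped by mean()
print(df["reviewText"].str.len().mean())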
In [8]:
from sklearn.model_selection import train_test_split


X_train, X_val, y_train, y_val = train_test_split(df[["reviewText"]],
                                                  df["isPositive"],
                                                  test_size=0.10,
                                                  shuffle=True,
                                                  random_state=324
                                                 )
In [9]:
print("Processing the reviewText fields")
train_text_list = process_text(X_train["reviewText"].tolist())
val_text_list = process_text(X_val["reviewText"].tolist())
Processing the reviewText fields
In [10]:
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier

### PIPELINE ###
##########################

pipeline = Pipeline([
    ('text_vect', CountVectorizer(binary=True,
                                  max_features=15)),
    ('knn', KNeighborsClassifier())  
                                ])

# Visualize the pipeline
# This will come in handy especially when building more complex pipelines, stringing together multiple preprocessing steps
from sklearn import set_config
set_config(display='diagram')
pipeline
Out[10]:
Pipeline(steps=[('text_vect', CountVectorizer(binary=True, max_features=15)),
                ('knn', KNeighborsClassifier())])
CountVectorizer(binary=True, max_features=15)
KNeighborsClassifier()

The bug is in the above code. Can you see it? max_features=15 is a pretty subtle bug. What do you think it does? Let's find out!

In [11]:
# We use the lists of processed text fields 
X_train = train_text_list
X_val = val_text_list

# Fit the Pipeline to training data
pipeline.fit(X_train, y_train.values)
Out[11]:
Pipeline(steps=[('text_vect', CountVectorizer(binary=True, max_features=15)),
                ('knn', KNeighborsClassifier())])
CountVectorizer(binary=True, max_features=15)
KNeighborsClassifier()
In [12]:
pipeline.steps[0]
Out[12]:
('text_vect', CountVectorizer(binary=True, max_features=15))
In [13]:
pipeline.steps[0][1].vocabulary_
Out[13]:
{'not': 4,
 'time': 9,
 'version': 11,
 'would': 13,
 'great': 2,
 'program': 7,
 'get': 1,
 'like': 3,
 'use': 10,
 'one': 5,
 'work': 12,
 'comput': 0,
 'product': 6,
 'year': 14,
 'softwar': 8}

If you went through MLA-NLP-Lecture1-BOW.ipynb, you know exactly what this means: the only words that actually count towards sentiment are not, time, version, would, great, program, get, like, use, one, work, comput, product, year, and softwar.
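If max_features is unfamiliar: it tells CountVectorizer to keep only the N terms with the highest total frequency across the corpus and to silently drop everything else. A toy sketch (my own example, not from the course):

from sklearn.feature_extraction.text import CountVectorizer

toy = ["good good product", "bad product", "good price", "bad bad support"]
toy_vect = CountVectorizer(max_features=3)
toy_vect.fit(toy)
# Only the three most frequent tokens survive; 'price' and 'support' are dropped entirely.
print(toy_vect.vocabulary_)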

In [16]:
", ".join(pipeline.steps[0][1].vocabulary_.keys())
Out[16]:
'not, time, version, would, great, program, get, like, use, one, work, comput, product, year, softwar'

Does that sound like a sentiment analysis classifier to you? No. It isn't. Let's run the CountVectorizer on all 70,000 reviews and see how many actually end up with any counted data that can be sent on to the next step, the KNN.

In [20]:
len(X_train)
Out[20]:
63000
In [22]:
df[["reviewText"]][:10]
Out[22]:
reviewText
0 PURCHASED FOR YOUNGSTER WHO\nINHERITED MY "TOO...
1 unable to open or use
2 Waste of money!!! It wouldn't load to my system.
3 I attempted to install this OS on two differen...
4 I've spent 14 fruitless hours over the past tw...
5 I purchased the home and business because I wa...
6 The download doesn't take long at all. And it'...
7 This program is positively wonderful for word ...
8 Fantastic protection!! Great customer support!!
9 Obviously Win 7 now the last great operating s...
In [25]:
df[["reviewText"]][:10].values
Out[25]:
array([['PURCHASED FOR YOUNGSTER WHO\nINHERITED MY "TOO sMALL FOR ME"\nLAPTOP.  IDEAL FOR LEARNING A\nFUTURE GOOD SKILL.  HER CHOICE\nOF BOOKS IS A PLUS AS WAS THIS BOOK!'],
       ['unable to open or use'],
       ["Waste of money!!! It wouldn't load to my system."],
       ['I attempted to install this OS on two different PCs. it will not complete the install.\nWhen it gets to the page to select the language, and country the mouse and keyboard become non-functional.'],
       ["I've spent 14 fruitless hours over the past two days fruitlessly attempting to install this software on my computer and nothing I've found has worked. I need the software to type proficiently due to disability, and it will not install. The download itself seems to be a corrupted file, I have a fair amount of computer skills and no amount of tinkering has made the program work, so, judging by other reviews, it must be the program itself."],
       ["I purchased the home and business because I was going back into business.  I found it very hart to use as I was used to using\n\nQuick Books.  I have had it a couple months and still can't get it to do business income and expenses like it says it will.\n\nAs far as Intuit product help, that is a joke."],
       ["The download doesn't take long at all. And it's extremely fast, so you can use it ASAP for school or work."],
       ['This program is positively wonderful for word practice and pronunciation with a number of flash-card style game and  a very elegant listen-record-replay interface. While the two characters do pronounce words slightly differently, hearing those variations really helps to understand the phonetics in a way that a single voice would lack.\n\nHowever, this really is more of a companion tool for a more traditional course. I purchased this with "Colloquial Icelandic" and would definitely recommend purchasing them together. There are some extremely strange spelling/pronunciation variations and without having a written explanation of these things, you may be left staring in great confusion as the characters read phrases on screen -- "Hva" is pronounced "kwah?" WHAT? Having "Colluquial Icelandic" to turn to for explanation helped a great deal. That package has two audio CDs of alphabet, number, vocabulary and conversation exercises with a very well-written 370 page coursebook.\n\nI gave this product a 5-star rating as it is superb for its intended purpose, but it does not stand alone as a complete course.'],
       ['Fantastic protection!!  Great customer support!!'],
       ['Obviously Win 7 now the last great operating system since XP. Change is not always good.']],
      dtype=object)
In [28]:
process_text(df[["reviewText"]][:10].values)
Out[28]:
['', '', '', '', '', '', '', '', '', '']
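The all-empty result here isn't a stemming problem; it's the shape of the input. A quick check (the single-bracket alternative at the end is my suggestion, not what the next cell does):

# Each row of df[["reviewText"]].values is a length-1 numpy array, not a str,
# so process_text's isinstance(sent, str) check blanks the whole review.
print(type(df[["reviewText"]][:10].values[0]))

# Passing the column as a Series of strings avoids the problem entirely:
print(process_text(df["reviewText"][:10].tolist())[:2])

The next cell works around it by unwrapping each one-element row instead.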
In [31]:
process_text([x[0] for x in df[["reviewText"]][:10].values])
Out[31]:
['purchas youngster inherit small laptop ideal learn futur good skill choic book plus book',
 'unabl open use',
 "wast money would n't load system",
 'attempt instal two differ pcs not complet instal get page select languag countri mous keyboard becom non-funct',
 've spent fruitless hour past two day fruitless attempt instal softwar comput noth ve found work need softwar type profici due disabl not instal download seem corrupt file fair amount comput skill amount tinker made program work judg review must program',
 "purchas home busi go back busi found hart use use use quick book coupl month still n't get busi incom expens like say far intuit product help joke",
 "download n't take long extrem fast use asap school work",
 'program posit wonder word practic pronunci number flash-card style game eleg listen-record-replay interfac two charact pronounc word slight differ hear variat realli help understand phonet way singl voic would lack howev realli companion tool tradit cours purchas colloqui iceland would definit recommend purchas togeth extrem strang spelling/pronunci variat without written explan thing may left stare great confus charact read phrase screen hva pronounc kwah colluqui iceland turn explan help great deal packag two audio cds alphabet number vocabulari convers exercis well-written page coursebook gave product 5-star rate superb intend purpos not stand alon complet cours',
 'fantast protect great custom support',
 'obvious win last great oper system sinc chang not alway good']
In [32]:
pipeline.steps[0][1].transform(process_text([x[0] for x in df[["reviewText"]][:10].values]))
Out[32]:
<10x15 sparse matrix of type '<class 'numpy.int64'>'
	with 23 stored elements in Compressed Sparse Row format>
In [33]:
result = pipeline.steps[0][1].transform(process_text([x[0] for x in df[["reviewText"]].values]))
In [34]:
result
Out[34]:
<70000x15 sparse matrix of type '<class 'numpy.int64'>'
	with 227755 stored elements in Compressed Sparse Row format>
In [36]:
result[:10].todense()
Out[36]:
matrix([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
        [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0],
        [0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0],
        [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

So now we want to know which of these reviews end up with at least one counted feature (or, inversely, which end up with none).

In [39]:
count_of_features = sum(result[:10]).todense()
In [41]:
[x for x in count_of_features if x.all()]
Out[41]:
[]
In [42]:
[x for x in count_of_features]
Out[42]:
[matrix([[1, 2, 3, 1, 4, 0, 2, 2, 1, 0, 3, 0, 2, 2, 0]])]
In [48]:
[x for x in count_of_features.tolist()[0] if x]
Out[48]:
[1, 2, 3, 1, 4, 2, 2, 1, 3, 2, 2]

Out of the first 10 reviews we got 11 values, which means that we summed the wrong way: that's a count per feature (15 columns), not per review. Luckily it didn't happen to hand us a plausible-looking 9, right?
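For reference, scipy sparse matrices can also sum along an axis directly, which gives the per-review counts without any transposing. A quick sketch (my addition, not from the lecture):

import numpy as np

# Sum each row (axis=1) to count vocabulary hits per review,
# then count how many reviews have at least one hit.
per_review = np.asarray(result[:10].sum(axis=1)).ravel()
print(per_review)
print(int((per_review > 0).sum()), "of", len(per_review), "reviews have at least one feature")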

In [50]:
count_of_features = sum(result[:10].T).todense()
In [51]:
[x for x in count_of_features.tolist()[0] if x]
Out[51]:
[1, 1, 2, 5, 4, 2, 5, 1, 2]

We got 9 out of 10. Let's check manually to make sure that matches. It matches our matrix: the first row is empty, the rest have values. So our CountVectorizer converts the first review from its original text to 'purchas youngster inherit small laptop ideal learn futur good skill choic book plus book' and then to nothing at all. The second sentence is bad as well: it first converts the original text 'unable to open or use' to 'unabl open use' and then to just "use". What does our brilliant KNN classify "use" as?

In [55]:
import numpy as np
pipeline.steps[1][1].predict(np.matrix([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]]))
Out[55]:
array([1.])

It predicts that this is a positive review. This is wrong. Now let's count how many reviews are thrown out entirely.

In [60]:
count_of_features_full = sum(result.T).todense()
reviews_counted = len([x for x in count_of_features_full.tolist()[0] if x])
print("Number of reviews counted", reviews_counted)
print("Number of reviews not counted", 70000 - reviews_counted)
Number of reviews counted 61117
Number of reviews not counted 8883

So our count vectorizer discards 8883 reviews out of 70000. How do we fix this? We increase the number of features. This will cost significantly more CPU, but let's find out how much!

In [61]:
### PIPELINE ###
##########################

pipeline = Pipeline([
    ('text_vect', CountVectorizer(binary=True,
                                  max_features=150)),
    ('knn', KNeighborsClassifier())  
                                ])

# Visualize the pipeline
# This will come in handy especially when building more complex pipelines, stringing together multiple preprocessing steps
from sklearn import set_config
set_config(display='diagram')
pipeline
Out[61]:
Pipeline(steps=[('text_vect', CountVectorizer(binary=True, max_features=150)),
                ('knn', KNeighborsClassifier())])
CountVectorizer(binary=True, max_features=150)
KNeighborsClassifier()
In [62]:
# We use the lists of processed text fields 
X_train = train_text_list
X_val = val_text_list

# Fit the Pipeline to training data
pipeline.fit(X_train, y_train.values)
Out[62]:
Pipeline(steps=[('text_vect', CountVectorizer(binary=True, max_features=150)),
                ('knn', KNeighborsClassifier())])
CountVectorizer(binary=True, max_features=150)
KNeighborsClassifier()

It was very fast: it took a few seconds, maybe five.
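If you want a real number instead of "maybe five", wrap the fit in a timer. A minimal sketch (this just refits the same pipeline on the same data):

import time

t0 = time.perf_counter()
pipeline.fit(X_train, y_train.values)
print(f"fit took {time.perf_counter() - t0:.1f} seconds")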

In [63]:
print("Vocabulary", ", ".join(pipeline.steps[0][1].vocabulary_.keys()))
Vocabulary not, expect, price, time, upgrad, window, version, problem, run, everi, alway, find, file, could, mac, well, would, great, first, thing, good, account, chang, open, start, old, set, made, see, come, excel, back, still, program, allow, make, think, better, get, like, right, even, edit, new, use, put, everyth, take, littl, way, custom, want, anoth, sure, one, work, give, seem, review, comput, someth, download, never, got, tri, tax, product, year, mani, bought, home, busi, abl, sinc, purchas, differ, bit, need, updat, realli, far, instal, worth, complet, read, amazon, softwar, also, servic, word, featur, user, ve, buy, quick, help, reason, love, easi, day, intuit, fix, sever, system, issu, recommend, friend, includ, re, lot, without, support, look, inform, creat, two, hour, best, keep, know, microsoft, quicken, much, option, call, found, last, return, long, say, simpl, compani, secur, high, howev, money, learn, peopl, norton, pay, fine, game, go, ever, hard, load, save, free, play, onlin

This seems pretty good, actually. These are the types of words a person might use to describe a piece of software, hardware, or stuff in between. Let's see if we get better accuracy.

In [73]:
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

# Use the fitted pipeline to make predictions on the validation dataset
val_predictions = pipeline.predict(X_val)
print(confusion_matrix(y_val.values, val_predictions))
print(classification_report(y_val.values, val_predictions))
print("Accuracy (validation):", accuracy_score(y_val.values, val_predictions))
[[1494 1111]
 [ 648 3747]]
              precision    recall  f1-score   support

         0.0       0.70      0.57      0.63      2605
         1.0       0.77      0.85      0.81      4395

    accuracy                           0.75      7000
   macro avg       0.73      0.71      0.72      7000
weighted avg       0.74      0.75      0.74      7000

Accuracy (validation): 0.7487142857142857

Precision and recall are both better; the only value that stayed the same is the recall of the negative (0.0) class. Our accuracy increased from 0.688 to 0.749, which is a pretty substantial improvement. Let's try it on our blog again.

In [65]:
pipeline.predict(blog_part1)
Out[65]:
array([0., 0., 1., 1., 0., 1., 1., 0., 1., 0., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 0., 1., 1., 1.])

Let's do manual sentiment analysis of our text and see which it got right and wrong. To avoid bias, I'm not going to look at predictions while I do my manual sentiment analysis. This of course has some drawbacks, but it is what it is.

My results:  [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
Predictions: [0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1]
Incorrect:   [1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0]
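For completeness, here is a sketch of one way the Incorrect row and an agreement rate can be computed (manual and predicted are my variable names for the two lists above):

manual    = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
predicted = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1]

# 1 where the prediction disagrees with the manual label, 0 where it agrees
incorrect = [int(m != p) for m, p in zip(manual, predicted)]
print(incorrect)
print("agreement:", round(1 - sum(incorrect) / len(incorrect), 3))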
As the mismatch row shows, it's pretty bad, but that is because of the words it trains on. We could go in depth on each sentence and see exactly why it got the second half wrong, but let's check one that seems like it should be easy to classify.

In [70]:
processed = process_text(["At worst, you classify text incorrectly."])
print(processed)
pipeline.predict(processed)
['worst classifi text incorrect']
Out[70]:
array([1.])
In [72]:
pipeline.steps[0][1].transform(processed)
Out[72]:
<1x150 sparse matrix of type '<class 'numpy.int64'>'
	with 0 stored elements in Compressed Sparse Row format>

As you can see, the CountVectorizer did not find a single word from its 150-word vocabulary in the sentence, so it transformed it into a blank record. Let's add more features, since multiplying the feature count by 10 barely cost us any CPU last time.
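Before we do, here is a quick way to confirm that none of the stemmed tokens made it into the 150-word vocabulary, using the already-fitted vectorizer and the processed list from the cell above (a sketch, not from the course):

vocab = pipeline.steps[0][1].vocabulary_
tokens = processed[0].split()
print(tokens)                               # the stemmed tokens
print([w for w in tokens if w in vocab])    # expected: an empty list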

In [74]:
### PIPELINE ###
##########################

pipeline = Pipeline([
    ('text_vect', CountVectorizer(binary=True,
                                  max_features=1500)),
    ('knn', KNeighborsClassifier())  
                                ])

# Visualize the pipeline
# This will come in handy especially when building more complex pipelines, stringing together multiple preprocessing steps
from sklearn import set_config
set_config(display='diagram')
pipeline
Out[74]:
Pipeline(steps=[('text_vect', CountVectorizer(binary=True, max_features=1500)),
                ('knn', KNeighborsClassifier())])
CountVectorizer(binary=True, max_features=1500)
KNeighborsClassifier()
In [75]:
# We use the lists of processed text fields 
X_train = train_text_list
X_val = val_text_list

# Fit the Pipeline to training data
pipeline.fit(X_train, y_train.values)
Out[75]:
Pipeline(steps=[('text_vect', CountVectorizer(binary=True, max_features=1500)),
                ('knn', KNeighborsClassifier())])
CountVectorizer(binary=True, max_features=1500)
KNeighborsClassifier()

Again it was very fast. Let's check the accuracy again.

In [76]:
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

# Use the fitted pipeline to make predictions on the validation dataset
val_predictions = pipeline.predict(X_val)
print(confusion_matrix(y_val.values, val_predictions))
print(classification_report(y_val.values, val_predictions))
print("Accuracy (validation):", accuracy_score(y_val.values, val_predictions))
[[1393 1212]
 [ 458 3937]]
              precision    recall  f1-score   support

         0.0       0.75      0.53      0.63      2605
         1.0       0.76      0.90      0.83      4395

    accuracy                           0.76      7000
   macro avg       0.76      0.72      0.73      7000
weighted avg       0.76      0.76      0.75      7000

Accuracy (validation): 0.7614285714285715

From 0.749 with 150 features to 0.761 with 1500 features. I'm willing to argue that this is where diminishing returns set in, so let's stop here. I think we're ready for our final project. The final project just uses a different data set, IMDB's review data, to build a similar classifier. What could possibly go wrong?
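Since the final project really is the same pipeline pointed at different data, here is a minimal sketch of the setup (the CSV path and the text/label column names are placeholders, not the course's actual names; process_text is reused from the top of the post):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score

# Placeholder path and column names -- adjust to the actual IMDB files.
imdb = pd.read_csv('../data/final_project/imdb_train.csv')

X_tr, X_va, y_tr, y_va = train_test_split(imdb['text'], imdb['label'],
                                          test_size=0.10, shuffle=True,
                                          random_state=324)

imdb_pipeline = Pipeline([
    ('text_vect', CountVectorizer(binary=True, max_features=1500)),
    ('knn', KNeighborsClassifier())
])

imdb_pipeline.fit(process_text(X_tr.tolist()), y_tr.values)
preds = imdb_pipeline.predict(process_text(X_va.tolist()))
print(classification_report(y_va.values, preds))
print("Accuracy (validation):", accuracy_score(y_va.values, preds))

Here is the validation report from my run of the final project with 1500 features: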

              precision    recall  f1-score   support

          0       0.73      0.44      0.55      1259
          1       0.60      0.83      0.69      1241

    accuracy                           0.64      2500

   macro avg       0.66      0.64      0.62      2500
weighted avg       0.66      0.64      0.62      2500
Accuracy (validation): 0.6364

The accuracy with 15 words was abysmal. The accuracy with 1500 words is pretty bad, but at least it's better than random chance. What does our vocabulary look like?

Vocabulary somewher, read, film, suppos, comedi, see, call, anyth, point, movi, dialogu, extrem, absurd, mani, set, seem, despit, nuditi, sexual, content, noth, leav, wonder, thing, titl, premis, could, fun, polit, instead, re, treat, cheap, weird, want, ll, buy, grace, jone, first, comment, imdb, reason, write, talk, one, best, ever, make, laugh, cri, time, fall, love, not, fan, yet, sing, tradit, possibl, accept, exist, feel, good, watch, like, lot, actor, non, lead, role, creat, memor, charact, interest, import, central, star, great, rememb, everi, singl, face, around, find, line, becom, name, speak, peopl, live, simpl, deep, guess, meant, sort, beauti, say, ve, began, sever, minut, graphic, sex, wow, anyway, young, woman, travel, marri, son, owner, man, take, care, sure, get, rock, legend, area, rather, ladi, least, particular, kind, refer, away, regard, soon, to, curious, littl, hand, father, sight, goe, sleep, famili, wait, show, member, dream, wood, way, effect, desir, attempt, know, much, laughabl, especi, certain, featur, appear, realist, bed, post, strang, twist, end, realli, clue, build, fit, definit, expect, tough, moment, think, everyth, theater, death, awesom, use, dull, stori, follow, girl, hope, th, part, came, straight, video, kill, boy, base, mother, visit, mom, boyfriend, move, tri, writer, turn, anyon, hous, danger, give, worth, blood, dvd, disappoint, european, horror, almost, redeem, qualiti, except, thorough, entertain, bad, open, alon, heroin, dr, arriv, help, scientist, experi, dead, rape, fight, final, near, next, day, someth, tell, would, place, decid, stay, put, even, enjoy, event, over, the, top, knew, go, this, but, repeat, drug, night, nake, tortur, mild, convinc, angri, complet, negat, includ, obvious, fact, popular, women, work, mind, pretti, someon, basic, physic, brother, respons, long, wind, scene, warn, involv, suffer, well, cours, brief, marriag, shot, local, surround, hurt, two, killer, whole, quick, blame, idea, subtl, ridicul, predict, climax, bore, director, anoth, hit, head, made, realiz, hear, funniest, bit, dub, endless, product, valu, close, credit, red, class, music, sound, sometim, tim, add, odd, cloth, standard, rate, solid, pleas, must, left, taken, aw, victim, may, dare, lie, so, posit, messag, shown, need, societi, throw, stand, went, wrong, portray, weak, strong, against, famous, usual, full, hide, learn, mean, clean, though, light, whether, daughter, thus, confus, subject, simpli, funni, wooden, mediocr, track, bright, pointless, overal, viewer, god, save, match, edg, also, michael, main, honest, fell, worthi, joke, big, miss, earli, probabl, found, still, warm, wast, back, floor, half, crap, screen, that, wit, field, hour, might, better, spent, act, plot, saw, cut, seen, manag, escap, ride, background, old, king, desper, clear, heavi, will, front, money, review, pain, done, justic, type, word, let, look, flick, pass, walk, car, start, anymor, buddi, mine, sent, girlfriend, come, what, wors, twice, heard, ball, month, real, life, struggl, win, emot, friend, sister, jane, thi, focus, loos, special, rise, incred, intens, ultim, america, memori, childhood, promis, remark, perform, establish, claim, support, youth, sport, program, award, alway, told, age, proper, key, never, recommend, within, believ, achiev, rest, piec, bottom, cinemat, food, play, in, anim, aliv, spot, often, small, eat, monster, slowli, crazi, stupid, destroy, new, ground, gore, element, clever, script, plenti, lin, cult, violenc, middl, fine, italian, pick, deliv, 
express, suggest, mak, wife, job, beyond, hold, you, scream, teenag, horribl, teen, poor, male, femal, absolut, hate, sit, garbag, dad, room, depress, worst, detect, happen, mysteri, fiction, die, year, ago, wish, finish, mix, releas, cast, gave, everyon, els, state, latter, amaz, quit, career, ahead, question, person, last, festiv, huge, storylin, potenti, power, excit, reveng, check, conclus, stick, classic, yes, dialog, flat, right, depict, colleg, world, serv, purpos, serious, lack, lame, photographi, impress, inde, break, either, admit, mayb, club, longer, version, bar, american, fire, blue, general, decad, atmospher, prove, terribl, ask, abus, polic, jame, less, regular, execut, and, children, profession, home, enough, countri, perfect, got, road, howev, black, magic, comparison, meet, display, alreadi, develop, understand, visual, silent, forward, drama, without, talent, stun, imag, fill, arm, box, handl, of, view, far, insight, later, object, parti, offer, happi, relationship, ring, episod, artist, behind, rare, mistak, affair, free, annoy, cute, hero, coupl, bond, etc, previous, therefor, maker, mad, creatur, nightmar, histor, chang, industri, normal, impact, upon, audienc, refus, satisfi, sudden, pathet, opportun, million, five, earlier, took, imagin, entir, disappear, eye, actual, run, down, system, consid, biggest, guy, along, begin, problem, crime, respect, deserv, amount, keep, deal, exact, surpris, besid, david, order, pull, distract, up, nice, inform, action, outsid, hilari, case, discov, now, white, matter, direct, thank, exampl, due, abl, jump, high, on, design, doubt, aspect, camera, second, said, bare, stage, decent, low, budget, craft, mention, equal, modern, sequenc, although, abil, easi, thought, bill, differ, style, origin, commentari, intellig, answer, band, natur, assum, truth, requir, pop, cultur, group, public, fascin, agre, sinc, remain, most, art, produc, anywher, record, theme, nobodi, it, cop, initi, shock, fate, cheesi, soundtrack, cool, suit, song, fantast, pay, hill, whatev, martin, pleasant, formula, japanes, stuff, insan, camp, level, genius, ther, allow, discuss, edit, today, favorit, appeal, don, german, unlik, eventu, dark, frighten, fail, greatest, disney, heaven, masterpiec, damn, suck, chill, realiti, practic, if, concern, york, child, hidden, sick, explor, various, flashback, other, gorgeous, men, fast, true, church, spoiler, boss, drag, forc, rip, human, market, psycholog, excel, angl, late, avoid, pictur, touch, hard, romant, older, issu, okay, kid, adult, doctor, extra, bring, fear, evil, brought, faith, seri, scari, return, 10, among, unfortun, stretch, movie, notic, wrote, screenplay, consist, materi, appar, result, cliché, exploit, co, three, school, trip, enter, past, color, spend, togeth, none, student, husband, date, immedi, hire, actress, continu, attract, lik, bodi, voic, spoil, nation, given, creativ, concept, describ, hollywood, soldier, armi, govern, murder, detail, door, obsess, across, compar, critic, sorri, shine, unbeliev, compel, theatr, explan, credibl, drawn, draw, addit, unit, team, shoot, stereotyp, plan, space, genr, land, written, richard, pair, tale, approach, out, situat, sens, wild, iron, held, pre, georg, round, compani, chanc, share, worri, fashion, inspir, piti, reach, avail, sequel, ghost, babi, figur, tire, sad, outstand, crimin, story, known, gem, alien, possess, reveal, for, sign, note, send, third, war, suspens, explain, english, languag, depth, spirit, whose, sub, success, readi, tom, imposs, noir, narrat, 
villain, lord, giant, pure, hint, loud, stop, summer, somehow, short, brain, burn, test, john, side, violent, report, somewhat, crew, offic, cost, born, appropri, rule, phone, hair, slasher, mid, lesson, troubl, west, control, similar, tast, soul, humour, favourit, prefer, christian, grant, four, nonsens, indian, cross, danc, progress, perhap, thriller, jack, master, mental, tension, term, air, week, appreci, comed, dramat, rent, store, opinion, tragedi, number, cameo, silli, fantasi, generat, comic, book, ad, onto, score, contrast, romanc, engag, sell, otherwis, mari, self, seek, ruin, provid, mess, paid, locat, hospit, amus, filmmak, scienc, doubl, larg, attack, grow, caus, mouth, constant, throughout, charm, cover, water, wall, forget, excus, insid, surviv, truli, chemistri, former, remind, technic, highlight, bunch, heart, fulli, difficult, vision, led, familiar, recogn, intent, cinematographi, breath, deepli, south, gay, kept, intern, cinema, hole, felt, apart, plus, receiv, center, flaw, accent, histori, remak, reaction, superb, slight, disgust, moral, essenti, skill, ten, succeed, project, forev, relat, zombi, effort, length, frame, aim, current, thrown, angel, magnific, occasion, celebr, lock, rais, accid, brutal, nasti, frank, neither, nowher, shame, british, zero, mi, citi, sceneri, major, documentari, motion, toward, tend, pace, haunt, utter, model, creepi, christoph, terrif, drop, christma, dread, western, island, mood, opera, caught, unknown, present, tie, earth, accur, frustrat, captur, necessari, studi, mark, fellow, vampir, lost, recent, trailer, bought, list, costum, process, period, desert, robert, street, town, strike, steve, smith, ex, evid, trust, intend, occur, easili, choic, common, adventur, combin, manner, smile, search, train, battl, teacher, board, yeah, push, charg, convers, step, form, grade, typic, footag, drive, militari, brilliant, chris, joe, fair, van, afraid, do, humor, unexpect, wide, innoc, thrill, insult, commerci, cold, collect, bother, sentiment, pleasur, parent, rich, off, total, al, season, sum, scare, tone, steal, fresh, era, roll, rat, sexi, dog, toni, hell, stone, skip, beat, sam, scott, busi, fool, tune, sweet, game, individu, william, mr, listen, hey, jean, park, prepar, bloodi, hang, univers, logic, contain, folk, anti, everybodi, met, rang, separ, somebodi, choos, unusu, wear, favor, blow, remot, protagonist, accord, random, averag, gun, peter, ben, connect, idiot, prison, super, hunt, dollar, trash, televis, catch, conflict, interview, author, station, london, copi, challeng, disast, gone, replac, seat, bizarr, plain, trick, asid, tragic, smart, adapt, repres, paul, embarrass, all, paint, attitud, won, awar, wise, kick, influenc, superior, tear, irrit, improv, vote, commit, suicid, thin, oscar, news, anybodi, character, crash, planet, watchabl, hot, cat, glad, futur, rush, carri, blond, decis, likabl, root, sci, fi, becam, stuck, partner, judg, forgotten, slow, window, trap, limit, complex, unless, cartoon, player, attent, agent, investig, chase, join, fake, shop, channel, race, gang, harri, six, uniqu, journey, affect, centuri, teach, seven, count, lover, lose, drink, encount, gag, ignor, satir, quiet, novel, studio, serial, bland, bear, law, protect, correct, secret, tape, mere, comput, lee, minor, charl, delight, ann, resembl, machin, no, her, ill, intrigu, ship, younger, opposit, stephen, energi, billi, social, motiv, photograph, fortun, ident, liter, fli, passion, green, ugli, load, suspect, signific, disturb, target, instanc, mile, 
strength, marvel, unnecessari, bomb, them, genuin, introduc, dress, dumb, presenc, independ, rescu, french, epic, admir, invent, jim, reflect, ray, nomin, joy, pack, fault, convey, whatsoev

Does this look like a list of words that would determine sentiment? Yeah, more or less. Because it's 1500 words, it's pretty nuanced I bet.

So that's the result of a few hours of my work on NLP. K Nearest Neighbors (KNN) is an interesting but low-quality classifier for NLP. It's pretty efficient, so I suspect I'll use it to classify speech as time goes on. It makes a lot more sense to use KNN on things like spam, hate speech, politics, search, and topic instead of sentiment. Why is that? Because

  • spam has patterns (one of those patterns is to avoid patterns)
  • hate speech has characteristic wording embedded in it, and hate speech is annoying for humans to have to read
  • politics is rapidly evolving, but makes sense for a KNN because people copy one another's speech
  • search would benefit from the nearest-neighbor part, giving a possible match on a sentence (with the limitations discussed earlier)
  • topic classification would be easier for KNN because people are trying to provide metadata about what they are talking about in their text.