-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
by Javantea
Sept 1, 2016
Yesterday I published a small piece of software to Small Wide World's git to very little fanfare. It was a generalization of a bad piece of software I wrote the day before. It uses NLTK to perform a simple task: parse a simple sentence which follows the form "subject verb object" with optional additional information starting with "because". Examples of this grammar include:
GnuPG is software
IRC is a protocol
software implements a protocol
Javantea is human
AI3 is software
Javantea wrote AI3
Javantea writes software
Javantea writes English
Javantea reads German
Javantea reads Japanese
Javantea reads Portuguese
Javantea reads Spanish
nlp1.py creates this graph of the relationships:
How does it parse? It uses NLTK to find parts of speech, splits the sentence at its verb, and assumes that the first part of the sentence is the subject, the second part is the verb, and the third part is the object. First of all, let's be honest: NLTK's part-of-speech tagger has a bunch of bugs out of the box. If you want to make something with it, simple or complex, you will run into them, or you will not have tested well enough to run into them. To work around the tagger's numerous inaccuracies I chose to hardcode a few fixes. If a sentence has exactly three words, the script assumes subject verb object. In this grammar no other option is possible, so this is accurate. If there are more than three words and it can't find a verb, it looks at the verbs it has seen previously, and if one of them appears in the sentence, it splits on that. This doesn't always work, and it fails badly when a sentence uses "does not" together with a verb that isn't detected correctly, but it works for all of the text I have given it so far (about 66 lines).
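The heuristic described above can be sketched in a few lines. This is a minimal illustration, not the code in nlp1.py: the function name `split_svo`, the `seen_verbs` set, and the choice to raise `ValueError` on failure are all assumptions made for the example. It takes sentences that are already tagged as `(word, tag)` pairs with Penn Treebank tags, which is the shape `nltk.pos_tag` returns, so it runs without NLTK or its data files.

```python
# Penn Treebank verb tags, as produced by taggers such as nltk.pos_tag.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def split_svo(tagged, seen_verbs):
    """Split a tagged sentence into (subject, verb, object).

    tagged     -- list of (word, tag) pairs
    seen_verbs -- set of verbs found in earlier sentences (hypothetical
                  helper state; it is updated in place)
    """
    words = [word for word, _ in tagged]
    inner = range(1, len(words) - 1)  # a verb cannot be first or last
    index = None
    if len(words) == 3:
        # Three words can only be subject verb object in this grammar.
        index = 1
    else:
        # First choice: trust the tagger.
        index = next((i for i in inner if tagged[i][1] in VERB_TAGS), None)
        if index is None:
            # Fallback: split on a verb we have already seen.
            index = next((i for i in inner if words[i] in seen_verbs), None)
    if index is None:
        raise ValueError("no verb found in: " + " ".join(words))
    seen_verbs.add(words[index])
    subject = " ".join(words[:index])
    obj = " ".join(words[index + 1:])
    return subject, words[index], obj


seen = set()
# The tagger got "owns" wrong here, but three words is enough to split.
print(split_svo([("Disney", "NNP"), ("owns", "NNS"), ("ABC", "NNP")], seen))
# Same tagging mistake in a longer sentence: the previously seen verb saves us.
print(split_svo([("Time", "NNP"), ("Warner", "NNP"), ("owns", "NNS"),
                 ("CNN", "NNP")], seen))
```

The fallback only helps once a verb has been seen in an earlier sentence, which is why the order of the input lines can matter with this approach.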
So what practical use does this have? Let's say that there's a conflict between ten people. Let's say that conflict is really complex: for example, someone has divorced someone else and married someone else. Then their friends became unruly and insulted someone random in the vicinity. So let's say that you don't actually know what is going on besides a few statements of fact that don't make much sense. Normally a person might avoid the conflict completely because they don't want to get involved unintentionally with something they don't understand. But graphing this might make it possible to navigate the minefield of social hypocrisy, misunderstanding, and delusion that exists in any conflict. As Lady Grantham put it, "however much the couple may strive to be honest, no one is ever in possession of the facts". Let's put a few rules down. Don't connect anyone who is disagreeing in the first graph. Don't put anyone who is agreeing in the second graph. The first graph is alliances and neutral parties; the second graph is active conflicts. Since we're not ready to take on Syria just yet, let's draw something a little simpler.
Comcast owns NBC
NBC owns SyFy
NBC owns MSNBC
Microsoft owned MSNBC
Microsoft divested MSNBC
Disney owns ABC
Disney owns ESPN
Disney owns A&E
Hearst owns A&E
Disney owns History Channel
Hearst owns History Channel
NBC owns USA
NBC owns Weather Channel
NBC owns Telemundo
Russian government owns RT
United Kingdom owns BBC
United Kingdom owns CBC
Rupert Murdoch owns Fox
CBS owns Showtime
CBS owns Viacom
CBS owns Westinghouse
CBS sold nuclear power plants to BNFL
United Kingdom owns [BNFL](https://en.wikipedia.org/wiki/British_Nuclear_Fuels_Ltd)
Viacom owns MTV
MTV owns Nickelodeon
MTV owns Comedy Central
MTV owns CMT
MTV owns VH1
MTV owns MTV2
Viacom owns BET
Rupert Murdoch owned News of the World
IBA founded Channel Four
United Kingdom operates IBA
Channel Four Television Corporation owns Channel Four
Scott Trust Limited owns The Guardian
Scott Trust Limited owns The Observer
Time Warner owns CNN
Time Warner owns TBS
Time Warner owns TNT
Time Warner owns HBO
Time Warner owns Cartoon Network
Time Warner owns Adult Swim
Time Warner merged AOL
Time Warner owns WB
Time Warner owns DC Comics
Time Warner owns New Line Cinema
Time Warner owns Time
Time owns Sports Illustrated
Time owns Travel + Leisure
Time owns Food & Wine
Time owns Fortune
Time owns People
Time owns InStyle
Time owns Life
Time owns Golf Magazine
Time owns Southern Living
Time owns Essence
Time owns Real Simple
Time owns Entertainment Weekly
Time owns Myspace
Hearst owns Popular Mechanics
Hearst owns Car and Driver
Hearst owns Cosmopolitan
Hearst owns Country Living
Hearst owns Dr. Oz
Hearst owns ELLE
Hearst owns Elle Decor
Hearst owns Esquire
Hearst owns Food Network Magazine
Hearst owns Good Housekeeping
Hearst owns Harper's Bazaar
Hearst owns House Beautiful
Hearst owns Marie Claire
Hearst owns Nat Mags
Hearst owns O
Hearst owns Red
Hearst owns Redbook
Hearst owns Road & Track
Hearst owns Seventeen
Hearst owns Town & Country
Hearst owns Veranda
Hearst owns Woman's Day
Hearst owns ESPN
Hearst owns Seattle Post-Intelligencer
Bonnier owns Popular Science
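To give a feel for what a list like this turns into, here is a minimal sketch that reads a handful of the sentences above, keeps the "owns" relationships, reports properties with more than one owner, and writes the edges in Graphviz DOT syntax. It is not nlp1.py: the names `parse_all`, `shared_properties`, and `to_dot` are made up for this example, the verb is matched as a plain word rather than found by a tagger, and DOT is only an assumed output format.

```python
from collections import defaultdict

SENTENCES = [
    "Comcast owns NBC",
    "NBC owns SyFy",
    "Microsoft divested MSNBC",
    "Disney owns ESPN",
    "Hearst owns ESPN",
    "Disney owns A&E",
    "Hearst owns A&E",
    "Channel Four Television Corporation owns Channel Four",
]


def parse_all(sentences, verb="owns"):
    """Return (subject, verb, object) triples for sentences containing verb."""
    triples = []
    for sentence in sentences:
        words = sentence.split()
        if verb in words[1:-1]:
            i = words.index(verb, 1)
            triples.append((" ".join(words[:i]), verb, " ".join(words[i + 1:])))
    return triples


def shared_properties(triples):
    """Map each property with more than one owner to its sorted owners."""
    owners = defaultdict(set)
    for owner, _, prop in triples:
        owners[prop].add(owner)
    return {prop: sorted(who) for prop, who in owners.items() if len(who) > 1}


def to_dot(triples):
    """Render the triples as an undirected Graphviz graph."""
    lines = ["graph ownership {"]
    for owner, _, prop in triples:
        lines.append('  "%s" -- "%s";' % (owner, prop))
    lines.append("}")
    return "\n".join(lines)


triples = parse_all(SENTENCES)
print(shared_properties(triples))
print(to_dot(triples))
```

Sentences with other verbs ("divested", "owned", "merged") are simply skipped here; each extra relationship would need its own pass or its own graph.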
Only one of the sentences was too complex for our script to parse: "CBS sold nuclear power plants to BNFL". To fix this, I changed it to two sentences: "CBS sold nuclear power plants" and "BNFL bought nuclear power plants". Clearly this is an oversimplification, but it shows the limitation of my simple 152-line Python script. If we wanted to handle complex grammars that involve a subject, an object, and a second object, we would need to incrementally add complex parsers. This isn't rocket science or brain surgery, but it is time-consuming. I won't be publishing a generic English parser any sooner than I'll be publishing AI3. Funny that AI3 actually contains a considerable amount of English. From my work, my guess is that a generic parser would take a person a few months of pretty intense work. Regular people could help in this task by coming up with reasonable sentences that they would want parsed and what information they would want parsed from each sentence. This may seem easy, but if you want all the data from a sentence, you get no secondary information. Allow me to explain. If I parse "Hearst owns Popular Mechanics" into `owns("Hearst", "Popular Mechanics")`, I can graph all ownership on a map: `Hearst -- Popular Mechanics`. Okay, let's parse it in a different way: `["Hearst", "owns", "Popular Mechanics"]`.
This doesn't help the computer system nearly as much because it doesn't say whether "owns" is a noun, verb, preposition, adjective or adverb. Luckily the word "owns" can only be a verb, but what about the word "fights"? This is where NLTK fails. Let's look at WordNet's database entries for "fights":
$ abs fights
Everything about 'fights' :
found battle, conflict, fight, engagement
battle.n.01
noun.act
a hostile meeting of opposing military forces in the course of a war
Examples:
Grant won a decisive victory in the battle of Chickamauga
he lost his romantic ideas about war when he got into a real engagement
Hyper: military_action, action
Hypo: Armageddon
Hypo: assault
Hypo: combat, armed_combat
Hypo: dogfight
Hypo: naval_battle
Hypo: pitched_battle
found fight, fighting, combat, scrap
fight.n.02
noun.act
the act of fighting; any contest or struggle
Examples:
a fight broke out at the hockey game
there was fighting in the streets
the unhappy couple got into a terrible scrap
Hyper: conflict, struggle, battle
Hypo: affray, disturbance, fray, ruffle
Hypo: battering, banging
Hypo: beating, whipping
Hypo: brawl, free-for-all
Hypo: brush, clash, encounter, skirmish
Hypo: close-quarter_fighting
Hypo: dogfight
Hypo: duel, affaire_d'honneur
Hypo: fencing
Hypo: fistfight, fisticuffs, slugfest
Hypo: gunfight, gunplay, shootout
Hypo: hassle, scuffle, tussle, dogfight, rough-and-tumble
Hypo: in-fighting
Hypo: knife_fight, snickersnee, cut-and-thrust
Hypo: rumble, gang_fight
Hypo: set-to
Hypo: shock, impact
Hypo: single_combat
found competitiveness, fight
competitiveness.n.01
noun.attribute
an aggressive willingness to compete
Examples:
the team was full of fight
Hyper: aggressiveness
found fight
fight.n.04
noun.communication
an intense verbal dispute
Examples:
a violent fight over the bill is expected in the Senate
Hyper: controversy, contention, contestation, disputation, disceptation, tilt, argument, arguing
found fight
fight.n.05
noun.act
a boxing or wrestling match
Examples:
the fight was on television last night
Hyper: boxing, pugilism, fisticuffs
found contend, fight, struggle
contend.v.06
verb.competition
be engaged in a fight; carry on a fight
Examples:
the tribesmen fought each other
Siblings are always fighting
Militant groups are contending for control of the country
Hypo: attack, assail
Hypo: bandy
Hypo: battle, combat
Hypo: bear_down
Hypo: box
Hypo: chicken-fight, chickenfight
Hypo: duel
Hypo: engage, wage
Hypo: fence
Hypo: feud
Hypo: fight, oppose, fight_back, fight_down, defend
Hypo: fight_back
Hypo: fistfight
Hypo: join_battle
Hypo: joust
Hypo: scuffle, tussle
Hypo: settle, get_back
Hypo: skirmish
Hypo: spar
Hypo: tourney
Hypo: tug
Hypo: war
Hypo: wrestle
found fight, oppose, fight_back, fight_down, defend
fight.v.02
verb.competition
fight against or resist strongly
Examples:
The senator said he would oppose the bill
Don't fight it!
Hyper: contend, fight, struggle
Hypo: recalcitrate
Hypo: repel, repulse, fight_off, rebuff, drive_back
Hypo: resist, hold_out, withstand, stand_firm
Hypo: resist, stand, fend
found fight, struggle
fight.v.03
verb.social
make a strenuous or labored effort
Examples:
She struggled for years to survive without welfare
He fought for breath
Hyper: try, seek, attempt, essay, assay
Hypo: flounder
Hypo: tug, labor, labour, push, drive
found crusade, fight, press, campaign, push, agitate
crusade.v.01
verb.social
exert oneself continuously, vigorously, or obtrusively to gain an end or engage in a crusade for a certain cause or person; be an advocate for
Examples:
The liberal party pushed for reforms
She is crusading for women's rights
The Dean is pushing for his favorite candidate
Hyper: advertise, advertize, promote, push
This is a bit too much information, but you can see that WordNet has an incredibly dense set of relationships for each word as well as synonym information. Let's condense this to just the part of speech of each sense and how common it is.
| Synset | Part of speech | Popularity (sense number from the synset name) |
|---|---|---|
| battle.n.01 | noun | 1 |
| fight.n.02 | noun | 2 |
| competitiveness.n.01 | noun | 1 |
| fight.n.04 | noun | 4 |
| fight.n.05 | noun | 5 |
| contend.v.06 | verb | 6 |
| fight.v.02 | verb | 2 |
| fight.v.03 | verb | 3 |
| crusade.v.01 | verb | 1 |
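The condensing step can be reproduced from the synset names alone, since each name has the form `lemma.pos.number`. The sketch below hardcodes the nine names from the table so that it runs without any WordNet data; `pos_of` and `count_pos` are names invented for this example. With NLTK and its WordNet corpus installed, the same list should be obtainable with `[s.name() for s in wordnet.synsets("fights")]` after `from nltk.corpus import wordnet`.

```python
from collections import Counter

# Synset names copied from the table above.
SYNSET_NAMES = [
    "battle.n.01", "fight.n.02", "competitiveness.n.01",
    "fight.n.04", "fight.n.05",
    "contend.v.06", "fight.v.02", "fight.v.03", "crusade.v.01",
]

# WordNet's one-letter part-of-speech codes.
POS_LABELS = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective satellite",
    "r": "adverb",
}


def pos_of(synset_name):
    """Return the part of speech encoded in a name like 'fight.n.02'."""
    _lemma, pos, _number = synset_name.rsplit(".", 2)
    return POS_LABELS.get(pos, pos)


def count_pos(synset_names):
    """Count how many senses fall under each part of speech."""
    return Counter(pos_of(name) for name in synset_names)


for pos, count in count_pos(SYNSET_NAMES).items():
    print(pos, count)
```

Splitting from the right with `rsplit` keeps lemmas that contain a period of their own intact.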
As you can see, "fights" as a verb can mean 4 different things, and "fights" as a noun can mean 5 things. Thus when I say "USA fights ISIL in Syria", I'm being imprecise. If you put that sentence into your college essay, even with clarification or qualification, you should be marked down no matter how articulate the rest of the paper is. So if you're creating a catchy slogan for your social justice cause, don't use the word "fight" unless you're aiming for confusion. Let's give a good example: "Fight MS" is doubly confusing. Both words are easily confused unless the context is fully given. Using this we can understand the historical significance of slogans, but are slogans really valuable? Who reading this remembers HOPE?
Unfortunately, language gives us a really poor method of precise communication, and communicating with computers is even more difficult because they lack common sense. A little bit of effort on our part can create machine-parsable logic that is capable of communicating our ideas clearly and articulately. Those statements can also be parsed effectively by intelligent people, but for the most part people will be able to understand whatever drivel we communicate to them if we are verbose enough. Here's a question for the reader: limiting yourself to 140 characters, can you express your feelings about this paragraph without reading the next paragraph? Please post your answer in the comment section before reading the next paragraph, but don't worry about the 140 character limit. I won't be grading.
But is this my encouragement for the reader to spend a day reducing their most important thoughts into machine parsable logic statements? Spending an hour working on this problem would be beneficial if you wish to become a more intelligent person, so I recommend it to everyone. But there's absolutely no reason for us to spend copious amounts of time communicating and parsing vast quantities of human text until we have a purpose. Who here has a good purpose for a huge quantity of parsed natural language? I wrote a relationship graph because I want people to use my graph layout software. What do you want to do with parsed natural language?
I believe that is all. If you'd like to play with nlp1.py, you can clone my git repository for Small Wide World.
git clone https://www.altsci.com/repo/smallwideworld.git
Javantea out.
-----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCgAGBQJXyKGnAAoJEDxoyNvLp4PvCKEP/2uQk8ltI7L1+Edf2W0+Z4xy kEDQi9H5cSGstte7nrN/JXnimUZbjpgvIWLan/koqXWRXSSIc/22hq28Pn0uq5hU 3G14T2zJgPSIf18snIkE2wgFul85mz7AmtgEhkrYP7AqZDUALF0DrfDD6dQg9LeH GcriodOFMuBxO9/k4g1iBBmwNBiu709qXFhzn0CrSTKxK6f7wD2F4bFVyusEgEf8 TrnPdvTA+w/NmEHtGFhLEdAwp+c+t+HhMftN3Pv/1Q9Fdr6kj1cInCifoFkvcEfn 4ITdasM/Nb7S4KNHLaGNEYxxk245Do1rJw4TEUEHSbzz0Gdvbmk6okKDk3qYKoHB jjvIM3m20+esQS16LwOF5ECw3dF9R4crAFqHN9U+2pl2avjV/cU1iFN0ouIu9gy2 VfYQKXSceQIuGdJM6znWz8d9oCWYvMLGd6dDRqY8VqG1EyGIWGQXffItk4vnlEXl xcLQSMWy7pm1M4+gV8mAKlsc81GRe8K68LaJCwMrDzlDnFem5mEfgBTyJZa29hbP 8Tc0lbjcODNE0JQFwZuyxVhZwp3m4yrbQrA+/y/S/0GwfEDXPcaYFXQN/iOVeJ+T Hej7eTfiiOhOWcGwrmsEFDXiRiyh5OW/KqW2QMSnjpV4WLUSpSTeOikm/jn2E3rC Kdb/MzsVszuK5MdVPyuK =ZjkD -----END PGP SIGNATURE-----