Share my new passion for a vibrant new ecosystem: Harry Potter Fan Fiction

Harry Potter and The Prisoner of Azkaban


Preliminary Results

The following is an example of an event chain extracted from a story from the fan fiction:

  1. (’CLUSTER 0’, ’nsubj’), (’walked’, ’root’)

  2. (’CLUSTER 0’, ’nsubj’), (’looked’, ’root’)

  3. (’CLUSTER 0’, ’nsubj’), (’changed’, ’root’)

  4. (’CLUSTER 0’, ’nsubj’), (’asked’, ’root’)

In the example above, the character in Harry Potter referred to as ’CLUSTER 0’ first walks, then looks, changes, and finally asks. In every tuple, the first element is the word from the document, with character mentions replaced by their coreference-resolution cluster tag, and the second element is that word’s first-order dependency label. These labels were used to extract the event chain of every noun subject of a story.
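The grouping step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline used for these results: it assumes dependency parsing and coreference resolution have already been run, and the `Token` record, its field names, and the toy sentences are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical pre-parsed token; the field names are assumptions."""
    text: str   # surface word, or a cluster label such as "CLUSTER 0"
    dep: str    # dependency label, e.g. "nsubj" or "root"
    head: int   # index of the head token within the same sentence


def extract_event_chains(sentences):
    """Map each noun subject to its ordered list of (subject, verb) events."""
    chains = defaultdict(list)
    for sentence in sentences:
        for token in sentence:
            if token.dep != "nsubj":
                continue
            head = sentence[token.head]
            # Keep only subjects attached directly to the sentence root.
            if head.dep == "root":
                chains[token.text].append(
                    ((token.text, token.dep), (head.text, head.dep))
                )
    return dict(chains)


# Toy input mirroring the example chain above (invented sentences).
sentences = [
    [Token("CLUSTER 0", "nsubj", 1), Token("walked", "root", 1)],
    [Token("CLUSTER 0", "nsubj", 1), Token("looked", "root", 1)],
    [Token("CLUSTER 1", "nsubj", 1), Token("smiled", "root", 1)],
    [Token("CLUSTER 0", "nsubj", 1), Token("changed", "root", 1)],
    [Token("CLUSTER 0", "nsubj", 1), Token("asked", "root", 1)],
]

chains = extract_event_chains(sentences)
for step, event in enumerate(chains["CLUSTER 0"], start=1):
    print(step, event)
```

Because sentences are visited in document order, each chain preserves the order in which the subject's events appear in the story.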


Pointwise Mutual Information (PMI) Analysis

The most probable bigrams and the bigrams with the highest PMI were computed over all words, verbs, and events of the two corpora: Harry Potter Fanfiction and Harry Potter Canon. PMI analysis reveals the unique co-occurring concepts in each corpus and highlights other differences in writing style, on average, between their authors. The most probable bigrams are shown alongside the highest-PMI bigrams in each table to show that a bigram’s relative frequency (its count over the total number of bigrams) does not by itself entail importance or significance in a document, which underscores the value of PMI analysis.

Table 1: PMI Analysis on Words.

| Most probable (Harry Potter Fanfiction) | Most probable (Harry Potter Canon) | Highest PMI (Harry Potter Fanfiction) | Highest PMI (Harry Potter Canon) |
| --- | --- | --- | --- |
| of the | of the | bibbidi bobbidi | avada kedavra |
| in the | in the | merus lumens | felix felicis |
| to the | said harry | zauberei dorf | bertha jorkins |
| it was | he was | hoity toity | whomping willow |
| on the | at the | palatino linotype | expecto patronum |
| at the | to the | magikos akademia | dressing gown |
| he was | on the | alarte ascendare | grubbly plank |
| to be | it was | inkosi inkosikazi | zacharias smith |
| she was | he had | namby pamby | law enforcement |
| i was | out of | modus operandi | auntie muriel |
| and i | said ron | cuevas gontan | bathilda bagshot |
| out of | into the | füvessy uram | goal posts |
| going to | to be | toothflossing stringmints | pansy parkinson |
| with a | in a | babbitty rabbitty | deathly hallows |
| he had | from the | higgledy piggledy | phineas nigellus |
| as he | said hermione | dawh dawh | king’s cross |
| in a | had been | shoop shoop | diagon alley |
| for the | he said | loundon’s towne | pumpkin juice |
| was a | was a | farlin flookey | st mungos |
| as she | of his | helter skelter | godrics hollow |

Two random variables X and Y are independent iff their joint distribution is equal to the product of their marginal distributions:
$$p(X, Y) = p(X)\,p(Y) \tag{1}$$
That is, for all outcomes x, y:
$$p(X = x, Y = y) = p(X = x)\,p(Y = y) \tag{2}$$
Mutual information measures how far the joint distribution departs from this product, so the mutual information “I” can be described as:
$$I(X; Y) = \sum_{x, y} p(X = x, Y = y) \log \frac{p(X = x, Y = y)}{p(X = x)\,p(Y = y)} \tag{3}$$
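As a quick numerical check of the mutual information definition, the sketch below evaluates it on two small invented joint distributions. The function name `mutual_information`, the dictionary representation of the joint distribution, and the choice of the natural logarithm are assumptions made for this illustration; the source does not state a logarithm base.

```python
import math


def mutual_information(joint):
    """I(X; Y) in nats for a joint distribution given as {(x, y): probability}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    # Zero-probability cells contribute nothing to the sum.
    return sum(
        p * math.log(p / (p_x[x] * p_y[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


# Independent: every cell equals the product of its marginals (0.5 * 0.5).
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# Dependent: X and Y agree 80% of the time.
dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
print(mi_independent, mi_dependent)
```

The independent case gives zero, since every ratio inside the logarithm is one, while the dependent case gives a strictly positive value.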
We look for pairs of words w_i, w_j that have high pointwise mutual information, because a high value signifies that the two words co-occur much more often than would be expected if each appeared independently in the corpus, as seen in equation 4.
$$\mathrm{PMI}(w_i, w_j) = \log \frac{p(w_i, w_j)}{p(w_i)\,p(w_j)} \tag{4}$$
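The contrast between the most probable bigram and the highest-PMI bigram can be illustrated with a minimal sketch. It assumes probabilities are estimated as simple relative frequencies (bigram count over total bigrams, word count over total words) and uses the natural logarithm. The `min_count` threshold is an assumption added here because PMI tends to inflate pairs seen only once; the source does not state whether any frequency cutoff was applied. The toy text is invented and is not drawn from either corpus.

```python
import math
from collections import Counter


def bigram_scores(tokens, min_count=1):
    """Return {bigram: (probability, pmi)} for adjacent word pairs."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())
    scores = {}
    for (w_i, w_j), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_ij = count / n_bigrams
        p_i = unigrams[w_i] / n_unigrams
        p_j = unigrams[w_j] / n_unigrams
        scores[(w_i, w_j)] = (p_ij, math.log(p_ij / (p_i * p_j)))
    return scores


# Invented toy text, not drawn from either corpus.
tokens = (
    "the end of the hall and the door of the tower and the top of the stairs "
    "expecto patronum cried the boy expecto patronum said the girl"
).split()

scores = bigram_scores(tokens, min_count=2)
most_probable = max(scores, key=lambda bigram: scores[bigram][0])
highest_pmi = max(scores, key=lambda bigram: scores[bigram][1])
print("most probable:", most_probable)
print("highest PMI:  ", highest_pmi)
```

In this toy text the most frequent pair is a function-word bigram, while the highest-PMI pair is made of two words that almost never appear apart, which is the same pattern the tables show.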

In Table 1, the most probable bigrams are not telling, since both corpora are dominated by function-word pairs such as ’of the’ and ’in the’, but both the Canon and the Fanfiction feature ’spells’ among their highest-PMI bigrams. From the Canon we see ’avada kedavra’ and ’expecto patronum’, which are common spells in the books and films, and from the Fanfiction we see ’merus lumens’ and ’alarte ascendare’.

Table 2: PMI Analysis on Verbs.

| Most probable (Harry Potter Fanfiction) | Most probable (Harry Potter Canon) | Highest PMI (Harry Potter Fanfiction) | Highest PMI (Harry Potter Canon) |
| --- | --- | --- | --- |
| said said | said said | oyt cafru | slammed shut |
| let go | said looking | Erised oyt | starting feel |
| know said | do said | born dies | looking saw |
| said looking | know said | doo doo | trying sound |
| was was | said know | destroyed destroyed | came striding |
| go said | got said | cha cha | came hurrying |
| do said | I’ve got | de gnome | looking puzzled |
| have do | was said | drip drip | looking relieved |
| get said | is said | tick tick | looking bewildered |
| said know | said looked | beep beep | turned face |
| have said | said was | won won | trying catch |
| said looked | go said | captivate resonating | managed find |
| knew was | have said | click click | supposed be |
| asked said | said think | live born | stopped talking |
| know is | said got | liked hated | wanted talk |
| know know | see said | vested pronounce | made feel |
| know do | get said | raptured ends | stood waiting |
| have go | asked said | ceases amaze | expecting see |
| see said | ’s said | bid adieu | let go |
| want know | think said | loved hated | need talk |

The difference in unique co-occurring verbs between the Fanfiction and the Canon is telling. One conclusion we can draw from the PMI analysis of co-occurring verbs is a difference in writing style. There are four main writing styles: Expository, Descriptive, Persuasive, and Narrative. The co-occurring verbs identified as the highest-valued unique concepts in the Fanfiction (third column from the left in Table 2) reveal a writing style that has the most in common with ’Descriptive’, while the Canon (last column in Table 2) has the most in common with ’Narrative’. In a descriptive writing style, authors on average use language that is led by the five senses (what they hear, see, smell, taste, or touch), which is evidenced by “drip, drip”, “beep, beep”, and “tick, tick” ranking among the highest-PMI bigrams for this corpus. By contrast, a narrative style of writing is concerned with character development, constructing a story, conflict, and setting, which is evidenced by the flow of co-occurring verbs in the Canon, such as ’came striding’, ’stopped talking’, and ’stood waiting’.

Table 3: PMI Analysis on Events.

| Most probable (Harry Potter Fanfiction) | Most probable (Harry Potter Canon) | Highest PMI (Harry Potter Fanfiction) | Highest PMI (Harry Potter Canon) |
| --- | --- | --- | --- |
| (was, gone) | (wanted, watch) | (acknowledged, manifested) | (woke, remember) |
| (love, hear) | (watch, was) | (orgasimed, griping) | (sleep, dozed) |
| (drifted, sleep) | (was, rose) | (clamp, her mouth) | (fuming, hear) |
| (turned, walked) | (rose, pressed) | (streaking, ripple) | (check, chose) |
| (love, know) | (pressed, blinked) | (sed, shuld) | (clambered, hurried) |
| (want, know) | (rolled, fell) | (hace, xplained) | (hurried, tell) |
| (walked, left) | (fell, woke) | (contunue, document) | (living, giving) |
| (knew, was) | (woke, remember) | (clanged, aperating) | (squeezed, could) |
| (hope, enjoyed) | (was, heaving) | (reinforced, quenched) | (crumpled, burned) |
| (was, be) | (slumped, fell) | (differ, a bushy - haired girl) | (avoid, hitting) |
| (turned, left) | (looking, felt) | (rejuvenated, cleansed) | (hitting, groped) |
| (knew, be) | (felt, was) | (delves, submerges) | (groped, smashed) |
| (would, like) | (felt, fainted) | (punishing, pine) | (fighting, keep) |
| (be, was) | (lay, looking) | (oohed, ahhed) | (know, meeting) |
| (like, know) | (looking, sleep) | (memorised, visualised) | (meeting, holding) |
| (know, was) | (sleep, dozed) | (dk, the little blue box) | (agrees, sent) |
| (try, update) | (been, realized) | (proceeds, doodle) | (throws, getting) |
| (left, left) | (wrenched, leaving) | (infatuated, unbelieving) | (getting, rid) |
| (hope, liked) | (leaving, standing) | (releases, vanishes) | (go, turned) |
| (had, was) | (standing, staring) | (bowing, restoring) | (cleared, read) |

In Table 3, two events were considered co-occurring if they occurred in the same chapter of the same story. Events were first extracted using dependency parsing and semantic role labeling in conjunction with coreference resolution. Each event consisted of a tuple with one verb and one argument (either subject or object) and was counted over the whole corpus. The events here are not character dependent; this is an analysis over all the events found in the corpora in general. Future work will include PMI analysis over co-occurring events within each character, to delineate not only the events that correspond to each character but also which events constitute a unique concept for that character. These general results will also be used in future work as positive examples for predictive modeling, to predict an event given a previous event and/or its temporal order.
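The same-story, same-chapter co-occurrence rule can be sketched as below. Several choices here are assumptions rather than the method used for these results: pairs are treated as unordered, each distinct event is counted once per chapter, probabilities are estimated as the fraction of chapters containing an event or a pair, and events are plain strings rather than (verb, argument) tuples to keep the example short. The function name `event_pair_pmi` and the records are invented.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def event_pair_pmi(records):
    """PMI for event pairs that share a (story, chapter) unit.

    `records` is an iterable of (story_id, chapter_id, event) triples.
    Probabilities are estimated as the fraction of chapters containing an
    event or a pair of events; that estimator is an assumption of this sketch.
    """
    chapters = defaultdict(set)
    for story_id, chapter_id, event in records:
        chapters[(story_id, chapter_id)].add(event)

    event_counts = Counter()
    pair_counts = Counter()
    for events in chapters.values():
        event_counts.update(events)
        # Sort so that each unordered pair has one canonical key.
        pair_counts.update(combinations(sorted(events), 2))

    n = len(chapters)
    scores = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = event_counts[a] / n
        p_b = event_counts[b] / n
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


# Invented records: (story, chapter, event).
records = [
    ("s1", 1, "oohed"), ("s1", 1, "ahhed"), ("s1", 1, "was"),
    ("s1", 2, "was"), ("s1", 2, "knew"),
    ("s2", 1, "oohed"), ("s2", 1, "ahhed"), ("s2", 1, "was"),
    ("s2", 2, "was"), ("s2", 2, "knew"),
]

pmi = event_pair_pmi(records)
for pair, score in sorted(pmi.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

In this toy data the pair that appears only together scores highest, while any pair involving an event present in every chapter scores zero, since that event carries no information about what else occurs.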


My research interests include Cross-Lingual Information Retrieval, Multilingual Information Retrieval, Event Extraction, Narrative Event Schemas, Personality Profiling, and Persuasive Content-Messaging.