TWDA word frequency analysis

Tobias
Seneschall
Posts: 221
Joined: 14 Oct 2003, 09:38
Location: Amersfoort Remember: Can't trust bald dutch people

Post by Tobias »

1. tobiasopdenbr...@notsocoldmail.com
5 Jan 11:57
Newsgroups: rec.games.trading-cards.jyhad
From: "tobiasopdenbr...@notsocoldmail.com" <tobiasopdenbr...@hotmail.com>
Date: 5 Jan 2006 02:57:27 -0800
Local: Thu 5 Jan 2006 11:57
Subject: Some TWDA word frequency Analysis

I ran the TWDA through a word frequency counter today (Hermetic Word
Frequency Counter 4.19t). For those interested, my results follow.
Words such as 'the' or 'x' (the most frequently occurring word), and
words that I couldn't clearly associate with a card/concept, have been
stripped. For cards whose names have more than one word, only the most
frequent distinguishable word is listed. Note that the TWDA does not
require a specific deck layout, just legal decks, so counting is only
as fair as the input, as always. And, of course, there are many 'old'
datapoints in the TWDA as well - things that have been changed, things
that were introduced near the end, etc.

There were 7500 unique words identified, with 115k words in the
document.
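
For anyone who wants to reproduce this kind of count without the tool, here is a minimal Python sketch. It is not how Hermetic Word Frequency Counter works internally; the tokenizing rule, the `STOP_WORDS` set, the `word_frequencies` name and the sample text are all assumptions made up for illustration. Ranks are assigned before the stop words are dropped, which is why the list further down has gaps in its numbering.

```python
import re
from collections import Counter

# Hypothetical stop list; in the post, words like 'the' and 'x' were stripped by hand.
STOP_WORDS = {"the", "x", "of", "a", "and"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Return (ranked, total_words, unique_words) for a block of text.

    ranked is a list of (rank, word, count) tuples with stop words removed.
    Ranks come from the full list, so removed words leave gaps.
    """
    # Lowercase and keep letters plus apostrophes, so "giant's" stays one word.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    ranked = []
    for rank, (word, count) in enumerate(counts.most_common(), start=1):
        if word not in stop_words:
            ranked.append((rank, word, count))
    return ranked, len(words), len(counts)

# Made-up sample, not real TWDA data.
sample = "2x Blood Doll\n3x Govern the Unaligned\n1x Blood Doll\n"
ranked, total, unique = word_frequencies(sample)
for rank, word, count in ranked:
    print(rank, word, count)
```

On the real archive you would read the TWDA text from a file instead of using `sample`, and the stop list would need to be much longer.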

I can't cut and paste from the unregistered tool I have, so I present
my selection of the top 300. Yes, I may have missed words that were
important in some context. I could run it again in, say, a week, if you
all shout out terms for me to look for that I've missed (please check
the list carefully first). The software does have its limits given the
size of the TWDA, though.
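
If someone has the counts in a script rather than in the tool, checking requested terms is just a rank lookup. A rough sketch follows; the `ranks_for` helper and the counts are invented for the example and are not real TWDA numbers.

```python
from collections import Counter

def ranks_for(terms, counts):
    """Map each requested term to its 1-based frequency rank, or None if absent."""
    rank_of = {
        word: rank
        for rank, (word, _) in enumerate(counts.most_common(), start=1)
    }
    return {term: rank_of.get(term.lower()) for term in terms}

# Made-up counts for illustration only.
counts = Counter({"x": 50, "blood": 20, "doll": 18, "bat": 2})
lookup = ranks_for(["doll", "bat", "auspex"], counts)
print(lookup)
```

A term that never occurs comes back as `None` rather than raising an error, so a long wish list of terms can be checked in one pass.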

Rank/Word:
6 aus
7 for
8 dom
9 obf
10 pre
15 blood
18 cel
19 pot
21 ani
30 doll
36 ventrue
37 malkavian
38 night (probably due to the many good 'night' cards)
41 toreador
43 pro
44 minion
47 wake
48 prince
51 telepathic
52 hunting
55 dem
57 tha
59 deflection
60 tradition
61 nosferatu
65 sudden
67 dream
68 praxis
71 gangrel
73 misdirection
78 giovanni
82 obt
83 anarch
85 awakening
87 brujah
89 govern
90 intervention
91 kindred
94 tremere
100 vic
101 dominate
103 kine
105 lasombra
108 cap (voter at 213)
111 ser
113 highway
114 pander
116 cloak
118 haven
119 powerbase
120 tzimisce
121 domain
123 nec
125 majesty
126 vitae
131 conditioning
133 crusade
123 swallowed
139 justicar
140 archon
141 giant's
142 boon
148 sight
149 faceless
150 form
155 rush
167 ravnos
169 freak
170 caitiff
171 hunter
177 bow
179 skin
182 labyrinth
185 bishop
186 senses
187 beast
188 enhanced
189 immortal
191 montreal
192 spying
193 chi
197 parity
199 impersonation
200 mind
202 rack
203 strength
210 follower
211 obfuscate
212 target
214 championship
215 forgotten
220 parthenon
222 eagle's
223 earth
226 inner
227 obedience
230 dramatic
231 fame
233 hungry
234 pool
236 shadow
237 banishment
240 delaying
245 winthrop
246 computer
247 counter
248 crow
249 enemy
253 ghoul
255 management
258 bonding
263 morgan (tasha at 294)
264 raven
266 disputed
268 oration
269 touch
270 ancient
279 elysium
281 pentex
284 david
287 qui
288 revolt
290 black
291 enchant
292 newspaper
293 seduction
295 troublemaker
296 ben
298 destruction
299 fifth
300 legal

Some observations (barring mistakes):

- Dominate isn't the #1 discipline: "aus" and "for" beat "dom" ("for"
may of course be used in sentences as well). However, 'dominate' as a
word does appear in the top 300 - 'auspex' (306) and 'fortitude' (412)
don't. (People talking about dominate in descriptions? Dominate Kine?
Dominate master:disciplines?)

- Old-school powerhouse clans rank highest, as expected: ventrue,
malkavian, toreador (ranking higher than many disciplines). Giovanni
is the first clan to break the trend.

- Blood Doll is the most common card. Minion Tap, Wake, "Telepathic",
Deflection and the group of 'night' cards score well, as do sudden,
dreams and direct intervention.

- hunting grounds and prince-ness occur frequently.

- Gilbert is the first-occurring recognizable vampire name (344),
followed by Ingrid (345), Isabel (360), Anson (387) and Arika (388).

It would also be interesting to see which words appear in proximity to
each other - but I don't have a tool for that. Also note that cards
such as Aid from Bats, which would be used heavily when they're used,
don't have to rank highly - "bat" clocks in at 537.
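
On the proximity idea: one simple way to approximate it is to count pairs of words that fall within a few words of each other. This is only a sketch under my own assumptions (the `cooccurrences` name, the window size and the sample text are made up), not an existing tool.

```python
import re
from collections import Counter

def cooccurrences(text, window=3):
    """Count unordered word pairs that appear within `window` words of each other."""
    words = re.findall(r"[a-z']+", text.lower())
    pairs = Counter()
    for i, word in enumerate(words):
        # Look ahead only, so each pair is counted once per occurrence.
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs

# Made-up sample, not real TWDA data.
sample = "blood doll minion tap blood doll"
pairs = cooccurrences(sample, window=1)
print(pairs.most_common(3))
```

Note that this treats the text as one long stream, so words at the end of one deck line pair up with words at the start of the next. Splitting the input per line or per deck first would avoid that.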

Happy analysing
Tobias

