15  传统机器学习NLP模型

15.1 词袋模型(Bag of Words)

自然语言处理(Natural Language Processing,NLP),可以大致分为两个阶段,Transformer 问世前的阶段,和 Transformer 问世后的阶段。

如果想了解 Transformer 问世前的主流的 NLP 技术,不妨读读 Gensim 的官网教程。Gensim 的教程,不仅介绍了 Gensim 的用法,而且系统地介绍了各种主题模型(Topic Models),尤其着重介绍了各种词袋(Bag Of Words,BOW)主题模型,包括 TF-IDF,Latent Semantic Indexing(LSI) 等等。

词袋模型假设我们不考虑文本中词与词之间的上下文关系,仅仅只考虑所有词的权重。而权重与词在文本中出现的频率有关。

  • 词袋模型首先会进行分词,
  • 在分词之后,通过统计每个词在文本中出现的次数,我们就可以得到该文本基于词的特征

如果将各个文本样本的这些词对应的词频放在一起,就是我们常说的向量化。

向量化完毕后一般也会使用TF-IDF进行特征的权重修正,再将特征进行标准化。 再进行一些其他的特征工程后,就可以将数据带入机器学习算法进行分类聚类了。

总结下词袋模型的三部曲:

  • 分词(tokenizing),
  • 统计修订词特征值(counting)
  • 标准化(normalizing)。

与词袋模型非常类似的一个模型是词集模型(Set of Words,简称SoW),和词袋模型唯一的不同是它仅仅考虑词是否在文本中出现,而不考虑词频。也就是一个词在文本中出现1次和多次特征处理是一样的。在大多数时候,我们使用词袋模型,后面的讨论也是以词袋模型为主。

当然,词袋模型有很大的局限性,因为它仅仅考虑了词频,没有考虑上下文的关系,因此会丢失一部分文本的语义。但是大多数时候,如果我们的目的是分类聚类,则词袋模型表现的很好。

图 15.1 Document Term Matrix
  • Stop words

  • Stemming vs Lemmatizing

  • Bi-gram

  • TF-IDF : \(tf _{ i , j } \times \log \left( \frac { N } { d f _ { i } } \right)\)

  • Corpus trimming

15.2 TF-IDF

在scikit-learn中,有两种方法进行TF-IDF的预处理 https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_extraction.text:

  • 第一种方法是在用CountVectorizer类向量化。
  • 第二种方法是直接用TfidfVectorizer完成向量化与TF-IDF预处理。

15.2.1 count vectors

from sklearn.feature_extraction.text import CountVectorizer
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
vectorizer.get_feature_names_out()

print(X.toarray())
print(vectorizer.get_feature_names_out())
[[0 1 1 1 0 0 1 0 1]
 [0 2 0 1 0 1 1 0 1]
 [1 0 0 1 1 0 1 1 1]
 [0 1 1 1 0 0 1 0 1]]
['and' 'document' 'first' 'is' 'one' 'second' 'the' 'third' 'this']
vectorizer = CountVectorizer(
    analyzer='word',
    ngram_range=(2, 2), # tuple (min_n, max_n)
    # stop_words = stopwords,
    max_features=1800,
    min_df = 0.05, # lower
    max_df = 0.95
    )

X2 = vectorizer.fit_transform(corpus)
print(X2.toarray())
print(vectorizer.get_feature_names_out())
[[0 0 1 1 0 0 1 0 0 0 0 1 0]
 [0 1 0 1 0 1 0 1 0 0 1 0 0]
 [1 0 0 1 0 0 0 0 1 1 0 1 0]
 [0 0 1 0 1 0 1 0 0 0 0 0 1]]
['and this' 'document is' 'first document' 'is the' 'is this'
 'second document' 'the first' 'the second' 'the third' 'third one'
 'this document' 'this is' 'this the']

15.2.2 IDF

from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
vectorizer.get_feature_names_out()

print(X.shape)
(4, 9)

15.2.3 pipe method

from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

corpus=["I come to China to travel",
    "This is a car popular in China",
    "I love tea and Apple ",
    "The work is to write some papers in science"]

pipe = Pipeline([('count', CountVectorizer()),
                    ('tfid', TfidfTransformer())]).fit(corpus)

pipe['count'].transform(corpus).toarray()
array([[0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0],
       [0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1]],
      dtype=int64)
print(pipe.transform(corpus).toarray())
print(pipe.get_feature_names_out())
[[0.         0.         0.         0.34884223 0.44246214 0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.69768446 0.44246214 0.
  0.        ]
 [0.         0.         0.4533864  0.35745504 0.         0.35745504
  0.35745504 0.         0.         0.4533864  0.         0.
  0.         0.         0.4533864  0.         0.         0.
  0.        ]
 [0.5        0.5        0.         0.         0.         0.
  0.         0.5        0.         0.         0.         0.
  0.5        0.         0.         0.         0.         0.
  0.        ]
 [0.         0.         0.         0.         0.         0.28113163
  0.28113163 0.         0.35657982 0.         0.35657982 0.35657982
  0.         0.35657982 0.         0.28113163 0.         0.35657982
  0.35657982]]
['and' 'apple' 'car' 'china' 'come' 'in' 'is' 'love' 'papers' 'popular'
 'science' 'some' 'tea' 'the' 'this' 'to' 'travel' 'work' 'write']

15.3 topic model

15.3.1 Gensim 介绍

Gensim is a free open-source Python library for representing documents as semantic vectors, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim is designed to process raw, unstructured digital texts (“plain text”) using unsupervised machine learning algorithms.

Gensim 提供了三种词向量算法,包括 Word2Vec,Glove,和 FastText。用 Gensim 生成 FastText 十分方便,以至于有些人把 Gensim 当成专项工具,专门用于生成 FastText 词向量,甚至忽视了 Gensim 的其它功能。

  1. Gensim 官网教程:实际上就是手把手教你编程。
  2. Gensim Tutorial – A Complete Beginners Guide:把 Gensim 的主要功能,基本都涉足了。
  3. 使用 Gensim tf-idf 模型,求文本相似度:中文的教程,重点讲述如何用 TF-IDF 模型搜寻相似文本。

15.3.2 topic model 步骤

Topic Modelling is a technique to extract hidden topics from large volumes of text.

图 15.2 topic model 流程

15.3.3 Topic vs Doc vs Words

import matplotlib.pyplot as plt
import spacy
import pyLDAvis.gensim_models
pyLDAvis.enable_notebook()# Visualise inside a notebook
import gensim
from gensim.corpora.dictionary import Dictionary
from gensim.models import LdaModel,LdaMulticore, CoherenceModel, LsiModel, HdpModel

15.3.4 prepare data

import os
test_data_dir = '{}'.format(os.sep).join([gensim.__path__[0], 'test', 'test_data'])
print(test_data_dir)

lee_train_file = test_data_dir + os.sep + 'lee_background.cor'
print(lee_train_file)

text = open(lee_train_file).read()
C:\Users\xinlu\anaconda3\lib\site-packages\gensim\test\test_data
C:\Users\xinlu\anaconda3\lib\site-packages\gensim\test\test_data\lee_background.cor
nlp = spacy.load('en_core_web_lg')
my_stop_words = ['say', 'mr', 'Mr', 'said', 'says', 'saying', 'today', 'be']

for stopword in my_stop_words:
    lexeme = nlp.vocab[stopword]
    lexeme.is_stop = True

doc = nlp(text)
# We add some words to the stop word list
texts, article = [], []

for word in doc:

    if word.text != '\n' and not word.is_stop and not word.is_punct and not word.like_num and word.text != 'I':
        article.append(word.lemma_)

    if word.text == '\n':
        texts.append(article)
        article = []

Phrases(sentences=None, min_count=5, threshold=10.0, max_vocab_size=40000000, delimiter='_', progress_per=10000, scoring='default', connector_words=frozenset({}))

bigram = gensim.models.phrases.Phrases(texts, )
texts = [bigram[line] for line in texts]

print(texts[0])
['hundred', 'people', 'force', 'vacate', 'home', 'Southern', 'Highlands', 'New_South', 'Wales', 'strong', 'wind', 'push', 'huge', 'bushfire', 'town', 'Hill', 'new', 'blaze', 'near', 'Goulburn', 'south_west', 'Sydney', 'force', 'closure', 'Hume', 'Highway', '4:00pm', 'AEDT', 'marked', 'deterioration', 'weather', 'storm', 'cell', 'move', 'east', 'Blue_Mountains', 'force', 'authority', 'decision', 'evacuate', 'people', 'home', 'outlying', 'street', 'Hill', 'New_South', 'Wales', 'southern', 'highland', 'estimated', 'resident', 'leave', 'home', 'nearby', 'Mittagong', 'New_South', 'Wales', 'Rural_Fire', 'Service', 'weather_condition', 'cause', 'fire_burn', 'finger', 'formation', 'ease', 'fire', 'unit', 'Hill', 'optimistic', 'defend', 'property', 'blaze', 'burn', 'New', 'Year', 'Eve', 'New_South', 'Wales', 'fire', 'crew', 'call', 'new', 'fire', 'Gunning', 'south', 'Goulburn', 'detail', 'available', 'stage', 'fire', 'authority', 'close', 'Hume', 'Highway', 'direction', 'new', 'fire', 'Sydney', 'west', 'long', 'threaten', 'property', 'Cranebrook', 'area', 'rain', 'fall', 'part', 'Illawarra', 'Sydney', 'Hunter', 'Valley', 'north', 'coast', 'Bureau', 'Meteorology', 'Claire', 'Richards', 'rain', 'little', 'ease', 'fire_burn', 'state', 'fall', 'isolated', 'area', 'generally', 'fall', 'millimetre', 'place', 'significant', 'millimetre', 'relief', 'far', 'rain', 'concern', 'fact', 'probably', 'hamper', 'effort', 'firefighter', 'wind', 'gust', 'associate', 'thunderstorm']

15.3.5 Dictionary and Corpus in Gensim

The two main inputs to the LDA topic model are the dictionary and the corpus:

  • Dictionary: The idea of the dictionary is to give each token a unique ID.
  • Corpus: Having assigned a unique ID to each token, the corpus simply contains each ID and its frequency (if you wanna dive into it, then search for Bag of Word (BoW) which will introduce you to word embedding).
# I will apply the Dictionary Object from Gensim, which maps each word to their unique ID:
dictionary = Dictionary(texts)

You can print the dictionary which will tell you that 8848 unique ID’s were found. We can look at some of the IDs assigned to the tokens:

print(dictionary.token2id)
len(dictionary)
{'4:00pm': 0, 'AEDT': 1, 'Blue_Mountains': 2, 'Bureau': 3, 'Claire': 4, 'Cranebrook': 5, 'Eve': 6, 'Goulburn': 7, 'Gunning': 8, 'Highlands': 9, 'Highway': 10, 'Hill': 11, 'Hume': 12, 'Hunter': 13, 'Illawarra': 14, 'Meteorology': 15, 'Mittagong': 16, 'New': 17, 'New_South': 18, 'Richards': 19, 'Rural_Fire': 20, 'Service': 21, 'Southern': 22, 'Sydney': 23, 'Valley': 24, 'Wales': 25, 'Year': 26, 'area': 27, 'associate': 28, 'authority': 29, 'available': 30, 'blaze': 31, 'burn': 32, 'bushfire': 33, 'call': 34, 'cause': 35, 'cell': 36, 'close': 37, 'closure': 38, 'coast': 39, 'concern': 40, 'crew': 41, 'decision': 42, 'defend': 43, 'detail': 44, 'deterioration': 45, 'direction': 46, 'ease': 47, 'east': 48, 'effort': 49, 'estimated': 50, 'evacuate': 51, 'fact': 52, 'fall': 53, 'far': 54, 'finger': 55, 'fire': 56, 'fire_burn': 57, 'firefighter': 58, 'force': 59, 'formation': 60, 'generally': 61, 'gust': 62, 'hamper': 63, 'highland': 64, 'home': 65, 'huge': 66, 'hundred': 67, 'isolated': 68, 'leave': 69, 'little': 70, 'long': 71, 'marked': 72, 'millimetre': 73, 'move': 74, 'near': 75, 'nearby': 76, 'new': 77, 'north': 78, 'optimistic': 79, 'outlying': 80, 'part': 81, 'people': 82, 'place': 83, 'probably': 84, 'property': 85, 'push': 86, 'rain': 87, 'relief': 88, 'resident': 89, 'significant': 90, 'south': 91, 'south_west': 92, 'southern': 93, 'stage': 94, 'state': 95, 'storm': 96, 'street': 97, 'strong': 98, 'threaten': 99, 'thunderstorm': 100, 'town': 101, 'unit': 102, 'vacate': 103, 'weather': 104, 'weather_condition': 105, 'west': 106, 'wind': 107, 'December': 108, 'Dora': 109, 'Hafiz': 110, 'India': 111, 'Jaish': 112, 'Karachi': 113, 'Kashmir': 114, 'Kashmiri': 115, 'Lashkar': 116, 'Mohammed': 117, 'Pakistan': 118, 'Police': 119, 'Saeed': 120, 'Srinagar': 121, 'Taiba': 122, 'accuse': 123, 'announce': 124, 'arrest': 125, 'attack': 126, 'base': 127, 'behest': 128, 'border': 129, 'capital': 130, 'carry': 131, 'chief': 132, 'come': 133, 'death': 134, 'diplomatic': 135, 'dozen': 136, 'e': 137, 'encounter': 138, 'escalate': 139, 'extremist': 140, 'group': 141, 'indian': 142, 'intelligence': 143, 'kilometer': 144, 'launch': 145, 'level': 146, 'likely': 147, 'mass': 148, 'militant': 149, 'military': 150, 'mohammad': 151, 'night': 152, 'organisation': 153, 'pakistani': 154, 'parliament': 155, 'police': 156, 'raid': 157, 'sanction': 158, 'security_force': 159, 'see': 160, 'shoot_dead': 161, 'shootout': 162, 'side': 163, 'soar': 164, 'summer': 165, 'suspect': 166, 'take': 167, 'targette': 168, 'tat': 169, 'tension': 170, 'tit': 171, 'trade': 172, 'troop': 173, 'village': 174, 'war': 175, 'yesterday': 176, 'ACT': 177, 'Christmas': 178, 'Northern_Territory': 179, 'Queensland': 180, 'South_Australia': 181, 'Tasmania': 182, 'Victoria': 183, 'Western_Australia': 184, 'fatality': 185, 'few': 186, 'free': 187, 'holiday': 188, 'national': 189, 'people_die': 190, 'period': 191, 'record': 192, 'remain': 193, 'road': 194, 'stand': 195, 'time': 196, 'toll': 197, 'year': 198, 'Aldolfo': 199, 'Argentina': 200, 'March': 201, 'President': 202, 'Puerta': 203, 'Ramon': 204, 'Rodregiuez': 205, 'Saa': 206, 'act': 207, 'agency': 208, 'appoint': 209, 'assume': 210, 'bad': 211, 'caretaker': 212, 'congress': 213, 'crisis': 214, 'currency': 215, 'daunting': 216, 'day': 217, 'debt': 218, 'deepen': 219, 'default': 220, 'economic': 221, 'election': 222, 'end': 223, 'fail': 224, 'fellow': 225, 'follow': 226, 'foreign': 227, 'fresh': 228, 'government': 229, 'interim': 230, 'international': 231, 'isolate': 232, 'job': 233, 'key': 234, 'lawmaker': 235, 'leader': 236, 'lending': 237, 'massive': 238, 'nation': 239, 'office': 240, 'package': 241, 'peronist': 242, 'plan': 243, 'political': 244, 'predecessor': 245, 'presidency': 246, 'president': 247, 'promise': 248, 'quit': 249, 'recession': 250, 'repayment': 251, 'rescue': 252, 'resign': 253, 'resignation': 254, 'role': 255, 'schedule': 256, 'senate': 257, 'senior': 258, 'series': 259, 'stunned': 260, 'support': 261, 'tackle': 262, 'task': 263, 'tell': 264, 'week_ago': 265, 'Area': 266, 'Health': 267, 'Hospital': 268, 'Sherbon': 269, 'Tony': 270, 'Wollongong': 271, 'able': 272, 'action': 273, 'angry': 274, 'annoyed': 275, 'behaviour': 276, 'body': 277, 'care': 278, 'chief_executive': 279, 'concerned': 280, 'conduct': 281, 'consider': 282, 'hospital': 283, 'hour': 284, 'inappropriate': 285, 'investigation': 286, 'involve': 287, 'know': 288, 'labour': 289, 'midwife': 290, 'nitrous': 291, 'occasion': 292, 'officer': 293, 'oxide': 294, 'relocate': 295, 'risk': 296, 'service': 297, 'staff': 298, 'suspend': 299, 'unprofessional': 300, 'use': 301, 'week': 302, 'well': 303, 'woman': 304, 'work': 305, 'Afghanistan': 306, 'Afghans': 307, 'Australia': 308, 'Britain': 309, 'Department': 310, 'Downer': 311, 'Europe': 312, 'Federal_Government': 313, 'Foreign_Affairs': 314, 'Government': 315, 'Immigration': 316, 'Island': 317, 'Kabul': 318, 'Minister_Alexander': 319, 'Nauru': 320, 'Pacific': 321, 'Taliban': 322, 'afghani': 323, 'aircraft': 324, 'airlift': 325, 'application': 326, 'asylum': 327, 'asylum_seeker': 328, 'await': 329, 'chartered': 330, 'claim': 331, 'country': 332, 'deliver': 333, 'detainee': 334, 'environment': 335, 'establish': 336, 'finish': 337, 'flee': 338, 'fly': 339, 'future': 340, 'hold': 341, 'interim_government': 342, 'island': 343, 'major': 344, 'matter': 345, 'operation': 346, 'power': 347, 'process': 348, 'processing': 349, 'refuse': 350, 'return': 351, 'safe': 352, 'secure': 353, 'seek': 354, 'spokesman': 355, 'temporary': 356, 'threat': 357, 'total': 358, 'try': 359, 'visa': 360, '$': 361, 'American': 362, 'Americans': 363, 'Arnaud': 364, 'Burswood': 365, 'Clement': 366, 'Cup': 367, 'Dome': 368, 'Federer': 369, 'France': 370, 'Gambill': 371, 'Hingis': 372, 'Hopman': 373, 'Jan': 374, 'Martina': 375, 'Michael': 376, 'Monica': 377, 'Perth': 378, 'Razzano': 379, 'Roger': 380, 'Seles': 381, 'United_States': 382, 'Virginie': 383, 'aim': 384, 'ascendancy': 385, 'beat': 386, 'bit': 387, 'break': 388, 'chance': 389, 'clash': 390, 'complete': 391, 'contest': 392, 'currently': 393, 'decisive': 394, 'delighted': 395, 'despite': 396, 'determined': 397, 'dollar': 398, 'early': 399, 'event': 400, 'expect': 401, 'fight': 402, 'final': 403, 'frankly': 404, 'gain': 405, 'game': 406, 'get': 407, 'great': 408, 'hard': 409, 'hunger': 410, 'improve': 411, 'junior': 412, 'lead': 413, 'match': 414, 'minute': 415, 'mixed': 416, 'opening': 417, 'opponent': 418, 'overpower': 419, 'pair': 420, 'peak': 421, 'play': 422, 'player': 423, 'position': 424, 'press': 425, 'quickly': 426, 'rank': 427, 'reason': 428, 'recover': 429, 'regain': 430, 'relative': 431, 'row': 432, 'runner': 433, 'score': 434, 'scrapper': 435, 'season': 436, 'set': 437, 'shaky': 438, 'shape': 439, 'single': 440, 'slip': 441, 'slow': 442, 'snatch': 443, 'sport': 444, 'start': 445, 'straight': 446, 'stride': 447, 'strive': 448, 'surprise': 449, 'swiss': 450, 'team': 451, 'tenacious': 452, 'tennis': 453, 'tentative': 454, 'think': 455, 'tough': 456, 'tournament': 457, 'unbeatable': 458, 'unseeded': 459, 'victory': 460, 'want': 461, 'way': 462, 'wear': 463, 'win': 464, 'world': 465, 'year_old': 466, 'Cross': 467, 'K': 468, 'Marathon': 469, 'Melbourne': 470, 'Murray': 471, 'Red': 472, 'River': 473, 'Swan': 474, 'Thursday': 475, 'Yarrawonga': 476, 'Years': 477, 'afternoon': 478, 'battle': 479, 'canoe': 480, 'canoeist': 481, 'celebration': 482, 'condition': 483, 'dusty': 484, 'earn': 485, 'enjoy': 486, 'exhausting': 487, 'gruelle': 488, 'hot': 489, 'kilometre': 490, 'line': 491, 'paddle': 492, 'raise': 493, 'strongly': 494, 'victim': 495, '30': 496, 'Appin': 497, 'Bulaburra': 498, 'Cessnock': 499, 'Commissioner': 500, 'Friday': 501, 'Glenbrook': 502, 'Grafton': 503, 'Great': 504, 'Greater': 505, 'Koperberg': 506, 'Marks': 507, 'Newcastle': 508, 'Phil': 509, 'Shoalhaven': 510, 'Spencer': 511, 'Sullivan': 512, 'Western': 513, 'activity': 514, 'allow': 515, 'bring': 516, 'bureau': 517, 'cancel': 518, 'caution': 519, 'check': 520, 'city': 521, 'community': 522, 'contain': 523, 'containment': 524, 'continue': 525, 'control': 526, 'device': 527, 'drop': 528, 'dry': 529, 'electrical': 530, 'excited': 531, 'favourable': 532, 'fighter': 533, 'firework': 534, 'flank': 535, 'flare': 536, 'forecast': 537, 'give': 538, 'guard': 539, 'high': 540, 'highlands': 541, 'inaccessible': 542, 'incendiary': 543, 'increase': 544, 'large': 545, 'lightne': 546, 'lot': 547, 'low': 548, 'mean': 549, 'mild': 550, 'morning': 551, 'overly': 552, 'overnight': 553, 'pose': 554, 'protect': 555, 'reluctant': 556, 'ring': 557, 'severe': 558, 'spark': 559, 'spend': 560, 'strengthen': 561, 'temperature': 562, 'welcome': 563, 'westerly': 564, 'western': 565, 'will': 566, 'Bulli': 567, 'Color': 568, 'Ferry': 569, 'National': 570, 'Park': 571, 'Penrith': 572, 'Picton': 573, 'Road': 574, 'Robertson': 575, 'Royal': 576, 'Springwood': 577, 'Tops': 578, 'Upper': 579, 'Wilton': 580, 'Wisemans': 581, 'access': 582, 'ask': 583, 'avoid': 584, 'closed': 585, 'dangerous': 586, 'escort': 587, 'limit': 588, 'local': 589, 'motorist': 590, 'reduce': 591, 'reduced': 592, 'smoke': 593, 'speed': 594, 'visibility': 595, 'Boonah': 596, 'Brisbane': 597, 'Coast': 598, 'Ergon': 599, 'Gale': 600, 'Landsborough': 601, 'Nambour': 602, 'Sunshine': 603, 'Toowoomba': 604, 'black': 605, 'car': 606, 'crash': 607, 'damage': 608, 'energex': 609, 'energy': 610, 'fierce': 611, 'house': 612, 'injure': 613, 'inside': 614, 'location': 615, 'person': 616, 'protective': 617, 'repair': 618, 'restore': 619, 'rip': 620, 'send': 621, 'strike': 622, 'supply': 623, 'tarpaulin': 624, 'ten': 625, 'thousand': 626, 'trap': 627, 'tree': 628, 'undergo': 629, 'uproot': 630, 'voltage': 631, 'wild': 632, ' ': 633, 'Alejandro': 634, 'Congress': 635, 'Lima': 636, 'Peru': 637, 'Toledo': 638, 'adjoining': 639, 'apartment': 640, 'architecture': 641, 'beach': 642, 'begin': 643, 'blame': 644, 'block': 645, 'building': 646, 'buy': 647, 'cache': 648, 'colonial': 649, 'crowd': 650, 'cut': 651, 'destroy': 652, 'donate': 653, 'downtown': 654, 'engulf': 655, 'enter': 656, 'era': 657, 'evening': 658, 'exacerbate': 659, 'explode': 660, 'fame': 661, 'flame': 662, 'heritage': 663, 'illegal': 664, 'inquiry': 665, 'kill': 666, 'list': 667, 'market': 668, 'medicine': 669, 'mourning': 670, 'narrow': 671, 'official': 672, 'oversee': 673, 'poor': 674, 'public': 675, 'race': 676, 'shop': 677, 'short': 678, 'spanish': 679, 'storey': 680, 'surround': 681, 'traditional': 682, 'traffic': 683, 'urge': 684, 'vendor': 685, 'Abdul': 686, 'Asian': 687, 'Association': 688, 'Atal': 689, 'Behari': 690, 'Cooperation': 691, 'European': 692, 'Foreign_Minister': 693, 'General': 694, 'Group': 695, 'January': 696, 'Musharraf': 697, 'Nepal': 698, 'Pervez': 699, 'Prime_Minister': 700, 'Regional': 701, 'SAARC': 702, 'Sattar': 703, 'Saturday': 704, 'South': 705, 'Sunday': 706, 'Union': 707, 'Vajpayee': 708, 'accept': 709, 'allegation': 710, 'anybody': 711, 'armed': 712, 'assault': 713, 'beg': 714, 'bloody': 715, 'brewing': 716, 'conflict': 717, 'consequence': 718, 'counter': 719, 'create': 720, 'dangerously': 721, 'defuse': 722, 'deny': 723, 'dialogue': 724, 'dispute': 725, 'domestic': 726, 'eastern': 727, 'eradicate': 728, 'erupt': 729, 'exercise': 730, 'explosive': 731, 'extremism': 732, 'face': 733, 'feeling': 734, 'form': 735, 'fully': 736, 'goal': 737, 'grow': 738, 'harmony': 739, 'hurdle': 740, 'industrialise': 741, 'internal': 742, 'intervention': 743, 'intolerance': 744, 'keep': 745, 'let': 746, 'like': 747, 'mastermind': 748, 'measure': 749, 'meet': 750, 'meeting': 751, 'militancy': 752, 'mount': 753, 'obstacle': 754, 'offer': 755, 'party': 756, 'peace': 757, 'positive': 758, 'potentially': 759, 'prepared': 760, 'provocation': 761, 'receive': 762, 'reject': 763, 'repeat': 764, 'resolve': 765, 'respond': 766, 'restraint': 767, 'rule': 768, 'sideline': 769, 'situation': 770, 'small': 771, 'snowball': 772, 'society': 773, 'soil': 774, 'stock': 775, 'summit': 776, 'talk': 777, 'tense': 778, 'terrorism': 779, 'thrust': 780, 'useful': 781, 'vigorously': 782, 'warn': 783, 'willing': 784, 'Abdullah': 785, 'Dr': 786, 'Interior': 787, 'Kanooni': 788, 'Minister': 789, 'Yunis': 790, 'afghan': 791, 'agreement': 792, 'appear': 793, 'british': 794, 'commander': 795, 'conflicting': 796, 'confusion': 797, 'criticise': 798, 'delay': 799, 'deployment': 800, 'document': 801, 'emerge': 802, 'enlarge': 803, 'finalise': 804, 'lack': 805, 'number': 806, 'occupation': 807, 'patrol': 808, 'peacekeeper': 809, 'proposal': 810, 'sign': 811, 'signal': 812, 'soon': 813, 'suitable': 814, 'tandem': 815, 'tantamount': 816, 'translation': 817, 'Alei': 818, 'Gaza_Strip': 819, 'Hamas': 820, 'Israelis': 821, 'Nitzanit': 822, 'October': 823, 'Palestinians': 824, 'Sinai': 825, 'armoured': 826, 'army': 827, 'edge': 828, 'gunman': 829, 'include': 830, 'infiltrate': 831, 'islamic': 832, 'israeli': 833, 'jewish': 834, 'killing': 835, 'month': 836, 'northern': 837, 'observation': 838, 'open_fire': 839, 'palestinian': 840, 'post': 841, 'radical': 842, 'rifle': 843, 'settlement': 844, 'settler': 845, 'shell': 846, 'source': 847, 'tank': 848, 'teenage': 849, 'uprising': 850, 'vehicle': 851, 'Bumblebee': 852, 'Catt': 853, 'Eddystone': 854, 'Geoff': 855, 'Helens': 856, 'Ian': 857, 'Liberator': 858, 'Point': 859, 'St': 860, 'Sydney_Hobart': 861, 'Zeus': 862, 'apparently': 863, 'australian': 864, 'away': 865, 'boat': 866, 'clinch': 867, 'coveted': 868, 'handicap': 869, 'honour': 870, 'injury': 871, 'lose': 872, 'metre': 873, 'mile': 874, 'object': 875, 'onboard': 876, 'overall': 877, 'protest': 878, 'report': 879, 'rival': 880, 'rudder': 881, 'skipper': 882, 'stop': 883, 'tasmanian': 884, 'title': 885, 'tow': 886, 'trouble': 887, 'wine': 888, 'yacht': 889, 'Boje': 890, 'Nicky': 891, 'Shaun_Pollock': 892, 'South_Africa': 893, 'Test': 894, 'arm': 895, 'arrive': 896, 'captain': 897, 'fit': 898, 'go': 899, 'good': 900, 'hope': 901, 'left': 902, 'pick': 903, 'possible': 904, 'prepare': 905, 'ready': 906, 'south_african': 907, 'spinner': 908, 'squad': 909, 'test': 910, 'tour': 911, 'withdraw': 912, 'Alicia': 913, 'Arantxa': 914, 'Australian': 915, 'Emilio': 916, 'Hewitt': 917, 'Lleyton': 918, 'Molik': 919, 'Open': 920, 'Robredoboth': 921, 'Sanchez': 922, 'Spain': 923, 'Switzerland': 924, 'Tennis': 925, 'Tommy': 926, 'Vicario': 927, 'brother': 928, 'campaign': 929, 'cent': 930, 'champion': 931, 'double': 932, 'grand': 933, 'happen': 934, 'look_forward': 935, 'nice': 936, 'objective': 937, 'pressure': 938, 'put': 939, 'reach': 940, 'slam': 941, 'sort': 942, 'tie': 943, 'today': 944, 'Honours': 945, 'Jackson': 946, 'Labour': 947, 'Lord': 948, 'Merit': 949, 'New_Zealand': 950, 'Order': 951, 'Peter': 952, 'Rings': 953, 'Tolkien': 954, 'big': 955, 'budget': 956, 'cast': 957, 'classic': 958, 'companion': 959, 'director': 960, 'employ': 961, 'favour': 962, 'film': 963, 'knighthood': 964, 'production': 965, 'reward': 966, 'Mark': 967, 'Williams': 968, 'alert': 969, 'burning': 970, 'change': 971, 'controller': 972, 'creek': 973, 'crucial': 974, 'devour': 975, 'fear': 976, 'incident': 977, 'initially': 978, 'instance': 979, 'outbreak': 980, 'predict': 981, 'region': 982, 'revise': 983, 'southerly': 984, 'word': 985, '8:00am': 986, '8:00pm': 987, 'Adolfo': 988, 'Monday': 989, 'Rodriguez': 990, 'US1,000': 991, 'appeal': 992, 'argentine': 993, 'bank': 994, 'banking': 995, 'cash': 996, 'facilitate': 997, 'help': 998, 'issue': 999, 'open': 1000, 'payment': 1001, 'pension': 1002, 'peso': 1003, 'retiree': 1004, 'salary': 1005, 'worker': 1006, 'boy': 1007, 'critical': 1008, 'die': 1009, 'hit': 1010, 'man': 1011, 'mid': 1012, 'passenger': 1013, 'pole': 1014, 'rise': 1015, 'telephone': 1016, 'David': 1017, 'Environment': 1018, 'Federal': 1019, 'Kemp': 1020, 'benzene': 1021, 'build': 1022, 'carbon': 1023, 'diesel': 1024, 'distributor': 1025, 'emission': 1026, 'fuel': 1027, 'greatly': 1028, 'illness': 1029, 'inspector': 1030, 'introduce': 1031, 'law': 1032, 'monoxide': 1033, 'petrol': 1034, 'pollutant': 1035, 'pollution': 1036, 'prior': 1037, 'prohibit': 1038, 'quality': 1039, 'random': 1040, 'reduction': 1041, 'refinery': 1042, 'replacement': 1043, 'require': 1044, 'respiratory': 1045, 'sampling': 1046, 'standard': 1047, 'station': 1048, 'terminal': 1049, 'territory': 1050, 'toxic': 1051, 'truck': 1052, 'Atlanta': 1053, 'Beretta': 1054, 'Florida': 1055, 'Memphis': 1056, 'Miami': 1057, 'Paris': 1058, 'ability': 1059, 'acknowledge': 1060, 'airline': 1061, 'airplane': 1062, 'attempt': 1063, 'automatic': 1064, 'bail': 1065, 'board': 1066, 'boarding': 1067, 'discover': 1068, 'drama': 1069, 'finally': 1070, 'flight': 1071, 'gun': 1072, 'hand': 1073, 'highlight': 1074, 'loaded': 1075, 'luggage': 1076, 'mm': 1077, 'personnel': 1078, 'pistol': 1079, 'plane': 1080, 'problem': 1081, 'release': 1082, 'security': 1083, 'semi': 1084, 'separate': 1085, 'shoe': 1086, 'suggestion': 1087, 'terrorist_attack': 1088, 'travel': 1089, 'weapon': 1090, 'Alice': 1091, 'Finke': 1092, 'Hermansburg': 1093, 'Rescue': 1094, 'Search': 1095, 'Springs': 1096, 'activate': 1097, 'attend': 1098, 'beacon': 1099, 'central': 1100, 'cross': 1101, 'drive': 1102, 'emergency': 1103, 'ground': 1104, 'river': 1105, 'safety': 1106, 'sand': 1107, 'sink': 1108, 'soft': 1109, 'stick': 1110, 'strand': 1111, 'submerged': 1112, 'tourist': 1113, 'walk': 1114, '2:15am': 1115, '6:15am': 1116, '9:30am': 1117, 'Blake': 1118, 'Bulabarra': 1119, 'Card': 1120, 'Clapham': 1121, 'Gary': 1122, 'Hawkesbury': 1123, 'Hobart': 1124, 'Kontrol': 1125, 'Lapstone': 1126, 'Miguel': 1127, 'San': 1128, 'Sutherland': 1129, 'Tevake': 1130, 'Wild': 1131, 'arrival': 1132, 'clear': 1133, 'concentrate': 1134, 'day_ago': 1135, 'decide': 1136, 'desperately': 1137, 'entry': 1138, 'exhausted': 1139, 'extensive': 1140, 'fast': 1141, 'favourite': 1142, 'fleet': 1143, 'half': 1144, 'head': 1145, 'heatwave': 1146, 'materialise': 1147, 'modification': 1148, 'preparation': 1149, 'rage': 1150, 'sight': 1151, 'spread': 1152, 'successful': 1153, 'victorian': 1154, 'weekend': 1155, 'windy': 1156, 'winner': 1157, 'Indian': 1158, 'Parliament': 1159, 'allege': 1160, 'back': 1161, 'clap': 1162, 'initiate': 1163, 'mind': 1164, 'regional': 1165, 'suicide_attack': 1166, 'willingness': 1167, '60': 1168, 'Assa_Abloy': 1169, 'Auckland': 1170, 'Derwent': 1171, 'Nicorette': 1172, 'Round': 1173, 'Swedish': 1174, 'Volvo': 1175, 'World': 1176, 'accompany': 1177, 'celebrate': 1178, 'dock': 1179, 'journey': 1180, 'light': 1181, 'maxi': 1182, 'nautical': 1183, 'ocean': 1184, 'racer': 1185, 'rest': 1186, 'sail': 1187, 'spectator': 1188, 'swedish': 1189, 'Airlines': 1190, 'Boston': 1191, 'Court': 1192, 'Cronin': 1193, 'Dein': 1194, 'Investigation': 1195, 'Judith': 1196, 'Magistrate': 1197, 'Margaret': 1198, 'Reid': 1199, 'Richard': 1200, 'TATP': 1201, 'additional': 1202, 'agent': 1203, 'allegedly': 1204, 'charge': 1205, 'detonate': 1206, 'disaster': 1207, 'divert': 1208, 'federal': 1209, 'file': 1210, 'functioning': 1211, 'fuselage': 1212, 'grant': 1213, 'hearing': 1214, 'hole': 1215, 'improvise': 1216, 'interfere': 1217, 'intimidation': 1218, 'investigator': 1219, 'laden': 1220, 'later': 1221, 'magistrate': 1222, 'offense': 1223, 'outer': 1224, 'possession': 1225, 'possibility': 1226, 'powerful': 1227, 'sneaker': 1228, 'term': 1229, 'transatlantic': 1230, 'wall': 1231, 'year_jail': 1232, 'Anthony_Zinni': 1233, 'Colin': 1234, 'Corps': 1235, 'Marine': 1236, 'Memorandum': 1237, 'Mitchell': 1238, 'November': 1239, 'Plan': 1240, 'Powell': 1241, 'Russia': 1242, 'Secretary': 1243, 'State': 1244, 'Tenet': 1245, 'Yasser_Arafat': 1246, 'Zinni': 1247, 'administration': 1248, 'agree': 1249, 'american': 1250, 'believe': 1251, 'bloodshed': 1252, 'blueprint': 1253, 'cease_fire': 1254, 'chair': 1255, 'confidence': 1256, 'effective': 1257, 'envoy': 1258, 'general': 1259, 'halt': 1260, 'implement': 1261, 'implementation': 1262, 'importance': 1263, 'internationally': 1264, 'late': 1265, 'month_ago': 1266, 'palestinian_leader': 1267, 'palestinian_leadership': 1268, 'presence': 1269, 'proceed': 1270, 'quick': 1271, 'recall': 1272, 'retired': 1273, 'statement': 1274, 'underscore': 1275, 'violence': 1276, 'Aids': 1277, 'Bulletin': 1278, 'Centre': 1279, 'Control': 1280, 'Council': 1281, 'Disease': 1282, 'Farmer': 1283, 'Frank': 1284, 'HIV': 1285, 'Territory': 1286, 'compare': 1287, 'complacent': 1288, 'contact': 1289, 'council': 1290, 'feel': 1291, 'female': 1292, 'figure': 1293, 'infection': 1294, 'male': 1295, 'nationally': 1296, 'need': 1297, 'percent': 1298, 'practice': 1299, 'previous': 1300, 'rate': 1301, 'remind': 1302, 'sex': 1303, 'shift': 1304, 'statistic': 1305, 'transmission': 1306, 'turn': 1307, 'twice': 1308, 'wake': 1309, 'Antarctic': 1310, 'BAS': 1311, 'British': 1312, 'Det': 1313, 'Ice': 1314, 'Norske': 1315, 'Sheet': 1316, 'Survey': 1317, 'Vaughan': 1318, 'Veritas': 1319, 'West': 1320, 'accord': 1321, 'account': 1322, 'assessment': 1323, 'balance': 1324, 'case': 1325, 'centimetre': 1326, 'century': 1327, 'climate': 1328, 'coastal': 1329, 'combination': 1330, 'continent': 1331, 'damn': 1332, 'develop': 1333, 'dire': 1334, 'disintegrate': 1335, 'dramatic': 1336, 'environmental': 1337, 'estimate': 1338, 'expensive': 1339, 'extraction': 1340, 'factor': 1341, 'fantastically': 1342, 'frozen': 1343, 'giant': 1344, 'global': 1345, 'happening': 1346, 'human': 1347, 'ice': 1348, 'impact': 1349, 'industrial': 1350, 'lie': 1351, 'likelihood': 1352, 'look': 1353, 'melt': 1354, 'norwegian': 1355, 'old': 1356, 'polluter': 1357, 'population': 1358, 'potential': 1359, 'relate': 1360, 'scientist': 1361, 'sea': 1362, 'severity': 1363, 'survey': 1364, 'warming': 1365, 'water': 1366, '12:30am': 1367, 'Ballidu': 1368, 'Bradford': 1369, 'Burakin': 1370, 'Geo': 1371, 'Richter': 1372, 'September': 1373, 'Shane': 1374, 'earthquake': 1375, 'epicentre': 1376, 'occur': 1377, 'quake': 1378, 'scale': 1379, 'science': 1380, 'shake': 1381, 'sleep': 1382, 'wheatbelt': 1383, '7:30pm': 1384, 'Backburning': 1385, 'Barrack': 1386, 'Cataract': 1387, 'Cove': 1388, 'Dam': 1389, 'Gully': 1390, 'Heights': 1391, 'Helensburgh': 1392, 'Kembla': 1393, 'Mt': 1394, 'Old': 1395, 'Princes': 1396, 'Shell': 1397, 'Shellharbour': 1398, 'Windy': 1399, 'ahead': 1400, 'bound': 1401, 'certainly': 1402, 'complacency': 1403, 'couple': 1404, 'extinguish': 1405, 'eye': 1406, 'interview': 1407, 'jump': 1408, 'kind': 1409, 'lay': 1410, 'nasty': 1411, 'outlook': 1412, 'preventative': 1413, 'relation': 1414, 'room': 1415, 'somewhat': 1416, 'target': 1417, 'teenager': 1418, 'workload': 1419, 'worried': 1420, 'Ahmed': 1421, 'Aziz': 1422, 'Delhi': 1423, 'Foreign': 1424, 'Islamabad': 1425, 'Jaswant': 1426, 'Khan': 1427, 'Ministry': 1428, 'New_Delhi': 1429, 'Singh': 1430, 'add': 1431, 'airspace': 1432, 'ban': 1433, 'complex': 1434, 'confine': 1435, 'cosmetic': 1436, 'downgrade': 1437, 'dupe': 1438, 'embassy': 1439, 'ministry': 1440, 'mission': 1441, 'movement': 1442, 'municipal': 1443, 'non': 1444, 'operate': 1445, 'regret': 1446, 'representation': 1447, 'result': 1448, 'retaliation': 1449, 'retaliatory': 1450, 'sponsor': 1451, 'Al_Qaeda': 1452, 'Bin': 1453, 'Defence': 1454, 'Fazalur': 1455, 'Habeel': 1456, 'Islam': 1457, 'Jamiat': 1458, 'Laden': 1459, 'Maulana': 1460, 'Mohamad': 1461, 'New_York': 1462, 'Osama': 1463, 'Osama_bin': 1464, 'Party': 1465, 'Rehman': 1466, 'Saudi': 1467, 'September_attack': 1468, 'Ulema': 1469, 'Washington': 1470, 'altogether': 1471, 'bear': 1472, 'bin_Laden': 1473, 'collapse': 1474, 'completely': 1475, 'dissident': 1476, 'fundamentalist': 1477, 'hide': 1478, 'individual': 1479, 'influence': 1480, 'information': 1481, 'live': 1482, 'network': 1483, 'protection': 1484, 'resistance': 1485, 'supporter': 1486, 'sure': 1487, 'Bloomberg': 1488, 'City': 1489, 'Giuliani': 1490, 'Mayor': 1491, 'Person': 1492, 'Republican': 1493, 'Rudolph': 1494, 'Time': 1495, 'World_Trade': 1496, 'adopt': 1497, 'aftermath': 1498, 'beautiful': 1499, 'bid': 1500, 'billionaire': 1501, 'brutality': 1502, 'crime': 1503, 'deadly': 1504, 'deal': 1505, 'decry': 1506, 'difficult': 1507, 'diversity': 1508, 'emotion': 1509, 'ensure': 1510, 'farewell': 1511, 'highly': 1512, 'hispanic': 1513, 'homelessness': 1514, 'insist': 1515, 'jerk': 1516, 'knee': 1517, 'magazine': 1518, 'mayor': 1519, 'medium': 1520, 'memorial': 1521, 'model': 1522, 'mogul': 1523, 'past': 1524, 'policing': 1525, 'praise': 1526, 'prevent': 1527, 'pull': 1528, 'reality': 1529, 'reign': 1530, 'rightly': 1531, 'sanctify': 1532, 'scandal': 1533, 'serve': 1534, 'site': 1535, 'speak': 1536, 'stance': 1537, 'strategy': 1538, 'strength': 1539, 'succeed': 1540, 'wrongly': 1541, 'Africans': 1542, 'Allan': 1543, 'Andy_Bichel': 1544, 'Bichel': 1545, 'Boucher': 1546, 'Boxing_Day': 1547, 'Brett_Lee': 1548, 'Claude': 1549, 'Donald': 1550, 'Glenn_McGrath': 1551, 'Hayden': 1552, 'Haywood': 1553, 'Henderson': 1554, 'Jacques_Kallis': 1555, 'Justin': 1556, 'Kallis': 1557, 'Klusener': 1558, 'Lance': 1559, 'Langer': 1560, 'Lee': 1561, 'MCG': 1562, 'Matthew': 1563, 'McKenzie': 1564, 'Nantie': 1565, 'Neil_McKenzie': 1566, 'Ricky_Ponting': 1567, 'Waugh': 1568, 'abandon': 1569, 'ball': 1570, 'bat': 1571, 'batsman': 1572, 'bounce': 1573, 'bowler': 1574, 'bowling': 1575, 'catch': 1576, 'chip': 1577, 'cruise': 1578, 'delivery': 1579, 'direct': 1580, 'dismiss': 1581, 'dominant': 1582, 'duck': 1583, 'field': 1584, 'fielding': 1585, 'four': 1586, 'happily': 1587, 'hat': 1588, 'inning': 1589, 'innings': 1590, 'inswinger': 1591, 'introduction': 1592, 'leg': 1593, 'loss': 1594, 'luck': 1595, 'miss': 1596, 'opener': 1597, 'over': 1598, 'paceman': 1599, 'pay': 1600, 'perfect': 1601, 'rearguard': 1602, 'reel': 1603, 'remove': 1604, 'replay': 1605, 'respectable': 1606, 'resume': 1607, 'roar': 1608, 'run': 1609, 'sharp': 1610, 'shot': 1611, 'show': 1612, 'shy': 1613, 'slump': 1614, 'star': 1615, 'steady': 1616, 'stump': 1617, 'suddenly': 1618, 'swallow': 1619, 'timer': 1620, 'trick': 1621, 'unlucky': 1622, 'vital': 1623, 'wicket': 1624, 'Collingwood': 1625, 'Franklin': 1626, 'Romanszko': 1627, 'Romaszko': 1628, 'alarm': 1629, 'camping': 1630, 'find': 1631, 'friend': 1632, 'guy': 1633, 'helicopter': 1634, 'junction': 1635, 'member': 1636, 'mode': 1637, 'nearly': 1638, 'overturn': 1639, 'phone': 1640, 'raft': 1641, 'rafter': 1642, 'rapid': 1643, 'satellite': 1644, 'survival': 1645, 'survive': 1646, 'sweep': 1647, 'swell': 1648, 'upside': 1649, 'wave': 1650, 'winch': 1651, 'Barron': 1652, 'Bartels': 1653, 'Cape': 1654, 'Corp': 1655, 'Eden': 1656, 'Firma': 1657, 'Flinders': 1658, 'Green': 1659, 'Grundig': 1660, 'Illbruck': 1661, 'Krakatoa': 1662, 'Lady': 1663, 'News': 1664, 'Race': 1665, 'Skipper': 1666, 'Team': 1667, 'Terra': 1668, 'Tyco': 1669, 'Ulludullah': 1670, 'Yacht': 1671, 'astern': 1672, 'crew_member': 1673, 'hull': 1674, 'injured': 1675, 'mandatory': 1676, 'neck': 1677, 'radio': 1678, 'retire': 1679, 'sick': 1680, 'Baulkham': 1681, 'Engadine': 1682, 'Heathcote': 1683, 'Hills': 1684, 'Huskisson': 1685, 'John': 1686, 'Kempsey': 1687, 'Mudgee': 1688, 'Narromine': 1689, 'Richmond': 1690, 'Weather': 1691, 'Winter': 1692, 'average': 1693, 'breeze': 1694, 'challenging': 1695, 'confirm': 1696, 'firefighting': 1697, 'forecaster': 1698, 'gear': 1699, 'inland': 1700, 'obviously': 1701, 'particularly': 1702, 'shire': 1703, 'stretch': 1704, 'suburb': 1705, 'thing': 1706, 'third': 1707, 'Baker': 1708, 'Brixton': 1709, 'Hak': 1710, 'London': 1711, 'blow': 1712, 'camp': 1713, 'convert': 1714, 'criminal': 1715, 'easily': 1716, 'effect': 1717, 'jail': 1718, 'jihad': 1719, 'loyalist': 1720, 'mosque': 1721, 'muslim': 1722, 'petty': 1723, 'prisoner': 1724, 'profound': 1725, 'remember': 1726, 'surprised': 1727, 'television': 1728, 'totally': 1729, 'train': 1730, 'type': 1731, 'worship': 1732, 'Adelaide': 1733, 'Ansett': 1734, 'Blue': 1735, 'Canberra': 1736, 'Fox': 1737, 'Huttner': 1738, 'Launceston': 1739, 'Vigin': 1740, 'Virgin': 1741, 'aggressive': 1742, 'business': 1743, 'competition': 1744, 'customer': 1745, 'endeavour': 1746, 'everybody': 1747, 'fare': 1748, 'internet': 1749, 'main': 1750, 'midweek': 1751, 'operational': 1752, 'recently': 1753, 'revamp': 1754, 'seat': 1755, 'simply': 1756, 'spirit': 1757, 'stave': 1758, 'thank': 1759, 'wish': 1760, 'Baccus': 1761, 'Day': 1762, 'Marsh': 1763, 'bike': 1764, 'life': 1765, 'motor': 1766, 'suffer': 1767, 'Army': 1768, 'Bank': 1769, 'Israel': 1770, 'Jenin': 1771, 'Palestinian': 1772, 'Palestinian_Authority': 1773, 'Qadim': 1774, 'Saadi': 1775, 'Walid': 1776, 'besiege': 1777, 'chase': 1778, 'civilian': 1779, 'cover': 1780, 'escape': 1781, 'exchange': 1782, 'firefight': 1783, 'grenade': 1784, 'heavy': 1785, 'immediately': 1786, 'intense': 1787, 'jeep': 1788, 'land': 1789, 'machine': 1790, 'medic': 1791, 'pitch': 1792, 'policeman': 1793, 'pursuit': 1794, 'self': 1795, 'shoot': 1796, 'slightly': 1797, 'soldier': 1798, 'throw': 1799, 'track': 1800, 'Appleby': 1801, 'BBC': 1802, 'George': 1803, 'Hawthorne': 1804, 'Humphrey': 1805, 'Ken': 1806, 'King': 1807, 'McReddie': 1808, 'Nigel': 1809, 'Oscar': 1810, 'Sir': 1811, 'Thatcher': 1812, 'Wednesday': 1813, 'Yes': 1814, 'actor': 1815, 'age': 1816, 'approval': 1817, 'award': 1818, 'brilliant': 1819, 'cancer': 1820, 'chemotherapy': 1821, 'civil': 1822, 'extremely': 1823, 'glow': 1824, 'have': 1825, 'hawthorne': 1826, 'heart': 1827, 'host': 1828, 'madness': 1829, 'master': 1830, 'minister': 1831, 'nominate': 1832, 'partner': 1833, 'peacefully': 1834, 'performance': 1835, 'prime': 1836, 'range': 1837, 'sad': 1838, 'scheme': 1839, 'screen': 1840, 'sequel': 1841, 'servant': 1842, 'smug': 1843, 'substantial': 1844, 'treatment': 1845, 'unexpected': 1846, 'wonderful': 1847, 'yes': 1848, 'Ausmaid': 1849, 'Broomstick': 1850, 'Business': 1851, 'Cadibarra': 1852, 'Men': 1853, 'SAP': 1854, 'Secret': 1855, 'Sting': 1856, 'Thing': 1857, 'honor': 1858, 'mast': 1859, 'sustain': 1860, 'Proteas': 1861, 'affect': 1862, 'badly': 1863, 'cricket': 1864, 'occasionally': 1865, 'outstanding': 1866, 'typical': 1867, 'Bay': 1868, 'Black': 1869, 'Campbelltown': 1870, 'Emergency': 1871, 'Jervis': 1872, 'Mount': 1873, 'Mountain': 1874, "O'Connor": 1875, 'Queanbeyan': 1876, 'Quinlan': 1877, 'Services': 1878, 'Ted': 1879, 'Wanniassa': 1880, 'assess': 1881, 'band': 1882, 'benefit': 1883, 'busy': 1884, 'confident': 1885, 'danger': 1886, 'extent': 1887, 'fan': 1888, 'hectare': 1889, 'magnificent': 1890, 'pass': 1891, 'resource': 1892, 'ruin': 1893, 'sacrifice': 1894, 'tomorrow': 1895, 'tribute': 1896, 'America': 1897, 'Carl': 1898, 'Carlos': 1899, 'Elizabeth': 1900, 'Gustaf': 1901, 'II': 1902, 'Juan': 1903, 'Queen': 1904, 'Sweden': 1905, 'XVI': 1906, 'annual': 1907, 'belief': 1908, 'circumstance': 1909, 'conscience': 1910, 'describe': 1911, 'european': 1912, 'evil': 1913, 'express': 1914, 'fair': 1915, 'faith': 1916, 'friendship': 1917, 'grief': 1918, 'guide': 1919, 'history': 1920, 'honest': 1921, 'humanity': 1922, 'idea': 1923, 'impose': 1924, 'innocently': 1925, 'instrument': 1926, 'magnitude': 1927, 'message': 1928, 'monarch': 1929, 'mood': 1930, 'ordinary': 1931, 'outrage': 1932, 'pain': 1933, 'previously': 1934, 'reaction': 1935, 'refer': 1936, 'reflect': 1937, 'religion': 1938, 'share': 1939, 'shatter': 1940, 'sombre': 1941, 'speech': 1942, 'terrible': 1943, 'terror': 1944, 'terrorist': 1945, 'tolerance': 1946, 'tragedy': 1947, 'trial': 1948, 'typify': 1949, 'tyranny': 1950, 'unknown': 1951, 'valuable': 1952, 'value': 1953, 'Agha': 1954, 'Akbar': 1955, 'Arab': 1956, 'Arabs': 1957, 'Gul': 1958, 'Kandahar': 1959, 'Mirwais': 1960, 'Yemeni': 1961, 'admit': 1962, 'airport': 1963, 'bandage': 1964, 'barricade': 1965, 'bombing': 1966, 'custody': 1967, 'departure': 1968, 'governor': 1969, 'militia': 1970, 'patient': 1971, 'persuade': 1972, 'provincial': 1973, 'surrender': 1974, 'ward': 1975, 'wound': 1976, 'Caucuses': 1977, 'Chechen': 1978, 'Dagestan': 1979, 'Raduyev': 1980, 'Salman': 1981, 'achievement': 1982, 'corner': 1983, 'destabilise': 1984, 'forefront': 1985, 'hostage': 1986, 'important': 1987, 'insurgency': 1988, 'jealous': 1989, 'negotiate': 1990, 'neighbouring': 1991, 'outperform': 1992, 'prison': 1993, 'republic': 1994, 'russian': 1995, 'sentence': 1996, 'siege': 1997, 'solder': 1998, 'warlord': 1999, 'Bass': 2000, 'Batt': 2001, 'Ingvall': 2002, 'Ludde': 2003, 'Strait': 2004, 'beginning': 2005, 'buoy': 2006, 'collide': 2007, 'design': 2008, 'distance': 2009, 'easy': 2010, 'entrant': 2011, 'equalise': 2012, 'flying': 2013, 'friendly': 2014, 'harm': 2015, 'headstart': 2016, 'height': 2017, 'maxis': 2018, 'maybe': 2019, 'money': 2020, 'round': 2021, 'scenario': 2022, 'silly': 2023, 'somebody': 2024, 'spectacular': 2025, 'spinnaker': 2026, 'split': 2027, 'system': 2028, 'tactic': 2029, 'Anne': 2030, 'Centrelink': 2031, 'Coalition': 2032, 'Criminal': 2033, 'Debt': 2034, 'Justice': 2035, 'Ms': 2036, 'Prison': 2037, 'Project': 2038, 'Stringer': 2039, 'accumulate': 2040, 'deduct': 2041, 'educate': 2042, 'education': 2043, 'employment': 2044, 'family': 2045, 'financial': 2046, 'funding': 2047, 'income': 2048, 'owe': 2049, 'profit': 2050, 'program': 2051, 'project': 2052, 'reoffende': 2053, 'repay': 2054, 'repossess': 2055, 'right': 2056, 'slim': 2057, 'spokeswoman': 2058, 'tool': 2059, 'vastly': 2060, 'Elliot': 2061, 'Northside': 2062, 'Potts': 2063, 'Sergeant': 2064, 'Shopping': 2065, 'Street': 2066, 'Stuart': 2067, 'Tanami': 2068, 'bush': 2069, 'child': 2070, 'kerb': 2071, 'offender': 2072, 'old_boy': 2073, 'parent': 2074, 'pursue': 2075, 'shortly': 2076, 'sit': 2077, 'steal': 2078, 'Alan': 2079, 'Brad': 2080, 'Jason_Gillespie': 2081, 'beckon': 2082, 'bouncy': 2083, 'challenge': 2084, 'cloudy': 2085, 'cool': 2086, 'greeting': 2087, 'hang': 2088, 'hopefully': 2089, 'loose': 2090, 'mark': 2091, 'memory': 2092, 'pace': 2093, 'provide': 2094, 'punt': 2095, 'question': 2096, 'relish': 2097, 'seam': 2098, 'shoulder': 2099, 'shower': 2100, 'starting': 2101, 'toss': 2102, 'unlikely': 2103, 'veteran': 2104, 'Airport': 2105, 'Director': 2106, 'FBI': 2107, 'International': 2108, 'Kinton': 2109, 'Logan': 2110, 'Tom': 2111, 'attendant': 2112, 'avert': 2113, 'fake': 2114, 'ignite': 2115, 'jet': 2116, 'jetliner': 2117, 'literally': 2118, 'mantravelling': 2119, 'passport': 2120, 'smell': 2121, 'sulphur': 2122, 'Hamid': 2123, 'Interim': 2124, 'Karzai': 2125, 'aside': 2126, 'billion': 2127, 'cabinet': 2128, 'ceremony': 2129, 'cost': 2130, 'economy': 2131, 'enable': 2132, 'historic': 2133, 'inauguration': 2134, 'livelihood': 2135, 'living': 2136, 'ravage': 2137, 'rebuild': 2138, 'reconstruction': 2139, 'Bora': 2140, 'China': 2141, 'Pentagon': 2142, 'Tora': 2143, 'Tora_Bora': 2144, 'ally': 2145, 'bombardment': 2146, 'capture': 2147, 'cave': 2148, 'chinese': 2149, 'dead': 2150, 'eastern_Afghanistan': 2151, 'foot': 2152, 'identify': 2153, 'mountain': 2154, 'porous': 2155, 'search': 2156, 'step': 2157, 'suppose': 2158, 'visit': 2159, 'warplane': 2160, 'Adam': 2161, 'Damien': 2162, 'Gilchrist(vc': 2163, 'MacGill': 2164, 'Martyn': 2165, 'SCG': 2166, 'Shane_Warne': 2167, 'Steve': 2168, 'Waugh(c': 2169, 'australian_cricket': 2170, 'replace': 2171, 'selector': 2172, 'spin': 2173, 'sway': 2174, 'Arafat': 2175, 'Ariel_Sharon': 2176, 'Bethlehem': 2177, 'Gaza': 2178, 'Islamic_Jihad': 2179, 'Mass': 2180, 'Rhavam': 2181, 'West_Bank': 2182, 'Zeevi': 2183, 'attack_Israel': 2184, 'biblical': 2185, 'blockade': 2186, 'bury': 2187, 'declare': 2188, 'dismantle': 2189, 'foil': 2190, 'funeral': 2191, 'mourner': 2192, 'murderer': 2193, 'order': 2194, 'organization': 2195, 'palestinian_police': 2196, 'peaceful': 2197, 'punish': 2198, 'reputation': 2199, 'stay': 2200, 'suicide_bombing': 2201, 'tourism': 2202, 'Development': 2203, 'IVF': 2204, 'Institute': 2205, 'Monash': 2206, 'Professor': 2207, 'Reproduction': 2208, 'Trounson': 2209, 'anaemia': 2210, 'baby': 2211, 'blood': 2212, 'bone': 2213, 'boundary': 2214, 'critically': 2215, 'deputy': 2216, 'disease': 2217, 'donation': 2218, 'embryo': 2219, 'exactly': 2220, 'extend': 2221, 'fertalisation': 2222, 'franconi': 2223, 'genetic': 2224, 'genetically': 2225, 'ill': 2226, 'infertility': 2227, 'inherit': 2228, 'institute': 2229, 'make': 2230, 'marrow': 2231, 'normal': 2232, 'opposition': 2233, 'permission': 2234, 'placenta': 2235, 'procedure': 2236, 'rare': 2237, 'request': 2238, 'save': 2239, 'sibling': 2240, 'stem': 2241, 'supportive': 2242, 'technique': 2243, 'technology': 2244, 'toddler': 2245, 'transplant': 2246, 'vitro': 2247, 'young': 2248, 'East': 2249, 'Guard': 2250, 'Japan': 2251, 'Sea': 2252, 'brandish': 2253, 'coastguard': 2254, 'deck': 2255, 'fishing': 2256, 'gunfire': 2257, 'ignore': 2258, 'japanese': 2259, 'korean': 2260, 'metal': 2261, 'mobilise': 2262, 'naval': 2263, 'pipe': 2264, 'reconnaissance': 2265, 'resemble': 2266, 'rough': 2267, 'sailor': 2268, 'smuggler': 2269, 'speculation': 2270, 'spot': 2271, 'spy': 2272, 'trawler': 2273, 'unidentified': 2274, 'vessel': 2275, 'warning': 2276, 'apology': 2277, 'approach': 2278, 'driver': 2279, 'headache': 2280, 'number_people': 2281, 'present': 2282, 'queue': 2283, 'sky': 2284, 'suitcase': 2285, 'tight': 2286, 'traveller': 2287, 'unwrapped': 2288, 'venture': 2289, 'weak': 2290, 'Archbishop': 2291, 'Baptist': 2292, 'Chief': 2293, 'Costello': 2294, 'Crean': 2295, 'Dr_Hollingworth': 2296, 'Governor_General': 2297, 'Hollingworth': 2298, 'Howard': 2299, 'Opposition': 2300, 'Reverend': 2301, 'Simon': 2302, 'Tim': 2303, 'abuse': 2304, 'anglican': 2305, 'church': 2306, 'clarification': 2307, 'clarify': 2308, 'criticism': 2309, 'explanation': 2310, 'handling': 2311, 'having': 2312, 'hear': 2313, 'import': 2314, 'inconsistancie': 2315, 'insufficient': 2316, 'legal_advice': 2317, 'point': 2318, 'profile': 2319, 'school': 2320, 'sexual': 2321, 'terribly': 2322, 'unclear': 2323, 'understand': 2324, 'Battalion': 2325, 'Cosgrove': 2326, 'East_Timor': 2327, 'Force': 2328, 'Lieutenant': 2329, 'Regiment': 2330, 'breach': 2331, 'comment': 2332, 'company': 2333, 'discharge': 2334, 'guess': 2335, 'link': 2336, 'respect': 2337, 'responsibility': 2338, 'skill': 2339, 'unreasonable': 2340, 'Fernando': 2341, 'Fund': 2342, 'Monetary': 2343, 'People': 2344, 'Peronists': 2345, 'Rua': 2346, 'austerity': 2347, 'brink': 2348, 'crumble': 2349, 'de': 2350, 'devaluation': 2351, 'la': 2352, 'nationwide': 2353, 'policy': 2354, 'riot': 2355, 'rioting': 2356, 'unity': 2357, 'Commission': 2358, 'Industrial_Relations': 2359, 'Qantas': 2360, 'Qantas_maintenance': 2361, 'disruption': 2362, 'employee': 2363, 'freeze': 2364, 'stoppage': 2365, 'union': 2366, 'unresolved': 2367, 'Bayu': 2368, 'Darwin': 2369, 'Ministers': 2370, 'Petroleum': 2371, 'Phillips': 2372, 'Timor': 2373, 'Undan': 2374, 'development': 2375, 'fiscal': 2376, 'gas': 2377, 'gasfield': 2378, 'indefinitely': 2379, 'multi': 2380, 'pipeline': 2381, 'ratify': 2382, 'regime': 2383, 'revenue': 2384, 'tax': 2385, 'ABC': 2386, 'Abu': 2387, 'Butler': 2388, 'Hussein': 2389, 'Iraq': 2390, 'Mohammad': 2391, 'Radio': 2392, 'Saddam': 2393, 'United_Nations': 2394, 'apply': 2395, 'authenticity': 2396, 'biological': 2397, 'bomb': 2398, 'chemical': 2399, 'decade': 2400, 'defector': 2401, 'discuss': 2402, 'engineer': 2403, 'exact': 2404, 'factory': 2405, 'goodness': 2406, 'iraqi': 2407, 'manufacture': 2408, 'name': 2409, 'pinpoint': 2410, 'plant': 2411, 'read': 2412, 'real': 2413, 'residency': 2414, 'secret': 2415, 'stuff': 2416, 'substitute': 2417, 'true': 2418, 'Beattie': 2419, 'Gabriel': 2420, 'Gold': 2421, 'Premier': 2422, 'Rome': 2423, 'abscond': 2424, 'cranky': 2425, 'facility': 2426, 'handle': 2427, 'health': 2428, 'killer': 2429, 'mental': 2430, 'mentally': 2431, 'overseas': 2432, 'pretty': 2433, 'stab': 2434, 'unhappy': 2435, 'Baz': 2436, 'Crowe': 2437, 'Ewen': 2438, 'Globe': 2439, 'Golden': 2440, 'Kidman': 2441, 'Luhrmann': 2442, 'McGregor': 2443, 'Moulin': 2444, 'Nicole': 2445, 'Ron': 2446, 'Rouge': 2447, 'Russell': 2448, 'actress': 2449, 'category': 2450, 'comedy': 2451, 'genius': 2452, 'math': 2453, 'musical': 2454, 'nomination': 2455, 'original': 2456, 'picture': 2457, 'portrayal': 2458, 'song': 2459, 'thriller': 2460, 'troubled': 2461, 'vie': 2462, 'campaigner': 2463, 'evidence': 2464, 'judgment': 2465, 'knowledge': 2466, 'ridiculous': 2467, 'unfounded': 2468, 'Alexander_Downer': 2469, 'Assistance': 2470, 'Bonn': 2471, 'Canada': 2472, 'Charter': 2473, 'Germany': 2474, 'ISAF': 2475, 'Italy': 2476, 'NATO': 2477, 'Security': 2478, 'Turkey': 2479, 'UN': 2480, 'VII': 2481, 'anti_taliban': 2482, 'authorise': 2483, 'broker': 2484, 'chapter': 2485, 'contingent': 2486, 'deploy': 2487, 'duty': 2488, 'environ': 2489, 'eventually': 2490, 'initial': 2491, 'landmark': 2492, 'mandate': 2493, 'multinational': 2494, 'peacekeeping': 2495, 'possibly': 2496, 'principle': 2497, 'renewal': 2498, 'resolution': 2499, 'subject': 2500, 'swear': 2501, 'transitional': 2502, 'unanimously': 2503, 'vanguard': 2504, 'vote': 2505, 'warfare': 2506, 'Bush': 2507, 'LAT': 2508, 'President_Bush': 2509, 'President_George': 2510, 'UTN': 2511, 'W_Bush': 2512, 'asset': 2513, 'assist': 2514, 'atomic': 2515, 'crackdown': 2516, 'hungry': 2517, 'needy': 2518, 'nuclear': 2519, 'stateless': 2520, 'Aires': 2521, 'Brazil': 2522, 'Buenos': 2523, 'Chile': 2524, 'IMF': 2525, 'acceptable': 2526, 'approve': 2527, 'decree': 2528, 'demonstration': 2529, 'discredited': 2530, 'falter': 2531, 'food': 2532, 'gathering': 2533, 'heavily': 2534, 'infect': 2535, 'looting': 2536, 'neighbour': 2537, 'palace': 2538, 'presidential': 2539, 'protester': 2540, 'social': 2541, 'unemployed': 2542, 'unrest': 2543, '1:00am': 2544, 'ACDT': 2545, 'Alamein': 2546, 'Augusta': 2547, 'El': 2548, 'Port': 2549, 'Woomera': 2550, 'Woomera_Detention': 2551, 'arson': 2552, 'bill': 2553, 'centre': 2554, 'experience': 2555, 'extra': 2556, 'load': 2557, 'minibus': 2558, 'outback': 2559, 'outside': 2560, 'quiet': 2561, 'recent': 2562, 'vandalism': 2563, 'Crompton': 2564, 'Malcolm': 2565, 'Privacy': 2566, 'accuracy': 2567, 'assure': 2568, 'code': 2569, 'collect': 2570, 'comply': 2571, 'consent': 2572, 'consumer': 2573, 'credit': 2574, 'culture': 2575, 'data': 2576, 'datum': 2577, 'doctor': 2578, 'enforcement': 2579, 'example': 2580, 'identifiable': 2581, 'intend': 2582, 'majority': 2583, 'marketing': 2584, 'medical': 2585, 'not': 2586, 'oblige': 2587, 'personal': 2588, 'pharmaceutical': 2589, 'privacy': 2590, 'private': 2591, 'purpose': 2592, 'research': 2593, 'sector': 2594, 'solution': 2595, 'vast': 2596, '3:00am': 2597, '6:00pm': 2598, 'AFP': 2599, 'African': 2600, 'Bellis': 2601, 'Brian': 2602, 'Broadcasting': 2603, 'Corporation': 2604, 'Johan': 2605, 'Kolonnade': 2606, 'Marius': 2607, 'Pieterse': 2608, 'Pretoria': 2609, 'adult': 2610, 'ambulance': 2611, 'bystander': 2612, 'casualty': 2613, 'cordone': 2614, 'dog': 2615, 'duPlessis': 2616, 'dust': 2617, 'fine': 2618, 'floor': 2619, 'glass': 2620, 'helicoptere': 2621, 'leak': 2622, 'local_time': 2623, 'locate': 2624, 'mall': 2625, 'manager': 2626, 'news_agency': 2627, 'partially': 2628, 'pray': 2629, 'quote': 2630, 'rink': 2631, 'roof': 2632, 'roofing': 2633, 'rubble': 2634, 'scream': 2635, 'seriously': 2636, 'shopping': 2637, 'skate': 2638, 'skating': 2639, 'son': 2640, 'square': 2641, 'stretcher': 2642, 'watch': 2643, 'witness': 2644, 'Action': 2645, 'Commonwealth': 2646, 'Fiji': 2647, 'Harare': 2648, 'Kingdom': 2649, 'Ministerial': 2650, 'United': 2651, 'Zimbabwe': 2652, 'consistent': 2653, 'declaration': 2654, 'democracy': 2655, 'democratic': 2656, 'farm': 2657, 'formally': 2658, 'invasion': 2659, 'lift': 2660, 'observer': 2661, 'own': 2662, 'response': 2663, 'suspension': 2664, 'table': 2665, 'upcoming': 2666, 'wait': 2667, 'white': 2668, 'Abdel': 2669, 'Beitunya': 2670, 'Gaza_City': 2671, 'Haifa': 2672, 'Jerusalem': 2673, 'Nablus': 2674, 'Rantissi': 2675, 'Tira': 2676, 'al': 2677, 'autonomous': 2678, 'calm': 2679, 'dawn': 2680, 'defy': 2681, 'failure': 2682, 'joint': 2683, 'knife': 2684, 'pre': 2685, 'rupture': 2686, 'suicide_bomber': 2687, 'tactical': 2688, 'withdrawal': 2689, 'August': 2690, 'Authority': 2691, 'Bakes': 2692, 'Bonney': 2693, 'David_Hicks': 2694, 'Defence_Minister': 2695, 'Fahim': 2696, 'Jirga': 2697, 'Lake': 2698, 'Loya': 2699, 'Nudist': 2700, 'Pelican': 2701, 'Qasim': 2702, 'Resort': 2703, 'Rex': 2704, 'Riverland': 2705, 'Robert_Hill': 2706, 'Senator_Hill': 2707, 'Tunick': 2708, 'assembly': 2709, 'defence': 2710, 'define': 2711, 'depression': 2712, 'diplomat': 2713, 'disagreement': 2714, 'draft': 2715, 'elder': 2716, 'fight_alongside': 2717, 'footstep': 2718, 'inaugurate': 2719, 'interest': 2720, 'maintenance': 2721, 'nude': 2722, 'nudist': 2723, 'owner': 2724, 'participate': 2725, 'photo': 2726, 'photograph': 2727, 'priest': 2728, 'relationship': 2729, 'representative': 2730, 'resort': 2731, 'unfortunately': 2732, 'whereabouts': 2733, 'wife': 2734, 'Jim': 2735, 'Office': 2736, 'Soorley': 2737, 'advocate': 2738, 'answer': 2739, 'bishop': 2740, 'catholic': 2741, 'child_sex': 2742, 'christian': 2743, 'incumbent': 2744, 'regard': 2745, 'shepherd': 2746, 'tarnish': 2747, 'tradition': 2748, 'unanswered': 2749, 'wash': 2750, 'Christian': 2751, 'Jesus': 2752, 'chant': 2753, 'coalition': 2754, 'collective': 2755, 'compassion': 2756, 'compound': 2757, 'current': 2758, 'daytime': 2759, 'department': 2760, 'deserve': 2761, 'detention': 2762, 'fencing': 2763, 'heighten': 2764, 'intake': 2765, 'maker': 2766, 'mutilate': 2767, 'nursing': 2768, 'refugee': 2769, 'religious': 2770, 'reveal': 2771, 'surely': 2772, 'voice': 2773, 'zone': 2774, 'crack': 2775, 'detain': 2776, 'outlaw': 2777, 'palestinian_security': 2778, 'wing': 2779, 'Cavallo': 2780, 'Domingo': 2781, 'Economy': 2782, 'discontent': 2783, 'disperse': 2784, 'news': 2785, 'officially': 2786, 'outlet': 2787, 'popular': 2788, 'teargas': 2789, 'widespread': 2790, 'ASIC': 2791, 'April': 2792, 'Archer': 2793, 'Beaumont': 2794, 'Exchange': 2795, 'Gillies': 2796, 'Harris': 2797, 'Helen': 2798, 'Helicopters': 2799, 'Hodgson': 2800, 'Investment': 2801, 'Mrs': 2802, 'Prosecutions': 2803, 'Public': 2804, 'Safety': 2805, 'Scarfe': 2806, 'Securities': 2807, 'Shirley': 2808, 'Stock': 2809, 'Strachan': 2810, 'Stray': 2811, 'Tongue': 2812, 'Transport': 2813, 'accident': 2814, 'advise': 2815, 'air': 2816, 'attribute': 2817, 'blue': 2818, 'boom': 2819, 'buyout': 2820, 'connect': 2821, 'count': 2822, 'court': 2823, 'decline': 2824, 'dishonestly': 2825, 'executive': 2826, 'false': 2827, 'finding': 2828, 'flap': 2829, 'management': 2830, 'navigation': 2831, 'painful': 2832, 'personality': 2833, 'phenomenon': 2834, 'pilot': 2835, 'receivership': 2836, 'retailer': 2837, 'retention': 2838, 'rotor': 2839, 'solo': 2840, 'store': 2841, 'struggle': 2842, 'student': 2843, 'tail': 2844, 'thorough': 2845, 'trainer': 2846, 'training': 2847, 'turbulence': 2848, 'uncommon': 2849, 'B-52': 2850, 'Beveridge': 2851, 'Camp': 2852, 'Confederation': 2853, 'Coroner': 2854, 'Defence_Secretary': 2855, 'Formula': 2856, 'Graeme': 2857, 'Graham': 2858, 'Grand': 2859, 'Hoon': 2860, 'Jalalabad': 2861, 'Johnston': 2862, 'Mazar': 2863, 'Motor': 2864, 'Prix': 2865, 'QC': 2866, 'Ray': 2867, 'Rhino': 2868, 'Ross': 2869, 'Sharif': 2870, 'Sports': 2871, 'advance': 2872, 'air_strike': 2873, 'alter': 2874, 'assurance': 2875, 'bomber': 2876, 'circle': 2877, 'clearing': 2878, 'coroner': 2879, 'corporation': 2880, 'debris': 2881, 'determine': 2882, 'engineering': 2883, 'expert': 2884, 'fence': 2885, 'forthcoming': 2886, 'hunt': 2887, 'independent': 2888, 'indicate': 2889, 'infantry': 2890, 'interrogation': 2891, 'investigate': 2892, 'involvement': 2893, 'judicial': 2894, 'legal': 2895, 'logistical': 2896, 'marine': 2897, 'marshal': 2898, 'marshall': 2899, 'nature': 2900, 'offshore': 2901, 'outline': 2902, 'pende': 2903, 'reasonably': 2904, 'recommendation': 2905, 'represent': 2906, 'ship': 2907, 'transfer': 2908, 'Star': 2909, 'actually': 2910, 'cannon': 2911, 'commit': 2912, 'relatively': 2913, 'standby': 2914, 'stitch': 2915, 'torching': 2916, 'Abuse': 2917, 'Alliance': 2918, 'Anglican': 2919, 'Child': 2920, 'Generals': 2921, 'Governor': 2922, 'Hetty': 2923, 'Senate': 2924, 'Sexual': 2925, 'advice': 2926, 'alleged': 2927, 'anti': 2928, 'consideration': 2929, 'differently': 2930, 'explain': 2931, 'hindsight': 2932, 'hollow': 2933, 'insurance': 2934, 'intervene': 2935, 'jeopardise': 2936, 'obligation': 2937, 'page': 2938, 'personally': 2939, 'queensland': 2940, 'refute': 2941, 'rely': 2942, 'sensitivity': 2943, 'version': 2944, 'voracity': 2945, 'wisdom': 2946, 'Australians': 2947, 'Battle': 2948, 'Belgium': 2949, 'Flanders': 2950, 'Graves': 2951, 'July': 2952, 'Pilkem': 2953, 'Ridge': 2954, 'War': 2955, 'Ypres': 2956, 'battlefield': 2957, 'bitumen': 2958, 'bypass': 2959, 'cemetery': 2960, 'flemish': 2961, 'graveyard': 2962, 'maintain': 2963, 'motorway': 2964, 'overrun': 2965, 'propose': 2966, 'route': 2967, 'serviceman': 2968, 'ALP': 2969, 'Carmen': 2970, 'Fremantle': 2971, 'Generations': 2972, 'Labor': 2973, 'Lawrence': 2974, 'MP': 2975, 'Member': 2976, 'Stolen': 2977, 'annoy': 2978, 'behalf': 2979, 'carefully': 2980, 'compromise': 2981, 'differentiate': 2982, 'extinguishment': 2983, 'forward': 2984, 'momentum': 2985, 'native': 2986, 'obliterate': 2987, 'option': 2988, 'poll': 2989, 'section': 2990, 'sympathy': 2991, 'Agence': 2992, 'Fatah': 2993, 'Presse': 2994, 'Tuesday': 2995, 'air_raid': 2996, 'anonymous': 2997, 'avow': 2998, 'bus': 2999, 'condemnation': 3000, 'crowded': 3001, 'date': 3002, 'faction': 3003, 'focus': 3004, 'grisly': 3005, 'guerrilla': 3006, 'inform': 3007, 'likewise': 3008, 'martyrdom': 3009, 'one': 3010, 'operative': 3011, 'provoke': 3012, 'revert': 3013, 'roadside': 3014, 'sever': 3015, 'strategic': 3016, 'sweeping': 3017, 'trigger': 3018, 'Bangladesh': 3019, 'Barbados': 3020, 'Botswana': 3021, 'CMAG': 3022, 'Malaysia': 3023, 'Mugabe': 3024, 'Nigeria': 3025, 'Robert': 3026, 'agenda': 3027, 'cmag': 3028, 'coup': 3029, 'discussion': 3030, 'governance': 3031, 'informal': 3032, 'recognise': 3033, 'restrict': 3034, 'review': 3035, 'violate': 3036, 'violent': 3037, 'watchdog': 3038, '10:00pm': 3039, '11:30am': 3040, '5:00am': 3041, '9:00am': 3042, 'Assembly': 3043, 'Bill': 3044, 'Deputy': 3045, 'House': 3046, 'Kons': 3047, 'Legislative': 3048, 'Lennon': 3049, 'Llewellyn': 3050, 'Paul': 3051, 'Polley': 3052, 'Speaker': 3053, 'Steven': 3054, 'abortion': 3055, 'abstain': 3056, 'amendment': 3057, 'apart': 3058, 'backbencher': 3059, 'counselling': 3060, 'debate': 3061, 'marathon': 3062, 'medically': 3063, 'opinion': 3064, 'reading': 3065, 'retrospective': 3066, 'specialist': 3067, 'unchanged': 3068, 'write': 3069, 'Bangalore': 3070, 'England': 3071, 'Englishmen': 3072, 'Immigration_Minister': 3073, 'Indians': 3074, 'Philip_Ruddock': 3075, 'Ramprakash': 3076, 'disappoint': 3077, 'disappointed': 3078, 'disturbance': 3079, 'fielder': 3080, 'flick': 3081, 'minor': 3082, 'red': 3083, 'tear': 3084, 'ASIO': 3085, 'Arabian': 3086, 'Attorney_General': 3087, 'Daryl_Williams': 3088, 'Federal_Police': 3089, 'Northern_Alliance': 3090, 'Peleliu': 3091, 'USS': 3092, 'aboard': 3093, 'appropriate': 3094, 'australian_authority': 3095, 'context': 3096, 'imagine': 3097, 'interrogate': 3098, 'Mujaheddin': 3099, 'Wolfowitz': 3100, 'commando': 3101, 'contribution': 3102, 'doubt': 3103, 'focused': 3104, 'foremost': 3105, 'haggle': 3106, 'hilltop': 3107, 'local_afghan': 3108, 'scour': 3109, 'spearhead': 3110, 'tribal': 3111, 'unsure': 3112, 'Aviation': 3113, 'CASA': 3114, 'Civil': 3115, 'Gulf': 3116, 'Pawlik': 3117, 'Teresa': 3118, 'Wal': 3119, 'Whyalla': 3120, 'conservative': 3121, 'daughter': 3122, 'digest': 3123, 'ditch': 3124, 'encourage': 3125, 'engine': 3126, 'error': 3127, 'fatal': 3128, 'feature': 3129, 'formal': 3130, 'husband': 3131, 'improvement': 3132, 'lean': 3133, 'leaning': 3134, 'mechanical': 3135, 'mixture': 3136, 'operating': 3137, 'operator': 3138, 'piston': 3139, 'powered': 3140, 'regulation': 3141, 'relieve': 3142, 'Dale': 3143, 'Detention_Centre': 3144, 'Dick': 3145, 'Don': 3146, 'February': 3147, 'Juvenile': 3148, 'Wallace': 3149, 'aboriginal': 3150, 'alternative': 3151, 'compression': 3152, 'conference': 3153, 'custodial': 3154, 'diversionary': 3155, 'eligible': 3156, 'instead': 3157, 'lawyer': 3158, 'lonely': 3159, 'negelected': 3160, 'offend': 3161, 'orphan': 3162, 'prosecutor': 3163, 'stationery': 3164, 'undertake': 3165, 'year_ago': 3166, 'Mentha': 3167, 'administrator': 3168, 'complexity': 3169, 'difficulty': 3170, 'entitlement': 3171, 'flow': 3172, 'negotiation': 3173, 'opt': 3174, 'pleased': 3175, 'prefer': 3176, 'prevail': 3177, 'redundancy': 3178, 'remainder': 3179, 'sheer': 3180, 'size': 3181, 'Lavarch': 3182, 'Law': 3183, 'absolutely': 3184, 'attorney': 3185, 'draconian': 3186, 'essential': 3187, 'guarantee': 3188, 'safeguard': 3189, 'secretary': 3190, 'Dominic': 3191, 'Ernst': 3192, 'Federa': 3193, 'Gibbons': 3194, 'HIH': 3195, 'Kim': 3196, 'Smith': 3197, 'Young': 3198, 'accountancy': 3199, 'chairman': 3200, 'colleague': 3201, 'commission': 3202, 'creditor': 3203, 'finance': 3204, 'firm': 3205, 'insolvency': 3206, 'note': 3207, 'restructuring': 3208, 'shortage': 3209, 'Bond': 3210, 'Steve_Waugh': 3211, 'advantage': 3212, 'aspect': 3213, 'batting': 3214, 'convincingly': 3215, 'definitely': 3216, 'dominance': 3217, 'fast_bowler': 3218, 'intimidatory': 3219, 'learn': 3220, 'lightly': 3221, 'love': 3222, 'output': 3223, 'professional': 3224, 'tailender': 3225, 'unsportsmanlike': 3226, 'warm': 3227, 'whichever': 3228, 'Breuer': 3229, 'Channel': 3230, 'Lyn': 3231, 'abate': 3232, 'ath': 3233, 'billow': 3234, 'canister': 3235, 'complaint': 3236, 'damaging': 3237, 'electorate': 3238, 'engage': 3239, 'hurt': 3240, 'midnight': 3241, 'pall': 3242, 'particular': 3243, 'plea': 3244, 'plume': 3245, 'scene': 3246, 'subdue': 3247, 'ultimate': 3248, 'Affairs': 3249, 'Airline': 3250, 'Bennett': 3251, 'Board': 3252, 'Burgess': 3253, 'Faulkner': 3254, 'Federation': 3255, 'Home': 3256, 'Intelligence': 3257, 'Organisation': 3258, 'Representatives': 3259, 'Warren': 3260, 'adequate': 3261, 'afford': 3262, 'combat': 3263, 'cope': 3264, 'examination': 3265, 'examine': 3266, 'labor': 3267, 'legislation': 3268, 'obstruct': 3269, 'parliamentary': 3270, 'passage': 3271, 'postpone': 3272, 'significantly': 3273, 'unnecessarily': 3274, 'unprecedented': 3275, 'upgrade': 3276, 'virtually': 3277, 'wary': 3278, 'Brussels': 3279, 'Fawzi': 3280, 'Ocean': 3281, 'Rumsfeld': 3282, 'administrative': 3283, 'coffee': 3284, 'depend': 3285, 'destruction': 3286, 'enemy': 3287, 'exceptional': 3288, 'freedom': 3289, 'increasingly': 3290, 'instant': 3291, 'necessary': 3292, 'nexus': 3293, 'preview': 3294, 'commodity': 3295, 'considerable': 3296, 'contribute': 3297, 'export': 3298, 'globally': 3299, 'growth': 3300, 'oil': 3301, 'percentage': 3302, 'price': 3303, 'prospect': 3304, 'recovery': 3305, 'robust': 3306, 'stimulus': 3307, 'uncertainty': 3308, 'weakening': 3309, '6:30am': 3310, 'Columbia': 3311, 'Dennehy': 3312, 'Divine': 3313, 'Division': 3314, 'Edward': 3315, 'Fire': 3316, 'Manhattan': 3317, 'Savarese': 3318, 'University': 3319, 'atop': 3320, 'bright': 3321, 'campus': 3322, 'cathedral': 3323, 'ceiling': 3324, 'crane': 3325, 'fireman': 3326, 'gift': 3327, 'hose': 3328, 'immense': 3329, 'noticed': 3330, 'orange': 3331, 'pour': 3332, 'sanctuary': 3333, 'structure': 3334, 'subsequently': 3335, 'tall': 3336, 'thick': 3337, 'Israeli': 3338, 'impossible': 3339, 'numerous': 3340, 'reportedly': 3341, 'tend': 3342, 'Club': 3343, 'Demon': 3344, 'Flower': 3345, 'Football': 3346, 'Gutnick': 3347, 'Joseph': 3348, 'Szondy': 3349, 'Vision': 3350, 'cherry': 3351, 'club': 3352, 'comprehensively': 3353, 'difference': 3354, 'divide': 3355, 'dump': 3356, 'elect': 3357, 'happy': 3358, 'outcome': 3359, 'overwhelmingly': 3360, 'regardless': 3361, 'saviour': 3362, 'settle': 3363, 'ticket': 3364, 'unite': 3365, '11:00pm': 3366, 'accommodation': 3367, 'brigade': 3368, 'computing': 3369, 'deliberate': 3370, 'hall': 3371, 'inhalation': 3372, 'mess': 3373, 'rock': 3374, 'treat': 3375, 'Admiral': 3376, 'Cabinet': 3377, 'Stufflebeem': 3378, 'balancing': 3379, 'battleground': 3380, 'certainty': 3381, 'clue': 3382, 'cooperation': 3383, 'defeat': 3384, 'email': 3385, 'endorse': 3386, 'headquarters': 3387, 'insert': 3388, 'justice': 3389, 'mountainous': 3390, 'mystery': 3391, 'offence': 3392, 'path': 3393, 'pocket': 3394, 'rear': 3395, 'silent': 3396, 'thinking': 3397, 'tunnel': 3398, 'unread': 3399, 'Hicks': 3400, 'different': 3401, 'fight_Taliban': 3402, 'hopeful': 3403, 'Chirac': 3404, 'Hamas_Islamic': 3405, 'Hebron': 3406, 'Jacques': 3407, 'Jihad': 3408, 'Larsen': 3409, 'Middle_East': 3410, 'Roed': 3411, 'Sharon': 3412, 'Terje': 3413, 'address': 3414, 'ambush': 3415, 'exterminate': 3416, 'father': 3417, 'french': 3418, 'hail': 3419, 'holy': 3420, 'invitation': 3421, 'nominally': 3422, 'perpetrator': 3423, 'persistent': 3424, 'shooting': 3425, 'successive': 3426, 'televise': 3427, 'turning': 3428, 'vow': 3429, 'wanted': 3430, 'AMWU': 3431, 'Cameron': 3432, 'Doug': 3433, 'Manufacturing': 3434, 'Workers_Union': 3435, 'arbitrate': 3436, 'arbitration': 3437, 'crush': 3438, 'industrial_action': 3439, 'lock': 3440, 'maintenance_worker': 3441, 'underfoot': 3442, 'wage_freeze': 3443, 'Boeta': 3444, 'Captain': 3445, 'Dippenaar': 3446, 'Gilchrist': 3447, 'Gillespie': 3448, 'Hayward': 3449, 'Makhaya': 3450, 'McGrath': 3451, 'Nanti': 3452, 'Ntini': 3453, 'Oval': 3454, 'Ponting': 3455, 'Warne': 3456, 'ably': 3457, 'apiece': 3458, 'brilliantly': 3459, 'deflect': 3460, 'frustrate': 3461, 'glove': 3462, 'interval': 3463, 'last': 3464, 'lengthy': 3465, 'luncheon': 3466, 'menace': 3467, 'notch': 3468, 'pad': 3469, 'partnership': 3470, 'pop': 3471, 'prove': 3472, 'scorer': 3473, 'session': 3474, 'swinge': 3475, 'toe': 3476, 'unbeaten': 3477, 'yorker': 3478, 'Bradstreet': 3479, 'Dun': 3480, 'GST': 3481, 'expectation': 3482, 'intention': 3483, 'investment': 3484, 'projection': 3485, 'quarter': 3486, 'sale': 3487, 'shed': 3488, 'solid': 3489, 'Aristide': 3490, 'Bertrand': 3491, 'Haiti': 3492, 'Haitian': 3493, 'Jean': 3494, 'Prince': 3495, 'attacker': 3496, 'au': 3497, 'communication': 3498, 'depose': 3499, 'equipment': 3500, 'haitian': 3501, 'residence': 3502, 'seize': 3503, 'Welfare': 3504, 'assistance': 3505, 'homeless': 3506, 'suggest': 3507, 'widely': 3508, 'Kelvin': 3509, 'Phillip': 3510, 'Shadow': 3511, 'Thomson': 3512, 'clean': 3513, 'deterrent': 3514, 'fairy': 3515, 'negligently': 3516, 'penalty': 3517, 'penguin': 3518, 'prosecution': 3519, 'recklessly': 3520, 'spill': 3521, 'volunteer': 3522, '12.55pm': 3523, 'Bon': 3524, 'Bursch': 3525, 'Culbertson': 3526, 'Dan': 3527, 'Dezhurov': 3528, 'Endeavour': 3529, 'Jovi': 3530, 'Kennedy': 3531, 'Mikhail': 3532, 'NASA': 3533, 'Onufrienko': 3534, 'Shuttle': 3535, 'Space': 3536, 'Starshine': 3537, 'Station': 3538, 'Tyurin': 3539, 'Vladimir': 3540, 'Walz': 3541, 'Yuri': 3542, 'astronaut': 3543, 'atmosphere': 3544, 'bay': 3545, 'calculate': 3546, 'cosmonaut': 3547, 'density': 3548, 'dodge': 3549, 'earth': 3550, 'experiment': 3551, 'maneuver': 3552, 'material': 3553, 'orbit': 3554, 'outgoing': 3555, 'payload': 3556, 'piece': 3557, 'scientific': 3558, 'shuttle': 3559, 'soviet': 3560, 'space': 3561, 'tonne': 3562, 'touch': 3563, 'trio': 3564, 'trip': 3565, 'tune': 3566, 'undock': 3567, 'upper': 3568, 'Greenpeace': 3569, 'Lucas': 3570, 'McGauran': 3571, 'Science': 3572, 'banner': 3573, 'demonstrator': 3574, 'entrance': 3575, 'reactor': 3576, 'rush': 3577, 'tower': 3578, 'unfurl': 3579, 'Central': 3580, 'Command': 3581, 'Franks': 3582, 'S': 3583, 'U': 3584, 'accurate': 3585, 'basis': 3586, 'cautious': 3587, 'certain': 3588, 'concede': 3589, 'distinctive': 3590, 'flush': 3591, 'hideout': 3592, 'progress': 3593, 'vicinity': 3594, 'yemeni': 3595, 'Dixon': 3596, 'Executive': 3597, 'briefing': 3598, 'disappointment': 3599, 'disrupt': 3600, 'earning': 3601, 'generous': 3602, 'incentive': 3603, 'nervous': 3604, 'paper': 3605, 'produce': 3606, 'soften': 3607, 'weekly': 3608, 'workforce': 3609, 'Chatelet': 3610, 'Palais': 3611, 'Theatre': 3612, 'ago': 3613, 'arrondissement': 3614, 'asphyxiate': 3615, 'du': 3616, 'elevator': 3617, 'guest': 3618, 'hotel': 3619, 'inferno': 3620, 'obtain': 3621, 'publicly': 3622, 'shaft': 3623, 'spokesperson': 3624, 'standing': 3625, 'teacher': 3626, 'theatre': 3627, 'undamaged': 3628, 'window': 3629, 'Bapak': 3630, 'Eid': 3631, 'Fitr': 3632, 'IV': 3633, 'Indonesia': 3634, 'Suharto': 3635, 'anonymity': 3636, 'appendectomy': 3637, 'argue': 3638, 'attach': 3639, 'aud1.102': 3640, 'breathing': 3641, 'complication': 3642, 'dictator': 3643, 'drip': 3644, 'embezzlle': 3645, 'ex': 3646, 'festival': 3647, 'fund': 3648, 'hospitalise': 3649, 'indonesian': 3650, 'intravenous': 3651, 'mask': 3652, 'oxygen': 3653, 'pacemaker': 3654, 'perform': 3655, 'periodic': 3656, 'repeatedly': 3657, 'rhythm': 3658, 'ruler': 3659, 'rupiah': 3660, 'slight': 3661, 'stroke': 3662, 'unlike': 3663, 'urinary': 3664, 'vice': 3665, 'visitor': 3666, 'Eagle': 3667, 'Ephraim': 3668, 'Freedom': 3669, 'God': 3670, 'Guadalcanal': 3671, 'Isatabu': 3672, 'Islands': 3673, 'Kemakeza': 3674, 'Malaita': 3675, 'Movement': 3676, 'Peace': 3677, 'Sneh': 3678, 'Solomon': 3679, 'Tanzin': 3680, 'Townsville': 3681, 'announcement': 3682, 'ballot': 3683, 'consist': 3684, 'defiance': 3685, 'disarm': 3686, 'disarmament': 3687, 'effectively': 3688, 'enormous': 3689, 'ethnic': 3690, 'forcefully': 3691, 'grass': 3692, 'grouping': 3693, 'newly': 3694, 'overhaul': 3695, 'priority': 3696, 'root': 3697, 'scepticism': 3698, 'sincerely': 3699, 'solomons': 3700, 'success': 3701, 'trust': 3702, 'Gibbs': 3703, 'Hershelle': 3704, 'Kirsten': 3705, 'balloon': 3706, 'bowl': 3707, 'impressive': 3708, 'keeper': 3709, 'opposite': 3710, 'pavilion': 3711, 'safely': 3712, 'six': 3713, 'Baghram': 3714, 'Haji': 3715, 'Zaman': 3716, 'alongside': 3717, 'comb': 3718, 'coordinate': 3719, 'deeply': 3720, 'drift': 3721, 'forest': 3722, 'fugitive': 3723, 'hill': 3724, 'intercept': 3725, 'reform': 3726, 'scatter': 3727, 'shelter': 3728, 'special': 3729, 'swoop': 3730, 'villager': 3731, 'White': 3732, 'broadcast': 3733, 'commitment': 3734, 'disarray': 3735, 'exert': 3736, 'inability': 3737, 'keenly': 3738, 'pretext': 3739, 'primarily': 3740, 'productive': 3741, 'react': 3742, 'wage': 3743, 'grounding': 3744, 'inevitable': 3745, 'assumption': 3746, 'burden': 3747, 'demand': 3748, 'detailed': 3749, 'exaggerate': 3750, 'healthy': 3751, 'mainly': 3752, 'shrink': 3753, 'unsustainable': 3754, 'voluntary': 3755, 'Energy': 3756, 'Latrobe': 3757, 'Lindsay': 3758, 'Striking': 3759, 'Terry': 3760, 'Ward': 3761, 'Yallourn': 3762, 'blackout': 3763, 'coal': 3764, 'down': 3765, 'generation': 3766, 'interrupt': 3767, 'interruption': 3768, 'mining': 3769, 'modernise': 3770, 'ongoing': 3771, 'sell': 3772, 'Batholomew': 3773, 'Blunkett': 3774, 'Oti': 3775, 'Patterson': 3776, 'Payne': 3777, 'Rebekah': 3778, 'Rini': 3779, 'Roy': 3780, 'Sarah': 3781, 'Snyder': 3782, "Ulufa'alu": 3783, 'Wade': 3784, 'Whiting': 3785, 'Worl': 3786, 'beer': 3787, 'choose': 3788, 'cigarette': 3789, 'convict': 3790, 'conviction': 3791, 'editor': 3792, 'importer': 3793, 'monitor': 3794, 'murder': 3795, 'paedophile': 3796, 'parlous': 3797, 'publish': 3798, 'register': 3799, 'remission': 3800, 'resumption': 3801, 'rumour': 3802, 'shame': 3803, 'submit': 3804, 'supervision': 3805, 'tabloid': 3806, 'uproar': 3807, 'vigilante': 3808, 'wrong': 3809, 'career': 3810, 'deteriorate': 3811, 'theirs': 3812, '5:30': 3813, 'Ahmad': 3814, 'Arabic': 3815, 'Center': 3816, 'Dr.': 3817, 'English': 3818, 'al-(Khair': 3819, 'amateur': 3820, 'calculation': 3821, 'entire': 3822, 'hijack': 3823, 'iron': 3824, 'notification': 3825, 'overjoy': 3826, 'planning': 3827, 'pm': 3828, 'roughly': 3829, 'segment': 3830, 'sheikh': 3831, 'tape': 3832, 'topple': 3833, 'transcript': 3834, 'translate': 3835, 'unrelated': 3836, 'video': 3837, 'videotape': 3838, 'bombard': 3839, 'climax': 3840, 'imminent': 3841, 'intensify': 3842, 'retreat': 3843, 'truckload': 3844, 'aware': 3845, 'background': 3846, 'join': 3847, 'motivate': 3848, 'ought': 3849, 'psychiatrist': 3850, 'psychologist': 3851, 'Kashmiris': 3852, 'MJC': 3853, 'Muttahida': 3854, 'achieve': 3855, 'alliance': 3856, 'attention': 3857, 'connection': 3858, 'motive': 3859, 'Associate': 3860, 'Care': 3861, 'Children': 3862, 'Diseases': 3863, 'Human': 3864, 'Infectious': 3865, 'Intensive': 3866, 'Microbiology': 3867, 'Turnidge': 3868, 'Unit': 3869, 'Women': 3870, 'antibiotic': 3871, 'bacteria': 3872, 'bacterial': 3873, 'bug': 3874, 'infected': 3875, 'microbiologist': 3876, 'pseudomona': 3877, 'resistant': 3878, 'select': 3879, 'underway': 3880, 'Apache': 3881, 'Egypt': 3882, 'Rafah': 3883, 'Ramallah': 3884, 'apache': 3885, 'backing': 3886, 'blast': 3887, 'bombshell': 3888, 'bulldozer': 3889, 'bus_ambush': 3890, 'cloud': 3891, 'confrontation': 3892, 'crossing': 3893, 'darkness': 3894, 'dominate': 3895, 'dynamite': 3896, 'f-16': 3897, 'forfeit': 3898, 'gunship': 3899, 'hardliner': 3900, 'institution': 3901, 'irrelevant': 3902, 'lynch': 3903, 'missile': 3904, 'mob': 3905, 'nationalist': 3906, 'partly': 3907, 'plough': 3908, 'plunge': 3909, 'pound': 3910, 'reoccupie': 3911, 'rocket': 3912, 'secular': 3913, 'symbolic': 3914, 'undoubtedly': 3915, 'weaken': 3916, 'counterpart': 3917, 'dad': 3918, 'inspection': 3919, 'lineup': 3920, 'milestone': 3921, 'mum': 3922, 'relevance': 3923, 'significance': 3924, 'twin': 3925, 'Act': 3926, 'Corcoran': 3927, 'Crimes': 3928, 'Incursion': 3929, 'Kevin': 3930, 'Rudd': 3931, 'Victorian': 3932, 'ascertain': 3933, 'consular': 3934, 'fate': 3935, 'immediate': 3936, 'maximum': 3937, 'nationality': 3938, 'permit': 3939, 'punishable': 3940, 'treason': 3941, 'vary': 3942, 'Gissin': 3943, 'Raanan': 3944, 'directive': 3945, 'directly': 3946, 'expel': 3947, 'imply': 3948, 'quarrel': 3949, 'responsible': 3950, 'spell': 3951, 'Abbott': 3952, 'Employment': 3953, 'degree': 3954, 'encouraging': 3955, 'historical': 3956, 'participation': 3957, 'seeker': 3958, 'unemployment': 3959, 'volatility': 3960, 'ANZ': 3961, 'Derrick': 3962, 'Finance': 3963, 'Lording': 3964, 'Sector': 3965, 'Westpac': 3966, 'bargaining': 3967, 'branch': 3968, 'coincide': 3969, 'contingency': 3970, 'enterprise': 3971, 'industry': 3972, 'initiative': 3973, 'negotiating': 3974, 'overtime': 3975, 'pr': 3976, 'stunt': 3977, 'suit': 3978, 'unjustified': 3979, 'unpaid': 3980, 'Bob': 3981, 'Building': 3982, 'CFMEU': 3983, 'Collins': 3984, 'Construction': 3985, 'Forestry': 3986, 'Hawke': 3987, 'Industry': 3988, 'Kingham': 3989, 'Martin': 3990, 'Mining': 3991, 'Neville': 3992, 'Place': 3993, 'Senior': 3994, 'Square': 3995, 'Wran': 3996, 'astound': 3997, 'cfmeu': 3998, 'construction': 3999, 'demonstrate': 4000, 'egg': 4001, 'guidance': 4002, 'intersection': 4003, 'overwhelm': 4004, 'properly': 4005, 'slogan': 4006, 'standover': 4007, 'suck': 4008, 'teach': 4009, 'unfairly': 4010, 'venue': 4011, 'view': 4012, 'Al': 4013, 'Aqsa': 4014, 'Brigades': 4015, 'Chairman': 4016, 'Cullinan': 4017, 'Daryl': 4018, 'Ezzedin': 4019, 'Hezbollah': 4020, 'Immanuel': 4021, 'Lebanon': 4022, 'Muslim': 4023, 'Qassam': 4024, 'activist': 4025, 'belong': 4026, 'condemn': 4027, 'differ': 4028, 'flatten': 4029, 'infrastructure': 4030, 'jointly': 4031, 'middle': 4032, 'moment': 4033, 'nick': 4034, 'offshoot': 4035, 'out': 4036, 'pegging': 4037, 'revenge': 4038, 'selection': 4039, 'shiite': 4040, 'simultaneous': 4041, 'swift': 4042, 'underdogs': 4043, 'urgent': 4044, '5:30am': 4045, 'Diego': 4046, 'Garcia': 4047, 'Hanifi': 4048, 'Islamic': 4049, 'Karen': 4050, 'Kosovo': 4051, 'Liberation': 4052, 'Mountains': 4053, 'Navy': 4054, 'Nick': 4055, 'Salisbury': 4056, 'Society': 4057, 'Taylor': 4058, 'Timothy': 4059, 'Walli': 4060, 'b-1': 4061, 'basic': 4062, 'clearly': 4063, 'deadline': 4064, 'destroyer': 4065, 'district': 4066, 'follower': 4067, 'frontline': 4068, 'journalist': 4069, 'mate': 4070, 'trench': 4071, 'unable': 4072, 'usually': 4073, 'Geneva': 4074, 'High': 4075, 'Lubbers': 4076, 'Refugees': 4077, 'Ruddock': 4078, 'Ruud': 4079, 'alsosaid': 4080, 'contributor': 4081, 'convention': 4082, 'cooperate': 4083, 'immigrant': 4084, 'legalistic': 4085, 'mistrust': 4086, 'openly': 4087, 'sharing': 4088, 'Jodee': 4089, 'Rich': 4090, 'Tel': 4091, 'baseless': 4092, 'discrepancy': 4093, 'insolvent': 4094, 'known': 4095, 'manage': 4096, 'mislead': 4097, 'opportunity': 4098, 'proceeding': 4099, 'Crown': 4100, 'Curtis': 4101, 'Ford': 4102, 'June': 4103, 'Langdale': 4104, 'Lewes': 4105, 'Prosecutor': 4106, 'Sierra': 4107, 'abduct': 4108, 'abduction': 4109, 'cheer': 4110, 'clothe': 4111, 'convince': 4112, 'cunning': 4113, 'defendant': 4114, 'delight': 4115, 'dirty': 4116, 'disappear': 4117, 'door': 4118, 'foreman': 4119, 'girl': 4120, 'glib': 4121, 'graphic': 4122, 'guilty': 4123, 'imprisonment': 4124, 'indecent': 4125, 'judge': 4126, 'jury': 4127, 'kidnap': 4128, 'liar': 4129, 'mouth': 4130, 'naked': 4131, 'parole': 4132, 'probation': 4133, 'prompt': 4134, 'rampage': 4135, 'rope': 4136, 'schoolgirl': 4137, 'shock': 4138, 'shut': 4139, 'strangle': 4140, 'suffocate': 4141, 'terrified': 4142, 'unanimous': 4143, 'unwell': 4144, 'verdict': 4145, 'AFL': 4146, 'Lockett': 4147, 'Swans': 4148, 'comeback': 4149, 'comfortable': 4150, 'kicker': 4151, 'media': 4152, 'retirement': 4153, 'Air': 4154, 'Chiefs': 4155, 'Cutter': 4156, 'Daisy': 4157, 'Joint': 4158, 'Myers': 4159, 'desire': 4160, 'flurry': 4161, 'havoc': 4162, 'kilogram': 4163, 'wreak': 4164, 'Ashcroft': 4165, 'Moussaoui': 4166, 'Zaccarias': 4167, 'active': 4168, 'conspiracy': 4169, 'dramatically': 4170, 'hi': 4171, 'hijacker': 4172, 'hijacking': 4173, 'indictment': 4174, 'lesson': 4175, 'moroccan': 4176, 'participant': 4177, 'pledge': 4178, 'retool': 4179, 'suicide': 4180, 'tech': 4181, 'Adventure_World': 4182, 'Alissa': 4183, 'Bendigo': 4184, 'Billy': 4185, 'Interlaken': 4186, 'Kylie': 4187, 'Morrow': 4188, 'Peel': 4189, 'Redmond': 4190, 'Scott': 4191, 'accountable': 4192, 'accrue': 4193, 'acquit': 4194, 'adventure': 4195, 'canyon': 4196, 'canyoning': 4197, 'compensate': 4198, 'culpable': 4199, 'forever': 4200, 'jail_sentence': 4201, 'loved': 4202, 'manslaughter': 4203, 'mother': 4204, 'negligence': 4205, 'satisfied': 4206, 'survivor': 4207, 'Agriculture': 4208, 'Truss': 4209, 'USA': 4210, 'boast': 4211, 'degenerate': 4212, 'delegation': 4213, 'dependent': 4214, 'efficient': 4215, 'entrench': 4216, 'especially': 4217, 'farmer': 4218, 'intent': 4219, 'lobby': 4220, 'mentality': 4221, 'obvious': 4222, 'proudly': 4223, 'subsidy': 4224, 'taxpayer': 4225, 'Reith': 4226, 'allocate': 4227, 'boost': 4228, 'distribution': 4229, 'pacific': 4230, 'portfolio': 4231, 'submission': 4232, 'transport': 4233, 'Magna': 4234, 'Mitsubishi': 4235, 'coordination': 4236, 'reaffirm': 4237, 'Farina': 4238, 'Knop': 4239, 'Soccer': 4240, 'Socceroos': 4241, 'coach': 4242, 'contract': 4243, 'excellent': 4244, 'in"kicke': 4245, 'interested': 4246, 'levy': 4247, 'positve': 4248, 'qualifying': 4249, 'Lindh': 4250, 'Walker': 4251, 'mid-1999': 4252, '12:46am': 4253, '8:46am': 4254, 'Ayman': 4255, 'Earth': 4256, 'Grace': 4257, 'Pennsylvania': 4258, 'Rudy': 4259, 'Zawahiri': 4260, 'aerial': 4261, 'airliner': 4262, 'amazing': 4263, 'assemble': 4264, 'bagpiper': 4265, 'cite': 4266, 'co': 4267, 'commandeer': 4268, 'conspirator': 4269, 'crater': 4270, 'daub': 4271, 'egyptian': 4272, 'excavate': 4273, 'fix': 4274, 'flag': 4275, 'forget': 4276, 'gape': 4277, 'grey': 4278, 'haunting': 4279, 'horror': 4280, 'idle': 4281, 'imam': 4282, 'lone': 4283, 'machinery': 4284, 'origin': 4285, 'pause': 4286, 'probe': 4287, 'rabbi': 4288, 'remembrance': 4289, 'rendition': 4290, 'resolute': 4291, 'roll': 4292, 'rural': 4293, 'signature': 4294, 'sing': 4295, 'slice': 4296, 'solemn': 4297, 'steer': 4298, 'strew': 4299, 'sunny': 4300, 'tenor': 4301, 'unindicted': 4302, 'wide': 4303, 'wreckage': 4304, 'A8,800': 4305, 'Abegglen': 4306, 'Balmer': 4307, 'Bernhard': 4308, 'Bradley': 4309, 'Dewar': 4310, 'Felix': 4311, 'Friedli': 4312, 'Gafner': 4313, 'Georg': 4314, 'Hoedle': 4315, 'Mackay': 4316, 'Oehler': 4317, 'Stefan': 4318, 'Stephan': 4319, 'Steuri': 4320, 'Wiget': 4321, 'abide': 4322, 'franc': 4323, 'oh': 4324, 'plaintiff': 4325, 'punishment': 4326, 'Los': 4327, 'Palos': 4328, 'deportation': 4329, 'gang': 4330, 'independence': 4331, 'loyal': 4332, 'massacre': 4333, 'nun': 4334, 'panel': 4335, 'persecution': 4336, 'torture': 4337, 'Committee': 4338, 'Market': 4339, 'Reserve': 4340, 'borrow': 4341, 'borrowing': 4342, 'commercial': 4343, 'discount': 4344, 'expansion': 4345, 'inflation': 4346, 'interest_rate': 4347, 'lower': 4348, 'margin': 4349, 'modest': 4350, 'shockwave': 4351, 'slide': 4352, 'underlie': 4353, 'Statistics': 4354, 'decrease': 4355, 'discern': 4356, 'dividend': 4357, 'drought': 4358, 'drug': 4359, 'equip': 4360, 'heroin': 4361, 'trend': 4362, 'Britt': 4363, 'Helena': 4364, 'Nutrition': 4365, 'Rachael': 4366, 'automation': 4367, 'automobile': 4368, 'childhood': 4369, 'convenience': 4370, 'diagnose': 4371, 'dietitian': 4372, 'disturb': 4373, 'frightening': 4374, 'grave': 4375, 'implication': 4376, 'lifestyle': 4377, 'obese': 4378, 'obesity': 4379, 'overweight': 4380, 'physical': 4381, 'prevalence': 4382, 'privileged': 4383, 'professor': 4384, 'reliable': 4385, 'resist': 4386, 'reverse': 4387, 'study': 4388, 'surprising': 4389, 'temptation': 4390, 'walking': 4391, 'worry': 4392, 'Minister_Ariel': 4393, 'Reuters': 4394, 'constant': 4395, 'favourably': 4396, 'israeli_Prime': 4397, 'mar': 4398, '11:55am': 4399, 'F': 4400, 'FM': 4401, 'Guglielmo': 4402, 'J': 4403, 'Local': 4404, 'Marconi': 4405, 'Milestones': 4406, 'NewsRadio': 4407, 'Triple': 4408, 'assassination': 4409, 'audio': 4410, 'birth': 4411, 'coverage': 4412, 'landing': 4413, 'listener': 4414, 'moon': 4415, 'simulcast': 4416, 'story': 4417, 'Beck': 4418, 'proxy': 4419, 'recognition': 4420, 'shareholder': 4421, 'timing': 4422, 'Anti': 4423, 'Somalia': 4424, 'civilisation': 4425, 'extension': 4426, 'incredible': 4427, 'indication': 4428, 'pleasure': 4429, 'realise': 4430, 'rebel': 4431, 'secretly': 4432, 'somali': 4433, 'soul': 4434, 'taliban': 4435, 'Bayt': 4436, 'Force-17': 4437, 'Hanum': 4438, 'command': 4439, 'execution': 4440, 'eyewitness': 4441, 'legitimate': 4442, 'township': 4443, 'abdominal': 4444, 'brief': 4445, 'contaminated': 4446, 'fever': 4447, 'hygiene': 4448, 'quarantine': 4449, 'remote': 4450, 'symptom': 4451, 'typhoid': 4452, 'Adams': 4453, 'Asia': 4454, 'Boeing': 4455, 'Cairns': 4456, 'Daniel': 4457, 'Denis': 4458, 'Fukuoka': 4459, 'Gschwind': 4460, 'Hong': 4461, 'Kong': 4462, 'Nagoya': 4463, 'Osaka': 4464, 'Singapore': 4465, 'Taipai': 4466, 'Taipei': 4467, 'Tourism': 4468, 'Whitsundays': 4469, 'asian': 4470, 'carrier': 4471, 'class': 4472, 'energise': 4473, 'flexibility': 4474, 'generate': 4475, 'independently': 4476, 'leisure': 4477, 'mainland': 4478, 'outsource': 4479, 'perspective': 4480, 'profitable': 4481, 'rostering': 4482, 'saving': 4483, 'unveil': 4484, 'wholly': 4485, 'workplace': 4486, 'Rene': 4487, 'ablution': 4488, 'aid': 4489, 'catering': 4490, 'channel': 4491, 'inspect': 4492, 'meal': 4493, 'practical': 4494, 'sanitation': 4495, 'soccer': 4496, 'tv': 4497, 'understanding': 4498, 'volleyball': 4499, 'Bronwyn': 4500, 'Doctor': 4501, 'Flying': 4502, 'Gambier': 4503, 'Inspector': 4504, 'Killmier': 4505, 'Millicent': 4506, 'donor': 4507, 'liver': 4508, 'miraculous': 4509, 'nurse': 4510, 'pine': 4511, 'runway': 4512, 'scrub': 4513, 'stun': 4514, 'unscathed': 4515, 'vegetation': 4516, 'perturb': 4517, 'rift': 4518, 'rivalry': 4519, 'whilst': 4520, 'Afghan': 4521, 'Ali': 4522, 'Amin': 4523, 'Anzare': 4524, 'Hazrat': 4525, 'Melawa': 4526, 'Palanai': 4527, 'Sar': 4528, 'Soviets': 4529, 'atrocity': 4530, 'interpreter': 4531, 'militiaman': 4532, 'mujahideen': 4533, 'offensive': 4534, 'portion': 4535, 'reporter': 4536, 'riddle': 4537, 'rout': 4538, 'situate': 4539, 'Arye': 4540, 'Berlin': 4541, 'Chancellor': 4542, 'EU': 4543, 'Fischer': 4544, 'Gerhard': 4545, 'Joschka': 4546, 'Mekel': 4547, 'Nabil': 4548, 'Palestine': 4549, 'Peres': 4550, 'Rudeina': 4551, 'Schroeder': 4552, 'Shaath': 4553, 'Shimon': 4554, 'Sidr': 4555, 'advisor': 4556, 'alive': 4557, 'botch': 4558, 'escalation': 4559, 'german': 4560, 'intifada': 4561, 'quest': 4562, 'truce': 4563, 'Cole': 4564, 'Counsel': 4565, 'Lionel': 4566, 'Roberts': 4567, 'Sutton': 4568, 'Terrence': 4569, 'corruption': 4570, 'dissatisfaction': 4571, 'fraud': 4572, 'rally': 4573, 'royal_commission': 4574, 'sided': 4575, 'unsatisfactory': 4576, 'zero': 4577, 'AIDS': 4578, 'Annan': 4579, 'Ghana': 4580, 'Kofi': 4581, 'Nationals': 4582, 'Nobel': 4583, 'Oslo': 4584, 'Prize': 4585, 'audience': 4586, 'fundamental': 4587, 'gala': 4588, 'human_right': 4589, 'downturn': 4590, 'Beech': 4591, 'Dismal': 4592, 'RFDS': 4593, 'Swamp': 4594, 'rfds': 4595, 'seal': 4596, 'Andrew': 4597, 'Cancer': 4598, 'Penman': 4599, 'airway': 4600, 'alarming': 4601, 'breast': 4602, 'lung': 4603, 'lung_cancer': 4604, 'modern': 4605, 'peripheral': 4606, 'Brett': 4607, 'Conservationists': 4608, 'Dempsey': 4609, 'Herberton': 4610, 'Heritage': 4611, 'Lyndon': 4612, 'Management': 4613, 'Protection': 4614, 'Range': 4615, 'Ravenshoe': 4616, 'Schneiders': 4617, 'Tropics': 4618, 'Wet': 4619, 'Wilderness': 4620, 'applaud': 4621, 'draw': 4622, 'log': 4623, 'plead': 4624, 'prosecute': 4625, 'rainforest': 4626, 'Frost': 4627, 'Kieren': 4628, 'Perkins': 4629, 'Thorpe': 4630, 'championship': 4631, 'consecutive': 4632, 'contender': 4633, 'emulate': 4634, 'feat': 4635, 'gold': 4636, 'layoff': 4637, 'limited': 4638, 'medal': 4639, 'swimmer': 4640, 'Cheney': 4641, 'Mullah': 4642, 'Omar': 4643, 'Vice': 4644, 'b-52': 4645, 'concerted': 4646, 'confront': 4647, 'disagree': 4648, 'division': 4649, 'fa-18': 4650, 'sortie': 4651, 'Aboriginal': 4652, 'Boyd': 4653, 'Crime': 4654, 'Islander': 4655, 'Torres': 4656, 'alcohol': 4657, 'author': 4658, 'avenue': 4659, 'Anthony': 4660, 'Fodera': 4661, 'Sturesteps': 4662, 'liquidator': 4663, 'payout': 4664, 'provisional_liquidation': 4665, 'severance': 4666, 'statutory': 4667, 'termination': 4668, 'worth': 4669, 'arrogant': 4670, 'belligerent': 4671, 'bow': 4672, 'breakdown': 4673, 'greedy': 4674, 'improved': 4675, 'inch': 4676, 'lowest': 4677, 'reliability': 4678, 'skilled': 4679, 'unacceptable': 4680, 'Cathy': 4681, 'Championships': 4682, 'Classic': 4683, 'Freeman': 4684, 'Games': 4685, 'Olympics': 4686, 'Track': 4687, 'olympic': 4688, 'Ramadan': 4689, 'assassinate': 4690, 'ceasefire': 4691, 'conditional': 4692, 'fighting': 4693, 'frustration': 4694, 'ultimatum': 4695, 'upsurge': 4696, 'Cocos': 4697, 'Grant': 4698, 'Lankan': 4699, 'Lankans': 4700, 'Sri': 4701, 'ashore': 4702, 'getting': 4703, 'moor': 4704, 'p-3': 4705, 'undetected': 4706, 'whereabout': 4707, 'Robbards': 4708, 'afraid': 4709, 'counsel': 4710, 'physically': 4711, 'AIP': 4712, 'Chuck': 4713, 'Ghar': 4714, 'Hagel': 4715, 'Islamist': 4716, 'Khil': 4717, 'Mosh': 4718, 'Paktika': 4719, 'Press': 4720, 'Relations': 4721, 'Sharana': 4722, 'Shura': 4723, 'Spin': 4724, 'Zabul': 4725, 'arab': 4726, 'defiant': 4727, 'dig': 4728, 'dinner': 4729, 'ignominious': 4730, 'informed': 4731, 'protector': 4732, 'province': 4733, 'regular': 4734, 'unjustly': 4735, 'Beit': 4736, 'armored': 4737, 'pile': 4738, 'yield': 4739, 'Bernadette': 4740, 'McMenamin': 4741, 'Wise': 4742, 'advent': 4743, 'gap': 4744, 'image': 4745, 'invisible': 4746, 'largely': 4747, 'paedophilia': 4748, 'pornography': 4749, 'promote': 4750, 'prostitute': 4751, 'vulnerable': 4752, 'AWU': 4753, 'Kirribilli': 4754, 'Sheldon': 4755, 'Shorten': 4756, 'South_Wales': 4757, 'TWU': 4758, 'bleak': 4759, 'carol': 4760, 'noise': 4761, 'picket': 4762, 'satisfactory': 4763, 'substance': 4764, 'Aamer': 4765, 'Equal': 4766, 'Journal': 4767, 'Medical': 4768, "O'Sullivan": 4769, 'Opportunity': 4770, 'Rights': 4771, 'Sultan': 4772, 'Villawood': 4773, 'article': 4774, 'can[not': 4775, 'chronic': 4776, 'commendation': 4777, 'depressive': 4778, 'display': 4779, 'distress': 4780, 'fulfill': 4781, 'journal': 4782, 'letter': 4783, 'migration': 4784, 'pen': 4785, 'psychological': 4786, 'casino': 4787, 'expense': 4788, 'gambling': 4789, 'gaming': 4790, 'poker': 4791, 'taxis': 4792, 'Terence': 4793, 'coercion': 4794, 'competitive': 4795, 'stress': 4796, 'summons': 4797, 'witch': 4798, 'Huegill': 4799, 'Klim': 4800, 'better': 4801, 'course': 4802, 'huegill': 4803, 'metre_butterfly': 4804, 'swim': 4805, 'Daka': 4806, 'Democratic': 4807, 'Iffam': 4808, 'incursion': 4809, 'undercover': 4810, 'Chris': 4811, 'Evans': 4812, 'blowout': 4813, 'reprioritise': 4814, 'scrap': 4815, 'spending': 4816, 'undertaking': 4817, 'Afroz': 4818, 'Rialto': 4819, 'Towers': 4820, 'confess': 4821, 'skepticism': 4822, 'untrue': 4823, '3:00pm': 4824, '4:45pm': 4825, '7:00am': 4826, 'CNN': 4827, 'County': 4828, 'Decorative': 4829, 'Donna': 4830, 'Elkhart': 4831, 'Fort': 4832, 'Goshen': 4833, 'Indiana': 4834, 'Millwork': 4835, 'Nu': 4836, 'Rohrer': 4837, 'Sheriff': 4838, 'WNDU': 4839, 'Wayne': 4840, 'Wood': 4841, 'evaluate': 4842, 'lunchtime': 4843, 'shooter': 4844, 'Almao': 4845, 'Amazon': 4846, 'Brazilians': 4847, 'Clark': 4848, 'Denise': 4849, 'Edmund': 4850, 'Hillary': 4851, 'Joyce': 4852, 'Macapa': 4853, 'Seamaster': 4854, 'ambassador': 4855, 'devastate': 4856, 'feels': 4857, 'grudge': 4858, 'hero': 4859, 'impression': 4860, 'okay': 4861, 'pirate': 4862, 'spirited': 4863, 'utmost': 4864, 'Gerber': 4865, 'Markus': 4866, 'Saxet': 4867, 'Saxetbach': 4868, 'deceive': 4869, 'defunct': 4870, 'excursion': 4871, 'flash': 4872, 'flood': 4873, 'gorge': 4874, 'optimism': 4875, 'proof': 4876, 'snap': 4877, 'swamp': 4878, '<': 4879, '>': 4880, 'Archive': 4881, 'Beijing': 4882, 'Diplomacy</i': 4883, 'Gerald': 4884, 'Henry': 4885, 'Jakarta': 4886, 'Kissinger': 4887, 'Portugal': 4888, 'Timorese': 4889, 'Vietnam': 4890, 'autonomy': 4891, 'blizzard': 4892, 'book': 4893, 'cable': 4894, 'classify': 4895, 'colony': 4896, 'conclusively': 4897, 'consign': 4898, 'consistently': 4899, 'deem': 4900, 'drastic': 4901, 'geo': 4902, 'green': 4903, 'happens': 4904, 'oppression': 4905, 'portuguese': 4906, 'regrettable': 4907, 'reply': 4908, 'salvo': 4909, 'stability': 4910, 'Cahill': 4911, 'Darren': 4912, 'Davis': 4913, 'Jason': 4914, 'Stoltenberg': 4915, 'Wimbledon': 4916, 'breakup': 4917, 'dedication': 4918, 'enthusiastic': 4919, 'evolve': 4920, 'finalist': 4921, 'grateful': 4922, 'invaluable': 4923, 'rewarding': 4924, 'skip': 4925, 'talented': 4926, 'Pashtun': 4927, 'Salam': 4928, 'Zaeef': 4929, 'ammunition': 4930, 'bastion': 4931, 'tribe': 4932, 'Advani': 4933, 'Commons': 4934, 'Hindustan': 4935, 'Krishna': 4936, 'Lal': 4937, 'Mumbai': 4938, 'Qaeda': 4939, 'Razzak': 4940, 'Thailand': 4941, 'Times': 4942, 'Trust': 4943, 'card': 4944, 'faces': 4945, 'mention': 4946, 'newspaper': 4947, 'similar': 4948, 'Ethnic': 4949, 'Ghulam': 4950, 'Hassan': 4951, 'Hazara': 4952, 'Solution': 4953, 'bully': 4954, '3:14pm': 4955, '7:14am': 4956, 'Chicago': 4957, 'disgruntled': 4958, 'dispatcher': 4959, 'park': 4960, 'preliminary': 4961, 'Authorities': 4962, 'Sheikh': 4963, 'Yassin': 4964, 'arrange': 4965, 'backdrop': 4966, 'bar': 4967, 'baton': 4968, 'frantic': 4969, 'spiritual': 4970, 'stone': 4971, 'unstable': 4972, 'youth': 4973, '70': 4974, '80': 4975, '90': 4976, 'Macfarlane': 4977, 'Reserve_Bank': 4978, 'calendar': 4979, 'remarkably': 4980, 'ride': 4981, 'Food': 4982, 'GM': 4983, 'Grocery': 4984, 'Head': 4985, 'Hooke': 4986, 'Mitch': 4987, 'association': 4988, 'brand': 4989, 'crop': 4990, 'disrepute': 4991, 'grocery': 4992, 'ingredient': 4993, 'label': 4994, 'modify': 4995, 'product': 4996, 'shelf': 4997, 'sidestep': 4998, 'tangible': 4999, 'Christians': 5000, 'Java': 5001, 'Laskar': 5002, 'Maluka': 5003, 'Militia': 5004, 'Muslims': 5005, 'Sulawesi': 5006, 'Tena': 5007, 'Amazonia': 5008, 'Basin': 5009, 'Calypso': 5010, 'Cousteau': 5011, 'Jour': 5012, 'Jules': 5013, 'OBE': 5014, 'Sportsman': 5015, 'Steinlager': 5016, 'Tour': 5017, 'Trophy': 5018, 'Verne': 5019, 'Whitbread': 5020, 'Whitbreads': 5021, 'Yachtsman': 5022, 'Zealander': 5023, 'accolade': 5024, 'backer': 5025, 'compete': 5026, 'determination': 5027, 'expedition': 5028, 'gifted': 5029, 'inspire': 5030, 'loyalty': 5031, 'meticulous': 5032, 'planner': 5033, 'sailing': 5034, 'sporting': 5035, 'unlimited': 5036, 'voyage': 5037, 'yachting': 5038, 'citizen': 5039, 'specific': 5040, 'California': 5041, 'Campbell': 5042, 'Class': 5043, 'Cody': 5044, 'Forces': 5045, 'Jefferson': 5046, 'Kentucky': 5047, 'Massachusetts': 5048, 'Master': 5049, 'Petithory': 5050, 'Prosser': 5051, 'Special': 5052, 'Staff': 5053, 'Tennessee': 5054, 'misguided': 5055, 'Jabril': 5056, 'Rajoub': 5057, 'amid': 5058, 'behave': 5059, 'bulk': 5060, 'conclude': 5061, 'conclusion': 5062, 'dictate': 5063, 'revere': 5064, 'deter': 5065, 'illegally': 5066, 'ineffective': 5067, 'Brendan': 5068, 'Education': 5069, 'Nelson': 5070, 'Treasurer': 5071, 'advertising': 5072, 'committee': 5073, 'expenditure': 5074, 'pejorative': 5075, 'razor': 5076, 'responsibly': 5077, 'scrutiny': 5078, 'slash': 5079, 'ultimately': 5080, 'waste': 5081, 'analysis': 5082, 'catastrophe': 5083, 'emotional': 5084, 'mistake': 5085, 'natural': 5086, 'sorrow': 5087, 'supervisor': 5088, 'unpredictable': 5089, 'wrongdoing': 5090, 'Administration': 5091, 'Aeronautics': 5092, 'ISS': 5093, 'Raffaello': 5094, 'cargo': 5095, 'docking': 5096, 'en': 5097, 'italian': 5098, 'module': 5099, 'rectify': 5100, 'spacewalk': 5101, 'Creedy': 5102, 'Ed': 5103, 'Elka': 5104, 'Ham': 5105, 'Hass': 5106, 'Jones': 5107, 'Julia': 5108, 'Liesel': 5109, 'Matt': 5110, 'Moses': 5111, 'Rebecca': 5112, 'Ryan': 5113, 'Welsh': 5114, 'awesome': 5115, 'backstroke': 5116, 'breaststroke': 5117, 'champiom': 5118, 'eliminate': 5119, 'elimination': 5120, 'format': 5121, 'freestyle': 5122, 'heater': 5123, 'nail': 5124, 'pool': 5125, 'prizemoney': 5126, 'second': 5127, 'skin': 5128, 'spa': 5129, 'swam': 5130, 'swimming': 5131, 'swum': 5132, 'thermostat': 5133, 'upset': 5134, 'women': 5135, 'Brahimi': 5136, 'Gailani': 5137, 'Hamed': 5138, 'Lakhdar': 5139, 'Peshawar': 5140, 'Qanooni': 5141, 'Sayed': 5142, 'Victori': 5143, 'Yunus': 5144, 'applause': 5145, 'appointment': 5146, 'cement': 5147, 'courage': 5148, 'cry': 5149, 'delegate': 5150, 'delicate': 5151, 'diplomacy': 5152, 'dislodge': 5153, 'evident': 5154, 'interior': 5155, 'king': 5156, 'mix': 5157, 'moderate': 5158, 'negotiator': 5159, 'pave': 5160, 'phrase': 5161, 'proud': 5162, 'signatory': 5163, 'smile': 5164, 'strain': 5165, 'stray': 5166, 'stronghold': 5167, 'sum': 5168, 'sun': 5169, 'transformation': 5170, 'update': 5171, 'whirlwind': 5172, 'LK': 5173, 'confession': 5174, 'hoax': 5175, 'suspicion': 5176, 'verify': 5177, 'conversation': 5178, 'leader_Yasser': 5179, 'rein': 5180, 'rounded': 5181, 'sabotage': 5182, 'undermine': 5183, 'ACA': 5184, 'ACCC': 5185, 'Competition': 5186, 'Consumer': 5187, 'Consumers': 5188, 'Katherine': 5189, 'Wolthuizen': 5190, 'amend': 5191, 'proper': 5192, 'regulate': 5193, 'Austar': 5194, 'Traveland': 5195, 'centrelink': 5196, 'pleasant': 5197, 'shopfront': 5198, 'uncertain': 5199, 'Community': 5200, 'DFAT': 5201, 'Trade': 5202, 'competitiveness': 5203, 'misunderstanding': 5204, 'Counting': 5205, 'Honiara': 5206, 'Solomons': 5207, 'bitter': 5208, 'box': 5209, 'counting': 5210, 'cycle': 5211, 'disintegration': 5212, 'keen': 5213, 'slowly': 5214, 'tampering': 5215, 'prize': 5216, 'Hilton': 5217, 'Hotel': 5218, 'apparent': 5219, 'explosion': 5220, 'installation': 5221, 'Marines': 5222, 'Shah': 5223, 'Zahir': 5224, 'engagement': 5225, 'exiled': 5226, 'govern': 5227, 'royalist': 5228, '1980s': 5229, 'combine': 5230, 'contraction': 5231, 'lender': 5232, 'stall': 5233, 'trading': 5234, 'MacFarlane': 5235, 'McMullan': 5236, 'pickup': 5237, 'seriousness': 5238, 'Templeton': 5239, 'albeit': 5240, 'goalkicker': 5241, 'swan': 5242, 'undue': 5243, 'APRA': 5244, 'Allianz': 5245, 'Lombe': 5246, 'Prudential': 5247, 'Regulation': 5248, 'abruptly': 5249, 'accounting': 5250, 'acquisition': 5251, 'adjourn': 5252, 'apra': 5253, 'conceal': 5254, 'corporate': 5255, 'goodwill': 5256, 'inadequate': 5257, 'intangible': 5258, 'solvency': 5259, 'Bellamack': 5260, 'Complex': 5261, 'Larrakia': 5262, 'Palmerston': 5263, 'Rosebury': 5264, 'Sporting': 5265, 'custodian': 5266, 'developer': 5267, 'lease': 5268, 'lodge': 5269, 'urban': 5270, 'Burhanuddin': 5271, 'Masood': 5272, 'Petersberg': 5273, 'Rabbani': 5274, 'SAS': 5275, 'Wali': 5276, 'composition': 5277, 'instal': 5278, 'intensive': 5279, 'legendary': 5280, 'nominee': 5281, 'observe': 5282, 'oust': 5283, 'signing': 5284, 'tonight': 5285, 'Blair': 5286, 'Brigadier': 5287, 'Erakat': 5288, 'Hubert': 5289, 'Kitrey': 5290, 'Saeb': 5291, 'Salfit': 5292, 'Tulkarem': 5293, 'Vedrine': 5294, 'abroad': 5295, 'accusation': 5296, 'airstrike': 5297, 'amount': 5298, 'argument': 5299, 'brace': 5300, 'denounce': 5301, 'devastating': 5302, 'dispatch': 5303, 'downfall': 5304, 'establishment': 5305, 'fallout': 5306, 'frequent': 5307, 'furious': 5308, 'harassment': 5309, 'hardline': 5310, 'purely': 5311, 'schoolboy': 5312, 'spiral': 5313, 'stabilise': 5314, 'symbol': 5315, 'torpedo': 5316, 'unhurt': 5317, 'weakness': 5318, 'winger': 5319, 'Austrade': 5320, 'Bayliss': 5321, 'Glenn': 5322, 'Maguire': 5323, 'SG': 5324, 'cocoon': 5325, 'consensus': 5326, 'definitive': 5327, 'economist': 5328, 'equity': 5329, 'footing': 5330, 'globe': 5331, 'housing': 5332, 'manufacturing': 5333, 'quarterly': 5334, 'retail': 5335, 'sharply': 5336, 'tick': 5337, 'Ferguson': 5338, 'Fitzgibbon': 5339, 'Joel': 5340, 'archaic': 5341, 'constructive': 5342, 'formula': 5343, 'frontbencher': 5344, 'internally': 5345, 'requirement': 5346, 'scrapping': 5347, 'Badtrans': 5348, 'Goner': 5349, 'McAfee': 5350, 'McAfee.com': 5351, 'Microsoft': 5352, 'Outlook': 5353, 'attachment': 5354, 'badtran': 5355, 'computer': 5356, 'copy': 5357, 'delete': 5358, 'erase': 5359, 'goner': 5360, 'rating': 5361, 'recommend': 5362, 'replicate': 5363, 'software': 5364, 'user': 5365, 'virus': 5366, 'worm': 5367, 'Bernard': 5368, 'Struree': 5369, 'Strureewas': 5370, 'colour': 5371, 'drown': 5372, 'fateful': 5373, 'insight': 5374, 'instruct': 5375, 'notice': 5376, 'sternly': 5377, 'Aquilina': 5378, 'Fair': 5379, 'Fire_Service': 5380, 'Goodin': 5381, 'Hornsby': 5382, 'Laura': 5383, 'SES': 5384, 'Trading': 5385, 'contractor': 5386, 'falsely': 5387, 'householder': 5388, 'sub': 5389, 'tradespeople': 5390, 'unlicensed': 5391, 'Economic': 5392, 'Educational': 5393, 'Masters': 5394, 'OECD': 5395, 'Research': 5396, 'attitude': 5397, 'literacy': 5398, 'mathematical': 5399, 'narrative': 5400, 'negative': 5401, 'storybook': 5402, 'text': 5403, 'Allahabad': 5404, 'Beatle': 5405, 'Beatles': 5406, 'Benares': 5407, 'Dhani': 5408, 'Ganges': 5409, 'Hare': 5410, 'Harrison': 5411, 'Hindu': 5412, 'Hinduism': 5413, 'Hindus': 5414, 'Olivia': 5415, 'Tomorrow': 5416, 'Vrindavan': 5417, 'ashe': 5418, 'bedside': 5419, 'birthplace': 5420, 'closely': 5421, 'curious': 5422, 'devotee': 5423, 'devout': 5424, 'discreet': 5425, 'eternal': 5426, 'ghat': 5427, 'guitarist': 5428, 'immerse': 5429, 'immersion': 5430, 'km': 5431, 'odd': 5432, 'onlooker': 5433, 'platform': 5434, 'prayer': 5435, 'rite': 5436, 'ritual': 5437, 'salvation': 5438, 'sect': 5439, 'symbolise': 5440, 'tightlipped': 5441, 'upstream': 5442, 'vigil': 5443, 'widow': 5444, 'Parore': 5445, 'Robinson': 5446, 'Tasman': 5447, 'Trans': 5448, 'controversial': 5449, 'retain': 5450, 'umpire': 5451, 'zimbabwean': 5452, 'accomplishment': 5453, 'airforce': 5454, 'churn': 5455, 'foist': 5456, 'heliport': 5457, 'liaison': 5458, 'penetrate': 5459, 'reprisal': 5460, 'ruthless': 5461, 'spate': 5462, 'tanzim': 5463, '1:00pm': 5464, 'Albarran': 5465, 'Compensation': 5466, 'Foley': 5467, 'Luke': 5468, 'Travel': 5469, 'annualise': 5470, 'anticipate': 5471, 'approximately': 5472, 'buyer': 5473, 'cease': 5474, 'entitle': 5475, 'exist': 5476, 'franchise': 5477, 'function': 5478, 'liable': 5479, 'lion': 5480, 'redundant': 5481, 'streamline': 5482, 'tad': 5483, 'compose': 5484, 'convene': 5485, 'jirga': 5486, 'loya': 5487, 'permanent': 5488, 'Regulatory': 5489, 'commissioner': 5490, 'extreme': 5491, 'liability': 5492, 'royal': 5493, 'solvently': 5494, 'Gorge': 5495, 'Saxeten': 5496, 'organise': 5497, 'preventable': 5498, 'unforeseeable': 5499, 'Bankstown': 5500, 'Carr': 5501, 'Gunnedah': 5502, 'Hurstville': 5503, 'Liverpool': 5504, 'Pitwater': 5505, 'Tamworth': 5506, 'Wahroonga': 5507, 'bend': 5508, 'came': 5509, 'ferocity': 5510, 'freight': 5511, 'mop': 5512, 'sound': 5513, 'thehuge': 5514, 'uneven': 5515, 'unpredictability': 5516, 'Caird': 5517, 'Wendy': 5518, 'attractive': 5519, 'bankruptcy': 5520, 'cheap': 5521, 'departmental': 5522, 'telecommunication': 5523, 'Hendriks': 5524, 'Jackie': 5525, 'abusive': 5526, 'crude': 5527, 'gesture': 5528, 'language': 5529, 'outburst': 5530, 'referee': 5531, 'Aarage': 5532, 'Amnesty': 5533, 'choice': 5534, 'contravene': 5535, 'excuse': 5536, 'gather': 5537, 'horrifying': 5538, 'humanitarian': 5539, 'interpret': 5540, 'loudspeaker': 5541, 'paramount': 5542, 'practically': 5543, 'squarely': 5544, 'targeting': 5545, 'tragic': 5546, 'violation': 5547, 'Arabia': 5548, 'Chaman': 5549, 'Commander': 5550, 'Hands': 5551, 'Lali': 5552, 'Libya': 5553, 'Nangarhar': 5554, 'Torkotal': 5555, 'abdomen': 5556, 'column': 5557, 'cranking': 5558, 'frequency': 5559, 'hamstring': 5560, 'lieutenant': 5561, 'noon': 5562, 'strap': 5563, 'Megawati': 5564, 'Sukarnoputri': 5565, 'bilateral': 5566, 'congratulatory': 5567, 'smuggle': 5568, 'Anderson': 5569, 'Bracks': 5570, 'Lew': 5571, 'aviation': 5572, 'businessman': 5573, 'competitor': 5574, 'Campbeltown': 5575, 'Debus': 5576, 'Forest': 5577, 'Frenches': 5578, 'Ives': 5579, 'Kurringai': 5580, 'Leete': 5581, 'Quirindi': 5582, 'Turramurra': 5583, 'Warringah': 5584, 'arrangement': 5585, 'complicated': 5586, 'earnest': 5587, 'electricity': 5588, 'extensively': 5589, 'gale': 5590, 'loan': 5591, 'reconnect': 5592, 'refund': 5593, 'reserve': 5594, 'welfare': 5595, 'Owen': 5596, 'adversely': 5597, 'ample': 5598, 'confidentiality': 5599, 'fairness': 5600, 'prudential': 5601, 'relevant': 5602, 'unequivocally': 5603, '9:30pm': 5604, 'Dovedales': 5605, 'Elsie': 5606, 'Fab': 5607, 'Gerry': 5608, 'Juniors': 5609, 'Leanne': 5610, 'McCormack': 5611, 'Silent': 5612, 'Sweet': 5613, 'brave': 5614, 'candlelight': 5615, 'cavalcade': 5616, 'cold': 5617, 'craze': 5618, 'famous': 5619, 'guitar': 5620, 'hometown': 5621, 'lucky': 5622, 'million': 5623, 'music': 5624, 'musician': 5625, 'recording': 5626, 'silence': 5627, 'throng': 5628, 'university': 5629, 'Buchanan': 5630, 'Kiwi': 5631, 'Lunchtime': 5632, 'disciplinary': 5633, 'dressing': 5634, 'misconduct': 5635, 'reassess': 5636, 'Haniya': 5637, 'Ismail': 5638, 'Shanab': 5639, 'net': 5640, 'warrant': 5641, '6b': 5642, 'Commissions': 5643, 'adhere': 5644, 'auditor': 5645, 'circulate': 5646, 'confidential': 5647, 'content': 5648, 'disclose': 5649, 'strict': 5650, 'Eslake': 5651, 'Index': 5652, 'Internet': 5653, 'Job': 5654, 'Olivier': 5655, 'Recruitment': 5656, 'Saul': 5657, 'advertise': 5658, 'advertisement': 5659, 'cyberspace': 5660, 'daily': 5661, 'indicator': 5662, 'justification': 5663, 'Bourne': 5664, 'Democrats': 5665, 'Greens': 5666, 'Kerry': 5667, 'Nettle': 5668, 'Senator': 5669, 'Vicki': 5670, 'canyone': 5671, 'defunctoperator': 5672, 'inexperienced': 5673, 'provision': 5674, 'Jeff': 5675, 'McDonald': 5676, 'accelerate': 5677, 'chain': 5678, 'financially': 5679, 'subsidiary': 5680, 'Arthurs': 5681, 'Australain': 5682, 'Escude': 5683, 'Fitzgerald': 5684, 'Forget': 5685, 'Guy': 5686, 'Nice': 5687, 'Nicholas': 5688, 'Pat': 5689, 'Rafter': 5690, 'Todd': 5691, 'Woodbridge': 5692, 'bitterly': 5693, 'character': 5694, 'dream': 5695, 'exciting': 5696, 'harsh': 5697, 'indefinite': 5698, 'offical': 5699, 'partisan': 5700, 'recur': 5701, 'rubber': 5702, 'tired': 5703, 'triple': 5704, 'unbelievably': 5705, 'CBS': 5706, 'Defense': 5707, 'W.': 5708, 'saudi': 5709, 'status': 5710, 'BIS': 5711, 'BankSA': 5712, 'Shrapnel': 5713, 'Trends': 5714, 'bulletin': 5715, 'capable': 5716, 'forecasting': 5717, 'immunity': 5718, 'necessarily': 5719, 'normally': 5720, 'pattern': 5721, 'stable': 5722, 'Keelty': 5723, 'Mick': 5724, 'Naeil': 5725, 'coordinated': 5726, 'malaysian': 5727, 'transnational': 5728, 'Assisting': 5729, 'Dickie': 5730, 'Investments': 5731, 'procedural': 5732, 'reference': 5733, 'subpoena': 5734, 'Corowa': 5735, 'Walter': 5736, 'constitutional': 5737, 'historian': 5738, 'plebiscite': 5739, 'referendum': 5740, 'renew': 5741, 'cow': 5742, 'dairy': 5743, 'incinerate': 5744, 'mad': 5745, 'meat': 5746, 'ministerial': 5747, 'organ': 5748, 'Gould': 5749, 'Power': 5750, 'Unions': 5751, 'compulsory': 5752, 'frustrating': 5753, 'producer': 5754, 'adjournment': 5755, 'deadlocke': 5756, 'objection': 5757, 'sizeable': 5758, '1960': 5759, 'McCartney': 5760, 'Sun': 5761, 'Taxman': 5762, 'UK': 5763, 'beatle': 5764, 'beset': 5765, 'famously': 5766, 'intruder': 5767, 'limelight': 5768, 'mysticism': 5769, 'phenomenal': 5770, 'plagiarism': 5771, 'racing': 5772, 'reformation': 5773, 'resentment': 5774, 'sneak': 5775, 'songwriter': 5776, 'stabbing': 5777, 'successfully': 5778, 'sue': 5779, 'throat': 5780, 'underage': 5781, 'vinyl': 5782, 'Rob': 5783, 'Sherrard': 5784, 'basically': 5785, 'marquee': 5786, 'tent': 5787, 'terminate': 5788, 'unwind': 5789, 'Advance': 5790, 'Cell': 5791, 'Congressmen': 5792, 'George_W': 5793, 'Parkinson': 5794, 'Technologies': 5795, 'brain': 5796, 'clone': 5797, 'cloning': 5798, 'excess': 5799, 'indistinguishable': 5800, 'inject': 5801, 'mouse': 5802, 'precursor': 5803, 'recrimination': 5804, 'tissue': 5805, 'variety': 5806, 'Eastern': 5807, 'Soviet': 5808, 'Ukraine': 5809, 'annually': 5810, 'epidemic': 5811, 'insecurity': 5812, 'steep': 5813, 'Liberal': 5814, 'Liberals': 5815, 'Stone': 5816, 'amalgamation': 5817, 'categorically': 5818, 'merge': 5819, 'merger': 5820, 'muscle': 5821, 'BedeHarris': 5822, 'Centenary': 5823, 'academic': 5824, 'codification': 5825, 'consult': 5826, 'federation': 5827, 'lecturer': 5828, 'monarchy': 5829, 'opporunity': 5830, 'referenda': 5831}
5832

I will filter out low-frequency and high-frequency tokens, also limit the vocabulary to a max of 1000 words:

dictionary.filter_extremes(no_below=5, no_above=0.5, keep_n=1000)
  • No_below: Tokens that appear in less than 5 documents are filtered out.
  • No_above: Tokens that appear in more than 50% of the total corpus are also removed as default.
  • Keep_n: We limit ourselves to the top 1000 most frequent tokens (default is 100.000). Set to ‘None’ if you want to keep all.

We are now ready to construct the corpus using the dictionary from above and the doc2bow function. The function doc2bow() simply counts the number of occurrences of each distinct word, converts the word to its integer word id and returns the result as a sparse vector:

corpus = [dictionary.doc2bow(text) for text in texts]
print(corpus[1])
[(45, 1), (53, 1), (60, 1), (62, 1), (63, 1), (75, 1), (76, 4), (77, 1), (78, 3), (79, 1), (80, 3), (81, 1), (82, 2), (83, 2), (84, 1), (85, 1), (86, 1), (87, 1), (88, 1), (89, 1), (90, 1), (91, 1), (92, 1), (93, 1), (94, 1), (95, 2), (96, 1), (97, 1), (98, 1), (99, 1), (100, 1), (101, 1), (102, 3), (103, 3), (104, 1), (105, 1), (106, 2), (107, 1), (108, 2), (109, 1), (110, 1), (111, 1), (112, 1), (113, 1), (114, 1), (115, 2), (116, 1), (117, 1), (118, 1), (119, 1)]

15.3.6 Model building

The next step is to train the unsupervised machine learning model on the data. I choose to work with the LdaMulticore, which uses all CPU cores to parallelize and speed up model training. If this doesn’t work for you for some reason, try the gensim.models.ldamodel.LdaModel class which is an equivalent, but more straightforward and single-core implementation.

When inserting our corpus into the topic modelling algorithm, the corpus gets analyzed in order to find the distribution of words in each topic and the distribution of topics in each document. #### LSI - Latent Semantic Indexing

LSI stands for Latent Semantic Indexing - It is a popular information retreival method which works by decomposing the original matrix of words to maintain key topics.

lsi_model = LsiModel(corpus=corpus, num_topics=10, id2word=dictionary)
lsi_model.show_topics(num_topics=5)
C:\Users\xinlu\anaconda3\lib\site-packages\gensim\models\lsimodel.py:963: DeprecationWarning:

Please use `csc_matvecs` from the `scipy.sparse` namespace, the `scipy.sparse.sparsetools` namespace is deprecated.
[(0,
  '-0.242*"israeli" + -0.221*"Arafat" + -0.207*"palestinian" + -0.181*"force" + -0.163*"official" + -0.162*"kill" + -0.155*"attack" + -0.143*"people" + -0.122*"day" + -0.120*"Israel"'),
 (1,
  '-0.316*"israeli" + -0.307*"Arafat" + -0.282*"palestinian" + 0.177*"Afghanistan" + 0.164*"Australia" + -0.164*"Sharon" + -0.157*"Israel" + -0.131*"Hamas" + 0.129*"force" + -0.125*"West_Bank"'),
 (2,
  '0.269*"Afghanistan" + -0.242*"fire" + 0.222*"force" + 0.188*"Al_Qaeda" + -0.178*"Sydney" + 0.173*"bin_Laden" + 0.157*"Pakistan" + 0.133*"Taliban" + 0.133*"Tora_Bora" + 0.130*"fighter"'),
 (3,
  '0.377*"fire" + 0.279*"area" + -0.238*"Australia" + 0.190*"Sydney" + 0.177*"firefighter" + 0.155*"north" + 0.147*"wind" + 0.131*"south" + 0.129*"New_South" + 0.129*"Wales"'),
 (4,
  '0.282*"company" + 0.258*"Qantas" + 0.210*"union" + -0.197*"test" + 0.179*"worker" + -0.172*"win" + -0.165*"match" + -0.149*"South_Africa" + -0.136*"play" + -0.124*"day"')]

HDP - Hierarchical Drichlet Process

HDP, the Hierarchical Drichlet Process is an unsupervised topic model which figures out the number of topics on it’s own.

hdp_model = HdpModel(corpus=corpus, id2word=dictionary)
hdp_model.show_topics()
[(0,
  '0.006*criticism + 0.005*Government + 0.005*play + 0.005*australian + 0.004*successful + 0.004*israeli + 0.004*group + 0.004*Palestinian_Authority + 0.004*match + 0.004*decision + 0.004*Federal_Government + 0.004*Afghanistan + 0.004*team + 0.004*September_attack + 0.004*Australia + 0.003*Sharon + 0.003*responsibility + 0.003*israeli_Prime + 0.003*Hamas + 0.003*explosive'),
 (1,
  '0.006*Israel + 0.005*deadly + 0.005*call + 0.005*campaign + 0.005*Williams + 0.005*Parliament + 0.005*track + 0.005*court + 0.005*Sydney + 0.004*wind + 0.004*union + 0.004*population + 0.004*New + 0.004*Defence_Secretary + 0.004*budget + 0.004*Senator_Hill + 0.004*Pentagon + 0.004*mission + 0.004*report + 0.004*result'),
 (2,
  '0.010*decision + 0.008*camp + 0.006*edge + 0.006*see + 0.005*Rural_Fire + 0.005*suffer + 0.005*protect + 0.005*work + 0.005*sign + 0.004*vote + 0.004*October + 0.004*victory + 0.004*life + 0.004*current + 0.004*twice + 0.004*holiday + 0.004*chance + 0.004*match + 0.004*policy + 0.004*violence'),
 (3,
  '0.009*Gaza + 0.007*disaster + 0.006*non + 0.006*HIH + 0.006*Police + 0.005*treat + 0.005*eastern + 0.005*future + 0.005*Arafat + 0.005*Jihad + 0.005*nearly + 0.004*case + 0.004*avoid + 0.004*feel + 0.004*road + 0.004*community + 0.004*israeli + 0.004*Sharon + 0.004*hope + 0.004*include'),
 (4,
  '0.006*city + 0.006*family + 0.006*HIH + 0.005*end + 0.005*want + 0.005*schedule + 0.005*future + 0.005*company + 0.004*helicopter + 0.004*contain + 0.004*weekend + 0.004*suspend + 0.004*maintenance_worker + 0.004*director + 0.004*face + 0.004*negotiate + 0.004*responsibility + 0.004*drive + 0.004*camp + 0.004*official'),
 (5,
  '0.007*lot + 0.006*deadly + 0.006*survive + 0.006*challenge + 0.006*attack_Israel + 0.005*management + 0.005*Gaza + 0.005*firm + 0.005*political + 0.005*violence + 0.005*World_Trade + 0.005*industrial_action + 0.005*Defence + 0.004*accident + 0.004*stop + 0.004*Argentina + 0.004*security_force + 0.004*aim + 0.004*inquiry + 0.004*National'),
 (6,
  '0.006*storm + 0.006*determine + 0.006*damage + 0.006*final + 0.005*insurance + 0.005*clearly + 0.005*Wales + 0.005*land + 0.005*Andy_Bichel + 0.005*relate + 0.005*today + 0.004*establish + 0.004*victory + 0.004*allegation + 0.004*french + 0.004*press + 0.004*December + 0.004*change + 0.004*New_York + 0.004*Government'),
 (7,
  '0.007*defend + 0.006*situation + 0.005*Americans + 0.004*Lee + 0.004*fight_alongside + 0.004*sponsor + 0.004*official + 0.004*serve + 0.004*receive + 0.004*sign + 0.004*challenge + 0.004*Victoria + 0.004*plane + 0.004*peace + 0.004*responsible + 0.004*Thursday + 0.004*accept + 0.004*Glenn_McGrath + 0.004*strike + 0.004*large'),
 (8,
  '0.009*bill + 0.008*week_ago + 0.007*bomb + 0.007*detail + 0.006*confident + 0.006*water + 0.005*express + 0.005*face + 0.005*respond + 0.005*Commission + 0.005*September_attack + 0.004*Anthony_Zinni + 0.004*current + 0.004*south_african + 0.004*press + 0.004*person + 0.004*conflict + 0.004*war + 0.004*federal + 0.004*assist'),
 (9,
  '0.007*airport + 0.006*area + 0.006*term + 0.005*opposition + 0.005*Taliban + 0.004*search + 0.004*south_west + 0.004*match + 0.004*warplane + 0.004*safety + 0.004*world + 0.004*disaster + 0.004*impact + 0.004*HIH + 0.004*palestinian_leader + 0.004*resign + 0.004*mass + 0.004*Kandahar + 0.004*prisoner + 0.004*little'),
 (10,
  '0.007*bombing + 0.006*matter + 0.006*Eve + 0.006*Waugh + 0.005*international + 0.005*court + 0.005*crackdown + 0.005*weather_condition + 0.005*guard + 0.005*islamic + 0.005*television + 0.005*able + 0.005*Peter + 0.004*foreign + 0.004*shortly + 0.004*ready + 0.004*passenger + 0.004*stand + 0.004*director + 0.004*suspend'),
 (11,
  '0.006*war + 0.006*Australian + 0.005*think + 0.005*Kandahar + 0.005*significant + 0.005*sponsor + 0.005*office + 0.005*deliver + 0.005*rule + 0.004*foreign + 0.004*deadly + 0.004*service + 0.004*deploy + 0.004*probably + 0.004*serve + 0.004*organisation + 0.004*reach + 0.004*ceremony + 0.004*suicide_bomber + 0.004*$'),
 (12,
  '0.007*Shane + 0.007*Israelis + 0.007*structure + 0.006*survive + 0.006*safety + 0.006*Defence_Secretary + 0.006*fire_burn + 0.005*David + 0.005*resign + 0.005*France + 0.005*question + 0.005*south + 0.005*death + 0.005*resident + 0.005*seek + 0.004*Commonwealth + 0.004*Brisbane + 0.004*west + 0.004*facility + 0.004*source'),
 (13,
  '0.007*catch + 0.006*well + 0.006*Wales + 0.006*Attorney_General + 0.005*investigation + 0.005*move + 0.005*annual + 0.005*question + 0.005*area + 0.005*Peter + 0.005*shoot_dead + 0.005*training + 0.004*collapse + 0.004*source + 0.004*discuss + 0.004*attempt + 0.004*Royal + 0.004*World + 0.004*surrender + 0.004*blame'),
 (14,
  '0.007*australian_cricket + 0.007*right + 0.006*agree + 0.006*initial + 0.006*major + 0.005*passenger + 0.005*community + 0.005*edge + 0.005*person + 0.004*protect + 0.004*radical + 0.004*structure + 0.004*stage + 0.004*television + 0.004*negotiation + 0.004*life + 0.004*security_force + 0.004*firm + 0.004*al + 0.004*circumstance'),
 (15,
  '0.011*wave + 0.006*move + 0.006*address + 0.005*raise + 0.005*facility + 0.005*evidence + 0.005*financial + 0.005*March + 0.005*election + 0.005*champion + 0.005*continue + 0.004*toll + 0.004*commission + 0.004*leg + 0.004*resign + 0.004*role + 0.004*conduct + 0.004*word + 0.004*system + 0.004*offer'),
 (16,
  '0.006*company + 0.005*captain + 0.005*level + 0.005*understand + 0.005*man + 0.005*rival + 0.004*concern + 0.004*board + 0.004*represent + 0.004*carry + 0.004*Queensland + 0.004*detail + 0.004*feel + 0.004*blame + 0.004*well + 0.004*Britain + 0.004*Brisbane + 0.004*Australian + 0.004*role + 0.004*spend'),
 (17,
  '0.007*prove + 0.006*well + 0.006*Highway + 0.006*Martin + 0.006*nearby + 0.005*Michael + 0.005*task + 0.005*growth + 0.005*package + 0.005*escalate + 0.005*serve + 0.005*grant + 0.004*figure + 0.004*expert + 0.004*Andy_Bichel + 0.004*conduct + 0.004*batsman + 0.004*big + 0.004*state + 0.004*military'),
 (18,
  '0.011*area + 0.008*AFP + 0.007*arrest + 0.006*political + 0.006*release + 0.006*station + 0.006*later + 0.005*television + 0.005*issue + 0.005*Mark + 0.005*flee + 0.005*official + 0.005*prepare + 0.005*target + 0.005*suspend + 0.004*coast + 0.004*night + 0.004*death + 0.004*decade + 0.004*injure'),
 (19,
  '0.008*election + 0.006*part + 0.005*light + 0.005*announce + 0.005*victory + 0.005*Parliament + 0.005*principle + 0.005*confidence + 0.004*woman + 0.004*get + 0.004*Argentina + 0.004*batsman + 0.004*well + 0.004*campaign + 0.004*final + 0.004*sweep + 0.004*HIH + 0.004*mass + 0.004*Mark + 0.004*palestinian_security')]

LDA - Latent Dirchlet Allocation

LDA, or Latent Dirchlet Allocation is arguably the most famous topic modeling algorithm out there. Out here we create a simple topic model with 10 topics.

lda_model = LdaModel(corpus=corpus, num_topics=10, id2word=dictionary)
lda_model.show_topics()
[(0,
  '0.009*"Pakistan" + 0.008*"claim" + 0.008*"year" + 0.006*"Afghanistan" + 0.006*" " + 0.006*"give" + 0.006*"force" + 0.006*"attack" + 0.005*"day" + 0.005*"report"'),
 (1,
  '0.015*"people" + 0.010*"kill" + 0.007*"company" + 0.007*"day" + 0.006*"fire" + 0.006*"near" + 0.006*"staff" + 0.006*"israeli" + 0.005*"include" + 0.005*"come"'),
 (2,
  '0.010*"people" + 0.009*"month" + 0.009*"arrest" + 0.007*"israeli" + 0.007*"year" + 0.007*"day" + 0.007*"area" + 0.006*"Afghanistan" + 0.006*"palestinian" + 0.006*"United_States"'),
 (3,
  '0.008*"Qantas" + 0.007*"government" + 0.006*"Taliban" + 0.006*"Pakistan" + 0.006*"area" + 0.006*"police" + 0.005*"cent" + 0.005*"day" + 0.005*"new" + 0.005*"year"'),
 (4,
  '0.017*"Australia" + 0.010*"people" + 0.009*"man" + 0.008*"force" + 0.007*"year" + 0.007*"australian" + 0.006*"official" + 0.006*"report" + 0.006*"new" + 0.006*"cent"'),
 (5,
  '0.011*"company" + 0.009*"day" + 0.009*"fire" + 0.008*"test" + 0.007*"australian" + 0.007*"play" + 0.007*"think" + 0.007*"Sydney" + 0.006*"team" + 0.006*"start"'),
 (6,
  '0.009*"report" + 0.007*"government" + 0.007*"detainee" + 0.006*"year" + 0.006*"run" + 0.006*"time" + 0.006*"day" + 0.006*"people" + 0.006*"think" + 0.005*"fire"'),
 (7,
  '0.018*"Australia" + 0.011*"wicket" + 0.010*"day" + 0.010*"South_Africa" + 0.010*"israeli" + 0.009*"palestinian" + 0.009*"attack" + 0.009*"test" + 0.008*"take" + 0.008*"Arafat"'),
 (8,
  '0.008*"metre" + 0.006*"year" + 0.006*"hour" + 0.006*"tell" + 0.006*"kill" + 0.005*"group" + 0.005*"company" + 0.005*"attack" + 0.005*"win" + 0.005*"palestinian"'),
 (9,
  '0.012*"force" + 0.009*"Arafat" + 0.008*"israeli" + 0.007*"Afghanistan" + 0.007*"new" + 0.007*"australian" + 0.006*"attack" + 0.006*"believe" + 0.006*"palestinian" + 0.006*"people"')]

15.3.7 pyLDAvis

Python library for interactive topic model visualization. This is a port of the fabulous R package by Carson Sievert and Kenny Shirley.

pyLDAvis is designed to help users interpret the topics in a topic model that has been fit to a corpus of text data. The package extracts information from a fitted LDA topic model to inform an interactive web-based visualization.

The visualization is intended to be used within an IPython notebook but can also be saved to a stand-alone HTML file for easy sharing.

import pyLDAvis.gensim

pyLDAvis.enable_notebook()
pyLDAvis.gensim.prepare(lda_model, corpus, dictionary)
C:\Users\xinlu\anaconda3\lib\site-packages\pyLDAvis\_prepare.py:243: FutureWarning:

In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only.
  1. Topics on the left while their respective keywords are on the right.
  2. Larger topics are more frequent and closer the topics, mor the similarity
  3. Selection of keywords is based on their frequency and discriminancy.

15.4 BERTopic

https://maartengr.github.io/BERTopic/index.html

https://spacy.io/universe/project/bertopic

The default embedding model is all-MiniLM-L6-v2 when selecting language=“english” and paraphrase-multilingual-MiniLM-L12-v2 when selecting language=“multilingual”.

import spacy
from bertopic import BERTopic
docs = [" ".join(text) for text in texts]
topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)
topic_model.get_topic_info()
topic_model.get_topic(0)
topic_model.get_document_info(docs)
#| column: screen-inset-right
fig = topic_model.visualize_topics()
fig.show()

15.5 其他资源

15.5.1 tomotopy

速度最快的LDA主题模型

https://github.com/bab2min/tomotopy

tomotopy is a Python extension of tomoto (Topic Modeling Tool) which is a Gibbs-sampling based topic model library written in C++. It utilizes a vectorization of modern CPUs for maximizing speed. The current version of tomoto supports several major topic models including

  • Latent Dirichlet Allocation (LDAModel)
  • Labeled LDA (LLDAModel)
  • Partially Labeled LDA (PLDAModel)
  • Supervised LDA (SLDAModel)
  • Dirichlet Multinomial Regression (DMRModel)
  • Generalized Dirichlet Multinomial Regression (GDMRModel)
  • Hierarchical Dirichlet Process (HDPModel)
  • Hierarchical LDA (HLDAModel)
  • Multi Grain LDA (MGLDAModel)
  • Pachinko Allocation (PAModel)
  • Hierarchical PA (HPAModel)
  • Correlated Topic Model (CTModel)
  • Dynamic Topic Model (DTModel)
  • Pseudo-document based Topic Model (PTModel).

15.5.2 cntext

中文sentiment分析、readability分析。

https://pypi.org/project/cntext/

https://github.com/hiDaDeng/cntext/blob/main/chinese_readme.md

  • stats文本统计指标
    • 词频统计
    • 可读性
    • 内置pkl词典
    • 情感分析
  • dictionary 构建词表(典)
    • Sopmi 互信息扩充词典法
    • W2Vmodels 词向量扩充词典法
    • Glove词向量模型
  • similarity 文本相似度
    • cosine相似度
    • jaccard相似度
    • 编辑距离相似度
  • mind.py计算文本中的认知方向(态度、偏见)