PUBMED/QUERY2_RUN1
==================

First run on query2:

    anthrax or smallpox or vx or sarin or terrorism or bioterrorism or
    ricin or sars or terrorist or plague or tularemia or nile or
    brucellosis or anthracis or syndromic or ebola or botulism or
    glanders or melioidosis or "mustard gas" or soman or tabun or
    lewisite or "hemorrhagic fever"

Entrez Pubmed reported yield of 50629 records.
Splitting into files yielded = 47551.
Extracted abstracts = 24456.

PMID = Pubmed id number associated with publication.
Other than the date, no normalization has been done.

===================
top-level directory
===================

expanded.zip                   XML files from query, all in one file.
pubmed_query2_split.zip        One file per abstract. (47551 files)
pubmed_query2_abstracts.zip    One abstract per file, named with PMID. (24456 files)

--------------
pubmed_query2_pmid.zip = Contains the files below, pulled from xml files.
--------------
err.txt = Extraction error messages.
id_author.txt      2-column file    pmid  lastname_initials
id_country.txt     2-column file    pmid  country
id_language.txt    2-column file    pmid  language
id_pubdate.txt     2-column file    pmid  publication_date
id_title.txt       2-column file    pmid  title

--------------
pubmed_query2_run1_bagin.zip = Word lists & frequency, via processing abstracts.
--------------
allwords.txt       2-column file    frequency  word_text
infreq.txt         2-column file    wordid  word_text  (Infrequent words)
puncts.txt         2-column file    wordid  word_text
stops.txt          2-column file    wordid  word_text
summary.txt        Miscellaneous run statistics.
words.txt          2-column file    wordid  word_text  (Frequent words)

--------------
pubmed_query2_run1_bagout.zip = 2nd pass on abstracts, with gathered word info.
--------------
docwords.txt       3-column file    pmid  wordid  frequency

--------------
pubmed_query2_run1_makeremap.zip = Mapping of PMID to sequential numeric id.
--------------
map.txt            2-column file    id  pmid

--------------
pubmed_query2_run1_dataset.zip = Final results, mapped to sequential id.
--------------
docwords.txt       3-column file    id  wordid  frequency
id_author.txt      2-column file    id  lastname_initials
id_country.txt     2-column file    id  country
id_language.txt    2-column file    id  language
id_pubdate.txt     2-column file    id  publication_date
id_title.txt       2-column file    id  title
map.txt            2-column file    id  pmid  (Copy of file from make_remap)
words.txt          2-column file    wordid  word_text  (Copied from bag_in)
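
Example: reading the dataset files

The column layouts listed for pubmed_query2_run1_dataset.zip can be read with a few lines of Python. What follows is a minimal sketch, not the code that produced these files. It assumes the columns are separated by whitespace (spaces or tabs) and that the files decode as UTF-8; neither is stated above, so check a few lines of each file first. The function names (read_two_column, read_docwords, top_words) are made up for this illustration.

```python
from collections import defaultdict


def read_two_column(path):
    """Read a 2-column file (e.g. words.txt, map.txt, id_title.txt) into a dict.

    Assumption: the first whitespace-delimited token is the key and the rest
    of the line is the value, so titles containing spaces stay intact.
    If a key appears on several lines, the last line wins.
    """
    table = {}
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            parts = line.strip().split(None, 1)
            if not parts:
                continue  # skip blank lines
            table[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return table


def read_docwords(path):
    """Read docwords.txt (id wordid frequency) into {id: {wordid: frequency}}."""
    bags = defaultdict(dict)
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            doc_id, word_id, freq = fields
            bags[doc_id][word_id] = int(freq)
    return dict(bags)


def top_words(bag, words, n=5):
    """Return the n most frequent (word_text, frequency) pairs for one document.

    Ties are broken by wordid so the result is deterministic.
    """
    ranked = sorted(bag.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(words.get(word_id, "<unknown>"), freq) for word_id, freq in ranked[:n]]
```

A typical use would be to load words.txt and docwords.txt, call top_words(bags[doc_id], words) for a document, and look up the same id in map.txt to recover its PMID. Ids are kept as strings throughout. Note that read_two_column keeps only one value per id, so a file that lists several lines for the same id (id_author.txt may do this for multi-author papers) would need a list-valued variant.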