Dataset title: Amazon Rainforest Microbial Observatory Metagenomes

Data files are located at: http://lib-vm-rdmi.uoregon.edu/data/21993/

Basic information: This data archive contains 79 files of quality-filtered, shotgun metagenomic DNA sequence data, as well as community matrices for the methanogens and methanotrophs detected in the study. Data were generated as part of the Amazon Rainforest Microbial Observatory (ARMO) project.

Data file names: See metadata file (ARMO_metagenome_metadata.txt).

Other files related to these data: ARMO_metagenome_metadata.txt, Methanogen_otu_table.txt, Methanotroph_otu_table.txt

Author names: Brendan J. M. Bohannan, Kyle M. Meyer, Ann M. Klein, Jorge L. M. Rodrigues, Klaus Nusslein, Susannah G. Tringe, Babur S. Mirza, James M. Tiedje

Contact information: Ann M. Klein, Institute of Ecology and Evolution, University of Oregon, annmaureenklein@gmail.com

Dates of data collection: 2010-04

Geographic information: Samples were collected at the ARMO site (10°10’5’’ S and 62°49’27’’ W) in the Brazilian state of Rondônia.

Methodological information: Ten soil cores were collected from ARMO in April 2010 (5 soil cores from primary rainforest and 5 from a 38-year-old converted pasture). Soil was sampled to a depth of 10 cm (after removal of the litter layer) using standard coring methods and homogenized. Samples were frozen on the spot, transported on dry ice, and stored at -80 °C until extraction. DNA was extracted from five soil subsamples per core (i.e., 50 extractions in total across the 10 soil cores). Metagenomic libraries were constructed from the 10 samples using the Illumina TruSeq kit with ~270 bp insert sizes according to the standard protocol. Sequencing of 150 bp paired-end reads was performed on the Illumina HiSeq platform at the Joint Genome Institute. In total, 21 lanes (2-3 lanes per sample) were sequenced to produce 6.4 billion paired-end reads, an average of 636 million (±12%) reads per sample.
Raw sequences were uploaded to MG-RAST (http://metagenomics.anl.gov), and paired-end reads were joined using fastq-join as part of the MG-RAST pipeline. Single-end reads that could not be joined were retained. After merging paired-end reads, a total of 6.3 billion sequences with an average length of 171 bp were processed through the MG-RAST pipeline. All other pipeline options were left as default (i.e., trimming of low-quality bases, removal of artificial replicate sequences, and filtering of sequences with greater than 5 ambiguous bases).

Specific information needed to understand or interpret the data: The metadata file (ARMO_metagenome_metadata.txt) contains the information needed to relate individual sequence files to the original samples. File names encode the sample name, the lane, and the subset number. For example, in the file name "Forest_A001_1_split_1.fasta" the "_1_" is for lane 1 (Forest_A001 was sequenced on two lanes). Each lane of data was split into 3-4 files, so "split_1" refers to the first one.

The column headings of the metadata are as follows:
  MGRAST_ID = unique identifier for each sequence file uploaded to MG-RAST
  sequence_run = sequencing lane (each sample was sequenced on more than 1 lane)
  split_number = subset number
  bps = number of base pairs in the file
  sequences = number of sequences in the file
  file_name = name of fasta file

Organismal community matrices were obtained via the MG-RAST M5RNA database, which assigns taxonomy strictly from ribosome-encoding genes, including those from the SILVA, RDP, and Greengenes databases. We used the “Representative Hit” classification method for organismal annotation, which selects a single, unambiguous annotation for each feature and assigns taxonomy. Default parameters (e-value cutoff = 1e-5, Min. % identity = 60%, Min. alignment length cutoff = 15) were used for taxonomic annotations.
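
Example (illustrative only): the file naming convention described above can be unpacked programmatically. The sketch below is not part of the archive. It assumes every file follows the same pattern as the single documented example, "Forest_A001_1_split_1.fasta" (sample name, lane number, "split", subset number, ".fasta"). The names SplitFile and parse_file_name are hypothetical helpers introduced here for illustration.

```python
import re
from typing import NamedTuple


class SplitFile(NamedTuple):
    """Components encoded in an ARMO sequence file name (hypothetical helper)."""
    sample: str
    lane: int
    split: int


# Assumed pattern: <sample>_<lane>_split_<subset>.fasta, inferred from the one
# documented example. Other files in the archive may deviate from it.
_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<lane>\d+)_split_(?P<split>\d+)\.fasta$")


def parse_file_name(file_name: str) -> SplitFile:
    """Split a file name into sample name, lane number, and subset number."""
    match = _PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unexpected file name format: {file_name!r}")
    return SplitFile(match["sample"], int(match["lane"]), int(match["split"]))


print(parse_file_name("Forest_A001_1_split_1.fasta"))
# SplitFile(sample='Forest_A001', lane=1, split=1)
```

The metadata file remains the authoritative mapping between files and samples; parsing names this way is only a convenience and should be checked against the file_name, sequence_run, and split_number columns.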
The community matrix was rarefied to 195,000 observations per sample to achieve approximately equal sampling depth across samples. Methanogen and methanotroph taxa were subset from the community matrix and stored as separate community matrices (Methanogen_otu_table.txt, Methanotroph_otu_table.txt). In each of these files, columns represent the individual samples (see metadata file: ARMO_metagenome_metadata.txt) and rows represent individual taxa. The 'Type' column indicates the group to which each taxon is assigned.

Licenses: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)

Funding: Agriculture and Food Research Competitive Grant 2009, United States Department of Agriculture, Grant number 35319-05186
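
Example (illustrative only): rarefying a community matrix, as mentioned under "Specific information", means subsampling each sample's observations down to a common depth. This record does not state which tool or procedure was used, so the sketch below is an assumption: it draws observations at random without replacement, one sample at a time. The function name rarefy, the taxon names, and the small counts are all hypothetical. The archive used a depth of 195,000 observations per sample.

```python
import random
from collections import Counter
from typing import Dict


def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Subsample one sample's taxon counts to `depth` observations.

    Sampling is done without replacement (an assumption; the source does not
    say how the rarefaction was carried out).
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds the {total} observations available")
    rng = random.Random(seed)
    # Expand the counts into one list entry per observation, then draw `depth`
    # of them. Simple and fine for a sketch, but memory-hungry for large totals.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    drawn = Counter(rng.sample(pool, depth))
    # Keep every taxon in the output, including those that dropped to zero.
    return {taxon: drawn.get(taxon, 0) for taxon in counts}


# Hypothetical counts for a single sample.
sample_counts = {"taxon_a": 60, "taxon_b": 30, "taxon_c": 10}
rarefied = rarefy(sample_counts, depth=50)
print(sum(rarefied.values()))  # 50
```

Because the draw is random, the rarefied counts differ between seeds; fixing the seed makes a run repeatable but does not reproduce the values stored in the archived tables.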