Services - Genome Atlas (SOAP) [Provided by: Center for Biological Sequence Analysis (CBS)]

Provider:
Center for Biological Sequence Analysis (CBS)

Location:
European Union

Submitter / Source:
pfhallin (about 1 year ago)

Base URL:
http://ws.cbs.dtu.dk/cgi-bin/soap/ws/quasi.cgi

WSDL Location:
http://www.cbs.dtu.dk/ws/GenomeAtlas/GenomeAtlas_3_0_ws1.wsdl

Documentation URL(s):
http://www.cbs.dtu.dk/ws/GenomeAtlas (pfhallin, about 1 year ago)

Description(s):
From provider’s description doc (about 1 year ago):
This Web Service accesses the database records and various tools of the
GenomeAtlas database v3. The records maintained by this database are synchronized regularly
with the Entrez Genome Project (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi?view=1).

#
# DATABASE LOOK-UP FUNCTIONS
#
1. getSeq
Get one or more genomic sequences from the Genome Atlas database (updated regularly against
Entrez Microbial Genomes), given the GenBank accession number.
Input:
* ‘genbank’ : A genbank accession number
Output:
* ‘sequencedata’
* ‘sequence’ [array]
* ‘id’ : id sequence
* ‘comment’: comment of sequence
* ‘seq’ : The DNA sequence of the genome
2. getProt
Get the protein sequences encoded by the annotated coding regions of a GenBank record
Input:
* ‘genbank’ : A genbank accession number
Output:
* ‘sequencedata’
* ‘sequence’ [array]
* ‘id’ : id sequence
* ‘comment’: comment of sequence
* ‘seq’ : The translations of the predicted protein coding genes
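Both getSeq and getProt return the same nested layout (‘sequencedata’, then an array of ‘sequence’ entries with ‘id’, ‘comment’ and ‘seq’). As a minimal sketch, assuming the SOAP response has already been converted into plain Python dicts and lists mirroring that layout (the conversion itself depends on the SOAP client and is not shown), the records could be written out as FASTA. The function name and the example record are hypothetical.

```python
from typing import Mapping


def to_fasta(sequencedata: Mapping, width: int = 60) -> str:
    """Render a 'sequencedata' structure as FASTA text.

    Assumes nested dicts/lists that mirror the documented layout:
    sequencedata -> sequence[] -> id / comment / seq.
    """
    records = []
    for entry in sequencedata.get("sequence", []):
        header = ">" + entry["id"]
        comment = entry.get("comment")
        if comment:
            header += " " + comment
        seq = entry["seq"]
        # Wrap the sequence at a fixed line width, as FASTA files usually do.
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append("\n".join([header] + lines))
    return "\n".join(records) + ("\n" if records else "")


# Made-up example data, not a real database record.
example = {
    "sequence": [
        {"id": "EXAMPLE1", "comment": "hypothetical record", "seq": "ATGCATGCAT"},
    ]
}
print(to_fasta(example, width=4))
```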

3. getOrfs
Get the nucleotide sequences of the annotated coding regions of a GenBank record
Input:
* ‘accession’: One or more GenBank accession numbers.
Output:
* ‘contig’
* ‘id’: accession number as provided in input
* ‘sequencedata’: An array of sequencedata objects
* ‘id’ : The identifier of the sequence ( output from GenBank record converter )
* ‘seq’ : Protein coding DNA sequence

4. queryGenomes
Query records of the GenomeAtlas database
Input:
* ‘search’ : Records can be searched by various optional fields (combined with AND). All fields
except ‘pid’ are surrounded by wildcards.
* ‘kingdom’ : bacteria / archaea
* ‘phyla’ : Phyla
* ‘pid’ : Project id
* ‘organism’ : Organism name
* ‘genbank’ : Genbank accession number
* ‘refseq’ : RefSeq accession number
* ‘segment’ : Segment / replicon name (e.g. ‘GENOME[PID]’, ‘Chromosome’, ‘pVir’ …)
* ‘hideMerged’ : yes / no: Hide merged segments (GENOME[PID])

Output: An array of entries containing:
* ‘descriptions’ : A genome atlas database record
* ‘entry’
* ‘field’ : The name of the field (e.g. ATCONTENT, NGENES)
* ‘description’ : A descriptive text for the field
* ‘entry’ : A genome atlas database record
* ‘kingdom’ : bacteria / archaea
* ‘phyla’ : Phyla
* ‘pid’ : Project id
* ‘organism’ : Organism name
* ‘genbank’ : Genbank accession number
* ‘refseq’ : RefSeq accession number
* ‘segment’ : Segment / replicon name (e.g. ‘GENOME[PID]’, ‘Chromosome’, ‘pVir’ …)
* ‘properties’ : The calculated genomic properties of the segment
* ‘ATCONTENT’
* ‘NGENES’
* ‘LENGTH’
* ‘BPPRGENE’
* ‘CODING_FRACTION’
* ‘GEOMETRY’
* ‘RNAMMER_TSU_COUNT’
* ‘RNAMMER_SSU_COUNT’
* ‘RNAMMER_LSU_COUNT’
* ‘GLO_DIR_REPEAT’
* ‘GLO_INV_REPEAT’
* ‘SR_PERCENT’
* ‘ANN_TRNA_COUNT’
* ‘TRNA_SCAN_COUNT’
* ‘TRUE_PROTEINS’
* ‘TRUE_PROT_RATIO’
* ‘60_ORIGIN’
* ‘60_TERMINUS’
* ‘ADNACC’
* ‘CURVATURE_AVG’
* ‘ELHASSAN_AVG’
* ‘OLSON_AVG’
* ‘ORNSTEIN_AVG’
* ‘RRRECIEVER_COUNT’
* ‘HISKA_1_COUNT’
* ‘HISKA_2_COUNT’
* ‘HISKA_3_COUNT’
* ‘HISKA_COUNT’
* ‘HWE_HK_COUNT’
* ‘LOC_DIR_REPEAT’
* ‘LOC_EVR_REPEAT’
* ‘LOC_INV_REPEAT’
* ‘LOC_MIR_REPEAT’
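The ‘search’ input of queryGenomes accepts only the optional fields listed above. A small sketch of a client-side helper that assembles such a search structure and rejects unknown field names is shown here. It assumes that ‘hideMerged’ travels inside ‘search’ together with the other fields (the flattened listing does not make the nesting explicit), and the helper name `build_search` is hypothetical.

```python
# Field names taken from the queryGenomes input description.
ALLOWED_FIELDS = {
    "kingdom", "phyla", "pid", "organism",
    "genbank", "refseq", "segment", "hideMerged",
}


def build_search(**fields):
    """Return a dict of search fields, validating names before any call is made."""
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError("unknown search field(s): " + ", ".join(sorted(unknown)))
    if fields.get("hideMerged") not in (None, "yes", "no"):
        raise ValueError("hideMerged must be 'yes' or 'no'")
    # Omit fields that were not set; all fields are optional.
    return {name: str(value) for name, value in fields.items() if value is not None}


search = build_search(kingdom="bacteria", organism="Escherichia", hideMerged="yes")
print(search)
```

Because all fields except ‘pid’ are wildcard-matched by the service, a partial organism name such as the one above would be expected to match every record whose organism contains it.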

5. getFeatures
Get details for all annotated features of a single genbank record
Input:
* ‘accession’ : Genbank accession number
* ‘features’ : Comma separated list of features to be returned
(e.g. all or cds,rrna,trna)
* ‘keys’ : Comma separated list of keys to be returned
(e.g. all or locus_tag,gene,translation)

Output: ‘features’: An array of ‘feature’ elements, containing:
* ‘type’ : feature type, e.g. CDS, rRNA, tRNA
* ‘begin’ : lower boundary of annotation
* ‘end’ : upper boundary of annotation
* ‘dir’ : Annotation direction + or /
* ‘label’ : Acquired from ‘gene’ annotation
* ‘featurekey’ : An array of additional annotation keys provided in the Genbank record
* ‘Key’ : the annotation key, e.g. ‘product’
* ‘Value’ : the annotation value, e.g. ‘16S ribosomal RNA’

Please be aware that begin and end refer to the outer boundaries of the annotation:
if the annotation contains multiple concatenations/junctions, begin and end only give
the smallest and largest of those coordinates. A detailed map of the junctions is found
in the ‘featurekey’ element with attribute key=coordinates.
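The relation between a joined annotation and its reported begin and end can be illustrated with a short sketch. The exact text format of the ‘coordinates’ key is not described here, so the example assumes the junction map has already been parsed into a list of (start, stop) pairs; the function name and the coordinates are made up.

```python
def annotation_bounds(segments):
    """Return (begin, end): the smallest and largest coordinate over all segments."""
    if not segments:
        raise ValueError("at least one segment is required")
    positions = [pos for segment in segments for pos in segment]
    return min(positions), max(positions)


# A hypothetical feature joined from two segments.
segments = [(100, 200), (350, 420)]
print(annotation_bounds(segments))  # (100, 420)
```

The gap between 200 and 350 is invisible in begin and end, which is why the junction map is needed whenever the internal structure of the feature matters.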

#
# TOOLS
#

6. DNApropertyRun
Calculates structural and physical properties of the DNA molecule. These properties
are used in the DNA Atlas representation on the Genome Atlas web pages. Properties include
Intrinsic Curvature, Stacking energy, position preference, various repeats etc. (please see
below for documentation). Use operation ‘pollQueue’ to poll the status of the job.

Input:
* ‘method’ : Calculation method, specifying which results are to be generated,
e.g. ‘Intrinsic Curvature’ (see documentation below)
* ‘sequence’
* ‘id’ : Sequence identifier
* ‘seq’ : DNA sequence
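Since DNApropertyRun jobs are queued and their status is read with ‘pollQueue’, a client typically needs a polling loop. The sketch below shows only the generic pattern: the `poll` callable stands in for whatever wraps the ‘pollQueue’ operation in a given SOAP client, and the status strings ("QUEUED", "ACTIVE", "FINISHED") are assumptions, not values taken from the service description.

```python
import time
from typing import Callable

# Assumed status names for jobs that are still in progress (hypothetical).
PENDING = {"QUEUED", "ACTIVE"}


def wait_for_job(poll: Callable[[str], str], job_id: str,
                 interval: float = 5.0, timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Call poll(job_id) until the status is no longer pending, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        status = poll(job_id)
        if status not in PENDING:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job %s still %s after %.0f s" % (job_id, status, timeout))
        sleep(interval)


# Stand-in for a real pollQueue wrapper, so the sketch runs offline.
_fake_statuses = iter(["QUEUED", "ACTIVE", "FINISHED"])
final = wait_for_job(lambda job_id: next(_fake_statuses), "job-1",
                     interval=0, sleep=lambda seconds: None)
print(final)
```

Passing `poll` and `sleep` in as parameters keeps the loop independent of any particular SOAP library and makes it easy to exercise without network access.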

The following DNA properties can be calculated:

Intrinsic Curvature
DNA curvature is calculated using the CURVATURE programme (Bolshoy et al. 1991, Shpigelman
et al. 1993). The term curved DNA here refers to DNA that is intrinsically curved
in solution and can be readily characterised by anomalous migration in acrylamide
gels. There are different models for curved DNA (Sinden et al. 1998), although the
predictions for curvature of fragments larger than a few hundred bp are essentially the
same (Haran et al. 1994). The scale is in arbitrary “Curvature units”, which range
from 0 (i.e. no curvature) to 1.0, which is the curvature of DNA when wrapped around
the nucleosome. The scale used for this atlas ranges 3 standard deviations around
the mean.

* R.R. Sinden and C.E. Pearson and V.N. Potaman and D.W. Ussery DNA: Structure and
Function (1998) 5A:1-141

* E.S. Shpigelman and E.N. Trifonov and A. Bolshoy CURVATURE: Software for the Analysis
of Curved DNA. (1993) 9:435-444

* T.E. Haran and J.D. Kahn and D.M. Crothers Sequence elements responsible for
DNA curvature (1994) 225:729-738

* A. Bolshoy and P. McNamara and R.E. Harrington and E.N. Trifonov Curved DNA Without
A-A – Experimental Estimation of All 16 DNA Wedge Angles (1991) 88:2312-2316

Position Preference
Position preference is a trinucleotide model based on the preferential location
of sequences within nucleosomal core sequences (Satchwell et al. 1986). We use the
magnitude (i.e. absolute values) of the trinucleotide numbers as a measure of DNA
flexibility (Baldi et al. 1996). The trinucleotide values range from essentially
zero (0.003, presumably more flexible) to 0.28 (considered rigid). Since very few
of the trinucleotides have values close to zero (i.e. little preference for nucleosome
positioning), this measure is considered most sensitive towards the low (“flexibility”)
end of the scale.

* S.C. Satchwell and H.R. Drew and A.A. Travers Sequence periodicities in chicken
nucleosome core DNA (1986) 191:659-675

* P. Baldi and S. Brunak and Y. Chauvin and A. Krogh Naturally occurring nucleosome
positioning signals in human exons and introns. (1996) 263:503-510

Stacking Energy
Base-stacking energies are from the dinucleotide values provided by Ornstein et
al. (1978). The scale is in kcal/mol, and the dinucleotide values range from -3.82
kcal/mol (will melt easily) to -14.59 kcal/mol (which would
require more energy to destack or melt the helix). (All 10 values are listed in the
table below.) A positive peak in base-stacking (i.e., numbers closer to 0) reflects regions
of the helix which would de-stack or melt more readily. Conversely, minima (larger
negative numbers) in this plot would represent more stable regions of the chromosome.

Dinucleotide melting energies in kcal/mol:

(GC).(GC) -14.59
(AC).(GT) -10.51
(TC).(GA) -9.81
(CG).(CG) -9.61
(GG).(CC) -8.26
(AT).(AT) -6.57
(TG).(CA) -6.57
(AG).(CT) -6.78
(AA).(TT) -5.37
(TA).(TA) -3.82

* R.L. Ornstein and R. Rein and D.L. Breen and R.D. MacElroy An optimized potential
function for the calculation of nucleic acid interaction energies. I. Base stacking
(1978) 17:2341-2360
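To make the use of the dinucleotide table concrete, the following sketch averages the tabulated stacking energies over all dinucleotide steps of a sequence. It is an illustration only, not the service's implementation: each table row is read as a step on one strand paired with its reverse complement on the other, so the 10 rows expand to all 16 dinucleotides, and no windowing or smoothing is applied.

```python
# Values copied from the table above (kcal/mol); each row names a step and
# the same step read on the opposite strand.
STEP_ENERGY = {
    ("GC", "GC"): -14.59, ("AC", "GT"): -10.51, ("TC", "GA"): -9.81,
    ("CG", "CG"): -9.61, ("GG", "CC"): -8.26, ("AT", "AT"): -6.57,
    ("TG", "CA"): -6.57, ("AG", "CT"): -6.78, ("AA", "TT"): -5.37,
    ("TA", "TA"): -3.82,
}

# Expand to a per-dinucleotide lookup covering all 16 possible steps.
ENERGY = {}
for (step, partner), energy in STEP_ENERGY.items():
    ENERGY[step] = energy
    ENERGY[partner] = energy


def mean_stacking_energy(seq):
    """Average stacking energy over all dinucleotide steps of seq (A/C/G/T only)."""
    seq = seq.upper()
    steps = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Steps containing other symbols (e.g. N) are skipped.
    values = [ENERGY[step] for step in steps if step in ENERGY]
    if not values:
        raise ValueError("sequence contains no scorable dinucleotide step")
    return sum(values) / len(values)


print(round(mean_stacking_energy("TATA"), 2))  # -4.74
```

The example sequence TATA contains the steps TA, AT and TA, giving (-3.82 - 6.57 - 3.82) / 3, a value close to 0 that corresponds to a region that would melt readily.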

Protein Deformability
“Protein Induced Deformability” dinucleotide values are from protein induced deformation
of DNA helices as determined by examination of more than a hundred crystal
structures of DNA/protein complexes (Olson et al. 1998). The dinucleotide values
range from 2.1 (the least deformable dinucleotide) to 12.1 (the CpG dinucleotide
step, which is often deformed by proteins). Thus, on this scale, a larger value
reflects a more deformable sequence whilst a smaller value indicates a region where
the DNA helix is less likely to be changed dramatically by proteins. The average
protein deformability value in the entire E. coli K-12 genome is 5.12.

* W.K. Olson and A.A. Gorin and X.J. Lu and L.M. Hock and V.B. Zhurkin DNA sequence-dependent
deformability deduced from protein-DNA crystal complexes. (1998) 95:11163-11168

Propeller twist
We use propeller twist as a measure of helix rigidity, since the propeller twist
angles have been shown to be inversely related to rigidity of the DNA helix in crystals
(el Hassan et al. 1996). Thus, a region with high propeller twist would
mean the helix is quite rigid in this area, and similarly regions that are quite
flexible would have a low propeller twist. Propeller twist values were obtained from
crystallographic data (el Hassan et al. 1996), with the exception of the TA
step, which was taken from a theoretical estimate (Gorin et al. 1995). Plots using
other sets of propeller twist dinucleotide values were very similar (data not shown).
The average propeller twist value in the entire E. coli K-12 genome is -12.63 degrees.

* M.A. el Hassan and C.R. Calladine Propeller-twisting of base-pairs and the conformational
mobility of dinucleotide steps in DNA. (1996) 259:95-103

* A.A. Gorin and V.B. Zhurkin and W.K. Olson B-DNA twisting correlates with base-pair
morphology. (1995) 247:34-48

DNase I Sensitivity
DNase I values are based on experimentally determined trinucleotide values (Brukner
et al. 1995, Brukner et al. 1995). These values are reflective of the anisotropic
flexibility or “bendability” of a particular DNA sequence. The trinucleotide values
range from -0.280 (rigid) to +0.194 (very “bendable” towards the major groove). Smoothing
over large regions (which is necessary for viewing entire genomes) tends to smooth
out differences in bendability. The average DNase I (“bendability”) value in the

* I. Brukner and R. Sanchez and D. Suck and S. Pongor Sequence-dependent bending
propensity of DNA as revealed by DNase I: parameters for trinucleotides. (1995) 14:1812-1818

* I. Brukner and R. Sanchez and D. Suck and S. Pongor Trinucleotide models for DNA
bending propensity: comparison of models based on DNaseI digestion and nucleosome
packaging data. (1995) 13:309-317

Palindromic hexamers
For a given sequence, any palindrome of 6 nt (e.g., AAATTT) is given a value of 1,
while all bases not included in palindromic hexamers are given a value of 0 (van Noort
et al. 2003).

* van Noort V, Worning P, Ussery DW, Rosche WA, Sinden RR Strand misalignments lead
to quasipalindrome correction (2003) 19:365-9
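The 0/1 scoring described for palindromic hexamers can be sketched as follows. The example assumes that "palindrome" means a hexamer equal to its own reverse complement, which is consistent with the AAATTT example (it is not a literal string palindrome); the function names are illustrative and this is not the service's own code.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def palindromic_hexamer_track(seq):
    """Return one 0/1 value per base: 1 if the base lies in a palindromic hexamer."""
    seq = seq.upper()
    track = [0] * len(seq)
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        # Ignore windows with ambiguous symbols such as N.
        if set(hexamer) <= set("ACGT") and hexamer == reverse_complement(hexamer):
            for j in range(i, i + 6):
                track[j] = 1
    return track


print(palindromic_hexamer_track("CCAAATTTCC"))  # [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
```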

G Content
The “G Content” of a given sequence is merely the fraction of G’s in that sequence
(Jensen et al. 1999). It can range from 0 (no G’s) to 1 (all G’s). For a sequence
that is 50% AT content, one would expect roughly 25% G’s.
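As a small worked example of this definition, the sketch below computes the G fraction of a sequence and of sliding windows along it. The window handling is an assumption made for illustration; the smoothing actually used for the atlas plots is not described here.

```python
def g_content(seq):
    """Fraction of G's in seq, between 0 (no G's) and 1 (all G's)."""
    if not seq:
        raise ValueError("empty sequence")
    return seq.upper().count("G") / len(seq)


def windowed_g_content(seq, window):
    """G content of every full window of the given size, stepping one base at a time."""
    return [g_content(seq[i:i + window]) for i in range(len(seq) - window + 1)]


print(g_content("ACGT"))               # 0.25
print(windowed_g_content("GGAA", 2))   # [1.0, 0.5, 0.0]
```

The first example also illustrates the 25% expectation: with 50% AT content the remaining half is split between G and C, so about a quarter of the bases are G when both are equally frequent on the strand.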