Spectrum Translator Assumptions

Overview

CONNJUR-ST uses a combination of vendor documentation and file exploration to determine the contents of files. Data about data is called metadata. ST reads both the NMR data and a subset of the metadata when translating from one format to another. This document outlines the assumptions made in developing CONNJUR-ST, that is, where its understanding of each format comes from.

The design philosophy of the spectrum translator is that a translation that cannot be performed with a high degree of confidence should not be performed at all. In practice, if ST encounters metadata it considers illogical or inconsistent, or if the size of the NMR data is not what ST expects, it outputs an error message and stops the conversion. If more NMR data is present than expected, a configuration switch allows the user to direct ST to ignore the excess.
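
The size check described above can be sketched as follows. This is a minimal illustration, not ST's actual code: the names TranslationError, check_data_size and ignore_excess are hypothetical and stand in for whatever ST uses internally.

```python
class TranslationError(Exception):
    """Raised when data cannot be translated with confidence."""


def check_data_size(expected_bytes, actual_bytes, ignore_excess=False):
    """Return the number of bytes to read, or raise if the sizes disagree.

    ignore_excess is a hypothetical stand-in for the configuration switch
    that tells the translator to skip surplus data.
    """
    if actual_bytes < expected_bytes:
        # Too little data can never be translated safely.
        raise TranslationError(
            f"expected {expected_bytes} bytes of NMR data, found only {actual_bytes}"
        )
    if actual_bytes > expected_bytes and not ignore_excess:
        # Surplus data is an error unless the user opted in to ignoring it.
        raise TranslationError(
            f"expected {expected_bytes} bytes of NMR data, found {actual_bytes}"
        )
    return expected_bytes
```

Note that in this sketch the switch only relaxes the "too much data" case; a short file is always rejected.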

NMRPipe

NMRPipe files contain a header of float-encoded metadata followed by a block of NMR spectrum data. The files fdatap.html and fdatap.h in the NMRPipe distribution document the header. The table below documents how Spectrum Translator uses it. Usage codes are as follows:

  • L indicates the metadata is used to determine the layout of NMR data.
  • M indicates metadata read and written by ST as part of the translation that does not affect the NMR data layout.
  • MW indicates metadata written by ST as part of the translation that does not affect the NMR data layout.

ST name             Offset  NMRPipe name  Dimension  Description                                     Usage
ENDIAN_CONST_INDEX  2       na                       Constant value used to determine endianness     L
DIMENSIONS          9       FDDIMCOUNT               Number of dimensions present                    L
DIMORDER1           24      FDDIMORDER1   First      x/y/z/a dimension designation                   M
DIMORDER2           25      FDDIMORDER2   Second     x/y/z/a dimension designation                   M
DIMORDER3           26      FDDIMORDER3   Third      x/y/z/a dimension designation                   M
DIMORDER4           27      FDDIMORDER4   Fourth     x/y/z/a dimension designation                   M
D1SIZE              99      FDSIZE        First      Dimension size                                  L
D2SIZE              219     FDSPECNUM     Second     Dimension size                                  L
D3SIZE              15      FDF3SIZE      Third      Dimension size                                  L
D4SIZE              32      FDF4SIZE      Fourth     Dimension size                                  L
D2TYPE              56      FDF2QUADFLAG  First      Complex or real data                            L
D1TYPE              55      FDF1QUADFLAG  Second     Complex or real data                            L
D3TYPE              51      FDF3QUADFLAG  Third      Complex or real data                            L
D4TYPE              54      FDF4QUADFLAG  Fourth     Complex or real data                            L
D1SWEEPWIDTH        229     FDF1SW        First      Sweep width                                     M
D2SWEEPWIDTH        100     FDF2SW        Second     Sweep width                                     M
D3SWEEPWIDTH        11      FDF3SW        Third      Sweep width                                     M
D4SWEEPWIDTH        29      FDF4SW        Fourth     Sweep width                                     M
FDF1AQSIGN          475     FDF1AQSIGN    First      Combined alternating / negate imaginaries flag  M
FDF2AQSIGN          64      FDF2AQSIGN    Second     Combined alternating / negate imaginaries flag  M
FDF3AQSIGN          476     FDF3AQSIGN    Third      Combined alternating / negate imaginaries flag  M
FDF4AQSIGN          477     FDF4AQSIGN    Fourth     Combined alternating / negate imaginaries flag  M
FDF1FTFLAG          222     FDF1FTFLAG    First      Time or Frequency domain info [1]               M
FDF2FTFLAG          220     FDF2FTFLAG    Second     Time or Frequency domain info                   M
FDF3FTFLAG          13      FDF3FTFLAG    Third      Time or Frequency domain info                   M
FDF4FTFLAG          31      FDF4FTFLAG    Fourth     Time or Frequency domain info                   M
FDF2OBS             119     FDF2OBS       First      Observed MHz, or spectral frequency             M
FDF1OBS             218     FDF1OBS       Second     Observed MHz, or spectral frequency             M
FDF3OBS             10      FDF3OBS       Third      Observed MHz, or spectral frequency             M
FDF4OBS             28      FDF4OBS       Fourth     Observed MHz, or spectral frequency             M
FDF1CAR             67      FDF1CAR       First      Carrier ppm                                     M
FDF2CAR             66      FDF2CAR       Second     Carrier ppm                                     M
FDF3CAR             68      FDF3CAR       Third      Carrier ppm                                     M
FDF4CAR             69      FDF4CAR       Fourth     Carrier ppm                                     M
FDF1P0              245     FDF1P0        First      Zero order phase correction                     M
FDF2P0              109     FDF2P0        Second     Zero order phase correction                     M
FDF3P0              60      FDF3P0        Third      Zero order phase correction                     M
FDF4P0              62      FDF4P0        Fourth     Zero order phase correction                     M
FDF1P1              246     FDF1P1        First      First order phase correction                    M
FDF2P1              110     FDF2P1        Second     First order phase correction                    M
FDF3P1              61      FDF3P1        Third      First order phase correction                    M
FDF4P1              63      FDF4P1        Fourth     First order phase correction                    M
FDCOMMENT           312     FDCOMMENT                Comment field (160 characters)                  M
FDF1LABEL           18      FDF1LABEL     First      Nucleus label                                   M
FDF2LABEL           16      FDF2LABEL     Second     Nucleus label                                   M
FDF3LABEL           20      FDF3LABEL     Third      Nucleus label                                   M
FDF4LABEL           22      FDF4LABEL     Fourth     Nucleus label                                   M
TRANSPOSED          221     FDTRANSPOSED             Dimensions are transposed flag                  M
FD2DPHASE           256     FD2DPHASE     Second     States/TPPI designation                         MW
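
As an illustration of how the offsets in the table can be used, the sketch below decodes a header with both byte orders and keeps the one in which the endianness constant reads correctly. It rests on two assumptions that are not stated in this document and should be checked against fdatap.h: that the header holds 512 four-byte floats, and that the constant stored at offset 2 is 2.345. The function name read_header is hypothetical.

```python
import struct

HEADER_FLOATS = 512       # assumed header length, in four-byte floats
ENDIAN_CONST_INDEX = 2    # offset from the table
ENDIAN_CONST = 2.345      # assumed value of the endianness constant
DIMENSIONS = 9            # FDDIMCOUNT
D1SIZE = 99               # FDSIZE


def read_header(raw):
    """Decode a header, trying both byte orders; return (byte_order, values)."""
    size = HEADER_FLOATS * 4
    if len(raw) < size:
        raise ValueError("header is shorter than expected")
    for order in ("<", ">"):
        values = struct.unpack(f"{order}{HEADER_FLOATS}f", raw[:size])
        # Only the correct byte order reproduces the known constant.
        if abs(values[ENDIAN_CONST_INDEX] - ENDIAN_CONST) < 1e-6:
            return order, values
    raise ValueError("endianness constant not found; refusing to translate")
```

Once the byte order is known, a layout field is read by indexing the decoded tuple, for example int(values[DIMENSIONS]) for the number of dimensions.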

Rowland Toolkit

The Rowland Toolkit format is documented in the online manual. It consists of an ASCII parameter ("par") file and a separate binary NMR data file.

The following lines are used to determine the file layout. The number of dimensions is implicitly determined based on the number of columns present.

  • Dom indicates the domain (time/frequency) of the data. [1]
  • Format indicates endianness and datatype.
  • N indicates the number of points in each dimension and whether the data is real/complex.
  • Layout indicates ordering of the data in the binary file and provides a secondary description of the number of points in a dimension.

The following lines are read and/or written for metadata support.

  • Cphase indicates zero order phase correction term.
  • Lphase indicates the first order phase correction term.
  • Sf indicates spectral frequency.
  • Ppm indicates the ppm of the carrier frequency.
  • Nacq indicates the number of points. ST only writes this, as it is redundant with N.
  • Quad indicates quadrature; that is, States, TPPI, etc.
  • Comment is the data set comment.

Varian

Varian information is stored in a binary file, fid, and an ASCII file, procpar. Metadata exists in both files. The binary fid file is documented in VNMR User Programming VNMR 6.1C Software [2], and the ASCII procpar file is documented in VNMR Command and Parameter Reference Varian NMR Spectrometer Systems With VNMR 6.1C Software [3].

The binary file is composed of multiple blocks separated by block headers. ST reads the following from the header information: the number of blocks, the type of data (float, 16 bit integer, or 32 bit integer), whether the data is time or frequency domain [1], and a valid data flag.
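
The sketch below shows one way the file-level header fields named above could be decoded. The layout (six 32-bit integers, two 16-bit integers and one 32-bit integer, all big-endian) and the status bit values are assumptions taken from the vendor's published header structure, not from this document, and should be verified against the VNMR User Programming manual. The function name read_file_header is hypothetical.

```python
import struct

# Assumed 32-byte file header: nblocks, ntraces, np, ebytes, tbytes, bbytes,
# vers_id, status, nbheaders (big-endian, no padding).
FILE_HEADER = struct.Struct(">6l2hl")

# Assumed status bits.
S_DATA = 0x1    # valid data present
S_SPEC = 0x2    # set for frequency domain, clear for time domain
S_32 = 0x4      # 32 bit integers (only meaningful when S_FLOAT is clear)
S_FLOAT = 0x8   # floating point data


def read_file_header(raw):
    """Decode the fields of interest from the start of a fid file."""
    if len(raw) < FILE_HEADER.size:
        raise ValueError("file is too short to contain a header")
    fields = FILE_HEADER.unpack(raw[:FILE_HEADER.size])
    nblocks, status = fields[0], fields[7]
    if status & S_FLOAT:
        data_type = "float"
    elif status & S_32:
        data_type = "int32"
    else:
        data_type = "int16"
    return {
        "nblocks": nblocks,
        "data_type": data_type,
        "domain": "frequency" if status & S_SPEC else "time",
        "valid": bool(status & S_DATA),
    }
```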

The procpar file is parsed and a subset of its parameters is used. Currently only uniformly sampled data is supported. The table below documents usage by Spectrum Translator. Usage codes are as follows:

  • L indicates the metadata is used to determine the layout of NMR data.
  • M indicates metadata read and written by ST as part of the translation that does not affect the NMR data layout.

Some metadata depends on the channel assignment. This can be specified via the channel_assignment configuration option. (By default channel one is assigned dimension one, etc.) The Channel or Dimension column indicates whether data is assigned by channel (C) or dimension (D).

Procpar parameter  Dimension or Channel  Description                   Channel or Dimension  Usage
tn                 1                     Name of nucleus               C                     M
dn                 2                     Name of nucleus               C                     M
dn2                3                     Name of nucleus               C                     M
dn3                4                     Name of nucleus               C                     M
dn4                5                     Name of nucleus               C                     M
np                 1                     Number of points              D                     L
ni                 2                     Number of points              D                     L
ni2                3                     Number of points              D                     L
ni3                4                     Number of points              D                     L
sfrq               1                     Spectral frequency            C                     M
dfrq               2                     Spectral frequency            C                     M
dfrq2              3                     Spectral frequency            C                     M
dfrq3              4                     Spectral frequency            C                     M
dfrq4              5                     Spectral frequency            C                     M
sw                 1                     Sweep width                   D                     M
sw1                2                     Sweep width                   D                     M
sw2                3                     Sweep width                   D                     M
sw3                4                     Sweep width                   D                     M
rfl                1                     Reference Peak Position       C                     M
rfl1               2                     Reference Peak Position       C                     M
rfl2               3                     Reference Peak Position       C                     M
rfl3               4                     Reference Peak Position       C                     M
rfl4               5                     Reference Peak Position       C                     M
rfp                1                     Reference Peak Frequency      C                     M
rfp1               2                     Reference Peak Frequency      C                     M
rfp2               3                     Reference Peak Frequency      C                     M
rfp3               4                     Reference Peak Frequency      C                     M
rfp4               5                     Reference Peak Frequency      C                     M
rp                 1                     Zero order phase correction   D                     M
rp1                2                     Zero order phase correction   D                     M
rp2                3                     Zero order phase correction   D                     M
rp3                4                     Zero order phase correction   D                     M
rp4                5                     Zero order phase correction   D                     M
lp                 1                     First order phase correction  D                     M
lp1                2                     First order phase correction  D                     M
lp2                3                     First order phase correction  D                     M
lp3                4                     First order phase correction  D                     M
lp4                5                     First order phase correction  D                     M

When translating from Varian format, Carrier PPM is calculated from the above values using the equation:

Carrier PPM = (Sweep Width/2 - Reference Peak Position + Reference Peak Frequency) / Spectral Frequency

When translating to Varian format, the Reference Peak Frequency is set to zero and the inverse of the above equation is used to calculate the Reference Peak Position.
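
The two directions of this calculation can be written out as a short sketch. The function names are hypothetical, and the units (sweep width and reference values in Hz, spectral frequency in MHz) are an assumption rather than something stated here.

```python
def carrier_ppm(sweep_width, ref_position, ref_frequency, spectral_frequency):
    """Carrier PPM from the Varian values (reading Varian data)."""
    return (sweep_width / 2 - ref_position + ref_frequency) / spectral_frequency


def reference_peak_position(sweep_width, carrier, spectral_frequency):
    """Inverse of carrier_ppm with Reference Peak Frequency fixed at zero
    (writing Varian data)."""
    return sweep_width / 2 - carrier * spectral_frequency


# Illustrative numbers only: sw = 8000, rfl = 1180, rfp = 0, sfrq = 600.
ppm = carrier_ppm(8000.0, 1180.0, 0.0, 600.0)
rfl = reference_peak_position(8000.0, ppm, 600.0)
```

With these illustrative inputs the carrier comes out at about 4.7 ppm, and feeding that value back through the inverse recovers a Reference Peak Position of about 1180.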

The layout of real and imaginary numbers, and whether a dimension is complex, is inferred from the value of the array procpar parameter, as outlined in the table below. Varian data that does not follow this convention, e.g. data from custom pulse programs, cannot currently be translated by the Spectrum Translator.

array parameter value  Dimension or Channel  Description   Channel or Dimension  Usage
phase                  2                     Data complex  D                     L
phase2                 3                     Data complex  D                     L
phase3                 4                     Data complex  D                     L
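
A minimal sketch of this inference follows. It assumes the array value is a comma-separated list of names such as "phase,phase2", which is not spelled out in this document; the function name complex_dimensions is hypothetical. Anything outside the table is rejected, in keeping with the rule that unrecognized conventions are not translated.

```python
# Mapping taken from the table: array entry -> complex dimension number.
PHASE_TO_DIMENSION = {"phase": 2, "phase2": 3, "phase3": 4}


def complex_dimensions(array_value):
    """Return the complex indirect dimensions, in the order they are listed."""
    names = [name.strip() for name in array_value.split(",") if name.strip()]
    unknown = [name for name in names if name not in PHASE_TO_DIMENSION]
    if unknown:
        # Data arrayed over other parameters does not follow the convention.
        raise ValueError(f"unsupported array entries: {unknown}")
    return [PHASE_TO_DIMENSION[name] for name in names]
```

For example, complex_dimensions("phase,phase2") yields dimensions 2 and 3, while an empty array value yields no complex indirect dimensions.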

The procpar file is also used to support configuration of Varian's VnmrJ software. The ST Varian translation output does not directly support VnmrJ; however, the procpar_template option may be used to modify an existing procpar file with metadata translated from another data set.

Bruker

Bruker file formats are documented in the TopSpin Acquisition Reference Guide [4]. Two formats are used: one for raw time domain data coming from the spectrometer and another for data that has been processed, including conversion to the frequency domain. Code for processed data is present in the Spectrum Translator but has not completed Quality Assurance testing.

Bruker data consists of a file system hierarchy. Time domain data is stored in numbered directories beneath a named root directory; each time domain directory contains a subdirectory named pdata, which holds the numbered directories in which processed data is stored.

Both formats store metadata in dimension specific files.

Dimension  Time domain data  Processed data
1          acqus             procs
2          acqu2s            proc2s
3          acqu3s            proc3s
4          acqu4s            proc4s
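
The naming pattern in the table is regular enough to express as a small helper. This is only a sketch; metadata_filename is a hypothetical name, and the limit of four dimensions simply mirrors the table.

```python
def metadata_filename(dimension, processed=False):
    """Return the metadata file name for a dimension, following the table."""
    if not 1 <= dimension <= 4:
        raise ValueError("only dimensions 1 through 4 are listed")
    stem = "proc" if processed else "acqu"
    # Dimension 1 carries no number; higher dimensions insert it before the "s".
    index = "" if dimension == 1 else str(dimension)
    return f"{stem}{index}s"
```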

The tables below document usage by Spectrum Translator. Usage codes are as follows:

  • L indicates the metadata is used to determine the layout of NMR data.
  • M indicates metadata read and written by ST as part of the translation that does not affect the NMR data layout.
  • MW indicates metadata written by ST as part of the translation that does not affect the NMR data layout.

The following parameters are read for the dimension 1 (direct) time domain data.

Parameter  Description                             Usage
BYTORDP    whether data is big or little endian    L
TD         number of data points                   L
AQ_mod     real or complex data                    L
PARMODE    the number of dimensions                L
SFO1       spectral frequency                      M
SW         sweep width (spectral window) in ppm    M
SW_h       sweep width (spectral window) in Hertz  MW
AQSEQ      acquisition sequence (3D sets only)     MW
NUC1       nucleus name                            M

The following parameters are read for the dimension 2 and above (indirect) time domain data.

Parameter  Description                                Usage
FnMODE     real or complex data and sign alternation  L
TD         number of data points                      L
SFO1       spectral frequency                         M
SW         sweep width (spectral window)              M
NUC1       nucleus name                               M
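
As a rough illustration of reading such parameters, the sketch below extracts scalar values from the text of a parameter file. It assumes the JCAMP-DX style in which each parameter appears on a line of the form "##$NAME= value", with string values wrapped in angle brackets; that syntax is not described in this document. The function name parse_parameters is hypothetical, and array-valued parameters are simply skipped.

```python
def parse_parameters(text):
    """Return a dict of scalar parameters found in a parameter file's text."""
    params = {}
    for line in text.splitlines():
        if not line.startswith("##$"):
            continue  # comments, titles, and array value lines
        name, sep, value = line[3:].partition("=")
        if not sep:
            continue
        value = value.strip()
        if value.startswith("("):
            continue  # array header such as "(0..63)"; values follow on later lines
        # Drop the angle brackets that wrap string values.
        if value.startswith("<") and value.endswith(">"):
            value = value[1:-1]
        params[name.strip()] = value
    return params


sample = "##TITLE= Parameter file\n##$TD= 2048\n##$NUC1= <1H>\n##$SFO1= 600.13\n##$D= (0..63)\n0 1 2\n"
parsed = parse_parameters(sample)
```

Values are kept as strings here; a caller would convert TD to an integer and SFO1 to a float, and could stop with an error if a required parameter is missing.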

[1] Much of the Spectrum Translator has been written to support frequency domain data; however, this code has not undergone quality assurance testing and is considered experimental.

[2] VNMR User Programming VNMR 6.1C Software, Pub. No. 01-999165-00, Rev. A1200, (C) 2000, Varian, Inc.

[3] VNMR Command and Parameter Reference Varian NMR Spectrometer Systems With VNMR 6.1C Software, Pub. No. 01-999164-00, Rev. B0801, (C) 2001, Varian, Inc.

[4] TOPSPIN Acquisition Reference Guide, Part Number H9775SA1 V2/February 3rd 2005, (C) 2005 Bruker BioSpin GmbH