Getting data from the spectrometer and converting it to a numpy (python) array

by Scott Anthony Robson Ph. D. | March 11, 2019, 1:56 p.m.

Getting Data From A Bruker Spectrometer:

Bruker NMR instruments record data into a directory that contains a number of files: some describe how the data was acquired, and one holds the actual experimental data collected from the sample. The experimental data is contained in a single file: a Free Induction Decay or 'fid' file for 1D data, or a serial ('ser') file for multidimensional data. Experimental details, chemical shift referencing, frequencies and what-not are stored in other auxiliary files with consistent names. This means you usually need to load not only the 'fid' or 'ser' file but also additional information from these auxiliary files. In this article, we will deal mostly with reading in the actual experimental data from an FID or serial file.
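To give a feel for what one of these directories looks like, here is a tiny sketch that simply lists its contents. The path matches the data set used later in this article, and the exact files present vary with the experiment and TopSpin version, so treat the names in the comment as typical rather than guaranteed.

import os

data_dir = '3D_NNOESY_long_linear'       # a Bruker experiment directory
print(sorted(os.listdir(data_dir)))      # typically includes 'ser' (or 'fid' for 1D data),
                                         # 'acqus' (plus 'acqu2s', 'acqu3s' for the indirect
                                         # dimensions), 'pulseprogram' and a 'pdata' folder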

The FID or Serial File:

When data collection is initiated on a Bruker instrument, the full space needed for the data is allocated up front and written as a series of zeros. In the case of a single 1D experiment with a single FID, the data is written to disk at the end of the experiment. When multiple experiments (or increments of a multidimensional experiment) are recorded, each one is written sequentially to the fid or ser file as it finishes, so as the experiment progresses the zeros are overwritten with real data in order.

What makes up an FID?

A regular quadrature-detected FID is composed of a series of complex numbers. At the spectrometer, when you request 2048 points in the directly detected dimension, you get 1024 real and 1024 imaginary points stored in the serial (ser) file. Bruker stores these numbers as interleaved real and imaginary 32 bit (4 byte) integers: the first number is the first real point and the second number is the first imaginary point. So to construct the first FID from the serial file, we grab the first 2048 numbers into a vector (programmatically, an array); let's call it 'f' for now. f[0] + f[1]i is the first complex point, f[2] + f[3]i is the second complex point, and so on until we get to 1024 complex points. Let's do this using python code and some helpful python libraries. Each line of code is commented (a '#' followed by explanatory text), which should make it clear what's going on.

import numpy as np                       # import a library called numpy - we reference it as 'np'
byte_order = 1                           # byte order flag (Bruker's BYTORDA). 1 means the
                                         # integers are encoded as big endian in our example.
file = '3D_NNOESY_long_linear/ser'       # location of the Bruker serial file, 'ser'
with open(file, 'rb') as serial_file:    # method of opening the file. We now reference it 
                                         # as 'serial_file'
    if byte_order == 0:                  # byte order refers to big or little endian - 
                                         # different ways of storing 32 bit integers
                                         # Don't worry. In this example, byte_order is 1.
                                         # we'll show you later how this is detected.
        raw_data = np.frombuffer(        # we read the file data into an 'np' vector 
                                         # called 'raw_data'
            serial_file.read(),          # this is the command to read it
            dtype='<i4'                  # this says 'read it as little endian 32 bit integers'
        )                                # close off the reading function
    elif byte_order == 1:                # or if the byte order is 1 - which it is for this example
                                         # lets read it as little endian
        raw_data = np.frombuffer(        # read into raw data
            serial_file.read(),          # the serial file
            dtype='>i4'                  # with big endian encoding
        )                                # and close off the reading function

This will read the entire serial file into a variable called 'raw_data'. One thing we can do with this variable is count how many points are in it. And since we know there should be 2048 data points per FID (1024 real + 1024 imaginary), we can find out how many FIDs are in the file. 

data_length = len(raw_data)               # len() returns length of the raw data variable.
                                          # or just the number of data points in it
num_points_per_fid = 2048                 # our FIDs are composed of 1024 real and 
                                          # 1024 imaginary points. Total = 2048
print(data_length // num_points_per_fid)  # length divided by points per FID gives the
                                          # number of FIDs ('//' is integer division)

And the answer is 81600. This is a big data set! 

It turns out this is a 3D 15N edited NOE experiment. There are 68 (34 complex) points in the 15N dimension and 1200 (600 complex) points in the indirect 1H dimension. 68 * 1200 = 81600. So this checks out. 
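If you want to double-check that arithmetic with numpy rather than a calculator, you can reshape 'raw_data' so that each row holds one FID. This is just a sketch built on the variables defined above; the '-1' tells numpy to work out the number of rows for itself.

fids_2d = raw_data.reshape(-1, num_points_per_fid)   # one row per FID: 2048 interleaved integers
print(fids_2d.shape)                                  # prints (81600, 2048)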

For now, let's just look at the first FID. Numpy lets us access data within a vector like 'raw_data' easily using 'indexing', which is best described by example. We want the first FID, which is the first 2048 points of 'raw_data', with the real and imaginary points interleaved as described above. So we access them like this:

fid_data = raw_data[0:2048]      # python indices reference from 0. This grabs data
                                 # from the 0th point to the 2047th point (total 2048)
            
reals = fid_data[0::2]           # This grabs points 0 through 2046 (the reals)
                                 # skipping every second point.. 0, 2, 4, 6, 8...

imags = fid_data[1::2]           # This grabs points 1 through 2047 (the imaginaries)
                                 # skipping every second point.. 1, 3, 5, 7, 9...

fid = reals + 1.j*imags          # we reconstruct the complex FID by adding the reals to the
                                 # imaginaries. The imags variable is a vector of ordinary
                                 # numbers, so we multiply it by '1.j' - python's version of
                                 # the mathematical 'i' ('j' is the engineering convention)
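As an aside, the same even/odd slicing converts every FID in the serial file in one go if you apply it to the reshaped 2D array from the earlier sketch, slicing along the second axis instead:

all_fids = fids_2d[:, 0::2] + 1.j*fids_2d[:, 1::2]   # every second column: reals and imaginaries
print(all_fids.shape)                                 # prints (81600, 1024) - one complex FID per row

For the rest of this article, though, we will stick with the single 'fid' we just built.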

We can now use a library called matplotlib to plot the FID. 

import matplotlib.pyplot as plt              # import plotting library as plt
plt.figure(figsize=(12,8))                   # make a figure. Size it 12 x 8. 
plt.subplot(2, 1, 1)                         # create subplot (2x1) and this is the first (1)
plt.plot(np.real(fid),                       # plot the real part of fid. 
         c='r',                              # Colour it red ('r')
         label='real'                        # Label is 'real'
        )  
plt.legend()                                 # plot the legend (the label)

plt.subplot(2, 1, 2)                         # new subplot, same (2x1) format. This one is (2)
plt.plot(np.imag(fid),                       # plot the imaginary part of fid. 
         c='k',                              # color it black (k = black. It means 'key' )
         label='imaginary'                   # Label is 'imaginary'
        )
plt.legend()                                 # plot the legend (the label)

plt.show()                                   # show it!

And here is our FID, extracted from a Bruker serial (ser) file. 

Bruker Oversampling and Digital Filtering (Advanced Topic):

Now, you may notice that the first 100 or so points in these plots sit at zero before the trace ramps up to what looks like the beginning of the FID. This is a Bruker quirk that comes from how the data is recorded by the spectrometer. To get accurate samples, the digitizer oversamples: it records far more frequently than it really needs to, and those extra points are digitally filtered (averaged) down to give a smoother, more accurate measure of each stored point. That filtering introduces a 'group delay' of near-zero points at the start of the FID, which is what we are seeing. Exactly how the filter works is known to few people - and this author is not one of them. But we do know how to get the points we do need so our FIDs look like they should.
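Before dealing with this properly, a quick-and-dirty way to count those leading points is to ask numpy where the FID first rises above some small threshold. The 1% cutoff below is an arbitrary choice, purely for illustration:

threshold = 0.01 * np.abs(fid).max()        # anything below 1% of the maximum counts as 'zero'
print(np.argmax(np.abs(fid) > threshold))   # index of the first point above the threshold, i.e.
                                            # roughly how many leading filter points there are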

Below is how it is done in code. We will need the data, of course. We also need a number referred to as the Group Delay, or grpdly. Sometimes Bruker doesn't give us grpdly directly but instead gives us two parameters used to set up the digitizer: dspfvs and decim. What these mean is of little importance to us, but there is a one-to-one correspondence between each (dspfvs, decim) pair and a grpdly value, and those values are known. In python we can use a dictionary to map dspfvs and decim to a grpdly value when we don't have an explicit grpdly value to work with. Let's make a function that does this for us.

def dd2g(dspfvs, decim):              # def defines a function called dd2g. It takes in two 
                                      # variables, dspfvs and decim. 

    dspdic = {                        # we make a python dictionary which maps dspfvs and decim
                                      # values to a grpdly value
        10: {
            2: 44.75,
            3: 33.5,
            4: 66.625,
            6: 59.083333333333333,
            8: 68.5625,
            12: 60.375,
            16: 69.53125,
            24: 61.020833333333333,
            32: 70.015625,
            48: 61.34375,
            64: 70.2578125,
            96: 61.505208333333333,
            128: 70.37890625,
            192: 61.5859375,
            256: 70.439453125,
            384: 61.626302083333333,
            512: 70.4697265625,
            768: 61.646484375,
            1024: 70.48486328125,
            1536: 61.656575520833333,
            2048: 70.492431640625,
            },

        11: {
            2: 46.,
            3: 36.5,
            4: 48.,
            6: 50.166666666666667,
            8: 53.25,
            12: 69.5,
            16: 72.25,
            24: 70.166666666666667,
            32: 72.75,
            48: 70.5,
            64: 73.,
            96: 70.666666666666667,
            128: 72.5,
            192: 71.333333333333333,
            256: 72.25,
            384: 71.666666666666667,
            512: 72.125,
            768: 71.833333333333333,
            1024: 72.0625,
            1536: 71.916666666666667,
            2048: 72.03125
            },

        12: {
            2: 46.,
            3: 36.5,
            4: 48.,
            6: 50.166666666666667,
            8: 53.25,
            12: 69.5,
            16: 71.625,
            24: 70.166666666666667,
            32: 72.125,
            48: 70.5,
            64: 72.375,
            96: 70.666666666666667,
            128: 72.5,
            192: 71.333333333333333,
            256: 72.25,
            384: 71.666666666666667,
            512: 72.125,
            768: 71.833333333333333,
            1024: 72.0625,
            1536: 71.916666666666667,
            2048: 72.03125
            },

        13: {
            2: 2.75,
            3: 2.8333333333333333,
            4: 2.875,
            6: 2.9166666666666667,
            8: 2.9375,
            12: 2.9583333333333333,
            16: 2.96875,
            24: 2.9791666666666667,
            32: 2.984375,
            48: 2.9895833333333333,
            64: 2.9921875,
            96: 2.9947916666666667
            }
        }

    return dspdic[dspfvs][decim]          # grpdly is returned based on dspfvs and decim values
                                          # in the dictionary

OK, that's a lot of numbers. But this is needed if you can only find the dspfvs and decim values in the files in a Bruker data directory. If you can find the 'grpdly' value in those files, you should use it in the next step instead.

In our 3D_NNOESY_long_linear experiment we have the following values, but no grpdly was recorded. Generally the grpdly value is recorded in modern versions of Bruker's TopSpin software.

DECIM= 16 DSPFVS= 12

Using the dictionary above we see that these values give a grpdly of 71.625. 
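For completeness, here is a minimal sketch of how you might pull these values out of the experiment directory yourself. It assumes the usual plain-text 'acqus' file sits next to the 'ser' file and that its lines follow Bruker's '##$NAME= value' convention; 'read_acqus_param' is a little helper defined here, not a library function. This is also where the byte order flag (BYTORDA) used at the top of the article comes from. If grpdly comes back missing or not positive, fall back to the dd2g() lookup above.

def read_acqus_param(path, name):                      # pull one parameter out of an acqus file
    with open(path) as acqus_file:
        for line in acqus_file:
            if line.startswith('##$' + name + '='):    # parameter lines look like '##$DECIM= 16'
                return float(line.split('=')[1])
    return None                                        # parameter not present in this file

acqus = '3D_NNOESY_long_linear/acqus'                  # acquisition parameters, direct dimension
decim = int(read_acqus_param(acqus, 'DECIM'))          # 16 for this data set
dspfvs = int(read_acqus_param(acqus, 'DSPFVS'))        # 12 for this data set
grpdly = read_acqus_param(acqus, 'GRPDLY')             # may be absent or <= 0 in older data
byte_order = int(read_acqus_param(acqus, 'BYTORDA'))   # 0 = little endian, 1 = big endian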

Below, we define a function that does some exotic deconvolution of the Bruker digital sampling/filtering method. You don't really need to know how this works, so it is offered with no comments or explanations. Just know that the function takes in a complex FID called data and a grpdly value. It returns the data with the filter artefact removed, but truncated by a small number of points at the end. This really doesn't matter.

def remove_bruker_filter(data, grpdly):

    n = float(data.shape[-1])
    data = np.fft.fft(np.fft.ifftshift(data)) / n
    data = data * np.exp(2.j * np.pi * grpdly * np.arange(n) / n)
    data = np.fft.fftshift(np.fft.ifft(data)) * n
    skip = int(np.floor(grpdly + 2))    
    add = int(max(skip - 6, 0))           
    data[..., :add] = data[..., :add] + data[..., :-(add + 1):-1]
    data = data[..., :-skip]

    return data

If we put our first FID into this function, we will see it comes back truncated by about 70 points.

decim = 16
dspfvs = 12

grpdly = dd2g(dspfvs, decim)                        # set up grpdly
print(grpdly)                                       # prints 71.625

print(len(fid))                                     # print length of FID before filter removal
fid_adjusted = remove_bruker_filter(fid, grpdly)    # do the filter removal
print(len(fid_adjusted))                            # print length after filter removal

We see that the length before was 1024 complex points and after it is 951 points. The function drops the last floor(grpdly + 2) = 73 points, and 1024 - 73 = 951.

Let's plot the before and after so we get a feel for what has happened:

plt.figure(figsize=(12,8))                   # make a figure. Size it 12 x 8. 
plt.subplot(4, 1, 1)                         # create subplot grid (4x1) and this is the first (1)
plt.plot(np.real(fid),                       # plot the real part of fid before. 
         c='r',                              # Colour it red ('r')
         label='real FID before'             # Label is 'real FID before'
        )  
plt.legend()                                 # plot the legend (the label)

plt.subplot(4, 1, 2)                         # same (4x1) grid, this is the second (2)
plt.plot(np.imag(fid),                       # plot the imaginary part of fid before. 
         c='r',                              # Colour it red ('r')
         label='imaginary FID before'        # Label is 'imaginary FID before'
        )  
plt.legend()                                 # plot the legend (the label)

plt.subplot(4, 1, 3)                         # same (4x1) grid, this is the third (3)
plt.plot(np.real(fid_adjusted),              # plot the real part of fid after adjustment. 
         c='k',                              # color it black (k = black. It means 'key' )
         label='real FID after'              # Label is 'real FID after'
        )
plt.legend()                                 # plot the legend (the label)

plt.subplot(4, 1, 4)                         # same (4x1) grid, this is the fourth (4)
plt.plot(np.imag(fid_adjusted),              # plot the imaginary part of fid after adjustment. 
         c='k',                              # color it black (k = black. It means 'key' )
         label='imaginary FID after'         # Label is 'imaginary FID after'
        )
plt.legend()                                 # plot the legend (the label)


plt.show()                                   # show it!

Which gives:

And we see that the initial part of the FID that followed the zero line is now gone for both the real and imaginary components. This is the proper FID after filter removal. The FID is now ready for Fourier Transformation by the Discrete Fourier Transform algorithm, so we can look at the frequencies present in the signal (the chemical shifts of the atoms in the molecule). 

A close comparison of before and after, and of the real versus imaginary components, suggests the real and imaginary components have been flipped. This is not some clerical error in the code. The Bruker digital filter is very bizarre from an outsider's point of view. It is not as simple as just truncating the first 60-70 or so points. We just have to accept it. Varian/Agilent data is not stored like this. 


In the next section we will learn a little about Fourier Transforms, how they work in Python/Numpy and how we plot the results in Matplotlib.

