get_wfm_windows.py 20.6 KB
Newer Older
1
2
from __future__ import division, print_function
import numpy as np
3
from mantid.simpleapi import *
4
from scipy.ndimage import gaussian_filter1d
Neil Vaytet's avatar
Neil Vaytet committed
5
6
7
8
try:
    from scipy.signal import find_peaks
except ImportError:
    print("Warning: scipy.signal.find_peaks could not be imported. "
Neil Vaytet's avatar
Neil Vaytet committed
9
10
          "Some functionality will be missing. "
          "Install scipy>=1.1.0 to fix this problem.")
11
12
13
14
15
16
17
18
19

################################################################################
################################################################################
################################################################################

""" Wave-frame multiplication window edge finder

    Author: Neil Vaytet, European Spallation Source, <neil.vaytet@esss.se>
    Date: 11/2018
20
    Last modified: 05/2019 : now using scipy to find peaks
21
22


23
24
    This file contains a method to automatically find the WFM frame edges, given
    some spectrum of intensity/counts vs time of flight.
25
26
27
28
29
30
31
32
33
34
35
36


    Examples:
    ---------

    1. Reading in a data file: the file must contain two columns: the first is
       TOF data (x) and the second is the amplitude (y). All other lines should
       be commented with a '#' sign so that `np.loadtxt()` ignores them.

       python get_wfm_windows.py --filename=spectrum.txt --plot

       This will print the window edges to standard output and produce an image
Neil Vaytet's avatar
Neil Vaytet committed
37
38
       wfm_auto_frames.pdf showing the different variables that were used to
       find the window limits.
39
40
41
42
43


    2. Calling from another python script: you must have a 2D array containing
       the x and y data.

Neil Vaytet's avatar
Neil Vaytet committed
44
       left_edges, right_edges = get_wfm_windows(data=my_2D_data_array, plot=True)
45
46
47
48
49


    3. Changing the thresholds: there are two different thresholds.

       - bg_threshold is used to find the global left and right edges of the
50
51
52
53
54
55
56
57
58
         signal over the background. To determine the background, we histogram
         the data in 50 intensity bins and assume that the background is the bin
         that contains the intensity which is most common in the data.
         The leading edge is the first point that exceeds the background value
         is first point whose value exceeds:
           background_value + bg_threshold * (ymax - background_value)
         where ymax is the maximum intensity in the data.
         The trialing edge is found in the same way, starting from the right
         end of the spectrum.
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
         The default value is bg_threshold = 0.05.

       - win_threshold is used to find the left and right edges of each pulse
         frame. We iterate to one side starting from the valley center. We will
         first iterate towards the right, to find the leading edge of the next
         window. The mean y value between this valley and the next one (`mean`)
         is computed. The window edge is the first value that exceeds the a
         fraction of the mean: `y > win_threshold * mean`.
         The default value is win_threshold = 0.3.

       python get_wfm_windows.py --filename=spectrum.txt --bg_threshold=0.1 \
           --win_threshold=0.5

    4. Get the output as an array of strings as direct input for Rebin:

Neil Vaytet's avatar
Neil Vaytet committed
74
       frame_parameters = get_wfm_windows(data=my_data, \
75
76
77
78
79
80
81
82
           rebin_step_for_string_output=64)

"""

################################################################################
################################################################################
################################################################################

Neil Vaytet's avatar
Neil Vaytet committed
83
def check_peaks(peaks, nwindows):
84
85
86
87
    """
    This function check if the number of peaks found is the correct one.
    """
    nvalleys = len(peaks)-2
88
89
90
91
92
93
94
95
96
    if nvalleys > 2*(nwindows - 1):
        print("Way too many valleys: should be {}, found {}.".format(nwindows-1, nvalleys))
        return 3 # Continue looping
    elif nvalleys > (nwindows - 1):
        print("Too many valleys: should be {}, found {}.".format(nwindows-1, nvalleys))
        return 2 # Continue looping
    elif nvalleys < (nwindows - 1):
        print("Too few valleys: should be {}, found {}.".format(nwindows-1, nvalleys))
        return 1 # Continue looping
97
98
    else:
        print("OK: Found {} valleys.".format(nvalleys))
99
        return 0 # Exit the loop
100
101
102
103
104

################################################################################
################################################################################
################################################################################

105
106
107
def get_wfm_windows(workspace=None, data=None, filename=None, nwindows=6,
                    bg_threshold=0.05, win_threshold=0.3, plot=False, gsmooth=2,
                    xrange=None, output_format="tof", rebin_parameters=None):
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136

    """
    Get the left and right edges of wave-frame multiplication windows.

    The parameters are:

    - filename: the text file to read containing the data

    - nwindows: the number of pulse windows (6 by default)

    - bg_threshold: the percentage above which y values are no longer considered
                    as background

    - win_threshold: the minimum percentage of window average to each when
                     searching for window edges

    - plot: plot figure if True

    - gsmooth: width of Gaussian kernel to smooth the data with. If gmsooth = 0
               (its default value), then a guess is performed based on the
               number of data points. If it is set to `None` then no smoothing
               is performed.

    - xrange: an array or tuple containing 2 elements which are the start and
              end TOF values to be considered for analysis. The data outside
              these bounds is ignored. You can set one of the limits to None
              if you want to have no limit on that end.
              For example: xrange=(1000,55000) or xrange=[5000,None]

Neil Vaytet's avatar
Neil Vaytet committed
137
138
139
140
141
142
143
144
145
146
147
148
149
    - output_format: a string. Can be 3 different things:
                     - "TOF" or "tof": returns an array of floats containing the
                         time-of-flight boundaries for the windows in the format
                         [frame1_left_edge, frame1_right_edge, frame2_left_edge, frame2_right_edge, ...]
                     - "indices": returns the indices of the left and right
                         edges as arrays of values.
                     - "rebin:64": If in this format, specified, the function
                         will return an array of strings that are to be used as
                         the input for the Rebin algorithm. The number given
                         after the colon will appear as the bin step in the
                         middle of each range. For example, if the number
                         supplied is 64, the output will look like:
                         ['15000,64,20000', '25000,64,30000', ...]
150
151
    """

152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
    if (workspace is None) and (data is None) and (filename is None):
        raise RuntimeError("Either workspace, data, or filename must be defined!")

    if workspace is not None:
        summed = SumSpectra(workspace)
        if rebin_parameters is None:
            summed_x = summed.extractX()[0]
            tof_min = np.amin(summed_x)
            tof_max = np.amax(summed_x)
            nbins = 512
            binsize = (tof_max-tof_min) / float(nbins)
            rebin_parameters = '{},{},{}'.format(tof_min, binsize, tof_max)
        summed = Rebin(summed, rebin_parameters, PreserveEvents=False)
        nz = summed.blocksize()
        data = np.zeros([nz,2])
        summed_x = summed.extractX()[0]
        summed_y = summed.extractY()[0]
        data[:,0] = 0.5*(summed_x[1:]+summed_x[:-1])
        data[:,1] = summed_y
        DeleteWorkspace(summed)
    elif filename is not None:
        data = np.loadtxt(filename)

175
176
177
178
179
    nx = np.shape(data)[0]
    print(nx)

    x = data[:,0]
    y = data[:,1]
Neil Vaytet's avatar
Neil Vaytet committed
180

181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
    # Smooth the data with a gaussian filter
    if gsmooth is not None:
        y = gaussian_filter1d(y, gsmooth)

    # If xrange is defined, then take that into account
    lbound = 0
    rbound = nx - 1
    if xrange is not None:
        if len(xrange) != 2:
            raise RuntimeError("xrange must contain exactly 2 elements")
        else:
            if xrange[0] is not None:
                for i in range(nx):
                    if x[i] >= xrange[0]:
                        lbound = i
                        break
            if xrange[1] is not None:
                for i in range(nx-1,1,-1):
                    if x[i] <= xrange[1]:
                        rbound = i
                        break

    # Find min and max values
    ymin = np.amin(y[lbound:rbound+1])
    ymax = np.amax(y[lbound:rbound+1])
Neil Vaytet's avatar
Neil Vaytet committed
206

207
208
209
    # Determine background threshold by histogramming data
    hist, edges = np.histogram(y[lbound:rbound+1], bins=50)
    bg_raw = edges[np.argmax(hist) + 1]
210
    bg_value = bg_raw + bg_threshold * (ymax - bg_raw)
211
212
213
    hmin = np.amin(hist)
    hmax = np.amax(hist)

214
    # Find the leading and trailing edges; i.e. the leftmost and rightmost
Neil Vaytet's avatar
Neil Vaytet committed
215
    # points that exceed the background value
216
217
218
    i_start = 0
    i_end = nx-1
    for i in range(lbound,nx):
Neil Vaytet's avatar
Neil Vaytet committed
219
        if y[i] > bg_value:
220
221
222
            i_start = i
            break
    for i in range(rbound,1,-1):
Neil Vaytet's avatar
Neil Vaytet committed
223
        if y[i] > bg_value:
224
225
226
            i_end = i
            break
    print(i_start,i_end, lbound, rbound)
227
228
229
230
231
232

    # Fine the valleys using the scipy find_peaks tool.
    # The prominence parameter is probably the most relevant here.
    prominence = 0.05
    delta_prominence = 1.5
    loop = 2
233
234
    max_loops = 10
    nloops = 0
235
    while loop > 0:
236
        nloops += 1
237
238
239
240
241
242
243
244
245
246
247
248
        print("The peak prominence is:", prominence)
        # Find valleys using `detect_peaks` function from scipy.signal
        peaks, _ = find_peaks(-y[lbound:rbound+1], prominence=prominence*(ymax-ymin))
        # Added start and end to peaks array
        peaks = np.concatenate(([i_start], peaks+lbound, [i_end]))
        # Check how many peaks were found
        loop = check_peaks(peaks, nwindows)
        # If loop is 3, it means way too many peaks were found, so we re-run
        # the peak search having increased the prominence
        if loop == 3:
            print("Increasing the peak prominence")
            prominence *= delta_prominence
249
        # Try to remove peaks if there are too many
250
        if loop == 2:
251
            print("Too many peaks found. Attempting automatic dismissal")
252
253
            # if the value at the bottom of the valley is over the lowest window
            # average, then this is not a real valley
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
            removed_a_peak = True
            while removed_a_peak:
                removed_a_peak = False
                for p in range(1, len(peaks)-1):
                    rmean = np.average(y[peaks[p]:peaks[p+1]])
                    lmean = np.average(y[peaks[p-1]:peaks[p]])
                    mean = min(lmean,rmean)
                    if y[peaks[p]] > mean:
                        print("Removed peak number {} at x,y = {},{}".format(p,x[peaks[p]],y[peaks[p]]))
                        peaks = np.delete(peaks, p)
                        removed_a_peak = True
                        break
            # Check peaks again
            loop = check_peaks(peaks, nwindows)
        # Try to reduce the peak prominence if there are too few peaks
        if loop == 1:
            print("Reducing the peak prominence")
            prominence /= delta_prominence
        # Exit if maximum number of loops has been reached
273
        if nloops > max_loops:
274
            loop = 0
275
276
277
278
279
280
281
282
283
284
285

    # Now for each valley, iterate to one side starting from the valley center and
    # find the window edge. We start from the first valley, which is the second
    # element of the `good_peaks` array because the first is the global leading
    # edge.
    # We first iterate towards the right, to find the leading edge of the next
    # window.
    # The mean y value between this valley and the next one (`mean`) is computed.
    # The window edge is the first value that exceeds the a fraction of the mean:
    # `y > win_threshold * mean`.

286
287
288
289
290
291
292
293
294
    # Compute frame means once
    y_minus_background = y - bg_value
    frame_mean = []
    for p in range(nwindows):
        frame_mean.append(np.average(
            y_minus_background[peaks[p]:peaks[p+1]][np.where(
                y_minus_background[peaks[p]:peaks[p+1]] > 0.0)]))
        print(frame_mean[-1])

295
296
297
    # Define left and right window edges
    ledges = [i_start]
    redges = []
298
    for p in range(1,len(peaks)-1):
299

300
        # Towards the right: Find left edge iterating towards the right
301
        for i in range(peaks[p]+1,peaks[p+1]):
302
            if y_minus_background[i] >= win_threshold*frame_mean[p]:
303
304
305
                ledges.append(i)
                break

306
        # Towards the left: Find left edge iterating towards the right
307
        for i in range(peaks[p]-1,peaks[p-1],-1):
308
            if y_minus_background[i] >= win_threshold*frame_mean[p-1]:
309
310
311
312
313
314
315
316
317
318
319
320
321
322
                redges.append(i)
                break

    # Remember to append the global trailing edge
    redges.append(i_end)

    print("The frame boundaries are the following:")
    for i in range(len(ledges)):
        print("{}, {}".format(x[ledges[i]],x[redges[i]]))

    if plot:
        import matplotlib.pyplot as plt
        from matplotlib.patches import Rectangle
        fig = plt.figure()
323
324
325
326
        ratio = 0.35
        sizex = 20.0
        fig.set_size_inches(sizex, ratio*sizex)
        ax = fig.add_subplot(121)
327
328
        ax.grid(True, color='gray', linestyle="dotted")
        ax.set_axisbelow(True)
329
330
        colors = ["r","g","b","magenta","cyan","orange"]
        for i in range(len(ledges)):
Neil Vaytet's avatar
Neil Vaytet committed
331
            ax.add_patch(Rectangle((x[ledges[i]], ymin), (x[redges[i]]-x[ledges[i]]), (ymax-ymin), facecolor=colors[i%6], alpha=0.5))
332
333
334
335
        for p in range(len(peaks)-1):
            rmean = np.average(y_minus_background[peaks[p]:peaks[p+1]])
            ax.plot([x[peaks[p]],x[peaks[p+1]]],[rmean+bg_value,rmean+bg_value],color="lime", lw=2)
            ax.plot([x[peaks[p]],x[peaks[p+1]]],[win_threshold*rmean+bg_value,win_threshold*rmean+bg_value], color="sienna", lw=2)
336
337
        ax.plot(x, data[:,1], color="k", lw=2, label="Raw data")
        ax.plot(x, y, color="lightgrey", lw=1, label="Smoothed data")
338
        ax.plot([np.amin(x),np.amax(x)], [bg_value, bg_value], "--", color="pink", lw=1, label="Background threshold")
339
340
341
342
343
344
        ax.plot([x[peaks[0]],x[peaks[1]]], [-ymax,-ymax], color="lime", label="Window mean", lw=2)
        ax.plot([x[peaks[0]],x[peaks[1]]], [-ymax,-ymax], color="sienna", label="Window mean", lw=2)
        for p in range(1,len(peaks)-1):
            ax.plot(x[peaks[p]], y[peaks[p]], "o", color="r")
        ax.plot(x[peaks[0]], y[peaks[0]], "o", color="deepskyblue", label="Leading edge")
        ax.plot(x[peaks[-1]], y[peaks[-1]], "o", color="yellow", label="Trailing edge")
Neil Vaytet's avatar
Neil Vaytet committed
345
        ax.set_xlabel("Arrival time at detector")
346
347
        ax.set_ylabel("Amplitude")
        ax.set_ylim([0.0,1.05*np.amax(data[:,1])])
348
        ax.plot(x[peaks[0]], -ymax, "o", color="r",label="Valleys")
349
350
        ax.legend(loc=(0,1.02),ncol=4, fontsize=10)
        ax2 = fig.add_subplot(122)
351
352
353
        ax2.set_axisbelow(True)
        ax2.grid(True, color='gray', linestyle="dotted")
        ax2.hist(y[lbound:rbound+1], bins=edges, orientation='horizontal', label="Amplitude frequency")
354
355
        ax2.set_xlim([hmin,1.05*hmax])
        ax2.set_ylim([0.0,1.05*np.amax(data[:,1])])
356
        ax2.plot([hmin, hmax], [bg_raw, bg_raw], "--", color="r", lw=3, label="Background estimation")
357
358
359
360
        ax2.plot([hmin, hmax], [bg_value, bg_value], "--", color="pink", lw=1, label="Background threshold")
        ax2.set_xlabel("Counts in background search")
        ax2.set_ylabel("Amplitude")
        ax2.legend(loc=(0,1.02),ncol=4, fontsize=10)
361
362
363
364
365
        if isinstance(plot, str):
            fname = plot
        else:
            fname = "wfm_auto_frames.pdf"
        fig.savefig(fname,bbox_inches="tight")
366

Neil Vaytet's avatar
Neil Vaytet committed
367
368
369
370
    # Format the output
    output_format = output_format.lower() # convert to lowercase
    output = []
    if output_format.startswith("rebin"):
371
        rebin_step = output_format.split(":")[1]
Neil Vaytet's avatar
Neil Vaytet committed
372
        # If we want output in Rebin format, then construct the array of strings
373
        for i in range(len(ledges)):
374
            output.append("{},{},{}".format(x[ledges[i]],rebin_step,x[redges[i]]))
Neil Vaytet's avatar
Neil Vaytet committed
375
376
377
378
    elif output_format == "tof":
        # Return the tofs as float
        for i in range(len(ledges)):
            output.extend([x[ledges[i]],x[redges[i]]])
379
    else:
Neil Vaytet's avatar
Neil Vaytet committed
380
381
382
383
        # If not, the indices of the edges are returned
        for i in range(len(ledges)):
            output.extend([ledges[i],redges[i]])
    return output
384
385
386
387
388
389
390
391
392
393
394
395

################################################################################
################################################################################
################################################################################

if __name__ == "__main__":

    import argparse

    parser = argparse.ArgumentParser(
        description="Automatically detect wave-frame multiplication windows")

396
397
398
    parser.add_argument("-w", "--workspace", default=None, dest="workspace",
                        help="A Mantid workspace containig the WFM data.")

399
    parser.add_argument("-d", "--data", type=str, default=None, dest="data",
400
                        help="A 2D data array to process.")
401

402
403
404
    parser.add_argument("-f", "--filename", type=str, default=None,
                        dest="filename",
                        help="The name of the file to process.")
405

406
407
408
    parser.add_argument("-n", "--nwindows", type=int, default=6,
                        dest="nwindows",
                        help="The number of windows to be found.")
409

410
411
412
    parser.add_argument("-b", "--bg-threshold", type=float, default=0.05,
                        dest="bg_threshold",
                        help="Threshold above which we are no longer in "
413
414
                        "background signal, as a percentage of (ymax - ymin).")

415
416
417
418
    parser.add_argument("-t", "--win-threshold", type=float, default=0.3,
                        dest="win_threshold",
                        help="Threshold to find window edge, as a percentage "
                        "of average signal inside window.")
419
420

    parser.add_argument("-g", "--gsmooth", type=int, default=0, dest="gsmooth",
421
422
423
                        help="The width of the Gaussian kernel to smooth the "
                        "data. If 0 (the default), then a guess is made based "
                        "on the resolution of the data set. If None, then no "
424
425
426
                        "smoothing is carried out.")

    parser.add_argument("-x", "--xrange", default=None, dest="xrange",
427
428
429
430
431
                        help="An array or tuple containing 2 elements which "
                        "are the start and end TOF values to be considered "
                        "for analysis. The data outside these bounds is "
                        "ignored. You can set one of the limits to None if "
                        "you want to have no limit on that end. For example: "
432
433
                        "xrange=(1000,55000) or xrange=[5000,None].")

434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
    parser.add_argument("-p", "--plot", type=str, default=None,
                        help="Name of the file for saving a figure showing "
                        "the frames.")

    parser.add_argument("-o", "--output-format", type=str, default=None,
                        dest="output_format",
                        help="A string. Can be 3 different things:\n"
                        "  - \"TOF\" or \"tof\": returns an array of floats "
                        "containing the time-of-flight boundaries for the "
                        "windows in the format [frame1_left_edge, "
                        "frame1_right_edge, frame2_left_edge, "
                        "frame2_right_edge, ...]\n"
                        "  - \"indices\": returns the indices of the left and"
                        "right edges as arrays of values.\n"
                        "  - \"rebin:64\": If in this format, the function "
                        "will return an array of strings that are to be used "
                        "as the input for the Rebin algorithm. The number "
                        "given after the colon will appear as the bin step in "
                        "the middle of each range. For example, if the number"
                        "supplied is 64, the output will look like:"
                        "['15000,64,20000', '25000,64,30000', ...]")
455
456
457

    options = parser.parse_args()

458
459
    output = get_wfm_windows(workspace=options.workspace,
                             data=options.data,
Neil Vaytet's avatar
Neil Vaytet committed
460
461
462
463
464
465
                             filename=options.filename,
                             nwindows=options.nwindows,
                             bg_threshold=options.bg_threshold,
                             win_threshold=options.win_threshold,
                             gsmooth=options.gsmooth,
                             xrange=options.xrange,
466
                             output_format=options.output_format,
Neil Vaytet's avatar
Neil Vaytet committed
467
                             plot=options.plot)