aquavitae

February 13th, 2008, 08:57 AM

I've been asked at work to write a script to do some calculations. The input is a text file containing records of the levels in a river, i.e. time - value pairs. I need to do a fast Fourier transform on it, which means first resampling the data onto equal time steps. I don't know much about the statistics behind it all, but apparently the time step I should achieve is the peak in the histogram of the times. At the moment my program looks like this (pseudocode):

times, values = extract values from file # i.e. [time0, time1], [value0, value1]

histogram = homemade_histogram_function(times)

time_step = peak(histogram)

new_times, new_values = interpolate(times, values, time_step)

output = numpy.fft.fft(new_values)

The main problem with this is the size of the input data: an average input file is about 200 MB and contains millions of rows, so each loop runs a few million times and the whole script takes about a day to run.

My first question is, is it necessary to do the histogram, or is there a quicker way of estimating the time_step without much impact on the fft?
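To make the first question concrete, here is a rough sketch of one cheaper candidate: the median of the successive time differences. It assumes that "histogram of the times" means a histogram of the intervals between consecutive records, and that the timestamps are already in a NumPy array. The function name `estimate_time_step` is made up for illustration. The median only matches the histogram peak when more than half of the intervals share the same step, so it is an approximation rather than a drop-in replacement.

```python
import numpy as np

def estimate_time_step(times):
    """Estimate the sampling interval as the median of successive differences.

    Hypothetical helper: assumes `times` is sorted and numeric.
    """
    times = np.asarray(times, dtype=float)
    deltas = np.diff(times)          # intervals between consecutive records
    deltas = deltas[deltas > 0]      # ignore duplicate timestamps
    return float(np.median(deltas))

# Mostly 15-unit steps with one 30-unit gap
times = np.array([0.0, 15.0, 30.0, 45.0, 75.0, 90.0, 105.0])
step = estimate_time_step(times)
```

Both `np.diff` and `np.median` run in compiled code, so there is no Python-level loop over the rows.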

The second question is, if I do need to calculate the peak of the histogram, what's the fastest way of doing it in Python (using NumPy)? I don't really want to have to do this in C to get the speed!
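For reference, the whole pipeline can probably be written without explicit Python loops. The following is only a sketch under some assumptions: the histogram is taken over the intervals between consecutive times, intervals are rounded to a `resolution` parameter (a made-up name) before counting, and linear interpolation is acceptable for the resampling. The helper names `peak_time_step` and `resample_and_fft` are hypothetical, not part of NumPy.

```python
import numpy as np

def peak_time_step(times, resolution=1.0):
    """Most common interval between records, rounded to `resolution`."""
    deltas = np.diff(np.asarray(times, dtype=float))
    deltas = deltas[deltas > 0]                      # drop duplicate timestamps
    binned = np.round(deltas / resolution).astype(np.int64)
    bins, counts = np.unique(binned, return_counts=True)
    return float(bins[np.argmax(counts)] * resolution)

def resample_and_fft(times, values, time_step):
    """Linearly interpolate onto an even grid, then take a real-input FFT."""
    times = np.asarray(times, dtype=float)           # must be increasing
    values = np.asarray(values, dtype=float)
    new_times = np.arange(times[0], times[-1] + 0.5 * time_step, time_step)
    new_values = np.interp(new_times, times, values)
    spectrum = np.fft.rfft(new_values)
    freqs = np.fft.rfftfreq(new_values.size, d=time_step)
    return new_times, new_values, freqs, spectrum

times = np.array([0.0, 15.0, 30.0, 45.0, 75.0, 90.0, 105.0])
values = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0])

time_step = peak_time_step(times)
new_times, new_values, freqs, spectrum = resample_and_fft(times, values, time_step)
```

`np.unique`, `np.interp` and `np.fft.rfft` all operate on whole arrays, so a few million rows should be processed far faster than with per-row loops. Reading the file itself may then become the slowest step.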
