japhyr
April 8th, 2009, 11:01 PM
Hello everyone,
I have been wanting to learn bash scripting for a while, so I decided to learn by writing a script that will help a friend with a research project (http://www.sitkanature.org/wordpress/2009/03/16/starrigavan-temperatures/) he is involved in. The project is to compare temperature data from specific places around our town to the temperatures that are recorded at the local airport. We are in a small coastal mountain town, so the airport temperature is the only one that is ever reported. We would like to know how much temperatures around town vary from this reported temperature.
The script should accept a start date and end date. It should then download the weather data from weather underground for all dates between these two dates, and write them to one text file.
My script so far collects the right data, but I am getting lost in how to use regular expressions in bash to pick out the right lines and write the data neatly to a text file. I get lost in all the reading about sed, awk, pipes, formatting, etc. I have used regexp's before in some other languages, so they are not brand new to me. Can someone clarify:
How should I use regexp's to pick out the right lines?
How can I rewrite the lines to a text file, with the date added at the beginning of each line?
How can I format the output so it is comma-separated but also tabbed so it is visually simple?
I know I will need to add some logic dealing with days of the month, and some logic to accept input rather than hard-coding the start and end dates. That I can take care of later.
There is no rush on this project. Right now it is mostly for my own learning, but the end product will be useful to my friend who is doing the project. Thank you very much for any feedback.
My script looks like this:
#!/bin/sh
# Script for downloading weather history
# Set first day to retrieve data
day="1"
month="4"
year="2009"
# Set last day to retrieve data
last_day="3"
# Set start and end date, build filename, and create file
start_date=$year/$month/$day
end_date=$year/$month/$last_day
wxhx_filename="wxhx_$month-$day-${year}_$month-$last_day-$year.txt"
echo "Data will be stored in file: $wxhx_filename"
> $wxhx_filename
# Create temp file for day data
> wx_day_data.txt
# Begin retrieval loop
while [ $day -lt $(($last_day+1)) ]
do
# Build date
date=$year/$month/$day
# Set url for weatherunderground
url="http://www.wunderground.com/history/airport/PASI/$date/DailyHistory.html?req_city=Sitka&req_state=AK&req_statename=Alaska&format=1"
# Get the data from the url
wget $url -o wx_log.txt -O wx_day_data.txt
# Print date in data file
echo "$month/$day/$year" >> $wxhx_filename
while read inputline
do
echo $inputline
grep ^.*[,],*[,] $inputline | awk 'BEGIN { FS="," } { print " " $1 "\t" $2 }' >> $wxhx_filename
done < wx_day_data.txt
echo "Processed data for $date"
day=$(($day+1))
done
Each day's data looks like this:
TimeAKDT,TemperatureF,Dew PointF,Humidity,Sea Level PressureIn,VisibilityMPH,Wind Direction,Wind SpeedMPH,Gust SpeedMPH,PrecipitationIn,Events,Conditions<br />
12:53 AM,30.0,25.0,82,30.15,10.0,East,5.8,-,N/A,,Clear<br />
1:53 AM,28.9,25.0,85,30.15,10.0,ESE,3.5,-,N/A,,Clear<br />
2:53 AM,28.0,25.0,88,30.16,10.0,East,3.5,-,N/A,,Clear<br />
3:53 AM,28.9,24.1,82,30.15,10.0,East,5.8,-,N/A,,Clear<br />
4:53 AM,28.9,24.1,82,30.14,10.0,SE,3.5,-,N/A,,Clear<br />
5:53 AM,28.9,23.0,78,30.13,10.0,East,5.8,-,N/A,,Clear<br />
6:53 AM,28.9,21.9,75,30.11,10.0,East,3.5,-,N/A,,Clear<br />
7:53 AM,30.9,21.9,69,30.09,10.0,ESE,4.6,-,N/A,,Clear<br />
8:53 AM,34.0,19.9,56,30.08,10.0,East,6.9,-,N/A,,Clear<br />
9:53 AM,36.0,19.9,52,30.06,10.0,ESE,11.5,-,N/A,,Partly Cloudy<br />
10:53 AM,37.0,21.0,52,30.04,10.0,ESE,10.4,-,N/A,,Clear<br />
11:53 AM,37.9,21.0,51,30.02,10.0,East,9.2,-,N/A,,Mostly Cloudy<br />
12:53 PM,39.0,21.0,49,30.00,10.0,East,10.4,-,N/A,,Overcast<br />
1:53 PM,39.0,21.0,49,29.98,10.0,ESE,12.7,-,N/A,,Overcast<br />
2:53 PM,39.9,21.0,47,29.97,10.0,East,13.8,-,N/A,,Mostly Cloudy<br />
3:53 PM,39.0,21.0,49,29.94,10.0,ESE,15.0,20.7,N/A,,Scattered Clouds<br />
4:53 PM,39.0,21.9,50,29.88,10.0,ESE,19.6,28.8,N/A,,Scattered Clouds<br />
5:53 PM,39.0,23.0,53,29.86,10.0,ESE,19.6,27.6,N/A,,Overcast<br />
6:53 PM,39.0,24.1,55,29.85,10.0,ESE,11.5,23.0,N/A,,Overcast<br />
7:53 PM,37.9,25.0,60,29.84,10.0,ESE,18.4,25.3,N/A,,Overcast<br />
8:53 PM,37.0,27.0,67,29.82,10.0,ESE,17.3,-,N/A,,Overcast<br />
9:53 PM,37.9,27.0,65,29.80,10.0,SE,20.7,26.5,N/A,,Overcast<br />
10:53 PM,37.0,28.0,70,29.79,10.0,ESE,20.7,28.8,N/A,,Overcast<br />
11:01 PM,37.4,28.4,70,29.79,10.0,ESE,17.3,28.8,N/A,,Overcast<br />
11:22 PM,37.4,28.4,70,29.78,10.0,ESE,16.1,28.8,N/A,,Overcast<br />
11:53 PM,37.0,28.0,70,29.76,10.0,ESE,17.3,27.6,N/A,,Overcast<br />
<!-- 0.151:0 -->
The current output looks like this:
4/1/2009
4:53 AM 34.0
5:53 AM 34.0
6:53 AM 34.0
7:53 AM 34.0
8:53 AM 37.0
9:53 AM 37.0
10:53 AM 39.0
11:53 AM 39.9
12:53 PM 41.0
1:53 PM 43.0
2:53 PM 42.1
3:53 PM 41.0
4:53 PM 39.9
5:53 PM 39.0
6:53 PM 39.0
7:53 PM 37.0
8:53 PM 37.0
9:53 PM 36.0
10:53 PM 36.0
4/2/2009
12:24 AM 33.8
3:00 AM 33.8
3:53 AM 34.0
5:53 AM 33.1
6:53 AM 34.0...
I have been wanting to learn bash scripting for a while, so I decided to learn by writing a script that will help a friend with a research project (http://www.sitkanature.org/wordpress/2009/03/16/starrigavan-temperatures/) he is involved in. The project is to compare temperature data from specific places around our town to the temperatures that are recorded at the local airport. We are in a small coastal mountain town, so the airport temperature is the only one that is ever reported. We would like to know how much temperatures around town vary from this reported temperature.
The script should accept a start date and end date. It should then download the weather data from weather underground for all dates between these two dates, and write them to one text file.
My script so far collects the right data, but I am getting lost in how to use regular expressions in bash to pick out the right lines and write the data neatly to a text file. I get lost in all the reading about sed, awk, pipes, formatting, etc. I have used regexp's before in some other languages, so they are not brand new to me. Can someone clarify:
How should I use regexp's to pick out the right lines?
How can I rewrite the lines to a text file, with the date added at the beginning of each line?
How can I format the output so it is comma-separated but also tabbed so it is visually simple?
I know I will need to add some logic dealing with days of the month, and some logic to accept input rather than hard-coding the start and end dates. That I can take care of later.
There is no rush on this project. Right now it is mostly for my own learning, but the end product will be useful to my friend who is doing the project. Thank you very much for any feedback.
My script looks like this:
#!/bin/sh
# Script for downloading weather history
# Set first day to retrieve data
day="1"
month="4"
year="2009"
# Set last day to retrieve data
last_day="3"
# Set start and end date, build filename, and create file
start_date=$year/$month/$day
end_date=$year/$month/$last_day
wxhx_filename="wxhx_$month-$day-${year}_$month-$last_day-$year.txt"
echo "Data will be stored in file: $wxhx_filename"
> $wxhx_filename
# Create temp file for day data
> wx_day_data.txt
# Begin retrieval loop
while [ $day -lt $(($last_day+1)) ]
do
# Build date
date=$year/$month/$day
# Set url for weatherunderground
url="http://www.wunderground.com/history/airport/PASI/$date/DailyHistory.html?req_city=Sitka&req_state=AK&req_statename=Alaska&format=1"
# Get the data from the url
wget $url -o wx_log.txt -O wx_day_data.txt
# Print date in data file
echo "$month/$day/$year" >> $wxhx_filename
while read inputline
do
echo $inputline
grep ^.*[,],*[,] $inputline | awk 'BEGIN { FS="," } { print " " $1 "\t" $2 }' >> $wxhx_filename
done < wx_day_data.txt
echo "Processed data for $date"
day=$(($day+1))
done
Each day's data looks like this:
TimeAKDT,TemperatureF,Dew PointF,Humidity,Sea Level PressureIn,VisibilityMPH,Wind Direction,Wind SpeedMPH,Gust SpeedMPH,PrecipitationIn,Events,Conditions<br />
12:53 AM,30.0,25.0,82,30.15,10.0,East,5.8,-,N/A,,Clear<br />
1:53 AM,28.9,25.0,85,30.15,10.0,ESE,3.5,-,N/A,,Clear<br />
2:53 AM,28.0,25.0,88,30.16,10.0,East,3.5,-,N/A,,Clear<br />
3:53 AM,28.9,24.1,82,30.15,10.0,East,5.8,-,N/A,,Clear<br />
4:53 AM,28.9,24.1,82,30.14,10.0,SE,3.5,-,N/A,,Clear<br />
5:53 AM,28.9,23.0,78,30.13,10.0,East,5.8,-,N/A,,Clear<br />
6:53 AM,28.9,21.9,75,30.11,10.0,East,3.5,-,N/A,,Clear<br />
7:53 AM,30.9,21.9,69,30.09,10.0,ESE,4.6,-,N/A,,Clear<br />
8:53 AM,34.0,19.9,56,30.08,10.0,East,6.9,-,N/A,,Clear<br />
9:53 AM,36.0,19.9,52,30.06,10.0,ESE,11.5,-,N/A,,Partly Cloudy<br />
10:53 AM,37.0,21.0,52,30.04,10.0,ESE,10.4,-,N/A,,Clear<br />
11:53 AM,37.9,21.0,51,30.02,10.0,East,9.2,-,N/A,,Mostly Cloudy<br />
12:53 PM,39.0,21.0,49,30.00,10.0,East,10.4,-,N/A,,Overcast<br />
1:53 PM,39.0,21.0,49,29.98,10.0,ESE,12.7,-,N/A,,Overcast<br />
2:53 PM,39.9,21.0,47,29.97,10.0,East,13.8,-,N/A,,Mostly Cloudy<br />
3:53 PM,39.0,21.0,49,29.94,10.0,ESE,15.0,20.7,N/A,,Scattered Clouds<br />
4:53 PM,39.0,21.9,50,29.88,10.0,ESE,19.6,28.8,N/A,,Scattered Clouds<br />
5:53 PM,39.0,23.0,53,29.86,10.0,ESE,19.6,27.6,N/A,,Overcast<br />
6:53 PM,39.0,24.1,55,29.85,10.0,ESE,11.5,23.0,N/A,,Overcast<br />
7:53 PM,37.9,25.0,60,29.84,10.0,ESE,18.4,25.3,N/A,,Overcast<br />
8:53 PM,37.0,27.0,67,29.82,10.0,ESE,17.3,-,N/A,,Overcast<br />
9:53 PM,37.9,27.0,65,29.80,10.0,SE,20.7,26.5,N/A,,Overcast<br />
10:53 PM,37.0,28.0,70,29.79,10.0,ESE,20.7,28.8,N/A,,Overcast<br />
11:01 PM,37.4,28.4,70,29.79,10.0,ESE,17.3,28.8,N/A,,Overcast<br />
11:22 PM,37.4,28.4,70,29.78,10.0,ESE,16.1,28.8,N/A,,Overcast<br />
11:53 PM,37.0,28.0,70,29.76,10.0,ESE,17.3,27.6,N/A,,Overcast<br />
<!-- 0.151:0 -->
The current output looks like this:
4/1/2009
4:53 AM 34.0
5:53 AM 34.0
6:53 AM 34.0
7:53 AM 34.0
8:53 AM 37.0
9:53 AM 37.0
10:53 AM 39.0
11:53 AM 39.9
12:53 PM 41.0
1:53 PM 43.0
2:53 PM 42.1
3:53 PM 41.0
4:53 PM 39.9
5:53 PM 39.0
6:53 PM 39.0
7:53 PM 37.0
8:53 PM 37.0
9:53 PM 36.0
10:53 PM 36.0
4/2/2009
12:24 AM 33.8
3:00 AM 33.8
3:53 AM 34.0
5:53 AM 33.1
6:53 AM 34.0...