K7522
August 16th, 2008, 11:34 PM
I have over 1,000 CSV-formatted files that need to go into separate tables in a MySQL database. The tables would be named after the CSV file names. An example of one of these CSV files is listed below.
These files are generally about 600,000 lines long each, so automation is a requirement in this instance.
File Name: 08-16-2008.csv
columnname01, columnname02, columnname03, columnname04, columnname05, columnname06, columnname07, columnname08, columnname09, columnname10, columnname11, columnname12, columnname13, columnname14, columnname15
607473114 , 10000064 , 30004969 , 60011917 , 1826 , 0 , 124.82 , 1 , 1156 , 1158 , 2007-12-30 00:00:00.00 , 30:00:00:00.00 , 32767 , 0 , 2008-01-01 00:00:20.07 ,
603977756 , 10000064 , 30004969 , 60011917 , 1826 , 0 , 125.0 , 1 , 100 , 100 , 2007-12-27 00:00:00.00 , 30:00:00:00.00 , 32767 , 0 , 2008-01-01 00:00:20.10 ,
569763001 , 10000064 , 30004969 , 60011740 , 1826 , 0 , 210.0 , 1 , 111697 , 150000 , 2007-12-31 00:00:00.00 , 90:00:00:00.00 , 32767 , 0 , 2008-01-01 00:00:20.11 ,
What I am trying to do here goes a bit beyond me, but it is something I'm trying to learn.
What I'd like to do is create a shell script that will handle these files, and future files as I receive them daily.
This script would do the following:
Check a specified folder for new CSV files against a log.
Record in the log which files have already been sourced; we don't want to try sourcing every file each time one new file is added to the folder.
Create a MySQL table for each new CSV file, named after the file and using its first line as the column names.
Source the CSV file into the new table.
As far as finding which files are in the folder versus what has already been put into the database, I was thinking of having the script run ls > ls.txt on the directory, append each file name to sourced.txt after that file has been sourced into the database, and then diff the two files to see what still needs to be sourced.
Anyway, there may be a much easier way of going about this, but I would like it to be completely automated by placing the script in cron.hourly.
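The steps above can be sketched as a small shell script. This is only a rough outline under several assumptions: the database name (mydb), the CSV directory path, and the sourced.txt location are placeholders to adjust, MySQL credentials are assumed to be in ~/.my.cnf, and every column is created as TEXT since the real types aren't known from the header alone:

```shell
#!/bin/sh
# Sketch: load each new CSV in a directory into its own MySQL table.

# Derive a table name from the file name
# (08-16-2008.csv -> 08_16_2008; '-' is awkward in identifiers).
table_name() {
    basename "$1" .csv | tr '-' '_'
}

# Turn the header line into a CREATE TABLE column list,
# typing every column TEXT for simplicity.
column_defs() {
    head -n 1 "$1" | sed 's/ *, */ TEXT, /g; s/ *$/ TEXT/'
}

# Scan a directory and load any CSV not yet listed in sourced.txt.
load_new_csvs() {
    dir=$1
    db=$2
    log="$dir/sourced.txt"
    touch "$log"
    for f in "$dir"/*.csv; do
        [ -e "$f" ] || continue                # empty glob, nothing to do
        name=$(basename "$f")
        grep -qxF "$name" "$log" && continue   # already sourced
        t=$(table_name "$f")
        mysql "$db" -e "CREATE TABLE IF NOT EXISTS \`$t\` ($(column_defs "$f"));"
        # Note: values keep the spaces around the commas unless trimmed
        # later, and the trailing comma on each data row produces an
        # extra empty field that MySQL will warn about.
        mysql --local-infile=1 "$db" -e "
            LOAD DATA LOCAL INFILE '$f' INTO TABLE \`$t\`
            FIELDS TERMINATED BY ','
            LINES TERMINATED BY '\n'
            IGNORE 1 LINES;"
        echo "$name" >> "$log"
    done
}

# Invoke only when arguments are given, e.g. from /etc/cron.hourly:
#   csvloader.sh /path/to/csv mydb
[ $# -ge 2 ] && load_new_csvs "$1" "$2"
```

Using LOAD DATA LOCAL INFILE rather than line-by-line INSERTs matters at 600,000 rows per file; the sourced.txt check also replaces the ls.txt/diff idea with a single grep per file.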
Thanks in advance for taking the time to read this and post possible solutions; even small suggestions help as I drudge along figuring this out on my own.