
Thread: Editing a text file using SED or equivalent

  1. #1
    Join Date
    Dec 2008

    Editing a text file using SED or equivalent

    Hello Ubuntu Folks

    I am looking to edit a text file using sed or whatever method works to make the following changes:

    Chromosome      ena     CDS     153     1535    .       +       0       transcript_name "transcript:AAS13770";  gene_id "gene:WD_0001"; gene_name "dnaA";
    Chromosome      ena     exon    3028    3115    .       +       .       transcript_name "transcript:WD_tRNA-Leu-1-1"; gene_id "gene:WD_tRNA-Leu-1"; gene_name "WD_tRNA-Leu-1";

    should become:

    Chromosome      ena     CDS     153     1535    .       +       0       transcript_id "transcript:AAS13770"; transcript_name "transcript:AAS13770"; gene_id "gene:WD_0001"; gene_name "dnaA";
    Chromosome      ena     exon    3028    3115    .       +       .       transcript_id "transcript:WD_tRNA-Leu-1-1"; transcript_name "transcript:WD_tRNA-Leu-1-1"; gene_id "gene:WD_tRNA-Leu-1"; gene_name "WD_tRNA-Leu-1";

    Essentially, I would like to take the value of transcript_name (e.g. "transcript:AAS13770"), add a new transcript_id field in front of it that carries the same value and ends with a semicolon, and do this for every line.

    Does anyone have any idea how to do this? I'm looking at sed but I cannot get my head around it. Sorry to be a pain.

    Best wishes,


  2. #2
    Join Date
    Mar 2010
    Ubuntu Mate 16.04 Xenial Xerus

    Re: Editing a text file using SED or equivalent

    The parts of the line that are identical don't really matter. What matters is the parts that need to change, plus some pattern that reliably locates the spot for that change in every line.

    sed works on one line at a time. So does awk, which is a little more powerful.

    sed -e 's/transcript_name/transcript_id/g' inputfile > output
    is how you replace things. If you want to insert things, then you can match on transcript_name and put the new text in front of it, something like this:
    sed -e 's/transcript_name/transcript_id "transcript:AAS13770"; transcript_name/g' inputfile > output
    Note that this inserts the same literal ID on every line. To perform other changes when the pattern isn't 100% consistent, you can either run another sed command and pass the output from the first into the second (via pipes), add another -e s////g stanza, or use tools that work on column locations or have better grouping capabilities, like ruby, python or perl.
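
    To copy each line's own transcript name rather than a fixed string, sed can capture the quoted value and reuse it in the replacement. A minimal sketch, assuming every line holds at most one transcript_name "..."; field; the function name add_transcript_id is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: reads annotation lines on stdin, writes them to stdout.
add_transcript_id() {
    # \("[^"]*"\) captures the quoted value that follows transcript_name.
    # In the replacement, \1 is that captured value and & is the whole match,
    # so the original transcript_name field is kept right after the new field.
    sed -e 's/transcript_name \("[^"]*"\);/transcript_id \1; &/'
}

# Try it on one line shaped like the data in this thread.
printf '%s\n' 'Chromosome ena CDS 153 1535 . + 0 transcript_name "transcript:AAS13770"; gene_id "gene:WD_0001"; gene_name "dnaA";' \
    | add_transcript_id
```

    On a real file the same function would be used as add_transcript_id < inputfile > output.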

    For amazing text processing, a scripting language like perl would be my choice. Perl has a split function that will break a line on any delimiter you like (whitespace is the default) and easily store the pieces in an array.
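    #!/usr/bin/env perl
    use strict;
    use warnings;
    while (<>) {
       chomp;                           # drop the trailing newline
       my @line = split(/ +/, $_);      # split the line on runs of spaces
       next if @line < 9;               # skip lines that have no ninth field
       print join(' ', @line), "\n";    # the line as it was read
       $line[8] = $line[8] . "-foo";    # modify only the ninth field (index 8)
       print "8: ", $line[8], "\n";
    }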
    The @line is an array, so $line[8] should have the 'transcript_name' inside. If you modify just that part of the array before printing it out, you can make it say anything you like. See above. There are 50 other ways to handle this too. Arrays and splitting is just one of them, and probably not even the best. If all you care about is the output line, then I wouldn't bother assigning anything inside the array. Let the 'print' handle what you need.
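
    The same split-into-fields idea also works from the shell with awk. A rough sketch, assuming the columns are separated by tabs and the attributes sit in the ninth column (the thread does not confirm either); the function name is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: reads annotation lines on stdin, writes them to stdout.
# Assumes tab-separated columns with the attributes in column 9.
add_transcript_id_awk() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        # Locate the transcript_name field inside column 9.
        if (match($9, /transcript_name "[^"]*";/)) {
            name = substr($9, RSTART, RLENGTH)
            id = name
            sub(/^transcript_name/, "transcript_id", id)
            # Rebuild column 9 with the new field in front of the old one.
            $9 = substr($9, 1, RSTART - 1) id " " name substr($9, RSTART + RLENGTH)
        }
        print
    }'
}

# Try it on one tab-separated line shaped like the data in this thread.
printf 'Chromosome\tena\tCDS\t153\t1535\t.\t+\t0\ttranscript_name "transcript:AAS13770"; gene_id "gene:WD_0001";\n' \
    | add_transcript_id_awk
```

    Lines without a matching field are printed unchanged, so the function is safe to run over the whole file.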

    Run this with inputfile as its argument. Output goes to stdout, so it can be used as a filter. If you use another program's stdout as the stdin of the perl script or sed, then it looks like this:
    cat file | sed -e "s/whatever/something new/g" | more
    Filters that use stdin and write to stdout are very powerful. sed, grep, cut, join and 150 other Unix tools work that way.
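
    A small illustration of chaining such filters, using made-up sample data (the sample function and its three space-separated columns are only for this example):

```shell
#!/bin/sh
# Made-up sample data: three columns separated by single spaces.
sample() {
    printf '%s\n' 'chr1 CDS dnaA' 'chr1 exon tRNA-Leu' 'chr1 CDS dnaN'
}

# Each stage reads stdin and writes stdout, so the stages chain with pipes:
# grep keeps the CDS lines, sed rewrites one word, cut keeps columns 2 and 3.
sample | grep ' CDS ' | sed -e 's/CDS/coding/' | cut -d ' ' -f 2,3
```

    Only the two CDS lines come out the far end, with the first column dropped.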

    du -h * | sort -hr |more
    That's for lurkers.

