pokerbirch
January 1st, 2009, 04:58 AM
I need to chop up some HTML and extract a few elements. The data I need is in a table that makes up only a small section of the document, so I can't see much point in parsing the whole thing.
BeautifulSoup seems to be the most widely used library, but I'm not sure it will work on this particular HTML because of JavaScript. Some of the links within the table are built using JavaScript, and this call:
soup = BeautifulSoup(html)
leaves me with a virtually empty table because the JavaScript is removed, which renders the 'soup' useless. Is there a way to parse this HTML by tag name? For instance, say I want to extract the table as a parent, with each row as a child and each column as a grandchild element. Is that possible? The table has a unique 'id' attribute, so that would seem like a logical key, if I knew which library and functions to use.
What I want to avoid is string functions and regex (unless I do them in C), but with so many libraries knocking around the 'net, I don't think it will come to that.
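For reference, here is a rough sketch of the table/row/cell idea described above. It does not use BeautifulSoup; it swaps in the standard library's html.parser module (Python 3 naming) purely so the example is self-contained. The class name TableExtractor, the id "results", and the sample markup are all made up for illustration. No HTML parser runs JavaScript, so the most this can do is keep the script source found inside a cell as plain text.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect rows and cells of the one table whose id matches (hypothetical helper)."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0      # 0 = outside the target table, 1 = inside it, >1 = nested table
        self.rows = []      # parent list: one child list of cell strings per <tr>
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth:
                self.depth += 1                      # a table nested inside the target
            elif dict(attrs).get("id") == self.table_id:
                self.depth = 1                       # found the target table
        elif self.depth == 1:
            if tag == "tr":
                self.rows.append([])
            elif tag in ("td", "th"):
                if not self.rows:                    # tolerate a cell with no <tr>
                    self.rows.append([])
                self._cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif self.depth == 1 and tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        # Script source inside a cell arrives here as ordinary text.
        if self.depth and self._cell is not None:
            self._cell.append(data)


# Made-up sample document; only the table with id="results" is collected.
sample = """
<html><body>
<p>unrelated content</p>
<table id="results">
  <tr><th>Name</th><th>Link</th></tr>
  <tr><td>Alpha</td><td><script>document.write(makeLink(1));</script></td></tr>
</table>
</body></html>
"""

extractor = TableExtractor("results")
extractor.feed(sample)
extractor.close()

for row in extractor.rows:
    print(row)
```

Everything outside the matching table is skipped as it streams past, so nothing builds a tree for the whole document. The rows come back as a list of lists, which matches the parent/child/grandchild layout asked about.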