View Full Version : add another attribute to href tag with python
mbeach
August 20th, 2009, 01:27 AM
For the purposes described at:
http://www.google.com/support/googleanalytics/bin/answer.py?hl=en&answer=55527
I'm trying to add an onclick event to <a> tags in a particular string stored in the datastore.
So assuming I have a string
this is a <a href="http://www.somedomain.com">link</a> and
so is <a href="home.html">this</a> and <a href="/home">this is too</a>
I want to add the first <a> tag to look like
<a href="http://www.somedomain.com" onClick="javascript: pageTracker._trackPageview('/outgoing/somedomain.com');">
but not the other links. So I'm playing around with searching for the "http" in a regex expression as that will determine an external link (for this site's setup anyway).
Still digging away at it but thought I'd throw up a post here in case someone knows a public function/library out there that already does just this.
Thanks,
mb.
myrtle1908
August 20th, 2009, 02:00 AM
Beautiful Soup ... http://www.crummy.com/software/BeautifulSoup
For modifying some HTML see ... http://www.crummy.com/software/BeautifulSoup/documentation.html#Modifying%20the%20Parse%20Tree
You could also do this fairly easily at runtime with JavaScript.
mbeach
August 20th, 2009, 11:00 AM
I had started down the road figuring javascript should handle this easy enough, but couldn't seem to find much other than people attempting to pop up a window when someone left their site.
I'd be happy to find a javascript solution, but for now I'm taking a look at BeautifulSoup for a whole host of other reasons as well.
Thank you
mbeach
August 20th, 2009, 07:45 PM
Continuing to toy away with this, have come up with this so far (almost posting here for my personal reference now) but I just need the url out of the href attribute value
from externals.bs import BeautifulSoup
content = """
this is a <a href="http://www.somedomain.com">link</a> and
so is <a href="home.html">this</a> and <a href="/home">this is too</a>
"""
tracker = "javascript: pageTracker._trackPageview(\'%s\');"
soup = BeautifulSoup(content)
for i in range(len(soup('a'))):
#add an if here to check if this is an external link
#for now, adding to all links
soup('a')[i]["onclick"] = tracker % 'baselinkurl'
print soup
mbeach
August 20th, 2009, 07:55 PM
there is probably a prettier way to handle the if statement here, but this seems to work as long as the http:// is in lower case.
from externals.bs import BeautifulSoup
content = """
this is a <a href="http://www.somedomain.com">link</a> and
so is <a href="home.html">this</a> and <a href="/home">this is too</a>
"""
tracker = "javascript: pageTracker._trackPageview(\'%s\');"
soup = BeautifulSoup(content)
for i in range(len(soup('a'))):
if soup('a')[i]['href'][0:7] == "http://":
soup('a')[i]["onclick"] = tracker % ("/outgoing/" + soup('a')[i]['href'][7:])
print soup
myrtle1908
August 20th, 2009, 10:54 PM
I'd be happy to find a javascript solution ...
<script>
window.onload = function() {
var a = document.getElementsByTagName('a');
for (var i=0; i<a.length; i++) {
var link = a[i];
if (link.href.match(/^https?:\/\/.*/i)) {
var re = /^https?:\/\/(www\.)?([^/]+)?/i;
link.href.match(re);
var domain = RegExp.$2;
link.onclick = function() {
pageTracker._trackPageview('/outgoing/' + domain);
};
}
}
}
</script>
this is a <a href="http://www.somedomain.com">link</a> and
so is <a href="home.html">this</a> and <a href="/home">this is too</a>
mbeach
August 20th, 2009, 11:09 PM
thanks - I'll try that out - in my situation will work better, as I can place it in the base template and be done. My python method was going to be a bit painful - not serious pain, but enough to cause some grief.
Good stuff,
mb.
myrtle1908
August 20th, 2009, 11:17 PM
thanks - I'll try that out - in my situation will work better, as I can place it in the base template and be done. My python method was going to be a bit painful - not serious pain, but enough to cause some grief.
Good stuff,
mb.
Can shorten to this ...
<script>
window.onload = function() {
var a = document.getElementsByTagName('a');
for (var i=0; i<a.length; i++) {
var link = a[i];
if (link.href.match(/^https?:\/\/(www\.)?([^/]+)?/i)) {
var domain = RegExp.$2;
link.onclick = function() {
pageTracker._trackPageview('/outgoing/' + domain);
};
}
}
}
</script>
I assume you are only interested in the domain name. For example it wouldn't include '/abc/' in http://www.somedomain.com/abc/'.
mbeach
August 20th, 2009, 11:31 PM
yes, just trying to measure the amount of traffic I'm sending to a number of external links - domain is fine, no need for further path info.
thanks for your help.
mb.
Powered by vBulletin® Version 4.2.2 Copyright © 2024 vBulletin Solutions, Inc. All rights reserved.