[SOLVED] PHP Get ALL Links in a page



LinuxRocks713
July 5th, 2008, 12:36 AM
Hi:

How do I get all links in a page? For example, if I have a page:

<html>
<head></head>
<body>
<a href="/link1">Link1</a>
<a href="http://aaa.com/">Link2</a>
<h3><small><a href="a/b/c.php">Link Text</a></small></h3>
<a href="ftp://server.net/">Link...</a>
</body>
</html>

and I want it to return an array:

Array[0] = "/link1"
Array[1] = "http://aaa.com/"
Array[2] = "a/b/c.php"
Array[3] = "ftp://server.net/"

How do you do this?

LaRoza
July 5th, 2008, 12:38 AM
http://www.php.net/dom

You can use the DOM to get all such elements. getElementsByTagName()

slavik
July 5th, 2008, 01:21 AM
regex solution in Perl:

...
$html; # assume this variable holds the HTML code
@links = $html =~ /href="(.*?)"/gis;
...

LinuxRocks713
July 5th, 2008, 01:24 AM
How do I get that back as an array? The return type of getElementsByTagName() is a DOMNodeList object.

henchman
July 5th, 2008, 01:48 AM
hi :)

just stick to the php.net documentation, it always has good examples :)

http://us2.php.net/domnodelist


foreach ($nodeList as $node) {
    echo $node->nodeValue;
}
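
On the array question itself: a DOMNodeList can be walked with foreach, so it should also be possible to copy it into a plain array with iterator_to_array(). A minimal sketch, assuming a made-up HTML string in place of the real page:

```php
<?php
// Hypothetical sample markup, standing in for the real page.
$html = '<html><body><a href="/link1">Link1</a><a href="http://aaa.com/">Link2</a></body></html>';

$doc = new DOMDocument;
$doc->loadHTML($html);

// DOMNodeList is Traversable, so iterator_to_array() can copy it into a plain PHP array.
$nodes = iterator_to_array($doc->getElementsByTagName('a'));

echo count($nodes) . "\n";          // number of <a> elements found
echo $nodes[0]->nodeValue . "\n";   // text of the first link
```

The array holds DOMElement objects, not strings, so the href still has to be read from each element.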

LinuxRocks713
July 5th, 2008, 02:01 AM
Though it returns this:



Link1
Link2
Link Text
Link...


Here is test.php:

<?php

$doc = new DOMDocument;
$doc->load('file.htm');

$items = $doc->getElementsByTagName('a');

foreach ($items as $value) {
    echo $value->nodeValue . "\n";
}

?>


and here is file.htm:



<html>
<head>
<title>Title Page</title>
</head>
<body>
<a href="/link1" target="_blank">Link1</a>
<a href="http://aaa.com/">Link2</a>
<h3><small><a href="a/b/c.php">Link Text</a></small></h3>
<a href="ftp://server.net/">Link...</a>
</body>
</html>

LaRoza
July 5th, 2008, 02:08 AM
"regex solution in Perl": the question was about PHP, in case the title didn't make that clear ;)
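
For what it's worth, a rough PHP counterpart of the Perl one-liner might look like the sketch below. The sample string is made up, and like the Perl version it only catches double-quoted href values and does not care which tag they belong to.

```php
<?php
// Hypothetical sample markup; in practice $html would hold the page source.
$html = '<a href="/link1">Link1</a> <a HREF="ftp://server.net/">Link...</a>';

// Same pattern as the Perl one-liner: i = case-insensitive, s = dot matches newlines.
// preg_match_all() already collects every match, so there is no "g" flag.
preg_match_all('/href="(.*?)"/is', $html, $matches);

// $matches[0] holds the full matches, $matches[1] the first capture group (the URLs).
$links = $matches[1];
print_r($links);
```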

henchman
July 5th, 2008, 11:46 AM
Well, the PHP documentation also has a page on the DOMNode class here:

http://usphp.com/manual/en/class.domnode.php#domnode.props.attributes

Objects of this class have a property called "attributes" *grml*, which is an instance of the class "DOMNamedNodeMap". Documentation for that class is here:

http://usphp.com/manual/en/class.domnamednodemap.php

That class in turn has several methods. You may try to retrieve the href attribute that way :)
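
A small sketch of that lookup, assuming a one-link HTML string made up for the example. It reads the href once through the attributes map and once through getAttribute(), which DOMElement offers as a shortcut:

```php
<?php
$doc = new DOMDocument;
// Hypothetical one-link page, just for illustration.
$doc->loadHTML('<html><body><a href="a/b/c.php">Link Text</a></body></html>');

$a = $doc->getElementsByTagName('a')->item(0);

// Via the attributes property (a DOMNamedNodeMap):
$viaMap = $a->attributes->getNamedItem('href')->nodeValue;

// Via the shortcut method on DOMElement:
$viaElement = $a->getAttribute('href');

echo $viaMap . "\n";
echo $viaElement . "\n";
```

One difference worth knowing: getNamedItem() gives null when the attribute is missing, so the ->nodeValue part would fail on an <a> without href, while getAttribute() just gives an empty string.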

LinuxRocks713
July 5th, 2008, 01:23 PM
Thanks! It worked.

If anyone wants the test.php file:

<?php

$doc = new DOMDocument;
// load() parses XML, so file.htm has to be well-formed
$doc->load('file.htm');

$items = $doc->getElementsByTagName('a');

foreach ($items as $value) {
    // link text
    echo $value->nodeValue . "\n";
    // href attribute, looked up in the DOMNamedNodeMap
    $attrs = $value->attributes;
    echo $attrs->getNamedItem('href')->nodeValue . "\n";
}

?>
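
To get the array from the first post instead of echoed lines, the same idea could be wrapped in a function. This is only a sketch: get_links() is a made-up helper name, and it swaps load() for loadHTML() with libxml warnings silenced, on the assumption that real pages are often not well-formed XML.

```php
<?php
// get_links() is a hypothetical helper name, not part of PHP.
function get_links($html)
{
    $doc = new DOMDocument;

    // Real-world HTML is often sloppy; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $a) {
        // Skip anchors without an href, e.g. <a name="top">.
        if ($a->hasAttribute('href')) {
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}

// The sample page from the first post, as a string.
$html = '<html><body>
<a href="/link1">Link1</a>
<a href="http://aaa.com/">Link2</a>
<h3><small><a href="a/b/c.php">Link Text</a></small></h3>
<a href="ftp://server.net/">Link...</a>
</body></html>';

print_r(get_links($html));
```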