Spider and get tag information of one web page
_BNC

Posted: Wed May 09 02:24:20 CDT 2007
ASP.Net >> Spider and get tag information of one web page
Hi all,
I would like to know if anyone knows of a code sample for this.
Let's say, for example:
http://www.hide-link.com/ ;_ylt=AowRaqOx9PVGQC1OCxcj9vsEgFoB;_ylu=X3oDMTBhNjRqazhxBHNlYwNzZWFyY2g-?p=+friendship+roses+&did=&x=51&y=10
As you can see, there are a lot of items.
I need to be able to get the image link, navigate URL, price,
description, etc. of each item and then store them in a database.
I know that there is a way to search the HTML code and return
values (but I don't know how).
Any help would be appreciated.
Thank you,
Web Programming343

Alexey

Posted: Wed May 09 02:24:20 CDT 2007
> I know that there is a way of searching in the html code and return
> values (but don't know how)
Use Regular Expressions.
More info: http://www.google.com/search?hl=en&q=regular+expressions+asp.net
In your case, you should get the HTML text of the page and parse it using patterns.
Here's the complete pattern to get the link, name, description and
price:
(?<=\<h2\>\<a\shref=\")
(?<url>(.|\n)*?)(\"\>)(?<name>(.|\n)*?)(\<\/a\></h2\>\n\<br\/\>)
(?<description>(.|\n)*?)(\n)
(.|\n)*?
(\<span\sclass\=\"price\"\>)(?<price>.*?)(\<\/span\>)
Note: in the code, the pattern has to be on one line.
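One way to follow the one-line rule without losing readability is to keep the five pieces shown above as separate string literals and join them with +, so Regex still receives a single pattern. This is a minimal console sketch, not code from the thread: the HTML is a hypothetical fragment shaped the way the pattern expects (the real Yahoo markup may differ), and Console.WriteLine stands in for Response.Write.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The pattern from above, one literal per displayed line, concatenated into one string.
// Verbatim strings (@"...") keep the backslashes; each \" of the pattern is written as \"".
string pattern =
    @"(?<=\<h2\>\<a\shref=\"")" +
    @"(?<url>(.|\n)*?)(\""\>)(?<name>(.|\n)*?)(\<\/a\></h2\>\n\<br\/\>)" +
    @"(?<description>(.|\n)*?)(\n)" +
    @"(.|\n)*?" +
    @"(\<span\sclass\=\""price\""\>)(?<price>.*?)(\<\/span\>)";

// Hypothetical markup for two items (assumed layout, not the real page).
string html =
    "<h2><a href=\"http://example.com/roses\">Red Roses</a></h2>\n" +
    "<br/>A dozen long-stem roses\n" +
    "<div><span class=\"price\">$39.99</span></div>\n" +
    "<h2><a href=\"http://example.com/tulips\">Tulips</a></h2>\n" +
    "<br/>Ten yellow tulips\n" +
    "<div><span class=\"price\">$24.50</span></div>\n";

// Collect one record per match; these are the values you would then save to the database.
var items = new List<(string Url, string Name, string Description, string Price)>();
foreach (Match m in new Regex(pattern).Matches(html))
{
    items.Add((m.Groups["url"].Value, m.Groups["name"].Value,
               m.Groups["description"].Value, m.Groups["price"].Value));
}

foreach (var item in items)
    Console.WriteLine($"{item.Name} | {item.Url} | {item.Description} | {item.Price}");
```

The literal \n pieces assume Unix line breaks in the downloaded HTML; if the page uses \r\n, those places in the pattern would need \r?\n instead, or nothing will match.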
Here's an example of the code:
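using System.Text.RegularExpressions; // at the top of the file

string t = "html_from_yahoo"; // the HTML text of the page
// Put the complete pattern from above here, on one line. A verbatim string (@"...")
// keeps the backslashes; inside it, write each \" of the pattern as \"".
string e = @"(?<=\<h2\>............(\<\/span\>)";
Regex r = new Regex(e, RegexOptions.Compiled);
MatchCollection matches = r.Matches(t);
foreach (Match m in matches)
{
    Response.Write("name=" + m.Groups["name"].Value);
    Response.Write("description=" + m.Groups["description"].Value);
    Response.Write("url=" + m.Groups["url"].Value);
    Response.Write("price=" + m.Groups["price"].Value);
}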
Hope it helps

discountonall

Posted: Sun May 13 15:08:08 CDT 2007
I have the full string of the page.
I would like to know the syntax, for example, to find the
full string from <table class="item_table"
until the next one and return it as a string.

Alexey

Posted: Sun May 13 15:36:48 CDT 2007
I guess something similar to this:
(\<table\sclass\=\"item_table\")(.|\n)*?(?=\<table\sclass\=\"item_table\")