Publication date: 29-05-2014
Number: US20140149382A1
A web site page has a reference for providing an address for a next page. The web site is crawled by a crawler program, which parses the reference from one of the web pages and sends the reference to an applet running in a browser. The address for the next page is determined by the browser responsive to the reference and is sent to the crawler. The crawler selects non-hypertext-link parameters from the web page of the web site server by performing a programmed action sequence, including selecting items from lists of the web page in a particular sequence. The crawler sends the applet running in the browser, for the query to the web server for the next page referenced by the one web page, the selected parameters and a context arising from the particular sequence.

1. A method for crawling a web site, the method comprising:
querying a web site server by a crawler program, wherein at least one page of the web site has a reference, wherein the reference is specified by a script to produce an address for a next page;
parsing such a reference from one of the web pages by the crawler program and sending the reference to an applet running in a browser; and
determining the address for the next page by the browser executing the reference and sending the address to the crawler, wherein the crawler automatically selects non-hypertext-link parameters from the one web page of the web site server by performing a programmed action sequence, including selecting items from lists of the one web page in a particular sequence, and the crawler sends the applet running in the browser, for the query to the web server for the next page referenced by the one web page, the selected parameters and a context arising from the particular sequence.

2. The method of claim 1, the browser being configured to use a certain proxy and refer to a resolver file for hostname-to-IP-address-resolution, wherein the web site server has an IP address and the proxy for the browser has a certain IP address ...
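
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class `StubBrowserApplet`, the page function `goToPage(n)`, the parameter names, and the URL `http://example.com/list` are all hypothetical, and the stub merely imitates what a real browser's script engine would do when executing the reference. The sketch shows the crawler parsing script-specified references out of a page, then handing each one to the applet together with the selected parameters and the order in which they were selected (the "context").

```python
import re
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urlencode


class ScriptReferenceParser(HTMLParser):
    """Collects anchor hrefs that are script references, not plain URLs."""

    def __init__(self) -> None:
        super().__init__()
        self.references: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("javascript:"):
            self.references.append(href)


def parse_script_references(page_html: str) -> List[str]:
    """Crawler side: extract script-specified references from one page."""
    parser = ScriptReferenceParser()
    parser.feed(page_html)
    parser.close()
    return parser.references


class StubBrowserApplet:
    """Hypothetical stand-in for the applet running in a browser.

    A real applet would let the browser execute the script; this stub
    only understands an assumed page function, goToPage(n).
    """

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def resolve(self, reference: str, parameters: Dict[str, str],
                context: List[str]) -> str:
        match = re.fullmatch(r"javascript:\s*goToPage\((\d+)\)\s*;?",
                             reference.strip(), flags=re.IGNORECASE)
        if match is None:
            raise ValueError(f"unsupported reference: {reference!r}")
        # Apply the selections in the order the crawler made them.
        query = [(name, parameters[name]) for name in context]
        query.append(("page", match.group(1)))
        return f"{self.base_url}?{urlencode(query)}"


def next_page_addresses(page_html: str, applet: StubBrowserApplet,
                        parameters: Dict[str, str],
                        context: List[str]) -> List[str]:
    """Send each parsed reference to the applet and collect the addresses."""
    return [applet.resolve(ref, parameters, context)
            for ref in parse_script_references(page_html)]
```

In this sketch the context is reduced to the ordering of the selected list items; a real page may carry richer state (for example, a second list whose options depend on the first selection), which is why the claim has the browser, rather than the crawler, determine the address.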