Save Information to DB at Crawling
I am trying to write a crawler that grabs information from a web site and
stores it in the database.
The web site is structured as follows:
REGION
CATEGORY PROPERTY LISTING
The site has about 50 regions, each region has 20 categories or fewer, and a
single category can hold as many as 2000 properties (listed 20 per page). To
get all the information, my crawler visits each property page and uses regular
expressions to extract specific fields (Price, BR, Contact Info, etc.).
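To make the extraction step concrete, here is a minimal sketch of pulling those fields out of one property page with regular expressions. The sample markup, the patterns, and the name extract_property are all assumptions for illustration, since the real site's HTML is not shown here.

```python
import re

# Hypothetical page fragment; the real site's markup will differ.
SAMPLE_HTML = """
<div class="price">Price: $1,250</div>
<div class="br">BR: 3</div>
<div class="contact">Contact: Jane Doe, 555-0100</div>
"""

# Assumed patterns, one per field to extract.
PATTERNS = {
    "price": re.compile(r"Price:\s*\$([\d,]+)"),
    "bedrooms": re.compile(r"BR:\s*(\d+)"),
    "contact": re.compile(r"Contact:\s*([^<]+)"),
}

def extract_property(html):
    """Return a dict of the fields found; fields that do not match map to None."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(html)
        result[field] = match.group(1).strip() if match else None
    return result

info = extract_property(SAMPLE_HTML)
```

Returning None for a missing field, instead of raising, lets the crawler decide later whether an incomplete record should still be saved.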
I am having trouble deciding when and where to save the data to the database. Note that the crawler is scheduled to visit the web site every day, and if a property's information has not changed since its last modification date, the crawler skips that property.
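One possible answer to the "when" question is to save each property as soon as its page has been parsed, comparing the stored modification date first. The sketch below assumes the site exposes a last modification date per property and that the property table has a last_modified column, which is not in my table list. It uses Python's sqlite3 module as a stand-in for the real database, and the names open_db and save_if_changed are hypothetical.

```python
import sqlite3

def open_db():
    """Create an in-memory stand-in database with a reduced property table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE property (
               id INTEGER PRIMARY KEY,
               price TEXT,
               last_modified TEXT  -- assumed column, not in the listed schema
           )"""
    )
    return conn

def save_if_changed(conn, prop_id, price, last_modified):
    """Insert or update one property; return 'inserted', 'updated' or 'skipped'."""
    row = conn.execute(
        "SELECT last_modified FROM property WHERE id = ?", (prop_id,)
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO property (id, price, last_modified) VALUES (?, ?, ?)",
            (prop_id, price, last_modified),
        )
        outcome = "inserted"
    elif row[0] == last_modified:
        # Unchanged since the previous crawl: nothing to write.
        return "skipped"
    else:
        conn.execute(
            "UPDATE property SET price = ?, last_modified = ? WHERE id = ?",
            (price, last_modified, prop_id),
        )
        outcome = "updated"
    # Committing per property keeps everything saved so far if the crawl is
    # interrupted; committing once per listing page (20 rows) would be a
    # cheaper alternative.
    conn.commit()
    return outcome
```

If the modification date is visible on the listing page, the crawler could run the comparison before fetching the property page at all, which would save most of the daily requests.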
I created the following tables to store the information:
Region Table
ID, Region Name
Category Table
ID(1~20), Category Name
Property Table
ID, Category, Name, Address, Price, Contact Info, Bed Rooms,
Location(Lat), Location(Lon)
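For reference, the three tables could be written as DDL roughly like this. It is only a sketch: the column names and types are my guesses, and it runs against Python's sqlite3 module rather than the real database. The region_id and last_modified columns are assumed additions not in the lists above, included because a property otherwise has no link to its region and no stored date to compare against.

```python
import sqlite3

# Assumed names and types; adjust to the real database's conventions.
SCHEMA = """
CREATE TABLE region (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE category (
    id   INTEGER PRIMARY KEY,   -- 1 to 20
    name TEXT NOT NULL
);
CREATE TABLE property (
    id            INTEGER PRIMARY KEY,
    region_id     INTEGER REFERENCES region(id),    -- assumed addition
    category_id   INTEGER REFERENCES category(id),
    name          TEXT,
    address       TEXT,
    price         TEXT,
    contact_info  TEXT,
    bedrooms      INTEGER,
    lat           REAL,
    lon           REAL,
    last_modified TEXT                              -- assumed addition
);
"""

def create_schema(conn):
    """Create the three tables on an open connection."""
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```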
Received on Wed Aug 16 2006 - 04:33:53 CDT