so I should spend it the way that I want to.

I just found out why no one (to my very limited knowledge of the area) uses HTTP/1.1 request pipelining for web crawlers: what you gain in speed you pay for in complication. Pipelining the GET requests saves tons of connection overhead. The trouble is that the server spits the pages out back to back on the same connection, thousands of bytes commingling without a chaperone, and it is up to the client to work out where one response ends and the next begins. It is a head scratcher, but a fun one, and a nice break from the "thesis" project.
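For the curious, here is a rough sketch of the untangling step in Python. It is not the code from my crawler, just an illustration under some loud assumptions: every response either carries a Content-Length header or uses chunked transfer encoding, chunked responses have no trailer headers, and none of the responses are the bodiless kind (HEAD replies, 204, 304). The names split_responses and read_chunked are made up for this example.

```python
def read_chunked(buf, pos):
    """Decode one chunked body starting at pos.

    Returns (body, next_pos), or (None, pos) if the body is incomplete.
    Assumes no trailer headers follow the terminating zero-size chunk.
    """
    start = pos
    body = b""
    while True:
        line_end = buf.find(b"\r\n", pos)
        if line_end == -1:
            return None, start
        # The chunk size is hex; anything after ';' is a chunk extension.
        size = int(buf[pos:line_end].split(b";")[0], 16)
        chunk_start = line_end + 2
        if size == 0:
            end = buf.find(b"\r\n", chunk_start)
            if end == -1:
                return None, start
            return body, end + 2
        chunk_end = chunk_start + size
        if chunk_end + 2 > len(buf):
            return None, start
        body += buf[chunk_start:chunk_end]
        pos = chunk_end + 2  # skip the CRLF that closes each chunk


def split_responses(buf):
    """Split back-to-back HTTP/1.1 responses found in buf.

    Returns (responses, leftover): responses is a list of
    (status_line, headers, body) tuples, leftover is the unparsed tail.
    """
    responses = []
    pos = 0
    while pos < len(buf):
        head_end = buf.find(b"\r\n\r\n", pos)
        if head_end == -1:
            break  # header block not complete yet; wait for more bytes
        head = buf[pos:head_end].decode("iso-8859-1")
        status_line, *header_lines = head.split("\r\n")
        headers = {}
        for line in header_lines:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        body_start = head_end + 4
        if headers.get("transfer-encoding", "").lower() == "chunked":
            body, next_pos = read_chunked(buf, body_start)
            if body is None:
                break
        else:
            # Assumption: a missing Content-Length means an empty body.
            next_pos = body_start + int(headers.get("content-length", "0"))
            if next_pos > len(buf):
                break
            body = buf[body_start:next_pos]
        responses.append((status_line, headers, body))
        pos = next_pos
    return responses, buf[pos:]
```

The idea would be to append whatever the socket hands back to a buffer, call split_responses on it, match the completed responses to the requests in the order they were sent (pipelined responses come back in request order), and keep the leftover bytes for the next read.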

(Do I speak too soon? I see word on the Net that Google tested a crawler that used HTTP/1.1 in April; I'm not sure whether they are using it in production now.)

While I was browsing what SourceForge had to offer in the way of spiders, I ran across this, which qualifies as interesting.

 


Austin Gilbert/Male/26-30. Lives in United States/Oklahoma/Tulsa/Midtown, speaks English. Spends 40% of daytime online. Uses a Fast (128k-512k) connection. And likes computer science/photography.

Hmmm... it is my summer after all...
2004/08/07