As I mentioned in my earlier post, we will crawl reliable, original-content websites such as Wikipedia, BBC, Gizmag, Mashable, Alibaba, Dmoz, and Slideshare. That gives us original content from the web, which people need for the reasons I described in this post.
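Here is a minimal sketch of a breadth-first crawler that stays on the seed sites' domains. It is only an illustration under assumptions, not the Butterfly crawler itself. `crawl` and `fetch_links` are hypothetical names, and real fetching and HTML parsing are left to the `fetch_links` hook so that the frontier logic can be shown on its own.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urlparse


def crawl(
    seeds: Iterable[str],
    fetch_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> list[str]:
    """Breadth-first crawl restricted to the seed sites' domains.

    `fetch_links` is a hypothetical hook: given a URL, it returns the
    links found on that page (network access and parsing are out of scope).
    """
    seeds = list(seeds)
    allowed = {urlparse(u).netloc for u in seeds}  # only crawl the chosen sites
    frontier = deque(seeds)
    seen = set(seeds)
    crawled: list[str] = []
    while frontier and len(crawled) < max_pages:
        url = frontier.popleft()
        crawled.append(url)
        for link in fetch_links(url):
            # Skip off-site links and URLs already queued or crawled.
            if urlparse(link).netloc in allowed and link not in seen:
                seen.add(link)
                frontier.append(link)
    return crawled


# Usage with a fake, in-memory link graph instead of real HTTP requests.
links = {
    "https://a.example/": ["https://a.example/1", "https://b.example/x"],
    "https://a.example/1": ["https://a.example/"],
}
print(crawl(["https://a.example/"], lambda u: links.get(u, [])))
# ['https://a.example/', 'https://a.example/1']
```

Restricting the frontier to the seed domains is one simple way to keep the crawl limited to the sites picked as reliable sources; a production crawler would also need robots.txt handling and politeness delays.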
We call these crawled URLs "Eggs". An egg is nothing but the raw information on a page; let's say each page covers one topic, trade lead, article, or news item. Users on our site can create butterflies from these eggs. Here, a butterfly could reveal competitors, similar news, review pages, etc.
Public & Private content: Public content will be readily available in search for all our users. In addition, these URLs will be crawled by the Butterfly crawler.
Private content will not be available in public search (the Butterfly search engine), though URLs listed on private pages will still be crawled.
This way we can make sure that people actually visit these URLs and that they are worth adding to our results page on the Butterfly search engine.
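The public/private rule boils down to two sets: URLs that appear in public search and URLs handed to the crawler. A minimal sketch, assuming a hypothetical `Page` record with a `public` flag and a hypothetical `plan` function:

```python
from dataclasses import dataclass


@dataclass
class Page:
    """A user page listing URLs, marked public or private (hypothetical model)."""
    name: str
    public: bool
    urls: list[str]


def plan(pages: list[Page]) -> tuple[set[str], set[str]]:
    """Return (URLs shown in public search, URLs handed to the crawler)."""
    searchable: set[str] = set()
    to_crawl: set[str] = set()
    for page in pages:
        # Every listed URL is crawled, whether the page is public or private.
        to_crawl.update(page.urls)
        # Only public pages contribute to the public search index.
        if page.public:
            searchable.update(page.urls)
    return searchable, to_crawl


pages = [
    Page("news-roundup", public=True, urls=["https://example.com/a"]),
    Page("my-research", public=False, urls=["https://example.com/b"]),
]
searchable, to_crawl = plan(pages)
print(sorted(searchable))  # ['https://example.com/a']
print(sorted(to_crawl))    # ['https://example.com/a', 'https://example.com/b']
```

Crawling the URLs from private pages still tells us which pages people find worth saving, even though the private pages themselves stay out of search.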
Caterpillar to Butterfly: This feature works just like a forum. Once a caterpillar is converted to a butterfly (by its originator), we will create our own simple HTML page covering that topic. This page will be available in public search.
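A minimal sketch of the conversion step, assuming a hypothetical `Caterpillar` thread object that only its originator can convert, and a deliberately bare HTML layout. `render_page` is an illustrative name, not part of any existing code.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Caterpillar:
    """A forum-style thread collecting URLs on one topic (hypothetical model)."""
    topic: str
    originator: str
    urls: list[str] = field(default_factory=list)
    is_butterfly: bool = False

    def convert(self, user: str) -> str:
        """Turn the thread into a butterfly and return its public HTML page."""
        # Only the user who started the thread may convert it.
        if user != self.originator:
            raise PermissionError("only the originator can convert this caterpillar")
        self.is_butterfly = True
        return render_page(self.topic, self.urls)


def render_page(topic: str, urls: list[str]) -> str:
    """Render a minimal HTML page; all text is escaped before insertion."""
    title = html.escape(topic)
    items = "".join(
        f'<li><a href="{html.escape(u)}">{html.escape(u)}</a></li>' for u in urls
    )
    return (
        f"<!DOCTYPE html><html><head><title>{title}</title></head>"
        f"<body><h1>{title}</h1><ul>{items}</ul></body></html>"
    )


c = Caterpillar("Tablet reviews", originator="bob", urls=["https://example.com/r1"])
page = c.convert("bob")
print(c.is_butterfly)  # True
```

Escaping the topic and URLs matters here because both come from users and end up on a public page.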
Butterfly Algorithm: We have the information (URLs), so let's apply a simplified algorithm to rate the URLs (eggs + butterflies).
We will have to track both popularity (more importantly) and the URL content.
The butterfly rating will be based on likes, dislikes, page title, number of contributing incubators, and hits on the URL through the Tree. Like Google, the Butterfly search engine will also track other user-specific information, similar to Google Analytics.
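The exact formula isn't given, so the following is only a sketch of how the listed signals could be combined: a weighted sum with made-up weights that favour popularity (votes, incubators, Tree hits) over content (title match). `UrlStats`, `WEIGHTS`, `title_match`, and `butterfly_rating` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class UrlStats:
    title: str
    likes: int
    dislikes: int
    incubators: int  # users who contributed this URL
    tree_hits: int   # clicks on the URL coming through the Tree


# Hypothetical weights: popularity signals outweigh the title (content) match.
WEIGHTS = {"votes": 2.0, "incubators": 1.5, "hits": 1.0, "title": 0.5}


def title_match(title: str, query: str) -> float:
    """Fraction of query words that appear in the page title (0.0 to 1.0)."""
    words = query.lower().split()
    if not words:
        return 0.0
    title_words = set(title.lower().split())
    return sum(w in title_words for w in words) / len(words)


def butterfly_rating(s: UrlStats, query: str) -> float:
    # Net approval between -1 and 1; the +1 keeps a single vote from dominating.
    votes = (s.likes - s.dislikes) / (s.likes + s.dislikes + 1)
    # log1p dampens raw counts so a few very popular URLs don't swamp the rest.
    return (
        WEIGHTS["votes"] * votes
        + WEIGHTS["incubators"] * math.log1p(s.incubators)
        + WEIGHTS["hits"] * math.log1p(s.tree_hits)
        + WEIGHTS["title"] * title_match(s.title, query)
    )


a = UrlStats("Best budget phones 2012", likes=40, dislikes=5, incubators=6, tree_hits=300)
b = UrlStats("Phone news", likes=3, dislikes=4, incubators=1, tree_hits=20)
ranked = sorted([a, b], key=lambda s: butterfly_rating(s, "budget phones"), reverse=True)
print(ranked[0].title)  # Best budget phones 2012
```

A weighted sum keeps each signal's influence easy to tune; the log damping is one common choice for count-based signals, not the only one.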
I have many other, more complex parameters for deciding URL rank. Let's not discuss those :)
That's all for now.