GnuMap Project (Last updated in 2002)
Find an efficient way to create network maps and gather statistics without changing the gnutella protocol.
After testing Gnucleus's network browse feature (Example), I thought of a new way to surf the network. Since gnucleus sends an information page to any browser that tries to connect to it, why not use a web spider to collect information from each node? After the web spider downloads all the information it can find, a simple parser could create statistics and output a data file for a graphing program.
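The idea above is essentially a breadth-first walk over the node graph. Here is a minimal Python sketch of that walk, just to illustrate the hop counting. It does not touch the network: `get_neighbors` is a hypothetical stand-in for "fetch a node's info page and pull out the peers it lists", and the addresses in `TOY_PAGES` are made up.

```python
from collections import deque


def crawl(seed, get_neighbors, max_hops=7):
    """Breadth-first walk from `seed`.

    `get_neighbors(node)` is a hypothetical callback that returns the peers
    listed on a node's info page (an empty list for firewalled nodes).
    Returns ({node: hop_count}, [(source, target), ...]).
    """
    hops = {seed: 0}
    edges = []
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if hops[node] >= max_hops:
            continue  # nodes on the last hop are recorded but not expanded
        for peer in get_neighbors(node):
            edges.append((node, peer))
            if peer not in hops:
                hops[peer] = hops[node] + 1
                queue.append(peer)
    return hops, edges


# Made-up example data: node -> peers shown on its info page.
# "10.0.0.3:6346" has no entry, which models a firewalled node.
TOY_PAGES = {
    "10.0.0.1:6346": ["10.0.0.2:6346", "10.0.0.3:6346"],
    "10.0.0.2:6346": ["10.0.0.1:6346", "10.0.0.4:6346"],
    "10.0.0.4:6346": ["10.0.0.5:6346"],
}

hops, edges = crawl("10.0.0.1:6346", lambda n: TOY_PAGES.get(n, []), max_hops=2)
for node, hop in sorted(hops.items()):
    print(hop, node)
```

In the real setup the web spider does this walk for you; the sketch only shows why the data naturally comes out organized by hop.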
As soon as I found a decent (and free) web spider, I tested my theory, and it worked great. It took a few tries to get through to the first hop of nodes (firewalled nodes can't send info pages), but once it got past the 2nd hop, it kept going. I finally stopped it after collecting data for a few hundred nodes. In theory you could probably run it as long as you want, and I will be trying that shortly to see if it ever stops. So far the largest data set I have collected was about 80,000 nodes in half an hour. (That's way more than the 10 or 20 thousand that the common user can see.) See the stat file here:
I have a "working" system now, but there were a few problems (and still are). First I use winHttrack (an open source web spider) to save the data to HTML files. Then I have been working on an HTML parser (in Visual Basic, because I'm not a programmer!) that takes the HTML files and creates GDL map files out of them (as well as some statistics). Then I use graphing software to create the actual node maps. I started out using Neato, the same software that gnucleus uses, but it is very slow at creating large maps. I now use Aisee to create the node maps, and even though its maps are not as clean, it is much faster and can handle much larger maps (a full 7 hop map in less than 20 mins).
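To give a feel for the parser step, here is a small Python sketch (not the actual Visual Basic parser) that pulls peer links out of saved HTML and writes a GDL graph. It assumes, purely for illustration, that each info page links to its peers with plain `href="http://ip:port/"` links; the real page layout may differ. The GDL output uses only the basic `graph`, `node` and `edge` entries with `title`, `label`, `sourcename` and `targetname`.

```python
import re

# Assumed link shape (hypothetical): href="http://1.2.3.4:6346/"
LINK_RE = re.compile(r'href="http://(\d{1,3}(?:\.\d{1,3}){3}:\d+)/?"', re.IGNORECASE)


def extract_peers(html):
    """Return the sorted, de-duplicated ip:port targets linked from one page."""
    return sorted(set(LINK_RE.findall(html)))


def to_gdl(pages, title="gnumap"):
    """Build GDL text from {node: html_of_its_info_page}."""
    nodes = set(pages)
    edges = set()
    for node, html in pages.items():
        for peer in extract_peers(html):
            nodes.add(peer)
            edges.add((node, peer))
    lines = ["graph: {", f'  title: "{title}"']
    for name in sorted(nodes):
        lines.append(f'  node: {{ title: "{name}" label: "{name}" }}')
    for source, target in sorted(edges):
        lines.append(f'  edge: {{ sourcename: "{source}" targetname: "{target}" }}')
    lines.append("}")
    return "\n".join(lines)


# Made-up sample page for one node.
sample_pages = {
    "10.0.0.1:6346": '<a href="http://10.0.0.2:6346/">peer</a> '
                     '<a href="http://10.0.0.3:6346/">peer</a>',
}
gdl_text = to_gdl(sample_pages)
print(gdl_text)
```

Saving `gdl_text` to a `.gdl` file should give the graphing program something to load, and things like coloring by hop would be extra attributes on each node.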
Here are some samples of what I have been able to create so far.
Simple small graph:
Same Graph, different colors, color by hop, label hops, and label node by node type:
Same graph, Tree layout algorithm, Color by hop, label by filename (host name)
Supernode Network: Statistic file superstats.html
Ultrapeer Network: Statistic file ultrastats.html
There is still a lot that can be done. The most interesting, which I would like to work on next, would be an interactive node map. Neato has the option to create index images so that when you click on a node it will open a web page. I would like to be able to create web maps that would pull up the actual index file for the node you click on. There are also a lot of other little features that I keep thinking of.
So, how can I make maps like this?
I just finished updating the parser. I fixed some bugs and added a couple of features. It can be downloaded here:
I also zipped up the source files, so if you have Visual Basic 6 you can add features or fix bugs yourself. You can download it here:
I have also zipped up some data that I have already collected, so if you would like to use that (if you don't want to or can't collect your own data) you can download it here:
dataset.exe (5.7 megs, 35meg uncompressed)
*Note: It also has data of the past gnutella network (supernode era) so you can compare the current network to the old network.
*Side note: Here is the stat file for the 80,000 node map I'm working on (still trying to make the map): UltrapeerStats.html. I also just collected data on the G2 network, more to come soon...
I just found out that in order to run the parser you need the VB 6 runtimes. You also need the ActiveX controls. You can get the VB 6 runtimes from:
and if you are getting a "component comdlg32.ocx..." error, you need to install the ActiveX controls from:
Since I had a lot of people ask questions in forums, I thought I would list some of them here. Here goes...
- Will this hurt the network?
- In my defense, I mean no harm to the network. Also, I understand how the network works, and have been following its development for quite some time now. I believe this will create no noticeable effect on it, even if a few thousand people used it daily.
You have to think about it. The web spider sends a single request (I set retries to zero) to each IP that it finds, so even if it finds every node on the network it is only 100,000 or 200,000 requests, which would take over an hour. A single search creates 10 times that amount of traffic in the same amount of time. For each node it creates well under 150 KB of traffic (the average HTML file is 30 - 50 KB), and that is if it can even reach the node.
Also the ratio of known nodes to unknown nodes is well over 1 to 1000, because firewalled nodes never even receive the HTTP request. It is much more efficient (bandwidth-wise) than the previous method of collecting network information, and is invisible to the user. Also it has the ability to create any size map you want, be it all 7 hops that you can actually see, or all of the gnutella network (I have yet to test the full size of the network).
I agree that most people would rather just look at the maps than actually create their own, so I don't think there are going to be a lot of people that will actually want to use this. Also it is not a very "friendly" application, and you need to know what you are doing before you will actually get anywhere. It is also a very CPU and bandwidth intensive process: collecting the large amounts of data requires at least a cable modem, parsing the files takes a lot of time on large data sets, and creating the maps is VERY CPU intensive (creating a 7 hop map can take 30 mins to an hour). Most people will probably just create 7 hop maps, because that is all they really can see anyway.
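As a rough sanity check on the numbers quoted in this answer (200,000 requests taking over an hour) and on the 80,000 nodes in half an hour from my largest run, here is the average request rate worked out in Python. It is only back-of-the-envelope arithmetic on figures already given on this page, not a measurement, and `requests_per_second` is just a helper name made up for the example.

```python
def requests_per_second(requests, seconds):
    """Average request rate for a crawl of `requests` nodes lasting `seconds`."""
    return requests / seconds


# Whole network: at most 200,000 requests, spread over at least one hour.
whole_network = requests_per_second(200_000, 60 * 60)

# Largest data set so far: about 80,000 nodes in half an hour.
largest_run = requests_per_second(80_000, 30 * 60)

print(f"whole network: under {whole_network:.1f} requests/sec")  # under 55.6
print(f"largest run:   about {largest_run:.1f} requests/sec")    # about 44.4
```

Either way it is a few dozen single requests per second, spread across the whole network, with each node seeing only one.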
- I'm lost and confused, what do I do?
- Make sure you read the Readme file, the Aisee documentation and also the web spider documentation first, so you understand everything. Then if you still don't know what to do, post in one of the forums, or send me an email and I'll try to help you out.
- I tried collecting data but it timed out after a few seconds?
- All I get is Error: "Unknown response structure, no HTTP/ response"
- I can't say this enough... Make sure you set it up EXACTLY as the Readme says, and test it first with a web browser (go to http://127.0.0.1:port/).
If the web browser doesn't work, then it is a problem with gnucleus. Either your client is set to be behind a firewall, or you are trying to connect on the wrong port, or you aren't connected to enough hosts on the gnutella network.
If the web browser does work, then it is a problem with the spider not being set up right, or all the connections you have are behind firewalls so you can't reach them. When you are in the web browser, click on the links, and at least one of them should send you to another info page. If they all come up blank, connect to more hosts or disconnect from those and connect to new ones.
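If you would rather script the browser test than click around, here is a small Python sketch that does the same first step: it asks the local client for its info page and reports whether a usable HTTP response came back. The function name `check_info_page` is made up for this example, and you have to pass in whatever port your own client is listening on.

```python
import http.client


def check_info_page(port, host="127.0.0.1", timeout=5.0):
    """Request "/" from the client and return (ok, detail).

    ok is False when the connection fails, times out, or the reply is not
    valid HTTP, which is the situation behind the spider's "no HTTP/ response"
    style errors.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        resp = conn.getresponse()
        body = resp.read()
    except (OSError, http.client.HTTPException) as exc:
        return False, f"no usable HTTP response: {exc!r}"
    finally:
        conn.close()
    if not body:
        return False, f"HTTP {resp.status}, but the page came back blank"
    return True, f"HTTP {resp.status}, {len(body)} bytes"


# Usage (fill in your client's port):
#   ok, detail = check_info_page(port)
#   print("OK" if ok else "PROBLEM", detail)
```

A failure here points at the client setup (firewall setting, wrong port, too few hosts), and a success means the problem is more likely in the spider settings.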
- I made a cool map, where can I post it?
- If you made a map that you want to share, then just browse to one of the gnutella forums and post it in there. If you can't post it, email it to me and I'll try to get it posted. Make sure it is saved in JPG format (you'll need an encoder) because the BMP format is too large.
- I have a modem, can I still use GnuMap?
- Yes you can, because the only part that uses bandwidth is the data collection (web spider) part. If you want to, you can just download the data I have already collected. The link is at the bottom of the page.
- Dumb and annoying questions
- Isn't this exactly what RIAA needs to shut down gnutella?
- Can't RIAA use this to sue gnutella users?
- Do you work for RIAA?
- When will RIAA take over the Internet and outlaw all us gnutella users?
- No, I do not work for RIAA. No, this will not be used to sue gnutella users. No, I can't believe you just asked me that question.
Everyone is so paranoid about RIAA or the movie industry that they don't even think before accusing someone. Nobody can use my app or the information it gathers to prosecute Gnutella users, so you can come out of the bomb shelter now. They can't sue you unless you are doing something illegal, and using gnutella is not illegal. Even if you are sharing music files, this method cannot be used to collect file information because it only collects network information. Well, that's not entirely true: Bearshare nodes display file information instead of network data, but they deserve to be shut down (kidding).
They can't sue you just for using gnutella, and they can't see what files you are sharing unless you are running bearshare (in which case they can't spider the network). So I'll say it one last time... STOP BEING SO PARANOID!! Also those guys are a lot smarter than I am, and I'm sure they thought of this before I did.
Anyway, it's not my app that they would use. All my app does is parse the collected data. All they would need would be a web spider.
- This is just a big waste of time, why did you do this?
- It is nice to know that all anyone can think about is MP3s, Warez and Porn.
Instead of only using gnutella to get the latest album or movie that you were too cheap to buy, why don't you try to learn from it? Gnutella is an awesome p2p technology that has great potential. Those who have been with it from the start remember how difficult it was to use, but now it is just as efficient as any other p2p application, and you can use it to find just about anything you want. Also, since it is open source and decentralized, it cannot be taken down.
Yes, I probably did have a little too much extra time, but I've learned a lot from it, and I'm not sorry for doing it. If you don't like it, don't use it, it's that simple.
- What else?
- Are there any more updates to do?
- Yes, but I have other things to do now. I'll release the source to the parser soon, so if anyone wants to add more features they can. I'm still finding bugs too, but I never intended to make a perfect app (it's made in VB!). If you really care about this, email me and I'll keep you posted. I'll try to keep the forums up to date as well. I'm still working on a full network map too, but it might take a while to make it.
NOTE: I have been having problems with the new releases of winhttrack, so if you cannot get it to work I have a copy of an older version that does work (at least for me). Send me an email if you would like a copy of it.
Questions or comments:
Greg (dot) Bray at Gmail (dot) com