---------------------------------------------------------------- Link prediction tools, Version 1.0, 2011, Oussama Allali. Relies on: Bipartite graph tools, Version 1.0, March 2006, Matthieu Latapy. http://www-rp.lip6.fr/~latapy/Bip/ ---------------------------------------------------------------- WARNING, these programs were developed for research purposes, and thus are *far* from perfect, at least concerning the source code (efficiency and readability) and the user interface. They were designed to do some experiments on bipartite complex networks. Some choices were made to avoid bugs and intricate code, though efficiency may suffer from this. The programs should be able to manage graphs of hundreds of tousands nodes quite easily (in minutes), up to millions of nodes. The available central memory must be sufficient to store the graph (approx. 8 bytes per link), but not much more is needed, except for the projection computations (again, the available memory must be sufficient to store the projection). One may avoid the projection computations (mainly to save memory) with the -p option. The programs suppose that there are no nodes of degree 0 and no multiple links (several links between two nodes). The programs may contain some bugs, and are provided with absolutely no warranty. If you encounter any problem, please download the latest version at: http://www-rp.lip6.fr/~latapy/Bip/ and e-mail me if the problem persists. This software is free; I encourage yo to use, modify and distribute it freely, provided you cite the original source and the reference below. ---------------------------------------------------------------- REFERENCE: Basic Notions for the Analysis of Large Two-Mode Networks. Matthieu Latapy, Clémence Magnien and Nathalie Del Vecchio. Social Networks. 2007. Please e-mail me if you find this software useful and in particular if you use it for your publications. All your comments, including bug reports and improvements, are welcome. The statistics computed by the programs are described in the paper above, available from: http://www-rp.lip6.fr/~latapy/Bip/ Any use of these programs and/or of the results described in this paper should cite this reference. ---------------------------------------------------------------- INPUT DATA: The main program basically needs a text file in which each line corresponds to a top node and lists its bottom neighbors. The nodes MUST be numbered from 0 to n-1 where n is the number of nodes. For instance : 0 1 1 2 3 0 1 denotes the bipartite graph G = (T,B,E) with T = {0, 1, 2} the set of top nodes (because there are 3 lines), B = {0, 1, 2, 3} the set of bottom nodes (because the largest number on the 3 lines is 3), and E = {(0,0),(0,1),(1,1),(1,2),(1,3),(2,0),(2,1)} the set of links. Moreover, the program needs the number of (top and bottom) nodes and the (top and bottom) degree sequences. Tools are provided to produce them, see below. To make it easier to use compressed data, the programs read the data on their standard input. One first has to compile the programs, then to prepare the data, and finally to compute the statistics. This produces a set of (plot) files in the current directory and prints out some statistics. It also prints on standard error some information on the processing. ---------------------------------------------------------------- IMPORTANT NOTE: the programs compute statistics from the *top* point of view ; if you want the bottom ones, use the -r option: cat data.n data.deg data | ./bip -r > output (-r for reverse, will simply reverse the bipartite graph before starting the computations). Be careful, you should replace 'top' by 'bottom' and conversely in the output. ---------------------------------------------------------------- FINALLY, I recommend the following usage: - put your data and the programs in a directory and go to this directory. - compile: gcc -O3 -o get.n get.n.c gcc -O3 -o get.deg get.deg.c gcc -O3 -o bip bip.c gcc -O3 -o prediction prediction.c - prepare the data: cat data | ./get.n > data.n cat data.n data | ./get.deg > data.deg - then create a directory for the top statistics, go to it, and launch the computation: mkdir top cd top cat ../data.n ../data.deg ../data | ../bip > projection cat projection | ../get.n > projection.n cat projection.n projection | ../get.deg | head -n `head -n 1 projection.n` > projection.deg cat ../data.n ../data.deg ../data projection.n projection.deg projection weight_delta | ../prediction > output cd .. - and do similarily for bottom statistics: mkdir bot cd bot cat ../data.n ../data.deg ../data | ../bip -r > projection cat projection | ../get.n > projection.n cat projection.n projection | ../get.deg | head -n `head -n 1 projection.n` > projection.deg cat ../data.n ../data.deg ../data projection.n projection.deg projection weight_delta | ../prediction > output cd .. ---------------------------------------------------------------- You may need to produce RANDOM bipartite graphs to compare to the ones under concern for you. See: http://jlguillaume.free.fr/www/programs.php?lang=eng ---------------------------------------------------------------- Some examples of bipartite data are provided at: http://www-rp.lip6.fr/~latapy/Bip/ ----------------------------------------------------------------