Specifications and Requirements
1.1: About IBNAL
Network alignment over graph-structured data has received considerable attention in many recent applications. Global network alignment tries to find, for each node in one network, the best mapping to exactly one node in another network. The mapping is performed according to matching criteria that depend on the nature of the data. In molecular biology, functional orthologs, protein complexes, and evolutionarily conserved pathways are some examples of information uncovered by global network alignment. Current techniques for global network alignment suffer from several drawbacks, e.g., poor performance and high memory requirements. We address these problems by proposing IBNAL, Indexes-Based Network ALigner, for better alignment quality and faster results. To accelerate the alignment step, IBNAL makes use of a novel clique-based index and is able to align large networks in seconds. IBNAL produces alignments of higher topological quality and comparable biological match relative to other state-of-the-art aligners, even though topological fit is the main criterion used to match nodes. IBNAL's results give another demonstration that homology information is more likely to be encoded in network topology than in sequence information.
1.2: Implementation Requirements and Specifications
IBNAL is implemented in Java. This version of IBNAL runs on Linux, Mac OS X, and Windows, and is available for download as a compressed file here. Java must be installed on your system before IBNAL can run. Java is provided by Oracle and can be downloaded for most operating systems, so the aligner is ready to run as soon as Java is installed. Instructions for running both stages of IBNAL are provided here and in the README.TXT file.
1.3: Package Contents
The IBNAL download consists of a single .zip file. Extracting it produces the files listed below:
1- IBNALbuildIndex.jar
2- IBNALextractAlignment.jar
3- 'lib' , library folder containing JGraphT, which IBNAL requires in order to run.
4- README.TXT , readme file that contains instructions on how to run IBNAL.
5- PPI1.net , the first sample network file.
6- PPI2.net , the second sample network file.
7- PPI1.annos , the first sample annotations file.
8- PPI2.annos , the second sample annotations file.
Please note that:
Our program accepts the same input formats that most existing aligners use. There are two kinds of input: networks and GO annotations.
Network data format
Networks are input as plain text files, each containing a list of edges. Each line of the text file contains one edge, which is denoted by two node names separated by whitespace. Node names can be any string. For example:
dm1275 dm2243
dm12045 dm4232
dm1951 dm9060
dm11539 dm157
dm12381 dm2135
dm3616 dm5529
For convenience, two small example networks are included with our implementation as PPI1.net and PPI2.net. The file extension '.net' is required; it is commonly used in network alignment programs.
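The edge-list format described above can be read with a few lines of Java. The following is a minimal sketch, not part of IBNAL: the class name EdgeListParser and its parse method are hypothetical, and it assumes node names contain no whitespace.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the IBNAL package.
final class EdgeListParser {

    /** Returns one {firstNode, secondNode} pair per non-blank line. */
    static List<String[]> parse(List<String> lines) {
        List<String[]> edges = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate blank lines
            }
            // Node names are separated by any run of spaces or tabs.
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected two node names: " + line);
            }
            edges.add(parts);
        }
        return edges;
    }

    // Usage: java EdgeListParser PPI1.net
    public static void main(String[] args) throws IOException {
        List<String[]> edges = parse(Files.readAllLines(Paths.get(args[0])));
        System.out.println(edges.size() + " edges read");
    }
}
```

Running a check like this on a network file before building the index is one way to catch malformed lines early.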
GO annotations
One plain text GO annotation file is needed per network. Each line of such a file starts with the name of a node, followed by the GO terms associated with that node. Each of these elements is separated by whitespace. For example:
dm1 7424 51539 40005 4519 7475 8362 46331 5509 4867 5578
dm10 7422 7379 5634 16055 8586 166 6355 35321 7173 48106 3676 7411 7403 8347 7400
dm100 51925 8332 16021 5262
dm10001 46331 5635
dm10002 7052 5875
Again for convenience, two example annotation files, corresponding to our two example networks, are included in our implementation package as PPI1.annos and PPI2.annos.
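As an illustration of the annotation format, the sketch below maps each node name to its list of GO terms. It is not part of IBNAL: the class name AnnotationParser is hypothetical, and GO terms are kept as strings because the format above does not say whether they must be numeric.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the IBNAL package.
final class AnnotationParser {

    /** Maps each node name to the GO terms that follow it on its line. */
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> termsByNode = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] parts = trimmed.split("\\s+");
            // parts[0] is the node name; everything after it is a GO term.
            List<String> terms =
                    new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
            termsByNode.put(parts[0], terms);
        }
        return termsByNode;
    }
}
```

A node listed without any GO terms ends up with an empty list rather than being dropped.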
The IBNAL package takes up 3 to 4MB of disk space as downloaded and about 5MB after extraction. More space will be needed to store the index files and the resulting alignments.
Using IBNAL
2.1: Building Indices
First, the user needs to build the index files using the first jar file; the alignment cannot be extracted until this is done.
To run the jar file from the command line, go to the IBNAL folder and type:
java -jar "IBNALindexBuild.jar" networkfilename
For example:
java -jar "IBNALindexBuild.jar" dmela
2.2: Extracting the alignment
Indices must be built for at least two networks before moving on to the next step, which is extracting the alignment.
To extract the alignment, run the second .jar file with the following parameters on the command line:
java -jar "IBNALextractAlignment.jar" networkfilename1 networkfilename2 annotationfile1 annotationfile2 -type
where networkfilename1 and networkfilename2 are the two networks to be aligned. The format of both files is described above. Note that the examples in this document pass network names without the '.net' extension.
annotationfile1 and annotationfile2 are the GO annotation files for the networks to be aligned. The format of the annotation files is also described above.
The "-type" parameter states whether the networks being aligned are real or synthetic data. It must be -r for real data or -s for synthetic data (NAPAbench).
Examples:
To align the real networks provided in this package, type on the command line:
java -jar "IBNALextractAlignment.jar" PPI1 PPI2 PPI1.annos PPI2.annos -r
To align two synthetic networks A and B with annotation files A.fo and B.fo, type:
java -jar "IBNALextractAlignment.jar" A B A.fo B.fo -s
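The two stages can also be chained from a small Java program. The sketch below is illustrative only: the class IbnalCommands and its methods are hypothetical, the jar names are copied from the command lines above, and the program is assumed to be started from the IBNAL folder.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical driver, not part of the IBNAL package.
final class IbnalCommands {

    /** Command line for the index-building stage. */
    static List<String> buildIndex(String network) {
        return Arrays.asList("java", "-jar", "IBNALindexBuild.jar", network);
    }

    /** Command line for the alignment stage; realData selects -r, otherwise -s. */
    static List<String> extractAlignment(String net1, String net2,
                                         String annos1, String annos2,
                                         boolean realData) {
        return Arrays.asList("java", "-jar", "IBNALextractAlignment.jar",
                net1, net2, annos1, annos2, realData ? "-r" : "-s");
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> steps = Arrays.asList(
                buildIndex("PPI1"),
                buildIndex("PPI2"),
                extractAlignment("PPI1", "PPI2", "PPI1.annos", "PPI2.annos", true));
        for (List<String> command : steps) {
            // Run each stage in the current directory and show its output.
            int exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("Step failed: " + command);
            }
        }
    }
}
```

Building both indices before the alignment call mirrors the order required above.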
Output Format
Here are samples of files created using IBNALindexBuild.jar:
dmela.clq
1 4 dm2614 dm2243 dm5221 dm3616
2 4 dm7161 dm6886 dm2243 dm3616
3 4 dm2868 dm2243 dm5221 dm3616
4 3 dm6355 dm2243 dm3616
5 3 dm2614 dm10727 dm2243
where the first column is the clique number, the second is the size of the extracted clique, and the remaining columns list the nodes of the clique, so their count equals the clique size.
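A line of the .clq file can be parsed and checked against its declared size as follows. This is a sketch based only on the sample shown here; the class CliqueLine is hypothetical and not part of IBNAL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the IBNAL package.
final class CliqueLine {
    final int number;          // first column: clique number
    final List<String> nodes;  // remaining columns: nodes of the clique

    private CliqueLine(int number, List<String> nodes) {
        this.number = number;
        this.nodes = nodes;
    }

    /** Parses "number size node1 ... nodeN" and verifies that N equals size. */
    static CliqueLine parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Too few columns: " + line);
        }
        int number = Integer.parseInt(parts[0]);
        int size = Integer.parseInt(parts[1]);
        if (parts.length != size + 2) {
            throw new IllegalArgumentException("Declared size does not match: " + line);
        }
        List<String> nodes =
                new ArrayList<>(Arrays.asList(parts).subList(2, parts.length));
        return new CliqueLine(number, nodes);
    }
}
```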
dmela.idx
6238
dm1324 0 143 25 17 0 0 0 0 0 0
dm7953 0 62 2 0 0 0 0 0 0 0
dm7955 0 43 0 0 0 0 0 0 0 0
dm10120 0 19 18 0 0 0 0 0 0 0
dm5551 0 91 2 0 0 0 0 0 0 0
where the total number of subordinate nodes is stored on the first line of the index file.
The rest of the file has 11 columns. The first column is the key of the subordinate node and the following ten columns are the clique-degree signature of that subordinate.
SOdmela.idx
dm5571 1177 299
dm396 1763 971 1762 970 969 968 1764 972 1754 1773
dm395 550 551 1064 549 212 1062 1063 210 211
dm392 593 2197
dm391
dm390 751 2125 503 1062 2123
where the first column is the key of the subordinate node, followed by the numbers of the cliques it touches.
Finally, the alignment output file. The alignment filename is a compound of the names of both networks being aligned and has the extension '.aln'. If the two network files being aligned are PPI1 and PPI2, then the alignment file is named "PPI1-PPI2.aln" and its format is as follows:
PPI1-PPI2.aln
ce931 dm13914
ce939 dm6744
ce982 dm1229
ce5412 dm5042
ce2610 dm7980
ce2077 dm11210
ce1877 dm6625
ce1641 dm12217
where each protein in the first column is aligned to the protein in the second column.
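The alignment file can be loaded into a map for further analysis. The sketch below is not part of IBNAL; the class AlignmentReader is hypothetical, and the one-to-one check is an assumption based on the description of global alignment in section 1.1, where each node is mapped to only one node in the other network.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the IBNAL package.
final class AlignmentReader {

    /** Maps each first-column protein to its aligned second-column protein. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> mapping = new LinkedHashMap<>();
        Set<String> usedTargets = new HashSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected two proteins: " + line);
            }
            // Assumed property: no protein appears twice in either column.
            if (mapping.containsKey(parts[0]) || !usedTargets.add(parts[1])) {
                throw new IllegalArgumentException("Protein aligned more than once: " + line);
            }
            mapping.put(parts[0], parts[1]);
        }
        return mapping;
    }
}
```

If the one-to-one assumption does not hold for a given output file, the check can simply be removed.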
Authors
IBNAL is designed and implemented by A. Elmsallati, A. Msalati, and J. Kalita at University of Colorado at Colorado Springs.
Please don't hesitate to contact the authors regarding any bugs, help, comments, feedback, and/or any suggestions.
Licensing
IBNAL is available as free software for academic purposes only.
Disclaimer: THIS SOFTWARE IS AVAILABLE "AS IS", WITHOUT GUARANTEES OR WARRANTY OF ANY KIND, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. ALTHOUGH THE AUTHORS HAVE ATTEMPTED TO FIND AND CORRECT ANY BUGS IN THE FREE SOFTWARE PROGRAM, THE AUTHORS ARE NOT RESPONSIBLE FOR ANY DAMAGE OR LOSSES OF ANY KIND CAUSED BY THE USE OR MISUSE OF THIS IMPLEMENTATION. THE AUTHORS ARE UNDER NO OBLIGATION TO PROVIDE SUPPORT, SERVICE, UPGRADES, OR CORRECTIONS TO THE FREE SOFTWARE PROGRAM.