Global Site Seer

ver 1.0

 

About Global Site Seer

Installation

Solve a CSI problem step-by-step

The test data files

Software Contact

Download Global Site Seer

 

About Global Site Seer

Global Site Seer is a small package developed by the Institute of Information Management at National Chiao Tung University, Taiwan, ROC. It is designed to solve the common site identification (CSI) problem using the methodology proposed in the paper "A linear programming approach for identifying a common site on DNA sequences". As promised in the paper, it can identify the most common site in multiple unaligned DNA sequences, and that site is proved to be the global optimum. The software uses LINGO, a widely used optimization tool, as its solving engine and provides a friendly interface for editing DNA sequences and specifying the format of the common site.

System requirements:

•       IBM PC with Pentium III CPU or higher

•       256 MB RAM or more (larger sequence data needs more memory)

•       Microsoft Windows 98/2000/XP

This package contains:

•       An executable file (GlobalSiteSeer.exe),

•       An instruction file (User's Guide.htm),

•       8 dynamic-link libraries (*.dll) from LINGO 8.0,

•       Several test data files (*.gss).

Installation

Step 1: Decompress the package into a directory,

Step 2: Run "GlobalSiteSeer.exe".

Solve a CSI problem step-by-step

Step 1: Edit DNA sequences

The main interface, depicted in Fig. 1, is a simple text editor window for editing DNA sequences. It offers the basic functions to create, open, save, print and edit files. Each line represents a single DNA sequence, and no line break is allowed within a sequence. Sequences may have different lengths. The sequence data should contain only the characters A, T, C and G.

Fig. 1. Main interface of Global Site Seer
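
The input rules above (one sequence per line, only the characters A, T, C and G) can be checked before a file is opened in the editor. The following Python sketch is not part of Global Site Seer; the function name validate_sequences is hypothetical, and it assumes that blank lines are simply skipped and that lowercase letters count as invalid, neither of which is stated in this guide.

```python
def validate_sequences(text):
    """Return (line_number, problem) pairs for lines that break the input rules.

    Assumptions (not confirmed by the guide): blank lines are skipped,
    and lowercase letters are treated as invalid characters.
    """
    allowed = set("ATCG")
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue  # assumption: ignore blank lines instead of flagging them
        bad = sorted(set(line) - allowed)
        if bad:
            problems.append((number, "invalid characters: " + "".join(bad)))
    return problems


if __name__ == "__main__":
    sample = "ATCG\nATNG\nGGCCTA"
    for number, problem in validate_sequences(sample):
        print(f"line {number}: {problem}")
```

An empty result means every line contains only the four allowed letters.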

 

Step 2: Specify site format and constraints

To specify the format of the common site and the logical constraints, use the "Model/Build Model" command on the menu bar or the target-shaped button on the toolbar. Figure 2 illustrates the steps of building and solving the LP model.

The dialog that appears first (Fig. 2a) is used to specify the format of the common site and the logical constraints. Two characters, 'N' and 'X', describe the site format. Each 'N' represents a single letter and each 'X' represents an ignored letter. In the example in the paper (the CRP binding site of the lac operon in the E. coli genome), the site format is "NNNNNXXXXXXNNNNN". The complementary relationships between letters inside the site are specified in the form "m:n" (that is, the mth and nth letters are complementary). When more than one constraint is imposed, they should be separated by commas.

 

Fig. 2a. Interface for specifying the site format and constraints.
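
To make the site format and the "m:n" constraint notation concrete, here is a minimal Python sketch of how such a specification could be parsed and checked against a candidate site. It is an illustration only, not the program's own parser. The names parse_site_spec and satisfies are hypothetical, positions are assumed to be 1-based and counted over the whole format string (including 'X' positions), and the constraint values "1:16, 2:15" are made-up examples rather than the constraints used in the paper.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def parse_site_spec(site_format, constraints=""):
    """Parse a site format such as 'NNNNNXXXXXXNNNNN' and constraints such as '1:16, 2:15'.

    Returns a list of (m, n) pairs. Assumption: m and n are 1-based positions
    counted over the full format string, 'X' positions included.
    """
    if not site_format or set(site_format) - {"N", "X"}:
        raise ValueError("site format may contain only 'N' and 'X'")
    pairs = []
    for item in constraints.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty constraint field
        left, sep, right = item.partition(":")
        if not sep:
            raise ValueError(f"constraint {item!r} is not in the form m:n")
        m, n = int(left), int(right)
        for position in (m, n):
            if not 1 <= position <= len(site_format):
                raise ValueError(f"position {position} is outside the site")
        pairs.append((m, n))
    return pairs


def satisfies(candidate, pairs):
    """Check that the mth and nth letters of a candidate site are complementary."""
    return all(COMPLEMENT[candidate[m - 1]] == candidate[n - 1] for m, n in pairs)


if __name__ == "__main__":
    pairs = parse_site_spec("NNNNNXXXXXXNNNNN", "1:16, 2:15")
    print(pairs)
    print(satisfies("TGTGAAAAAAATCACA", pairs))
```

In this sketch the candidate "TGTGAAAAAAATCACA" passes because T pairs with A at positions 1 and 16, and G pairs with C at positions 2 and 15.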

 

Step 3: Solve the formulated problem

After the site format and constraints are specified, click the "Translate Model" button to translate the problem into a linear optimization model like Model 2 in the paper. Figure 2b shows the generated LP model, which can be solved by clicking the "Solve!" button. A dialog box like the one in Fig. 2c appears and reports information such as the elapsed time as the solving proceeds.

 

Fig. 2b. Generated LP model

 

Fig. 2c. Solving status

 

Step 4: Solution report

When the optimal solution has been found, a solution report is generated in a dialog box, as shown in Fig. 2d.

 

Fig. 2d. Solution report

The test data files

Global Site Seer provides several test data files (*.gss) for experimentation. All the test data files are extended from E. coli genome sequences, which are taken directly from Stormo and Hartzell (1989).

The test data files are named as follows:

 

Ex. DNA18-105.gss

      "DNA"   : This is a DNA data file

      "18"    : Number of sequences

      "-105"  : The length of each sequence

 

Use this naming rule to find the test data file you need.
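
The naming rule can also be read programmatically. The Python sketch below is an illustration that assumes every test file follows the pattern of the example DNA18-105.gss exactly; the function name parse_test_file_name is hypothetical and is not provided by Global Site Seer.

```python
import re


def parse_test_file_name(name):
    """Split a name such as 'DNA18-105.gss' into (number_of_sequences, sequence_length)."""
    match = re.fullmatch(r"DNA(\d+)-(\d+)\.gss", name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the DNA<count>-<length>.gss rule")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    count, length = parse_test_file_name("DNA18-105.gss")
    print(f"{count} sequences, each {length} letters long")
```

For the example file name this yields 18 sequences of length 105.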

Software Contact

If you have any questions, please contact us by e-mail: cjfu@iim.nctu.edu.tw.