Cytoscape stringApp diseases exercises

Learning objectives

In these exercises, we will use the stringApp for Cytoscape to retrieve molecular networks from the STRING database for genes associated with diseases according to the DISEASES database. The exercises will teach you how to:

retrieve networks for a disease
merge and compare networks
select proteins by attributes
layout and visually style the resulting networks
perform enrichment analyses and visualize the results
identify functional modules through network clustering

Prerequisites

To follow the exercises, please make sure that you have the latest version of Cytoscape installed. Then start Cytoscape and update the current apps if necessary by checking the App Updates icon in the right-most corner of the menu bar.

The exercises require you to have certain Cytoscape apps installed. Go to the Cytoscape App Store in your web browser and search for stringApp, select the app and press the Install button to install it. Similarly, make sure you have the Omics Visualizer, yFiles Layout Algorithms and clusterMaker2 apps installed before switching back to Cytoscape.

If you are not already familiar with the STRING database or stringApp, we highly recommend that you go through the STRING exercises to learn about the underlying data and the stringApp exercises to get familiarized with Cytoscape and stringApp.

Exercise 1

In this exercise, we will retrieve several different disease networks and compare them by creating the union of their nodes and edges as well as by visualizing which nodes belong to which diseases.

1.1 Disease queries

Go to the menu File → Import → Network from Public Databases. In the import dialog, choose STRING: disease query as Data Source, type Pancreatic cancer into the Enter disease term field and set the Confidence (score) cutoff to 0.7. When you press Import, stringApp will retrieve a STRING network for the top-100 proteins associated with the chosen disease. Repeat this for two of the following diseases: Acute pancreatitis, Anxiety disorder, Sleep disorder, Intestinal disease, or Diabetes mellitus.

Which diseases did you choose and what is the lowest and highest stringdb::disease score for each of them? Are the scores in a similar range or not?

Now, go to the stringdb::disease score column, click on the column name and choose Rename column. For each network, rename the column to reflect the name of the disease, e.g. Pancreatic cancer or disease PC. Note that you can remove stringdb:: from the name.

1.2 Integrate disease networks

Cytoscape provides functionality to merge two or more networks, building either their union, intersection or difference. We will now merge the disease networks so that we can identify the overlap and differences between them. Use the Merge tool (Tools → Merge → Networks…) and make sure the Union tab is chosen. Then, select the disease networks from Available Networks list (for example ‘String Network - Pancreatic cancer’, ‘String Network - Acute pancreatitis’, and ‘String Network - Anxiety disorder’). Click on > to add them to the list of Networks to Merge and click Merge.

We can change the visualization of the merged network to look like a STRING network by changing the style. Go to Style in the Control Panel (beneath Network) and click on the drop-down menu to change the style from default to STRING - Pancreatic cancer. Disable the STRING style colors and STRING style labels from the STRING Results panel (right side) to remove the colors of the proteins associated with Pancreatic cancer, make all nodes grey and center the node labels.

How many nodes and edges are in the merged network? If you pick one protein from each disease and check if they are connected in the merged network, what do you observe?

In the next step, we need to retrieve all the interactions between the nodes that were not in the same disease network since those are not yet included in the network. To do so, we first remove all edges by choosing Apps → STRING → Change confidence or type from the Cytoscape menu. In the dialog, we set the Confidence cutoff to 1.0 and press OK. Then, we open the same dialog again, change the Confidence cutoff back to 0.7 and press OK. In this way, we make sure that all interactions above the confidence cutoff between all proteins in the current network are retrieved. Note that if you get an error while setting the confidence to 0.7, you can avoid it by switching to another network in the Network panel, and then back to the Merged Network before changing the confidence.

How many edges do we have now in the merged network? Can you think of a reason for why the number of edges increased? Are the proteins you selected from different disease now connected?

To better see the nodes and their names, make sure the graphics details are enabled (View → Always Show Graphics Details) and to improve the layout of the merged network, go to Layout → Apply Preferred Layout and then to Layout → yFiles Remove Overlaps (skip the last step if the menu item is not visible on your screen due to screen size or resolution).

1.3 Use selection filters

Now, we can explore the disease scores and check how many proteins are associated with more than one disease by using Cytoscape's built-in selection filters (Filter tab located underneath the Style tab). Click the ᐩ button, choose Column filter from the drop-down menu, and select one of the disease score columns you renamed in Exercise 1.1. The filtering criteria will automatically bet set to is and then a range for the score. Add a filter for the other two diseases in the network by clicking on the ᐩ button and selecting the respective disease score column. All three filters are connected with an AND logic, which means that a node is selected only if it fulfills all three conditions.

How many nodes (proteins) are common to all three diseases? And how many are common to some of the pairs of diseases? Note that you can see the nodes common to a pair by either deleting one of the three filters or by setting the third filter to is not.

Depending on which option you choose, you will get slightly different numbers because in the first case (having only two filters) the set of proteins associated with the two disease might contain proteins associated with the third disease, while in the second case, you specifically set the third filter to choose proteins that are associated with the first two diseases but not with the third one.

1.4 Visualize disease associations

In the next step, we will import the disease scores into a different table using the Omics Visualizer app. Go to Apps → Omics Visualizer → Import form node table. In the resulting dialog, we will see all node attribute columns, including the ones created in Exercise 1.1. Note that if you kept stringdb:: in the column names, you will find the columns under the strigndb namespace. Move the three columns containing the disease scores from Available columns to Selected columns using the > button and then click Next and Import.

A new table should appear in the Cytoscape Node Panel in the Omics Visualizer Tables tab. This table contains three columns (shared name, value, and source or stringdb) and for each node, one row for each column we selected in the previous step, in this case three. Since not all nodes are associated with all three diseases, in some cases the value column is empty. We can filter the table to show only the rows that contain any disease score, since this would be useful for the visualization we want to make. Press the filter icon (second icon just above the table), choose the value column and the is not null criteria. Now you can press Apply and then the Close button.

How many rows remain after filtering? Out of how many? Do you have an idea why?

To visualize which nodes are associated with which disease, you can use the donut chart icon (6th icon in the row above the table). In the resulting dialog, choose source (or stringdb) in the Values column, keep the Mapping to Discrete and Labels to NONE. Pressing the Next button will show the next page of settings. We can pick other colors or keep the defaults and press Draw. As a result, the nodes are colored based on their association with one, two or all three diseases we combined in this network. Press the Legend icon (last icon) and confirm with the Create button to let Omics Visualizer create a legend of the visualization.

Do you observe an overlap between the three diseases? If yes, list a few of the overlapping proteins and briefly explain their biological role or function (see their description in the STRING Results Panel on the right). Would you have expected these proteins to overlap between the diseases you chose?

Exercise 2

In this exercise, we will analyze the integrated disease network by performing network clustering and functional enrichment.

2.1 Network clustering

Starting from the merged network, we will use the MCL algorithm to identify clusters of tightly connected proteins within the network. To do that, press the Cluster network (MCL) button in the STRING Results panel on the right side of the network view. Set the granularity parameter (inflation value) to 5 and click OK to start the clustering. The clusterMaker app will now run the algorithm and automatically create a network showing the clusters. To remove the node overlaps, go to Layout → yFiles Remove Overlaps.

How many big clusters are there (with more than 10 nodes)? Are any of the clusters related mostly to one or two diseases or do they contain proteins associated with all three diseases?

Alternative instructions for clustering

Go to the menu Apps → clusterMaker → ClusterMaker Cluster Network → MCL Cluster>. Set the Granularity parameter (inflation value) to 5 and choose the stringdb::score attribute (i.e. the overall STRING confidence score) as Array Sources, select the option Create new clustered network, and click OK to start the clustering. The app will now run the algorithm and automatically create a network showing the clusters.

2.2 Group-wise functional enrichment

Now we will perform functional enrichment analysis on each of the bigger clusters separately. Sometimes stringApp does not detect the clustered network as STRING network, so switch to the unclustered network and back to the clustered one. Now, select the menu Apps → STRING Enrichment → Retrieve group-wise functional enrichment. In the resulting dialog, keep the Column for groups set to __mclCluster, which is the column with the cluster numbers. Then, press Advanced to show the advanced options and set the minimum group size to 10, in order to retrieve enrichment results only for clusters with at least 10 nodes. Press OK and the STRING enrichment table will be populated with several tables, one for each cluster (look at the bottom right, next to the Node and Edge Table). You can explore the results of each of them separately. Note that if you only see enriched DISEASES, you need to reset the filter by pressing the filter icon, choosing Select all for including all categories, and pressing OK.

Can you briefly characterize the three largest clusters in terms of their functionality? What distinguishes them?

2.3 Group-wise annotations

Finally, you can automatically annotate all clusters by going to Apps → STRING Enrichment → Retrieve group-wise annotations. In the resulting dialog, you can again set the minimum group size to 10 under Advanced setting if you want to only get annotations for the bigger clusters. Press OK and you will see several new annotations being places close to the clusters.

Do the automatic annotations agree with what you wrote up for the three biggest clusters in the previous question?

Supporting lectures

The theoretical background for these exercises is covered in these short online lectures:

Supporting literature

Doncheva NT, Morris JH, Gorodkin J and Jensen LJ (2019). Cytoscape stringApp: Network analysis and visualization of proteomics data. Journal of Proteome Research, 18:623-632.
Abstract Full text Preprint