BREEZE-Drug sensitivity and resistance testing analysis pipeline: technical documentation

Getting started

What is BREEZE?

BREEZE is a web-based interactive application for analyzing drug screening data. It combines comprehensive quality control and dose-response curve fitting with the calculation of quantitative scoring parameters such as EC50, IC50, drug sensitivity score (DSS) and area under the curve (AUC). The interactive visualizations generated at each step help users understand the data and support exploratory analysis.

Getting started with BREEZE:

Input data

  1. Upload an input data file by clicking "Browse". The input data must be in tabular format (Excel, CSV or TXT).
  2. Select the readout type (inhibition or viability) and the clustering method.
  3. Upload the controls file (needed only to calculate selective drug sensitivity scores (sDSS)).
  4. Continue with the steps under "Output data".

Annotation data file

Two input file formats are possible in the BREEZE application:

1. Raw data arranged in tabular format, which can be run through the full pipeline: QC, dose-response curve fitting, and calculation of IC50, EC50, DSS, etc. In this case the percent inhibition must be calculated for each well, so the positions of the controls must be known. Positive controls are denoted by a DRUG_NAME of ‘POS’ or ‘BzCl’, and negative controls by ‘NEG’ or ‘DMSO’. Percent inhibition is calculated per plate.

2. Pre-calculated percent inhibition/percent activation values, which can be run in the same way but without the quality control step. This is useful when only inhibition/activation data are available and no raw data.

The controls

If you have drug sensitivity scores (DSS) from a control screen, you can calculate a selective drug response based on those values by uploading the Excel file with the control DSS values. This is especially useful when you want to calculate the response of a patient sample relative to a healthy control, or the difference between two cell lines. The heatmaps and tables are then generated based on the selective drug sensitivity scores (sDSS). For example, if paclitaxel has a DSS of 12 in the control screen and a DSS of 19 in your current screen, the selective response is 19 - 12 = 7.
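
The subtraction in the paclitaxel example can be sketched in a few lines of Python. This is only an illustration: the function name `selective_dss` and the dictionary layout are assumptions made here, not part of BREEZE.

```python
def selective_dss(sample_dss, control_dss):
    """Return sDSS = sample DSS - control DSS for drugs present in both screens."""
    return {
        drug: sample_dss[drug] - control_dss[drug]
        for drug in sample_dss
        if drug in control_dss
    }


# Values taken from the paclitaxel example in the text.
sample = {"Paclitaxel": 19.0}
control = {"Paclitaxel": 12.0}
sdss = selective_dss(sample, control)
print(sdss)  # {'Paclitaxel': 7.0}
```

Drugs that are missing from the control screen are skipped in this sketch, because no selective score can be computed for them.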

The demo data is provided in both formats.

Data input terms

BREEZE accepts tabular data as text, CSV or Excel files. Here is a short glossary of the terms used in the data input:

  • WELL - The well identifier (A10, B11, C12, etc.). “A10” specifies row ‘A’ and column 10 of a 384-well plate.
  • PLATE - Plate number of the 384-well plate. If you have multiple plates, the plates are numbered 1, 2, 3, etc. QC is calculated per plate.
  • DRUG_NAME - Name of the drug.
  • POSITIVE_CONTROL - Death control (a toxic compound, such as benzethonium chloride, that kills most of the cells). In the raw data, the positive control can be referred to as POS or BzCl.
  • NEGATIVE_CONTROL - DMSO or a similar substance that should not have any effect on the cells. In the raw data, the negative control can be referred to as NEG or DMSO.
  • CONCENTRATION - Concentration of the compound in nanomolar (nM). Users must provide at least 4 doses for each drug.
  • SCREEN_NAME - Unique name used to identify the screen. For example, if the cell line MCF7 has been screened twice for 50 drugs, you can name the first screen MCF7_1 and the second MCF7_2.
  • WELL_SIGNAL - The raw value of each well.
  • PERCENT_INHIBITION - The percent inhibition or activation value is calculated by comparing the raw values of a particular drug at a particular concentration to the positive and negative control values in the same plate. The formula to calculate percent inhibition is as follows
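
The formula itself is not reproduced in this text. A commonly used normalization, assumed here rather than confirmed as the exact BREEZE formula, scales each well between the plate's negative-control mean (0% inhibition) and positive-control mean (100% inhibition): percent inhibition = 100 * (mean_NEG - WELL_SIGNAL) / (mean_NEG - mean_POS). The Python sketch below applies it plate by plate, using the column names from the glossary. The function name and the sample signals are made up for illustration.

```python
from statistics import mean

POS_NAMES = {"POS", "BzCl"}   # positive (death) control labels from the glossary
NEG_NAMES = {"NEG", "DMSO"}   # negative control labels from the glossary


def percent_inhibition(rows):
    """Add PERCENT_INHIBITION to each row, normalizing plate by plate.

    Assumes the common formula 100 * (mean_neg - signal) / (mean_neg - mean_pos).
    """
    plates = {}
    for row in rows:
        plates.setdefault(row["PLATE"], []).append(row)

    result = []
    for plate_rows in plates.values():
        neg = mean(r["WELL_SIGNAL"] for r in plate_rows if r["DRUG_NAME"] in NEG_NAMES)
        pos = mean(r["WELL_SIGNAL"] for r in plate_rows if r["DRUG_NAME"] in POS_NAMES)
        for r in plate_rows:
            pct = 100.0 * (neg - r["WELL_SIGNAL"]) / (neg - pos)
            result.append({**r, "PERCENT_INHIBITION": pct})
    return result


# Made-up signals for one plate, for illustration only.
rows = [
    {"PLATE": 1, "WELL": "A1", "DRUG_NAME": "DMSO", "WELL_SIGNAL": 1000.0},
    {"PLATE": 1, "WELL": "A2", "DRUG_NAME": "DMSO", "WELL_SIGNAL": 1000.0},
    {"PLATE": 1, "WELL": "A3", "DRUG_NAME": "BzCl", "WELL_SIGNAL": 0.0},
    {"PLATE": 1, "WELL": "A4", "DRUG_NAME": "BzCl", "WELL_SIGNAL": 200.0},
    {"PLATE": 1, "WELL": "B1", "DRUG_NAME": "Paclitaxel", "WELL_SIGNAL": 550.0},
]
normalized = percent_inhibition(rows)
```

With these numbers the negative-control mean is 1000 and the positive-control mean is 100, so the paclitaxel well at 550 lies halfway between them. A plate whose two control means coincide would cause a division by zero, which this sketch does not handle.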

Output data

  1. After uploading the input data files, click the Start button; this brings up the progress bar.
  2. Once the report is complete, the Check Results button becomes active and opens the results.
  3. A link to the "Final report" can be sent to the user's email address.

Report page

On the report page, you can click "Download full report" to download the entire results folder, which includes the QC results, the dose-response curve fits in an Excel file, the DSS and sDSS tables, and the interactive heatmaps. The web link of this report is unique, so you can return to the same page later if you save the link.

The report has three main parts: quality control, curve fitting, and visualization of results.

Quality control

Quality control statistics give valuable information about plate quality, technical issues (if any) and the overall reliability of the data. A comprehensive table reports QC metrics such as Z’ and SSMD, along with several parameters calculated for each of the controls. For a good-quality plate, Z’ should be above 0.5 and SSMD above 7. Out_To_In_Controls should be between 0.9 and 1.1, and CV_Neg and CV_Pos should both be below 10.

The Z' factor (Z-prime factor) describes the available signal window of an assay in terms of the total separation between negative and positive controls minus the error associated with each type of control. For a good assay, the Z' value should lie between 0.5 and 1. Plates with a Z' value below 0.5 should be treated with caution. SSMD (strictly standardized mean difference) is a QC parameter that measures the effect size for the comparison of any two groups with random values.
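
As a rough illustration, the two metrics can be computed from the control wells of one plate with their standard textbook definitions. The sketch below assumes sample standard deviations and a viability-type readout (negative controls give the higher signal). Whether BREEZE uses exactly these conventions is not stated in this document, and the well values are made up.

```python
from math import sqrt
from statistics import mean, stdev


def z_prime(pos, neg):
    """Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    return 1.0 - 3.0 * (stdev(pos) + stdev(neg)) / abs(mean(pos) - mean(neg))


def ssmd(pos, neg):
    """SSMD = (mean_neg - mean_pos) / sqrt(sd_neg^2 + sd_pos^2)."""
    return (mean(neg) - mean(pos)) / sqrt(stdev(neg) ** 2 + stdev(pos) ** 2)


# Made-up raw signals for the control wells of one plate.
pos_wells = [90.0, 100.0, 110.0]      # death control, low signal
neg_wells = [950.0, 1000.0, 1050.0]   # DMSO, high signal

zp = z_prime(pos_wells, neg_wells)
s = ssmd(pos_wells, neg_wells)
passes_qc = zp > 0.5 and s > 7        # cutoffs quoted in the text
```

For these wells the standard deviations are 10 and 50 and the means differ by 900, so Z' is 1 - 3 * 60 / 900 = 0.8 and the plate passes both cutoffs.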

    The labeled z-score figure is useful for those who prefer not to read the full QC report and want only the minimum information needed to confirm the quality of the plate.

    In this example, 4 plates have Z' above 0.5 (which we can see from the barplot above).

The application generates plate-level scatterplots of the raw data, indicating the different controls and samples. Users can look for trends in the distribution of the raw data. Positive controls are colored blue, negative controls red, cells green, and drugs or compounds gray.

Plate heatmaps are false color images of the assay plate based on the raw data. They help in spotting edge effects, dispensing problems and other technical issues.

Plots A and B represent data from the same drug set and the same cell line screened at two different time points. In plot B, unusually low values are present throughout row E, possibly resulting from dispensing issues, and a checkerboard pattern is evident in rows H to M.

If percent inhibition is provided as input, no QC results are generated.

Dose response curve fitting

Dose-response curve fitting starts by arranging the percent inhibition values at each concentration of a particular drug in a particular screen. The dose response is fitted using a four-parameter logistic curve. Quantification metrics such as IC50, EC50, AUC and DSS are calculated and exported as Excel files. Different types of reports (%Inhibition/%Viability) can be displayed by switching between worksheets of the generated Excel reports.

    Examples of good and bad curve fits are presented below.

    Panels, left to right: good curve, bad curve, bad curve.

We use a machine-learning-based approach to mark poor-quality dose-response curves (in red). This is how it is displayed in the report Excel files:

Data output terms

  • DRUG_NAME - Analyzed drug name.
  • ANALYSIS_NAME - IC50 / EC50 (% Inhibition or % Viability data).
  • IC50 – Relative IC50 value, the concentration at which the maximum response is reduced by half.
  • EC50 – Relative EC50 value, the concentration at which the maximum response is reduced by half.
  • SLOPE - Slope of the fitted dose-response curve.
  • MAX - The upper asymptote of the fitted dose-response curve.
  • MIN - The lower asymptote of the fitted dose-response curve.
  • Min.Conc.tested - The smallest tested drug concentration.
  • Max.Conc.tested - The highest tested drug concentration.
  • EC50_std_error - Standard error of the IC50/EC50 estimate.
  • DN - Percent inhibition value at N-th concentration.
  • AUC - Parametric area under the dose-response curve.
  • GRAPH - The graphical illustration of the dose-response curve.
  • DSS - Drug sensitivity score.
  • sDSS - Selective drug sensitivity score (deviations between sample and control DSS scores; used only when control is provided).
  • SE_of_estimate - Standard error of estimate for curve fit.

By default we use the widely used four-parameter logistic (4PL) nonlinear regression model for dose-response curve fitting (see fig. below, left panel). We recommend this option for its robustness to outliers. However, in certain scenarios the dose-response curve may not follow a sigmoidal pattern (e.g. a U-shaped dose-response curve). For such cases, we implemented a LOESS fit as an alternative option (see fig. below, right panel).
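
To make the 4PL model concrete, the sketch below evaluates one common parameterization of the curve using the output terms IC50, SLOPE, MIN and MAX. It also approximates an area under the curve on a log10 concentration axis. The exact parameterization and the AUC integration scheme used by BREEZE are not given in this document, so both are assumptions, and the fitting step itself (nonlinear least squares) is omitted.

```python
from math import log10


def four_pl(conc, ic50, slope, min_resp, max_resp):
    """Four-parameter logistic response at concentration `conc` (> 0)."""
    return min_resp + (max_resp - min_resp) / (1.0 + (ic50 / conc) ** slope)


def auc_log_trapezoid(params, min_conc, max_conc, steps=200):
    """Approximate the area under the curve over log10(concentration)."""
    lo, hi = log10(min_conc), log10(max_conc)
    width = (hi - lo) / steps
    xs = [lo + i * width for i in range(steps + 1)]
    ys = [four_pl(10.0 ** x, *params) for x in xs]
    # Trapezoidal rule on the evenly spaced log-concentration grid.
    return sum((ys[i] + ys[i + 1]) / 2.0 * width for i in range(steps))


# Hypothetical fitted parameters: IC50 = 100 nM, slope 1, response from 0 to 100 %.
params = (100.0, 1.0, 0.0, 100.0)
midpoint = four_pl(100.0, *params)              # response at the IC50
area = auc_log_trapezoid(params, 1.0, 10000.0)  # 1 nM to 10000 nM
```

In this parameterization the response at the IC50 lies exactly halfway between MIN and MAX, which matches the "relative IC50" wording in the output terms.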


Visualization of results

AUC and DSS scores are reported as Excel tables covering all the screens, as well as in interactive graphics such as heatmaps. Hovering over a value on the interactive heatmap shows the curve fit of that particular drug. By clicking on the drug name, you can view and compare dose-response curves across all the samples. Please note that we implemented different distance metrics for heatmap clustering. Euclidean distance is the default option, but other options, such as Pearson correlation, have been shown to be powerful for detecting similarities between drug response profiles.

The waterfall plot shows only the top active drugs for a particular screen.

If the controls are provided, heatmaps of the selective DSS are also produced, along with an Excel table.

The clustering tree is a radial dendrogram of the drugs with their corresponding DSS values. Hovering over the blue dots on the tree shows the mean DSS value across all the samples.
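
The difference between the two metrics can be seen in a small Python sketch. The DSS profiles are made up, and the conversion of Pearson correlation into a distance as 1 - r is an assumption for illustration. The document does not say how BREEZE performs that conversion.

```python
from math import sqrt


def euclidean_distance(a, b):
    """Straight-line distance between two equally long profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson correlation; 0 means perfectly correlated profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / sqrt(var_a * var_b)


# Made-up DSS profiles of two drugs across three samples.
drug_a = [5.0, 10.0, 15.0]
drug_b = [10.0, 20.0, 30.0]   # same pattern, twice the magnitude

d_euclid = euclidean_distance(drug_a, drug_b)
d_pearson = pearson_distance(drug_a, drug_b)
```

Here the two drugs have the same response pattern at different magnitudes, so the Pearson-based distance is zero while the Euclidean distance is large. A correlation-based metric therefore groups drugs by the shape of their response profile rather than by absolute DSS level.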
