Review waiting, please be patient.

This may take 8 weeks or more, since drafts are reviewed in no specific order. There are 1,826 pending submissions waiting for review.

If the submission is accepted, then this page will be moved into the article space.
If the submission is declined, then the reason will be posted here.
In the meantime, you can continue to improve this submission by editing normally.

Where to get help

If you need help editing or submitting your draft, please ask us a question at the AfC Help Desk or get live help from experienced editors. These venues are only for help with editing and the submission process, not to get reviews.
If you need feedback on your draft, or if the review is taking a lot of time, you can try asking for help on the talk page of a relevant WikiProject. Some WikiProjects are more active than others so a speedy reply is not guaranteed.

How to improve a draft

Wikipedia:Contributing to Wikipedia – a basic overview on how to edit Wikipedia.
Help:Wikitext – how to use the markup
Help:Referencing for beginners – how to include references
Wikipedia:Article development – how to develop your article
Wikipedia:Writing better articles – how to improve your article
Wikipedia:Verifiability – make sure your article includes reliable third-party sources

You can also browse Wikipedia:Featured articles and Wikipedia:Good articles to find examples of Wikipedia's best writing on topics similar to your proposed article.

Improving your odds of a speedy review

To improve your odds of a faster review, tag your draft with relevant WikiProject tags using the button below. This will let reviewers know a new draft has been submitted in their area of interest. For instance, if you wrote about a female astronomer, you would want to add the Biography, Astronomy, and Women scientists tags.

Add tags to your draft

Editor resources

Easy tools: Citation bot (help) | Advanced: Fix bare URLs

Submitted 11 days ago by Saketrohit24. Last edited 11 days ago by Saketrohit24.

Submission declined on 6 December 2024 by Passengerpigeon (talk).

This draft's references do not show that the subject qualifies for a Wikipedia article. In summary, the draft needs multiple published sources that are:

in-depth (not just passing mentions about the subject)
reliable
secondary
independent of the subject

Make sure you add references that meet these criteria before resubmitting. Learn about mistakes to avoid when addressing this issue. If no additional references exist, the subject is not suitable for Wikipedia.

If you would like to continue working on the submission, click on the "Edit" tab at the top of the window.
If you have not resolved the issues listed above, your draft will be declined again and potentially deleted.
If you need extra help, please ask us a question at the AfC Help Desk or get live help from experienced editors.
Please do not remove reviewer comments or this notice until the submission is accepted.

Declined by Passengerpigeon 11 days ago. Last edited by Saketrohit24 11 days ago.

This draft has been resubmitted and is currently awaiting re-review.

Comment: Wikipedia is not a repository for software user manuals. Passenger pigeon (talk) 03:09, 6 December 2024 (UTC)

Clustergrammer

Clustergrammer is a web-based interactive tool for visualizing and analyzing high-dimensional data as heatmaps. It was developed by the Ma'ayan Laboratory at the Icahn School of Medicine at Mount Sinai. The tool addresses the limitations of static heatmaps by adding interactive features that facilitate the analysis of complex biological datasets, including genomic and proteomic data.

Introduction

Clustergrammer is a visualization tool designed for the high-dimensional data commonly encountered in computational biology and data science.^[1] Unlike traditional static heatmaps, it lets users explore data interactively by zooming, panning, clustering, and reordering rows and columns. The tool applies to a range of domains, including gene expression analysis, protein interaction networks, and single-cell data visualization. Because it is built on web technologies, its visualizations can be opened and shared in a browser, which simplifies the interpretation of complex datasets.^[2]

Features

Interactive Heatmaps

Clustergrammer enables users to create interactive heatmaps for dynamic exploration of data. Features include zooming, panning, filtering, reordering, search, and highlighting.^[3]

Zooming and Panning: Users can navigate large datasets efficiently, zooming in on specific regions of the heatmap to analyze fine-grained details or zooming out to observe broader patterns. Panning allows users to move across the dataset seamlessly, making it easier to explore different areas of interest.

Filtering and Reordering: Rows and columns can be reorganized using a variety of methods, such as hierarchical clustering, summation, variance, or alphabetical labels. This flexibility enables users to uncover patterns, relationships, or outliers in the data that might otherwise be overlooked in static representations.
Search and Highlighting: The tool includes robust search functions that allow users to locate specific rows, columns, or subsets of data quickly. Highlighting options enable users to emphasize particular features, facilitating comparative analysis and focused exploration.
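
The reordering criteria listed above can be illustrated with a few lines of plain Python. This is a sketch of the idea only, not Clustergrammer's own code: the toy matrix, the row labels, and the function name `row_order` are hypothetical, and hierarchical clustering is left out for brevity.

```python
from statistics import pvariance

# Hypothetical toy matrix: each key is a row label, each value is that row's data.
matrix = {
    "geneC": [4.0, 4.0, 4.0],
    "geneA": [0.0, 5.0, 10.0],
    "geneB": [0.0, 11.0, 0.0],
}

def row_order(rows, method):
    """Return the row labels ordered by the chosen criterion (illustrative only)."""
    if method == "alpha":
        return sorted(rows)
    if method == "sum":
        return sorted(rows, key=lambda label: sum(rows[label]), reverse=True)
    if method == "variance":
        return sorted(rows, key=lambda label: pvariance(rows[label]), reverse=True)
    raise ValueError(f"unknown ordering method: {method}")

for method in ("alpha", "sum", "variance"):
    print(method, row_order(matrix, method))
```

Each criterion yields a different ordering of the same three rows, which is why switching between them in a heatmap can bring different patterns to the foreground.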

Interactive Dimensionality Reduction

Dimensionality reduction techniques like Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) simplify high-dimensional data for visualization. Clustergrammer enhances this process by allowing users to filter rows based on sum or variance, focusing on the most informative data points. This interactive filtering helps identify how specific dimensions affect clustering patterns. For smaller datasets, it uses animations to show the impact of these changes, aiding in data interpretation.
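
The row filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the tool's implementation: the function name `top_rows` and the sample values are hypothetical, and the filtered rows would then be handed to a separate PCA or t-SNE routine, which is not shown.

```python
from statistics import pvariance

def top_rows(rows, n, score=pvariance):
    """Keep the n rows with the highest score (variance by default, or sum)."""
    ranked = sorted(rows, key=lambda label: score(rows[label]), reverse=True)
    keep = set(ranked[:n])
    # Preserve the original row order among the rows that survive the filter.
    return {label: values for label, values in rows.items() if label in keep}

# Hypothetical expression-like matrix: row label -> values across four samples.
data = {
    "flat":     [5.0, 5.0, 5.0, 5.0],
    "variable": [0.0, 9.0, 1.0, 8.0],
    "drifting": [1.0, 2.0, 3.0, 4.0],
    "large":    [9.0, 9.0, 9.0, 8.0],
}

by_variance = top_rows(data, 2)        # the two most variable rows
by_sum = top_rows(data, 2, score=sum)  # the two rows with the largest totals
print(list(by_variance), list(by_sum))
```

Filtering by variance and filtering by sum select different rows here, so the choice of criterion changes which dimensions feed the subsequent embedding.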

Clustering Algorithms

An interactive Clustergrammer heatmap with clustering applied to the CCLE data. In this image, a range slider increases or decreases the number of clusters formed by the hierarchical clustering, and the change is reflected in the interactive dendrograms.

Clustergrammer employs hierarchical clustering algorithms, with support for additional methods such as K-means clustering. Users can visualize dendrograms, toggle between clustering levels, and extract enriched clusters.

Interactive Dendrograms: Clustergrammer employs interactive dendrograms to represent hierarchical clustering of data rows and columns. Instead of displaying the entire tree, it shows one slice at a time using gray trapezoids. Users can adjust the dendrogram slider to explore different clustering levels, revealing larger or smaller clusters. Interacting with these trapezoids highlights specific clusters, provides detailed information, and allows exporting of row or column names. For gene-level data, users can send clustered genes to Enrichr for enrichment analysis, facilitating deeper biological insights.
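
The effect of moving the dendrogram slider can be illustrated by cutting a hierarchical tree at different heights. The sketch below assumes single linkage and Euclidean distance purely for simplicity; the linkage and distance settings Clustergrammer uses are not described here, and the function name `cut_single_linkage` and the sample points are hypothetical.

```python
from itertools import combinations
from math import dist

def cut_single_linkage(points, height):
    """Flat clusters obtained by cutting a single-linkage tree at `height`.

    With single linkage, two points share a cluster after the cut exactly when
    a chain of points connects them with every step no longer than `height`.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= height:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical 2-D points: two tight pairs and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (20.0, 0.0)]

for height in (0.5, 1.0, 6.0, 20.0):  # successive slider positions
    print(height, cut_single_linkage(points, height))
```

Raising the cut height merges the small clusters into progressively larger ones, which corresponds to the larger trapezoids shown at higher dendrogram levels.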

Customization Options

The tool provides various customization features:

Users can adjust the opacity, highlight categories, and crop data subsets for detailed exploration.
Integrations with external APIs, such as Enrichr, allow for enrichment analysis directly within the visualization.

Applications

1. High-Dimensional Data Visualization

Clustergrammer is a powerful tool for analyzing large and complex datasets by creating interactive heatmaps. These visualizations enable researchers to examine high-dimensional data intuitively, even when datasets contain thousands of rows and columns. This makes it particularly useful for summarizing, filtering, and interpreting large-scale experiments or studies.

2. Gene Expression Analysis

Widely used in genomics, Clustergrammer aids in analyzing gene expression data, including single-cell RNA sequencing (scRNA-seq).^[4] By visualizing relationships among genes or samples, the tool helps researchers identify meaningful patterns, clusters, and correlations, offering insights into underlying biological processes or gene functions.

3. Biological Network Visualization

The tool is applied to represent biological networks such as protein-protein interactions, metabolic pathways, or gene regulatory networks. Clustergrammer’s clustering capabilities help pinpoint highly interconnected nodes or significant components, which are often critical in understanding the system's overall function or discovering key biomarkers.

4. Hierarchical Clustering

Clustergrammer supports hierarchical clustering, a method for organizing data into groups based on similarity. This is essential for categorizing features like genes, conditions, or samples into clusters, revealing relationships and structures within the data. Such clustering is especially valuable in understanding biological datasets, where interconnectedness is common.

5. Single-Cell Data Analysis

In single-cell studies, Clustergrammer is instrumental in exploring datasets derived from technologies like 10X Genomics. It allows researchers to classify cells based on gene expression signatures, visualize population structures, and assess how cells relate to one another, helping to uncover novel cell types or states.

6. Comparative Data Analysis

Clustergrammer facilitates the comparison of multiple datasets or experimental conditions. By visualizing and contrasting data in heatmaps, researchers can quickly identify similarities or differences between groups, aiding in hypothesis generation or validation.

Technical details

Architecture

Clustergrammer operates on a modular architecture comprising:

Backend: Built using Python, with key libraries such as NumPy and SciPy for data processing.
Frontend: Employs JavaScript and D3.js for rendering interactive visualizations.
Integration: The tool supports integration with Jupyter Notebooks and REST APIs, enabling seamless workflow incorporation.

The core libraries are Clustergrammer-PY and Clustergrammer-JS.

Core Components

Clustergrammer consists of two primary components: Clustergrammer-JS and Clustergrammer-PY.

Clustergrammer-JS

Clustergrammer-JS is a front-end JavaScript visualization library that generates interactive heatmaps in web browsers. Built on D3.js and SVG, it renders complex data in an explorable format with features such as:

Data filtering options in three main categories. Value filters apply thresholds to rows or columns, handle numerical criteria, and remove sparse data points. Category-based filters group data by metadata, toggle the visibility of specific groups, and filter on clustering outcomes. Interactive selections give manual control over rows and columns, display subsets of the data, and reorder content dynamically. Together these support both preprocessing and real-time filtering.
Customizable information displays on hover
Seamless web application integration

The library works with JSON data produced by Clustergrammer-PY and gives developers the tools to embed dynamic visualizations in their web projects. Its source code and installation details are available in its GitHub repository.^[5]

Clustergrammer-PY

Clustergrammer-PY is a back-end Python package that lets users create dynamic heatmap visualizations through automated data analysis. It processes input data to generate JSON files that power interactive web-based displays via Clustergrammer-JS.

Key features include:

Data preprocessing capabilities like hierarchical clustering and multiple normalization options
Support for both file-based and DataFrame inputs
Integration with major scientific Python libraries: NumPy for matrix operations, Pandas for DataFrame processing, SciPy for statistical analysis, and scikit-learn for machine learning.
Cross-version compatibility: the package runs on both Python 2.7 and Python 3.x.

The package handles data transformation and prepares structured JSON output suitable for visualization. It is available through its source code repository.^[6]
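
As an illustration of the kind of normalization applied before clustering, the sketch below z-scores each row of a small matrix in plain Python. Z-scoring is chosen here as an assumed example; the text above does not name the normalization options the package offers, and the function name `zscore_rows` and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def zscore_rows(matrix):
    """Z-score each row: subtract the row mean, then divide by the row standard deviation."""
    normalized = []
    for row in matrix:
        mu, sigma = mean(row), pstdev(row)
        # A constant row has zero spread; map it to zeros rather than divide by zero.
        normalized.append([(x - mu) / sigma if sigma else 0.0 for x in row])
    return normalized

raw = [
    [2.0, 4.0, 6.0],
    [10.0, 10.0, 10.0],
]
print(zscore_rows(raw))
```

After this step every non-constant row has mean 0 and unit spread, so rows measured on very different scales contribute comparably to the clustering distances.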

Clustergrammer2

Clustergrammer2 is a Jupyter widget for interactive visualization of high-dimensional datasets. Developed using widget-ts-cookiecutter^[7] and the regl WebGL library,^[8] it focuses on single-cell datasets, particularly RNA sequencing data. The tool also supports the exploration of large-scale data, such as gene expression patterns across thousands of cells.^[9]

Implementation Guide

Clustergrammer is accessible through multiple platforms, including its web-based interface, Python API, and Jupyter Notebook integration. Below is a step-by-step guide to implementing Clustergrammer in various scenarios:

1. Using the Web Interface

The easiest way to use Clustergrammer is through its web interface:

Visit the Clustergrammer Web Tool.^[10]
Upload a CSV or TSV file containing your high-dimensional data.
Use the interactive heatmap to explore, filter, and cluster your data dynamically.

2. Python API: Clustergrammer-PY

The Python API provides advanced users with full control over preprocessing and visualization. Follow these steps to use the API:

Step 1: Installation

Install the Clustergrammer-PY library using pip (the package is published on PyPI under the name clustergrammer):

pip install clustergrammer

Step 2: Import the Library

Start by importing the Clustergrammer-PY module:

from clustergrammer import Network

Step 3: Load and Preprocess Data

Initialize the Network object and load the data:

# `data` is assumed to be a pandas DataFrame of numeric values with labeled rows and columns
net = Network()
net.load_df(data)

Step 4: Apply Clustering

Use the built-in clustering algorithms:

net.cluster()

Step 5: Save and Visualize Results

Save the clustered data as a JSON file for visualization:

net.write_json_to_file('viz', 'clustergrammer_output.json')

3. Jupyter Notebook Integration

To visualize Clustergrammer heatmaps directly within Jupyter Notebooks, use the Clustergrammer2 widget.

1. Install the clustergrammer2 package:

pip install clustergrammer2

2. Import and use the widget in a Jupyter Notebook:

# The package exposes a ready-made network object named `net`
# (the exact import form may vary between clustergrammer2 releases)
from clustergrammer2 import net

# Load data into the widget (`data` is assumed to be a pandas DataFrame)
net.load_df(data)

# Display the interactive heatmap
net.widget()

This integration allows for seamless interaction with heatmaps during data exploration.

4. Integration with REST APIs

Clustergrammer supports REST API endpoints for automation:

Prepare a JSON-formatted data file as described in the Clustergrammer documentation.
Use tools like curl or Python’s requests library to send POST requests to the API:

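import requests

# Placeholder endpoint: substitute the URL given in the Clustergrammer documentation
url = "https://clustergrammer_api_url"
# `data` is assumed to be a pandas DataFrame
payload = {"data": data.to_json()}

# Send POST request and stop on an HTTP error status
response = requests.post(url, json=payload)
response.raise_for_status()

# Retrieve clustered data
clustered_data = response.json()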

Case studies

1) Analyzing the MNIST Dataset Using Clustergrammer

The image displays the analysis results for three clusters (Cluster 5, Cluster 9, and Cluster 12) from the MNIST dataset. Each cluster is shown with its majority-digit category and the corresponding counts of occurrences. These counts suggest that the clusters capture groupings of digits based on shared features, while overlapping counts, such as "Four" appearing in both Cluster 9 and Cluster 12, indicate potential feature similarity between clusters. The color-coded bars further highlight the distribution of majority digits within each cluster.

This case study demonstrates how Clustergrammer can enhance data analysis and visualization, focusing on the MNIST dataset, a widely used benchmark for handwritten digit classification. The objective was to explore clustering patterns and feature relationships within the dataset, leveraging Clustergrammer’s interactive heatmaps to uncover insights into the dataset’s structure, identify feature significance, and detect anomalies. By analyzing similarity matrices and dynamically clustering data, Clustergrammer enabled a deeper understanding of how pixel intensities and digit structures contribute to classification, providing a valuable tool for data exploration and machine learning workflows.

Data: The MNIST dataset, a benchmark for handwritten digit recognition, was analyzed using Clustergrammer to explore clustering patterns, feature relationships, and dataset quality. The dataset consists of 70,000 grayscale images of handwritten digits (0–9), with 60,000 used for training and 10,000 for testing. Each image, originally 28×28 pixels, was flattened into a 784-dimensional vector. Preprocessing steps included normalization of pixel intensity values to the range [0, 1].
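
The flattening and normalization steps can be sketched in plain Python. The example assumes 8-bit pixel intensities in the range 0 to 255 and uses a made-up image rather than real MNIST data; the function name `flatten_and_scale` is hypothetical.

```python
def flatten_and_scale(image, max_value=255):
    """Flatten a 2-D grid of pixel intensities row by row and scale it to [0, 1]."""
    return [pixel / max_value for row in image for pixel in row]

# Hypothetical 28x28 image: black background with one white horizontal stroke.
image = [[255 if r == 14 else 0 for _ in range(28)] for r in range(28)]

vector = flatten_and_scale(image)
print(len(vector), min(vector), max(vector))
```

Each image becomes one vector of 28 × 28 = 784 values, so a collection of images forms a matrix with one image per row or column that can be displayed as a heatmap.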

Visualization Features: The heatmap let users zoom and pan to focus on specific clusters for detailed analysis, reorder rows and columns to reorganize the data dynamically and highlight patterns, and annotate the data with metadata, such as digit labels, for better interpretability. Bright regions in the heatmap corresponded to high-intensity pixels critical for classification, providing insights into feature importance. Additionally, isolated rows and columns highlighted outliers, such as mislabeled or poorly written digits.

Analysis Using Clustergrammer: This analysis demonstrated the effectiveness of Clustergrammer in visualizing and understanding high-dimensional data. It revealed clustering patterns, feature significance, and dataset anomalies. The interactive visualization facilitated feature selection, anomaly detection, and dataset quality assessment for a complex dataset like MNIST. Clustergrammer's versatility makes it suitable for broader applications in computational biology, machine learning, and data science.

2) Lung Cancer Data Analysis

(a) Lung cancer cell lines (columns) were clustered based on a combination of PTMs and mRNA expression data (rows). (b) Zooming into a cluster containing Keratins with commonly up-regulated expression and post-translational modification in the NSCLC cluster. (c) Zooming into a cluster containing expression and methylation data for the lung associated transcription factor, NKX2-1.

This case study, taken from the Clustergrammer publication,^[11] demonstrates how Clustergrammer can enhance data analysis and visualization, focusing on a dataset collected from lung cancer cell lines. The goal was to analyze post-translational modifications (PTMs) and gene expression patterns across different types of lung cancer to identify relationships and biological mechanisms.

Data: Post-Translational Modifications (PTMs) are changes made to proteins after they are synthesized, including processes like phosphorylation (adding a phosphate group), acetylation (adding an acetyl group), and methylation (adding a methyl group). These modifications can alter protein function and play a critical role in cancer development. In this study, PTMs were measured in 42 lung cancer cell lines using Tandem Mass Tag (TMT) mass spectrometry, a technique for detecting protein changes. Additionally, gene expression data, which reflects the activity levels of genes in producing their products like mRNA or protein, was collected for 37 of these cell lines from the Cancer Cell Line Encyclopedia (CCLE). The analysis focused on two major types of lung cancer: Non-Small Cell Lung Cancer (NSCLC), which includes the majority of cases such as adenocarcinoma and squamous cell carcinoma, and Small Cell Lung Cancer (SCLC), a more aggressive but less common type.

Analysis Using Clustergrammer: Using Clustergrammer, patterns in the data were identified by clustering cell lines based on their post-translational modifications (PTMs) and gene expression levels. The analysis revealed that cell lines grouped into two major types: Non-Small Cell Lung Cancer (NSCLC) and Small Cell Lung Cancer (SCLC). Within these primary groups, further clustering was observed based on specific subtypes, such as adenocarcinoma and squamous cell carcinoma, as well as genetic mutations. Key observations included strong correlations between modifications (phosphorylation, acetylation, methylation) in keratin family proteins and their corresponding mRNA levels, suggesting a close link between protein modifications and gene activity. Additionally, the NKX2-1 transcription factor, a critical regulator in lung cancer, showed strong correlations between its methylation patterns, mRNA expression, and other lung-related genes such as SFTA3 and SOX2.
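
A correlation of the kind described above can be quantified with a correlation coefficient computed over paired measurements across cell lines. The sketch below uses Pearson's coefficient as an assumption, since the text does not state which measure the study used; the function name `pearson` and all values are hypothetical and are not taken from the study.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical measurements for one protein across five cell lines.
ptm_level = [0.2, 0.9, 1.5, 2.1, 3.0]
mrna_level = [1.1, 2.0, 2.8, 4.1, 5.2]

print(round(pearson(ptm_level, mrna_level), 3))
```

A value close to 1 indicates that the modification level and the mRNA level rise and fall together across the cell lines, which is the pattern that places such rows next to each other in a clustered heatmap.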

This study demonstrated how Clustergrammer can quickly identify important patterns in complex biological data, helping researchers understand differences between cancer types and potentially leading to better treatment strategies.

References

[1] Clustergrammer documentation: https://clustergrammer.readthedocs.io/
[2] Fernandez, Nicolas F.; Gundersen, Gregory W.; Rahman, Adeeb; Grimes, Mark L.; Rikova, Klarisa; Hornbeck, Peter; Ma'ayan, Avi (2017). "Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data". Scientific Data. 4: 170151. doi:10.1038/sdata.2017.151.
[3] "Clustergrammer Documentation". Read the Docs. Retrieved 2024-11-19.
[4] Jovic, D.; Liang, X.; Zeng, H.; Lin, L.; Xu, F.; Luo, Y. (2022). "Single-cell RNA sequencing technologies and applications: A brief overview". Clinical and Translational Medicine. 12 (3): e694. doi:10.1002/ctm2.694. PMC 8964935. PMID 35352511.
[5] "Clustergrammer-JS GitHub Repository". GitHub. Retrieved 2024-11-19.
[6] "Clustergrammer-PY GitHub Repository". GitHub. MaayanLab. Retrieved 2024-11-19.
[7] "widget-ts-cookiecutter". GitHub.
[8] "regl". GitHub.
[9] "Clustergrammer2 GitHub Repository". GitHub. Icahn School of Medicine at Mount Sinai. Retrieved 2024-11-19.
[10] "ClusterGrammer Webtool".
[11] "Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data".
