Springer Publishing

Thursday 2 June 2016

Getting creative with PubChem molecular similarity graphs

PubChem can graph the molecules in a search result by Tanimoto similarity and display them as a hierarchical graph. Below is a snippet of a PubChem substructure search I ran using a generalized steroid scaffold (take cholic acid and remove the stereochemistry symbols and peripheral oxygens):

SMILES: [CH]1CC[CH]2[C]1([CH](C[CH]3[CH]2[CH](C[CH]4[C]3(CC[CH](C4))C)))C

~51,000 results
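For readers new to the measure: the Tanimoto coefficient between two binary fingerprints is the number of bits they share divided by the number of bits set in either one. Below is a minimal Python sketch that represents each fingerprint as a set of "on" bit positions. The fingerprints, molecule names and the 0.5 cutoff are made up for illustration, and the hypothetical `similar_pairs` helper just keeps pairs above a cutoff as one simple way to get graph edges; it is not PubChem's own code and not how PubChem builds its hierarchy.

```python
from __future__ import annotations

from itertools import combinations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bit positions."""
    if not a and not b:
        return 0.0  # convention chosen here: two empty fingerprints score 0
    shared = len(a & b)
    # |A| + |B| - |A & B| is the size of the union A | B
    return shared / (len(a) + len(b) - shared)


def similar_pairs(fps: dict[str, set[int]], threshold: float) -> list[tuple[str, str, float]]:
    """Return every pair of named fingerprints whose Tanimoto score is >= threshold."""
    pairs = []
    for (name1, fp1), (name2, fp2) in combinations(fps.items(), 2):
        score = tanimoto(fp1, fp2)
        if score >= threshold:
            pairs.append((name1, name2, score))
    return pairs


# Hypothetical toy fingerprints (sets of "on" bit positions), not real PubChem data
toy = {
    "mol_A": {1, 2, 3, 4},
    "mol_B": {3, 4, 5},
    "mol_C": {1, 2, 3, 4, 6},
}
print(similar_pairs(toy, 0.5))  # [('mol_A', 'mol_C', 0.8)]
```

Here mol_A and mol_C share 4 of the 5 bits set across both, giving 0.8, while the other two pairs fall below the cutoff.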


You can save this data as a GML file. GML (Graph Modelling Language) is a lightweight, plain-text graph format that stores connectivity information, and it usually loads easily into graph viewers.
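To give a feel for how little the format needs, here is a minimal sketch that writes an undirected graph as GML using only the standard library. The node labels are hypothetical placeholders, and the attributes in PubChem's actual export may differ; this only shows the basic `graph`/`node`/`edge` structure that graph readers expect.

```python
from __future__ import annotations


def to_gml(labels: list[str], edges: list[tuple[int, int]]) -> str:
    """Serialize an undirected graph as a minimal GML document.

    Node i gets GML id i; each edge is a (source_id, target_id) pair.
    """
    out = ["graph [", "  directed 0"]
    for node_id, label in enumerate(labels):
        # GML strings cannot contain a raw double quote, so escape & and " as entities
        safe = label.replace("&", "&amp;").replace('"', "&quot;")
        out += ["  node [", f"    id {node_id}", f'    label "{safe}"', "  ]"]
    for source, target in edges:
        out += ["  edge [", f"    source {source}", f"    target {target}", "  ]"]
    out.append("]")
    return "\n".join(out)


# Hypothetical three-node example; the labels are placeholders, not real CIDs
print(to_gml(["mol_A", "mol_B", "mol_C"], [(0, 1), (0, 2)]))
```

Because connectivity is just `source`/`target` id pairs, the same file can be opened in yEd or any other GML-aware tool without conversion.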

I loaded this hierarchical data into yEd, a free graph editor. I applied the circular layout with the BCC Compact style and no other special settings, and voilà: the steroid-like molecules in PubChem come out looking like this!



Need I say that this layout looks very appealing?! Next, I want to colour-code the nodes by molecular weight and use this data for some mass spectrometry-based problems!
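A possible starting point for colour-coding by molecular weight is sketched below: a linear blue (light) to red (heavy) ramp, written into each node's GML `graphics` block. This assumes yEd's GML import reads a node's `graphics [ fill "#RRGGBB" ]` attribute (worth checking against a GML file that yEd itself exports), and the weights, the range and the helper names are hypothetical.

```python
from __future__ import annotations


def mw_to_hex(mw: float, mw_min: float, mw_max: float) -> str:
    """Map a molecular weight onto a blue (light) to red (heavy) hex colour."""
    if mw_max <= mw_min:
        raise ValueError("mw_max must be greater than mw_min")
    frac = (mw - mw_min) / (mw_max - mw_min)
    frac = min(max(frac, 0.0), 1.0)  # clamp weights outside the chosen range
    red = round(255 * frac)
    blue = 255 - red
    return f"#{red:02X}00{blue:02X}"


def gml_node(node_id: int, label: str, mw: float, mw_min: float, mw_max: float) -> str:
    """Emit one GML node block with a fill colour derived from its weight."""
    fill = mw_to_hex(mw, mw_min, mw_max)
    return "\n".join([
        "  node [",
        f"    id {node_id}",
        f'    label "{label}"',
        f'    graphics [ fill "{fill}" ]',
        "  ]",
    ])


# Hypothetical weight and range, chosen only to illustrate the mapping
print(gml_node(0, "mol_A", 350.0, 300.0, 500.0))
```

Clamping keeps outliers from producing invalid colours; a perceptually uniform colour map would be a reasonable upgrade over this simple two-channel ramp.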