Springer Publishing

Wednesday 2 September 2015

Using PubChem to match CAS numbers to identifiers

The PubChem REST API is the most straightforward route for new users because every query is written as a URL.
CAS Registry Numbers (CAS RN) are ubiquitous chemical identifiers that are used in many areas of industry. It is therefore important to be able to connect other chemical identifiers to CAS RN, which improves the visibility of chemicals on the internet.

1. Download data (containing CAS RN)

Domestic Substances List (Canada)
Non-Confidential TSCA Inventory (United States)

2. Look up each CAS RN individually through the PubChem REST API (from R) and return the result as text.

"naphthenic acids"
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/1338-24-5/synonyms/txt

R code:

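library(XML)
library(RCurl)
# LIST is your character vector of CAS RN; the example above is used here
LIST <- c("1338-24-5")
# one request URL per CAS RN
urls <- paste0("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/", LIST, "/synonyms/txt")
# getURL() returns one response string per URL
results <- getURL(urls)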

The txt reply is plain text, not XML. To work with XML instead, request /synonyms/XML and use xmlTreeParse() on each entry in the vector, which makes the results slightly easier to handle.
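
A minimal sketch of step 2 in base R, assuming the txt reply lists one synonym per line. The helper names synonym_url() and split_synonyms() are made up for illustration, and the sample reply is a placeholder string, not real PubChem output.

```r
# Hypothetical helpers (not part of RCurl or of PubChem).

# Build one PUG REST synonym URL per CAS RN.
synonym_url <- function(cas) {
  paste0("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/",
         cas, "/synonyms/txt")
}

# Split one txt reply into a character vector, one synonym per element.
# Assumption: the reply holds one synonym per line.
split_synonyms <- function(txt) {
  lines <- trimws(strsplit(txt, "\r?\n")[[1]])
  lines[nzchar(lines)]  # drop blank lines
}

urls <- synonym_url(c("1338-24-5"))
print(urls)

# Placeholder text standing in for what getURL(urls[1]) would return.
sample_reply <- "first synonym\nsecond synonym\n"
print(split_synonyms(sample_reply))
```

Keeping the URL building and the text splitting in separate functions means the parsing can be checked on saved replies without querying PubChem again.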

3. Create a list object in R that holds the synonyms for each CAS RN.

For each XML reply, get the value of each Synonym node and save the values as the i-th list entry.
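
One way this step could look with the XML package is sketched below. The reply layout (Synonym nodes nested under InformationList/Information) is an assumption about the XML output, the sample replies are made-up placeholders, and synonyms_from_xml() is a hypothetical helper name.

```r
library(XML)

# Hypothetical helper: return the text of every Synonym node in one XML reply.
# local-name() is used so the XPath matches whether or not the reply
# declares a default namespace.
synonyms_from_xml <- function(xml_text) {
  doc <- xmlParse(xml_text, asText = TRUE)
  values <- xpathSApply(doc, "//*[local-name()='Synonym']", xmlValue)
  as.character(unlist(values))  # character(0) when no Synonym node exists
}

# Placeholder replies (made-up values, not real PubChem output),
# standing in for the XML returned for two CAS RN.
xml_replies <- c(
  "<InformationList><Information><CID>1</CID><Synonym>name one</Synonym><Synonym>name two</Synonym></Information></InformationList>",
  "<InformationList><Information><CID>2</CID><Synonym>name three</Synonym></Information></InformationList>"
)

# Save the synonyms of the i-th reply as the i-th list entry.
synonym_list <- vector("list", length(xml_replies))
for (i in seq_along(xml_replies)) {
  synonym_list[[i]] <- synonyms_from_xml(xml_replies[i])
}
str(synonym_list)
```

A list is used because each CAS RN can have a different number of synonyms, which a rectangular matrix cannot hold directly.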

OR

3b. Use the REST API to request an identifier directly

"naphthenic acids"
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/1338-24-5/property/InChI/TXT
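
The same URL pattern can be generated for a whole vector of query terms. The sketch below is base R only; property_url() is a hypothetical helper, and the percent-encoding step is an added precaution for names that contain spaces (a CAS RN passes through unchanged).

```r
# Hypothetical helper: build a PUG REST property URL for each query term.
# 'property' is pasted in as given, e.g. "InChI" as in the URL above.
property_url <- function(term, property = "InChI") {
  # Percent-encode each term so that names with spaces are safe in a URL.
  encoded <- vapply(term, utils::URLencode, character(1),
                    reserved = TRUE, USE.NAMES = FALSE)
  paste0("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/",
         encoded, "/property/", property, "/TXT")
}

print(property_url("1338-24-5"))
print(property_url("naphthenic acids", "InChIKey"))
```

The resulting vector can be passed to getURL() in the same way as the synonym URLs.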

3c. Append another column containing the identifier (InChI, SMILES, etc.) to the matrix of CAS RN.
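
A sketch of the append step in base R follows. It assumes one txt reply per row of the matrix, keeps only the first line of each reply (a name lookup may match more than one compound), and treats a missing reply as NA. The helper first_identifier(), the second CAS RN, and the reply text are all placeholders.

```r
# Hypothetical helper: keep the first non-empty line of a txt reply,
# or NA when the reply is empty or missing.
first_identifier <- function(txt) {
  if (is.na(txt)) return(NA_character_)
  lines <- trimws(strsplit(txt, "\r?\n")[[1]])
  lines <- lines[nzchar(lines)]
  if (length(lines) == 0) NA_character_ else lines[1]
}

# Matrix with one row per CAS RN; "0000-00-0" is a made-up placeholder.
cas_matrix <- matrix(c("1338-24-5", "0000-00-0"), ncol = 1,
                     dimnames = list(NULL, "CAS_RN"))

# Placeholder replies, one per row: made-up text and one missing value.
replies <- c("InChI=placeholder\n", NA)

# Append the identifiers as a new column named after the property.
InChI <- vapply(replies, first_identifier, character(1), USE.NAMES = FALSE)
cas_matrix <- cbind(cas_matrix, InChI = InChI)
print(cas_matrix)
```

How a failed lookup shows up in practice depends on how the request is made, so the NA check may need adapting to the actual replies.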



USEFUL POST:
http://depth-first.com/articles/2007/05/21/simple-cas-number-lookup-with-pubchem/

