Physically Chemist

Tuesday, 27 October 2015

Open Science Working List

academia.edu	www.academia.edu/	social network
altmetrics	http://www.altmetric.com	measuring scholarly impact
authorclaim	http://authorclaim.org/	measuring researcher impact
citeulike	http://www.citeulike.org	citation bookmarking
crossref	http://www.crossref.org	article metadata search doi resolver
crowdometer	http://crowdometer.org/
datacite	https://www.datacite.org/	doi provider
depsy	http://depsy.org	measuring scholarly impact
faculty of 1000	http://f1000.com/
Fast Track Impact	http://fasttrackimpact.com	research impact
figshare	http://figshare.com/	data repository
github	https://github.com/	computer programming
Global Research Identifier Database (GRID)	www.grid.ac	database
google scholar	scholar.google.com/	measuring researcher impact
hypothesis	https://hypothes.is	annotation organize collaborate
impactstory	https://impactstory.org/	measuring scholarly impact
journal of brief ideas	beta.briefideas.org/	journal
journalreview.org	https://www.journalreview.org/
kudos	www.growkudos.com	research impact
mendeley	https://www.mendeley.com/	citation bookmarking
microsoft academic search	http://academic.research.microsoft.com/	measuring researcher impact
mozilla science lab	https://www.mozillascience.org/
open access infrastructure for research in europe	https://www.openaire.eu/
open knowledge	https://okfn.org/	data repository
open researcher and contributor id	http://orcid.org/	researcher identification
open science framework (OSF)	http://osf.io	research publishing framework
papercritic	http://www.papercritic.com/	monitoring feedback and conversation
peerj	https://peerj.com/
plos impact explorer	http://altmetric.com/interface/plos.html	scientific conversation impact
plum analytics	http://plumanalytics.com/	measuring scholarly impact
public library of science	https://www.plos.org/	library journal
publons	https://publons.com/	scientific review
pubpeer	https://pubpeer.com/
readermeter	http://readermeter.org/	impact
researcherid	www.researcherid.com/	researcher identification
researchgate	http://www.researchgate.net/	social network
rio journal	http://riojournal.com/	journal
Securing a Hybrid Environment for Research Preservation and Access, SHERPA	http://www.sherpa.ac.uk	repository development
sciencecard	http://50.17.213.175/	measuring researcher impact
scienceopen	https://www.scienceopen.com/	publishing network
scinote	scinote.net	electronic lab notebook
slideshare	www.slideshare.net/
sparrho	https://www.sparrho.com/	recommender search engine
the new reddit journal of science	https://www.reddit.com/r/science/	journal
the winnower	https://thewinnower.com/	journal
wikipathways	www.wikipathways.org/
wikipedia	https://www.wikipedia.org/
wiktionary	https://en.wiktionary.org/
zenodo	http://zenodo.org/	data repository

the content mine	contentmine.org
journal of open humanities data
science.ai
Scihub	http://sci-hub.io, http://sci-hub.cc
The open journal	http://theoj.org
Protocols.io	https://www.protocols.io
Pubchase	www.pubchase.com

Monday, 19 October 2015

PubPeer - Scientific Conversation

PubPeer, The Online Journal Club, is a service for carrying on the conversation of science, mostly after work has been published.

Nuts and Bolts

Essentially, you search for an article by DOI or another unique identifier (e.g. PubMed ID) through the PubPeer interface. Once the article is found, you can comment on it.

Getting started

You become a member by inputting the DOI of a paper you published, selecting which author you are, then providing your institutional email address. ResearchGate is another service that requires an institutional email address to get started.

Providing Commentary

Of course there are guidelines on how to provide appropriate commentary through PubPeer.

The Browser Extension

I, as a good scientist, installed the browser extension. I tried it out searching the keyword "naphthenic acids". No PubPeer results on the first page. I also searched "cancer", "pubpeer" and "metabolomics" and there was no PubPeer commentary on any article on the first page.

Finally, I went to the PubMed featured comment for the day (Oct. 2, 2015) and saw the following page. The yellow bar above the article title shows how many comments are on PubPeer.

To access the PubPeer comment, you click on the white words "1 comment on PubPeer". You are then taken to PubPeer's webspace to explore the comment. The comment at PubPeer is pretty much the same comment below the article in PubMed Commons.

Commenting

I posted my first comment on PubPeer concerning an article about the synthesis of yaku'amide. This is how it looks!

Future

Will they permanently archive commentaries and/or commentary chains with DOIs? What is the difference between PubPeer and PubMed Commons? How is PubPeer different from Disqus?

I know there are subtle differences, but I am still waiting to hear back from those organizations. Until then, PubPeer remains another excellent tool for scientific commentary just waiting to explode!

Monday, 5 October 2015

Sparrho - Scientific Recommendation

Sparrho ("sparrow") is a scientific recommendation service.

When I began playing with Sparrho, I got the feeling that it was similar to Google Scholar, but I knew it was different. I just couldn't tell how.

So I asked Sparrho myself!

@matthwmaclennan we do personal rec of other scientific content types (+ articles/patents)...
— Sparrho (@sparrho) September 28, 2015

@matthwmaclennan ...and these rec are not only based on articles u've published, so we can rec even if you haven't published anything before
— Sparrho (@sparrho) September 28, 2015

@matthwmaclennan that's great, thanks for the shoutout! Let us know if u need anything (pics, slides etc) to make ur life easier :)
— Sparrho (@sparrho) September 28, 2015

Thanks so much! I also got an invitation to receive some "sparrhoswag". I'm not sure what it is, but it sounds good. Now I am trying to navigate sparrho.

--

Be it known that I am obsessed with naphthenic acids in oil sands process waters!

How can sparrho help me?

--

Well, the interface is sleek-looking, purple and starry! I am looking into a fascinating world. As I type in and save keywords, I am building a repertoire of articles that I can mark as relevant (checkmark) or irrelevant (X) for my purposes. It's like I am building a research topic channel. I can immediately share articles over a variety of networks and link to the location of the article online.

The only things that confuse me a bit are the 1-D, 2-D and 3-D network graph logos.

THIS is how Sparrho is different from Google Scholar.

The 1-D graph image is a search which contains only the exact keywords you've entered for your channel.

The 2-D graph image search includes keywords defined as a more general concept. Here is where you want more and more keywords to give a better overall context for the research.

How could it get better than that? Well, the 3-D graph image represents Sparrho recommending new articles you didn't think you would need!

Okay, let me go back to Sparrho then and play the game!

The Game

A 1-D search for the keywords "naphthenic acids", "OSPW", and "oil sands" gives 75 hits. By the way, these 75 hits are mostly research published in 2015. The oldest articles in this channel are from 2012.

A 2-D search returns over 400 hits! Excellent. I tend to get hits concerning various aspects of naphthenic acids chemistry: biodegradation, toxicity, structure determination, etc. Extremely useful. If an article is currently "noise" in a channel, I can mark it as irrelevant. I can also dig up the articles I have marked by clicking the HISTORY button at the top right, and I can change the relevance status I have already given an article (for that particular channel).

A 3-D search produces 261 hits, which is fewer than 403, but that doesn't mean the search has somehow failed. On the contrary, it succeeded by returning exactly the number of results it is supposed to return for the keywords and relevance scores supplied! Perhaps it suggests that my channel's keywords are quite 'directed' and do not have a diffuse set of connotations or definitions. Here I saw some articles related to climate change and the Athabasca oil sands, as well as articles concerning oil sands soil nitrogen availability and honouring indigenous treaty rights.

When you click on the information about an article, a pleasant green window called "People who read this also read" drops down underneath, listing related articles.

Summary

I am very excited to learn how to use Sparrho more effectively. I can envisage the 3-D search being extremely useful for academics whose research is nebulous, publicly involved and approachable from many angles. In fact, that describes every PhD project, no matter how pessimistic you may feel! Using Sparrho can open you up to new research that is still directed toward your primary research interests and goals! Although I am developing analytical methods to characterize naphthenic acids, my efforts are directly related to policy and the wider industry. I believe it is my job to understand and effectively manage the milieu in which my research is situated so I can have an impact there. Sparrho is helping.

P.S.: sparrhoswag?

Sunday, 13 September 2015

Returning Google Search results in R - "mirex"

Introduction

When searching for specific information on the internet, the keywords we use often have multiple meanings. This is a problem when using statistical measures to gather information quickly: if you get 5 million results, how many of them are directly related to the exact meaning you intended? How many different meanings are there for a single word? Statistical metrics will easily lose sight of the range of meanings unless they are managed appropriately.

Think of the English word "love" (About 5.7 billion results on Google) and how many websites are dedicated to it. You may be looking for a detailed explication of the Greek notions of love as eros and agape, but end up on someone's careless Facebook post where 'love' is being used sarcastically. You may be directed to companies or people whose names include the word 'Love'.

Chemical nomenclature searching

In the world of chemistry, language is also extremely important and very complicated. IUPAC chemical nomenclature is a kind of agglutinative language, but additionally, many chemicals have their own trade names and traditional names. Some of these names are so old and common that they have acquired many different meanings and contexts over the years.

When searching for information about a chemical called "mirex", a prohibited pesticide, it is important to know that PubChem alone has amassed 121 "synonyms" and alternate names for this molecule. Using R, we can record the estimated number of hits returned in Google Search for each synonym of 'mirex'. The number of hits tells us something of the popularity of the word, but we cannot tell if there are other non-chemical meanings to the word that artificially inflate the results numbers.

R code and example

The following R code returns the approximate Google Search number of results for each entry in a vector.

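library(XML)
library(RCurl)
#Replace with your own vector or matrix column of identifiers.
LIST<-c("mirex","HRS 1276")
vec<-rep(NA_real_,length(LIST))
for(i in seq_along(LIST)){
#URL-encode the term so that spaces and quotation marks survive in the query string.
page<-getURL(paste0("https://www.google.ca/search?q=",URLencode(LIST[i],reserved=TRUE)),
ssl.verifyhost=FALSE,ssl.verifypeer=FALSE,
followlocation=TRUE)
#The 'resultStats' div is where Google reported the count when this was written.
results<-unlist(xpathApply(htmlTreeParse(page,useInternalNodes=TRUE),"//div[@id='resultStats']",xmlValue))
#Strip letters, commas and spaces to leave the number; the entry stays NA if the div was not found.
if(length(results)>0) vec[i]<-as.numeric(paste0(unlist(strsplit(results[1],"[A-Za-z, ]+")),collapse=""))
}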
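The number-cleaning step in the loop above can be pulled out and tried on its own, without calling Google at all. This is only a minimal sketch: parse_hits() is a hypothetical helper name, and the input strings are made-up examples of the kind of text the count element might contain.

```r
# Hypothetical helper: turn result-count text into a number.
parse_hits <- function(stats_text) {
  # Nothing was scraped (e.g. the element was not found): report NA.
  if (length(stats_text) == 0) return(NA_real_)
  # Keep only the part before "result" so timing text such as
  # "(0.45 seconds)" cannot leak digits into the count.
  count_part <- sub("result.*$", "", stats_text)
  # Drop everything that is not a digit; an empty string becomes NA.
  as.numeric(gsub("[^0-9]", "", count_part))
}

parse_hits("About 16,400,000 results")
parse_hits("973 results (0.45 seconds)")
parse_hits(character(0))
```

Returning NA instead of failing means one odd page does not stop a loop over all 121 synonyms.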

For the 121 synonyms listed in PubChem for the pesticide "mirex", the results can be displayed in a bar plot

barplot(vec,ylim=c(0,6e5))

The two tallest bars extend well past the top of the plot window, into the millions. The synonym for mirex that returned the most hits (about 16,400,000) was "HRS 1276" (without double quotation marks). One reason is that, without quotation marks, HRS can match "hotel reservation service" or "hrs" as an abbreviation for 'hours', so the search effectively includes '1276 hours'. When enclosed in double quotation marks, "HRS 1276" returns 973 results. Terms like this can be said to have high "keyword search entropy", a concept I will explore at a later time.
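To compare quoted and unquoted searches side by side, the query string can be built both ways before it is sent. A small sketch, assuming the same google.ca search URL used earlier; build_search_url() is a hypothetical name of my own.

```r
# Hypothetical helper: build a search URL for a term, optionally as an exact phrase.
build_search_url <- function(term, exact = TRUE) {
  # Wrapping the term in double quotation marks asks for the exact phrase.
  if (exact) term <- paste0('"', term, '"')
  # URLencode() (utils package) turns spaces into %20 and quotes into %22.
  paste0("https://www.google.ca/search?q=", URLencode(term, reserved = TRUE))
}

build_search_url("HRS 1276")                 # exact phrase
build_search_url("HRS 1276", exact = FALSE)  # loose match
```

Running the hit-count loop once with each form would give two counts per synonym, and the ratio between them is one rough way to see which names are ambiguous.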

Summary

For a chemical with a high legal profile, such as mirex, it is important to provide appropriate search terms to find the information needed. Perhaps the search terms with the smallest number of hits are the most relevant terms. Perhaps "mirex" is the most popular synonym for the chemical, but what percentage of the hits returned by a Google Search for "mirex" relate to the pesticide and what portion relate to something else? More hits do not necessarily mean more popular.

Monday, 7 September 2015

Cucumber + cherry

When eaten together, cherry and cucumber complement each other. I wouldn't say that they directly enhance each other's flavours, but they seem to produce a distinctive and positive flavour. That flavour, however, is not as strong as the natural cherry flavour still present.

The cherry flavour appears to dominate the combination just slightly and the cucumber flavour is almost overpowered.

Cherry and cucumber are fairly close in texture because both are crunchy, so the combination feels roughly like cucumber while still carrying the full texture of both.

Thursday, 3 September 2015

Using Google Books API and R to illustrate the general impact of a scientific work over time

To get a quick idea of how a book has affected scientific research over time, the Google Books API provides the data and R provides the visual!

The book "The Carbohydrates", edited by Ward Pigman, is an example of a book that you might think has had a significant impact on the landscape of chemical science over the years. If another book cites this one, chances are Google Books will have a record. We can use the Google Books API to have a look.

R code:

library(XML)
library(RCurl)
library(RJSONIO)
result<-getURL("https://www.googleapis.com/books/v1/volumes?q=%22the%20carbohydrates%22%20pigman&startIndex=0",ssl.verifyhost=FALSE,ssl.verifypeer=FALSE,followlocation=TRUE)

#This returns a text object in R which consists of 10 results in JSON format.

#Parse once and reuse it (and avoid calling the object 'list', which masks the base function).
parsed<-fromJSON(result)

totalcount<-parsed$totalItems ##returns the total results number
parsed$items ##returns all the listings for the 10 results
parsed$items[[1]]$volumeInfo$publishedDate ##returns the date the book was published for result number 1.

lapply(parsed$items,function(x) x$volumeInfo$publishedDate) ##returns the publishing date for all 10 books in the list.

##Again you will need to loop this with a new startIndex value each time until 440 is reached.
#Finally, categorize the book's impact over time by grouping the dates according to year (because
#this is most likely the only datum consistently available).
#The following loop will amass all the JSON returned.

totalcount<-fromJSON(result)[[2]] ##returns the total results number
list1<-list()
#Begin for loop
for(i in 0:floor(totalcount/10)){

#R lists are indexed from 1, so page i is stored at position i+1 (list1[[0]] would fail).
list1[[i+1]]<-getURL(paste0("https://www.googleapis.com/books/v1/volumes?q=%22the%20carbohydrates%22%20pigman&startIndex=",(i*10)),ssl.verifyhost=FALSE,ssl.verifypeer=FALSE,followlocation=TRUE)

}
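The paging arithmetic in the loop above can be checked without touching the API. The sketch below assumes 10 results per page, as in the calls above; page_starts() is a hypothetical helper name.

```r
# Hypothetical helper: the startIndex values needed to page through `total` results.
page_starts <- function(total, page_size = 10) {
  if (total <= 0) return(integer(0))
  # The last useful offset is the largest multiple of page_size below `total`.
  seq(0, total - 1, by = page_size)
}

page_starts(25)           # offsets 0, 10, 20
length(page_starts(440))  # number of calls needed for 440 results
```

Compared with 0:floor(totalcount/10), this skips the final empty request when the total is an exact multiple of 10.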

#The following loop will amass only the published date of results. Less data to save and more time between calls (which is a good thing for the servers).

totalcount<-fromJSON(getURL("https://www.googleapis.com/books/v1/volumes?q=%22the%20carbohydrates%22%20pigman&startIndex=0",ssl.verifyhost=F,ssl.verifypeer=F,followlocation=T))[[2]] ##returns the total results number
vec<-c()
#Begin for loop
for(i in 0:floor(totalcount/10)){

vec<-c(vec,unlist(lapply(fromJSON(getURL(paste0("https://www.googleapis.com/books/v1/volumes?q=%22the%20carbohydrates%22%20pigman&startIndex=",(i*10)),ssl.verifyhost=F,ssl.verifypeer=F,followlocation=T))[[3]],function(x) x$volumeInfo$publishedDate)))

}

vec

#If you want to call quicker, use the URL to extract only the totalItems and publishedDate information by appending the following to the URL

#&fields=totalItems,items/volumeInfo/publishedDate

#This will return only the dates.
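As a sketch of that shortcut, the request URL can be assembled by a small function so the fields filter is added in one place. books_url() is a hypothetical name; the query is assumed to be URL-encoded already, as in the calls above.

```r
# Hypothetical helper: build a Google Books volumes URL for one page of results.
books_url <- function(query, start_index = 0, dates_only = TRUE) {
  # as.integer() keeps large offsets from being pasted in scientific
  # notation (paste0(100000) gives "1e+05").
  url <- paste0("https://www.googleapis.com/books/v1/volumes?q=", query,
                "&startIndex=", as.integer(start_index))
  # Ask only for the total count and the publication dates.
  if (dates_only) {
    url <- paste0(url, "&fields=totalItems,items/volumeInfo/publishedDate")
  }
  url
}

books_url("%22the%20carbohydrates%22%20pigman", start_index = 10)
```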

#Display vec in R as a kind of timeline graph using package igraph

#As a saveable function. Input your API key in double quotations and your query in double
#quotations (URL-encoded).

GBapi<-function(query,key){
totalcount<-fromJSON(getURL(paste0("https://www.googleapis.com/books/v1/volumes?q=",query,"&startIndex=0&key=",key),ssl.verifyhost=F,ssl.verifypeer=F,followlocation=T))[[2]] ##returns the total results number

list1<-list()
#Begin for loop
for(i in 0:floor(totalcount/10)){

list1[[i+1]]<-fromJSON(getURL(paste0("https://www.googleapis.com/books/v1/volumes?q=",query,"&startIndex=",(i*10),"&key=",key),ssl.verifyhost=F,ssl.verifypeer=F,followlocation=T))

}

list1

}

Then use lapply() on the resulting list to pull out the data.
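Here is a minimal sketch of that extraction step, run on a mock list shaped like the pages GBapi() returns. The titles and dates in the mock are placeholders, not real results, and published_dates() is a hypothetical name.

```r
# Mock of two parsed pages; titles and dates are placeholders.
pages <- list(
  list(totalItems = 3, items = list(
    list(volumeInfo = list(title = "Book A", publishedDate = "1957")),
    list(volumeInfo = list(title = "Book B", publishedDate = "1970-06-01")))),
  list(totalItems = 3, items = list(
    list(volumeInfo = list(title = "Book C"))))  # this record has no date
)

# Hypothetical helper: collect every publishedDate, using NA where it is missing.
published_dates <- function(pages) {
  get_date <- function(item) {
    info <- item[["volumeInfo"]]
    if (!is.null(info) && "publishedDate" %in% names(info)) {
      as.character(info[["publishedDate"]])
    } else {
      NA_character_
    }
  }
  unlist(lapply(pages, function(page) {
    vapply(page$items, get_date, character(1))
  }))
}

published_dates(pages)
```

Keeping an NA for records without a date makes it easy to count how many results could not be placed on the timeline.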

And for comparison

#partition the plot space
par(mfrow=c(2,1))
#Plot one book first. xlim parameter makes sure the windows are the same size.
#'gbapi' is assumed to hold the list returned by GBapi() for the first book's query.

dates1<-unlist(lapply(gbapi,
function(x) lapply(x$items,function(y) y$volumeInfo$publishedDate)))
#Keep only the four-digit years and count books per year.
plot(table(unlist(regmatches(dates1,gregexpr("[0-9]{4}",dates1)))),
ylab="Number of Books on Google Books",xlim=c(1800,2015))

title(main="Some books published per year
relating to
'Computational Chemical Graph Theory' by Trinajstic")
#plot the other book below
plot(table(unlist(regmatches(vec,gregexpr("[0-9]{4}",vec)))),xlim=c(1800,2015),
ylab="Number of Books on Google Books")
title(main="Some books published per year
relating to
'The Carbohydrates' by Pigman")
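The year-counting step can also be written so that both books share exactly the same set of years, including years with no books. This is a sketch on made-up date strings; extract_years() and year_counts() are hypothetical names, and the 1800 to 2015 range simply mirrors the xlim used above.

```r
# Hypothetical helper: pull the first four-digit year out of each date string.
extract_years <- function(dates) {
  dates <- as.character(dates)
  hits <- regexpr("[0-9]{4}", dates)
  # regmatches() drops entries with no match (including NA).
  as.integer(regmatches(dates, hits))
}

# Hypothetical helper: count books per year over a fixed range of years.
year_counts <- function(dates, from = 1800, to = 2015) {
  table(factor(extract_years(dates), levels = from:to))
}

mock_dates <- c("1957", "1970-06-01", NA, "1957-03")  # placeholder dates
counts <- year_counts(mock_dates)
counts["1957"]
# plot(counts, ylab = "Number of Books on Google Books") would draw the timeline.
```

Because every year in the range is a factor level, two books can be compared year by year without aligning their tables by hand.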

Hopefully this kind of metric provides a useful way to approximate the scholarly impact a book has had on other books. In the sciences, textbooks and other books have always had an authoritative quality to them, so this metric may indicate a certain kind of scientific influence which may include teaching, information gathering and reputation all in one.

Current difficulties are mostly related to the limits imposed on the user by the Google Books API. At a certain point, the number of books returned on a result page diminishes. A workaround for this is in the works.

Wednesday, 2 September 2015

Using PubChem to match CAS numbers to identifiers

Using the PubChem REST API is the most straightforward approach for new users because every request is just a URL.
CAS registry numbers (CAS RN) are ubiquitous chemical identifiers used in many areas of industry. It is important, therefore, to be able to connect other chemical identifiers to CAS RN, improving the visibility of chemicals on the internet.

1. Download data (containing CAS RN)

Domestic Substances List (Canada)
Non-Confidential TSCA Inventory (United States)
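Downloaded inventories can contain mistyped entries, so it may be worth screening the CAS RN column before sending anything to PubChem. The sketch below uses the CAS check-digit rule (the last digit equals the weighted sum of the other digits, counted from the right, modulo 10); is_valid_cas() is a hypothetical helper name and is not part of either inventory.

```r
# Hypothetical helper: check the format and check digit of one CAS RN.
is_valid_cas <- function(cas) {
  # Expected shape: 2 to 7 digits, 2 digits, 1 check digit.
  if (!grepl("^[0-9]{2,7}-[0-9]{2}-[0-9]$", cas)) return(FALSE)
  digits <- as.integer(strsplit(gsub("-", "", cas), "")[[1]])
  check <- digits[length(digits)]
  # Weight the remaining digits 1, 2, 3, ... starting from the right.
  body <- rev(digits[-length(digits)])
  sum(body * seq_along(body)) %% 10 == check
}

is_valid_cas("1338-24-5")  # the naphthenic acids CAS RN used in this post
is_valid_cas("1338-24-6")  # same number with a wrong check digit
```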

2. Search each CAS RN through the PubChem REST API (using R), returning the results as text.

"naphthenic acids"
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/1338-24-5/synonyms/txt

R code:

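library(XML)
library(RCurl)
#Replace with your own vector of CAS RN.
LIST<-c("1338-24-5")
#getURL() accepts a vector of URLs and returns one text response per entry.
synonyms_txt<-getURL(paste0("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/",LIST,"/synonyms/txt"))

The /synonyms/txt output is plain text with one synonym per line. To work with XML instead, request /synonyms/XML and use xmlTreeParse() on each entry in the vector for slightly easier handling.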

3. Create list object in R of synonyms by dumping synonyms into list objects.

For each xml, get the value of each synonym node and save it as the i-th list entry.
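If you stay with the plain-text output instead of XML, the same list can be built by splitting each response on line breaks. A minimal sketch on a mock response: the synonym strings are placeholders, not PubChem output, and parse_synonyms() is a hypothetical name.

```r
# Hypothetical helper: split one text response into a vector of synonyms.
parse_synonyms <- function(txt) {
  lines <- trimws(strsplit(txt, "\r?\n")[[1]])
  # Drop blank lines left by trailing newlines.
  lines[nzchar(lines)]
}

# Mock responses keyed by CAS RN; the contents are placeholders.
mock_responses <- c("1338-24-5" = "placeholder name A\nplaceholder name B\n")

# One list entry per CAS RN, each holding a character vector of synonyms.
synonym_list <- lapply(mock_responses, parse_synonyms)
synonym_list[["1338-24-5"]]
```

Naming the list entries by CAS RN keeps the link back to the original inventory row.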

OR

3b. Use REST API to make a call for identifiers

"naphthenic acids"
http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/1338-24-5/property/InChI/TXT

3c. Append another column to the matrix which contains the identifier (InChI, SMILES, etc...)
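Steps 3b and 3c can be sketched together as follows. The URL pattern is the one shown in 3b; property_url() and first_line() are hypothetical helper names, and the fetched value is a placeholder standing in for what getURL() would return.

```r
# Hypothetical helper: build the property URL for one or more CAS RN.
property_url <- function(cas, property = "InChI") {
  paste0("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/",
         cas, "/property/", property, "/TXT")
}

# Hypothetical helper: keep the first line of a response, assuming a
# response may contain more than one line if a name matches several records.
first_line <- function(txt) {
  lines <- strsplit(txt, "\r?\n")[[1]]
  if (length(lines) == 0) NA_character_ else trimws(lines[1])
}

# A one-row matrix standing in for the downloaded inventory.
cas_table <- cbind(CAS = "1338-24-5", name = "naphthenic acids")

# In real use: fetched <- getURL(property_url(cas_table[, "CAS"]))
fetched <- c("<InChI placeholder>\n")

# Append the identifier as a new column.
cas_table <- cbind(cas_table, InChI = vapply(fetched, first_line, character(1)))
colnames(cas_table)
```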

USEFUL POST:
http://depth-first.com/articles/2007/05/21/simple-cas-number-lookup-with-pubchem/