Google Refine

ITB CNR, Software

WOW! This project by Google is extremely interesting and versatile. A colleague of mine discovered it while looking for a solution to a bioinformatics problem he’s facing. This kind of tool will be extremely important for normalizing data in biological datasets…

Discover more on the project home page: Google Refine, a power tool for working with messy data (formerly Freebase Gridworks) →
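One way Refine helps with messy data is by clustering variant spellings of the same value so they can be merged. The Python sketch below is a simplified take on the "fingerprint" key-collision idea described in the project's documentation (the project now lives on as OpenRefine). It trims and lowercases each value, folds accents, strips punctuation, and then sorts and deduplicates the tokens. Values that end up with the same key become merge candidates. The function names and species strings are illustrative assumptions, not Refine's own code.

```python
import string
import unicodedata
from collections import defaultdict

# Translation table that deletes ASCII punctuation characters.
_PUNCT = str.maketrans("", "", string.punctuation)


def fingerprint(value: str) -> str:
    """Simplified fingerprint key: case-, order-, accent- and punctuation-insensitive."""
    value = value.strip().lower()
    # Fold accented characters to plain ASCII (e.g. "é" -> "e").
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    value = value.translate(_PUNCT)
    # Sort and deduplicate whitespace-separated tokens so word order does not matter.
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group raw values by fingerprint, keeping only keys shared by 2+ values."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    species = ["Escherichia coli", "escherichia coli ", "Coli, Escherichia",
               "E. coli", "Homo sapiens", "homo  sapiens"]
    print(cluster(species))
    # {'coli escherichia': ['Escherichia coli', 'escherichia coli ', 'Coli, Escherichia'],
    #  'homo sapiens': ['Homo sapiens', 'homo  sapiens']}
```

Note that "E. coli" stays on its own. Key collision catches case, spacing, punctuation and word-order variants, but not abbreviations, so the final merge still needs a human eye.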

scientific publications have to make the transition to open science

FLOSS, ITB CNR, on the Web

“An inherent principle of publication is that others should be able to replicate and build upon the authors’ published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols available to readers promptly on request.”

Nature, Availability of data and materials

Last June 12th, OpenSource.com Magazine published an interesting (and promising) article about Nature Methods, one of the most respected scientific publications in the world, and its decision to shift to an ‘open science’ model for its article approval process…

Researcher Natalia Ivanova was parsing this data when she noticed something strange: several bacteria had really short genes, around 200 nucleotides long, a far cry from the more typical 800-900 nucleotide length she was expecting. Short genes mean short proteins, and in this case, seemingly nonfunctional ones. The only way to make it coherent was if “stop” codons didn’t actually mean “stop”.

Ivanova experimented computationally with various codon reassignments, and ultimately found that things looked a lot more normal if “opal” was translated as a glycine amino acid. In other words, “the same word means different things in different organisms,” says Eddy Rubin, JGI’s Director. The microbial world is multilingual.

Wired, → is DNA multilingual?
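To make the idea of a codon "meaning different things" concrete, here is a toy Python sketch. It reads one DNA sequence with two tables. The first is the standard genetic code. The second is a variant in which the opal stop codon (UGA, written TGA in DNA) encodes glycine. The sequence is made up for illustration, and this is not Ivanova's actual analysis.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Variant code: identical, except the opal codon TGA now encodes glycine (G).
OPAL_AS_GLY = {**STANDARD, "TGA": "G"}


def translate(dna: str, code: dict) -> str:
    """Translate a DNA reading frame codon by codon, halting at the first stop."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = code[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


seq = "ATGAAATGAGGCTAA"  # hypothetical example sequence
print(translate(seq, STANDARD))     # MK
print(translate(seq, OPAL_AS_GLY))  # MKGG
```

Under the standard code, translation stops after two residues. This is a miniature version of the "suspiciously short genes" puzzle. Reassigning a single codon lets the same sequence read through.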

ITB CNR, Life, on the Web

introducing GitHub to scientists

ITB CNR, on the Web, Software

→ Making science more open at GitHub

OpenSource magazine (formerly a magazine driven only by Red Hat news) interviews Arfon Smith, taking the opportunity to introduce scientists all over the world, including those working in fields not strictly related to computer science, to the popular and powerful awesomeness engine that is GitHub, along with its community and philosophy…

ORCID – connecting Research and Researchers

ITB CNR, on the Web

ORCID

ORCID provides a persistent digital identifier that distinguishes you from every other researcher and, through integration in key research workflows such as manuscript and grant submission, supports automated linkages between you and your professional activities ensuring that your work is recognized.

I discovered ORCID a few days ago while reading the post Identifying poster authors: conference organizers, ask for ORCIDs! on the always interesting Better Posters weblog.

The article highlighted an interesting case of author homonymy in science. It made me think that scientific publishing has a generally accepted way to identify publications (PubMed IDs or similar codes) but lacks a secure, unique identifier for individual authors.
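Part of what makes an ORCID iD a reliable identifier is its built-in check character. An iD is 16 characters long. The last character is an ISO 7064 MOD 11-2 checksum of the first 15 digits, and it can be "X" when the computed value is 10. The minimal Python sketch below validates iDs written in the usual hyphenated form. The function names are my own, and the sample iD is the fictitious one that appears in ORCID's own examples.

```python
def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if a hyphenated iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: one mistyped digit
```

A passing checksum only proves the iD is well formed and catches typical typing errors. It does not prove that the iD is registered or that it belongs to the person claiming it.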

So, even though I’m a technical person who rarely gets involved in actual publications (I work behind the scenes), I’ve created an ORCID for myself, and I’m spreading the word about it.

Please do the same.

One last word on ORCID. The system would probably be more widely accepted if each of us could upload an actual picture, and if claiming ownership of publications were made easier by a tool that queries external publication databases. The option to add two or more websites and ‘social’ profiles (e.g. Twitter, LinkedIn, Xing) would also be a nice plus, IMHO…


Speaking only for myself, I’ve now arrived at the point where around 90 – 95% of what I do can be done comfortably in Python. So the major consideration for me, when determining what language to use for a new project, has shifted from “what’s the best tool for the job that I’m willing to learn and/or tolerate using?” to “is there really no way to do this in Python?” By and large, this mentality is a good thing, though I won’t deny that it occasionally has its downsides. For example, back when I did most of my data analysis in R, I would frequently play around with random statistics packages just to see what they did. I don’t do that much any more, because the pain of having to refresh my R knowledge and deal with that thing again usually outweighs the perceived benefits of aimless statistical exploration.

Conversely, sometimes I end up using Python packages that I don’t like quite as much as comparable packages in other languages, simply for the sake of preserving language purity. For example, I prefer Rails’ ActiveRecord ORM to the much more explicit SQLAlchemy ORM for Python–but I don’t prefer it enough to justify mixing Ruby and Python objects in the same application. So, clearly, there are costs. But they’re pretty small costs, and for me personally, the scales have now clearly tipped in favor of using Python for almost everything. I know many other researchers who’ve had the same experience, and I don’t think it’s entirely unfair to suggest that, at this point, Python has become the de facto language of scientific computing in many domains. If you’re reading this and haven’t had much prior exposure to Python, now’s a great time to come on board!

Tal Yarkoni ☞ [citation needed]

The homogenization of scientific computing, or why Python is steadily eating other languages’ lunch

ITB CNR