Wednesday, August 2, 2017

Figure scripting and how we organize computational work in the lab

Saw a recent Twitter poll from Casey Brown on the topic of figure scripting vs. "Illustrator magic", the former of which is the practice of writing a program to completely generate the figure vs. putting figures into Illustrator to make things look the way you like. Some folks really like programming it all, while I've argued that I don't think this is very efficient, and so arguments go back on forth on Twitter about it. Thing is, I think ALL of us having this discussion here are already way in the right hand tail in terms of trying to be tidy about our computational work, while many (most?) folks out there haven't ever really thought about this at all and could potentially benefit from a discussion of what an organized computational analysis would look like in practice. So anyway, here's what we do, along with some discussion of why and what the tradeoffs are (including talking about figure scripting.

First off, what is the goal? Here, I'm talking about how one might organize a computational analysis in finalized form for a paper (will touch on exploratory analysis later). In my mind, the goal is to have a well-organized, well-documented, readable and, most importantly, complete and consistent record of the computational analysis, from raw data to plots. This has a number of benefits: 1. it is more likely to be free of mistakes; 2. it is easier for others (including within the lab) to understand and reproduce the details of your analysis; 3. it is more likely to be free of mistakes. Did I mention more likely to be free of mistakes? Will talk about that more in a coming post, but that's been the driving force for me as the analyses that we do in the lab become more and more complex.

[If you want to skip the details and get more to the principles behind them, please skip down a bit.]

Okay, so what we've settled on in lab is to have a folder structured like this (version controlled or Dropboxed, whatever):



I'll focus on the "paper" folder, which is ultimately what most people care about. The first thing is "extractionScripts". This contains scripts that pull out numbers from data and store them for further plot-making. Let me take this through the example of image data in the lab. We have a large software toolset called rajlabimagetools that we use for analyzing raw data (and that has it's own whole set of design choices for reproducibility, but that's a story for another day). That stores, alongside the raw data, analysis files that contain things like spot counts and cell outlines and thresholds and so forth. The extraction scripts pull data from those analysis files and puts it into .csv files, which are stored in extractedData. For an analogy with sequencing, this is like maybe taking some form of RNA-seq data and setting up a table of TPM values in a .csv file. Or whatever, you get the point. plotScripts then contains all the actual plotting scripts. These load the .csv files and run whatever to make graphical elements (like a series of histograms or whatever) and stores them in the graphs folder. finalFigures then contains the Illustrator files in which we compile the individual graphs into figures. Along with each figure (like Fig1.ai), we have a Fig1readme.txt that describes exactly what .eps or .pdf files from the graphs folders ended up in, say, Figure 1f (and, ideally, what script). Thus, everything is traceable back from the figure all the way to raw data. Note: within the extractionScripts is a file called "extractAll.m" and in plotScripts "plotAll.R" or something like that. These master scripts basically pull all the data and make all the graphs, and we rerun these completely from scratch right before submission to make sure nothing changed. Incidentally, of course, each of the folders often has a massive number of subfolders and so forth, but you get the idea.

What are the tradeoffs that led us to this workflow? First off, why did we separate things out this way? Back when I was a postdoc (yes, I've been doing various forms of this since 2007 or so), I tried to just arrange things by having a folder per figure. This seemed logical at the time, and has the benefit that the output of the scripts are in close proximity to the script itself (and the figure), but the problem was that figures kept getting endlessly rearranged and remixed, leading to endless tedious (and error-prone) rescripting to regain consistency. So now we just pull in graphical elements as needed. This makes things a bit tricky, since for any particular graph it's not immediately obvious what made that graph, but it's usually not too hard to figure out with some simple searching for filenames (and some verbose naming conventions).

The other thing is why have the extraction scripts separated from the plots? Well, in practice, the raw data is just too huge to distribute easily this way, and if it was all mushed together with the code and intermediates, it would be hard to distribute. But, at least in our case, the more important fact is that most people don't really care about the raw data. They trust that we've probably done that part right, and what they're most interested are the tables of extracted data. So this way, in the paper folder, we've documented how we pulled out the data along while keeping the focus on what most people will be most interested in.

[End of nitty gritty here.]

And then, of course, figure scripting, the topic that brought this whole thing up in the first place. A few thoughts. I get that in principle, scripting is great, because it provides complete documentation, and also because it potentially cuts down on errors. In practice, I think it's hard to efficiently make great figures this way, so we've chosen perhaps a slightly more tedious and error prone but flexible way to make our figures. We use scripts to generate PDFs or EPSs of all relevant graphical elements, typically not spending time to optimize even things like font size and so forth (mostly because all of those have to change so many times in the end anyway). Yes, there is a cost here in terms of redoing things if you end up changing the analysis or plot. Claus Wilke argued that this discourages people from redoing plots, which I think has some truth to it. At the same time, I think that the big problem with figure scripting is that it discourages graphical innovation and encourages people to use lazy defaults that usually suffer from bad design principles—indeed, I would argue it's way too much work currently to make truly good graphics programmatically. Take this example:



Or imagine writing a script for this one:


Maybe you like or don't like these type of figures, but either way, not only would it take FOREVER to write up a script for these (at least for me), but by the time you've done it, you would probably never build up the courage to remix these figures the dozen or so times we've reworked this one over the course of publication. It's just faster, easier, and more intuitive to do with a tool for, you know, playing with graphical elements, which I think encourages innovation. Also, many forms of labeling of graphs that reduce cognitive burden (like putting text descriptors directly next to the line or histogram that they label) are much easier in Illustrator and much harder to do programmatically, so again, this works best for us. It does also, however, introduce a human element for error, and that has happened to us, although I should say that programmatic figures are a typo away from errors as well, and that's happened, too. There is also the option to link figures, and we have done that with images in the past, but in the end, relying on Illustrator to find and maintain links as files get copied around just ended up being too much of a headache.

Note that this is how we organize final figures, but what about exploratory data analysis? In our lab, that ends up being a bit more ad-hoc, although some of the same principles apply. Following the full strictures for everything can get tedious and inhibitory, but one of the main things we try and encourage in the lab is keeping a computational lab notebook. This is like an experimental lab notebook, but, uhh, for computation. Like "I did this, hoped to see this, here's the graph, didn't work." This has been, in practice, a huge win for us, because it's a lot easier to understand human descriptions of a workflow than try and read code, especially after a long time and double especially for newcomers to the lab. Note: I do not think version control and commit messages serve this purpose, because version control is trying to solve a fundamentally different problem than exploratory analysis. Anyway, talked about this computational lab notebook thing before, should write something more about it sometime.

One final point: like I said, one of the main benefits to these sorts of workflows is that they help minimize mistakes. That said, mistakes are going to happen. There is no system that is foolproof, and ultimately, the results will only be as trustworthy as the practitioner is careful. More on that in another post as well.

Anyway, very interested in what other people's workflows look like. Almost certainly many ways to skin the cat, and curious what the tradeoffs are.

Sunday, July 30, 2017

Can we measure science?

I was writing a couple grants recently, some with page limits and some with word limits. Which of course got me thinking about the differences in how to game these two constraints. If you have a word limit, you definitely don’t want to use up your limit on a bunch of little words, which might lead to a bit more long-wordiness. With the page limit, though, you spend endless time trying to use shorter words to get that one pesky paragraph one little line shorter (and hope the figures don’t jump around). Each of these constraints has its own little set of games we play trying to obey the letter of the law while seemingly breaking its spirit. But here’s the thing: no amount of "gaming the system" will ever allow me to squeeze a 10 page grant into 5 pages. While there’s always some gamesmanship, in the end, it is hard to break the spirit of the metric, at least in a way that really matters. [Side note, whoever that reviewer was who complained that I left 2-3 inches of white space at the end of my last NIH grant, that was dumb—and yes, turns out the whole method does indeed work.]

I was thinking about this especially in the context of metrics in science, which is predicated on the idea that we can measure science. You know, things like citations and h-index and impact factor and RCR (NIH’s relative citation ratio) and so forth. All of which many (if not most) scientists these days declare as being highly controversial and without any utility or merit—"Just read the damn papers!" is the new (and seemingly only) solution to everything that ails science. Gotta say, this whole thing strikes me as surprisingly unscientific. I mean, we spend our whole lives predicated on the notion that carefully measuring things is the way to understand the world around us, and yet as soon as we turn the lens on ourselves, it’s all “oh, it’s so horribly biased, it’s a popularity contest, all these metrics are gamed, it’s there’s no way to measure someone’s science other than just reading their papers. Oh, and did I mention that time so and so didn’t cite my paper? What a jerk.” Is everyone and every paper a special snowflake? Well, turns out you can measure snowflakes, too (Libbrecht's snowflake work is pretty cool, BTW 1 2).

I mean, seriously, I think most of us wish we had the sort of nice quantitative data in biology that we have with bibliometrics. And I think it’s reasonably predictive as well. Overall, better papers end up with more citations, and I would venture to say that the predictive power is better than most of what we find in biology. Careers have certainly been made on worse correlations. But, unlike the rest of biomedical science, any time someone even insinuates that metrics might be useful, out come the anecdotes:
  • “What about this undercited gem?” [typically one of your own papers]
  • “What about this overhyped paper that ended up being wrong?” [always someone else’s paper]
  • “What about this bubble in this field?” [most certainly not your own field]
Ever see the movie “Minority Report”, where there are these trio of psychics that can predict virtually every murder, leading to a virtually murder-free society? And it’s all brought down because of a single case the system gets wrong about Tom Cruise? Well, sign me up for the murder-free society and send Tom Cruise to jail, please. I think most scientists would agree that self-driving cars will lead to statistically far fewer accidents than human-driven cars, and so even if there’s an accident here and there, it’s the right thing to do. Why doesn’t this rational approach translate to how we think about measuring the scientific enterprise?

Some will say these metrics are all biased. Like, some fields are more hot than others, certain types of papers get more citations, and so forth. Since when does this mean we throw our hands up in the air and just say “Oh well, looks like we can’t do anything with these data!”? What if we said, oh, got more reads with this sequencing library than that sequencing library, so oh well, let’s just drop the whole thing? Nope, we try to correct and de-bias the data. I actually think NIH did a pretty good job of this with their relative citation ratio, which generally seems to identify the most important papers in a given area. Give it a try. (Incidentally, for those who maintained that NIH was simplistic and thoughtless in how it was trying to measure science during the infamous "Rule of 21" debate, I think this paper explaining how RCR works belies that notion. Let's give these folks some credit.)

While I think that citations are generally a pretty good indicator, the obvious problem is that for evaluating younger scientists, we can't wait for citations to accrue, which brings us to the dreaded Impact Factor. The litany of perceived problems with impact factor is too long and frankly too boring to reiterate here, but yes, they are all valid points. Nevertheless, the fact remains that there is a good amount of signal along with the noise. Better journals will typically have better papers. I will spend more time reading papers in better journals. Duh. Look, part of the problem is that we're expecting too much out of all these metrics (restriction of range problem). Here's an illustrative example. Two papers published essentially simultaneously, one in Nature and one in Physics Review Letters, with essentially the same cool result: DNA overwinds when stretched. As of this writing, the Nature paper has 280 citations, and the PRL paper has 122. Bias! The system is rigged! Death to impact factor! Or, more rationally, two nice papers in quality journals, both with a good number of citations. And I'm guessing that virtually any decent review on the topic is going to point me to both papers. Even in our supposedly quantitative branch of biology, aren't we always saying "Eh, factor of two, pretty much the same, it's biology…"? Point is, I view it as a threshold. Sure, if you ONLY read papers in the holy triumvirate of Cell, Science and Nature, then yeah, you're going to miss out on a lot of awesome science—and I don't know a single scientist who does that. (It would also be pretty stupid to not read anything in those journals, can we all agree to that as well?) And there is certainly a visibility boost that comes with those journals that you might not get otherwise. But if you do good work, it will more often than not publish well and be recognized.

Thing is that we keep hearing these "system is broken" anecdotes about hidden gems while ignoring all the times when things actually work out. Here's a counter-anecdote from my own time in graduate school. Towards the end of my PhD, I finally wrapped up my work on stochastic gene expression in mammalian cells, and we sent it to Science, Nature and PNAS (I think), with editorial rejections from all three (yes, this journal shopping is a demoralizing waste of time). Next stop was PLoS Biology, which was a pretty new journal at the time, and I remember liking the whole open access thing. Submitted, accepted, and then there it sat. I worked at a small institute (Public Health Research Institute), and my advisor Sanjay Tyagi, while definitely one of the most brilliant scientists I know, was not at all known in the single cell field (which, for the record, did actually exist before scRNA-seq). So nobody was criss-crossing the globe giving talks at international conferences on this work, and I was just some lowly graduate student. And yet even early on, it started getting citations, and now 10+ years later, it is my most cited primary research paper—and, I would say, probably my most influential work, even compared to other papers in "fancier" journals. And, let me also say that there were several other similar papers that came out around the same time (Golding et al. Cell 2005, Chubb et al. Curr Biol 2006, Zenklusen and Larson et al. Nat Struct Mol Bio 2008), all of which have fared well over time. Cool results (at least within the field), good journals, good recognition, great! By the way, I can't help but wonder if we had published this paper in the hypothetical preprint-only journal-less utopia that seems all the rage these days, would anyone have even noticed, given our low visibility in the field?

So what should we do with metrics? To be clear, I'm not saying that we should only use metrics in evaluation, and I agree that there are some very real problems with them (in particular, trainees' obsession with the fanciest of journals—chill people!). But I think that the judicious use of metrics in scientific evaluation does have merit. One area I've been thinking about is more nefarious forms of bias, like gender and race, which came up in a recent Twitter discussion with Anne Carpenter. Context was whether women face bias in citation counts. And the answer, perhaps unsurprisingly, is yes—check out this careful study in astrophysics (also 1 2 with similar effects). So again, should we just throw our hands up and say "Metrics are biased, let's toss them!"? I would argue no. The paper concludes that the bias in citation count is about 10% (actually 5% raw, then corrected to 10%). Okay, let's play this out in the context of hiring. Let's say you have two men, one with 10% fewer citations than the other. I'm guessing most search committees aren't going to care much whether one has 500 cites on their big paper instead of 550. But now let's keep it equal and put a woman's name on one of the applications. Turns out there are studies on that as well, showing a >20% decrease in hireability, even for a technician position, and my guess is that this would be far worse in the context of faculty hiring. I've know of at least two stories of people combating bias—effectively, I might add—in these higher level academic selection processes by using hard metrics. Even simple stuff like counting the number of women speakers and attendees at a conference can help. Take a look at the Salk gender discrimination lawsuit. Yes, the response from Salk about how the women scientists in question had no recent Cell, Science, or Nature papers or whatever is absurd, but notice that the lawsuits themselves mention various metrics: percentages, salary, space, grants, not to mention "glam" things like being in the National Academies as proxies for reputation. Don't these hard facts make their case far stronger and harder to dismiss? Indeed, isn't the fact that we have metrics to quantify bias critical here? Rather than saying "citations are biased, let's not use them", how about we just boost women's cites by 10% in any comparison involving citations, adjusting as new data comes in?

Another interesting aspect of the metric debate is that people tend to use them when it suits their agenda and dismiss them when they don't. This became particularly apparent in the Rule of 21 debate, which was cast as having two sides: those with lots of grants and seemingly low per dollar productivity per Lauer's graphs, and those with not much money and seemingly high per dollar productivity. At the high end were those complaining that we don't have a good way to measure science, presumably to justify their high grant costs because the metrics fail to recognize just how super-DUPER important their work is. Only to turn around and say that actually, upon reanalysis, their output numbers actually justify their high grant dollars. So which is it? On the other end, we have the "riff-raff" railing against metrics like citation counts for measuring science, only to embrace them wholeheartedly when they show that those with lower grant funding yielded seemingly more bang for the buck. Again, which is it? (The irony is that the (yes, correlative) data seem to argue most for increasing those with 1.5 grants to 2.5 or so, which probably pleases neither side, really.)

Anyway, metrics are flawed, data are flawed, methodologies are flawed, that's all of science. Nevertheless, we keep at it, and try to let the data guide us to the truth. I see no reason that the study of the scientific enterprise itself should be any different. Oh, and in case I still have your attention, you know, there's this one woefully undercited gem from our lab that I'd love to tell you about… :)

Tuesday, July 4, 2017

A system for paid reviews?

Some discussion on the internet about how slow reviews have gotten and how few reviewers respond, etc. The suggestion floated was paid review, something on the order of $100 per review. I have always found this idea weird, but I have to say that I think review times have gotten bad enough that perhaps we have to do something, and some economists have some research showing that paid reviews speed up review.

In practice, lots of hurdles. Perhaps the most obvious way to do this would be to have journals pay for reviews. The problem would be that it would make publishing even more expensive. Let's say a paper gets 6-9 reviews before getting accepted. Then in order for the journal to be made whole, they'd either take a hit on their crazy profits (haha!), or they'd pass that along in publication charges.

How about this instead? When you submit your paper, you (optionally) pay up front for timely reviews. Like, $300 extra for the reviews, on the assumption that you get a decision within 2 weeks (if not, you get a refund). Journal maybe can even keep a small cut of this for payment overhead. Perhaps a smaller fee for re-review. Would I pay $300 for a decision within 2 weeks instead of 2 months? Often times, I think the answer would be yes.

I think this would have the added benefit of people submitting fewer papers. Perhaps people would think a bit harder before submitting their work and try a bit harder to clean things up before submission. Right now, submitting a paper incurs an overhead on the community to read, understand and provide critical feedback for your paper at essentially no cost to the author, which is perhaps at least part of the reason the system is straining so badly.

One could imagine doing this on BioRxiv, even. Have a service where authors pay and someone commissions paid reviews, THEN the paper gets shopped to journals, maybe after revisions. Something was out there like this (Axios Review), but I guess it closed recently, so maybe not such a hot idea after all.

Thoughts?

Friday, June 30, 2017

#overlyhonestauthorcontributions

___ toiled over ridiculous reviewer experiments for over a year for the honor of being 4th author.

___ did all the work but somehow ended up second author because the first author "had no papers".

___ told the first author to drop the project several times before being glad they themselves thought of it.

___ was better to have as an author than as a reviewer.

___ ceased caring about this paper about 2 years ago.

Nobody's quite sure why ___ is an author, but it seems weird to take them off now.

___ made a real fuss about being second vs. third author, so we made them co-second author, which only serves to signal their own utter pettiness to the community.

Friday, May 5, 2017

Just another Now-that-I'm-a-PI-I-get-nothing-done day

Just had another one of those typically I-got-nothing-done days. I’m sure most PIs know the feeling: the day is somehow over, and you’re exhausted, and you feel like you’ve got absolutely nothing to show for it. Like many, I've had more of these days than I'd care to count, but this one was almost poetically unproductive, because here I am at the end of the day, literally staring at the same damn sentence I’ve been trying to write since the morning.

Why the case of writer's block? Because I spent today like most other work days: sitting in the lab, getting interrupted a gazillion times, not being able to focus. I mean, I know what I should do to get that sentence written. I could have worked from home, or locked myself in my office, and I know all the productivity rules I violate on a routine basis. But then I thought back on what really happened today…

Arrived, sat down, opened laptop, started looking at that sentence. Talked with Sydney about strategy for her first grant. Then met with Caroline to go over slides for her committee meeting—we came up with a great scheme for presenting the work, including some nice schematics illustrating the main points. Went over some final figure versions from Eduardo, which were greatly improved from the previous version, and also talked about the screens he’s running (some technical problems, but overall promising). And also, Eduardo and I figured out the logic needed for writing that cursed sentence. Somewhere in there, watched Sara hit submit on the final revisions for her first corresponding author paper! Meanwhile, Ian’s RNATag-seq data is looking great, and the first few principal components are showing exactly what we want. Joked around with Lauren about some mistake in the analysis code for her images, and talked about her latest (excellent) idea to dramatically improve the results. Went to lunch with good friend and colleague John Murray, talked about kids and also about a cool new idea we have brewing in the lab; John had a great idea for a trick to make the data even cooler. Chris dragged me into the scope room because the CO2 valve on the live imaging setup was getting warm to the touch, probably because CO2 had been leaking out all over the place because a hose came undone. No problem, I said, should be fine—and glad nobody passed out in the room. Uschi showed me a technical point in her SNP FISH analysis that suggests we can dramatically reduce our false-positive rate, which is awesome (and I’m so proud of all the coding she’s learned!). I filled our cell dewar with liquid nitrogen for a while, looks like it’s fully operational, so can throw away the return box. Sydney pulled me into the scope room to look at this amazing new real-time machine learning image segmentation software that Chris had installed. Paul’s back in med school, but dropped by and we chatted about his residency applications for a bit. While we were chatting, Lauren dropped off half a coffee milkshake I won in a bet. Then off to group meeting, which started with a spirited discussion about how to make sure people make more buffers when we run out, after which Ally showed off the latest genes she’s been imaging with expansion microscopy, and Sareh gave her first lab meeting presentation (yay!) on gene induction (Sara brought snacks). Then collaborators Raj and Parisha stayed for a bit after group meeting to chat about that new idea I’d talked about with John—they love the idea, but brought up a major technical hurdle that we spent a while trying to figure out (I think we’ll solve it, either with brains or brute force). And then, sat down, stared at that one half-finished sentence again, only to see that it was time to bike home to deal with the kids.

So yeah, an objective measure of the day would definitely be, hey, I was supposed to write this one sentence, and I couldn’t even get that done. But all in all, now that I think about it, it was a pretty great day! I think PIs often lament their lack of time to think, reminiscing about the Good Old Days when we had time to just focus on our work with no distractions, that we maybe forget about how lucky we are to have such rich lives filled with interesting people doing interesting things.

That said, that sentence isn’t going to write itself. Hmm. Well, maybe if I wait long enough…

Wednesday, May 3, 2017

Quick take on NIH point scale: will this shift budget uncertainty to the NIH?

Just heard about the new NIH point scale, and was puzzling through some of the implications. First, quick summary: NIH, in an effort to split the pie more evenly, is implementing a system in which each grant you have is assigned a point value, and you are capped at 21 points (3 R01 equivalents). Other grants are worth less. The consequences of this are of course vast, and I'm assuming most of this is going to be covered elsewhere. I'll just say that I do think some labs are just plain overfunded, so this will probably help with that. Also, it's clear from the point breakdown that some things are incentivized and disincentivized, which probably has some pluses and minuses.

Anyway, I did start wondering about what life would be like for a big lab working with 3 R01s. One of the realities of running such a lab is budget uncertainty. I remember early on when I started at Penn, a (very successful) senior faculty member took me to lunch and was talking about funding and said, "Jeez, my lab is too big, and I've been thinking about how I got here. Thing is you have a grant expiring and you want to replace it, so you have to submit 3 grants hoping that one will come in, but then maybe you get 2 or even all 3, and now you have to spend the money, and your lab gets too big." Clearly, this is bad, and the new system will really help with that. I guess what will happen is that if you get those 3 grants, then you will only take one of them. And, you may have to give back the rest of the grant you already have so that you don't go over 21. Think about this now from the point of view of the NIH: you're going to have money coming back that you didn't expect, and grants not funded that you thought would be funded. The latter is I suppose easy to deal with (just give it to someone else), but I wouldn't be surprised if the former might cause some budgetary problems. Basically, the fluctuations in funding would shift from the PIs to the NIH. Which I think is on balance a good thing. It makes a lot more sense to have NIH manage a large pool of uncertainty in funding than to have individual scientists try and manage crazy step function changes in funding, which will hopefully allow scientists to have more certainty on how much money to expect moving forward. Nice. But maybe I haven't thought through all the angles here.

Saturday, April 22, 2017

What will happen when we combine replication studies with positive-result bias?

Just read a nice blog post from Stephen Heard about replicability vs. robustness that I really agree with. Basically, the idea under discussion is how much effort we should devote to exactly repeating experiments (narrow robustness) vs. the more standard way of doing science, which is everyone does their own version to see whether the result holds more generally (broad robustness). In my particular niche of molecular biology, I think most (though definitely not all, you know who you are!) errors are those of judgement rather than technical competence/integrity, and so I think most exact replication efforts are a waste of time, an argument which many other have made as well.

In the comments, some people arguing for more narrow replication studies made the point that very little (~0%) of our current research budget is devoted to explicitly to replication. Which got me wondering: what might happen if we suddenly funded a lot of replication studies?

In particular, I worry about positive-result bias. Positive-result bias is basically the natural human desire to find something new: our expectation is X, but instead we found Y. Hooray, look, new science! Press release, please! :)

Now what happens when when we start a bunch of studies with the explicit mandate to replicate a previous study? Here, the expectation is now what was already found and so positive-result bias would bias towards a refutation. I mean, let’s face it, people want to do something interesting and new that other people care about. The cancer reproducibility project in eLife provides an interesting case study: most of the press around the publication was about how the results were “muddy”, and I definitely saw a great deal more interest in what didn’t replicate than what did.

Look, I’m not saying that scientists are so hungry for attention that most, or even more than a few, would consciously try to have a replication fail (although I do wonder about that eLife replication paper that applied what seemed to be overly stringent statistical criteria in order to say something did not replicate). All I’m saying is the same hype incentives that we complain about are clearly aligned with failed replication results, and so we should be just as critical and vigilant about them.

As for apportionment of resources towards replication, I think that setting aside the question as to whether it’s a good use of money from the scientific perspective (I, like others, would argue largely not), there’s also the question of whether it’s a good use of human resources. Having a student or postdoc work on a replication study for years during their training period is not, I think, a good use of their time, and keeps them from the more valuable training experience of actually, you know, doing their own science—let alone robbing them of the thrill of new discovery. Perhaps such studies are best left to industry, which is where I believe they already largely reside.

Saturday, April 8, 2017

The hater’s guide to (experimental) reproducibility

(Thanks to Caroline Bartman and Lauren Beck for discussions.)

Okay, before I start, I just want to emphasize that my lab STRONGLY supports computational reproducibility, and we have released data + code (code all the way from raw data to figures) for all papers primarily from our lab for quite some time now. Just sayin’. We do it because a. we can; b. it enforces a higher standard within the lab; c. on balance, it’s the right thing to do.

All right, that said, I have to say that I find, like many others, the entire conversation about reproducibility right now to be way off the rails, mostly because it’s almost entirely dominated by the statistical point of view. My opinion is that this is totally off base, at least in my particular area of quantitative molecular biology; like I said before, “If you think that github accounts, pre-registered studies and iPython notebooks will magically solve the reproducibility problem, think again.” Yet, it seems that this statistically-dominated perspective is not just a few Twitter people sounding off about Julia and Docker. This "science is falling apart" story has taken hold in the broader media, and the fact that someone like Ioannidis was even being mentioned for director of NIH (!?) shows how deeply and broadly this narrative has taken hold.

Anyway, I won’t rehash all the ways I find this annoying, wrongheaded and in some ways dangerous, I’ll just sum up by saying I’m a hater. But like all haters, deep down, my feelings are fueled by jealousy. :) Jealousy because I actually deeply admire the fact that computational types have spent a lot of time thinking about codifying best practices, and have developed a culture and sense of community standards that embodies those practices. And while I do think that a lot of the moralistic grandstanding from computational folks around these issues is often self-serving, that doesn’t mean that talking about and encouraging computational/statistical reproducibility is a bad thing. Indeed, the fact that statisticians dominate the conversation is not their fault, it’s ours: why is there no experimental equivalent to the (statistical/computational) reproducibility movement?

So first off, the answer is that there is, with lists of validated antibodies and an increased awareness of things like cell line and mycoplasma contamination and so forth. That is all great, but in my experience, these things journals make you check are not typically the reasons for experimental irreproducibility. Fundamentally, these efforts suffer from what I consider a “checklist problem”, which is the idea that reproducibility can be codified into a simple, generic checklist of things. Like, the thought is that if I could just check off all the boxes on mycoplasma and cell identification and animal protocols, then my work would be certified as Reproducible™. This is not to say that we shouldn’t have more checklists (see below), but I just don’t think it’s going to solve the problem.

Okay, so if simplistic checklists aren’t the full solution, then what is? I think the crux of the issue actually comes back to a conversation we had with the venerable Warren Ewens a while back about how to analyze some data we were puzzling over, and he said something to the effect of “There are all these statistical tests we can think about, but it also has to pass the smell test.” This resonated with me, because I realize that that at least some of us experimentalists DO teach reproducibility, but it’s more of an experiential learning to try and impart an intuitive sense of what discrepancies to ignore and which to lose sleep over. In particular in molecular biology, where our tools are imprecise and the systems are (hopelessly?) complex, this intuition is, in my opinion, the single most skill we can teach our trainees.

Thing is, some do a much better job of teaching this intuition than others. I think that where we can learn from the computational/statistical reproducibility movement is to try and at least come up with some general principles and guidelines for enhancing the quality of our science, even if they can’t be easily codified. And within a particular lab, I think there are some general good practices, and maybe it’s time to have a more public discussion about them so that we can all learn from each other. So, with all that in mind, here’s our attempt to start a discussion with some ideas for experimental reproducibility, ranging from day-to-day to big picture:
  1. Keep an online lab notebook that is searchable with links to protocols and is easily shared with other lab members.
  2. Organize protocols in an online doc that allows for easy sharing and commenting. Avoid protocol "fragmentation"; if a variation comes up, spend the time to build that in as a branch point in the protocol. Otherwise, there will be protocol drift, and others may not know about new improvements.
  3. Annotate protocols carefully, explaining, where possible, which elements of the protocol are critical and why (and ideally have some documentation). This helps to avoid protocol cruft, where new steps get introduced and reified without reason. Often, leading a new trainee through a protocol is a good time to annotate, since it exposes all the unwritten parts of the protocol. Note: this is also a good way to explore protocol simplification!
  4. Catalog important lab-generated reagents (probes, plasmids, etc.) with unique identifiers and develop a system for labeling. In the lab, we have a system for labeling and cataloging probes, which helps us figure out post-facto what the difference is between "M20_probe_Cy3" and "M20_probe_Cy3_usethis". What is hard with this is to develop a system for labeling enforcement. Not sure how best to do this. My system is that I won't order any new probes for a person until all their probes are appropriately cataloged.
  5. Carefully track biologic reagents that are known to suffer from lot variability, including dates, lot numbers, etc. Things like matrigel, antibodies, R-spondin.
  6. Set up a system for documenting little experiments that establish a little factoid in the lab. Like "Oh, probe length of 30 works best for expansion microscopy based on XYZ…". These can be invaluable down the line, since they're rarely if ever published—and then turn from lab memory into lab lore.
  7. Journal length limits have led to a culture of very short and non-detailed methods, but there's this thing called the internet that apparently can store and share a lot of information. I think we need to establish a culture of publicly sharing detailed protocols, including annotating all the nuances and so forth. Check out this from Feng Zhang about CRISPR (we also have made an extensive single molecule RNA FISH page here).
  8. (Lauren) Track experiments in a log, along with all relevant (or even seemingly irrelevant) details. This could be, for instance, a big Google Doc with list of all similar types of experiments, pointing to where the data is kept, and critically, all the little details. These tabulated forms of lab notebooks can really help identify patterns in those little details, but also serve to show other members of the lab what details matter and that they should be attentive to.
  9. Along those lines, record all your failures, along with the type of failure. We've definitely had times when we could have saved a lot of time in the lab if we had kept track of that. SHARE FAILURES with others in the lab, especially the PI.
  10. (Caroline) Establish an objective baseline for an experiment working, and stick to it. Sort of like pre-registering your experiment, in a way. If you take data, what will allow you to say that it worked or didn't work. If it didn't work, is there a rationalization? If so, discuss with someone, including the PI, to make sure you aren't deluding yourself and just ignoring data you don't like. There are often good reasons to drop bits of data, and sometimes we make mistakes in our judgement calls, but at least get a second opinion.
  11. Develop lab-specific checklists. Every lab has it's own set of things it cares about and that people should check, like microscope light intensity or probe HPLC trace or whatever. Usually these are taught and learned through experience, but that strikes me as less efficient than it could be.
  12. Replicates: What constitutes a biological replicate? Is it the same batch of cells grown in two wells? Is it two separate passages of the same cell line? If so, separated by how much time? Or do you want to start each one fresh from a frozen vial? Whatever your system, it's important to come up with some ground rules for what replicates means, and then stick to it. I feel like one aspect of replication is that you don't want the conditions to be necessarily exactly the same, so a little variability is good. After all, that's what separates a biological replicate (which is really about capturing systematic but unknown variability) from a technical replicate (which is statistically variability).
  13. Have someone else take a look at your data without leading them too much with your hypothesis. Do they follow the same logic to reach the same conclusion? Many times, people fall so in love with their crazy hypothesis that they fail to see the simpler (and far more plausible) boring explanation instead. (Former postdoc Gautham Nair was so good at finding the simple boring explanation that we called it the "Gautham transform" in the lab!)
  14. Critically examine parts that don't fit in the story. No story is perfect, especially in molecular biology, which has a serious "everything affects everything" problem. Often times there is no explanation, and there's nothing you can really do about it. Okay, but resist the urge to sweep it under the rug. Sometimes there's new science in there!
  15. Finally, there is no substitute for just thinking long and hard about your work with a critical mindset. Everything else is just, like I said, a checklist, nothing more, nothing less.
Anyway, some thoughts, and I'm guessing most people already do a lot of this, implicitly or explicitly. We'd love to hear the probably huge list of other ideas people out there have for improving the quality/reproducibility of their science. Point is, let's have a public discussion so that everyone can participate!

On criticism

-by Caroline Bartman

Viewed in a certain light, grad school- all of scientific training- is a process of becoming a good critic. You need to learn to evaluate papers and grants either to make them better, to score/review them, or to try to expand your understanding of the field. However, there are many nuances to being a good critic that were never spelled out in my grad school classes, and that I still try to improve on all the time.

0. Seeing the bigger picture: What statement is the paper trying to make? How do you feel about THAT STATEMENT after reading it? Every paper has experiments with shortcomings or design flaws. Does the scientific light shine through in spite of that? Or are the authors over-interpreting the data? This is really the key to criticizing scientific work thoughtfully and productively.

1. Compassion: Especially important when evaluating the work of others. One person or group can only do so much, due to time, resources, and experimental considerations. When I was an undergrad never having written a paper, I would go to journal clubs and say things like ‘This was a good paper, but what really would have nailed it would be to use these three additional transgenic mouse strains.’ Not realistic! And devalues the effort that’s already represented in the paper. Before you ask for additional experiments, step back: would those really change the interpretation of the paper? Sometimes yes, often no (goes back to point 0).
Plus, consciously noting the good aspects of a paper or grant, and only pointing out limited, specific criticisms will make the author happier! So they will be more likely to adopt your suggestions, and in a way actually facilitates the science moving forward.

2. Balance: Comes into play when evaluating work that you would be predisposed to like- such as your own work! But also the work of well-known labs (aka fancy science). I often find myself cutting myself slack I wouldn’t give others. (‘That experiment is really just a control, so it’s a waste of time’, etc. ) Reviewers (and also my PIs, thanks Gerd and Arjun) won’t necessarily see your work in such a rosy light!
With fancy science, it’s easy to see that e.g. a statement made in a paper isn’t so well supported by the data, but say ‘They’re experts! They founded this field. They probably know what they’re doing.’ Sometimes true, but sometimes not. Would you feel the same way about the paper if it came from an unknown PI? Plus, a fancy lab actually has the best capacity and manpower to carry out the very best experiments with the newest tech! Maybe they should be subject to even harsher scrutiny in their papers.

3. Ignorance: I don’t really know if there’s a good name for this quality. Maybe comfort with uncertainty? You are often called upon to evaluate papers or grants that aren’t in your sub-sub-sub field, and that can instill doubts. Yes, you have to recognize your possible lack of expertise. But you can still have valuable opinions! Ideally papers would be read by scientists outside the immediate field, and help inform their thinking. Plus, while technologies differ, scientific reasoning is pretty much constant. So if an experiment or a logical progression doesn’t make sense, you can say something. The worst thing that could happen is someone tells you you’re wrong.

Grad school tends to instill the idea that knowledge is the primary quality required to evaluate scientific work. Partially because young trainees do indeed need to amass some body of understanding in order to ‘get’ the field and make comments. But knowledge is really not enough, and sometimes (point 3) not even necessary!

Comment if you have more ideas on requirements for a good scientific critic!

Sunday, April 2, 2017

Nabokov, translated for academia

Nabokov: I write for my pleasure, but publish for money.
Academia: I write for your pleasure, but pay money to publish.

More specifically…

Undergrad: I don’t know how to write, but please let me publish something for med school.
Grad student: I write my first paper draft for pleasure, but my thesis for some antiquated notion of scholarship.
Postdoc: I write "in press" with pleasure, but "in prep" for faculty applications.
Editor: You write for my pleasure, but these proofs gonna cost you.
SciTwitter: I write preprints for retweets, but tweet cats/Trump for followers.
Junior PI: I write mostly out of a self-imposed sense of obligation, but publish to try and get over my imposter syndrome.
Mid-career PI: I say no to book chapters (finally (mostly)), but publish to see if anyone is still interested.
Senior PI: I write to explain why my life’s work is under-appreciated, but give dinner talks for money.

Sunday, March 12, 2017

I love Apple, but here are a few problems

First off, I love Apple products. I’ve had only Apple computers for just about 2 decades, and have been really happy to see their products evolve in that time from bold, renegade items to the refined, powerful computers they are today. My lab is filled with Macs, and I view the few PCs that we have to use to run our microscopes with utter disdain. (I’m sort of okay with the Linux workstations we have for power applications, but they honestly don’t get very much use and they’re kind of a pain.)

That said, lately, I’ve noticed a couple problems, and these are not just things like “Apple doesn’t care about Mac software reliability” or “iTunes sucks” or whatever. These are fundamental bets Apple has made, one in hardware and one in software, that I think are showing signs of being misplaced. So I wrote these notes on the off chance that somehow, somewhere, they make their way back to Apple.

One big problem is that Apple’s hardware has lost its innovative edge, mostly because Apple seems disinclined to innovate for various reasons. This has become plainly obvious by watching the undergraduate population at Penn over the last several years. A few years ago, it used to be that a pretty fair chunk of the undergrads I met had MacBook Airs. Like, a huge chunk. It was essentially the standard computer for young people. And rightly so: it was powerful (enough), lightweight, not too expensive, and the OS was clean and let you do all the things you needed to do.

Nowadays, not so much. I’m seeing all these kids with the Surfaces and so forth that are real computers, but with a touch screen/tablet “mode” as well. And here’s the thing: even I’m jealous. Now, I’m not too embarrassed to admit that I have read enough Apple commentary on various blogs to get Apple’s reasons for not making such a computer. First off, Apple believes that most casual users, perhaps including students, should just be using iPads, and that iOS serves their needs while providing the touch/tablet interface. Secondly, they believe that the touch interface has no place, both ergonomically or in principle, on laptop and desktop Macs. And if you’re one of the weird people who somehow needs a touch interface and full laptop capabilities, you should buy both a Mac and an iPad. I’m just realizing now that Apple is just plain wrong on this.

Why don’t I see students with iPads, or an iPad Pro instead of a computer? The reality is that, no matter how much Apple wants to believe it and Apple fans want to rationalize it (typically for “other people”), iOS is just not useful for doing a lot of real work. People want filesystems. People want to easily have multiple windows open, and use programs that just don’t exist on iOS (especially students who may need to install special software for class). The few people I know who have iPad Pros are those who have money to burn on having an iPad Pro as an extra computer, but not as a replacement. The ONLY person I know who would probably be able to work exclusively or even primarily with an iPad is my mom, and even she insists on using what she calls a “real” computer (MacBook Pro).

(Note about filesystems: Apple keeps trying to push this “post-filesystem” world on us, and it just isn’t taking. Philosophical debates aside, here’s a practical example: Apple tried to make people switch away from using “Save As…” to a more versioned system more compatible with the iOS post-filesystem mindset, with commands like “Revert” and “Duplicate”. I tried to buy in, I really did. I memorized all the weird new keyboard shortcuts and kept saying to myself “it’ll become natural any day now”. Never did. Our brains just don’t work that way. And it’s not just me: honestly, I’m the only one in my lab who even understands all this “Duplicate” “Revert” nonsense. The rest of them can’t be bothered—and mostly just use other software without this “functionality” and… Google Drive.)

So you know what would be nice? Having a laptop with a tablet mode/touch screen! Apple’s position is it’s an interface and ergonomic disaster. It’s hard to use interface elements with touch, and it’s hard to use a touch screen on a vertical laptop screen. There are merits to these arguments, but you know what? I see these kids writing notes freehand on their computer, and sketching drawings on their computer, and I really wish I could do that. And no, I don’t want to lug around an iPad to do that and synchronize with my Mac via their stupid janky iCloud. I want it all in one computer. The bottom line is that Surface is cool. Is it as well done as Apple would do it? No. But it does something that I can’t do on an Apple, and I wish I could. Apple is convinced that people don’t want to do those things, and that you shouldn’t be able to do those things. The reality seems to be that people do want to do those things and that it’s actually pretty useful for them. Apple’s mistake is thinking that the reason people bought Apples was for design purity. We bought Apples because they had design functionality. Sometimes these overlap, which has been part of Apple’s genius over the last 15 years, and so you can mistake one for the other. But in the end, a computer is a tool to do things I need.

Speaking of which, the other big problem that Apple has is its approach to cloud computing. I think it’s pretty universally acknowledged that Apple’s cloud computing efforts suck, and I won’t document all that here. Mostly, I’ve been trying to understand exactly why, and I think that the fundamental problem is that Apple is thinking synchronize while everyone else is thinking synchronous. What does that mean? Apple’s is stuck in an “upload/download” (i.e., synchronize) mindset from ten years ago while everyone else has moved on to a far more seamless design in which the distinction between cloud and non-cloud is largely invisible. And whatever attempts Apple has made to move to the latter have been pretty poorly executed (although that at least gives hope that they are thinking about it).

Examples abound, and they largely manifest as irritations in using Apple’s software. Take, for example, something as simple as the Podcast App in the iPhone, which I use every day when I bike to work (using Aftershokz bone conduction headphones, suhweet, try them!). If I didn’t pre-download the next podcast, half the time, it craps out when it gets to the next episode in my playlist, even though I have cell service the whole way. Why? Because when it gets there, it waits to download the next one before playing, and sometimes gets mixed up during the download. So I end up trying to remember to pre-download them. And then I have to watch space with all the downloads, making sure the app removes the downloads. Why am I even thinking about this nowadays? Why can’t it just look at my playlist and make them play seamlessly? Upload/download is an anachronism from a time of synchronize when most things are moving to synchronous.

Same with AppleTV (sucks) compared to Netflix on my computer, or Amazon on my computer, or HBO, or whatever. They just work without me having to thinking about the pre-download of the whatever before the movie can start.

I suppose there was a time when this was important for when you were offline. Whatever, I’m writing this in a Google Doc on an airplane without WiFi. And when I get back online, it will all just merge up seamlessly. With careful thought, it can be done. (And yes, I am one of the 8 people alive who has actually used Pages on the web synchronized with Pages on the Mac—not quite there yet, sorry.)

To its credit, I think Apple does sort of get the problem, belatedly. Problem is that when they have tried synchronous, it’s not well done. Take the example of iCloud Photos or whatever the hell they call it. One critical new feature that I was excited about was that it will sense if you’re running out of space on your device and then delete local copies of old photos, storing just the thumbnails. All your photos accessible, but using up only a bit of space, sounds very synchronous! Problem is that as currently implemented, I have only around 150MB free on my Phone and ~1+ GB of space used by Photos. Same on my wife’s MacBook Pro: not a lot of HD space, but Photos starts doing this cloud sync only when things are already almost completely full. The problem is that Apple views this whole system as a backup measure to kick in only in emergencies, when if they bought into the mentality completely, Photos on my computer would take up only a small fraction of the space it does, freeing up the rest of the computer for everything else I need it to do (you know, with my filesystem). Not to mention that any synchronization and space freeing is completely opaque and happens seemingly at random, so I never trust it. Again, great idea, poor execution.

Anyway, I guess this was marginally more productive than doing the Sudoku in back of United Magazine, but not particularly so, so I’ll stop there. Apple, please get with it, we love you!

Sunday, February 19, 2017

Results from the Guess the Impact Factor Challenge

Results from the Guess the Impact Factor Challenge

By Uschi Symmons and Arjun Raj

tl;dr: We wondered if people could guess the impact factor of the journal a paper was published in by its title. The short answer is not really. The longer answer is sometimes yes. The results suggest that talking about any sort of weird organism makes people think your work is boring, unless you’re talking about CRISPR. This begs the question of whether the people who took this quiz are cynical or just shallow. Much future research will be needed to make this determination.


Introduction:
[Arjun] This whole thing came out of a Tweet I saw:


It showed the title: “Superresolution imaging of nanoscale chromosome contacts”, and the beginning of the link: nature.com. Looking at the title, I thought, well, this sounds like it could plausibly be a paper in Nature, that most impacty of high impact journals (the article is actually in Scientific Reports, which is part of the Nature Publishing Group, which is generally considered to be low impact). This got Uschi and I thinking: could you tell what journal a paper went into by its title alone? Would you be fooled?

[Switching to Uschi and Arjun] By the way, although this whole thing is sort of a joke, we think it does hold some lessons for our glorious preprint based future, in which the main thing you have to go on is the title and the authors. Without the filter/recommendation role that current journals provide, will visibility in such a world be dominated by who the authors are and increasingly bombastic and hype-filled titles? (Not that that’s not the case already, but…)

To see if people could guess the impact factor of the journal a paper was published in solely based on the title we made up a little online questionnaire. More than 300 people filled out the questionnaire—and here are the results.

Methodology:
Our methodology was cooked up in an hour or two discussing by Slack, and has so many flaws it’s hard to enumerate them all. But we’ll try and hit the highlights in the discussion. Anyway, here’s what we did: we chose journals with a range of impact factors, three each in the high, medium, and low categories (>20, 8-20, <8, respectively). We tried to pick journals that would have papers with a flavor that most of our online audience would find familiar. We then chose two papers from each journal, picked from a random issue around December 2014/January 2015. The idea was to pick papers that have maybe receded from memory (and also have accumulated some citation statistics, reported as of Feb. 13, 2017), but not so long ago that the titles would be misleading or seem anachronistic. We picked the paper titles pretty much at random: picked an issue/did a search by date and basically just picked the first paper from the list that was in this area of biomedical science. The idea here was to avoid bias, so there was no attempt to pick “tricky” titles. There was one situation where we looked at an issue of Molecular Systems Biology and the first couple titles had colons in them, which we felt were perhaps a giveaway that it was not high profile, so we picked another issue. Papers and journals given in the results below.

The questionnaire itself presented the titles in random order and asked for each whether it was high, medium, or low impact, based on the cutoffs of 0-8, 8-20, 20+. Answering each question was optional, and we asked people to not answer for any papers that they already knew. At least a few people followed that instruction. We posted the questionnaire on Twitter (Twitter Inc.) and let Google (Alphabet) do its collection magic.

Google response analysis here, code and data here.

Results:
In total, we got 338 responses, mostly within the first day or two of posting. First question: how good were people at guessing the impact factor of the journal? Take a look:



The main conclusion is that people are pretty bad at this game. The average score was around 42%, which was not much above random chance (33%). Also, the best anyone got was 78%. Despite this, it looks like the answers were spread pretty evenly between the three categories, which matches the actual distribution, so there wasn’t a bias towards a particular answer.

Now the question you’ve probably been itching for: how well were people able to guess the journal specific titles? The answer is that they were good for some and not so good for others. To quantify how well people did, we calculated a “Perception score”, which is the average score given to a particular title, with low = 1, medium = 2, high = 3. Here is a table with the results:


TitleJournalImpact factorPerception score
Single-base resolution analysis of active DNA demethylation using methylase-assisted bisulfite sequencingNature Biotechnology43.1132.34
The draft genome sequence of the ferret (Mustela putorius furo) facilitates study of human respiratory diseaseNature Biotechnology43.1131.88
Dietary modulation of the microbiome affects autoinflammatory diseaseNature38.1382.37
Cell differentiation and germ–soma separation in Ediacaran animal embryo-like fossilsNature38.1381.77
The human splicing code reveals new insights into the genetic determinants of diseaseScience34.6612.55
Opposite effects of anthelmintic treatment on microbial infection at individual versus population scalesScience34.6611.44
Dynamic shifts in occupancy by TAL1 are guided by GATA factors and drive large-scale reprogramming of gene expression during hematopoiesisGenome Research11.3512.11
Population and single-cell genomics reveal the Aire dependency, relief from Polycomb silencing, and distribution of self-antigen expression in thymic epitheliaGenome Research11.3511.81
A high‐throughput ChIP‐Seq for large‐scale chromatin studiesMolecular Systems Biology10.8722.22
Genome‐wide study of mRNA degradation and transcript elongation in Escherichia coliMolecular Systems Biology10.8722.02
Browning of human adipocytes requires KLF11 and reprogramming of PPARĪ³ superenhancersGenes and Development10.0422.15
Initiation and maintenance of pluripotency gene expression in the absence of cohesinGenes and Development10.0422.09
Non-targeted metabolomics and lipidomics LC–MS data from maternal plasma of 180 healthy pregnant womenGigaScience7.4631.55
Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitaeGigaScience7.4631.25
Asymmetric parental genome engineering by Cas9 during mouse meiotic exitScientific Reports5.2282.43
Dual sgRNA-directed gene knockout using CRISPR/Cas9 technology in Caenorhabditis elegansScientific Reports5.2282.25
A hyper-dynamic nature of bivalent promoter states underlies coordinated developmental gene expression modulesBMC Genomics3.8672.16
Transcriptomic and proteomic dynamics in the metabolism of a diazotrophic cyanobacterium, Cyanothece sp. PCC 7822 during a diurnal light–dark cycleBMC Genomics3.8671.25


In graphical form:

One thing really leaps out, which is the “bowtie” shape of this plot: while people, averaged together, tend to get medium-impact papers right, there is high variability in aggregate perception for the low and high impact papers. For the middle-tier, one possibility is that there is a bias towards the middle in general (like an “uh, dunno, I guess I’ll just put it in the middle” effect), but we didn’t see much evidence for an excess of “middle” ratings, so maybe people are just better at guessing these ones. Definitely not the case for the high and low end, though. The two titles apiece from Nature and Science had both high and low perceived impact. Also, the two Scientific Reports papers had very high perceived impact, presumably due to the fact that they have CRISPR in the title.

So what, if anything, makes a paper seem high or low impact? Here’s a table stratified by perceived impact factor, notice what all the low ones have in common?


TitleJournalImpact factorPerception score
The human splicing code reveals new insights into the genetic determinants of diseaseScience34.6612.55
Asymmetric parental genome engineering by Cas9 during mouse meiotic exitScientific Reports5.2282.43
Dietary modulation of the microbiome affects autoinflammatory diseaseNature38.1382.37
Single-base resolution analysis of active DNA demethylation using methylase-assisted bisulfite sequencingNature Biotechnology43.1132.34
Dual sgRNA-directed gene knockout using CRISPR/Cas9 technology in Caenorhabditis elegansScientific Reports5.2282.25
A high‐throughput ChIP‐Seq for large‐scale chromatin studiesMolecular Systems Biology10.8722.22
A hyper-dynamic nature of bivalent promoter states underlies coordinated developmental gene expression modulesBMC Genomics3.8672.16
Browning of human adipocytes requires KLF11 and reprogramming of PPARĪ³ superenhancersGenes and Development10.0422.15
Dynamic shifts in occupancy by TAL1 are guided by GATA factors and drive large-scale reprogramming of gene expression during hematopoiesisGenome Research11.3512.11
Initiation and maintenance of pluripotency gene expression in the absence of cohesinGenes and Development10.0422.09
Genome‐wide study of mRNA degradation and transcript elongation in Escherichia coliMolecular Systems Biology10.8722.02
The draft genome sequence of the ferret (Mustela putorius furo) facilitates study of human respiratory diseaseNature Biotechnology43.1131.88
Population and single-cell genomics reveal the Aire dependency, relief from Polycomb silencing, and distribution of self-antigen expression in thymic epitheliaGenome Research11.3511.81
Cell differentiation and germ–soma separation in Ediacaran animal embryo-like fossilsNature38.1381.77
Non-targeted metabolomics and lipidomics LC–MS data from maternal plasma of 180 healthy pregnant womenGigaScience7.4631.55
Opposite effects of anthelmintic treatment on microbial infection at individual versus population scalesScience34.6611.44
Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitaeGigaScience7.4631.25
Transcriptomic and proteomic dynamics in the metabolism of a diazotrophic cyanobacterium, Cyanothece sp. PCC 7822 during a diurnal light–dark cycleBMC Genomics3.8671.25

One thing is that the titles at the bottom seem to be longer, and that is born out quantitatively, although the correlation is perhaps not spectacular:




Any other features of the title? We looked at specificity (which was the sum of the times a species, gene name or tissue was mentioned), declarativeness (“RNA transcription requires RNA polymerase” vs. “On the nature of transcription”), and mention of a “weird organism”, which we basically defined as anything not human or mouse. Check it out:



Hard to say much about declarativeness (declariciousness?), not much data there. Specificity is similarly undersampled, but perhaps there is some tendency for medium impact titles to have more specific information than others? Weird organism, however, really showed an effect. Basically, if you want people to think you wrote a low impact paper, put axolotl or something in the title. Notably, for each of the high impact journals, we had 1 each perceived as high and low impact, and this “weird organism” metric explained that difference completely. The exception to this is, of course, CRISPR: indeed, the highest perceived low impact paper was CRISPR in C. elegans. Note that we also included E. coli as “weird”, although probably should not have.

We then wondered: does this perception even matter? Does it have any bearing on citations? So many confounders here, but take a look:


First off, where you publish clearly is clearly strongly associated with citations, regardless of how your title is perceived. Beyond that, it was murky. Of the high impact titles, the ones with high perception index definitely were cited more, but the n is small there, and the effect is not there for medium and low impact titles. So who knows.

Discussion:
Our conclusion seems to be that mid-tier journals publish things that sound like they should be in mid-tier journals, perhaps with titles with more specificity. Flashy and non-flashy papers (as judged by actual impact factor) both seem to be playing the same hype game, and some of them screw up by talking about a weird organism.

Anyway, before reading too much in into any of this, like we said in the methods section, there are lots of problems with this whole thing. First off, we are vastly underpowered: the total of 18 titles is nowhere near enough to get any real picture of anything but the grossest of trends. It would have been better to have a large number of titles and have the questionnaire randomly select 18 of them, but if we didn’t get enough responses, then we would not have had very good sampling for any particular title. Also, it would have been interesting to have more titles per journal, but we instead opted for more journals just to give a bit more breadth in that respect. Oh well. Some folks also mentioned that 8 is a pretty aggressive cutoff for “low impact”, and that’s probably true. Perception of a journal’s importance and quality is not completely tied to its numerical impact factor, but we think the particular journals we chose would be pretty commonly associated with the tiers of high, medium and low. With all these caveats, should we have given our blog post the more accurate and specific title “Results from the Guess the Impact Factor Challenge in the genomicsy/methodsy subcategory of molecular biology from late 2014/early 2015”? Nah, too boring, who would read that? ;)

We think one very important thing to keep in mind is that what we measured is perceived impact factor. This is most certainly not the same thing as perceived importance. Indeed, we’re guessing that many of you played this game with your cynic hat on, rolling your eyes at obviously “high impact” papers that are probably overhyped, while in the back of your mind remembering key papers in low impact journals. That said, we think there’s probably at least some correspondence between a seemingly high profile title and whether people will click on it—let’s face it, we’re all a bit shallow sometimes. Both of these factors are probably at play in most of us, making it hard to decipher exactly how people made the judgements they did.

Question is what, if anything, should we do in light of this? A desire to “do” something implies that there is some form of systematic injustice that we could either try to fix or, conversely, try to profit from. To the former, one could argue that the current journal system (which we are most definitely not a fan of, to be clear), may provide some role here in “mixing things up”. Since papers in medium and high impact journals get more visibility than those in low impact journals, our results show that high impact journals can give exposure to poorly (or should we say specific or informatively?) titled papers, potentially giving them a citation boost and providing some opportunity for exposure that may not otherwise exist, however flawed the system may be. We think it’s possible that the move to preprints may eliminate that “mixing-things-up” factor and thus increase the incentive to pick the flashiest (and potentially least informative) title possible. After all, let’s say we lived in a fully preprint-based publishing world. Then how would you know what to look at? One obviously dominant factor is who the authors are, but let’s set that aside for now. Beyond that, one other possibility is to try and increase whatever we are measuring with perception score. So perhaps everyone will be writing like that one guy in our field with the crazy bombastic titles (you know who I mean) and nobody will be writing about how “Cas9–crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria” any more. Hmm. Perhaps science Twitter will basically accomplish the same thing once it recovers from this whole Trump thing, who knows.

Perhaps one other lesson from all of this is that science is full of bright and talented people doing pretty amazing work, and not everybody will get the recognition they feel they deserve, though our results suggest that it is possible to manipulate at least the initial perception of our work somewhat. A different question is whether we should care about such manipulations. It is simplistic to say that we should all just do the work we love and not worry about getting recognition and other outward trappings of success. At the same time, it is overly cynical to say that it’s all just a rat race and that nobody cares about the joy of scientific discovery anymore. Maybe happiness is realizing that we are most accurately characterized by living somewhere in the middle… :)