Tuesday, July 4, 2017

A system for paid reviews?

Some discussion on the internet about how slow reviews have gotten, how few reviewers respond, and so on. The suggestion floated was paid review, something on the order of $100 per review. I have always found this idea weird, but I have to say that I think review times have gotten bad enough that perhaps we have to do something, and some economists have research showing that paid reviews do speed up review.

In practice, there are lots of hurdles. Perhaps the most obvious way to do this would be to have journals pay for reviews. The problem is that this would make publishing even more expensive. Let's say a paper gets 6-9 reviews before getting accepted; at $100 a pop, that's $600 to $900 per accepted paper. Then, in order for the journal to be made whole, they'd either take a hit on their crazy profits (haha!), or they'd pass that along in publication charges.

How about this instead? When you submit your paper, you (optionally) pay up front for timely reviews. Like, $300 extra for the reviews, on the assumption that you get a decision within 2 weeks (if not, you get a refund). The journal could maybe even keep a small cut of this for payment overhead. Perhaps a smaller fee for re-review. Would I pay $300 for a decision within 2 weeks instead of 2 months? Oftentimes, I think the answer would be yes.
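For what it's worth, here's a minimal sketch (in Python) of how that fee-and-refund logic might work. The $300 fee and the 2-week deadline are from the proposal above; the three-reviewer split and the 10% overhead cut for the journal are just assumptions for the sake of illustration.

```python
# A toy sketch of the fast-track fee idea floated above. The $300 fee and
# 2-week deadline come from the proposal; the three-reviewer split and the
# 10% journal overhead cut are assumptions for illustration only.

FAST_TRACK_FEE = 300   # optional fee paid by the author at submission
DEADLINE_DAYS = 14     # decision promised within 2 weeks
JOURNAL_CUT = 0.10     # assumed cut kept by the journal for payment overhead
N_REVIEWERS = 3        # assumed number of reviewers per round

def settle_fast_track(days_to_decision):
    """Work out who gets what once the decision comes back."""
    if days_to_decision > DEADLINE_DAYS:
        # Deadline missed: the author gets a full refund, nobody else is paid.
        return {"author_refund": FAST_TRACK_FEE, "per_reviewer": 0.0, "journal": 0.0}
    pool = FAST_TRACK_FEE * (1 - JOURNAL_CUT)
    return {
        "author_refund": 0.0,
        "per_reviewer": pool / N_REVIEWERS,       # $90 each under these assumptions
        "journal": FAST_TRACK_FEE * JOURNAL_CUT,  # $30 for overhead
    }

print(settle_fast_track(10))   # on time: reviewers get paid
print(settle_fast_track(30))   # late: author gets the $300 back
```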

I think this would have the added benefit of people submitting fewer papers. Perhaps people would think a bit harder before submitting their work and try a bit harder to clean things up before submission. Right now, submitting a paper asks the community to read, understand and provide critical feedback on your work at essentially no cost to the author, which is perhaps at least part of the reason the system is straining so badly.

One could imagine doing this on BioRxiv, even. Have a service where authors pay and someone commissions paid reviews; THEN the paper gets shopped to journals, maybe after revisions. There was something like this out there (Axios Review), but I guess it closed recently, so maybe it's not such a hot idea after all.

Thoughts?

Friday, June 30, 2017

#overlyhonestauthorcontributions

___ toiled over ridiculous reviewer experiments for over a year for the honor of being 4th author.

___ did all the work but somehow ended up second author because the first author "had no papers".

___ told the first author to drop the project several times before being glad they themselves thought of it.

___ was better to have as an author than as a reviewer.

___ ceased caring about this paper about 2 years ago.

Nobody's quite sure why ___ is an author, but it seems weird to take them off now.

___ made a real fuss about being second vs. third author, so we made them co-second author, which only serves to signal their own utter pettiness to the community.

Friday, May 5, 2017

Just another Now-that-I'm-a-PI-I-get-nothing-done day

Just had another one of those typical I-got-nothing-done days. I'm sure most PIs know the feeling: the day is somehow over, and you're exhausted, and you feel like you've got absolutely nothing to show for it. Like many, I've had more of these days than I'd care to count, but this one was almost poetically unproductive, because here I am at the end of the day, literally staring at the same damn sentence I've been trying to write since the morning.

Why the writer's block? Because I spent today like most other work days: sitting in the lab, getting interrupted a gazillion times, not being able to focus. I mean, I know what I should do to get that sentence written. I could have worked from home, or locked myself in my office, and I know all the productivity rules I violate on a routine basis. But then I thought back on what really happened today…

Arrived, sat down, opened laptop, started looking at that sentence. Talked with Sydney about strategy for her first grant. Then met with Caroline to go over slides for her committee meeting—we came up with a great scheme for presenting the work, including some nice schematics illustrating the main points. Went over some final figure versions from Eduardo, which were greatly improved from the previous version, and also talked about the screens he’s running (some technical problems, but overall promising). And also, Eduardo and I figured out the logic needed for writing that cursed sentence. Somewhere in there, watched Sara hit submit on the final revisions for her first corresponding author paper! Meanwhile, Ian’s RNATag-seq data is looking great, and the first few principal components are showing exactly what we want. Joked around with Lauren about some mistake in the analysis code for her images, and talked about her latest (excellent) idea to dramatically improve the results. Went to lunch with good friend and colleague John Murray, talked about kids and also about a cool new idea we have brewing in the lab; John had a great idea for a trick to make the data even cooler. Chris dragged me into the scope room because the CO2 valve on the live imaging setup was getting warm to the touch, probably because CO2 had been leaking out all over the place because a hose came undone. No problem, I said, should be fine—and glad nobody passed out in the room. Uschi showed me a technical point in her SNP FISH analysis that suggests we can dramatically reduce our false-positive rate, which is awesome (and I’m so proud of all the coding she’s learned!). I filled our cell dewar with liquid nitrogen for a while, looks like it’s fully operational, so can throw away the return box. Sydney pulled me into the scope room to look at this amazing new real-time machine learning image segmentation software that Chris had installed. Paul’s back in med school, but dropped by and we chatted about his residency applications for a bit. While we were chatting, Lauren dropped off half a coffee milkshake I won in a bet. Then off to group meeting, which started with a spirited discussion about how to make sure people make more buffers when we run out, after which Ally showed off the latest genes she’s been imaging with expansion microscopy, and Sareh gave her first lab meeting presentation (yay!) on gene induction (Sara brought snacks). Then collaborators Raj and Parisha stayed for a bit after group meeting to chat about that new idea I’d talked about with John—they love the idea, but brought up a major technical hurdle that we spent a while trying to figure out (I think we’ll solve it, either with brains or brute force). And then, sat down, stared at that one half-finished sentence again, only to see that it was time to bike home to deal with the kids.

So yeah, an objective measure of the day would definitely say: hey, I was supposed to write this one sentence, and I couldn't even get that done. But all in all, now that I think about it, it was a pretty great day! We PIs so often lament our lack of time to think, reminiscing about the Good Old Days when we had time to just focus on our work with no distractions, that we maybe forget how lucky we are to have such rich lives filled with interesting people doing interesting things.

That said, that sentence isn’t going to write itself. Hmm. Well, maybe if I wait long enough…

Wednesday, May 3, 2017

Quick take on NIH point scale: will this shift budget uncertainty to the NIH?

Just heard about the new NIH point scale, and was puzzling through some of the implications. First, a quick summary: NIH, in an effort to split the pie more evenly, is implementing a system in which each grant you have is assigned a point value, and you are capped at 21 points, the equivalent of 3 R01s at 7 points apiece; other grant types are worth fewer points. The consequences of this are of course vast, and I'm assuming most of this is going to be covered elsewhere. I'll just say that I do think some labs are just plain overfunded, so this will probably help with that. Also, it's clear from the point breakdown that some things are incentivized and others disincentivized, which probably has some pluses and minuses.

Anyway, I did start wondering about what life would be like for a big lab working with 3 R01s. One of the realities of running such a lab is budget uncertainty. I remember early on when I started at Penn, a (very successful) senior faculty member took me to lunch, and, talking about funding, said, "Jeez, my lab is too big, and I've been thinking about how I got here. Thing is, you have a grant expiring and you want to replace it, so you have to submit 3 grants hoping that one will come in, but then maybe you get 2 or even all 3, and now you have to spend the money, and your lab gets too big." Clearly, this is bad, and the new system will really help with that. I guess what will happen is that if you get those 3 grants, then you will only take one of them. And you may have to give back the remainder of a grant you already have so that you don't go over 21.

Think about this now from the point of view of the NIH: you're going to have money coming back that you didn't expect, and grants not funded that you thought would be funded. The latter is, I suppose, easy to deal with (just give the money to someone else), but I wouldn't be surprised if the former caused some budgetary problems. Basically, the fluctuations in funding would shift from the PIs to the NIH, which I think is on balance a good thing. It makes a lot more sense for the NIH to manage a large pool of uncertainty in funding than for individual scientists to manage crazy step-function changes in funding, and it will hopefully give scientists more certainty about how much money to expect moving forward. Nice. But maybe I haven't thought through all the angles here.
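Just to make the bookkeeping explicit, here's a tiny Python sketch of the cap logic. The 21-point cap and the 7-point R01 value follow from the "3 R01 equivalents" above; the other point value is a made-up placeholder, since I don't have NIH's actual table in front of me.

```python
# A toy sketch of the cap logic. The cap (21 points) and the R01 value
# (7 points, i.e., 3 R01 equivalents) are from the proposal as described;
# the other point value here is a hypothetical placeholder, not NIH's table.

POINT_CAP = 21
POINTS = {"R01": 7, "smaller_award": 4}  # "smaller_award" is hypothetical

def can_accept(current_grants, new_grant):
    """Check whether accepting new_grant keeps the PI at or under the cap."""
    total = sum(POINTS[g] for g in current_grants) + POINTS[new_grant]
    return total <= POINT_CAP

# A PI holding 2 R01s plus a smaller award (18 points) who submitted 3 R01s
# hoping one would land can't actually accept a new R01 without giving
# something back: 18 + 7 = 25 > 21.
print(can_accept(["R01", "R01", "smaller_award"], "R01"))  # False
```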

Saturday, April 22, 2017

What will happen when we combine replication studies with positive-result bias?

Just read a nice blog post from Stephen Heard about replicability vs. robustness that I really agree with. Basically, the idea under discussion is how much effort we should devote to exactly repeating experiments (narrow robustness) vs. the more standard way of doing science, which is that everyone does their own version to see whether the result holds more generally (broad robustness). In my particular niche of molecular biology, I think most (though definitely not all, you know who you are!) errors are those of judgement rather than technical competence/integrity, and so I think most exact replication efforts are a waste of time, an argument which many others have made as well.

In the comments, some people arguing for more narrow replication studies made the point that very little (~0%) of our current research budget is devoted explicitly to replication. Which got me wondering: what might happen if we suddenly funded a lot of replication studies?

In particular, I worry about positive-result bias. Positive-result bias is basically the natural human desire to find something new: our expectation is X, but instead we found Y. Hooray, look, new science! Press release, please! :)

Now what happens when we start a bunch of studies with the explicit mandate to replicate a previous study? Here, the expectation is the previously published result, and so positive-result bias would push towards refutation. I mean, let's face it, people want to do something interesting and new that other people care about. The cancer reproducibility project in eLife provides an interesting case study: most of the press around the publication was about how the results were “muddy”, and I definitely saw a great deal more interest in what didn't replicate than in what did.

Look, I’m not saying that scientists are so hungry for attention that most, or even more than a few, would consciously try to have a replication fail (although I do wonder about that eLife replication paper that applied what seemed to be overly stringent statistical criteria in order to say something did not replicate). All I’m saying is the same hype incentives that we complain about are clearly aligned with failed replication results, and so we should be just as critical and vigilant about them.

As for the apportionment of resources towards replication: setting aside the question of whether it's a good use of money from the scientific perspective (I, like others, would argue largely not), there's also the question of whether it's a good use of human resources. Having a student or postdoc work on a replication study for years during their training period is not, I think, a good use of their time, and it keeps them from the more valuable training experience of actually, you know, doing their own science, not to mention robbing them of the thrill of new discovery. Perhaps such studies are best left to industry, which is where I believe they already largely reside.

Saturday, April 8, 2017

The hater’s guide to (experimental) reproducibility

(Thanks to Caroline Bartman and Lauren Beck for discussions.)

Okay, before I start, I just want to emphasize that my lab STRONGLY supports computational reproducibility, and we have released data + code (code all the way from raw data to figures) for all papers primarily from our lab for quite some time now. Just sayin’. We do it because a. we can; b. it enforces a higher standard within the lab; c. on balance, it’s the right thing to do.

All right, that said, I have to say that I find, like many others, the entire conversation about reproducibility right now to be way off the rails, mostly because it's almost entirely dominated by the statistical point of view. My opinion is that this is totally off base, at least in my particular area of quantitative molecular biology; like I said before, "If you think that github accounts, pre-registered studies and iPython notebooks will magically solve the reproducibility problem, think again." Yet it seems that this statistically dominated perspective is not confined to a few Twitter people sounding off about Julia and Docker. This "science is falling apart" story has taken hold in the broader media, and the fact that someone like Ioannidis was even being mentioned for director of NIH (!?) shows how deeply and broadly this narrative has sunk in.

Anyway, I won't rehash all the ways I find this annoying, wrongheaded and in some ways dangerous; I'll just sum up by saying I'm a hater. But like all haters, deep down, my feelings are fueled by jealousy. :) Jealousy because I actually deeply admire the fact that computational types have spent a lot of time thinking about codifying best practices, and have developed a culture and sense of community standards that embodies those practices. And while I do think that a lot of the moralistic grandstanding from computational folks around these issues is often self-serving, that doesn't mean that talking about and encouraging computational/statistical reproducibility is a bad thing. Indeed, the fact that statisticians dominate the conversation is not their fault, it's ours: why is there no experimental equivalent to the (statistical/computational) reproducibility movement?

So first off, the answer is that there is, with lists of validated antibodies and an increased awareness of things like cell line and mycoplasma contamination and so forth. That is all great, but in my experience, the things journals make you check are typically not the reasons for experimental irreproducibility. Fundamentally, these efforts suffer from what I consider a "checklist problem", which is the idea that reproducibility can be codified into a simple, generic checklist of things. Like, the thought is that if I could just check off all the boxes on mycoplasma and cell identification and animal protocols, then my work would be certified as Reproducible™. This is not to say that we shouldn't have more checklists (see below), but I just don't think they're going to solve the problem.

Okay, so if simplistic checklists aren't the full solution, then what is? I think the crux of the issue actually comes back to a conversation we had with the venerable Warren Ewens a while back about how to analyze some data we were puzzling over, and he said something to the effect of "There are all these statistical tests we can think about, but it also has to pass the smell test." This resonated with me, because I realized that at least some of us experimentalists DO teach reproducibility, but it's more a form of experiential learning, trying to impart an intuitive sense of which discrepancies to ignore and which to lose sleep over. In particular in molecular biology, where our tools are imprecise and the systems are (hopelessly?) complex, this intuition is, in my opinion, the single most important skill we can teach our trainees.

Thing is, some do a much better job of teaching this intuition than others. I think that where we can learn from the computational/statistical reproducibility movement is to try and at least come up with some general principles and guidelines for enhancing the quality of our science, even if they can’t be easily codified. And within a particular lab, I think there are some general good practices, and maybe it’s time to have a more public discussion about them so that we can all learn from each other. So, with all that in mind, here’s our attempt to start a discussion with some ideas for experimental reproducibility, ranging from day-to-day to big picture:
  1. Keep an online lab notebook that is searchable with links to protocols and is easily shared with other lab members.
  2. Organize protocols in an online doc that allows for easy sharing and commenting. Avoid protocol "fragmentation"; if a variation comes up, spend the time to build that in as a branch point in the protocol. Otherwise, there will be protocol drift, and others may not know about new improvements.
  3. Annotate protocols carefully, explaining, where possible, which elements of the protocol are critical and why (and ideally have some documentation). This helps to avoid protocol cruft, where new steps get introduced and reified without reason. Often, leading a new trainee through a protocol is a good time to annotate, since it exposes all the unwritten parts of the protocol. Note: this is also a good way to explore protocol simplification!
  4. Catalog important lab-generated reagents (probes, plasmids, etc.) with unique identifiers, and develop a system for labeling. In the lab, we have a system for labeling and cataloging probes, which helps us figure out post facto what the difference is between "M20_probe_Cy3" and "M20_probe_Cy3_usethis". What is hard is enforcing the labeling system, and I'm not sure how best to do that. My system is that I won't order any new probes for a person until all their probes are appropriately cataloged (see the sketch after this list).
  5. Carefully track biologic reagents that are known to suffer from lot variability, including dates, lot numbers, etc. Things like matrigel, antibodies, R-spondin.
  6. Set up a system for documenting the small experiments that establish some little factoid in the lab. Like "Oh, probe length of 30 works best for expansion microscopy based on XYZ…". These can be invaluable down the line, since they're rarely if ever published, and otherwise they turn from lab memory into lab lore.
  7. Journal length limits have led to a culture of very short and non-detailed methods, but there's this thing called the internet that apparently can store and share a lot of information. I think we need to establish a culture of publicly sharing detailed protocols, including annotating all the nuances and so forth. Check out this from Feng Zhang about CRISPR (we also have made an extensive single molecule RNA FISH page here).
  8. (Lauren) Track experiments in a log, along with all relevant (or even seemingly irrelevant) details. This could be, for instance, a big Google Doc with a list of all similar types of experiments, pointing to where the data is kept, and, critically, all the little details. These tabulated forms of lab notebooks can really help identify patterns in those little details, and they also serve to show other members of the lab which details matter and deserve attention.
  9. Along those lines, record all your failures, along with the type of failure. We've definitely had times when we could have saved a lot of time in the lab if we had kept track of that. SHARE FAILURES with others in the lab, especially the PI.
  10. (Caroline) Establish an objective baseline for an experiment working, and stick to it. Sort of like pre-registering your experiment, in a way. If you take data, what will allow you to say whether it worked or not? If it didn't work, is there a rationalization? If so, discuss with someone, including the PI, to make sure you aren't deluding yourself and just ignoring data you don't like. There are often good reasons to drop bits of data, and sometimes we make mistakes in our judgement calls, but at least get a second opinion.
  11. Develop lab-specific checklists. Every lab has its own set of things it cares about and that people should check, like microscope light intensity or probe HPLC traces or whatever. Usually these are taught and learned through experience, but that strikes me as less efficient than it could be.
  12. Replicates: What constitutes a biological replicate? Is it the same batch of cells grown in two wells? Is it two separate passages of the same cell line? If so, separated by how much time? Or do you want to start each one fresh from a frozen vial? Whatever your system, it's important to come up with some ground rules for what a replicate means, and then stick to them. I feel like one aspect of replication is that you don't necessarily want the conditions to be exactly the same, so a little variability is good. After all, that's what separates a biological replicate (which is really about capturing systematic but unknown variability) from a technical replicate (which is about purely statistical variability).
  13. Have someone else take a look at your data without leading them too much with your hypothesis. Do they follow the same logic to reach the same conclusion? Many times, people fall so in love with their crazy hypothesis that they fail to see the simpler (and far more plausible) boring explanation instead. (Former postdoc Gautham Nair was so good at finding the simple boring explanation that we called it the "Gautham transform" in the lab!)
  14. Critically examine parts that don't fit the story. No story is perfect, especially in molecular biology, which has a serious "everything affects everything" problem. Oftentimes there is no explanation, and there's nothing you can really do about it. Okay, but resist the urge to sweep it under the rug. Sometimes there's new science in there!
  15. Finally, there is no substitute for just thinking long and hard about your work with a critical mindset. Everything else is just, like I said, a checklist, nothing more, nothing less.
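As a concrete (if slightly tongue-in-cheek) illustration of the enforcement idea in item 4, here's a minimal sketch in Python. The catalog format, field names, and ID scheme are all hypothetical; the point is just that ordering new probes is gated on every existing tube mapping to a unique catalog entry.

```python
# A minimal sketch of the cataloging-and-enforcement idea from item 4 above.
# The catalog format, field names, and ID scheme are hypothetical; the point
# is that every tube in the freezer should map to a unique catalog entry
# before anyone gets to order new probes.

catalog = {
    "AR0001": {"name": "M20_probe_Cy3", "dye": "Cy3", "owner": "lauren"},
    "AR0002": {"name": "M20_probe_Cy3_usethis", "dye": "Cy3", "owner": "lauren"},
}

def uncataloged(freezer_labels, catalog):
    """Return tube labels that don't correspond to any catalog entry."""
    known = {entry["name"] for entry in catalog.values()}
    return [label for label in freezer_labels if label not in known]

def ok_to_order_new_probes(freezer_labels, catalog):
    """Enforcement rule: no new orders until every tube is cataloged."""
    missing = uncataloged(freezer_labels, catalog)
    return len(missing) == 0, missing

ok, missing = ok_to_order_new_probes(
    ["M20_probe_Cy3", "mystery_tube_Cy5"], catalog)
print(ok, missing)  # False ['mystery_tube_Cy5']
```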
Anyway, some thoughts, and I'm guessing most people already do a lot of this, implicitly or explicitly. We'd love to hear the probably huge list of other ideas people out there have for improving the quality/reproducibility of their science. Point is, let's have a public discussion so that everyone can participate!

On criticism

-by Caroline Bartman

Viewed in a certain light, grad school (really, all of scientific training) is a process of becoming a good critic. You need to learn to evaluate papers and grants, whether to make them better, to score/review them, or to expand your understanding of the field. However, there are many nuances to being a good critic that were never spelled out in my grad school classes, and that I still try to improve on all the time.

0. Seeing the bigger picture: What statement is the paper trying to make? How do you feel about THAT STATEMENT after reading it? Every paper has experiments with shortcomings or design flaws. Does the scientific light shine through in spite of that? Or are the authors over-interpreting the data? This is really the key to criticizing scientific work thoughtfully and productively.

1. Compassion: Especially important when evaluating the work of others. One person or group can only do so much, due to time, resources, and experimental considerations. When I was an undergrad who had never written a paper, I would go to journal clubs and say things like 'This was a good paper, but what really would have nailed it would be to use these three additional transgenic mouse strains.' Not realistic! And it devalues the effort that's already represented in the paper. Before you ask for additional experiments, step back: would those really change the interpretation of the paper? Sometimes yes, often no (goes back to point 0).
Plus, consciously noting the good aspects of a paper or grant and pointing out only limited, specific criticisms will make the author happier! That way they will be more likely to adopt your suggestions, which in a way actually helps move the science forward.

2. Balance: Comes into play when evaluating work that you are predisposed to like, such as your own work! But also the work of well-known labs (aka fancy science). I often find myself cutting myself slack I wouldn't give others. ('That experiment is really just a control, so it's a waste of time,' etc.) Reviewers (and also my PIs, thanks Gerd and Arjun) won't necessarily see your work in such a rosy light!
With fancy science, it's easy to see that, say, a statement made in a paper isn't so well supported by the data, but then think, 'They're experts! They founded this field. They probably know what they're doing.' Sometimes true, but sometimes not. Would you feel the same way about the paper if it came from an unknown PI? Plus, a fancy lab actually has the best capacity and manpower to carry out the very best experiments with the newest tech! Maybe they should be subject to even harsher scrutiny in their papers.

3. Ignorance: I don't really know if there's a good name for this quality. Maybe comfort with uncertainty? You are often called upon to evaluate papers or grants that aren't in your sub-sub-sub field, and that can instill doubt. Yes, you have to recognize your possible lack of expertise. But you can still have valuable opinions! Ideally, papers would be read by scientists outside the immediate field and would help inform their thinking. Plus, while technologies differ, scientific reasoning is pretty much constant. So if an experiment or a logical progression doesn't make sense, you can say something. The worst that can happen is that someone tells you you're wrong.

Grad school tends to instill the idea that knowledge is the primary quality required to evaluate scientific work. Partially because young trainees do indeed need to amass some body of understanding in order to ‘get’ the field and make comments. But knowledge is really not enough, and sometimes (point 3) not even necessary!

Comment if you have more ideas on requirements for a good scientific critic!