Friday, May 1, 2015

Can I just normalize expression levels by GAPDH?

tl;dr: Depends on context. Probably yes in many instances, but there are definitely situations where you can’t. And beware of global changes in transcription–may just be volume effects.

Now that Olivia’s paper is out (slidecast, full text), thought I’d write a bit about the time-honored practice of normalizing gene expression by GAPDH. A bit of context: when people did RT-qPCR (remember that?) on bulk RNA isolated from, say, cells with and without drug, the question would arise as to how to normalize the measurement by number of cells, differences in RNA isolation efficiency, etc. The way people normally do this in a practical sense is by dividing by the expression of housekeeping genes like GAPDH, which we assume is roughly the same per cell in both conditions. This is of course an assumption, and one which is most definitely broken in some situations.
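To make the bookkeeping concrete, here is a minimal sketch of that division. The numbers and the helper name `normalize_to_reference` are made up for illustration, and the whole thing rests on the assumption stated above, namely that GAPDH per cell is about the same in both conditions.

```python
def normalize_to_reference(target_signal, reference_signal):
    """Return target expression relative to a reference (housekeeping) gene."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return target_signal / reference_signal


# Hypothetical bulk measurements in arbitrary units (not real data).
# The drug sample is assumed to have fewer cells or worse RNA recovery,
# so its GAPDH signal is lower too.
control = {"target": 200.0, "GAPDH": 1000.0}
drug = {"target": 300.0, "GAPDH": 750.0}

control_ratio = normalize_to_reference(control["target"], control["GAPDH"])
drug_ratio = normalize_to_reference(drug["target"], drug["GAPDH"])

# Fold change with and without the GAPDH normalization.
raw_fold_change = drug["target"] / control["target"]
normalized_fold_change = drug_ratio / control_ratio

print(f"raw fold change:        {raw_fold_change:.2f}")
print(f"normalized fold change: {normalized_fold_change:.2f}")
```

With these made-up numbers the raw signal goes up 1.5-fold, but relative to GAPDH the change is 2-fold. If GAPDH itself had changed per cell, the normalized number would be misleading in exactly the same way.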

The plot thickened around 10 years ago, when people started making measurements showing that absolute transcript abundances can vary dramatically from cell to cell, even for housekeeping genes like GAPDH. So how should you normalize single cell data?

Olivia’s paper provides some answers, but also opens up more questions. One of the principal findings (also see this paper by Hermannus Kempe in Frank Bruggeman's group) is that transcript abundance roughly scales with cell volume. In other words, bigger cells have more transcripts, and while the number of, say, GAPDH mRNAs can vary a lot from cell to cell, the concentration varies far less. This holds fairly globally. So if you normalize by GAPDH, you are pretty much normalizing by the total (m)RNA content of the cell. In the case of single cell RNA-seq (will write up a comparison of that later), you are essentially also normalizing by total mRNA content. Thus, if you are interested in the concentration of your particular mRNA, this is a reasonable thing to do.
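Here is a toy simulation of what "counts vary a lot, concentration varies far less" looks like. It is not evidence for the scaling, since the scaling is built in by construction. The distributions, their parameters, and the names are all assumptions picked for illustration.

```python
import random
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_cells = 2000

# Assumption: cell volumes are log-normally distributed (arbitrary units).
volumes = [rng.lognormvariate(0.0, 0.4) for _ in range(n_cells)]

# Assumption: mRNA concentration is roughly constant, with ~10% noise.
concentrations = [rng.gauss(100.0, 10.0) for _ in range(n_cells)]

# Transcript count per cell scales with volume.
counts = [c * v for c, v in zip(concentrations, volumes)]

cv_counts = coefficient_of_variation(counts)
cv_concentration = coefficient_of_variation(
    [n / v for n, v in zip(counts, volumes)]
)

print(f"CV of counts:        {cv_counts:.2f}")
print(f"CV of concentration: {cv_concentration:.2f}")
```

Under these assumptions most of the cell-to-cell spread in counts comes from the spread in volume, and it largely disappears once you divide by volume.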

There are a couple of wrinkles here. First, one observation we made was that most of the mRNAs we looked at had a higher concentration in smaller cells than in larger cells. The spread in concentration was not as wide as the variation in volume, but the difference could be as high as 2x. We’re not sure of the origin of the effect, and it is possible that there’s some systematic error in our measurement that leads to this (although we really tried a lot of different things to discount such possibilities). In any case, it’s something to consider, especially if you want to be very quantitative.

Another wrinkle is that there are definitely situations we’ve encountered in which GAPDH mRNA concentration itself can change. This can happen either homogeneously across the entire population or from cell to cell: in one project we’ve been working on, we see some cells with very high GAPDH transcript abundance right next to cells with very low GAPDH transcript abundance. What to do? If you’re doing sequencing, I think that adding some spike-in controls to help normalize by the total number of molecules could help. Or just do some RNA FISH to get a baseline… :)
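The spike-in idea can be sketched roughly as follows. This assumes you add a known number of spike-in molecules to each cell's lysate and that spike-ins and endogenous transcripts are captured and sequenced with the same efficiency, which may well not hold in practice. The function name and all the numbers are hypothetical.

```python
def estimate_molecules(gene_reads, spike_reads, spike_molecules_added):
    """Convert read counts to estimated molecule counts using spike-ins.

    Assumes reads scale linearly with molecules, with the same factor for
    spike-ins and endogenous transcripts.
    """
    if spike_reads <= 0:
        raise ValueError("need at least one spike-in read")
    molecules_per_read = spike_molecules_added / spike_reads
    return {gene: reads * molecules_per_read for gene, reads in gene_reads.items()}


# Two hypothetical cells with identical endogenous read counts but
# different numbers of spike-in reads (same amount of spike-in added).
reads = {"GAPDH": 5000, "geneX": 500}
cell_a = estimate_molecules(reads, spike_reads=1000, spike_molecules_added=10000)
cell_b = estimate_molecules(reads, spike_reads=2000, spike_molecules_added=10000)

print("cell A:", cell_a)
print("cell B:", cell_b)
```

In this made-up case the two cells look identical by read count, but the spike-ins suggest cell A has twice as many molecules of everything, GAPDH included. That is the kind of difference that normalizing to GAPDH or to library size would hide.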

Finally, I think it’s really important to carefully consider the directions of causality when making claims about global changes in transcription. Olivia’s heterokaryon experiments clearly show that increasing cell volume/cellular content can directly lead to increased transcription. What that means is that if you make a perturbation and then see a global change in gene expression, it may be (in fact, very likely is) that the perturbation is somehow causing a cell volume change, which can then result in a proportional global change in transcription. We have seen this very clearly in a number of cases.
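One rough way to check for this, assuming you have absolute counts and some estimate of cell volume before and after the perturbation, is to compare the fold change in counts with the fold change in volume. The genes and numbers below are invented for illustration.

```python
import statistics


def fold_changes(before, after):
    """Per-gene ratio of after to before."""
    return {gene: after[gene] / before[gene] for gene in before}


# Hypothetical mean transcript counts per cell (not real data).
counts_before = {"GAPDH": 1000.0, "geneA": 100.0, "geneB": 40.0}
counts_after = {"GAPDH": 2000.0, "geneA": 200.0, "geneB": 160.0}

# Hypothetical mean cell volumes (arbitrary units).
volume_before, volume_after = 1.0, 2.0

count_fc = fold_changes(counts_before, counts_after)
volume_fc = volume_after / volume_before

# Fold change in concentration = fold change in count / fold change in volume.
concentration_fc = {gene: fc / volume_fc for gene, fc in count_fc.items()}
median_count_fc = statistics.median(count_fc.values())

print("count fold changes:        ", count_fc)
print("concentration fold changes:", concentration_fc)
print("median count fold change:  ", median_count_fc, "vs volume:", volume_fc)
```

Here everything goes up at least 2-fold in absolute terms, which looks like a global increase in transcription. Once you divide out the 2-fold volume change, only geneB has actually changed in concentration.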

Another point is that it really depends on context. We have a recent example in which the absolute expression of a secreted protein remains constant, but the cell volume (and hence GAPDH expression) increases dramatically. So what matters, concentration? Absolute amount? The protein is secreted, and these cells are living in a primarily acellular environment, so the total amount of secreted protein presumably depends on the absolute number of molecules rather than the concentration. I think it's all a question of context. Which is of course a complete cop-out, I know... :)

Coming soon: description of a comparison of single cell RNA seq and RNA FISH.

8 comments:

  1. Of course, it's not generally considered good practice these days to just normalize qPCR data to GAPDH, but instead to use the geometric mean of a panel of housekeeping genes. So in the single cell data, are those changes in GAPDH concentration correlated with changes in the concentration of other housekeeping genes?

    1. Yep, they are all correlated with volume and hence to each other to varying degrees. If using a panel, might as well just normalize to total RNA, I guess.

  2. Quick question -- why do you need spike-in controls? Why can't you just normalize by library size to get concentration?

    1. Maybe I'm wrong about this, but in our experience, library size is not necessarily related (linearly or otherwise) to total input.

  3. Arhjun, did you ever try to look at the level of Pol I or Pol III transcripts?
    For instance, in the mRNA decay field, in yeast, people routinely use SCR1 RNA (a Pol III transcript) to normalize Northern blot data.

    Also, what about non-coding Pol II transcripts (LincRNA, miRNA)?

    I guess my question is whether this is a very global, all encompassing regulation machinery, or is it specific for Pol II or even just to protein coding mRNAs (and hence, maybe the regulatory factor(s) is/are related to translation)?

    1. Hi Gal, great questions! So we looked at rRNA (Pol I), and it also scaled quite nicely with volume. Didn't look at any Pol III transcripts, but I would guess they probably scale similarly. lincRNA: for most of the ones we looked at, they either scaled or were too noisy to tell (although they were all Pol II transcribed and poly A tailed). Overall, my feeling is that this is a global, all encompassing regulation, rather than just specific to Pol II, but we would need to do more to know for sure.

    2. Hmmm... interesting. So this should be a (protein?) factor (or factors) that is common to all three polymerases. Could this be related to nucleotide concentration/availability? [If so, more likely uracil.] Can this be tested at the single cell level?

    3. Well, it may be the same mechanism for all 3 polymerases, namely the polymerases themselves! We think we have decent evidence for RNA Pol II working for mRNA, and the same thing could be happening for RNA Pol I and III. If I had to guess, I'd say it's probably not nucleotide concentration based on our heterokaryon experiment (that would just be a volume sensor and not a volume/DNA sensor). But I could be thinking about that wrong...
