Project: Neuroscience under a quantitative progress studies perspective

10 minute read

Update: This project is now a collaboration with Kevin Kermani Nejad. Current material:

1. Introduction

Understanding the human brain - to the point of being able to reproduce its abilities, as one could design and build a plane or a chair - is hopefully a finite problem for civilization. I think that many aspects of that problem (see section 2 for my attempt at an enumeration) can be quantified. Similarly, the performance and progress of the associated experimental methods can often be quantified and predicted, yielding equivalents of the Moore’s law familiar from computing.

I’d like to attempt just that systematically - create a spreadsheet and accompanying notes discussing the questions in section 3 for the items in section 2, giving numerical answers (USD, gigabytes, achievable accuracy, person-hours invested in a given year, etc.) whenever possible. I’d like to do so by literature review and interviews with experts. This is different from - and much simpler than - reviewing all the literature in neuroscience and microbiology: most of the literature focuses on what scientists know or have learned recently, whereas I’d like to focus on how much they did, do, and need to know.

Ideally, this would allow predictions of how far we are from “reverse-engineering” the human brain - understanding how it works and how it develops from a human genome in its environment - and of how we will most likely achieve that goal. However, section 4 discusses some significant possible caveats.

2. Breaking down the problem: the brain’s inner workings by scale

“If you wish to make an apple pie from scratch, you must first invent the universe.” ― Carl Sagan, Cosmos

“There is a theory which states that if ever anyone discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable. There is another theory which states that this has already happened.” ― Douglas Adams, The Restaurant at the End of the Universe

As argued, a description of an adult brain sufficient to reproduce human intelligence would probably take less than 300 megabytes of space - if one considers an adequate lossy compression of the genetic and environmental “input data” and discounts randomness, irrelevant data, and intermediate computations performed during development. The grand challenge of neuroscience is to obtain knowledge and understanding corresponding to that data.
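
As a rough sanity check of that order of magnitude for the genetic part of the “input data” (my own back-of-envelope, not a reconstruction of the argument referenced above): the raw human genome already fits in well under a gigabyte, and the brain-relevant, lossily compressed portion should be much smaller.

```python
# Back-of-envelope check of the order of magnitude (my own rough numbers,
# not a reconstruction of the argument referenced above).
genome_bp = 3.1e9       # approximate length of the human genome, in base pairs
bits_per_base = 2       # four possible bases -> 2 bits each, before any compression
raw_mb = genome_bp * bits_per_base / 8 / 1e6
print(f"raw genome, uncompressed: ~{raw_mb:.0f} MB")          # ~775 MB

# Only roughly 1-2% of the genome is protein-coding, and much of the rest is
# repetitive or not specific to brain development, so a lossy, brain-relevant
# description of the genetic input plausibly lands far below the raw figure.
coding_fraction = 0.015
print(f"protein-coding portion alone: ~{raw_mb * coding_fraction:.0f} MB")  # ~12 MB
```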

A “brute-force” solution would involve a whole-cell model of human brain and body cells, starting with a fertilized egg. This would be a description of all chemical reactions in the cells, complete and accurate enough to simulate their “algorithm”.[1] The state of the art in whole-cell modeling is still quite some way from that[2] - but crucially, I found in several places that obtaining and processing the relevant data is becoming exponentially cheaper over the years.
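
To make concrete what “a description of all chemical reactions, accurate enough to simulate” means at the smallest scale, here is a minimal sketch of the kind of building block such models are assembled from: a toy binding/degradation reaction network integrated as mass-action ODEs. The species names and rate constants are invented for illustration; real whole-cell models couple thousands of such equations to gene expression, metabolism and cell geometry, and often require stochastic rather than deterministic treatment.

```python
# Toy building block of a whole-cell model: a few chemical species whose
# concentrations evolve under mass-action kinetics. The species names and
# rate constants are invented for illustration only.
from scipy.integrate import solve_ivp

k_bind, k_unbind, k_degrade = 0.5, 0.1, 0.05   # made-up rate constants

def reactions(t, y):
    protein, ligand, complex_ = y
    bind = k_bind * protein * ligand     # protein + ligand -> complex
    unbind = k_unbind * complex_         # complex -> protein + ligand
    degrade = k_degrade * complex_       # complex -> degradation products
    return [unbind - bind,               # d[protein]/dt
            unbind - bind,               # d[ligand]/dt
            bind - unbind - degrade]     # d[complex]/dt

sol = solve_ivp(reactions, (0.0, 100.0), y0=[1.0, 0.8, 0.0], max_step=0.5)
print("final concentrations:", sol.y[:, -1])
```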

To create a whole-cell model, one is interested in the following questions:

  1. How do the cell’s proteins fold?
  2. What chemical reactions are they involved in, and what are the reaction speeds?
  3. Where are proteins and other substances located in a cell?
  4. What’s the “control logic” in each cell, determining which proteins get expressed and activated, and when?
  5. What chemical and electrical signals do cells exchange for communication?

To understand how human intelligence works in an adult brain, it would suffice to understand the effective behaviour of neurons and other human cells, i.e. to answer roughly the following questions:

  6. Cell differentiation and morphogenesis: What types of cells (in particular, neurons) exist? How are they distributed in a brain?
  7. Arborization: How do neurons decide to “branch out” and connect to other neurons?
  8. Activity and plasticity: Given a branched-out neuron, how does it respond and change its properties when exposed to electrical and chemical stimuli? (A toy sketch of what such a model could look like follows this list.)
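
As a toy illustration of what even the coarsest quantitative answer to item 8 could look like, here is a leaky integrate-and-fire neuron with a crude, activity-dependent threshold adaptation standing in for “changing its properties”. All parameters are invented; real answers would need to capture synaptic, dendritic and biochemical detail far beyond this.

```python
# Toy "activity and plasticity" model: a leaky integrate-and-fire neuron whose
# firing threshold adapts with activity. All parameters are invented; this only
# illustrates the kind of quantitative answer item 8 asks for, at the coarsest level.
import numpy as np

dt, tau_m = 0.1, 10.0               # time step and membrane time constant (ms)
v_rest, v_reset = -70.0, -65.0      # resting and post-spike potential (mV)
theta, theta_rest = -50.0, -50.0    # current and baseline firing threshold (mV)
tau_theta, theta_jump = 200.0, 2.0  # threshold adaptation: decay (ms) and per-spike jump (mV)

rng = np.random.default_rng(0)
v, spike_times = v_rest, []
for step in range(5000):                            # simulate 500 ms
    i_input = 25.0 + 5.0 * rng.standard_normal()    # noisy input drive (mV-equivalent)
    v += dt / tau_m * (-(v - v_rest) + i_input)     # leaky integration
    theta += dt / tau_theta * (theta_rest - theta)  # threshold relaxes to baseline
    if v >= theta:                                  # spike: reset and adapt
        spike_times.append(step * dt)
        v = v_reset
        theta += theta_jump
print(f"{len(spike_times)} spikes in {5000 * dt:.0f} ms")
```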

Zooming out further, the question becomes:

  9. Collective behaviour: Can we describe collections of neurons without describing each neuron individually? How can specific abilities be mapped to brain regions?

The most important - and, likely, most open-ended - question is:

  10. Design principles - what’s the point of all these algorithms? Which parts of them are essential for intelligence, and which are artifacts of biology?

Putting it all together, a whole-brain emulation would be a computer model of a brain accurate enough to reproduce its abilities. This has not been achieved even for the simplest real-life neural networks.[3] A related field is neuromorphic engineering, though the emphasis there is on solving concrete engineering problems rather than on accurately reproducing an entire brain.

3. What questions to ask experts, and the literature?

As a starting point, the questions I’d like to ask are similar across all these levels. This does not mean I expect to find satisfactory, quantitative answers to all of them - but I can only find out by talking with experts.

  1. What quality (resolution, error rate, etc.) and quantity (gigabytes, number of graph edges…) of data are needed to “solve” the problem - i.e. to create a quantitative model that, as part of a whole-brain emulation, would be accurate enough to reproduce human intelligence? How confident are we in these estimates?
  2. To what extent could data of lower quality and quantity still be helpful? (e.g. using fMRI to determine the brain areas involved in certain abilities, even though its resolution is far lower than that of neuron-level recordings)
  3. What experimental and computational data-gathering methods exist? How fast - and how predictably - have they improved, and how are they improving now, in particular in terms of performance per USD? Can we draw a Moore’s law-like graph showing exponentially declining costs per unit of information? (A minimal curve-fitting sketch follows this list.)
  4. What is the reason for that progress, i.e. what changed in the wider world of engineering that allowed the introduction of new methods? What are the possibilities for, and obstacles to, further progress?
  5. Can we attach probability estimates to the kinds of advances discussed in question 4?
  6. How many person-hours and USD per year are devoted to the field? As in question 3, how has that developed historically, and how is it projected to develop?
  7. How much compute is expected to be necessary for a simulation as in question 1? How much of it is expected to be an artifact of the biological substrate, as opposed to being “essential for intelligence”?
  8. Further remarks or complications?
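
As a concrete version of the “Moore’s law-like graph” asked for in question 3, here is a minimal curve-fitting sketch: fit a straight line to log(cost) versus year and read off the halving time. The data points are placeholders, not real measurements.

```python
# Minimal sketch of the curve fitting behind a "Moore's law-like" plot:
# fit a straight line to log(cost) vs. year and read off the halving time.
# The data points below are placeholders, NOT real measurements.
import numpy as np

years = np.array([2008, 2011, 2014, 2017, 2020])
usd_per_unit = np.array([1000.0, 320.0, 95.0, 30.0, 9.5])   # hypothetical cost per unit of data

slope, intercept = np.polyfit(years, np.log(usd_per_unit), 1)
halving_time = np.log(2) / -slope        # years for the cost to halve
per_decade = np.exp(slope * 10)          # fraction of the cost remaining after ten years

print(f"cost halves roughly every {halving_time:.1f} years")
print(f"after a decade, cost per unit falls to {per_decade:.1%} of its starting value")
```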

Just as Moore’s law made it possible to predict developments in IT decades in advance, similar laws concerning the items enumerated above may allow predicting when, and how, humanity will meet those subjects’ grand challenges.[4] Answers to question 4 could be turned into a “tech tree”/“tech directed acyclic graph” similar to what’s on the last page of this report on longevity research. A guesstimate is a natural way to combine the resulting probability estimates.
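
If “guesstimate” here is read as the Guesstimate tool (a spreadsheet-style Monte Carlo calculator), a minimal code stand-in for the same idea could look like the sketch below: it combines a few uncertain, entirely made-up lognormal estimates into a distribution over the number of years until gathering the required data fits a given budget. Every number is an invented placeholder.

```python
# Monte Carlo stand-in for a Guesstimate-style calculation: combine a few
# uncertain, entirely made-up estimates into a distribution over the number
# of years until gathering the required data fits a given budget.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Lognormal "expert estimates", parameterized by a median and a spread (sigma of the log).
data_needed_tb   = rng.lognormal(mean=np.log(1e5), sigma=1.0, size=n)  # TB of data needed
usd_per_tb_today = rng.lognormal(mean=np.log(1e4), sigma=0.7, size=n)  # current cost per TB
halving_time_yr  = rng.lognormal(mean=np.log(2.0), sigma=0.4, size=n)  # cost halving time (years)
budget_usd = 1e8                                                       # assumed available budget

cost_today = data_needed_tb * usd_per_tb_today
# Exponential cost decline: cost(t) = cost_today * 2**(-t / halving_time).
years_until_affordable = np.maximum(0.0, np.log2(cost_today / budget_usd) * halving_time_yr)

print("median wait:", round(float(np.median(years_until_affordable)), 1), "years")
print("80% interval:", np.percentile(years_until_affordable, [10, 90]).round(1), "years")
```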

It is unlikely, however, that deriving these kinds of laws will go smoothly - see the next section.

4. Problems

“All models are wrong, but some are useful” - George Box

  • Progress is often discontinuous and comes in unexpected ways that don’t appear in such extrapolations. Part of the question is therefore to what extent these “laws” could be observed in the past, and how plausible it is that they will extend into the future.
  • “Understanding mammalian brains” will probably be achieved by a combination of several advances - e.g. by combining gene manipulation that makes recording from neural networks easier and/or more insightful with advances in recording technology. Treating every item and approach independently would make one miss such developments.
  • What’s more, even answers to the concrete questions of section 3 will come with plenty of caveats, if they exist at all.[5] There will also be differing opinions in the community regarding whatever points can be made about them.

So the output could never be as clean as the solution of a math exercise. But as in the quote above, a problem having no complete solution doesn’t in principle imply that working on it is useless. I think some reference that summarizes relevant information, and makes its own limitations clear, is feasible.

5. The desired end-state, and how I’d like to get there

Going back to the goal stated in section 1, I’d expect a fact sheet of about 1.5 A4 pages per item in section 2 - excluding item 10, which is too open-ended. As mentioned, making their own limitations clear should be a crucial part of both the fact sheets and the accompanying notes.

I estimate that one can get there with interviews with about 2-3 experts per item in section 2, plus a literature review to prepare for and follow up on them.

I hope to have a partial report within 2.5 months, based on which I can give a talk.

6. Acknowledgments

So far, thanks in particular to Milan Cvitkovic, Birses Debir, Kynan Eng, Jonathan Karr and Lucia Purcaru for helpful feedback, discussions and references on the subject and draft.

7. Footnotes

  1. See this website, and the list of models therein, for more information - although it doesn’t seem to have been updated since 2018. 

  2. To my understanding, a “complete” whole-cell model has been obtained for Mycoplasma genitalium, an STD agent that happens to be among the simplest known bacteria. More recent work has focused on yeast. See here for an overview of the technological advances needed for human whole-cell modeling - though the focus there is on generic human cells rather than neurons.

  3. The OpenWorm project attempts to emulate the entire neural network of the C. elegans worm, but hasn’t succeeded yet. A crucial problem seems to be that, although we know the worm’s neural network’s “wiring diagram” - called the connectome - we don’t have adequate knowledge of the connection strengths, or of how these change in response to stimuli.

  4. Of course, there are various approaches to obtaining data at each level. For example, the structure of a folded protein can be determined by cryogenic electron microscopy - which, as I heard, has improved greatly since about 2010 - or computationally, where the deep learning system AlphaFold improved the state of the art in 2018-2020. Similarly, an adequate model of neurons and their interactions may be obtained by whole-cell modeling - or by growing neurons in a lab, measuring how they respond to stimuli, and training a machine learning model on the results.

  5. That is not too different from the situation in computing; there, all plots of structure sizes or costs per bit, transistor, or floating-point operation by year hide complications and caveats beyond the scope of this text.