Computational analyses at large scales revolutionized palaeontology decades ago and continue to advance. Smart sensors make it possible to collect a wider spectrum of ecological data than ever before. While sophisticated computational tools and techniques are becoming commonplace, it is not uncommon to expect that, after months, years, or even decades of gathering, data will speak by themselves and a computer will provide the interpretations. And even if data sometimes stay uncomfortably numb, the hopes go beyond interpretation. We wish computers would design the analysis, pick the number of clusters for us, and set up the “correct” tree or the “right” Bayesian parameters. Most of all, we hope they will be objective. Objective, really?
“Making choices tends to make novices nervous”, says Richard McElreath1. “There’s an illusion sometimes that default procedures are more objective than procedures that require user choice, such as choosing priors. If that’s true, then all ‘objective’ means is that everybody does the same thing. It carries no guarantees of realism or accuracy.”
Philosophers define objectivity as knowledge that bears no trace of the knower: blind sight, seeing without inference, interpretation or intelligence2. Is science without interpretation or intelligence really what we want? What is left then: routines?
“‘Let nature speak for itself’ became the watchword of scientific objectivity that emerged in the latter half of the nineteenth century. At issue was not only accuracy but morality as well: the all-too-human scientists must, as a matter of duty, restrain themselves from imposing their hopes, expectations, generalizations, aesthetics, even ordinary language on the image of nature”, Lorraine Daston and Peter Galison describe5. “Where human self-discipline flagged, the machine would take over.”
Over the centuries, the perception of scientific representation came in three waves6. The first wave allowed impurities to be corrected. Knowing that objects vary, the goal was to represent the most typical ones. Think of medical atlases, for instance.
The second wave came with advances in imaging and measurement technologies and represented a mechanical view. In the medical world, the perspective shifted from hand-drawn atlases to photos, scans or encephalograms. Those were not necessarily reproducible (lenses may have had dust on them, after all), but they were supposedly free from human intervention. And yet mechanical images, scans or encephalograms were becoming notoriously difficult to interpret. Medical students were expected to learn from convoluted photos in which a body was barely recognizable. Finally, in the third wave, professional judgement entered the scene and interpreters were called upon.
Computing, and particularly recent advances in machine learning, holds great promise for science. Instead of nature speaking for itself, data are now expected to speak for themselves. But can data speak? Without an interpreter?
Computing is mechanical; algorithms (even randomized ones, to an extent) are perfectly repeatable. They are also consistent: given the same input, they will (almost) always produce the same output.
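As a minimal sketch of this repeatability, consider a randomized summary written in Python. The function name `random_sample_mean`, the toy data and the seed values are illustrative assumptions, not part of any particular analysis. With the seed fixed, the “random” draw is the same on every run.

```python
import random


def random_sample_mean(values, k, seed):
    """Return the mean of k values drawn at random without replacement.

    The seed fixes the pseudo-random draw, so the same input and the
    same seed always give the same output.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    sample = rng.sample(values, k)
    return sum(sample) / k


data = list(range(100))  # toy data, purely illustrative

first = random_sample_mean(data, 10, seed=42)
second = random_sample_mean(data, 10, seed=42)  # identical call, identical result
other = random_sample_mean(data, 10, seed=7)    # a different seed may give a different mean

print(first == second)  # the two identical calls agree
```

Note that the seed is itself something the analyst picks: the procedure is consistent, but a choice still went in.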
Yet it is easy to mistake consistency for objectivity. Even if we leave aside potential biases in data collection, assumptions about how data were generated and how they are structured go into models.
Take clustering as an example. Clustering requires specifying a distance measure, and there is no single way to do so; in fact, there are hundreds or even thousands of ways for one dataset. There is no free lunch, they say: for knowledge to come out, knowledge must come in:
“I'll need some information first / Just the basic facts / Can you show me where it hurts?” (Pink Floyd)
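To make the clustering example concrete, here is a small Python sketch, assuming a toy setting with three made-up points and two fixed cluster centres. It performs only nearest-centre assignment (the labelling step that many clustering methods share), not a full clustering algorithm, and all names and numbers are hypothetical. The same data are labelled twice, once with Euclidean and once with Manhattan distance.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.dist(p, q)


def manhattan(p, q):
    """Sum of absolute coordinate differences ("city block" distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def assign(points, centres, distance):
    """Label each point with the index of its nearest centre under `distance`."""
    return [
        min(range(len(centres)), key=lambda i: distance(p, centres[i]))
        for p in points
    ]


# Made-up data: three points and two candidate cluster centres.
points = [(1, 1), (3, 3), (2, 7)]
centres = [(0, 0), (3, 8)]

labels_euclidean = assign(points, centres, euclidean)
labels_manhattan = assign(points, centres, manhattan)

print(labels_euclidean)  # [0, 0, 1]
print(labels_manhattan)  # [0, 1, 1]
```

The middle point changes cluster when only the distance measure changes. Both runs are perfectly repeatable, yet neither labelling is more “objective” than the other.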
Analytical models are summaries of data and summaries of the world. From centuries-old statistical models, via trending AI tools, to deep learning for image or language analysis, all of them provide summaries for the analysts. Just as in professional chess, a combination of humans and computers is the key. And while computer programs for chess are ever improving, a human assisted by a computer is unbeatable7.
Computer-assisted summaries can be extremely useful in science, but summaries should not be mistaken for interpretations. Indeed, computational methods provide consistent and, at least hypothetically, reproducible treatment. But an analyst, a person, selects the data, defines the modelling assumptions and sets the parameters. Interpretations of computed results are only meaningful in the context of the modelling assumptions. Needless to say, responsibility for interpretations always lies with the analyst, even if the analyst did not write “the software” himself or herself. Yes, data can speak for themselves, but not by themselves. Interpretation will always need humans.
Meanwhile, as science is driven by peer review, the question arises whether computers will soon do peer reviews for us. Shouldn’t they? Wouldn’t that be the utmost in objectivity?
1. McElreath, R. (2020). Statistical Rethinking, 2nd edition. CRC Press.
2. Daston, L. and Galison, P. (2007). Objectivity. The MIT Press. (Chapter 1).
3. The Robot Scientist Project; AI 'scientist' finds that toothpaste ingredient may help fight drug-resistant malaria.
4. This robot scientist has conducted 100,000 experiments in a year.
5. Daston, L. and Galison, P. (1992). The Image of Objectivity. Representations 40, Special Issue: Seeing Science, 81-128.
6. Daston, L. and Galison, P. (2007). Objectivity. The MIT Press.
7. Centaur Chess Shows Power of Teaming Human and Machine.