Partial Automation of Scientific Discovery

I am using the term “discovery” in the sense of a verb rather than a noun: the act of discovering, not the thing discovered. That is, I am concerned more with the partial automation of the process of doing science than with the stricter “automatic generation and verification of new facts about the world”. In fact, the latter is subsumed by the former. The latter is concerned with end products (statements about the world, and perhaps some information about how they were derived), whereas the former is concerned with the methods through which those end products are hypothesized and validated, e.g. semi-autonomous hypothesis generation, execution of experiments, interpretation of data, and so forth.

Machines already automate large swaths of the scientific landscape, in both the acquisition of data from wet-lab experiments and the analysis of that data. On the acquisition side, machines are automating (a) the handling of materials and laboratory instruments, e.g. 10x sequencing platforms; (b) the preprocessing of acquired signals, e.g. deconvolution algorithms in microscopes and error correction in spatial transcriptomics; and (c) the transformation of raw data into more semantic information for use in downstream analyses, e.g. motion tracking and pose estimation of animals in behavioral experiments. On the analysis side, machine learning has been particularly instrumental in extracting insights from the acquired data, e.g. automatic classification of cell types in single-cell omics datasets, or in-silico drug screening. (It should be noted that machine learning is also used to optimize data acquisition and preprocessing, e.g. learning quality-control parameters that optimize some criterion further downstream.)
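To make the last point concrete, here is a minimal sketch of choosing a quality-control threshold by how well a downstream analysis performs. Everything in it is an assumption made for illustration: the data are synthetic, the "quality score" and the noise model are invented, the downstream analysis is a toy nearest-centroid classifier, and the function names (`make_samples`, `select_qc_threshold`, and so on) are hypothetical rather than taken from any real pipeline.

```python
import random


def make_samples(n, rng):
    """Synthetic stand-in for acquired data: (quality, feature, label) tuples."""
    samples = []
    for _ in range(n):
        label = rng.choice([0, 1])
        quality = rng.random()
        # Assumption: lower-quality samples carry noisier measurements.
        noise = rng.gauss(0.0, 0.3 + 3.0 * (1.0 - quality))
        samples.append((quality, float(label) + noise, label))
    return samples


def fit_centroids(samples):
    """Toy downstream analysis: per-class mean of a single feature."""
    centroids = {}
    for label in (0, 1):
        values = [x for _, x, y in samples if y == label]
        if not values:
            return None  # a class was filtered out entirely
        centroids[label] = sum(values) / len(values)
    return centroids


def accuracy(centroids, samples):
    """Fraction of samples whose nearest centroid matches their label."""
    correct = sum(
        1
        for _, x, y in samples
        if min(centroids, key=lambda c: abs(x - centroids[c])) == y
    )
    return correct / len(samples)


def select_qc_threshold(train, validation, grid):
    """Pick the QC threshold whose filtered training set scores best downstream."""
    best_threshold, best_score = None, -1.0
    for threshold in grid:
        kept = [s for s in train if s[0] >= threshold]
        centroids = fit_centroids(kept)
        if centroids is None:
            continue
        score = accuracy(centroids, validation)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


rng = random.Random(0)  # fixed seed so the sketch is reproducible
train = make_samples(400, rng)
validation = make_samples(200, rng)
grid = [0.0, 0.2, 0.4, 0.6, 0.8]

best_threshold, best_score = select_qc_threshold(train, validation, grid)
print(f"chosen QC threshold: {best_threshold}, validation accuracy: {best_score:.3f}")
```

The point of the sketch is only the shape of the loop: the preprocessing parameter is not set by hand but selected against a criterion measured further downstream. A real pipeline would replace the grid search and the toy classifier with whatever optimizer and analysis it actually uses.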

Reading List

General

  1. Waltz, D., & Buchanan, B. G. (2009). Automating science. Science, 324(5923), 43-44.

  2. De Bie, T., De Raedt, L., Hernández-Orallo, J., Hoos, H. H., Smyth, P., & Williams, C. K. (2021). Automating data science: Prospects and challenges. arXiv preprint arXiv:2105.05699.

  3. Olson, R. S., Bartley, N., Urbanowicz, R. J., & Moore, J. H. (2016, July). Evaluation of a tree-based pipeline optimization tool for automating data science. In Proceedings of the genetic and evolutionary computation conference 2016 (pp. 485-492).

  4. King, R. D., Costa, V. S., Mellingwood, C., & Soldatova, L. N. (2018). Automating sciences: Philosophical and social dimensions. IEEE Technology and Society Magazine, 37(1), 40-46.

  5. Dangovski, R., Shen, M., Byrd, D., Jing, L., Tsvetkova, D., Nakov, P., & Soljačić, M. (2021, May). We Can Explain Your Research in Layman’s Terms: Towards Automating Science Journalism at Scale. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 14, pp. 12728-12737).

  • I think the automatic reporting of scientific results in a journalistic, easily digestible manner is quite an important task for the practice of science as a whole.

  1. Gil, Y., & Garijo, D. (2017, March). Towards automating data narratives. In Proceedings of the 22nd International Conference on Intelligent User interfaces (pp. 565-576).

  2. Savage, N. (2012). Automating scientific discovery. Communications of the ACM, 55(5), 9-11.

  3. Coley, C. W., Eyke, N. S., & Jensen, K. F. (2020). Autonomous discovery in the chemical sciences part I: Progress. Angewandte Chemie International Edition, 59(51), 22858-22893.

  4. Gil, Y., Greaves, M., Hendler, J., & Hirsh, H. (2014). Amplify scientific discovery with artificial intelligence. Science, 346(6206), 171-172.

  5. Kitano, H. (2021). Nobel Turing Challenge: creating the engine for scientific discovery. npj Systems Biology and Applications, 7(1), 1-12.

  6. Kitano, H. (2016). Artificial intelligence to win the nobel prize and beyond: Creating the engine for scientific discovery. AI magazine, 37(1), 39-49.

  7. Dybowski, R. (2020). Interpretable machine learning as a tool for scientific discovery in chemistry. New Journal of Chemistry, 44(48), 20914-20920.

  8. Gil, Y. (2017). Thoughtful artificial intelligence: Forging a new partnership for data science and scientific discovery. Data Science, 1(1-2), 119-129.

  9. Coutant, A., Roper, K., Trejo-Banos, D., Bouthinon, D., Carpenter, M., Grzebyta, J., … & King, R. D. (2019). Closed-loop cycles of experiment design, execution, and learning accelerate systems biology model development in yeast. Proceedings of the National Academy of Sciences, 116(36), 18142-18147.

  10. Gomez-Perez, J. M., Palma, R., & Garcia-Silva, A. (2017, October). Towards a human-machine scientific partnership based on semantically rich research objects. In 2017 IEEE 13th International Conference on e-Science (e-Science) (pp. 266-275). IEEE.

  11. Vasilevich, A., & de Boer, J. (2018). Robot-scientists will lead tomorrow’s biomaterials discovery. Current Opinion in Biomedical Engineering, 6, 74-80.

  12. Grizou, J., Points, L. J., Sharma, A., & Cronin, L. (2020). A curious formulation robot enables the discovery of a novel protocell behavior. Science advances, 6(5), eaay4237.

  13. Ezer, D., & Whitaker, K. (2019). Point of View: Data science for the scientific life cycle. Elife, 8, e43979.

  14. Hakuk, Y., & Reich, Y. (2020). Automated discovery of scientific concepts: Replicating three recent discoveries in mechanics. Advanced Engineering Informatics, 44, 101080.

  15. Reich, Y., & Subrahmanian, E. (2022). The PSI framework and theory of design. IEEE Transactions on Engineering Management, 69(4), 1037-1049. doi: 10.1109/TEM.2020.2973238.

  16. Melnikov, A. A., Poulsen Nautrup, H., Krenn, M., Dunjko, V., Tiersch, M., Zeilinger, A., & Briegel, H. J. (2018). Active learning machine learns to create new quantum experiments. Proceedings of the National Academy of Sciences, 115(6), 1221-1226.

  17. Masubuchi, S., Morimoto, M., Morikawa, S., Onodera, M., Asakawa, Y., Watanabe, K., … & Machida, T. (2018). Autonomous robotic searching and assembly of two-dimensional crystals to build van der Waals superlattices. Nature communications, 9(1), 1-12.

  18. Conrad, C., & Gerlich, D. W. (2010). Automated microscopy for high-content RNAi screening. Journal of Cell Biology, 188(4), 453-461.

  19. Vanschoren, J., Blockeel, H., Pfahringer, B., & Holmes, G. (2012). Experiment databases. Machine Learning, 87(2), 127-158.

  20. Trobe, M., & Burke, M. D. (2018). The molecular industrial revolution: automated synthesis of small molecules. Angewandte Chemie International Edition, 57(16), 4192-4214.

  21. Schneider, P., & Schneider, G. (2016). De novo design at the edge of chaos: Miniperspective. Journal of medicinal chemistry, 59(9), 4077-4086.

  22. Chao, R., Mishra, S., Si, T., & Zhao, H. (2017). Engineering biological systems using automated biofoundries. Metabolic Engineering, 42, 98-108.

  23. Lippi, G., & Da Rin, G. (2019). Advantages and limitations of total laboratory automation: a personal overview. Clinical Chemistry and Laboratory Medicine (CCLM), 57(6), 802-811.

  24. Genzen, J. R., Burnham, C. A. D., Felder, R. A., Hawker, C. D., Lippi, G., & Peck Palmer, O. M. (2018). Challenges and opportunities in implementing total laboratory automation. Clinical chemistry, 64(2), 259-264.

  25. Liu, Y. E. (2015). Building behavioral experimentation engines (Doctoral dissertation, University of Washington).

Somewhat Outdated, Yet Interesting / Important

  1. Russo, M. F., & Echols, M. M. (1999). Automating science and engineering laboratories with visual basic. Wiley.

  2. Dunbar, K., & Fugelsang, J. (2005). Scientific thinking and reasoning. The Cambridge handbook of thinking and reasoning, 705-725.

  • For a better understanding of the scientific process

  1. Buchanan, B., Sutherland, G., & Feigenbaum, E. A. (1969). Heuristic DENDRAL: A program for generating explanatory hypotheses in organic chemistry.

  2. Langley, P., Simon, H. A., Bradshaw, G. L., & Zytkow, J. M. (1987). Scientific discovery: Computational explorations of the creative processes. MIT press.

  3. Lindsay, R. K., Buchanan, B. G., Feigenbaum, E. A., & Lederberg, J. (1993). DENDRAL: a case study of the first expert system for scientific hypothesis formation. Artificial intelligence, 61(2), 209-261.

  4. Markin, R. S., & Whalen, S. A. (2000). Laboratory automation: trajectory, technology, and tactics. Clinical chemistry, 46(5), 764-771.

People

  1. Yolanda Gil

  2. Ross D. King

  3. Hiroaki Kitano

  4. Jose Manuel Gomez-Perez

  5. Raul Palma

  6. Connor Coley

  7. Jonathan Grizou

  8. Abhishek Sharma

  9. Nicholas Matiasz

  10. Yoram Reich

  11. Tobias Kuhn

  12. Yun-En Liu

  13. Jan de Boer