Role of Statistics in Scientific Research

Research in science follows the scientific method, popularly known as the inductive-deductive approach. The method entails formulating hypotheses from observed facts, followed by deduction and verification, repeated in a cyclical process. Facts are observations that are taken to be true. A hypothesis is a tentative conjecture about the phenomenon under consideration. Deductions are drawn from the hypotheses through logical argument and are in turn verified through objective methods. Verification may lead to further hypotheses, deductions and verifications in a long chain, in the course of which scientific theories, principles and laws emerge.

As an illustration, consider Edward Jenner’s trials with smallpox. He observed that people who had had cowpox did not become ill with smallpox, from which he hypothesized that cowpox confers immunity against smallpox. The deduction he made was that a person intentionally infected with cowpox would be protected from becoming ill after a deliberate exposure to smallpox. This was later verified by infecting people with cowpox and then exposing them to smallpox. The trial led to the conclusion that infection with cowpox protects a person from smallpox.

The two main features of the scientific method are repeatability and objectivity. Although these are rigorously achieved for many physical processes, biological phenomena are characterised by variation and uncertainty. Experiments repeated under similar conditions need not yield identical results, because they are subject to random fluctuations. Also, observing the complete set of individuals in a population is often out of the question, so inference frequently has to be made from a sample of observations. The science of statistics helps in selecting a sample objectively, in making valid generalisations from the sample and in quantifying the degree of uncertainty in the conclusions drawn.

Two major practical aspects of scientific investigation are the collection of data and the interpretation of the collected data. Data may be generated through a sample survey of a naturally existing population or through a designed experiment on a hypothetical population. The collected data are condensed, and useful information is extracted, through techniques of statistical inference. Apart from these, a method of considerable importance that has gained wider acceptance with the advent of computers is simulation. It is particularly useful because simulation can replace large-scale field experiments, which are extremely costly and time-consuming. Mathematical models are developed that capture most of the relevant features of the system under consideration, after which experiments are conducted on a computer rather than on the real-life system.

In a broad sense, all in situ studies involving non-interfering observations on nature can be classed as surveys. They may be undertaken for a variety of reasons, such as estimating population parameters, comparing different populations, studying the distribution pattern of organisms or finding the interrelations among several variables. Relationships observed in such studies are often not causal but still have predictive value. Studies in sciences like economics, ecology and wildlife biology generally belong to this category. The statistical theory of surveys relies on random sampling, which assigns a known probability of selection to each sampling unit in the population.
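The idea of a known selection probability can be sketched in a few lines of Python. This is a minimal illustration that assumes simple random sampling without replacement, in which every unit has the same inclusion probability n/N. The function names and the plot-count data are hypothetical and are not taken from any particular survey.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; each unit is included with probability n/N."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def estimate_mean(sample, population_size):
    """Return the sample mean and its standard error.

    The standard error includes the finite population correction (1 - n/N),
    which applies to simple random sampling without replacement.
    """
    n = len(sample)
    fpc = 1 - n / population_size
    standard_error = sqrt(fpc) * stdev(sample) / sqrt(n)
    return mean(sample), standard_error


# Hypothetical population: tree counts on 200 forest plots.
data_rng = random.Random(42)
plots = [data_rng.randint(0, 50) for _ in range(200)]

sample = simple_random_sample(plots, n=20, seed=7)
estimate, se = estimate_mean(sample, population_size=len(plots))
print(f"inclusion probability per plot: {20 / len(plots):.2f}")
print(f"estimated mean count: {estimate:.2f} (standard error {se:.2f})")
```

The standard error is what quantifies the uncertainty of the generalisation from sample to population; it shrinks as the sample grows and becomes zero when the whole population is observed.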

Experiments serve to test hypotheses under controlled conditions. They are conducted with pre-identified treatments on well-defined experimental units. The basic principles of experimentation are randomization, replication and local control, which are the prerequisites for obtaining a valid estimate of error and for reducing its magnitude. Random allocation of the experimental units to the different treatments ensures objectivity; replication of the observations increases the reliability of the conclusions; and local control reduces the effect of extraneous factors on the treatment comparison.
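The three principles can be seen together in a randomized complete block design, sketched below in Python. This is one common design chosen here for illustration, not the only way to apply the principles. The function name and treatment labels are hypothetical.

```python
import random


def randomized_block_design(treatments, n_blocks, seed=None):
    """Return a layout with one independently shuffled treatment order per block.

    Randomization: treatments are assigned to units within a block at random.
    Replication:   every treatment appears once in each of the n_blocks blocks.
    Local control: units are grouped into blocks, so comparisons among
                   treatments are made within relatively uniform conditions.
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # a fresh random order for each block
        layout.append(order)
    return layout


# Hypothetical example: four fertilizer treatments replicated in three blocks.
layout = randomized_block_design(["control", "N", "P", "NP"], n_blocks=3, seed=11)
for block_number, order in enumerate(layout, start=1):
    print(f"block {block_number}: {order}")
```

Each row of the layout is one block, and the position of a treatment within the row is the experimental unit it is allocated to.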

Experimenting on the state of a system with a model over time is termed simulation. A system can be formally defined as a set of elements, also called components. The elements have certain characteristics, or attributes, and these attributes have numerical or logical values. Relationships exist among the elements, and consequently the elements interact. The state of a system is determined by the numerical or logical values of the attributes of its elements. The interrelations among the elements of a system can be expressed through mathematical equations, and thus the state of the system under alternative conditions can be predicted through mathematical models. Simulation amounts to tracing the time path of a system under different conditions.
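As a minimal sketch of tracing a time path, the Python example below simulates a single-element system whose state is a population size. The discrete-time logistic growth equation is an assumption made here for illustration, as are the function name and parameter values; any other model equation could be substituted.

```python
def simulate_logistic(n0, growth_rate, capacity, steps):
    """Trace the time path of a population under discrete-time logistic growth.

    State update per time step: N_next = N + r * N * (1 - N / K),
    where r is the growth rate and K is the carrying capacity.
    """
    path = [n0]
    for _ in range(steps):
        n = path[-1]
        path.append(n + growth_rate * n * (1 - n / capacity))
    return path


# Trace the same system under two alternative conditions (growth rates).
for rate in (0.2, 0.8):
    path = simulate_logistic(n0=10, growth_rate=rate, capacity=100, steps=10)
    print(f"r = {rate}: " + ", ".join(f"{n:.1f}" for n in path))
```

Comparing the two printed paths is the computer counterpart of running the same experiment under two field conditions.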

While surveys, experiments and simulations are essential elements of any scientific research programme, they need to be embedded in a larger and more strategic framework if the programme as a whole is to be both efficient and effective. Increasingly, it has come to be recognized that systems analysis provides such a framework, designed to help decision makers choose a desirable course of action or predict the outcome of one or more courses of action that seem desirable. A more formal definition of systems analysis is the orderly and logical organisation of data and information into models, followed by the rigorous testing and exploration of these models necessary for their validation and improvement.

Research related to biology extends from the molecular level to the whole of the biosphere. The nature of the material dealt with largely determines the methods of investigation. Many levels of organization in the natural hierarchy, such as micro-organisms or human beings, are amenable to experimentation, but only passive observation and modelling are possible at certain other levels. Regardless of the objects dealt with, the logical framework of the scientific approach and of statistical inference remains the same.

Modern scientific research has witnessed large paradigm shifts with the availability of huge datasets, increased computational power and the development of complex algorithms leading to machine learning. This change has happened at every scale, from the DNA level to areal mapping of vast landscapes. It has brought the science of statistics into the larger framework of data science, a combination of statistics, programming and domain knowledge. The inferential base of traditional fiducial statistics has also been increasingly replaced by Bayesian methods, which, being computationally intensive, had so far remained in the background. In short, we are witnessing revolutionary changes in the way knowledge discovery happens!