In this article we will discuss the general problems of measuring human behaviour.
Problems of Measuring Human Behaviour:
Inasmuch as the measurement of things, of their properties (overt or covert) and of their responses to certain stimuli is an essential engagement of scientific research, it hardly needs to be emphasized that the quality of research will depend on the fruitfulness of the measurement procedures employed.
It is understandable that basic to any meaningful measurement is an adequate formulation of the research questions and an explicit definition of the concepts utilized in the course of the study. In other words, the researcher must know not only why he wants to measure but also what it is that he wants to measure.
Measurement in the realm of human behaviour or social phenomena is typically hazardous, since a substantial part of what is intended to be measured is covert or of an inferential nature.
We have just stressed that the researcher must have a clear notion of what is to be measured. Failing this, he would not be in a position to decide how he will measure it. A measurement procedure consists of the techniques for collecting data and a set of rules for using these data.
The accompanying rules facilitate the use of these data in making specific statements about the characteristics of the phenomena to which the data purport to be relevant.
Data may be collected in many different ways, e.g., by observation, questionnaire or interviews, examination of records or available statistics and/or by projective techniques. If they are to be useful, the data collection techniques and the rules for utilizing the data must produce information that, besides being relevant to the research problem, is valid, reliable and precise.
Let us briefly consider what type of measuring instrument or procedure can produce information that is (a) valid, (b) reliable and (c) precise:
(a) An instrument is valid to the extent it measures what it purports to measure. If the researcher were interested in measuring a person’s I.Q. (intelligence) and adopted as a measuring instrument a test that measured only general knowledge or memory, then the measuring instrument could not be said to be valid.
As Selltiz, Jahoda and associates state, “A measuring procedure is valid to the extent to which scores reflect true differences among individuals, groups or situations in the characteristics it seeks to measure.” A thermometer, for instance, cannot be a valid instrument for measuring pressure.
(b) An instrument of measurement is reliable to the extent that the independent and comparable measures of the same object give similar results (provided, of course, that the object being subjected to measurements does not undergo changes between the measurements).
For instance, a measuring tape made of elastic would be an extremely unreliable instrument since the same object may yield different measures (values) depending upon how much the elastic is stretched.
Similarly, an instrument, say an I.Q. test, would be considered unreliable if persons who were classified on the basis of the first measurement as ‘geniuses’ are classified as ‘below average’ on a second measurement taken, say, a month after the first one.
(c) A measuring instrument is precise to the extent that it is capable of making distinctions in a given characteristic of persons or situations fine enough for the purpose it is expected to serve. A precise scale for attitude measurement will be able to distinguish between fine shades of people’s attitudes, e.g., strongly favourable, favourable, neutral, unfavourable and strongly unfavourable, and not register just crude differences.
Let us now consider the factors which may contribute to variations among scores registered on a measuring instrument administered to a group of subjects. This consideration is very relevant (especially in the context of human behaviour) because the result of measurement may reflect not only the characteristic being measured but the process of measurement itself.
Following are the possible sources of variation in scores among a group of subjects subjected to a measuring instrument:
(a) Of course, the variations in scores in the ideal measuring situation would reflect the true differences in the characteristic which the researcher is attempting to measure. This is as it should be.
(b) Conceivably, true differences in other relatively stable characteristics of the subject may affect his score. Few techniques available to the social scientist can provide ‘pure’ measures of any given characteristic. Hence, the scores of individuals in a group may reflect not only differences in the characteristic being measured but also the differences in other characteristics intertwined with the one being measured.
(c) Variations in the situation in which measurement takes place often play a major part in contributing to the difference in scores among a group of subjects; for example, the interview results of a subordinate may be markedly affected by the presence of his superior officer, whereas the presence of his co-workers would not affect them so substantially.
(d) Various personal factors such as latent motive, state of health, fatigue, mood, etc. may contribute to variations in scores of the subjects in a group.
(e) Lack of uniformity and inadequacies in the method of administering the measuring instrument, e.g., administration of an interview schedule to subjects, may contribute to variations in scores among subjects.
(f) Any measuring instrument taps only a sample of the items relevant to the characteristic being measured. If a measuring instrument comprises mostly items to which the subjects are likely to respond in one way rather than the other, that is, if the items do not represent the entire universe of possible aspects, then the scores may vary owing to the sampling of items.
(g) The subjects will quite understandably respond to the items or questions on the basis of how they understand them. Therefore, if the subjects’ understanding of the items in a measuring instrument is not uniform, variations in their responses may reflect differences in interpretation or understanding rather than true differences in the characteristic.
(h) An instrument of measurement may not function effectively in reference to its intended purpose owing to such circumstances as poorly-printed instructions, wrong check marks, lack of space for recording responses fully, etc. It is not improbable that these factors might contribute to variations in the scores of subjects.
(i) The phase of analysis of data involves many sub-processes such as coding, tabulation, statistical computation, etc. Errors entering at this stage may contribute to variations in scores of individuals.
These, then, are the major factors contributing to variations in results obtained from any measurement procedure. The errors introduced into the data by such factors (from (b) to (i)) may be grouped into two broad categories:
(1) Constant or systematic errors which are introduced into the measurement by some factor which systematically affects the characteristic being measured in the course of measurement itself (for example, stable characteristics like intelligence, education, social status etc.).
(2) Random errors are introduced into the measurement by factors likely to vary from one measurement to the next, although the characteristic that the researcher wants to measure has not changed. For example, transient aspects of the person, situation of measurement or of the measurement procedure, etc. are factors that might introduce such errors.
Thus, a random error reveals itself in the lack of consistency in repeated or equivalent measurements of the same person, group, object or event.
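The distinction can be made concrete with a small simulation. The following is a minimal Python sketch, with all numbers invented for illustration; it shows that averaging repeated measurements washes out random error but leaves a constant error untouched:

import random

true_scores = [50, 60, 70, 80]   # the characteristic we actually want to measure
SYSTEMATIC_BIAS = 5              # constant error: the instrument always over-reads by 5

def observe(true_value):
    # each observation = true value + constant error + random error
    return true_value + SYSTEMATIC_BIAS + random.gauss(0, 3)

for t in true_scores:
    repeats = [observe(t) for _ in range(1000)]
    mean = sum(repeats) / len(repeats)
    print(t, round(mean, 1))     # means settle near t + 5, not near t

However many observations are averaged, the systematic component of 5 units survives; only the random scatter shrinks.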
The Problem Involving Validity of Measurements:
The validity of a measuring instrument, as suggested earlier, refers to the extent to which differences in scores on it reflect true differences among individuals or groups in respect of the characteristic it seeks to measure, or true differences in the same person or group from one occasion to another.
Both constant errors and random errors may get introduced into the measure. Estimates of validity are thus affected by both types of errors.
In essence, validity concerns the extent to which an instrument is measuring what it is intended to measure. This formulation can be interpreted in various directions. ‘Realism’ could serve as a label for this goal.
Realism is not the same as truth; a realistic description need not be literally true, but it must help us to get an adequate or realistic picture of the world, either by telling us straightforward truths or by affording us points of view, concepts and the like which are faithful in relation to this purpose.
Let us take the example of the thermometer. What does it measure, i.e., to which variable is it assigning values? The answer to this question concerns minimal realism: a measurement is anchored in reality in the sense that it pertains to significant aspects of the world.
But then we can go on to ask the further question of how it measures temperature, that is, whether it measures temperature correctly and thus measures nothing but temperature. This is maximal realism.
We may distinguish among three different interpretations of the requirement that the result of a measurement should be about ‘what it is intended’, that is, three different meanings of ‘realism.’
(1) The realism of a certain set of data consists of its correspondence to some facts, i.e., its truth.
(2) The realism of a certain set of data consists of its connection with some significant problem or with the purpose of the study, i.e., its relevance.
(3) The realism of a certain set of data consists of its correspondence with precisely those facts that are connected with some real problem or the purpose of the study, i.e., truth and relevance. The meaning of the term validity oscillates between (2) and (3). Krippendorff observes, “Generally, validity designates a quality that compels one to accept scientific results as evidence. Its closest relative is objective truth.”
Since we do not know an individual’s true position on the variable or characteristic we seek to measure, there is no direct way of determining its validity. In the absence of such direct knowledge of the individual’s true position in respect of the variable being measured, the validity of an instrument is judged by the extent to which its results are compatible with other relevant evidence.
What constitutes relevant evidence depends, of course, on the nature and purpose of the measuring instrument. The purpose of certain tests is to provide a basis for specific predictions about individuals, e.g., whether certain individuals will be successful in a particular type of career or profession or in solving a certain type of problem.
Other tests, although they are designed to measure specific characteristics of individuals, do not afford predictions about how individuals will respond or function in given situations. In the case of the former type of tests, evidence as to whether the individual actually conforms to the predictions provides a basis for estimating the validity of the test.
Investigation of validity in these terms may be described as pragmatic, i.e., validity judged in terms of accuracy of predictions made on the basis of the test results.
The latter type of tests, which are designed to measure characteristics that do not lead to specific predictions, cannot naturally be evaluated so directly. Certain other evidence is necessary and is sought to provide a basis for judging whether the test or instrument measures the concept it is charged with measuring. This approach has been designated as construct validation.
Pragmatic Validity:
In the pragmatic approach to estimation of validity the interest is in the usefulness of the measuring instrument as an indicator or a predictor of some other behaviour or characteristics of the individual. The investigator is not interested in the performance of the subject on the test per se; rather he is interested in the person’s performance on the test only as an indication of a certain characteristic of the person.
What is essential in this approach is that there be a reasonably valid and reliable criterion with which the scores on the measuring instrument can be compared. In general, the nature of the predictions and the techniques available for checking them will determine what criteria are relevant. For example, if the purpose of a test is to predict success in school, one very relevant criterion could be the school grades.
Ideally, the criterion with which the scores on the measuring instrument are compared should itself be perfectly valid and reliable.
But in practice the investigator rarely, if ever, finds a thoroughly tested criterion and usually selects the one that seems most adequate despite its limitations. The reliability and validity of the available criteria may be improved upon by carefully defining the various dimensions of the criterion and getting information relevant to these.
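As an illustration, pragmatic validity is commonly estimated as the correlation between scores on the instrument and scores on the criterion. The following minimal Python sketch computes such a validity coefficient for hypothetical test scores and school grades (all figures invented):

# hypothetical aptitude-test scores and later school grades of six pupils
test_scores = [55, 62, 70, 74, 81, 90]
school_grades = [58, 60, 72, 70, 85, 88]

def pearson_r(x, y):
    # Pearson product-moment correlation between two score lists
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(round(pearson_r(test_scores, school_grades), 2))

A coefficient near 1 would support the test’s usefulness as a predictor of the criterion; a coefficient near 0 would undermine it.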
Construct Validity:
Many of the measures in the social sciences deal with or relate to complex constructs, e.g., measures of intelligence, of attitudes, of modernization, of group morale, etc. Cronbach and Meehl point out that the definition of such constructs consists, in part, of sets of propositions about their relationships to other variables, constructs or directly observable behaviour.
Thus, in examining construct validity one would ask such questions as, “What predictions would one make, on the basis of these sets of propositions, about the relationships of scores on a measure of this construct to other variables?” or “Are the measurements obtained by using this instrument consistent with these predictions?”
It is evident that the predictions in respect of the construct validity are of a different order and have a somewhat different function from those involved in determining the pragmatic validity.
For example, let us consider a prediction relating to how individuals will vote at the national elections. Such a prediction about voting may be made with a view to evaluating the construct validity of a test of, say, ‘progressive attitudes.’
The researcher may reason that since this test measures a person’s progressive orientation, people who are rated as more “progressive” on this test will be more likely to vote for a particular party which has a correspondingly progressive ideological bias or slant.
But there may not always be a high degree of correlation between “progressive” orientation and voting behaviour, since many other conceivable influences such as family tradition, socio-economic status, religion etc. may also influence his vote.
In construct validation, all the predictions that would be made on the basis of a set of propositions involving the construct enter into the consideration of validity (ideally).
For example, the researcher in the above illustration may specify voting preferences for candidates within a particular political party. He may also make and test predictions about relations between voting and socio-economic status, religion, education, etc. If any one of these predictions is not borne out, the validity either of the measure or of the underlying hypotheses would become suspect.
It should be remembered that examination of construct validity involves validation not only of the measuring instrument but also of the theory or perspective underlying it. If the researcher’s predictions are not borne out, he may not get a clear clue as to whether the shortcoming lies in the measuring instrument or in the theory on which the predictions were grounded.
Consider, for example, a prediction or hypothesis that greater interaction between people leads to greater liking for each other amongst them.
If the findings do not suggest the predicted or hypothesized relationship, it may mean either that the measure of ‘liking’ or of ‘interaction’ is not valid or that the hypothesis is incorrect.
The researcher may, under the circumstances, be led to re-examine such constructs as ‘interaction’ or ‘liking’, and the entire network of propositions that led to this prediction. The outcome of this quest may be the refinement of the constructs.
Campbell and Fiske have suggested that the investigation of construct validity can be made more rigorous by increased attention being paid to the adequacy of the measure of the construct in question before its relationships to other variables are considered.
They propose that two kinds of evidence about a measure are needed before one is really justified in examining its relationships with other variables:
(a) Evidence that different measures of the construct yield similar results; and
(b) Proof that the construct thus measured can be differentiated from other constructs.
In order to secure such evidence, the researcher must measure the characteristics from which he wishes to differentiate or segregate his construct, using the same general methods he has applied to his central construct.
In the light of the above discussion, it is clear that construct validity cannot be adequately tested by a single procedure. Evidence derived from a number of sources is relevant, i.e., correlation with other tests, internal consistency of items, stability of pattern over time, etc.
How the evidence from each of these sources bears on the estimation of the validity of the test depends on the relationships predicted in the theoretic system in which the construct has been employed. The more such different relationships are tested and confirmed, the greater is the support marshalled both for the measuring instrument and for the underlying theory.
We shall do well to bear in mind that these aspects are not mutually exclusive. Estimates of pragmatic validity may enter into the evaluation of construct validity and, conversely, the construct validity of a measure shown to have pragmatic validity could be investigated.
In sum, construct validity does not lie in the correspondence between reality and results but in the relationship between results of a measurement or investigation and some theory supposed to shed light on the problems or questions behind the investigation.
It has been suggested that the measures shown to have pragmatic validity are usually arrived at by a trial and error method. The ideal requirement of science lays down that the constructs involved in these measures and their relation to the criterion variables be thoroughly considered. Such investigations typically lead to clarification of concepts and eventually to construct validation of these measures.
The scientist is well advised not to remain content with a measurement procedure which has been validated only pragmatically. So long as the scientist cannot understand the reason behind its working, i.e., why a particular prediction comes out to be true, he has no assurance that the ‘mysterious’ conditions of its working still hold with reference to a particular application.
It is quite understandable that so long as a particular prediction ignores any concern for an underlying theoretical explanation as to its working, it does not afford any basis for generalization to other problems.
The Problem of Reliability of Measurements:
The formulation of the requirement of reliability is often impaired by a certain vagueness. By the reliability of a measurement with respect to a given variable is meant the constancy of its results as that variable assumes different values.
The variables usually considered are: the measuring occasion (e.g., the same person using the same ruler in successive measurements); the measuring instrument (e.g., different ‘forms’ of an intelligence test); and the person doing the measuring (e.g., different eye-witnesses of the same event).
Holsti says:
“If research is to satisfy the requirement of objectivity, measures and procedures must be reliable; i.e., repeated measures with the same instrument on a given sample of data should yield similar results.”
Here we find a connection between the concepts of objectivity and of reliability. This is only as it should be. It is natural to assume that an objective result is independent of the subject who conducted the investigation.
Here, however, we must distinguish between the factual or ontological problem, i.e., what makes the result true and the epistemic or methodological problem — how do we come to know that a result is true?
Ontological independence is a two-place relation. A datum is ontologically independent of its producer (for example, a scorer) when it is not about the producer. The question of epistemic independence is, however, more complicated and controversial. Epistemic independence is a three-place relation; it is a relation between a producer, P, a result, R, and a set of possible producers, S.
A certain result R1, produced by P1, is epistemically independent in relation to a population of scorers S1 to the extent that it is possible for the members of S1 to know R1 without first acquiring mental characteristics that P1 already has.
A datum is thus maximally independent, in this sense, in a certain population of scorers if every member of this population would produce it. Epistemic independence is a matter of degree, in so far as more or less of the specific capabilities and knowledge of the scorer can be made use of in the process of measuring a phenomenon.
Krippendorf distinguishes between three kinds of reliability:
(1) Stability is the degree to which a process is invariant over time, i.e., yields the same results at different points of time.
(2) Reproducibility is the degree to which a process can be recreated under varying circumstances, at different locations and with different materials or forms, i.e., yields the same results despite different implementations.
(3) Accuracy is the degree to which a process conforms in effect to a known standard, i.e., yields the desired results in a variety of circumstances.
Reliability is regarded to be one of the constituent elements of validity. A measurement procedure or an operational definition (of a construct) is reliable to the extent that independent applications of it yield consistent results.
Julian L. Simon remarks, “reliability is roughly the same as consistency and repeatability.” He further elaborates that if one knows that a measuring instrument or an operational definition has satisfactory validity, he need not bother about its reliability.
If an instrument is valid, it means in essence that it is reflecting primarily the characteristic which it is supposed to measure, thus there would be little reason to investigate its reliability, i.e., the extent to which it is influenced by extraneous factors.
Methodologists would not, however, be inclined to subscribe to this view. They would point out the distinction between these two concepts: reliability is a necessary but not a sufficient condition for validity; in other words, validity implies reliability but is not implied by it.
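This asymmetry has a standard quantitative expression in classical test theory (a formalization not spelled out in the text above): if r_XX denotes the reliability of an instrument and r_XY its correlation with a perfectly measured criterion, then under the usual classical assumptions

r_XY ≤ √(r_XX)

so an instrument with reliability 0.49, say, cannot correlate with any criterion beyond 0.7, while a perfectly reliable instrument may still fail to correlate with the intended criterion at all.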
It is only rarely that a researcher is in a position to say in advance whether his instrument has satisfactory validity. Hence, it is generally necessary to determine the extent of variable error in the measuring instrument. This applies with particular force to construct validity, where a simple, direct determination of validity is impossible.
Unless satisfactory validity and reliability of a measuring instrument have already been demonstrated, its reliability should be determined before it is used in a study. We now consider some of the methods of determining the reliability of measurements.
Evaluation of reliability of a measuring instrument requires determination of the consistency of independent but comparable measures of the same individual or situation. It would be desirable, therefore, to have many repeated observations or measurements of the same individual or situation as a basis for estimating errors of measurement. But in the area of human behaviour this is often not possible.
Apart from other things, such oft repeated measurements may create annoyance, fatigue and may also affect the characteristics one wishes to study. When this is likely, reliability may be estimated on the basis of as few measures as possible (even two) for each individual in a sample of the ‘universe’ or even on the basis of one measure if it can be subjected to internal analysis.
Enough measurements to provide a basis for evaluation of reliability may also be obtained by increasing the number of individuals measured rather than the number of measurements of the same individual.
Different methods of estimating reliability are focused on different sources of variation in scores:
(a) Some are concerned with ‘stability’ of individual’s position from one administration of the measure to another,
(b) Others are concerned with the ‘equivalence’ of the individual’s position on different instruments expected to measure the same characteristic. They essentially focus on unreliability due to sampling of items or to substantive variations in administration, and
(c) Still other methods are concerned both with ‘stability’ and ‘equivalence.’
Stability:
The stability of results of a measuring instrument is determined on the basis of consistency of measures on repeated applications. Of course, where the characteristic being measured is likely to fluctuate between the subsequent measurements, e.g., attitudes, morale, etc., inconsistency in measurements would be evidenced.
This should not, however, be interpreted as unreliability of the measuring instrument. What should bother us is the effect of extraneous factors.
The appropriate method for determining stability is to compare the results of repeated measurements with a view to identifying whether the source of instability is genuine fluctuation in the characteristic being measured or a random error due to inadequacies in the measuring instrument.
When the measuring instrument is an interview schedule, questionnaire or a projective test involving the subjects’ cooperation, usually only two administrations are used. In case the measuring device is observation in a natural setting, the observations may be repeated many times.
A restricted number of administrations (usually two) in the case of an identical repeat interview, questionnaire or projective test is advisable because these measuring procedures require a great deal of participation by the individual subject, and interest, curiosity and motivation may get substantially dampened during the second application of the test.
Consequently, the test though objectively identical to the earlier one, may actually represent a substantially different test-situation on second administration. Secondly, the subject may artificially try to maintain consistency in his responses at the second application as he may remember his responses to the first one.
There is, of course, the possibility that the initial measurement has actually changed the characteristics being measured (as in the ‘before-after’ experiment.) Lastly, there is the possibility of a genuine change occurring between two administrations of the test.
When there is the possibility of the initial measure having affected the results of the second measurement, and also of genuine change having been brought about by extraneous factors, the common practice is to strike a balance: waiting long enough for the effects of the first testing to fade, but not so long that a significant amount of genuine change takes place.
The method of ‘alternate’ measurement procedures administered at different times is a well-thumbed design to take account of the combined effect of these various sources of unreliability and thus ensure both ‘stability’ and ‘equivalence.’
A group of subjects receive one form of test at one time and after a lapse of time receive a different form of the test. Alternatively, the group of subjects may be rated by one observer in a particular situation and by a different observer after the stipulated lapse of time in another situation. Correlation between scores or rating at these two points of time provides an overall index of the reliability.
As in the repeated-observation method, there is the possibility that genuine changes in the characteristic being measured might have occurred in the interval between the two test administrations. But again, provided that the results of the two testings are reasonably independent, the effect of this possibility is to make the obtained coefficient an estimate of the minimal reliability of the measuring instrument.
Since the method of alternate measurement procedure administered at different times takes into account more sources of variation than the other methods described earlier, it ordinarily gives a lower but a more accurate estimate of reliability than either a coefficient of stability or a coefficient of equivalence.
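In computational terms, either coefficient is simply the correlation between the two sets of scores. A minimal Python sketch with invented scores (statistics.correlation requires Python 3.10 or later):

from statistics import correlation

# hypothetical scores of five subjects on form A and, a month later, on form B
form_a = [12, 18, 25, 31, 40]
form_b = [14, 17, 27, 30, 38]

# one overall index combining stability and equivalence
print(round(correlation(form_a, form_b), 2))

The same computation serves the test-retest design (the same form administered twice) and the alternate-forms design; only the interpretation of the resulting coefficient differs.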
Which method of testing reliability a researcher will use in a given research depends not only on the intrinsic worth of different techniques but also on the practical facilities available to him and the resources that can be procured to develop the measuring procedures.
Sometimes, it may not be possible for him to reach the same group of subjects for the second or subsequent measurements or the cost of so doing may be prohibitive.
In such a case, he has no choice but to base his estimate of reliability on the equivalence of scores. Sometimes, the measuring instrument does not lend itself to internal analysis that may be warranted as a test of equivalence.
The reliability of measuring instruments can often be increased by taking appropriate steps with respect to the source of error. For example, the conditions under which the measurement procedure is applied can be standardized to effect a similarity between conditions in which subsequent measurements are taken.
Adequately trained, instructed and motivated personnel greatly help to minimize the possibility of variations during the administration of procedures.
There are two methods of increasing the reliability of a measurement procedure. These involve the selection and accumulation of measurement operations themselves rather than any change in the conditions of the measurement operations.
The first method of increasing reliability is to add measurement operations of the type with which the researcher started in the first place and to assign to the subject a score based on the sum of the results of all measurement operations.
In the parlance of the testing situation, this means increasing the length of the test; in the observational situation, it means increasing the number of observations or of observers or both. We can make the reliability of a measuring procedure approach the point of perfection (i.e., 1) provided we are able to add measurement operations indefinitely without changing their nature in any major way.
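This intuition is formalized in classical test theory by the Spearman-Brown prophecy formula (not named in the text above, but the standard result): if a test has reliability r, a test lengthened k times with comparable items has predicted reliability r_k = k·r / (1 + (k − 1)·r), which tends to 1 as k grows. A quick check in Python:

def spearman_brown(r, k):
    # predicted reliability of a test lengthened k-fold
    return k * r / (1 + (k - 1) * r)

for k in (1, 2, 4, 8, 16):
    print(k, round(spearman_brown(0.5, k), 3))
# prints 0.5, 0.667, 0.8, 0.889, 0.941: reliability creeps towards 1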
The second method of increasing reliability is to increase the internal consistency of the measurement operations of the test items. This method is mostly used in the field of psychological testing. The researcher begins with a large collection of test items, calculates a score based on each, and another score based on responses to the total set of items.
Then the score for each item is correlated with the total score. Only those items (e.g., statements) are retained in the test which correlate most highly with the total score of the test. The items which are retained are then divided into two equivalent groups. New scores are calculated on these two groups and these scores are correlated to provide a measure of reliability of the ‘purified’ test.
The internal consistency of the test may also be increased by dropping such items as cannot distinguish sharply between the high and low scorers, i.e., those that have a low discriminatory power. The items which yield the largest differences in the direction of response, in other words, those which have a high ‘discriminatory power’, are identified as the ones most consistent with the total set (test).
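A minimal Python sketch of this purification step, using invented 0/1 responses (Python 3.10+ for statistics.correlation; the item-total correlation here is the crude, uncorrected form, whereas a fuller analysis would remove each item from the total before correlating):

from statistics import correlation

# hypothetical responses (1 = agree, 0 = disagree) of six subjects to four items
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
]

totals = [sum(row) for row in responses]

# items with a low (or negative) item-total correlation are candidates for dropping
for i in range(len(responses[0])):
    item = [row[i] for row in responses]
    print("item", i + 1, round(correlation(item, totals), 2))

Items scoring near zero or negatively would be discarded; the survivors would then be split into two halves and the half-scores correlated, as described above.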
It is well-worth remembering that the method of internal consistency can reduce unreliability resulting from lack of equivalence of items but it cannot do much to reduce unreliability resulting from instability of a subject’s responses or variations in the condition of measurement.
The method of increasing the number of measurement operations can be used to reduce these sources of unreliability if it is possible to spread the measurement operations out over a period of time or to distribute them over a number of different conditions of measurement.
Precision:
Let us now consider the problem relating to the precision of measuring instruments. In the interest both of accuracy of judgement and of the discovery of constant relationships among characteristics that vary in amount as well as in kind, statements that merely affirm or deny differences have to be replaced by more precise statements indicating the degree of difference.
In social sciences, many of the distinctions are qualitative in nature, e.g., we distinguish different races, languages, cultures and communities. But it is often necessary and desirable to make distinctions of degree rather than of mere kind. Suppose, a researcher was interested in studying people’s attitude toward the ceiling on urban property.
Should he want to assert that two persons differ in their attitudes toward ceiling on urban property, he must at least be able to distinguish among different shades of attitude, that is, he must be able to identify certain persons as equivalent, or others as unequal or different.
If the researcher wishes to state that the attitude of one person is more favourable than that of the other, he must be able to rank different attitudinal positions as more favourable or less favourable. If he wishes to state that Mr. X is as much more favourable than Mr. Y as Mr. Y is more favourable than Mr. Z, then he must be able to determine whether the difference between the former two attitudinal positions is equal to the difference between the latter two attitudinal positions.
And going a step further, if the researcher wants to make some such assertion as “Mr. X is twice as favourable as Mr. Y,” then he must be able to identify the existence of an absolute zero of favourableness for the given attitude as well as equal intervals above the zero point.
The above four types of statements correspond respectively to four levels of measurement, as mediated through four types of scales: the nominal scale, ordinal scale, interval scale and ratio scale. The least powerful are the nominal scales, whereas the ratio scales are the most powerful as devices for comparison.
Nominal Scales:
The basic requirement for a nominal scale is that the researcher be able to distinguish two or more categories relevant to the attribute under study and specify the criteria for placing individuals or groups etc., in one or the other category.
The categories are merely different from each other; they do not represent a greater or lesser degree of the attribute or characteristic being measured. Classification of individuals according to race, for instance, constitutes a nominal scale. The use of nominal scales is characteristic of exploratory research, where the emphasis is on uncovering relationships between certain characteristics rather than on specifying the mathematical form of the relationship.
Ordinal Scales:
An ordinal scale defines the relative position of objects or persons with respect to a characteristic with no implication of the distances between positions. The basic requirement for an ordinal scale is that one be able to determine the order of positions of objects or persons in terms of the characteristics under study, i.e., whether an individual has more of the given characteristic than another individual or the same amount of it or less.
This presupposes that one must be able to place each individual at a single point with respect to the characteristic in question. Of course, this is a requirement for the more powerful scales as well. In an ordinal scale, the scale-positions are in a clearly defined order but there is no definite indication of the distance between any two points.
That is, the distance between 7 and 8 may be equal to, greater than or less than the distance between 2 and 3 or between 1 and 2. Thus, with ordinal scales, we are constrained to make statements of greater, equal or less, but we cannot specify how much greater or how much less.
Interval Scales:
On an interval scale, not only are the positions arranged in terms of greater, equal or less, but the units or intervals on the scale are also equal. In other words, the distance between the scale positions 7 and 8 is equal to the distance between positions 2 and 3, or 1 and 2. The thermometer is an example of the interval scale.
The basic requirement for such a scale is a procedure for determining that the intervals are equal. It should be remembered that for many of the attributes that social sciences typically deal with, procedures affording reasonable certainty about equality of intervals are yet to be devised.
On the interval scale, the zero point is arbitrary (as in the nominal and ordinal scale); for example, in the centigrade thermometer the zero point is the mark at which water freezes but in the Fahrenheit thermometer the zero point is much below the freezing point of water.
Thus, in the interval scale one cannot state, for instance, that a person’s attitude is twice as favourable as that of another person just as one cannot say that 20 degrees Fahrenheit is twice as hot as 10 degrees Fahrenheit.
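A quick arithmetic check in Python makes the point: converted to a scale with a true zero (Kelvin), the two Fahrenheit readings are nearly equal, not in the ratio 2:1 (the conversion formulas are standard; the readings themselves are just illustrative):

# ratios are meaningless on an interval scale: see what happens on a true-zero scale
def fahrenheit_to_kelvin(f):
    return (f - 32) * 5 / 9 + 273.15

k10 = fahrenheit_to_kelvin(10)   # about 260.9 K
k20 = fahrenheit_to_kelvin(20)   # about 266.5 K
print(round(k20 / k10, 3))       # about 1.021, nowhere near 2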
Ratio Scale:
A ratio scale contains in addition to the characteristic of an interval scale, an absolute zero. The operations necessary to establish a ratio scale include methods for determining not only equivalence (as in the nominal scale) or rank order (as in ordinal scale) and the equality of intervals (as in the interval scales) but also the equality of ratios.
Since ratios are meaningless unless there is an absolute zero point, it is only the ratio scale that warrants assertions such as “Mr. X is twice as favourable towards the nationalization of banks as Mr. Y is.” If one’s data conform to the criteria for a ratio scale, all relations between numbers in the conventional system of mathematics obtain between the correspondingly numbered positions on the scale. With such a scale, all types of statistical procedures become applicable.
It should be conceded that for most of the subject-matter dealt with by the social sciences, the researchers have not been able to develop procedures that satisfy the requirements of a ratio scale. For most part, the scale construction uses the judgements of people (judges) as the basis for the subject’s positions on the scale.
It is sometimes argued that the social sciences can never hope to reach the precision of measurement achieved in the physical sciences, because the very nature of the materials with which social scientists deal does not permit the establishment of an absolute zero point.
No doubt, much of social science measurement will always be indirect and will depend upon knowledge of the relationships between characteristics. The development of this knowledge, in turn, is partly dependent upon the development of fundamental measurement.
The gloomy prediction that the social sciences may never reach the precision of measurement characteristic of the physical sciences may well prove to be correct. But such an assertion seems premature at this stage. We can only express the hope that measurement of social and psychological characteristics or properties will, given time, progress from a lower scale to a higher one, as happened in the case of many physical properties.
As Stevens has pointed out, “when people (in the olden times) knew temperature only by sensations as ‘warmer’ or ‘colder’, temperature belonged to the ordinal class of scales. It became an interval scale with the development of thermometry, and after thermodynamics… it became a ratio scale.”
Long ago a foot-racer would race only against another runner. Now he races against a clock and achieves a time record. Of course, the runner usually also races against other runners, but his time score has a meaning all by itself, which was not possible before the chronometer (clock) was invented.