Selecting Subjects for Survey Research…

Sampling (Selecting Subjects)...

The main purpose of survey research is to describe the characteristics of a population. This is usually accomplished by collecting data from a sample. Therefore, the first step in sampling is to define the population.

POPULATION–> The population is the group consisting of all people to whom we (as researchers) wish to apply our findings. If we were interested in the reading level of third graders in Connecticut, the population would be all third graders in Connecticut. The data (information) we collect from populations are called PARAMETERS and are said to be DESCRIPTIVE. We label the number of subjects (observations) in a population with an upper case N (N=300). The first step in sampling is to define the population (third graders in Connecticut). The actual population to whom the researcher wishes to apply his or her findings is called the TARGET population. Often the TARGET population is not available, and the researcher must use an ACCESSIBLE population. In this case, the researcher can only apply (generalize) his or her findings to the accessible population.

SAMPLE–> Subsets of people are usually used to conduct studies. These subsets are called samples. The samples are used to represent the population from which they were drawn. The data we collect from samples are called STATISTICS and are said to be INFERENTIAL (because we are making inferences about the POPULATION with data collected from the SAMPLE). We label the number of subjects (observations) in a sample with a lower case n (n=25).

Statistics are used to effectively communicate numerical information to other people. In statistics we are…

  • …Looking at RELATIONSHIPS among (between) characteristics (e.g., salary & job satisfaction; food consumption & energy) – Correlation Research (which we study in a different unit) is an example of research involving relationships.
  • …Looking at DIFFERENCES between (among) groups (e.g., males & females; experiment & control) – Experimental Research (which we study in a different unit) is an example of research that looks at differences.
  • …Looking to DESCRIBE the characteristics of the population from data collected from a sample – Survey Research. The two major types of surveys are the cross-sectional survey and the longitudinal survey (trend, cohort, and panel studies).

Inferential statistics are used to determine how likely it is that characteristics exhibited by a sample of people are an accurate description of those characteristics exhibited by the population of people from which the sample was drawn.

The term statistically significant (p < .05) refers to how often chance alone would produce results like ours. With p < .05, we believe that if we repeated our study 100 times with different samples from a population where there really was no difference (or relationship), results like those we found with our sample would occur just by chance fewer than 5 times. A significant result is therefore taken as evidence that the findings obtained from the sample of people who participated in the study reflect what the findings would be if one were actually able to carry out the study with the entire population.
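The "no difference" reading of p < .05 can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: both groups are drawn from the same normal population (so there truly is no difference), the standard deviation is treated as known, and the group size of 25 and the 2,000 repetitions are arbitrary choices rather than values from any actual study.

```python
import random
from statistics import NormalDist, mean

rng = random.Random(2024)  # fixed seed so the run can be reproduced
std_normal = NormalDist()


def two_sample_p_value(group_a, group_b, sigma=1.0):
    """Two-sided z-test p-value, assuming a known common standard deviation."""
    standard_error = sigma * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / standard_error
    return 2 * (1 - std_normal.cdf(abs(z)))


repetitions = 2000
significant = 0
for _ in range(repetitions):
    # Both groups come from the SAME population, so any observed
    # difference between them is due to chance alone.
    group_a = [rng.gauss(0, 1) for _ in range(25)]
    group_b = [rng.gauss(0, 1) for _ in range(25)]
    if two_sample_p_value(group_a, group_b) < 0.05:
        significant += 1

# Share of repetitions that looked "significant" purely by chance.
false_alarm_rate = significant / repetitions
print(false_alarm_rate)
```

The printed proportion should land close to .05, which is what the p < .05 criterion promises when no real difference exists.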

The first step in selecting a sample is to define the population to which one wishes to generalize the results of a study. Unfortunately, one may not be able to collect data from his or her TARGET POPULATION. In this case, an ACCESSIBLE POPULATION is used. If the latter is used, care must be taken not to generalize beyond the ACCESSIBLE POPULATION.

  • The sample is drawn from the population
  • Data are collected from the sample
  • Statistics are used to determine how likely it is that the sample results reflect the population

A number of different strategies can be used to select a sample. Each of the strategies has strengths and weaknesses. There are times when the research results from the sample cannot be applied to the population because threats to external validity exist with the study. The most important aspect of sampling is that the sample represents the population.


SIMPLE RANDOM SAMPLING – Each subject in the population has an equal chance of being selected regardless of what other subjects have been or will be selected. While this is desirable, it may not be possible.

A random number table or computer program (random generator) is often employed to generate a list of random numbers to use.

A simple procedure is to place the names from the population in a hat and draw out the number of names one wishes to use for a sample.
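As a minimal sketch of the computer-generated approach, the Python snippet below draws a simple random sample without replacement. The list of names, the population size (N = 300), and the sample size (n = 25) are illustrative assumptions that echo the labels used earlier, not data from an actual study.

```python
import random

# Hypothetical population: 300 student names (N = 300).
population = [f"Student {i}" for i in range(1, 301)]

rng = random.Random(42)  # fixed seed so the draw can be reproduced

# Draw n = 25 names without replacement; every name has the same
# chance of selection, like pulling names from a hat.
sample = rng.sample(population, k=25)

print(len(sample))   # number of subjects selected
print(sample[:5])    # first few selected names
```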

STRATIFIED RANDOM SAMPLING – A representative number of subjects from various subgroups is randomly selected.

Suppose we wish to study computer use of educators in the Hartford system. Assume we want the teaching level (elementary, middle school, and high school) in our sample to be proportional to what exists in the population of Hartford teachers.

First we must determine what percentage of the teachers in the Hartford system are elementary, middle school, and high school. For this example, we will use 50%, 20% and 30% respectively. Because those percentages exist in our population, we want our sample to have the same percentages.

Let’s also assume that we want to sample 200 teachers. Since 50% of those teachers need to be elementary teachers, we need 100 elementary teachers in our sample (200 × .50). To achieve this, we obtain a list of all of the elementary teachers in the system. From that list we randomly select 100.

Similarly, we use a list of all of the middle school teachers and randomly select 40 (20% of 200). We do the same for the high school teachers and select 60.

The sample we selected is exactly proportional to the population with regard to teaching level. If we had not used STRATIFIED RANDOM SAMPLING we might have reached a similar proportion, or, by chance, we might have had overrepresentation of one of the groups.
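The Hartford example can be sketched in a few lines of Python. The frame below is hypothetical: it assumes 1,000 teachers split 50%, 20%, and 30% across the three levels, and the identifiers (E0, M0, H0, and so on) are made up for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the draw can be reproduced

# Hypothetical sampling frame: 1,000 teachers split 50% / 20% / 30%.
frame = {
    "elementary": [f"E{i}" for i in range(500)],
    "middle": [f"M{i}" for i in range(200)],
    "high": [f"H{i}" for i in range(300)],
}

total_sample = 200
population_size = sum(len(names) for names in frame.values())

stratified_sample = {}
for level, names in frame.items():
    # Allocate the sample in proportion to the level's share of the population.
    quota = round(total_sample * len(names) / population_size)
    # Randomly select that many teachers from the level's own list.
    stratified_sample[level] = rng.sample(names, k=quota)

for level, chosen in stratified_sample.items():
    print(level, len(chosen))
```

With these shares the quotas come out to 100, 40, and 60. When the shares do not divide evenly, rounded quotas may not add up to the intended total and need a small manual adjustment.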

However, the main reason to use stratified sampling is to better understand each of the subgroups. Therefore, researchers may oversample some of the subgroups and then weight the results so they are still proportional. We oversample because we need a large enough sample to represent each subgroup.
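A minimal sketch of weighting after oversampling follows. It assumes a hypothetical design that samples 100 teachers from every level, and the subgroup means (weekly hours of computer use) are invented purely to show the arithmetic. Each subgroup's weight is its population share divided by its sample share.

```python
# Population shares from the Hartford example.
population_share = {"elementary": 0.50, "middle": 0.20, "high": 0.30}

# Hypothetical oversampled design: 100 teachers from every level.
sample_counts = {"elementary": 100, "middle": 100, "high": 100}

# Hypothetical subgroup means (weekly hours of computer use).
subgroup_mean = {"elementary": 4.0, "middle": 6.0, "high": 8.0}

n_total = sum(sample_counts.values())

# Weight = population share / sample share for each subgroup.
weights = {
    level: population_share[level] / (sample_counts[level] / n_total)
    for level in sample_counts
}

# Every respondent carries the weight of his or her subgroup.
weighted_sum = sum(
    weights[level] * sample_counts[level] * subgroup_mean[level]
    for level in sample_counts
)
weight_total = sum(weights[level] * sample_counts[level] for level in sample_counts)
weighted_mean = weighted_sum / weight_total

# For comparison: the mean if the oversampling were ignored.
unweighted_mean = sum(
    sample_counts[level] * subgroup_mean[level] for level in sample_counts
) / n_total

print(round(unweighted_mean, 2), round(weighted_mean, 2))
```

The unweighted mean overstates the overall figure because middle and high school teachers are overrepresented in the sample; the weighted mean restores the 50/20/30 proportions.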

CLUSTER RANDOM SAMPLING – Samples chosen from pre-existing groups. Groups are selected and then the individuals in those groups are used for the study.

If we wished to know the attitude of fifth graders in Connecticut about reading, it might be difficult and costly to visit each fifth grade in the state to collect our data. We could randomly select 10 schools (our clusters) and survey the students in those schools. Each school in the state would have an equal chance of being selected, but only the students at the selected schools would be surveyed.

An extension of the Cluster Random Sample is the TWO-STAGE CLUSTER RANDOM SAMPLE. In this situation, the clusters (schools in our example) are randomly selected and then students within those clusters are randomly selected.
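Both versions of cluster sampling can be sketched as follows. The frame is hypothetical (120 schools with 40 fifth graders each), and the choice of 15 students per school in the second stage is an arbitrary assumption for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the draw can be reproduced

# Hypothetical frame: 120 schools, each with 40 fifth graders.
schools = {
    f"School {s}": [f"S{s}-student{i}" for i in range(1, 41)]
    for s in range(1, 121)
}

# Stage 1: randomly select 10 schools (the clusters).
selected_schools = rng.sample(sorted(schools), k=10)

# Cluster random sample: survey every student in the selected schools.
cluster_sample = [
    student for name in selected_schools for student in schools[name]
]

# Two-stage cluster random sample: randomly select 15 students
# within each selected school.
two_stage_sample = [
    student
    for name in selected_schools
    for student in rng.sample(schools[name], k=15)
]

print(len(cluster_sample), len(two_stage_sample))
```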

SYSTEMATIC SAMPLING – Systematic sampling is an easier procedure than random sampling when you have a large population and the names of the targeted population are available. Systematic sampling involves selection of every nth (e.g., 5th) subject in the population to be in the sample.

Suppose you had a list of 10,000 voters in your school district and you wished to sample 400 voters to see if they supported special funding for a new school program.

We divide the number in the population (10,000) by the size of the sample we wish to use (400) and we get the interval we need to use when selecting subjects (25). In order to select 400 subjects, we need to select every 25th person on the list.

Before we start selecting subjects, we need to select a random starting point on the list. That starting point must be with one of the first 25 names on the list for this example. We would use a random table or generator to determine the starting point. Once we have the starting point, we select that subject and every 25th subject after that on the list.
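The voter example can be sketched in Python as shown below. The voter list is a hypothetical stand-in for a real registration list.

```python
import random

rng = random.Random(7)  # fixed seed so the draw can be reproduced

# Hypothetical list of 10,000 registered voters.
voters = [f"Voter {i}" for i in range(1, 10_001)]
sample_size = 400

# Sampling interval: population size divided by desired sample size.
interval = len(voters) // sample_size  # 10,000 / 400 = 25

# Random starting point among the first 25 names (index 0 to 24).
start = rng.randrange(interval)

# Take the starting name and every 25th name after it.
systematic_sample = voters[start::interval]

print(interval, len(systematic_sample))
```

Whatever starting point is drawn, the procedure yields exactly 400 names here because 10,000 divides evenly by 400.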

CONVENIENCE SAMPLING – Subjects are selected because they are easily accessible. This is one of the weakest sampling procedures. An example might be surveying students in one’s class. Generalization to a population can seldom be made with this procedure.

“Researchers often need to select a convenience sample or face the possibility that they will be unable to do the study. Although a sample randomly drawn from a population is more desirable, it usually is better to do a study with a convenience sample than to do no study at all – assuming, of course, that the sample suits the purpose of the study” (Gall, Borg, & Gall, 1996, p. 228).

Gall, M. D., Borg, W.R., & Gall, J.P. (1996). Educational Research: An Introduction. White Plains, NY: Longman.

PURPOSIVE SAMPLING – Subjects are selected because of some characteristic. Purposive sampling is popular in qualitative research. Patton (1990) has proposed the following cases of purposive sampling. Note: These categories are provided only for additional information for EPSY 5601 students.


  • Extreme or Deviant Case – Learning from highly unusual manifestations of the phenomenon of interest, such as outstanding success/notable failures, top of the class/dropouts, exotic events,
  • Intensity – Information-rich cases that manifest the phenomenon intensely, but not extremely, such as good students/poor students, above average/below average.
  • Maximum Variation – Purposefully picking a wide range of variation on dimensions of interest…documents unique or diverse variations that have emerged in adapting to different conditions. Identifies important common patterns that cut across
  • Homogeneous – Focuses, reduces variation, simplifies analysis, facilitates group interviewing.
  • Typical Case – Illustrates or highlights what is typical, normal,
  • Stratified Purposeful – Illustrates characteristics of particular subgroups of interest; facilitates
  • Critical Case – Permits logical generalization and maximum application of information to other cases because if it’s true of this one case it’s likely to be true of all other cases.
  • Snowball or Chain – Identifies cases of interest from people who know people who know people who know what cases are information-rich, that is, good examples for study, good interview
  • Criterion – Picking all cases that meet some criterion, such as all children abused in a treatment facility. Quality assurance.
  • Theory-Based or Operational Construct – Finding manifestations of a theoretical construct of interest so as to elaborate and examine the
  • Confirming or Disconfirming – Elaborating and deepening initial analysis, seeking exceptions, testing variation.
  • Opportunistic – Following new leads during fieldwork, taking advantage of the unexpected, flexibility.
  • Random Purposeful – (still small sample size) Adds credibility to sample when potential purposeful sample is larger than one can handle. Reduces judgment within a purposeful category. (Not for generalizations or representativeness.)
  • Politically Important Cases – Attracts attention to the study (or avoids attracting undesired attention by purposefully eliminating from the sample politically sensitive cases).
  • Convenience – Saves time, money, and effort. Poorest rationale; lowest credibility. Yields information-poor cases.
  • Combination or Mixed Purposeful – Triangulation, flexibility, meets multiple interests and needs. (Patton, 1990)

Patton, M. Q. (1990). Qualitative evaluation and research methods (2nd ed.). Newbury Park, CA: Sage Publications.

Sample Size

How large should my sample be? Large enough to be an accurate representation of the population and large enough to achieve statistically significant results.

Larger Samples are needed when… 

  • a large number of uncontrolled variables are interacting unpredictably
  • the total sample is to be divided into several subsamples (the researcher is interested in also studying subgroups within the sample)
  • the population is made up of a wide range of variables and characteristics
  • differences in the results (effect size) are expected to be small
  • high attrition of subjects is expected

Sample Sizes for Surveys

The number of subjects you select (use a sample size calculator to determine this) will influence how confident you can be that your results depict the population from which the sample was drawn.

The confidence interval (also called the margin of error) is the plus-or-minus figure usually reported in newspaper or television opinion poll results. For example, if you use a confidence interval of 4 and 47% of your sample picks an answer, you can be “sure” that if you had asked the question of the entire relevant population, between 43% (47-4) and 51% (47+4) would have picked that answer.

The confidence level tells you how sure you can be. It is expressed as a percentage and represents how often the true percentage of the population who would pick an answer lies within the confidence interval. The 95% confidence level means you can be 95% certain of the confidence interval; the 99% confidence level means you can be 99% certain of the confidence interval. Most researchers use the 95% confidence level.

When you put the confidence level and the confidence interval together, you can say that you are 95% sure that the true percentage of the population is between 43% and 51%.

The wider the confidence interval you are willing to accept, the more certain you can be that the whole population's answers would be within that range. For example, if you asked a sample of 1000 people in a city which brand of cola they preferred, and 60% said Brand A, you can be very certain that between 40% and 80% of all the people in the city actually do prefer that brand, but you cannot be so sure that between 59% and 61% of the people in the city prefer the brand.
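The cola example can be put into rough numbers. The sketch below assumes a simple random sample and uses the usual normal approximation for a sample proportion; the function name is hypothetical and chosen only for this illustration.

```python
from statistics import NormalDist

n = 1000       # sample size from the cola example
p_hat = 0.60   # proportion of the sample preferring Brand A

# Standard error of a sample proportion (normal approximation).
standard_error = (p_hat * (1 - p_hat) / n) ** 0.5


def confidence_for_half_width(half_width):
    """Approximate confidence that the population value lies within
    p_hat plus or minus half_width."""
    z = half_width / standard_error
    return 2 * NormalDist().cdf(z) - 1


wide = confidence_for_half_width(0.20)    # 40% to 80%
narrow = confidence_for_half_width(0.01)  # 59% to 61%

print(round(wide, 4), round(narrow, 2))
```

Under these assumptions the wide range is all but certain to contain the population value, while the narrow range captures it a little less than half the time.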

Factors that Affect Confidence Intervals

There are three factors that determine the size of the confidence interval for a given confidence level. These are: sample size, percentage difference, and population size.

Sample Size

The larger your sample, the more confident you can be that their answers truly reflect the population. This indicates that for a given confidence level, the larger your sample size, the smaller your confidence interval. However, the relationship is not linear (i.e., doubling the sample size does not halve the confidence interval). Because the interval shrinks with the square root of the sample size, the sample must be quadrupled to cut the interval in half.
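The non-linear relationship between sample size and confidence interval can be seen with a short calculation. This sketch assumes a simple random sample, the normal approximation for a proportion, and the worst-case percentage of 50%; the function name is a hypothetical helper.

```python
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95):
    """Plus-or-minus interval for a proportion
    (normal approximation, simple random sample)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * (p * (1 - p) / n) ** 0.5


# Print the interval, in percentage points, for several sample sizes.
for n in (250, 500, 1000, 2000):
    print(n, round(100 * margin_of_error(n), 1))
```

Going from 250 to 500 subjects narrows the interval only modestly; it takes 1,000 subjects (four times as many) to cut the interval for 250 subjects in half.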

Percentage Difference

Your accuracy also depends on the percentage of your sample that picks a particular answer. If 99% of your sample said “Yes” and 1% said “No” the chances of error are remote, irrespective of sample size. However, if the percentages are 51% and 49% the chances of error are much greater. It is easier to be sure of extreme answers than of middle-of-the-road ones.

When determining the sample size needed for a given level of accuracy you must use the worst case percentage (50%). You should also use this percentage if you want to determine a general level of accuracy for a sample you already have. To determine the confidence interval for a specific answer your sample has given, you use the percentage of the sample that selected that answer, which, if it is different from 50%, gives a smaller interval.
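A sample size calculator performs a calculation along the following lines. This is a sketch under the same assumptions as before (simple random sample, normal approximation, large population), not the formula of any particular calculator; the function names and the 90% answer used for comparison are illustrative.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96 for the 95% confidence level


def required_sample_size(interval, p=0.5):
    """Sample size needed for a given plus-or-minus interval
    (expressed as a proportion), ignoring population size."""
    return math.ceil(Z_95 ** 2 * p * (1 - p) / interval ** 2)


def interval_for(n, p):
    """Plus-or-minus interval for an observed proportion p."""
    return Z_95 * math.sqrt(p * (1 - p) / n)


# Worst case (50%) sample size for an interval of 4 percentage points.
n_needed = required_sample_size(0.04)
print(n_needed)

# Interval for a 50% answer versus a more extreme 90% answer.
print(round(100 * interval_for(n_needed, 0.50), 2))
print(round(100 * interval_for(n_needed, 0.90), 2))
```

The same sample gives a noticeably smaller interval for the 90% answer than for the 50% answer, which is why 50% is treated as the worst case when planning.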

Population Size

How many people are there in the group your sample represents? This may be the number of people in a city you are studying, the number of people who buy new cars, etc. Often you may not know the exact population size. This is not a problem. The mathematics of probability shows that the size of the population is irrelevant unless the size of the sample exceeds a few percent of the total population you are examining. This means that a sample of 500 people is equally useful in examining the opinions of a state of 15,000,000 as it would be for a city of 100,000. For this reason, a sample size calculator ignores the population size when it is “large” or unknown. Population size is only likely to be a factor when you work with a relatively small and known group of people.
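The limited role of population size can be illustrated with the standard finite population correction. The sketch assumes a simple random sample of 500, the worst-case percentage of 50%, and a z value of 1.96; the population of 2,000 is added as a hypothetical example of a small, known group.

```python
import math

Z_95 = 1.96  # approximate z value for the 95% confidence level


def interval_with_population(n, population, p=0.5):
    """Plus-or-minus interval using the finite population correction."""
    standard_error = math.sqrt(p * (1 - p) / n)
    correction = math.sqrt((population - n) / (population - 1))
    return Z_95 * standard_error * correction


# Same sample of 500 drawn from populations of very different sizes.
for population in (15_000_000, 100_000, 2_000):
    pct = 100 * interval_with_population(500, population)
    print(population, round(pct, 2))
```

The intervals for the state and the city are nearly identical; only for the group of 2,000, where the sample is a quarter of the population, does the interval shrink noticeably.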

Note: The confidence interval calculations assume you have a genuine random sample of the relevant population. If your sample is not truly random, you cannot rely on the intervals. Non-random samples usually result from some flaw in the sampling procedure. An example of such a flaw is to only call people during the day, and miss almost everyone who works. For most purposes, the non-working population cannot be assumed to accurately represent the entire (working and non-working) population.

Information about confidence intervals was obtained from The Survey System.


Del Siegle, Ph.D.
Neag School of Education – University of Connecticut