
Descriptive Biostatistics

College: College of Dentistry     Department: Basic Sciences     Stage: 1
Instructor: جميلة علي عبد الصاحب الكريمي       15/05/2017 03:36:33
Biostatistics
It is the science that deals with the development and application of the most appropriate methods for:
• Collection of data.
• Presentation of the collected data.
• Analysis and interpretation of the results.
• Making decisions on the basis of such analysis.

• Role of statisticians
  – To guide the design of an experiment or survey prior to data collection.
  – To analyze data using proper statistical procedures and techniques.
  – To present and interpret the results to researchers and other decision makers.


• Types of data
  – Constant
  – Variables

Quantitative variables          Qualitative variables
1. Quantitative continuous      1. Qualitative nominal
2. Quantitative discrete        2. Qualitative ordinal

• Methods of presentation of data
  – Numerical presentation: tabular presentation (simple – complex)
  – Graphical presentation
      o Pie chart
      o Statistical maps
      o Graphs drawn using Cartesian coordinates
          • Line graph
          • Histogram
          • Bar graph
          • Scatter plot
  – Mathematical presentation

Descriptive Biostatistics
The best way to work with data is to summarize and organize them. Numbers that have not been summarized and organized are called raw data.
• A descriptive measure is a single number that is used to describe a set of data.
• Descriptive measures include measures of central tendency and measures of dispersion.
Central Tendency:
It is the property of data that they tend to cluster about a central point. The measures of central tendency include:
– Mean (generally not part of the data set)
– Median (may be part of the data set)
– Mode (always part of the data set).

Measures of Dispersion
Dispersion is the property of data that they tend to be spread out. Its measures include:
o Range
o Variance
o Standard deviation
o Coefficient of variation


Central Tendency


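1. The mean or arithmetic mean: is the "average", obtained by adding all the values in a sample or population and dividing the total by the number of values. For a sample of n values x_1, ..., x_n, the sample mean is:

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
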
For example, the waiting times (in minutes) of five customers in a bank are 3, 2, 4, 1, and 2. The mean waiting time is:

\[ \bar{x} = \frac{3 + 2 + 4 + 1 + 2}{5} = \frac{12}{5} = 2.4 \]

On average, a customer waits 2.4 minutes for service at the bank.
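
The bank example can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and do not come from the lecture.

```python
from statistics import mean

# Waiting times (in minutes) of the five bank customers in the example.
waiting_times = [3, 2, 4, 1, 2]

# Mean from the definition: sum of the values divided by the number of values.
manual_mean = sum(waiting_times) / len(waiting_times)

# The standard-library helper should agree with the manual calculation.
library_mean = mean(waiting_times)

print(manual_mean, library_mean)
```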

Note that the formula above refers to the sample mean. Why call it a sample mean? In statistics, samples and populations have very different meanings, and these differences are important even though, in the case of the mean, both are calculated in the same way. To show that we are calculating the population mean and not the sample mean, we use the Greek lower-case letter "mu", written µ, and the population size N:

\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]

Characteristics of Mean

1. Uniqueness: For a given set of data there is one and only one mean.
2. Simplicity: The mean is easy to calculate.
3. Affected by extreme values: The mean is influenced by each value. Therefore, extreme values can distort the mean.

2. Median: is the value that divides the set of data into two equal parts. It is the midpoint of the data set. The number of values equal to or greater than the median equals the number of values less than or equal to the median.

Finding the Median
1. Arrange (sort) the data in order of increasing value in a sorted list.
2. Find the median.
a. Odd number of values (n is odd), middle value of sequence
If X = [1,2,4,6,9,10,12,14,17]
Then 9 is the median
b. Even number of values (n is even), average of the 2 middle values
if X = [1,2,4,6,9,10,11,12,14,17]
then 9.5 is the median; i.e., (9+10)/2
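
The two cases above can be sketched in Python as follows. The function name `median` is chosen here for illustration; the standard library offers an equivalent `statistics.median`.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)      # step 1: arrange the data in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n is odd: the median is the single middle value
        return ordered[mid]
    # n is even: the median is the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = median([1, 2, 4, 6, 9, 10, 12, 14, 17])
even_example = median([1, 2, 4, 6, 9, 10, 11, 12, 14, 17])
print(odd_example, even_example)
```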


Characteristics of Median

1. Uniqueness: There is only one median for each set of data.
2. Simplicity: It is easy to calculate.
3. Median is not affected by extreme values

3. Mode: the mode is the most frequently occurring value in a distribution.
If X = [1,2,4,7,7,7,8,10,12,14,17]
Then 7 is the mode
Characteristics of Mode
1. Easy to see in a simple frequency distribution.
2. A data set may have no mode or more than one mode (bimodal and multimodal).
3. The modes do not have to have exactly equal frequencies (major mode, minor mode).
4. The mode is not affected by extreme values.
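
One way to find the mode is to count how often each value occurs. The sketch below uses `collections.Counter`; the helper name `modes` and the second data set are hypothetical. It reports only the values tied for the highest frequency, so it does not distinguish major from minor modes, and when every value occurs equally often it returns all of them, a case many texts describe as having no mode.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


single = modes([1, 2, 4, 7, 7, 7, 8, 10, 12, 14, 17])  # the example above
bimodal = modes([1, 2, 2, 3, 4, 4, 5])                  # hypothetical bimodal data
print(single, bimodal)
```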

The appropriateness of measures of central tendency for different levels of measurement.




When to Use


Example: Consider the aptitude test scores of ten students below:
95, 78, 69, 91, 82, 76, 76, 86, 88, 80
Mean = (95+78+69+91+82+76+76+86+88+80)/10 = 82.1
If the entry 91 is mistakenly recorded as 9, the mean would be 73.9, which is very different from 82.1.
On the other hand, let us see the effect of the mistake on the median value:
The original data set in increasing order is:
69, 76, 76, 78, 80, 82, 86, 88, 91, 95
With n = 10, the median position is found by (10 + 1) / 2 = 5.5. Thus, the median is the average of the fifth (80) and sixth (82) ordered value and the median = 81
The data set (with 91 coded as 9) in increasing order is:
9, 69, 76, 76, 78, 80, 82, 86, 88, 95
where the median = 79
The medians of the two sets are not that different. Therefore the median is not that affected by the extreme value 9.
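
The comparison above can be reproduced with Python's `statistics` module. This is a minimal sketch; the variable names are illustrative.

```python
from statistics import mean, median

scores = [95, 78, 69, 91, 82, 76, 76, 86, 88, 80]

# The same data with the entry 91 mistakenly recorded as 9.
scores_with_error = [9 if s == 91 else s for s in scores]

# How far each measure moves because of the single recording error.
mean_shift = mean(scores) - mean(scores_with_error)
median_shift = median(scores) - median(scores_with_error)

print("means:  ", mean(scores), mean(scores_with_error))
print("medians:", median(scores), median(scores_with_error))
print("shifts: ", mean_shift, median_shift)
```

The mean moves by about 8 points, while the median moves by only 2.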









Practice Questions and Problems
1. With which of the data classes (Nominal, Ordinal, Interval/Ratio) can each of the following measures of central tendency be used? (A given measure may be used for more than one data class.)
a. mean
b. mode
c. median
2. Under what conditions might a median be a better measure of the center of your data set than the mean?



3. It should seem clear how the mean and the median are measures of the central tendency of the data since the mean is a familiar average and the median is the middle. However, explain why the mode is also considered a measure of central tendency.
4. The following data represent a sample of the time to complete a certain task in minutes and seconds (mm:ss).
6:30, 11:15, 6:22, 11:32, 8:12, 5:02, 9:17, 6:51, 8:44, 7:45, 9:37, 7:28, 4:29, 7:42
a. Compute the mean.
5. The following sample data of the number of communications are taken from logs of communication with Distance Education students. Compute the mean, median and mode.
(5, 9, 5, 23, 27, 55, 34, 7, 30, 15, 22, 60, 14, 52, 297, 8, 51, 15, 51, 35, 15, 39, 137, 43, 38, 14, 93, 7)

6. Consider the following data set:
21, 34, 18, 26, 30, 35, 24, 29, 25
a. If this is a population, compute the mean.
b. If this is a sample, compute the mean.
7. At the beginning of the 2015-16 academic year the number of years the full-time teaching faculty had been at Southwestern were:
(13, 5, 20, 1, 8, 0, 3, 9, 31, 8, 2, 16, 1, 3, 19, 9, 0, 6, 8, 0, 3, 10, 18, 24, 5, 11, 15, 4, 4, 4, 36, 5, 4, 5, 3, 0, 3, 9, 17, 0, 13, 4, 15, 8, 5, 20, 19, 24, 6, 6, 9, 0, 37)
a. What is the mean?
b. What is the median?
c. Which is a better measure of the center of the data set? Why?
d. Compute the five-number summary.
Answers:
c. The median, because of the few extreme values.
d. (0, 3, 6, 15, 37)



The Variance and Standard Deviation
The mean, mode and median do a nice job of telling where the center of the data set is, but often we are interested in more. The most important measures of variability are the range, the variance, the standard deviation and the coefficient of variation.
• Range (R):
The range is the difference between the largest (XL) and the smallest (XS) values in a set of observations.
Range = (Highest value) – (Lowest value)
R = XL – XS
Example: Range of {1, 3, 6, 11, 14}
R = 14 – 1 = 13
Note: The range is a poor measure of dispersion because it takes into account only two of the values.



• Variance:
The variance is the most commonly used measure of spread in biological statistics. The variance of a set of values is a measure of variation equal to the square of the standard deviation. For a population, it is defined as the sum of squares of the deviations from the mean (SS) divided by the total number of deviations, N. For a sample, SS is divided by one less than the total number of deviations, n − 1 (the degrees of freedom, df).

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \qquad s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]

Note: The reason for dividing by n − 1 (df) in a sample is that the sum of the deviations of the values from their mean equals zero, so once n − 1 of the deviations are known, the last one is determined.
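
The two definitions can be compared in Python. The sketch below reuses the values from the range example as illustrative data; `pvariance` and `variance` are the standard-library functions for the population and sample variance.

```python
from statistics import mean, pvariance, variance

data = [1, 3, 6, 11, 14]                 # values from the range example
m = mean(data)

deviations = [x - m for x in data]       # deviations from the mean
ss = sum(d ** 2 for d in deviations)     # sum of squares (SS)

population_variance = ss / len(data)     # SS divided by N
sample_variance = ss / (len(data) - 1)   # SS divided by n - 1 (degrees of freedom)

print(sum(deviations))                   # the deviations sum to zero
print(population_variance, pvariance(data))
print(sample_variance, variance(data))
```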

• Standard Deviation (s) or (sd):
The standard deviation of a set of values is a measure of the variation of the values about the mean.
There are two standard deviations: the sample standard deviation (s) and the population standard deviation (σ).
Each is defined as the positive square root of the corresponding variance.

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]

The mean squared difference from the sample mean will, on average, underestimate the population variance. In some samples it will overestimate it, but most of the time it will underestimate it. If the formula is modified so that the sum of squared deviations is divided by n − 1 rather than n, this tendency to underestimate the population variance is eliminated.
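
The effect of dividing by n − 1 can be illustrated by averaging over every possible sample from a small population. The sketch below is an illustration only: the population and the sample size are hypothetical, and it assumes samples are drawn with replacement. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]          # hypothetical small population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

n = 2                                     # hypothetical sample size
samples = list(product(population, repeat=n))  # every sample, drawn with replacement


def sum_of_squares(sample):
    """Sum of squared deviations of a sample from its own mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)


# Average of each estimator over all possible samples.
avg_div_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)

print(sigma2, avg_div_n, avg_div_n_minus_1)
```

Under these assumptions, dividing by n gives an average below the population variance, while dividing by n − 1 matches it.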
Key Points about the Standard Deviation
• The standard deviation is a measure of variation of all values from the mean.
• The value of the standard deviation s is never negative; it is zero only when all the values are equal.
• The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others).
• The units of the standard deviation s are the same as the units of the original data values.


• Coefficient of Variation (C.V):
It is defined as the ratio of the standard deviation to the mean, and is usually expressed as a percentage. It is independent of the units employed.

\[ C.V = \frac{s}{\bar{x}} \times 100\% \]
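
Unit independence can be illustrated with a short Python sketch. The height values and the helper name `coefficient_of_variation` are hypothetical, and the sample standard deviation is assumed. The function returns the plain ratio; multiply by 100 for a percentage.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (as a plain ratio)."""
    return stdev(values) / mean(values)


heights_cm = [150, 160, 170, 180, 190]      # hypothetical heights in centimetres
heights_m = [h / 100 for h in heights_cm]   # the same heights in metres

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

# The two ratios agree (up to floating-point rounding) despite the change of units.
print(cv_cm, cv_m)
```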


Example: For the data set (4, 6, 3, 4, 5, 2), compute the range, the variance, the standard deviation and the coefficient of variation.
Solution:
R = XL - XS
R= 6-2=4
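
The remaining quantities can be computed the same way. The Python sketch below assumes the data are a sample, so the variance divides by n − 1; the example does not say whether a sample or a population is intended.

```python
from statistics import mean, stdev, variance

data = [4, 6, 3, 4, 5, 2]

data_range = max(data) - min(data)   # R = XL - XS
sample_var = variance(data)          # SS / (n - 1); assumes the data are a sample
sample_sd = stdev(data)              # positive square root of the variance
cv = sample_sd / mean(data)          # ratio of the standard deviation to the mean

print(data_range, sample_var, round(sample_sd, 3), round(cv * 100, 1))
```

With a mean of 4 and a sum of squares of 10, the sample variance is 2, the standard deviation is about 1.414, and the coefficient of variation is about 35.4%.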

Q1: Correct the following sentences
1. The Mean central tendency measure used with Nominal, Ordinal, Interval/Ratio.

2. Sample Mean: is mean or average used to measure central tendency.

3. Best measure: is considered when the most repeated observations recorded are outliers of data then Mode.

4. The mean is influenced by each value. Therefore, extreme values cannot distort the mean.

5. Service time (in minutes) at airport ticket counter is as 4.5, 5.5, 6, 7, 8, 8.5, 4, 3, 3.5, 2.5, 3.8 then median of data is (5).

6. The Roman letter (µ) denotes the population parameter of central tendency.

7. For any data set, Σ(X − X̄) ≠ zero.

8. Measures of positive variation are considered as a method used to compute average or central value of collected data.

9. Arithmetic mean is 12 and number of observations are 20 then sum of all values is 420.

10. The following sample data set: 6, 12, 9, 7, 8, 4, 3, 12, 15 the mean, median and mode are (8.44,8,15)

11. The Median is influenced by each value. Therefore, extreme values can distort the Median.

12. Mode is the value that divides the set of data into two equal parts.

13. Characteristics of Mode simplicity and uniqueness.

14. The Median is the value that occurs most often in the data.

15. Median is always part of the data set


Q2: Explain why the mode is considered a measure of central tendency.
Q3:
1. With which of the data classes (Nominal, Ordinal, Interval/Ratio) can each of the following measures of central tendency be used?
a. mean
b. mode
c. median
2. With which of the data classes (Ordinal and Interval/Ratio) can each of the following measures of central tendency be used?
a. mean
b. mode
c. median

3. Under what conditions might a median be a better measure of the center of your data set than the mean?
a. When the data is Ordinal—the mean does not apply.
b. For Interval/Ratio data if there are one or more extreme values.
c. Both of them
4. For any data set, what is Σ(X − X̄)?
a. Zero
b. ? Zero
c. ? zero
5. It should seem clear how the mean and the median are measures of the central tendency of the data since the mean is a familiar average and the median is the middle. However, explain why the mode is also considered a measure of central tendency.
a. Most data sets are peaked in the middle.
b. The mode is the highest frequency.
c. Is typically in the middle of the data somewhere.
d. All above.
6. The following sample data set: 6, 12, 9, 7, 8, 4, 3, 12, 15 the mean, median and mode are
a. (8.444, 8, 12).
b. (8,8, 15)
c. (8.44,8,15)
7. Arithmetic mean is 12 and number of observations are 20 then sum of all values is
a. 8
b. 32
c. 240
8. Method used to compute average or central value of collected data is considered as
a. measures of positive variation
b. measures of central tendency
9. Mean or average used to measure central tendency is called
a. sample mean
b. arithmetic mean
c. population mean
10. In statistics out of 100, marks of 21 students in final exams are as 90, 95, 95, 94, 90, 85, 84, 83, 85, 81, 92, 93, 82, 78, 79, 81, 80, 82, 85, 76, 85 then mode of data is

a. 85
b. 95
c. 90

11. Number of observations are 30 and value of arithmetic mean is 15 then sum of all values is
a. 415
b. 450
c. 200

12. If most repeated observations recorded are outliers of data then mode is considered as
a. percentage measure
b. best measure
c. poor measure
13. In statistics out of 100, marks of 21 students in final exams are as 90, 95, 95, 94, 90, 85, 84, 83, 85, 81, 92, 93, 82, 78, 79, 81, 80, 82, 85, 76, 85 then mode of data is
a. 85
b. 95
c. 90

14. Characteristics of Median
a. Uniqueness and Simplicity
b. Median is not affected by extreme values
c. All above
15. A descriptive measure is a single number that is
a. used to describe a set of data
b. Measures include measures of central tendency and measures of dispersion.
c. All above

16. Role of statisticians
a. To guide the design of an experiment or survey prior to data collection.
b. To analyze data using proper statistical procedures and techniques.
c. To present and interpret the results to researchers and other decision makers.
d. All above
17. Dispersion is the property of data that they tend to be spread out. Its measures include:
a. Range and Variance
b. Standard deviation and Coefficient of variation
c. All above

18. In measure of central tendency, population parameter is denoted by
a. Greek letter µ
b. Roman letter µ
c. Roman letter x̄

19. Service time (in minutes) at airport ticket counter is as 4.5, 5.5, 6, 7, 8, 8.5, 4, 3, 3.5, 2.5, 3.8 then median of data is
a. 4.5
b. 4
c. 4.75
20. Which measure of central tendency is influenced by each value, so that extreme values can distort it?
a. Mean
b. Median
c. Mode


