Boadi, Joanne

Biomedical Statistics

8/1/18

Independent Project

Frequency distribution of a variable and bar graph of the same variable

The frequency table created from the Independent Project Data shows the marital status of the women in our sample. Marital status is a nominal variable with five categories: widowed, separated, never married, married, and divorced. Of the 972 women in this sample, 63% have never been married, 15.6% are divorced, 12% are separated, 6% are married, and 2.6% are widowed. The most frequent response (the mode) is “Never married,” which occurs 613 times, for a relative frequency of 63%. The least frequent response is “Widowed,” which occurs 25 times, for a relative frequency of 2.6%. The bar graph shows the same pattern visually: “Never married” has the tallest bar and “Widowed” has the shortest.

[Figure: Histogram Describing Marital Status for a Sample of 972 Women (x-axis: Marital Status; y-axis: Frequency)]

Descriptives of a continuous variable: mean, median, skewness, kurtosis, standard deviation and graph of that variable

The continuous variable that I will be using for this question is age. In our sample of 972 subjects (n = 972), the mean age is 36.6 years, the median is 37 years, and the mode is 41 years. The median tells us that when all the values are lined up in order from least to greatest, the middle value is 37. The mode tells us that the most common age among the subjects is 41. The mean tells us that if all the ages in the sample are added and then divided by 972, the result is 36.624486. The median (37) is greater than the mean (36.6), which suggests that the distribution is slightly negatively skewed. The mode is greater than both the mean and the median, which is also consistent with a negative skew and is supported by the skewness value of -0.36. The distribution is not perfectly normal, since the mean, median, and mode do not share the same value. A negative skew means that most of the values are concentrated on the right side of the distribution while the left tail is longer. The distribution can be described as platykurtic because the kurtosis value is less than zero (-0.395). The lower quartile is 33, meaning that 25% of values are at or below the age of 33. The upper quartile is 41, meaning that 75% of values are at or below the age of 41. The standard deviation of 6.28 means that ages typically deviate from the mean by about 6.28 years. The distribution is unimodal because there is only one peak.
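
The descriptive measures discussed above can be sketched in a few lines of Python. The ages below are hypothetical values invented for illustration, not the project data, and the skewness and kurtosis use the simple moment-based formulas, which may differ slightly from the bias-adjusted versions that statistical packages report.

```python
import statistics

# Hypothetical ages, invented for illustration only (NOT the project data).
ages = [22, 28, 33, 35, 37, 37, 39, 41, 41, 41, 43, 45]

n = len(ages)
mean = statistics.mean(ages)
median = statistics.median(ages)
mode = statistics.mode(ages)
sd = statistics.stdev(ages)  # sample standard deviation (n - 1 denominator)

# Central moments about the mean (dividing by n).
m2 = sum((x - mean) ** 2 for x in ages) / n
m3 = sum((x - mean) ** 3 for x in ages) / n
m4 = sum((x - mean) ** 4 for x in ages) / n

skewness = m3 / m2 ** 1.5          # negative => longer left tail
excess_kurtosis = m4 / m2 ** 2 - 3  # negative => platykurtic

# Quartile cut points (lower quartile, median, upper quartile).
q1, q2, q3 = statistics.quantiles(ages, n=4)

print(f"mean={mean:.2f} median={median} mode={mode} sd={sd:.2f}")
print(f"skewness={skewness:.3f} excess kurtosis={excess_kurtosis:.3f}")
print(f"lower quartile={q1} upper quartile={q3}")
```

In this made-up sample the mean is below the median, which is below the mode, and the skewness comes out negative, which is the same ordering described for the age variable.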

[Figure: Histogram Describing Frequency vs Age of Subjects (x-axis: Age of Subjects; y-axis: Frequency)]

Cross tabulation of two variables

Contingency table results:

Rows: Smoker

Columns: Poverty Level

Cell format: Count, (Row percent), (Column percent), (Percent of total), (Expected count), (Contributions to Chi-Square)

Smoker?                            Above poverty   Below poverty   Total
No
  Count                            127             364             491
  (% within Smoking Status)        25.87%          74.13%          100%
  (% within Poverty Level)         58.26%          48.6%           50.78%
  (% of Total)                     13.13%          37.64%          50.78%
  (Expected Count)                 110.69          380.31
  (Contributions to Chi-Square)    2.4             0.7
Yes
  Count                            91              385             476
  (% within Smoking Status)        19.12%          80.88%          100%
  (% within Poverty Level)         41.74%          51.4%           49.22%
  (% of Total)                     9.41%           39.81%          49.22%
  (Expected Count)                 107.31          368.69
  (Contributions to Chi-Square)    2.48            0.72
Total
  Count                            218             749             967
  (% within Smoking Status)        22.54%          77.46%          100%
  (% within Poverty Level)         100%            100%            100%
  (% of Total)                     22.54%          77.46%          100%

Chi-Square test:

Statistic DF Value P-value

Chi-square 1 6.3025773 0.0121

Null Hypothesis: There is no relationship between Poverty Level and Smoking Status.

Alternative Hypothesis: There is a relationship between Poverty level and Smoking Status.

Explanation: The results are statistically significant because the P-value of 0.0121 is less than 0.05. In addition, the chi-square value of 6.30 is greater than the critical value of 3.84 (df = 1, alpha = 0.05), which also indicates statistical significance. Based on these results, we can reject the null hypothesis and accept the alternative hypothesis that there is a relationship between poverty level and smoking status. The data also show that 50.78% of the subjects are non-smokers regardless of their poverty level, while 49.22% of the subjects are smokers regardless of their poverty level. There are 91 subjects who are both smokers and above the poverty line; this amounts to 9.41% of all subjects. In the smokers group, which consists of 476 subjects, only 19.12% are above the poverty line. Among the subjects below the poverty line, slightly more than half (51.4%) are smokers.
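
As a check on the chi-square result, the statistic can be recomputed from the four observed counts in the contingency table. This is a minimal sketch using only the Python standard library, not the software that produced the original output. It applies no continuity correction, and the P-value shortcut with `math.erfc` is valid only because this table has one degree of freedom.

```python
import math

# Observed counts from the contingency table.
# Rows: non-smokers, smokers. Columns: above poverty, below poverty.
observed = [[127, 364],
            [91, 385]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Expected count under independence: (row total * column total) / n
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 only, the upper-tail chi-square probability equals
# erfc(sqrt(chi_square / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.4f}, df = {df}, p = {p_value:.4f}")
```

The four terms added inside the loop are the same "contributions to chi-square" listed in the table cells.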

Comparison of the effect of three or more groups (single variable) on a single continuous variable

ANOVA: one-way analysis of variance

Analysis of Variance results:

Responses: Mental Health

Factors: Marital Status

Response statistics by factor

Marital status n Mean Std. Dev. Std. Error

Divorced 151 45.951801 12.04699 0.98037018

Married 59 47.855339 9.9816902 1.2995054

Never married 613 46.756522 10.748486 0.43412727

Separated 123 45.425724 10.577431 0.95373487

Widowed 25 42.38084 12.114696 2.4229393

ANOVA table

Source DF SS MS F-Stat P-value

marital 4 753.52995 188.38249 1.5765918 0.1784

Error 966 115424.6 119.48717

Total 970 116178.13

The independent variable is marital status, a nominal variable with five groups (divorced, married, never married, separated, and widowed). The dependent variable is mental health status, a continuous variable measured at the interval/ratio level.

Null Hypothesis: There is no difference in mean mental health score among the five marital status groups.

Alternative Hypothesis: There is a difference in mean mental health score between at least two of the five marital status groups.

Explanation: Since the P-value, 0.1784, is greater than 0.05, the results are not statistically significant and we fail to reject the null hypothesis. In addition, the F-stat of 1.5765918 is below the critical value of approximately 2.38 (df = 4 and 966, alpha = 0.05), which also indicates that the results are not statistically significant. Therefore, we cannot conclude that the subjects’ mean mental health score differs by marital status.
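
The ANOVA table can be rebuilt from the group summary statistics reported above (n, mean, and standard deviation for each marital status). This is a minimal sketch of that arithmetic in plain Python. It computes the sums of squares and the F-statistic only and does not compute the P-value, which requires the F distribution.

```python
# Group summaries from the "Response statistics by factor" table:
# marital status -> (n, mean, standard deviation)
groups = {
    "Divorced": (151, 45.951801, 12.04699),
    "Married": (59, 47.855339, 9.9816902),
    "Never married": (613, 46.756522, 10.748486),
    "Separated": (123, 45.425724, 10.577431),
    "Widowed": (25, 42.38084, 12.114696),
}

total_n = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / total_n

# Between-groups sum of squares: group size times squared distance
# of each group mean from the grand mean.
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())

# Within-groups (error) sum of squares: (n - 1) times each group variance.
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

df_between = len(groups) - 1
df_within = total_n - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}")
print(f"F({df_between}, {df_within}) = {f_stat:.4f}")
```

The degrees of freedom (4 and 966) come from the five groups and the 971 subjects with a mental health score.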

Scatterplot of two continuous variables

[Figure: Scatterplot of Physical Health Score (y-axis) vs BMI (x-axis)]

The scatterplot represents two continuous variables among the subjects: BMI values (x-axis) and physical health score (y-axis). The independent variable is Body Mass Index (BMI) and the dependent variable is Physical Health Score. The graph shows, at most, a very weak relationship between BMI and Physical Health Score. Because the relationship appears so weak, it is difficult to tell from the plot alone whether it is positive or negative. However, many of the points on the scatterplot are clustered in the top left area (circled in red), where the Physical Health Score values are between 45 and 60 on the y-axis and the BMI values are between 15 and 35 on the x-axis.

6. Correlation between the two continuous variables from #5 above

Simple linear regression results:

Dependent Variable: Physical Health (score)

Independent Variable: BMI (body mass index)

Physical Health = 50.006983 - 0.18457608 BMI

Sample size: 967

R (correlation coefficient) = -0.125091

R-sq = 0.015647759

Estimate of error standard deviation: 10.750835

Parameter estimates:

Parameter Estimate Std. Err. Alternative DF T-Stat P-value

Intercept 50.006983 1.4186048 ≠ 0 965 35.250821 <0.0001

Slope -0.18457608 0.047126042 ≠ 0 965 -3.9166473 <0.0001

Analysis of variance table for regression model:

Source DF SS MS F-stat P-value

Model 1 1773.0186 1773.0186 15.340126 <0.0001

Error 965 111535.13 115.58045

Total 966 113308.15

Null Hypothesis: The correlation coefficient “R” is equal to “0” which means that there is no relationship between the two variables: Physical Health Score and Body Mass Index (BMI).

Alternative Hypothesis: The correlation coefficient “R” is not equal to “0” which means that there is a relationship between the two variables: Physical Health Score and Body Mass Index (BMI).

The R (correlation coefficient) is -0.125, which indicates an inverse relationship between the two variables. The strength of the relationship can be described by taking the absolute value of -0.125, which is 0.125. A value of 0.125 indicates a very weak relationship between the two variables. Furthermore, the R-squared value is 0.0156 (about 1.6%), meaning that BMI explains only a very small share of the variation in Physical Health Score. Our p-value (<0.0001) is statistically significant because it is below the standard 0.05 probability value. Therefore, we can reject the null hypothesis and accept the alternative hypothesis.
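
The reported R, R-squared, T-stat, and F-stat are tied together by simple arithmetic, which can be checked directly from the regression output. This is a minimal sketch in plain Python that uses only the numbers printed in the tables above. The variable names are my own.

```python
import math

# Values copied from the regression output above.
slope = -0.18457608
slope_std_err = 0.047126042
ss_model = 1773.0186
ss_total = 113308.15

# R-squared is the share of total variation explained by the model.
r_squared = ss_model / ss_total

# In simple linear regression, R is the square root of R-squared
# and takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

# T-stat for the slope is the estimate divided by its standard error;
# with a single predictor, the ANOVA F-stat equals the T-stat squared.
t_stat = slope / slope_std_err
f_stat = t_stat ** 2

print(f"R-sq = {r_squared:.6f} ({r_squared:.1%} of variance explained)")
print(f"R = {r:.6f}, T = {t_stat:.4f}, F = {f_stat:.3f}")
```

With a sample of 967, even a correlation this weak can be statistically significant, which is why the small R-squared and the very small p-value appear together.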