Next, we fit a
logistic regression model to our data set. Logistic regression is a statistical method for
analyzing a dataset in which there are one or more independent variables that
determine an outcome. The outcome is measured with a dichotomous variable (in
which there are only two possible outcomes). Several past studies in the
highway safety area have used logistic regression models to identify
influential factors. In one such study in Iowa, age and gender were
investigated as predictors of injury severity in head-on highway crashes
using logistic regression [41]. Another study identified the personal and
behavioral predictors of automobile crash and injury severity in Hawaii [42].
A further study, also in Hawaii, used logistic regression modeling to identify
demographic and temporal factors associated with impaired motorcycle crashes
[43]. Logistic regression has also been used to examine the predictors of
safety belt use among crash-involved drivers and front seat passengers while
allowing for misclassification errors in the outcome variable [44]. In
Pennsylvania, logistic regression modeling has been used to estimate the
influences of driver, highway, and environmental factors on run-off-road
crashes [45].
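
For reference, the general form of a logistic regression model with a dichotomous outcome can be written as follows. This is the standard textbook formulation, not an equation taken from the study itself; the symbols Y (outcome), x_j (predictors), beta_j (coefficients), and k (number of predictors) are notation assumed here for illustration.

```latex
\[
  P(Y = 1 \mid x_1, \dots, x_k)
    = \frac{1}{1 + \exp\!\bigl[-\bigl(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\bigr)\bigr]},
  \qquad
  \log\frac{P(Y = 1 \mid x_1, \dots, x_k)}{1 - P(Y = 1 \mid x_1, \dots, x_k)}
    = \beta_0 + \sum_{j=1}^{k} \beta_j x_j .
\]
```

The second expression shows that the log-odds of the outcome are linear in the predictors, which is why each coefficient can be read as the change in log-odds associated with that predictor.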

To validate the model, we split the data into a training set and a test
set. The training set consists of 1500 records and the test set consists of
568 records. The classification threshold is 0.5: if the predicted probability
is greater than 0.5, we consider that the accident will be reported, and if the
probability is less than 0.5, that the accident will not be reported. Based on
this classification, we estimate the proportion of underreported accidents.

A logistic regression model was fitted to the accident data set using a previously specified set of fifteen dichotomous predictors: age, gender, ethics, religions, residential locations, residential, status, type of road users, vehicle damaged, injury, death, the presence of objects, type of road, hospitalized and location of the accident [46].
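
The split-and-threshold procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the data are synthetic stand-ins generated at random, the fitting routine is plain gradient descent rather than whatever software the authors used, and all names (`fit_logistic`, `classify`, `underreported_share`, and so on) are hypothetical. A probability of exactly 0.5 is treated here as "not reported", which is also an assumption.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_score(w, row):
    """Intercept w[0] plus the weighted sum of the predictors."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))


def fit_logistic(X, y, lr=0.5, epochs=100):
    """Fit coefficients by batch gradient descent on the mean log-loss."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for row, target in zip(X, y):
            err = sigmoid(linear_score(w, row)) - target
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, X):
    """Predicted probability that each accident is reported."""
    return [sigmoid(linear_score(w, row)) for row in X]


def classify(probs, threshold=0.5):
    """1 = reported if probability > threshold, otherwise 0 = not reported."""
    return [1 if p > threshold else 0 for p in probs]


# Synthetic stand-in data (assumption): 2068 records, 15 dichotomous predictors.
rng = random.Random(0)
N_TRAIN, N_TEST, N_PREDICTORS = 1500, 568, 15
true_w = [rng.uniform(-1, 1) for _ in range(N_PREDICTORS)]
X = [[rng.randint(0, 1) for _ in range(N_PREDICTORS)]
     for _ in range(N_TRAIN + N_TEST)]
y = [1 if rng.random() < sigmoid(0.5 + sum(w * x for w, x in zip(true_w, row)))
     else 0 for row in X]

# Split into 1500 training records and 568 test records.
X_train, y_train = X[:N_TRAIN], y[:N_TRAIN]
X_test, y_test = X[N_TRAIN:], y[N_TRAIN:]

weights = fit_logistic(X_train, y_train)
pred_test = classify(predict_proba(weights, X_test))

# Share of test records predicted as not reported.
underreported_share = pred_test.count(0) / len(pred_test)
accuracy = sum(p == t for p, t in zip(pred_test, y_test)) / len(y_test)
print(f"predicted not reported: {underreported_share:.3f}, "
      f"test accuracy: {accuracy:.3f}")
```

With real data, the rows of `X` would hold the fifteen coded predictors for each accident, and `y` would hold the observed reporting status.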

