Statistical modeling has been used to predict areas at high risk of arsenic hazard, but little is known about its application to endemic arsenism. In this study, we aim to link a prediction model with population census data and endemic arsenicosis in Shanxi Province, Northern China. Twenty-three explanatory variables from different sources were compiled as grids at 1 km resolution in a GIS environment. Logistic regression was applied to describe the relationship between binary-coded arsenic concentration data and the auxiliary predictors. Sixty-one endemic arsenism villages were geo-located and combined with the output maps of the prediction model. Linear regression was used to identify the relationship between arsenicosis occurrence rate and predicted arsenic probability at the village level. Our results show that 6 explanatory environmental variables contributed significantly to the final model. An area of 3000 km² was found to be at high risk of arsenic concentrations above 50 ppb. The linear regression indicates that 13% of the variation in arsenicosis occurrence rate can be explained by the predicted probability of arsenic concentration above 50 ppb in Shanxi Province. These results suggest that arsenic prediction models may be helpful for identifying arsenic-contaminated areas and endemic arsenism villages.
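As a minimal illustration of two steps named above, the sketch below binary-codes arsenic concentrations and computes the share of variance explained by a simple linear regression. It is not the study's implementation: the function names are hypothetical, the sample values are invented for illustration only, and using the 50 ppb threshold for the binary coding is an assumption (the abstract names 50 ppb only for the risk map).

```python
from statistics import mean

# Assumption: the binary coding uses the same 50 ppb threshold as the risk map.
THRESHOLD_PPB = 50.0


def binary_code(concentrations_ppb, threshold=THRESHOLD_PPB):
    """Return 1 where a concentration is strictly above the threshold, else 0."""
    return [1 if c > threshold else 0 for c in concentrations_ppb]


def r_squared(x, y):
    """Coefficient of determination for the least-squares fit y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares of the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / syy


# Hypothetical values, not data from the study.
well_arsenic_ppb = [10.0, 50.0, 51.0, 120.0]
coded = binary_code(well_arsenic_ppb)

village_probability = [0.0, 1.0, 2.0, 3.0]      # stand-in for predicted probability
village_occurrence_rate = [0.0, 1.0, 0.0, 1.0]  # stand-in for arsenicosis rate
explained = r_squared(village_probability, village_occurrence_rate)
```

An `r_squared` value of 0.13 would correspond to the 13% of variation reported above; the logistic regression itself would normally be fitted with a statistics package rather than by hand.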