Natural Hazards and Job Choice: Predicting Job Choice Outcomes

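We used a two-step nested logit approach to estimate final job choice from personal and job attributes: an ordered logit for willingness to move (Equation 1), followed by a logistic regression for final job choice that takes the Equation 1 prediction as an input (Equation 2).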

Equation 1: What Drives Willingness-to-Move?

The outputs of this equation represent a reduced form of an individual’s attitudes and feelings as they relate to willingness to move. We use the job search radius and current location to determine the starting willingness-to-move value, based on the US Census division of the individual’s current location relative to the divisions included in the job search. Willingness to move takes one of three values: stay (0), willing to move (1), or move (2). For “stayers”, the job search radius covers only the US Census division of their current location. For those open to moving (1), the job search radius includes the US Census division of their current location and extends beyond it. For “movers”, the job search radius excludes the US Census division of their current location.
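The coding rule above can be sketched in a few lines. This is a minimal illustration, not the code used in the study; the function name and the division labels are hypothetical.

```python
def starting_willingness_to_move(current_division, search_divisions):
    """Code the starting willingness-to-move value (0, 1, or 2).

    current_division: US Census division of the current location.
    search_divisions: divisions included in the job search.
    """
    search = set(search_divisions)
    if not search:
        raise ValueError("job search must include at least one division")
    if current_division not in search:
        return 2  # mover: search excludes the current division
    if search == {current_division}:
        return 0  # stayer: search covers only the current division
    return 1      # open to moving: current division plus others


# Hypothetical division labels, for illustration only.
stayer = starting_willingness_to_move("Pacific", ["Pacific"])
open_to_move = starting_willingness_to_move("Pacific", ["Pacific", "Mountain"])
mover = starting_willingness_to_move("Pacific", ["Mountain"])
print(stayer, open_to_move, mover)
```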

Model Variables: Willingness to Move

Variable                        Description
Dependent Variable
willingness_to_move             Ordinal measure of whether individual is open to relocating for a job
Independent Variables
generation                      Generational cohort (e.g., Gen Z, Millennial)
gender                          Gender
education_level                 Highest level of education completed
geo_degree                      Whether the individual holds a geoscience degree
impact_severity                 Severity of impacts experienced from hazards (cumulative of all individual hazard types)
current_hazard_concern          Level of concern about hazards at current location (cumulative of all individual hazard types)
current_loc_reasons_climate     Importance of climate/weather in choice of current location
current_loc_reasons_crime       Importance of crime risk in choice of current location
current_loc_reasons_hazard      Importance of hazard risk in choice of current location
current_loc_reasons_income      Importance of income in choice of current location
current_loc_reasons_location    General importance of location in choice of current location
current_loc_reasons_resources   Importance of community resources and amenities in choice of current location
current_loc_reasons_social      Importance of proximity to social networks in choice of current location
current_loc_reasons_other       Importance of other unspecified factors in choice of current location
current_occ_reasons_income      Importance of income in choice of current job
current_occ_reasons_jobtasks    Importance of job characteristics in choice of current job
current_occ_reasons_location    Importance of job location in choice of current job
current_occ_reasons_other       Importance of other factors in choice of current job

Equation 1 Results

The confusion matrix shows the model is most accurate at predicting participants in the middle category (“willing to move”), but tends to overpredict class 1 for individuals who are firmly unwilling or firmly willing to move. This suggests that while the model captures general tendencies, there may be unmeasured heterogeneity in the most decisive groups.

Several variables showed statistically significant relationships with willingness to move:

  • Income-based location selection: Participants who rated income as an important reason for their current location were more likely to express willingness to move (p = 0.015). This may reflect greater economic flexibility or a strategic approach to maximizing opportunity.
  • Unspecified gender: Participants who did not specify their gender were significantly more likely to be willing to move than those identifying as female (p = 0.023), suggesting a distinct pattern of mobility in this group.
  • Favorable weather: Individuals who valued favorable weather in choosing their current location were less willing to move (p = 0.023).
  • Older generations: Older participants were less likely to express willingness to move (p = 0.041), consistent with broader trends in residential stability with age.

Participants who indicated that job location was a key reason for their current job were generally less willing to move, although this result was only borderline significant (p = 0.058). It suggests these individuals may have place-based ties to their employment.

Other variables, including education level, hazard concern, and job characteristics, were not individually significant predictors but remain important for interpreting population-level patterns, particularly in light of sample composition. The universe of game participants is skewed toward older, more highly educated individuals, most of whom hold geoscience degrees. This structure affects the relationships among education, generation, and occupational preferences, and influences mobility decisions in ways that are not fully separable.

To illustrate this structure, we plotted the proportion of education levels by generation, separated by geoscience and non-geoscience degree holders. Among geoscience degree holders, nearly all participants hold a bachelor’s or graduate/professional degree, regardless of generation. Among non-geoscience respondents, education levels are more diverse, particularly among Gen Z and Millennials.

This concentration of higher education among geoscientists helps explain why education level did not emerge as a strong independent predictor in the model (p = 0.82), despite being theoretically important. It overlaps substantially with generation and degree field (geo_degree): education_level had the highest VIF in the model (15.28), while generation (3.12) and geo_degree (2.86) had much lower VIFs. Education level therefore serves more as a structural characteristic of the population than as a distinct explanatory factor.
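To make the VIF values easier to read, the sketch below converts a VIF into the share of a predictor's variance explained by the other predictors, assuming the standard definition VIF = 1 / (1 - R²). Whether the reported VIFs were computed exactly this way (for example, with an intercept) is an assumption; the function name is hypothetical.

```python
def implied_r_squared(vif):
    """R-squared from regressing one predictor on the others, given its VIF."""
    if vif < 1.0:
        raise ValueError("a VIF cannot be smaller than 1")
    return 1.0 - 1.0 / vif


# VIF values taken from the Variance Inflation Factors table for Equation 1.
reported_vifs = {
    "education_level": 15.283,
    "generation": 3.124,
    "geo_degree": 2.861,
}

implied = {name: implied_r_squared(v) for name, v in reported_vifs.items()}
for name, r2 in implied.items():
    print(f"{name}: implied R-squared = {r2:.3f}")
```

Under this assumption, a VIF of 15.28 means that roughly 93% of the variation in education_level is shared with the other predictors, which is consistent with its wide confidence interval.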

We also looked at the role of education in shaping job mobility. We plotted the importance placed on job characteristics in the choice of one’s current occupation against education level. Respondents with a bachelor’s or graduate degree consistently rated job features (e.g., tasks, responsibilities) as more important in their occupational choices than participants with lower educational attainment. This aligns with qualitative insights that individuals with higher education tend to prioritize intellectual engagement and mission alignment in their work.

We plotted a pairwise correlation matrix for all independent variables used in the Equation 1 model predicting willingness to move. The heatmap highlights areas of moderate correlation, particularly among demographic and attitudinal variables. Moderate positive correlations were observed between education_level and both generation and geo_degree, reflecting the demographic structure of the sample, where older respondents and those with geoscience degrees tend to have higher educational attainment. Moderate positive correlation was also observed between current_occ_reasons_jobtasks and education_level, suggesting that more educated respondents are more likely to value job characteristics when choosing an occupation. Several reasons for choosing current location or job also showed modest intercorrelations, consistent with the idea that preferences such as income, location, and resources often co-occur.

Weak or no correlations were observed across most other predictor pairs, indicating that multicollinearity is limited to a small subset of structurally related variables. This visualization supports the decision to retain all predictors in the model while interpreting the correlated variables education_level, generation, and geo_degree as structural descriptors rather than isolated, independent drivers. Their shared variance reflects real cohort patterns and disciplinary clustering rather than redundant information.

This model reveals that willingness to move is driven by a mix of demographic, location-based, and sociocultural factors. Younger individuals and those prioritizing income show greater mobility, while those who are anchored by local climate are less likely to move. Although education level was not a statistically significant predictor in isolation, it plays a central role in shaping the profile of respondents — especially among those with geoscience degrees — and interacts meaningfully with job and location preferences.

Ordered Logit Regression: Willingness to Move

Dep. Variable:     willingness_to_move    Log-Likelihood:  -328.76
Model:             OrderedModel           AIC:             699.5
Method:            Maximum Likelihood     BIC:             781.2
No. Observations:  361                    Df Model:        19
Df Residuals:      340                    Date:            Wed, 30 Apr 2025
Time:              11:31:46

Variable                            Coef.  Std. Err.        z  P > |z|   [0.025   0.975]
generation                        -0.2437      0.119   -2.043    0.041   -0.477   -0.010
education_level                   -0.0343      0.149   -0.231    0.817   -0.326    0.257
geo_degree                         0.3260      0.236    1.383    0.167   -0.136    0.788
impact_severity                    0.0394      0.045    0.883    0.377   -0.048    0.127
current_hazard_concern            -0.0413      0.029   -1.405    0.160   -0.099    0.016
current_loc_reasons_climate       -0.2126      0.093   -2.278    0.023   -0.395   -0.030
current_loc_reasons_crime         -0.0554      0.103   -0.539    0.590   -0.257    0.146
current_loc_reasons_hazard        -0.0708      0.114   -0.619    0.536   -0.295    0.153
current_loc_reasons_income         0.2644      0.109    2.436    0.015    0.052    0.477
current_loc_reasons_location      -0.0376      0.085   -0.442    0.659   -0.204    0.129
current_loc_reasons_resources      0.0407      0.088    0.463    0.644   -0.132    0.213
current_loc_reasons_social         0.0352      0.078    0.452    0.651   -0.117    0.188
current_loc_reasons_other          0.1591      0.088    1.801    0.072   -0.014    0.332
current_occ_reasons_income        -0.0774      0.103   -0.751    0.452   -0.279    0.124
current_occ_reasons_jobtasks       0.1258      0.100    1.254    0.210   -0.071    0.322
current_occ_reasons_location      -0.1892      0.100   -1.893    0.058   -0.385    0.007
current_occ_reasons_other         -0.0965      0.090   -1.068    0.285   -0.274    0.081
gender_1                          -0.0413      0.229   -0.181    0.857   -0.490    0.407
gender_2                           0.8099      0.356    2.273    0.023    0.111    1.508
0/1                               -1.7499      0.608   -2.876    0.004   -2.942   -0.557
1/2                                1.0737      0.061   17.466    0.000    0.953    1.194
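The two threshold rows (0/1 and 1/2) deserve a note on interpretation. The sketch below assumes the table was produced by statsmodels' OrderedModel, which is what the "Model: OrderedModel" label suggests. In that implementation the first threshold is reported directly, while later thresholds are stored as the log of the increment over the previous one. Under that assumption the second cut point is -1.7499 + exp(1.0737), not 1.0737. The function names and the linear-predictor value of 0 are hypothetical and chosen only for illustration.

```python
import math


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def ordered_logit_probs(xb, first_cut, log_increment):
    """Probabilities of classes 0, 1, 2 for linear predictor xb.

    Assumes the second threshold is stored as log(cut2 - cut1).
    """
    cut1 = first_cut
    cut2 = first_cut + math.exp(log_increment)
    p_le_0 = logistic_cdf(cut1 - xb)
    p_le_1 = logistic_cdf(cut2 - xb)
    return [p_le_0, p_le_1 - p_le_0, 1.0 - p_le_1]


cut2 = -1.7499 + math.exp(1.0737)
probs = ordered_logit_probs(xb=0.0, first_cut=-1.7499, log_increment=1.0737)
print(f"second cut point: {cut2:.3f}")
print("P(stay), P(willing), P(move):", [round(p, 3) for p in probs])
```

A linear predictor of 0 does not correspond to any actual respondent; it only shows how the thresholds partition the latent scale.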

Variance Inflation Factors

Variable                            VIF
generation                        3.124
education_level                  15.283
geo_degree                        2.861
impact_severity                   3.480
current_hazard_concern            4.119
current_loc_reasons_climate       3.525
current_loc_reasons_crime         3.254
current_loc_reasons_hazard        2.970
current_loc_reasons_income        5.266
current_loc_reasons_location      4.507
current_loc_reasons_resources     4.027
current_loc_reasons_social        3.017
current_loc_reasons_other         2.114
current_occ_reasons_income        5.495
current_occ_reasons_jobtasks      9.622
current_occ_reasons_location      6.198
current_occ_reasons_other         2.116
gender_1                          1.782
gender_2                          1.284
Prediction accuracy: 57.89%
Mean Absolute Error (MAE): 0.43
Confusion Matrix: Willingness to Move Prediction
Equation 1 Predictor Correlation Matrix
Proportion of Education Levels by Generation and Degree Field
Importance of Job Characteristics by Education Level

Equation 2: What Drives Final Job Choice?

We modeled participants’ final job selection using a logistic regression framework, where the dependent variable was whether a given job was ultimately chosen. Predictors included job-level characteristics and a mobility preference score derived from Equation 1. The model included both main effects and key interaction terms reflecting theoretical tradeoffs between job appeal, risk, and individual mobility. Note that we centered continuous variables (salary, cost of living, and cumulative hazard risk) for use as interaction terms. This had no effect on the regression with just the primary independent variables (no difference between using centered and non-centered values) but reduced the VIF values for the regression including the interaction terms.
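The effect of centering on collinearity can be illustrated with synthetic data. The sketch below is not the study data: the salary and hazard ranges are hypothetical, and the variables are drawn independently. It shows that the product of two uncentered variables is strongly correlated with its components, while the product of the centered versions is not, which is why centering lowers VIFs once interaction terms are added.

```python
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


rng = random.Random(0)  # fixed seed so the illustration is reproducible
salary = [rng.uniform(40_000, 120_000) for _ in range(2000)]  # hypothetical range
hazard = [rng.uniform(0, 20) for _ in range(2000)]            # hypothetical range

raw_product = [s * h for s, h in zip(salary, hazard)]
salary_c, hazard_c = center(salary), center(hazard)
centered_product = [s * h for s, h in zip(salary_c, hazard_c)]

r_raw = pearson(salary, raw_product)
r_centered = pearson(salary_c, centered_product)
print(f"corr(salary, salary*hazard), uncentered: {r_raw:.2f}")
print(f"corr(salary, salary*hazard), centered:   {r_centered:.2f}")
```

Centering only shifts each variable by a constant, so in a model without interaction terms it changes the intercept but not the slopes, consistent with the observation above.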

Model Variables: Predicting Final Job Choice

Variable                Description
Dependent Variable
finaljob_chosen         Binary variable indicating whether the job was the final job choice
Independent Variables
consideration_time      Number of job pairs considered before final decision was made
division_change         Number of U.S. Census divisions moved from current location
cost_of_living_change   Difference in cost of living between job location and current location
hazard_risk_change      Difference in cumulative hazard risk between job location and current location
job_salary              Annual salary for the job
job_cost_of_living      Cost of living in the job location
job_crime_risk          Crime risk index at the job location
job_hazard_risk         Cumulative hazard risk at the job location
predicted_wtm           Predicted score representing individual's openness to relocation

Equation 2 Results

The reduced model, which retained all main effects and only the statistically meaningful interaction term, fit the data significantly better than the intercept-only model (LLR p < 0.001). Its pseudo R-squared is 0.092, meaning the model improves the log-likelihood by about 9% over the null model, which is acceptable for behavioral models involving complex personal tradeoffs.

Higher salary significantly increases the likelihood of a job being selected, while higher crime risk substantially reduces it. Cost of living is positively associated with job selection, though only at borderline significance (p = 0.057), possibly reflecting the desirability of high-opportunity urban areas despite higher expenses. Job deliberation time, measured by the number of job pairs considered before making a final selection, is negatively associated with job choice likelihood. This suggests that jobs selected after more extensive comparison were less compelling overall, or that extended decision-making reflects greater uncertainty or difficulty identifying a desirable job.

A notable interaction between salary and hazard risk (p = 0.001) reveals that while salary is a strong attractor, its positive effect diminishes significantly when hazard risk is high, highlighting a clear risk-salary tradeoff. Although willingness to move is not statistically significant on its own, its inclusion improves model performance by capturing individual variation in sensitivity to contextual tradeoffs like cost and risk.
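The risk-salary tradeoff can be read directly from the final model's coefficients for job_salary_c and job_salary_c × job_hazard_risk_c (reported in the final regression table). The sketch below computes the salary slope on the log-odds scale at several centered hazard values. The hazard values are hypothetical, since the observed range of the hazard score is not reported here.

```python
import math

B_SALARY = 1.519e-05        # job_salary_c coefficient (final model)
B_INTERACTION = -1.156e-06  # job_salary_c x job_hazard_risk_c coefficient


def salary_slope(hazard_c):
    """Change in log-odds per extra dollar of salary at a centered hazard value."""
    return B_SALARY + B_INTERACTION * hazard_c


# Hypothetical centered hazard values; 0 is the sample mean by construction.
for hazard_c in (-5, 0, 5, 10):
    odds_ratio_10k = math.exp(salary_slope(hazard_c) * 10_000)
    print(f"hazard_c = {hazard_c:>3}: odds ratio per +$10,000 = {odds_ratio_10k:.3f}")

# Centered hazard value at which the estimated salary effect would reach zero.
break_even = -B_SALARY / B_INTERACTION
print(f"salary slope reaches zero at hazard_c = {break_even:.2f}")
```

Whether the break-even value lies inside the observed hazard range cannot be determined from the table alone, so it should be read as an extrapolation.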

Final job choice is influenced by a combination of economic factors (salary, cost of living), personal safety (crime), and risk tolerance. The strongest deterrent is high crime, while the most compelling attractor is salary—unless offset by high hazard risk. These findings support the theory that individuals weigh tangible job benefits against both environmental risk and personal mobility preferences when making career decisions.

Model output with primary effects

The overall model was statistically significant and explained approximately 9% of the variation in final job choice. Variance Inflation Factors (VIFs) for all predictors were below 2.5, indicating no concerning multicollinearity among model variables. Model accuracy was 94.56% overall, but the model’s ability to correctly predict final job choice among available options (Top-1 accuracy) was 24.10%, reflecting the difficulty of the task given the imbalanced outcome. The ROC curve with an AUC of 0.733 indicates that the model had a reasonably good ability to distinguish between chosen and non-chosen jobs, although the very high overall accuracy reflects the dataset’s heavy imbalance (many more non-chosen jobs than chosen ones).

Significant effects included:

  • Job salary: Higher salaries strongly increased the likelihood of final job choice. Salary was one of the most robust predictors, confirming its central role in shaping participants’ employment decisions.
  • Consideration Time: Considering more job pairs before the final decision was significantly associated with a lower probability of selecting a given job. This suggests that quicker decisions were more likely to result in final job selection, possibly indicating a more confident or intuitive choice process.
  • Job Crime Risk: Greater perceived crime risk at the job location significantly reduced the probability of selecting that job. This indicates that safety concerns were an important deterrent.

Distance required to relocate, cost of living change, and cumulative hazard risk change were not statistically significant predictors, suggesting that relocation distance and changes in living expenses or hazard exposure did not independently affect final job choice. Job cost of living and hazard risk at the job location were also not statistically significant. Predicted willingness to move, included as a control variable, likewise had no statistically significant independent effect on job selection in this model.

Logistic Regression Summary: Final Job Choice

Dep. Variable:     finaljob_chosen     Log-Likelihood:    -1284.8
Model:             Logit               LL-Null:           -1409.2
Method:            MLE                 Pseudo R-squared:  0.08826
No. Observations:  6766                Df Residuals:      6756
Df Model:          9                   Covariance Type:   nonrobust
Date:              Wed, 30 Apr 2025    Time:              13:21:15
Converged:         True                LLR p-value:       1.825e-48

Variable                      Coef.  Std. Err.          z    P > |z|     [0.025     0.975]
const                       -2.5600      0.536     -4.778      0.000     -3.610     -1.510
consideration_time          -0.0407      0.009     -4.384      0.000     -0.059     -0.022
division_change              0.0362      0.038      0.952      0.341     -0.038      0.111
cost_of_living_change     2.872e-06   6.28e-06      0.457      0.648  -9.45e-06   1.52e-05
hazard_risk_change          -0.0021      0.018     -0.121      0.903     -0.037      0.032
job_salary                 1.39e-05   1.21e-06     11.460      0.000   1.15e-05   1.63e-05
job_cost_of_living        8.946e-06   9.09e-06      0.985      0.325  -8.86e-06   2.68e-05
job_crime_risk              -0.3202      0.069     -4.660      0.000     -0.455     -0.186
job_hazard_risk             -0.0401      0.025     -1.591      0.112     -0.089      0.009
predicted_wtm               -0.1086      0.203     -0.535      0.592     -0.506      0.289

Variance Inflation Factors (VIF)

Variable                     VIF
const                     87.850
consideration_time         1.037
division_change            1.059
cost_of_living_change      1.859
hazard_risk_change         2.011
job_salary                 1.368
job_cost_of_living         2.323
job_crime_risk             1.095
job_hazard_risk            2.176
predicted_wtm              1.018

Confusion Matrix: Final Job Choice Prediction

  Predicted 0 Predicted 1
Actual 0 6392 13
Actual 1 355 6
Top-1 Accuracy: 24.10%
AUC: 0.733
Accuracy: 94.56%
ROC Curve: Predicting Final Job Choice

We then removed the reflective change variables (hazard_risk_change and cost_of_living_change) and used only job-specific factors to assess model performance. This change produced a simpler and more interpretable explanation of final job choice behavior than the model that included the reflective change variables. Both models yielded nearly identical overall performance: similar log-likelihood values (-1284.8 vs. -1284.9), pseudo R-squared (about 0.088), ROC AUC (0.733), and Top-1 prediction accuracy (24.1%).

By removing the two change variables, the model’s structure was simplified without sacrificing predictive power. The variables dropped from the model assumed that participants knew the hazard risk and cost of living at their current location, information that was not provided by the game. Neither variable was significant in the prior model, suggesting that participants may not have consciously weighed this information or compared these factors when choosing a job. In contrast, the remaining job-level features, such as salary, crime risk, and hazard risk, were directly presented to participants as part of each job offer.

Furthermore, the second model demonstrates reduced multicollinearity, with the Variance Inflation Factors (VIFs) for several predictors dropping after removing the change variables, most notably for job_hazard_risk (2.18 to 1.28) and job_cost_of_living (2.32 to 1.59). In addition, job_hazard_risk was not statistically significant in the first model (p = 0.112) but became significant in the second (p = 0.037), suggesting the presence of a suppression effect in the more complex model that included the change variables. This further suggests that participants approached the game from a forward-looking perspective, meaning that they considered the options without necessarily comparing factors such as cost of living and hazard risk to their current location. The second model, focused solely on the attributes of the job options themselves, better reflects the actual structure of the decision participants were asked to make.

Significant effects in the second model were similar to the first model, and included:

  • Job salary: Salary was the most robust predictor in the model (p < 0.001), with higher salaries significantly increasing the likelihood that a job was selected. Though the coefficient is numerically small due to the unit scale of salary (likely measured in dollars), the effect size becomes meaningful over larger salary differences. This confirms that financial compensation plays a central role in employment decisions, reinforcing salary as a key motivator in job selection.
  • Consideration Time: The coefficient for consideration time was negative and highly significant (p < 0.001), suggesting that the more job pairs a participant considered before deciding, the less likely a given job was ultimately chosen. This may indicate that quicker decisions, perhaps reflecting a stronger intuitive match or lower internal conflict, are more likely to result in selection. Conversely, options that prompted hesitation or extended deliberation were less likely to be chosen.
  • Job Crime Risk: Perceived crime risk at the job location also had a significant negative effect on selection likelihood (p < 0.001). Jobs associated with higher crime risk were less likely to be chosen, emphasizing the importance of safety in employment decisions. This aligns with broader evidence that individuals factor environmental security into decisions about where to live and work.
  • Job Hazard Risk: Unlike in the initial model, job hazard risk reached statistical significance in this specification (p = 0.037). The negative coefficient indicates that jobs located in areas with higher hazard risk were less likely to be selected. The emergence of this effect after removing hazard change variables suggests that it had been suppressed by multicollinearity in the previous model. This finding supports the idea that individuals are responsive to environmental hazard exposure when evaluating job opportunities, though the effect is more modest than salary or crime risk.
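To make the effect sizes above concrete, the coefficients from the job-factors-only model can be converted to odds ratios. The sketch below uses the coefficient values from the regression table for this model; the choice of a $10,000 salary difference and a one-unit change in the risk indices is illustrative, and it assumes salary is measured in dollars, as the text suggests.

```python
import math

# Coefficients from the job-factors-only logistic regression table.
coefs = {
    "job_salary": 1.391e-05,
    "consideration_time": -0.0406,
    "job_crime_risk": -0.3204,
    "job_hazard_risk": -0.0423,
}


def odds_ratio(coef, change=1.0):
    """Multiplicative change in the odds of selection for a given change in x."""
    return math.exp(coef * change)


salary_or = odds_ratio(coefs["job_salary"], change=10_000)  # +$10,000
crime_or = odds_ratio(coefs["job_crime_risk"])              # +1 crime index unit
hazard_or = odds_ratio(coefs["job_hazard_risk"])            # +1 hazard unit
pairs_or = odds_ratio(coefs["consideration_time"])          # +1 job pair considered

print(f"salary +$10,000:  odds x {salary_or:.3f}")
print(f"crime risk +1:    odds x {crime_or:.3f}")
print(f"hazard risk +1:   odds x {hazard_or:.3f}")
print(f"job pairs +1:     odds x {pairs_or:.3f}")
```

Read this way, a $10,000 salary difference raises the odds of selection by about 15%, while a one-unit increase in the crime risk index lowers them by about 27%, holding the other variables fixed.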

Logistic Regression Summary: Final Job Choice (job factors only)

Dep. Variable:     finaljob_chosen     Log-Likelihood:    -1284.9
Model:             Logit               LL-Null:           -1409.2
Method:            MLE                 Pseudo R-squared:  0.08819
No. Observations:  6766                Df Residuals:      6758
Df Model:          7                   Covariance Type:   nonrobust
Date:              Fri, 02 May 2025    Time:              17:40:47
Converged:         True                LLR p-value:       5.652e-50

Variable                      Coef.  Std. Err.          z    P > |z|     [0.025     0.975]
const                       -2.6696      0.442     -6.037      0.000     -3.536     -1.803
consideration_time          -0.0406      0.009     -4.384      0.000     -0.059     -0.022
division_change              0.0365      0.038      0.964      0.335     -0.038      0.111
job_salary                1.391e-05   1.21e-06     11.482      0.000   1.15e-05   1.63e-05
job_cost_of_living        1.155e-05    7.1e-06      1.628      0.104  -2.36e-06   2.55e-05
job_crime_risk              -0.3204      0.069     -4.663      0.000     -0.455     -0.186
job_hazard_risk             -0.0423      0.020     -2.084      0.037     -0.082     -0.003
predicted_wtm               -0.1009      0.201     -0.501      0.616     -0.496      0.294

Variance Inflation Factors (VIF)

VIF values for model variables
Variable                     VIF
const                     61.043
consideration_time         1.020
division_change            1.059
job_salary                 1.367
job_cost_of_living         1.594
job_crime_risk             1.094
job_hazard_risk            1.284
predicted_wtm              1.014

Confusion Matrix: Final Job Choice Prediction

  Predicted 0 Predicted 1
Actual 0 6391 14
Actual 1 355 6
Top-1 Accuracy: 24.10%
AUC: 0.733
Accuracy: 94.55%
ROC Curve: Predicting Final Job Choice

The following correlation matrix shows the primary variables included in the final job choice model. Most predictors show low to moderate correlations (|r| < 0.7), indicating minimal risk of multicollinearity and supporting their inclusion in the multivariate analysis. Among the strongest positive relationships observed were those between job salary, job cost of living, and job hazard risk. Job salary and job cost of living show a moderate positive correlation, which was expected because jobs offering higher salaries are often located in regions where living costs are higher, reflecting common economic compensation adjustments. Similarly, job hazard risk was moderately correlated with job cost of living, suggesting that higher-cost areas may also be locations where both economic opportunities and exposure to natural hazards are elevated. The strongest negative correlation was between job crime risk and job salary, meaning that higher-salary jobs in the data tended to be located in places with relatively lower crime risk.

The remaining variables were only weakly correlated with the rest of the model. Their relative independence from the economic and environmental job attributes suggests they represent distinct dimensions of the decision-making process, such as behavioral tendencies, geographic flexibility, or personal risk preferences.

Correlation Matrix of Primary Variables for Final Job Choice Model

Logistic Regression Summary: Final Job Choice (Extended Interaction Model)

Dep. Variable:     finaljob_chosen     Log-Likelihood:    -1267.9
Model:             Logit               LL-Null:           -1409.2
Method:            MLE                 Pseudo R-squared:  0.1003
No. Observations:  6766                Df Residuals:      6737
Df Model:          28                  Covariance Type:   nonrobust
Date:              Wed, 30 Apr 2025    Time:              14:02:33
Converged:         True                LLR p-value:       6.929e-44

Variable                                        Coef.  Std. Err.          z    P > |z|     [0.025     0.975]
const                                         -1.2117      0.852     -1.423      0.155     -2.881      0.457
consideration_time                            -0.0876      0.042     -2.089      0.037     -0.170     -0.005
division_change                                0.0742      0.204      0.364      0.716     -0.326      0.474
job_crime_risk                                -0.3752      0.311     -1.208      0.227     -0.984      0.234
predicted_wtm                                 -0.5292      0.752     -0.704      0.482     -2.004      0.945
job_salary_c                                1.164e-05   7.75e-06      1.503      0.133  -3.54e-06   2.68e-05
job_cost_of_living_c                           0.0001   3.85e-05      2.752      0.006   3.05e-05      0.000
job_hazard_risk_c                             -0.0976      0.110     -0.883      0.377     -0.314      0.119
consideration_time × division_change          -0.0047      0.008     -0.593      0.553     -0.020      0.011
consideration_time × job_salary_c           4.009e-07   2.46e-07      1.630      0.103  -8.11e-08   8.83e-07
consideration_time × job_cost_of_living_c  -2.032e-06    1.4e-06     -1.455      0.146  -4.77e-06   7.06e-07
consideration_time × job_crime_risk            0.0092      0.011      0.846      0.397     -0.012      0.030
consideration_time × job_hazard_risk_c        -0.0027      0.003     -0.817      0.414     -0.009      0.004
consideration_time × predicted_wtm             0.0310      0.035      0.899      0.369     -0.037      0.099
division_change × job_salary_c             -9.751e-07   7.89e-07     -1.236      0.216  -2.52e-06   5.71e-07
division_change × job_cost_of_living_c     -5.561e-06   4.56e-06     -1.218      0.223  -1.45e-05   3.38e-06
division_change × job_crime_risk              -0.0422      0.051     -0.830      0.407     -0.142      0.057
division_change × job_hazard_risk_c            0.0110      0.013      0.822      0.411     -0.015      0.037
division_change × predicted_wtm                0.1414      0.143      0.989      0.323     -0.139      0.422
job_salary_c × job_cost_of_living_c        -1.007e-10   1.49e-10     -0.677      0.499  -3.93e-10   1.91e-10
job_salary_c × job_crime_risk               5.443e-07   1.63e-06      0.334      0.738  -2.65e-06   3.74e-06
job_salary_c × job_hazard_risk_c           -1.064e-06   4.88e-07     -2.183      0.029  -2.02e-06  -1.09e-07
job_salary_c × predicted_wtm               -9.541e-07   5.57e-06     -0.171      0.864  -1.19e-05   9.96e-06
job_cost_of_living_c × job_crime_risk      -3.291e-06   9.69e-06     -0.340      0.734  -2.23e-05   1.57e-05
job_cost_of_living_c × job_hazard_risk_c    -1.53e-06   2.22e-06     -0.689      0.491  -5.88e-06   2.82e-06
job_cost_of_living_c × predicted_wtm       -4.861e-05   2.62e-05     -1.854      0.064    -0.0001   2.77e-06
job_crime_risk × job_hazard_risk_c            -0.0295      0.025     -1.180      0.238     -0.078      0.019
job_crime_risk × predicted_wtm                 0.0076      0.255      0.030      0.976     -0.492      0.507
job_hazard_risk_c × predicted_wtm              0.1640      0.077      2.138      0.033      0.014      0.314

Variance Inflation Factors

Variable                                        VIF
const                                       179.379
consideration_time                           38.274
division_change                              27.784
job_crime_risk                               17.331
predicted_wtm                                12.399
job_salary_c                                 46.424
job_cost_of_living_c                         42.309
job_hazard_risk_c                            32.435
consideration_time × division_change          7.091
consideration_time × job_salary_c             6.515
consideration_time × job_cost_of_living_c     5.494
consideration_time × job_crime_risk          10.782
consideration_time × job_hazard_risk_c        3.258
consideration_time × predicted_wtm           31.467
division_change × job_salary_c                2.474
division_change × job_cost_of_living_c        3.138
division_change × job_crime_risk              9.651
division_change × job_hazard_risk_c           2.268
division_change × predicted_wtm              20.355
job_salary_c × job_cost_of_living_c           2.260
job_salary_c × job_crime_risk                10.905
job_salary_c × job_hazard_risk_c              2.108
job_salary_c × predicted_wtm                 28.974
job_cost_of_living_c × job_crime_risk        14.400
job_cost_of_living_c × job_hazard_risk_c      1.549
job_cost_of_living_c × predicted_wtm         25.613
job_crime_risk × job_hazard_risk_c           13.691
job_crime_risk × predicted_wtm               20.951
job_hazard_risk_c × predicted_wtm            19.297

Model output with significant interaction terms

We developed the final model through an iterative, multi-stage process. We first ran each set of interaction terms (e.g., consideration_time × other variables, division_change × other variables, etc.) alongside the primary effects to determine which interactions to include. In each round, we examined the significance and effect size of both primary and interaction terms. We identified the following interaction terms as the most promising:

  • consideration_time × job_salary: Included to test whether the influence of salary depended on how quickly a participant made their decision.
  • job_cost_of_living × job_hazard_risk: Included to investigate whether high living costs magnified the deterrent effect of hazard risk at the job location.
  • job_hazard_risk × predicted_wtm: Included to explore whether the impact of hazard risk varied depending on a participant’s predicted willingness to move.
  • job_salary × job_hazard_risk: Included to test whether high salaries became less persuasive in areas with elevated hazard risks.

After identifying these interaction terms from the preliminary runs, we combined all primary effects and all interaction terms into a comprehensive model. This step allowed us to evaluate whether the initially selected interaction terms remained meaningful when controlling for the full set of variables. Based on these results, we further refined the list of interactions, retaining those that consistently showed significance or substantive effects across models. Finally, we pruned the model through sequential testing, resulting in a final specification that included the primary effects and the interaction term job_salary × job_hazard_risk.
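The reported log-likelihoods allow a rough check on the pruning decision. The sketch below compares the extended interaction model (log-likelihood -1267.9, 28 slope terms) with the final model (log-likelihood -1279.3, 8 slope terms) using a likelihood-ratio statistic and AIC. It assumes the final model is nested in the extended one. This is an illustration of one possible comparison, not necessarily the sequential test used in the analysis. The critical value 31.41 is the 95th percentile of a chi-square distribution with 20 degrees of freedom.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion; n_params counts the intercept."""
    return 2 * n_params - 2 * log_likelihood


def lr_statistic(ll_restricted, ll_full):
    """Likelihood-ratio statistic for nested models."""
    return 2 * (ll_full - ll_restricted)


# Values taken from the regression summaries; parameter counts include const.
ll_extended, k_extended = -1267.9, 29
ll_final, k_final = -1279.3, 9

lr = lr_statistic(ll_final, ll_extended)
df = k_extended - k_final
CHI2_CRIT_95_DF20 = 31.41  # 95th percentile of chi-square with 20 df

aic_extended = aic(ll_extended, k_extended)
aic_final = aic(ll_final, k_final)

print(f"LR statistic = {lr:.1f} on {df} df (critical value {CHI2_CRIT_95_DF20})")
print(f"AIC extended = {aic_extended:.1f}, AIC final = {aic_final:.1f}")
```

On these numbers the 20 dropped interaction terms do not jointly improve fit at the 5% level, and the final model has the lower AIC, which is consistent with retaining only job_salary × job_hazard_risk.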

Logistic Regression Summary: Final Job Choice

Dep. Variable:     finaljob_chosen     Log-Likelihood:    -1279.3
Model:             Logit               LL-Null:           -1409.2
Method:            MLE                 Pseudo R-squared:  0.09219
No. Observations:  6766                Df Residuals:      6757
Df Model:          8                   Covariance Type:   nonrobust
Date:              Wed, 30 Apr 2025    Time:              14:32:06
Converged:         True                LLR p-value:       1.420e-51

Variable                                Coef.  Std. Err.          z    P > |z|     [0.025     0.975]
const                                 -1.7243      0.292     -5.915      0.000     -2.296     -1.153
predicted_wtm                         -0.1079      0.202     -0.535      0.593     -0.503      0.287
consideration_time                    -0.0400      0.009     -4.330      0.000     -0.058     -0.022
division_change                        0.0274      0.038      0.722      0.470     -0.047      0.102
job_crime_risk                        -0.2920      0.069     -4.211      0.000     -0.428     -0.156
job_salary_c                        1.519e-05   1.26e-06     12.085      0.000   1.27e-05   1.77e-05
job_cost_of_living_c                1.343e-05   7.06e-06      1.903      0.057  -4.05e-07   2.73e-05
job_hazard_risk_c                     -0.0280      0.021     -1.355      0.176     -0.069      0.013
job_salary_c × job_hazard_risk_c   -1.156e-06    3.6e-07     -3.216      0.001  -1.86e-06  -4.52e-07

Variance Inflation Factors

Variable                                VIF
const                                22.054
predicted_wtm                         1.014
consideration_time                    1.020
division_change                       1.062
job_crime_risk                        1.129
job_salary_c                          1.463
job_cost_of_living_c                  1.605
job_hazard_risk_c                     1.319
job_salary_c × job_hazard_risk_c      1.123

Equation 2 Predictive Performance

We evaluated the ability of the final logistic regression model to correctly identify the job ultimately selected by each participant. Three complementary metrics were used: AUC (area under the ROC curve), overall classification accuracy, and top-1 prediction accuracy.

The model achieved an AUC of 0.734, indicating reasonably good overall discriminative performance. Put differently, when one chosen job and one unchosen job are drawn at random, the model assigns the higher predicted probability to the chosen job about 73.4% of the time. The ROC curve shows consistent lift above the line of chance, supporting the model’s utility for ranking job preferences. Overall accuracy was 94.59%, but this is largely driven by the dominant negative class (unchosen jobs).
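The pairwise reading of AUC can be made explicit with a small sketch. The predicted probabilities below are made-up toy values, not model output, and the function name is hypothetical.

```python
def pairwise_auc(scores_chosen, scores_unchosen):
    """Share of (chosen, unchosen) pairs where the chosen job scores higher.

    Ties count as half a win, which matches the usual AUC definition.
    """
    wins = 0.0
    for c in scores_chosen:
        for u in scores_unchosen:
            if c > u:
                wins += 1.0
            elif c == u:
                wins += 0.5
    return wins / (len(scores_chosen) * len(scores_unchosen))


# Toy predicted probabilities, for illustration only.
chosen = [0.30, 0.08, 0.12]
unchosen = [0.02, 0.05, 0.10, 0.04]

auc = pairwise_auc(chosen, unchosen)
print(f"AUC = {auc:.3f}")
```

Because AUC depends only on the ordering of scores, it can be high even when every predicted probability falls well below 0.5.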

Top-1 Accuracy: 24.38%
AUC: 0.734
Accuracy: 94.59%

Confusion Matrix: Final Job Choice Prediction

            Predicted 0   Predicted 1
Actual 0           6394            11
Actual 1            355             6

While the model correctly excludes most unchosen jobs, it identifies only 6 of the 361 true final jobs (missing the other 355) at the default threshold (0.5), a sensitivity of about 1.7%.
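The metrics implied by the confusion matrix can be reproduced directly from its four cells. The sketch below uses the counts reported for the final model; the function name is hypothetical. It also computes the accuracy of a trivial rule that labels every job as unchosen, which is a useful reference point under heavy class imbalance.

```python
def binary_metrics(tn, fp, fn, tp):
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "sensitivity": tp / (tp + fn),   # share of chosen jobs identified
        "precision": tp / (tp + fp),     # share of predicted choices that are correct
        "all_negative_accuracy": (tn + fp) / total,  # always predict "unchosen"
    }


# Cells from the final model's confusion matrix at the 0.5 threshold.
metrics = binary_metrics(tn=6394, fp=11, fn=355, tp=6)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On these counts, labeling every job as unchosen would score about 94.66%, marginally above the model's 94.59%, which is why overall accuracy says little about how well the model identifies chosen jobs.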

For each participant, the model’s top-predicted job (i.e., the job with the highest predicted probability) was correct 24.38% of the time. This metric is more appropriate for evaluating discrete choice tasks like job selection, where the goal is to pick a single best option from multiple alternatives. A 24% top-1 accuracy is notable given the size of the choice set: the 6,766 job observations include 361 chosen jobs, or roughly 19 candidate jobs per final choice, so picking at random would be correct only about 5% of the time if options were spread evenly across participants.
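Top-1 accuracy as described above can be computed by grouping job rows by participant and checking whether the highest-probability job is the chosen one. This is a minimal sketch with made-up rows; the tuple layout (participant id, predicted probability, chosen flag) is an assumption about how the data might be organized.

```python
from collections import defaultdict


def top1_accuracy(rows):
    """rows: iterable of (participant_id, predicted_prob, chosen_flag)."""
    by_participant = defaultdict(list)
    for pid, prob, chosen in rows:
        by_participant[pid].append((prob, chosen))

    hits = 0
    for options in by_participant.values():
        # Job with the highest predicted probability for this participant.
        _, best_chosen = max(options, key=lambda option: option[0])
        hits += int(best_chosen == 1)
    return hits / len(by_participant)


# Toy rows: p1's top-ranked job is the chosen one, p2's is not.
rows = [
    ("p1", 0.12, 1), ("p1", 0.05, 0), ("p1", 0.03, 0),
    ("p2", 0.04, 1), ("p2", 0.09, 0), ("p2", 0.02, 0),
]

accuracy = top1_accuracy(rows)
print(f"top-1 accuracy = {accuracy:.2f}")
```

Note that the metric does not depend on any classification threshold: p1 counts as a hit even though its best probability is only 0.12.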

While traditional classification metrics may underestimate performance due to extreme class imbalance, the model demonstrates good ability to rank job desirability (AUC = 0.734) and makes the correct top prediction for nearly one in four participants. These findings affirm that the model captures important features of job decision-making, particularly salary, crime risk, and hazard interactions, while leaving room for incorporating additional behavioral or contextual factors.

ROC Curve: Predicting Final Job Choice

Predicted Probability Distribution by Final Job Choice

The distribution of predicted probabilities for final job choice is heavily skewed toward low values for both chosen and unchosen jobs. While jobs that were ultimately selected (gray bars) tend to have slightly higher predicted probabilities than those not chosen (blue bars), the overlap between the two groups is substantial. Most predicted probabilities for both classes fall below 0.2, and even the majority of chosen jobs have predicted probabilities under 0.1. This reflects the model’s conservatism in assigning high probabilities, likely due to the class imbalance (far fewer final job selections than non-selections) and the inherent complexity of choosing one job from many.

Only a small fraction of cases—primarily among the chosen jobs—receive predicted probabilities in the 0.3 to 0.7 range. These represent the most confidently predicted positive cases and are likely aligned with the approximately 24% Top-1 accuracy, where the top-ranked job matched the participant’s final selection.

The model performs better at ranking jobs than at producing confident classifications, as reflected by Equation 2’s AUC (0.73) and Top-1 accuracy (24.38%). Given that the distribution of predicted probabilities is heavily skewed toward low values for both chosen and unchosen jobs, we decided to explore alternative thresholds optimized for better performance.

Three thresholds are marked on the plot. The Youden’s J threshold (red, 0.057) maximizes the sum of sensitivity and specificity. Applying this threshold favors correctly identifying chosen jobs, but at the cost of more false positives. The Best F1 threshold (orange, 0.109) optimizes the balance between precision and recall, yielding better performance for identifying the selected job when both types of error matter. The standard threshold (black, 0.5) is the conventional threshold used in binary classification. Given the skewed distribution, this threshold would classify nearly all jobs as unchosen.
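
The two optimized thresholds described above can be found with a simple search over candidate cutoffs. The sketch below is one way to do it, not the authors' implementation: the helper names and the toy data are hypothetical, and the candidates are simply the distinct predicted probabilities.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN when a job is labeled chosen if prob >= threshold."""
    tp = fp = tn = fn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_j(tp, fp, tn, fn):
    """Youden's J = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity + specificity - 1.0


def f1_score(tp, fp, tn, fn):
    """F1 = harmonic mean of precision and recall (tn is unused by definition)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(y_true, y_prob, metric):
    """Return the candidate threshold that maximizes the given metric."""
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: metric(*confusion_counts(y_true, y_prob, t)))


# Hypothetical, imbalanced toy data (2 chosen jobs out of 10).
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_prob = [0.30, 0.20, 0.08, 0.07, 0.06, 0.03, 0.02, 0.02, 0.01, 0.01]

best_j = best_threshold(y_true, y_prob, youden_j)
best_f1 = best_threshold(y_true, y_prob, f1_score)
print(f"Youden's J threshold: {best_j}")
print(f"Best F1 threshold:    {best_f1}")
```

In this toy data, as in the report, the Youden's J threshold is lower than the F1 threshold: J rewards recovering every chosen job, while F1 penalizes the false positives that come with a low cutoff.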

Overall, the figure demonstrates that a lower threshold than 0.5 is necessary to meaningfully predict final job choice. The model assigns relatively low confidence even to correctly predicted final jobs, emphasizing the difficulty of the task and the importance of using rank-based or threshold-optimized approaches rather than strict binary classification.

Predicted Probability Distribution by Final Job Choice

To further evaluate model performance, we compared results using different probability thresholds for classifying a final job choice. The standard threshold of 0.5 was too conservative given the low predicted probabilities. Two optimized thresholds were examined: one maximizing Youden’s J statistic (threshold = 0.057) and one maximizing F1 score (threshold = 0.109). As shown in the following table, the Youden’s J threshold recovers 60.9% of chosen jobs but with low precision (12.2%), while the F1 threshold gives up recall (27.4%) for higher precision (21.2%) and the better overall precision-recall balance. AUC and Top-1 accuracy are rank-based, so they are unaffected by the choice of threshold, consistent with the primary goal of identifying the most preferred jobs.

Metric                                      Threshold = 0.057 (Youden's J)   Threshold = 0.109 (Best F1)
Accuracy                                    73.1%                            90.7%
Precision (Chosen = 1)                      12.2%                            21.2%
Recall (Chosen = 1)                         60.9%                            27.4%
F1 Score (Chosen = 1)                       19.5%                            23.9%
Top-1 Accuracy (Ranking, not thresholded)   24.38%                           24.38%
AUC                                         0.734                            0.734

Predicted Probabilities by Actual Final Job Choice

The boxplot illustrates that the predicted probabilities assigned to final job choices are generally higher than those assigned to unchosen jobs. The median predicted probability for chosen jobs is substantially greater, and the spread of probabilities is wider, with some jobs receiving probabilities above 0.5. In contrast, the distribution for unchosen jobs remains tightly clustered near zero. However, overlap between the two groups remains considerable, consistent with the earlier finding that most predictions are conservative. This overlap likely stems from the inherent difficulty of the task (predicting a single selected job from a large set of alternatives) as well as from the class imbalance, where non-chosen jobs vastly outnumber chosen ones. This pattern further supports the model’s relative ability to rank jobs rather than produce highly confident classifications.

Predicted Probabilities by Actual Final Job Choice

Top-Ranked Job Probabilities by Final Job Choice

The boxplot shows the distribution of the highest predicted probability assigned to a job for each participant, separated by whether that top-ranked job was ultimately chosen (“Chosen”) or not (“Not Chosen”). Top-ranked jobs that were chosen tend to have higher predicted probabilities than those that were not, as indicated by a shift in the median and upper quartile of the distribution. However, there remains considerable overlap between the distributions, with many chosen and not-chosen jobs receiving similar top-ranked predicted probabilities. The higher median for chosen jobs supports that the model can rank jobs moderately well even if absolute predicted probabilities are low.

A few chosen jobs received very high predicted probabilities (above 0.5), but the majority are below 0.4, reinforcing that the model is better suited for ranking jobs than making confident binary classifications. This pattern is consistent with the model’s moderate Top-1 accuracy (~24%) and strong AUC (~0.73).

Top-Ranked Job Probabilities by Final Choice

The partial dependence plots illustrate how changes in key predictors affect the model’s predicted probability of final job choice, while holding other variables constant. Overall, consideration time and job salary emerge as the most influential predictors, with faster decisions and higher salaries significantly boosting the model’s confidence in a job being chosen.

Shorter decision times (consideration_time) are associated with substantially higher predicted probabilities of selecting a job. As consideration time increases beyond approximately 10 job pairs, the predicted probability sharply declines toward zero, suggesting that faster decisions are more strongly linked to final job selection. Higher job salaries (job_salary_c (centered)) increase the predicted probability of job choice, with a gradual upward slope across the salary range. This indicates that salary remains a consistent and positive driver of final selection. Increases in cost of living (job_cost_of_living_c (centered)) associated with a job are slightly associated with higher predicted probabilities of job choice. However, the slight slope suggests a weaker influence compared to salary.

In contrast, the partial dependence plots for other predictors, including division change, job crime risk, job hazard risk, and predicted willingness to move (WTM), are relatively flat. These features show little marginal impact on predicted probability when other variables are held constant. While these factors may play a role in the overall job evaluation process, their independent, direct influence on the model’s prediction is minimal in isolation.
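
For readers unfamiliar with partial dependence, the calculation behind such plots can be sketched as follows. For each grid value of one predictor, that value is substituted into every observation, the model is re-scored, and the predictions are averaged. The coefficients, intercept, and rows below are hypothetical and use a simplified two-predictor logistic model without the interaction terms of the fitted model.

```python
import math


def predict_prob(row, coefs, intercept):
    """Logistic prediction for one observation (dict of feature -> value)."""
    z = intercept + sum(coefs[name] * row[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


def partial_dependence(rows, feature, grid, coefs, intercept):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        probs = [predict_prob({**row, feature: value}, coefs, intercept) for row in rows]
        curve.append(sum(probs) / len(probs))
    return curve


# Hypothetical coefficients and data, for illustration only.
coefs = {"job_salary_c": 0.8, "consideration_time": -0.3}
intercept = -2.5
rows = [
    {"job_salary_c": -1.0, "consideration_time": 2.0},
    {"job_salary_c": 0.0, "consideration_time": 5.0},
    {"job_salary_c": 1.0, "consideration_time": 9.0},
]

grid = [-1.0, 0.0, 1.0]
pd_salary = partial_dependence(rows, "job_salary_c", grid, coefs, intercept)
for value, prob in zip(grid, pd_salary):
    print(f"job_salary_c = {value:+.1f} -> mean predicted probability {prob:.3f}")
```

A flat curve, as reported for division change or hazard risk, means the averaged prediction barely moves across the grid.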

Partial Dependence Plots for Key Predictors

The contour plot illustrates that the likelihood of a job being chosen increases significantly when job cost of living is high and the participant has a low predicted willingness to move. This suggests that those more inclined to stay prefer costlier locales, potentially due to familiarity or quality-of-life considerations, while higher willingness to move does not necessarily increase preference for high cost-of-living areas.

Prediction Calibration Assessment

We evaluated the calibration of the final model by comparing predicted probabilities against actual final job choices. The calibration curve indicates that the model performs reasonably well, particularly in the lower to mid-range of predicted probabilities. For bins with predicted probabilities under approximately 0.45, the model’s predictions closely follow the ideal diagonal line representing perfect calibration. This suggests that when the model assigns a lower probability to a job being selected, the observed frequency of selection aligns well with that estimate.

In the mid to upper probability range (0.45 to 0.75), the model begins to show modest underconfidence: predicted probabilities tend to underestimate the observed fraction of chosen jobs. For example, in the bin centered around 0.7, the observed selection rate exceeds what the model predicted, indicating that when the model is most confident, it could afford to be even more so. This behavior is often favorable in practice, as underconfidence can be easier to adjust than overconfidence in probabilistic systems.
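
A calibration curve of this kind is built by binning predictions and comparing each bin's mean predicted probability with the observed share of chosen jobs. The sketch below assumes ten equal-width bins. The binning scheme actually used for the figure is not stated, and the data are hypothetical.

```python
def calibration_bins(y_true, y_prob, n_bins=10):
    """Return (mean predicted, observed fraction, count) per non-empty equal-width bin."""
    sums = [[0.0, 0, 0] for _ in range(n_bins)]  # [sum of probs, positives, count]
    for actual, prob in zip(y_true, y_prob):
        index = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 falls in the last bin
        sums[index][0] += prob
        sums[index][1] += actual
        sums[index][2] += 1
    return [(total / n, positives / n, n) for total, positives, n in sums if n > 0]


# Hypothetical data: a low-probability bin and a bin centered near 0.75.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.02, 0.04, 0.06, 0.08, 0.72, 0.74, 0.76, 0.78]

bins = calibration_bins(y_true, y_prob)
for mean_pred, observed, count in bins:
    gap = observed - mean_pred  # positive gap = underconfident in that bin
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}  gap {gap:+.2f}")
```

A positive gap in a bin corresponds to the underconfidence described above: the observed selection rate exceeds the mean predicted probability.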

Despite these small deviations, the model’s overall calibration is strong, particularly considering the challenge posed by the dataset’s class imbalance where selected jobs are vastly outnumbered by unchosen ones. The Brier score of 0.048 further supports the model’s reliability, reflecting high probabilistic accuracy (with lower scores indicating better calibration).
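
Because the Brier score depends heavily on the base rate, it helps to compare it with a reference forecast. The sketch below uses the class counts reported in this section (361 chosen, 6,405 unchosen) and computes the score of a constant forecast equal to the base rate. The skill-score comparison is our addition, not part of the original analysis.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((prob - actual) ** 2 for actual, prob in zip(y_true, y_prob)) / len(y_true)


# Class counts as reported in this section.
n_chosen, n_unchosen = 361, 6405
base_rate = n_chosen / (n_chosen + n_unchosen)

# Reference forecast: predict the base rate for every job.
y_true = [1] * n_chosen + [0] * n_unchosen
reference = brier_score(y_true, [base_rate] * len(y_true))

reported = 0.047936  # Brier score reported for the model
skill = 1.0 - reported / reference  # Brier skill score relative to the base rate

print(f"base rate        = {base_rate:.4f}")
print(f"reference Brier  = {reference:.4f}")
print(f"Brier skill      = {skill:.3f}")
```

The reference score is roughly 0.0505, so the reported 0.048 is an improvement of about 5% over always predicting the base rate. With such a rare positive class, a low absolute Brier score should be read alongside this baseline.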

Analysis of the mean predicted probabilities further supports these findings: among jobs not chosen, the mean predicted probability was 0.05, while among jobs selected as the final choice, the mean probability was 0.11. Although most probabilities are low (due to class imbalance and the difficulty of predicting a single choice among many), the separation between groups suggests meaningful predictive power.

Overall, while the model is conservative in its probability estimates, it captures real differences between chosen and unchosen jobs and demonstrates good calibration for practical use in ranking or prioritizing job selections.

Metric                      Value
Accuracy                    0.907035
Brier Score (Calibration)   0.047936

Final Job Chosen   Mean Predicted Probability   Number of Cases
0                  0.050330                     6405
1                  0.107032                     361

Calibration Curve for Final Job Choice

The feature importance plot for the final job choice model shows that the most influential predictors with statistically significant effects are job_crime_risk, consideration_time, and job_salary_c. Specifically, higher job crime risk significantly reduces the likelihood of a job being chosen, as indicated by a negative coefficient and a confidence interval that does not cross zero. Similarly, longer consideration time is associated with a decreased likelihood of selection, reinforcing earlier findings that faster decisions are more likely to result in a final job choice. In contrast, higher job salary (centered) has a positive and significant effect, increasing the odds of a job being selected.

The interaction term between job salary and job hazard risk (job_salary_c × job_hazard_risk_c) also exhibits a statistically significant negative effect. This suggests that the positive influence of salary is diminished in jobs with higher perceived hazard risk, highlighting a tradeoff participants may make between compensation and hazard risk.
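
The tradeoff implied by a negative interaction can be made concrete. In a logistic model with a salary-by-hazard interaction, the effect of salary on the log-odds is the salary coefficient plus the interaction coefficient times the (centered) hazard risk. The coefficient values below are hypothetical, chosen only to illustrate the sign pattern described above. They are not the fitted estimates.

```python
import math

# Hypothetical log-odds coefficients (NOT the fitted values from the model).
b_salary = 0.60        # effect of job_salary_c when hazard risk is at its mean
b_interaction = -0.20  # job_salary_c x job_hazard_risk_c


def salary_slope(hazard_c):
    """Change in log-odds per unit of centered salary at a given centered hazard risk."""
    return b_salary + b_interaction * hazard_c


def salary_odds_ratio(hazard_c):
    """Odds ratio for a one-unit salary increase at a given centered hazard risk."""
    return math.exp(salary_slope(hazard_c))


slopes = {hazard: salary_slope(hazard) for hazard in (-1.0, 0.0, 1.0)}
for hazard, slope in slopes.items():
    print(f"hazard_c = {hazard:+.1f}: slope {slope:.2f}, odds ratio {salary_odds_ratio(hazard):.2f}")
```

With a negative interaction coefficient, the salary slope shrinks as hazard risk rises above its mean, which is the pattern the paragraph describes.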

Other variables such as division_change, job_cost_of_living_c, job_hazard_risk_c, and predicted_wtm show smaller and statistically uncertain effects, as their confidence intervals cross the zero line. While these predictors may contribute to model performance in combination with other features or interactions, their independent influence appears limited.

Overall, the plot underscores that job safety (crime risk), decision speed, and salary level are the most consistent and important drivers of final job selection. These findings align with prior model outputs.

Feature Importance: Final Job Choice Model

To further assess model performance in ranking job options, we calculated Top-1, Top-3, and Top-5 accuracy. For Top-1 accuracy, the model correctly ranked the final job choice as the highest-probability option 24.38% of the time. For Top-3 accuracy, the final job choice appeared among the top three ranked options 46.5% of the time. For Top-5 accuracy, the final job choice appeared among the top five ranked options 58.7% of the time.

These results indicate that while the model’s absolute probability estimates are conservative, it performs well at ranking job options in a way that captures final choice behavior. Performance improves substantially when considering the top few ranked jobs, highlighting that final job selection was often among the top predictions even if not always the top one.
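
Top-k accuracy is computed per participant: rank that participant's jobs by predicted probability and check whether the job they finally chose falls in the top k. A minimal sketch, with hypothetical records and a hypothetical `top_k_accuracy` helper:

```python
from collections import defaultdict


def top_k_accuracy(records, k):
    """records: iterable of (participant_id, predicted_prob, chosen_flag)."""
    by_participant = defaultdict(list)
    for participant_id, prob, chosen in records:
        by_participant[participant_id].append((prob, chosen))

    hits = 0
    for jobs in by_participant.values():
        # Rank this participant's jobs from highest to lowest predicted probability.
        ranked = sorted(jobs, key=lambda job: job[0], reverse=True)
        if any(chosen == 1 for _, chosen in ranked[:k]):
            hits += 1
    return hits / len(by_participant)


# Hypothetical predictions for three participants with three jobs each.
records = [
    ("p1", 0.30, 1), ("p1", 0.10, 0), ("p1", 0.05, 0),
    ("p2", 0.20, 0), ("p2", 0.15, 1), ("p2", 0.02, 0),
    ("p3", 0.12, 0), ("p3", 0.08, 0), ("p3", 0.04, 1),
]

top1 = top_k_accuracy(records, 1)
top2 = top_k_accuracy(records, 2)
top3 = top_k_accuracy(records, 3)
print(f"Top-1 {top1:.3f}  Top-2 {top2:.3f}  Top-3 {top3:.3f}")
```

Because only the within-participant ordering matters, this metric is unaffected by how low the absolute probabilities are.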

The distribution of residuals (predicted probability minus actual outcome) is strongly centered around zero, indicating that the model is generally well-calibrated and does not exhibit systematic over- or under-prediction across most cases. The peak near 0 reflects that, for most job options, the predicted probability closely matches the actual choice outcome (especially for non-chosen jobs, where the true value is 0). There is a small tail to the right (positive residuals), where the model predicted a higher probability than warranted (overprediction), and a small spike near -1 (negative residuals), corresponding to true choices (1) that were heavily under-predicted. The spike near -1 occurs because many truly chosen jobs had low predicted probabilities, consistent with the overall low probability range seen earlier. Overall, the pattern confirms that while the model captures overall ranking well, it tends to be cautious (low predictions) and struggles most with the minority of final job choices that were selected despite low predicted probabilities.

Distribution of Residuals (Predicted - Actual)

Subgroup Analysis

We evaluated model performance across key subgroups by splitting continuous variables into terciles (low, medium, high) and evaluating discrete categories of division_change and predicted_wtm. Performance was assessed using Top-1 Accuracy and AUC. High salary, high cost of living, low crime risk, and medium consideration time segments performed best.

  • Job Salary: Model performance was much better for high-salary jobs (Top-1 = 24%, AUC = 0.61) compared to low- and medium-salary jobs (Top-1 ≤ 6%).
  • Hazard Risk: Job cumulative hazard risk levels showed fairly consistent Top-1 accuracy (~16%) for high and medium hazard risk and slightly less for low hazard risk (11%), but moderate AUC values (~0.63–0.69).
  • Crime Risk: Accuracy was highest for jobs in low crime areas (Top-1 = 23%), but lower at medium and high crime levels.
  • Cost of Living: Final choice prediction was stronger for high cost-of-living jobs (Top-1 = 20%) compared to low cost-of-living jobs (Top-1 = 6%).
  • Consideration Time: Participants with medium consideration times had the highest Top-1 accuracy (30%), suggesting moderate deliberation was associated with clearer model prediction.
  • Division Change: AUC remained fairly stable (~0.64–0.75) across different division change levels, though sample sizes were smaller at higher division change values.
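  • Willingness to Move: Model performance varied by predicted willingness to move. Discrimination was lowest for individuals predicted to not move (AUC = 0.38, N = 17) and improved for those open to relocating (AUC = 0.61), who also had the highest Top-1 accuracy (25%). The few individuals predicted to definitely move (N = 9) had the highest AUC (1.00) but the lowest Top-1 accuracy (11%). Estimates for both small groups should be interpreted with caution.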

Subgroup                            Top-1 Accuracy   AUC     N
Job Salary C - Low                  0.033            0.609   361
Job Salary C - Medium               0.056            0.527   360
Job Salary C - High                 0.242            0.613   360
Job Hazard Risk C - Low             0.111            0.694   361
Job Hazard Risk C - Medium          0.155            0.652   361
Job Hazard Risk C - High            0.158            0.633   360
Job Crime Risk - Low                0.227            0.626   361
Job Crime Risk - Medium             0.074            0.740   350
Job Crime Risk - High               0.036            0.563   361
Job Cost Of Living C - Low          0.059            0.668   353
Job Cost Of Living C - Medium       0.133            0.638   360
Job Cost Of Living C - High         0.204            0.656   353
Consideration Time - Low            0.204            0.542   142
Consideration Time - Medium         0.295            0.638   193
Consideration Time - High           0.077            0.500   26
Division Change = 0                 0.170            0.725   300
Division Change = 1                 0.115            0.639   253
Division Change = 2                 0.149            0.653   174
Division Change = 3                 0.176            0.652   85
Division Change = 4                 0.077            0.697   91
Division Change = 5                 0.189            0.748   37
Predicted Willingness to Move = 0   0.176            0.381   17
Predicted Willingness to Move = 1   0.251            0.611   335
Predicted Willingness to Move = 2   0.111            1.000   9
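
The tercile split used for the continuous variables can be sketched as follows. This is an assumption-laden illustration: it assigns Low/Medium/High by rank so the groups are as equal in size as possible, which may treat tied values differently from the quantile routine used in the original analysis. The salaries and hit flags are hypothetical.

```python
from collections import defaultdict


def tercile_labels(values):
    """Label each value Low/Medium/High by rank, giving near-equal group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [None] * len(values)
    for rank, index in enumerate(order):
        labels[index] = ("Low", "Medium", "High")[rank * 3 // len(values)]
    return labels


def subgroup_rate(values, hits):
    """Return {tercile: (hit rate, group size)} for a continuous variable."""
    totals = defaultdict(lambda: [0, 0])  # [hits, count]
    for label, hit in zip(tercile_labels(values), hits):
        totals[label][0] += hit
        totals[label][1] += 1
    return {label: (h / n, n) for label, (h, n) in totals.items()}


# Hypothetical salaries (thousands of USD) and whether the top-ranked job was correct.
salaries = [40, 55, 60, 75, 90, 120]
hits = [0, 0, 0, 1, 1, 1]

result = subgroup_rate(salaries, hits)
for label in ("Low", "Medium", "High"):
    rate, n = result[label]
    print(f"{label:<6} hit rate {rate:.2f}  N={n}")
```

Reporting the group size next to each rate, as the table above does, makes it easier to spot subgroups whose estimates rest on very few cases.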

The simulated effects plot below shows how changes in key variables affect the predicted probability of a job being chosen, while holding all other predictors constant at their mean values. The results highlight several important patterns:

  • Job Salary has a strong positive association with final job choice. As salary increases, the probability of selecting a job rises steeply, especially between $50,000 and $200,000, and then levels off slightly at the highest salary levels.

  • Job Cumulative Hazard Risk shows a modest negative effect: as hazard risk increases, the probability of choosing a job declines slightly.

  • Job Cost of Living shows a slight positive trend, suggesting that higher living costs are weakly associated with increased job selection probability. While this may seem counterintuitive, it could reflect that higher cost areas also offer more attractive amenities or opportunities.

  • Consideration Time has a clear negative effect: longer times taken to consider options are associated with lower probabilities of ultimately selecting a job, consistent with hesitation or uncertainty.

  • Division Change shows a very slight positive trend, suggesting a minimal increase in predicted probability with larger geographic moves. This trend is relatively weak and may reflect that people open to relocation are choosing from more varied options.

  • Predicted Willingness to Move exhibits no clear trend. The relationship is essentially flat, with broad confidence intervals that suggest high variability and limited marginal influence when isolated from other variables.

  • Job Crime Risk is negatively associated with job choice: jobs located in areas with higher crime risk have lower probabilities of being selected.

Overall, the simulation confirms that job salary and consideration time are the most influential individual predictors of job selection. Crime risk and, to a lesser extent, cost of living also affect decisions, while hazard risk, geographic mobility, and willingness to move have weaker or more uncertain standalone effects when other factors are held constant.

Simulated Effects of Key Variables on Final Job Choice
