McNemar's test is specific to paired 2x2 tables, so if you have 3 profiles per set it's going to be inappropriate.
What you can do is build a simulator with the ACBC part-worth utilities. Then for each holdout question, your first-choice prediction will either be right or wrong for each respondent (you'll have the actual and the predicted choice for each respondent, and they'll either match or they won't). So if you have 250 respondents and 4 holdout choice sets, you have 1,000 observations. With 3 profiles per set, 33.33% of them should be correct by chance alone, and from your simulation you'll have some percentage of correct predictions that you can compare to 33.33% using a one-sample Z-test for proportions. (Note that this treats the 1,000 observations as independent, even though each respondent contributes 4 of them.)
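Here's a minimal sketch of that Z-test in Python, using only the standard library. The function name `z_test_hit_rate` and the count of 520 hits are made up for illustration; plug in your own number of correct predictions. It uses a one-sided test (is the hit rate better than chance?) and the chance proportion in the standard error.

```python
import math

def z_test_hit_rate(hits, n_obs, chance=1/3):
    """One-sample z-test of an observed hit rate against a chance rate."""
    p_hat = hits / n_obs                            # observed hit rate
    se = math.sqrt(chance * (1 - chance) / n_obs)   # standard error under the null
    z = (p_hat - chance) / se
    # One-sided p-value: P(Z >= z) under the standard normal
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_hat, z, p_value

# Hypothetical counts: 520 correct predictions out of 250 x 4 = 1,000
p_hat, z, p_value = z_test_hit_rate(hits=520, n_obs=1000)
print(f"hit rate = {p_hat:.3f}, z = {z:.2f}, one-sided p = {p_value:.3g}")
```

If your holdout sets had a different number of profiles, you'd change `chance` accordingly (for example 1/4 for 4 profiles).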
Another way to do this would be to count the number of hits per respondent, which would range from 0 to 4. Then you take the mean of that number of hits across respondents and compare it to 1.333 (4 holdout sets x 1/3 = an average of 4/3 correct predictions by chance alone) using a one-sample t-test for means with a sample size of 250.
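The per-respondent version could be sketched like this, again with just the standard library. The function name `one_sample_t` and the short list of hit counts are hypothetical; in practice the list would have one entry (0 to 4) for each of your 250 respondents.

```python
import math
import statistics

def one_sample_t(hits_per_respondent, chance_mean):
    """One-sample t statistic for mean hits per respondent vs. a chance mean."""
    n = len(hits_per_respondent)
    mean_hits = statistics.mean(hits_per_respondent)
    sd = statistics.stdev(hits_per_respondent)   # sample SD (n - 1 denominator)
    se = sd / math.sqrt(n)                       # standard error of the mean
    t_stat = (mean_hits - chance_mean) / se
    return mean_hits, t_stat, n - 1              # mean, t, degrees of freedom

# Hypothetical data: hits out of 4 holdout sets for a handful of respondents
hits = [2, 3, 1, 4, 2, 3, 0, 2, 3, 4]
mean_hits, t_stat, df = one_sample_t(hits, chance_mean=4 / 3)
print(f"mean hits = {mean_hits:.3f}, t = {t_stat:.2f}, df = {df}")
```

With 250 respondents you'd have 249 degrees of freedom, where the one-sided 5% critical value is about 1.65. If you have SciPy installed, `scipy.stats.ttest_1samp` will give you the t statistic and p-value directly.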
So you have a couple of options for comparing your hit rate to chance.