Statistical testing on importances involves exporting the importance scores from the output, reading them into another package (Excel, SPSS, SAS, R, or your favorite cross-tab package that supports statistical testing), and conducting a t-test or F-test.
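As a minimal sketch of that workflow, here is an independent-samples t-test comparing mean "brand" importance between two respondent segments, using only the Python standard library. The scores below are hypothetical placeholders for values exported from an HB run; a stats package (R, SPSS, or scipy.stats.ttest_ind in Python) would also give you the p-value.

```python
import math
import statistics

# Hypothetical importance scores for "brand" (0-100 scale),
# one value per respondent, for two segments of the sample.
group_a = [28.4, 31.0, 25.7, 30.2, 27.9, 33.1]
group_b = [21.5, 19.8, 24.0, 22.3, 20.7, 18.9]

def pooled_t(x, y):
    """Independent-samples t statistic with a pooled variance estimate."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se

print(f"t = {pooled_t(group_a, group_b):.2f} "
      f"on {len(group_a) + len(group_b) - 2} df")
```

Compare the resulting t statistic to the t distribution with n1 + n2 - 2 degrees of freedom (or let your stats package do it) to judge whether the segments differ in how much weight they place on brand.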
Tests based on raw counts of choices (chi-square tests) are not as appropriate or useful as tests on individual-level importance scores, such as those computed via HB analysis. Aggregate counting tests ignore the heterogeneity in the data (differences in preferences across people), which can lead to a biased and incorrect view of how important the people in a sample felt unordered attributes like brand and color were (more on that below).
Importance scores can be a bit confusing at first, since importances account for the full range of utilities within an attribute, and for unordered attributes like brand or color, respondents don't necessarily agree on which level is best. With HB analysis, we calculate the importance scores (summing to 100, one score per attribute) from each respondent's individual-level utilities. Then we average those importance scores across people to summarize them for the group or the sample. This is the proper way to do things when we have individual-level data: we don't want differences of opinion about unordered attributes like brand and color to make it look like a group of people who disagreed on which color was best didn't think color was very important. For each individual, color could have been a very important matter. But in the aggregate (if you were to incorrectly estimate importance scores from aggregate preference counts or averaged utility scores), the preferences tend to cancel each other out, making it look as if the sample were indifferent to, say, brand or color.
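A minimal sketch of that cancellation effect, with made-up part-worth utilities: two respondents agree on price but disagree on which of two colors is best, so averaging utilities before computing importances makes color look worthless even though both respondents cared about it a lot.

```python
# Hypothetical part-worths for two attributes with two levels each.
# resp1 strongly prefers color level 1; resp2 strongly prefers level 2.
utilities = {
    "resp1": {"price": [1.0, -1.0], "color": [2.0, -2.0]},
    "resp2": {"price": [1.0, -1.0], "color": [-2.0, 2.0]},
}

def importances(partworths):
    """Importance per attribute: utility range, normalized to sum to 100."""
    ranges = {a: max(u) - min(u) for a, u in partworths.items()}
    total = sum(ranges.values())
    return {a: 100 * r / total for a, r in ranges.items()}

# Proper: compute importances per respondent, then average across people.
per_person = [importances(p) for p in utilities.values()]
avg = {a: sum(p[a] for p in per_person) / len(per_person) for a in per_person[0]}
print(avg)  # color ~66.7: both respondents found color very important

# Improper: average the utilities first, then compute importances.
n = len(utilities)
avg_utils = {
    a: [sum(p[a][i] for p in utilities.values()) / n for i in range(2)]
    for a in ("price", "color")
}
print(importances(avg_utils))  # color 0.0: the preferences canceled out
```

The exact numbers are illustrative, but the pattern is general: the more respondents disagree on the best level of an unordered attribute, the more an aggregate-first calculation understates that attribute's importance.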
Importance scores are ratio-scaled data, with an absolute zero point. Hope these suggestions help.