The ability of 11 models to simulate the aerosol vertical distribution from regional to global scales, as part of the second phase of the AeroCom model intercomparison initiative (AeroCom II), is assessed and compared to results of the first phase. The evaluation is performed using a global monthly gridded data set of aerosol extinction profiles built for this purpose from the CALIOP (Cloud-Aerosol Lidar with Orthogonal Polarization) Layer Product 3.01. Results over 12 subcontinental regions show that five models improved, whereas three degraded, in reproducing the interregional variability in Zα0–6 km, the mean extinction height diagnostic, as computed from the CALIOP aerosol profiles over the 0–6 km altitude range for each studied region and season. While the models’ performance remains highly variable, the simulation of the timing of the Zα0–6 km peak season has also improved for all but two models from AeroCom Phase I to Phase II. The biases in Zα0–6 km are smaller than in Phase I in all regions except the Central Atlantic, East Asia, and North and South Africa. Most of the models now underestimate Zα0–6 km over land, notably in the dust and biomass burning regions in Asia and Africa. At the global scale, the AeroCom II models reproduce the Zα0–6 km latitudinal variability better over ocean than over land. Hypotheses for the performance and evolution of the individual models and for the intermodel diversity are discussed. We also provide an analysis of the CALIOP limitations and uncertainties contributing to the differences between the simulations and observations.
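
As an illustration of the Zα0–6 km diagnostic, the following is a minimal sketch that assumes the mean extinction height is the extinction-weighted mean altitude of the profile levels lying between 0 and 6 km. The function name, argument names, and the handling of empty profiles are hypothetical choices made for this example and are not taken from the study.

```python
from typing import Optional, Sequence


def mean_extinction_height(
    altitudes_km: Sequence[float],
    extinction: Sequence[float],
    z_min: float = 0.0,
    z_max: float = 6.0,
) -> Optional[float]:
    """Extinction-weighted mean altitude (km) over [z_min, z_max].

    Assumed form: sum(b_i * z_i) / sum(b_i), where b_i is the aerosol
    extinction coefficient at level i and z_i is the level altitude.
    """
    if len(altitudes_km) != len(extinction):
        raise ValueError("altitudes_km and extinction must have equal length")

    weighted_sum = 0.0
    total_extinction = 0.0
    for z, b in zip(altitudes_km, extinction):
        # Keep only the levels inside the requested altitude range.
        if z_min <= z <= z_max:
            weighted_sum += b * z
            total_extinction += b

    # Hypothetical convention: no aerosol signal in range gives no height.
    if total_extinction == 0.0:
        return None
    return weighted_sum / total_extinction
```

Under this assumed definition, a profile with most of its extinction near the surface yields a low value, while elevated dust or smoke layers raise it. Levels above 6 km are ignored entirely.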