We study an ensemble of six multi-year global Bayesian carbon dioxide (CO2) atmospheric inversions that differ in their assimilated observations (either column retrievals from one of two satellites or surface air sample measurements) and in their transport model. The time series of inferred annual fluxes are first compared with each other at various spatial scales. We then objectively evaluate this small inversion ensemble against a large dataset of accurate aircraft measurements in the free troposphere over the globe, which are independent of all assimilated data. The measured variables are connected with the inferred fluxes through mass-conserving transport in the global atmosphere and are part of the inversion results. Large-scale annual fluxes estimated from the bias-corrected land retrievals of the second Orbiting Carbon Observatory (OCO-2) differ greatly from the prior fluxes, but are similar to the fluxes estimated from the surface network within the uncertainty of these surface-based estimates. The OCO-2-based and surface-based inversions perform similarly when projected in the space of the aircraft data, but the relative strengths and weaknesses of the two flux estimates vary within the northern and tropical parts of the continents. The verification data also suggest that the more complex and more recent transport model does not improve the inversion skill. In contrast, the inversion using bias-corrected retrievals from the Greenhouse Gases Observing Satellite (GOSAT) and, to a larger extent, a non-Bayesian inversion that simply adjusts a recent bottom-up flux estimate with the annual growth rate diagnosed from marine surface measurements both estimate markedly different fluxes and fit the aircraft data less well. Our study highlights a way to rate global atmospheric inversions. Without any general claim regarding the usefulness of all OCO-2 retrieval datasets vs. 
all GOSAT retrieval datasets, it still suggests that some satellite retrievals can now provide inversion results that are, despite their uncertainty, comparable in credibility to traditional inversions using the accurate but sparse surface network, and that are therefore complementary for studies of the global carbon budget.