
Methodology

A full and detailed methodology can be found in our report ASPI’s Critical Technology Tracker: the global race for future power.

Read through our high-level summary, which explains why we focused on quality of research over quantity of research to rank countries and organisations (largely universities, businesses and national labs). Beneath it is a summary of our methodology.

Note for technologies added after the September 2023 update: some figures presented in the insight section will differ slightly from the same figures shown in the visual presentations on techtracker.aspi.org.au. This is due to rounding errors on the web interface; the figures presented in these insights are calculated directly from the source data and are more accurate.

 

What do we mean by ‘quality’ metrics?

Distinguishing innovative and high-impact research papers from low-quality papers is critical when estimating the current and future technical capability of nations. Not all the millions of research papers published each year are high quality.

What’s a citation?

When a scientific paper references another paper, that’s known as a citation. The number of times a paper is cited reflects the impact of the paper. As time goes by, there are more opportunities for a paper to be cited, so only papers of a similar age should be compared using citation counts (as was done in this report).

Country-level quality metrics

Throughout this research project, we present three country-level quality metrics:

1) the proportion of papers in the top 10% of most highly cited research papers

2) the H-index

3) the number of research institutions a country has among the world’s top 10–20 highest-performing institutions.

The top 10% of the most highly cited papers were analysed to generate insights into which countries are publishing the greatest share of high-quality, innovative and high-impact research. Credit for each publication was divided among the authors and their affiliations rather than assigned only to the first author (for example, if there were two authors, each would be assigned half the allocation). Fractional allocation of credit is a better predictor of which individuals go on to win Nobel Prizes or fellowships of prestigious societies. Fractional allocation of credit was used for all metrics.
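As a minimal sketch of fractional credit allocation (using invented papers and author countries, not figures from the Tracker), each paper in the top 10% carries one unit of credit that is split evenly among its authors, and a country’s score is the sum of the fractions held by its authors:

    from collections import defaultdict

    # Hypothetical papers from the top 10% most highly cited set; each lists the
    # country of every author. A two-author paper gives each author 0.5 credit.
    top_papers = [
        {"id": "p1", "author_countries": ["China", "China"]},
        {"id": "p2", "author_countries": ["USA", "China", "Germany"]},
    ]

    country_credit = defaultdict(float)
    for paper in top_papers:
        share = 1.0 / len(paper["author_countries"])
        for country in paper["author_countries"]:
            country_credit[country] += share

    total = sum(country_credit.values())
    for country, credit in sorted(country_credit.items(), key=lambda kv: -kv[1]):
        print(f"{country}: {credit:.2f} papers ({100 * credit / total:.1f}% share)")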

The H-index (Hirsch index) is an established performance metric used for analysing the impact of scholarly output and is calculated from the citation counts of a set of publications. It’s a combined measure of quantity and impact and performs better than other single-number summaries of research quality. Calculating the H-index with five years’ worth of data (as we do in this research) eliminates a key criticism: that highly cited papers from decades ago boost the H-index but don’t reflect current research excellence. Another criticism of the H-index is that publication volumes vary by field of research, which can unfairly advantage those in fields with high publication rates. The H-index quality metric used here compares countries within the same technology area, which reduces but doesn’t eliminate that problem. Neither individual papers with extreme citation numbers nor a large number of papers with low citation counts inflate the H-index used here. We calculate the five-year H-index at both the institution and country levels.
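For readers unfamiliar with the calculation, the sketch below (using made-up citation counts) computes an H-index in the standard way: the largest h such that at least h publications have at least h citations each. The five-year variant simply restricts the input to papers published within the five-year window.

    def h_index(citation_counts):
        """Return the largest h such that at least h papers have >= h citations."""
        counts = sorted(citation_counts, reverse=True)
        h = 0
        for rank, citations in enumerate(counts, start=1):
            if citations >= rank:
                h = rank
            else:
                break
        return h

    # Hypothetical citation counts for one country's papers in one technology area.
    print(h_index([25, 8, 5, 3, 3, 1, 0]))  # -> 3 (three papers with at least 3 citations)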

We include both the top 10% metric and the H-index because neither is perfect and each adds a unique insight. In technologies in which 1st and 2nd place flip depending on which quality metric is used, the race really is too close to call. More often, however, the lead is large and unambiguous, and both metrics are consistent about who is leading.

The number of institutions a country has among the world’s top 10 is used to illustrate research concentration and dominance. This list is based on the number of papers each institution has in the top 10% of highly cited papers.

 

Methodology summary

To build ASPI’s new Critical Technology Tracker website, we collected and analysed research papers published between 2018 and 2022 in 44 technology areas. The technologies selected were informed by our own internal discussions and by discussions with government officials and other stakeholders who highlighted areas of particular interest. Where possible, we covered all technologies within a category (e.g. energy and environment), and we aim to provide analysis of additional technologies later this year. For each technology, a custom search query was developed for the Web of Science database. This identified 2.2 million research papers that we subsequently used for analysis (see the report Appendix). Web of Science (Core Collection) is heavily used by researchers who study scientific trends, and it has well-understood performance characteristics.

Bespoke search queries were developed for each technology area (see the report Appendix). We took particular care to achieve the right balance between sensitivity and specificity and to ensure correct grouping of Boolean operators. Each query was carefully designed to capture the bulk of relevant papers while excluding irrelevant ones. Each individual search generated a dataset of a different size (ranging from 871 to 526,738 papers; see Appendix 3 for exact numbers). The size differences reflect global publishing activity in each technology and the balance struck between sensitivity and specificity. The bibliographic records used were restricted to journal articles, proceedings and data papers, which excluded records that don’t reflect recent research advances, such as book reviews, retracted publications and letters submitted to academic journals.
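Purely for illustration (the terms below are invented and are not drawn from the project’s actual search strategies, which are not released), a topic query of this general shape shows the kind of Boolean grouping involved: broad core terms raise sensitivity, while the AND and NOT clauses, with explicit parentheses, raise specificity.

    # Invented, illustrative query only -- not one of the project's search strings.
    # Assumes the Web of Science 'TS=' topic-search syntax.
    ILLUSTRATIVE_QUERY = (
        'TS=(("autonomous underwater vehicle*" OR "unmanned underwater vehicle*") '
        'AND (navigation OR "path planning" OR "obstacle avoidance") '
        'NOT "remotely operated vehicle*")'
    )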

We have decided not to release the search terms – of which there are hundreds of carefully crafted terms and search strings – so that countries, organisations and individuals are not able to manipulate future iterations of this project. We thank Australia’s Defence Science and Technology Group for sharing material that helped inform the development of our own database search strategies, into which we put many months of effort.

Results weren’t filtered by language, but the overwhelming majority of papers (98.7%) were written in English. This means that research papers published in domestic journals in, say, Japan, China, South Korea, France or Indonesia, outside the world’s major journals, aren’t captured in this data collection, which is of course a limitation. However, incentivised by performance reviews and by ambitions to deliver impact, be promoted and receive grant income, researchers and scientists prioritise their most important research for high-profile journals. In fact, Chinese researchers are paid large personal bonuses for publishing in top-tier journals. Databases such as the Web of Science aim to index those high-profile journals.

We chose the Web of Science database as it provides the necessary fields for our analysis, including the affiliation addresses of authors (to determine country and institution), authors’ ORCID iD numbers (to determine career histories) and citation counts (to identify high-quality publications). The ability to download data for offline analysis was also a determining factor.
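A minimal sketch of how those fields might be read from a tab-delimited Web of Science export; the field tags used here (C1 for affiliation addresses, OI for ORCID iDs, TC for times cited) are assumptions about the export format rather than a documented specification.

    import csv

    def load_records(path):
        # Assumed tab-delimited Web of Science export: C1 = affiliation addresses,
        # OI = ORCID iDs, TC = times cited. Field tags are assumptions.
        with open(path, newline="", encoding="utf-8-sig") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                yield {
                    "affiliations": row.get("C1", ""),
                    "orcids": row.get("OI", ""),
                    "citations": int(row.get("TC") or 0),
                }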

We focused on the top 10% of most highly cited papers as our first quality measure for countries and institutions (universities, labs and companies). The number of papers is sometimes used as a measure of research impact, but our focus was on comparing, and differentiating between, quantity and quality metrics in our datasets, organised by categories (and subcategories) of different fields of study. It’s critically important to distinguish between quality and quantity. Other studies have also focused on assessing quality, ‘high-impact’ or ‘top-tier’ research as a measure to compare different countries’ performance. A 2020 MacroPolo study, for example, used papers submitted to a 2019 AI conference on deep learning to create a dataset of researchers and to track their country affiliations, institutions and career paths. The top 1% of most highly cited papers has also been used in some studies as a quality metric for countries, but our dataset (2018–2022) was sometimes too small in individual technology areas to limit the study to the top 1%. For the talent tracker in particular, a smaller dataset carries a risk of over-reading the data by following the talent flow of only a handful of researchers. Similarly, care should be exercised when interpreting the talent tracker for the top 10% of researchers in technologies with limited researcher numbers (notably sonar and acoustic sensors, autonomous underwater vehicles, electronic warfare, and space launch systems).

As an alternative and second quality metric, the H-index was also calculated for countries and institutions. Self-citations, in which an individual cites their own work, are a known limitation of citation analysis, including the H-index, although it should be acknowledged that self-citations can be legitimate. Parochial citation practices, in which researchers are more likely to cite papers from their own country, have also been documented in the literature; this practice boosts citation rates for countries that publish a large volume of papers.

We wanted to place these quality metrics within a geographical context, so we summarised, using large-scale data analytics, the affiliation address of each author of each paper. During the publishing process, authors are required to provide the name and address of their research institutions. When a researcher changes jobs, their affiliation address changes; this is reflected in papers published after they move but isn’t retrospectively applied to earlier papers. For researchers affiliated with more than one institution, we divided their per-author allocation of credit further among those institutions.

For example, an author on a five-author paper who has two affiliations splits their 20% weighting equally, contributing 10% to each institution.
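Continuing that worked example with invented names, the sketch below splits each author’s equal share of a paper further across their listed institutions:

    from collections import defaultdict

    # Hypothetical five-author paper; author A1 lists two affiliations, so their
    # 20% share is split into 10% per institution.
    paper_authors = {
        "A1": ["Univ X", "Lab Y"],
        "A2": ["Univ X"],
        "A3": ["Univ Z"],
        "A4": ["Univ Z"],
        "A5": ["Univ W"],
    }

    institution_credit = defaultdict(float)
    author_share = 1.0 / len(paper_authors)          # 0.20 each for five authors
    for affiliations in paper_authors.values():
        for institution in affiliations:
            institution_credit[institution] += author_share / len(affiliations)

    print({inst: round(credit, 2) for inst, credit in institution_credit.items()})
    # -> {'Univ X': 0.3, 'Lab Y': 0.1, 'Univ Z': 0.4, 'Univ W': 0.2}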

We also used big-data analytics to count how many of the world’s leading research institutions (ranked by the number of papers in the top 10% of highly cited papers) are based in the leading country, and to measure how far ahead the 1st-ranked country is relative to the 2nd.

 

‘Technology monopoly risk’ metric: highlighting concentrations of technological expertise

The technology monopoly risk traffic light seeks to highlight concentrations of technological expertise in a single country. It incorporates two factors: how far ahead the leading country is relative to the next closest competitor, and how many of the world’s top 10 research institutions are located in the leading country. Naturally, these are related, as leading institutions are required to produce high-impact research. This metric, based on research output, is intended as a leading indicator for potential future dominance in technology capability (such as military and intelligence capability and manufacturing market share).

The default position is low. To move up a level, BOTH criteria must be met.

  • High risk = 8+/10 top institutions in no. 1 country and at least 3x research lead
  • Medium risk = 5+/10 top institutions in no. 1 country and at least 2x research lead
  • Low risk = medium criteria not met.

Example: If a country has a 3.5 times research lead but ‘only’ four of the top 10 institutions, it will rate low, as it fails to meet both criteria at the medium level.
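A minimal sketch of the traffic-light logic, assuming the two inputs (the number of top 10 institutions in the no. 1 country and its research lead over the no. 2 country) have already been computed for a technology area:

    def monopoly_risk(top10_institutions_in_lead_country, research_lead_ratio):
        """Traffic-light rating; BOTH thresholds must be met to move up a level."""
        if top10_institutions_in_lead_country >= 8 and research_lead_ratio >= 3:
            return "high"
        if top10_institutions_in_lead_country >= 5 and research_lead_ratio >= 2:
            return "medium"
        return "low"

    print(monopoly_risk(4, 3.5))  # -> 'low'  (matches the example above)
    print(monopoly_risk(7, 2.4))  # -> 'medium'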

Top 5 country rankings: the two metrics, along with the traffic light, are given in the right-hand column of the top 5 country rankings tables throughout the report, and in full in Appendix 1.1.

 

We also tracked the global flow of human talent by identifying the countries in which authors obtained their undergraduate and postgraduate degrees. This information was obtained from the ORCID database. The current (or most recent) country of employment was sourced from the Web of Science dataset. Career histories were extracted for the authors of papers in the top 25% of most highly cited papers in each technology area. Tracking flows between countries at three points in time slices the data into numerous possibilities (n³, where n is the number of countries), which means a larger dataset, with more authors, was required to generate reliable insights. In addition, it wasn’t possible to build career histories for all authors: not all authors have an ORCID iD (although registration is free) or remember to provide it when publishing, and not all ORCID records contain enough information to create a career history. Thus, the talent-flow charts in this report are effectively tracking a sample of authors from high-impact papers. At a minimum, we needed a country listed for a bachelor’s degree (or equivalent) and a country listed for a postgraduate degree (master’s, PhD or equivalent).
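As a sketch only (the countries below are invented examples), career histories of this kind reduce to (undergraduate, postgraduate, current employment) triples, which can then be counted to produce the flows shown in the talent tracker; authors missing any of the three data points are dropped, as described above.

    from collections import Counter

    # Hypothetical career histories assembled from ORCID (degrees) and
    # Web of Science (current employment); None marks missing information.
    authors = [
        {"undergrad": "China", "postgrad": "USA", "employment": "USA"},
        {"undergrad": "India", "postgrad": "USA", "employment": "Australia"},
        {"undergrad": "China", "postgrad": None,  "employment": "China"},   # dropped
    ]

    flows = Counter(
        (a["undergrad"], a["postgrad"], a["employment"])
        for a in authors
        if a["undergrad"] and a["postgrad"] and a["employment"]
    )
    for (ug, pg, job), count in flows.items():
        print(f"{ug} -> {pg} -> {job}: {count}")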

This analysis revealed the brain gains and brain drains for each country. The 27 member states of the EU, although not a single country, are grouped together in the talent tracker visualisations to represent the cumulative strength of the bloc. Although talent competition undoubtedly exists within the EU, the shared geostrategic interests of EU member states, and the relative ease with which talent can move within the Schengen Area, led to the decision to aggregate their contributions to the global flow of talent.

In all talent tracker visualisations, the four members of the Quadrilateral Security Dialogue (the US, Australia, India and Japan), also known as the Quad, were plotted, as was China. The other plotted countries are the top five performers (in terms of global proportion of talent) not already visualised, and all other countries are grouped together under ‘other’.
