A full and detailed methodology can be found in both of our reports. We provide a short summary below, but we encourage interested readers to consult the full methodology.
What is our data source?
Research publication data covering the years 2003 to 2023 was downloaded from the Web of Science (WoS) Core Collection database. WoS Core Collection was selected because it’s heavily used by researchers who study scientific trends and it has well-understood performance characteristics. The dataset included conference and journal publications and excluded bibliographic records that were deemed to not reflect research advances, such as book reviews, retracted publications and letters submitted to academic journals. In addition, we used data from the Research Organization Registry (ROR) to clean institution names, and data from the Open Researcher and Contributor ID (ORCID) database to build career profiles for the researchers plotted in the ASPI Talent Tracker.
What do we mean by ‘quality’ metrics?
Distinguishing innovative and high-impact research papers from low-quality papers is critical when estimating the current and future technical capability of nations. Not all the millions of research papers published each year are high quality.
What’s a citation?
When a scientific paper references another paper, that’s known as a citation. The number of times a paper is cited reflects the impact of the paper. As time goes by, there are more opportunities for a paper to be cited, so only papers of a similar age should be compared using citation counts (as was done in this report).
Country-level quality metrics
Throughout this research project, we present three country-level quality metrics:
- proportion of papers in the top 10% most highly cited research reports
- the H-index
- the number of research institutions a country has in the world’s top 10–20 highest performing institutions.
The top 10% most highly cited papers were analysed to generate insights into which countries are publishing the greatest share of high-quality, innovative and high-impact research. Credit for each publication was divided among authors and their affiliations rather than assigned only to the first author (for example, if there were two authors, each would be assigned half the credit). Fractional allocation of credit is a better predictor of which individuals go on to win Nobel Prizes or fellowships of prestigious societies. Fractional allocation of credit was used for all metrics.
We include both the top 10% share and the H-index because neither is perfect and each adds a unique insight. In technologies in which first and second place flip depending on which quality metric is used, the race really is too close to call. More often, however, the lead is large and unambiguous, and both metrics agree on who is leading.
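For reference, the H-index of a country (or institution) is the largest number h such that h of its papers have each been cited at least h times. The Python sketch below is a minimal illustration of that definition, using made-up citation counts and ignoring the fractional-credit weighting described below; it is not the report’s actual code.

```python
def h_index(citation_counts):
    """Return the largest h such that at least h papers have h or more citations."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(counts, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h

# Illustrative only: five papers cited 10, 8, 5, 4 and 3 times give an H-index of 4.
print(h_index([10, 8, 5, 4, 3]))  # -> 4
```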
The number of institutions that a country has in the world’s top 10 institutions is used to illustrate research concentration and dominance. This list is based on the number of papers that the institutions have in the top 10% of highly cited papers.
What do we mean by high-impact research?
We define a highly cited paper as one that has a citation count in the top 10% of all the papers published in that year. There are certainly limitations to defining quality in this way but analysing 6.8 million unique research papers requires some concessions to be made to assess the aggregated high-impact research performance of countries and institutions.
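As a rough illustration of how a per-year top-10% cut-off can be applied, the sketch below flags papers whose citation counts fall at or above the 90th percentile of their publication year. The record fields, the percentile arithmetic and the handling of ties are illustrative assumptions, not the report’s pipeline.

```python
from collections import defaultdict

def flag_highly_cited(papers):
    """papers: list of dicts with 'year' and 'citations' keys.
    Adds a 'highly_cited' flag to papers in roughly the top 10% of their year."""
    citations_by_year = defaultdict(list)
    for paper in papers:
        citations_by_year[paper["year"]].append(paper["citations"])

    # Citation count marking (approximately) the 90th percentile in each year.
    thresholds = {}
    for year, counts in citations_by_year.items():
        counts.sort()
        cutoff = min(int(0.9 * len(counts)), len(counts) - 1)
        thresholds[year] = counts[cutoff]

    for paper in papers:
        paper["highly_cited"] = paper["citations"] >= thresholds[paper["year"]]
    return papers
```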
How did we count research papers?
Each paper in our WoS dataset includes the address of the institution that each author is affiliated with. The obvious way to assign credit to each country or institution is to count the number of papers contributed to by each author from that country or institution. However, that skews the results towards favouring papers with many authors (especially large collaborations with dozens of authors).

To maintain an equal footing across research papers, so that each high-impact paper is equally important, we allocated fractional research credit between the authors of each paper. Credit for each paper was distributed equally between the authors named on that paper. For example, for a five-author paper, each author was attributed a 20% credit. In addition, each author’s credit was partitioned further between the countries or institutions that the author was affiliated with on that paper. So, if one of those authors listed two separate institutions, each institution would receive half of that author’s credit, which is 10% in this example. For each technology, we summed the individual country or institution credits from all the high-impact papers to determine the total number of high-impact papers for all countries or institutions.

The following example shows what this looks like in practice: suppose a paper has five authors, one of whom lists two institutional affiliations.
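A minimal Python sketch of this allocation, using hypothetical institution names, follows; the real pipeline operates on cleaned WoS affiliation data rather than the toy structure shown here.

```python
from collections import defaultdict

def allocate_credit(author_affiliations):
    """author_affiliations: one list of institutions per author on the paper.
    Each author gets an equal share of the paper's single credit, and that
    share is split equally across the institutions the author listed."""
    credit = defaultdict(float)
    author_share = 1.0 / len(author_affiliations)
    for affiliations in author_affiliations:
        per_institution = author_share / len(affiliations)
        for institution in affiliations:
            credit[institution] += per_institution
    return dict(credit)

# Hypothetical five-author paper: each author holds 20% of the credit, and the
# first author splits their 20% across two institutions (10% each).
paper = [
    ["Institution A", "Institution B"],
    ["Institution A"],
    ["Institution C"],
    ["Institution C"],
    ["Institution D"],
]
print({name: round(share, 2) for name, share in allocate_credit(paper).items()})
# -> {'Institution A': 0.3, 'Institution B': 0.1, 'Institution C': 0.4, 'Institution D': 0.2}
```

Country-level credit is calculated in the same way, with each institution’s share rolled up to the country it is located in.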
What else is new in this August 2024 Critical Technology Tracker update?
- Talent Tracker: In the March 2023 launch of the Critical Technology Tracker, the research authors who were tracked were from research papers published between 2018 and 2022. In this release, we have shifted the publication time window to between 2019 and 2023 and updated our data to include any relevant additions to the ORCID database that have been made in the past year. Additionally, the Talent Tracker can now be viewed for all individual countries in the EU as well as for the EU as a whole.
- Technology monopoly risk: The technology monopoly risk for all 64 technologies has been updated with the 2019–2023 data, and to reflect improved search terms and institution-name cleaning.
- Country groupings: Additional country groups, such as the NATO alliance, have been added to the website.
- Institution groupings: Some institutions were partitioned into their different constituents, the intention being to best capture the institution that is doing the actual research, but we exercised some degree of judgement in our decision to aggregate or disaggregate. Our list of institutions was updated to better conform to the list used by Nature in its annual ranking of 18,000 of the world’s research institutions. Key changes that were made this year included the following:
- The University of California system was separated into its constituent universities (UC Los Angeles, UC Davis, UC Berkeley, etc.).
- The Indian Institutes of Technology were separated into the Indian Institute of Technology Delhi, the Indian Institute of Technology Roorkee etc.
- In contrast, some research affiliations were aggregated, such as the National Institute of Allergy and Infectious Diseases, which was merged with the National Institute on Drug Abuse (and several more institutes) to form the National Institutes of Health.
How did we clean our datasets?
Allocating country and institution credit requires countries and institutions to be clearly identified so that variations of the same name can be counted together (for example, ‘USA’ and ‘United States’ should be considered the same country). The WoS address data is structured, in the sense that there’s a general pattern in how the address is expressed. That pattern, however, is populated with human-entered data and is not strictly followed, so there’s considerable variation in how authors reference their countries and especially their institutions.

In the case of country names, this process was relatively simple. The number of variations is relatively constrained because there are only a handful of cases in which genuine name variations exist (for example, ‘the Czech Republic’ versus ‘Czechia’). We were therefore able to use and modify existing lists of country names and their variations to automate the cleaning of country names into a single standardised set. For that set, we elected to use the Unicode Common Locale Data Repository (CLDR) standard. This decision was made on the basis that CLDR better captures the customary names of countries as opposed to their official, although less commonly used, names (e.g. United Kingdom of Great Britain and Northern Ireland versus United Kingdom).

The standardisation of institution names was more intensive than standardising country names for two main reasons:
- the larger number of potential institutions and the much greater variation in how those institutions may be referred to
- the need to consider aggregating institutions whose operations are very closely linked or managed, or that have, over the 21-year period, merged entirely.
ASPI dealt with this by creating a custom institution dictionary that captures common spellings, aliases, name changes and organisational relationships for a long list of institutions. Since this program of work began, this dictionary has grown from around 400 corrections to more than 2,000. That increase was enabled by the development of a semi-automated cleaning pipeline that uses data from the Research Organization Registry (ROR) to accelerate the rate at which corrections could be made. This was then supplemented with manual research using a variety of resources (including ASPI’s Chinese Defence Universities Tracker) to capture additional institutions not in the ROR database.

An indicative example of the cleaning process is RTX Corporation. In 2020, Raytheon merged with United Technologies Corporation (UTC) to become Raytheon Technologies and inherited Pratt & Whitney, a major aerospace manufacturer that was a UTC subsidiary. In 2023, Raytheon Technologies was renamed RTX Corporation. Therefore, all research done over the 21-year period by Pratt & Whitney, Raytheon and RTX is attributed to RTX Corporation. For each of these companies, we then need to consider possible name variations. For example, ‘Pratt & Whitney’ could also be ‘Pratt and Whitney’. As an abbreviation, RTX might also be used by researchers from other similarly abbreviated institutions, in which case more specific information about the location of the institution may need to be used.

Our dictionary, the result of considerable effort over a two-year period, currently contains more than 2,000 institutions from 86 countries, and we intend to expand it further as additional work is funded as part of this project.
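As a simplified illustration of the dictionary-based cleaning described above, the sketch below maps raw, author-entered institution strings to a standardised name. The alias entries mirror the RTX example in the text but are only a tiny, hypothetical slice of the real dictionary, which also draws on ROR data and location information to resolve ambiguous abbreviations.

```python
# Hypothetical alias table: lower-cased raw strings -> standardised name.
INSTITUTION_ALIASES = {
    "raytheon": "RTX Corporation",
    "raytheon technologies": "RTX Corporation",
    "pratt & whitney": "RTX Corporation",
    "pratt and whitney": "RTX Corporation",
    "united technologies corporation": "RTX Corporation",
}

def clean_institution(raw_name):
    """Map a raw affiliation string to its standardised institution name.
    Unknown names are passed through unchanged for later manual review."""
    key = raw_name.strip().lower()
    return INSTITUTION_ALIASES.get(key, raw_name.strip())

print(clean_institution("Pratt and Whitney"))     # -> RTX Corporation
print(clean_institution("Unknown Research Lab"))  # -> Unknown Research Lab
```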
What’s the difference between the percentage and number time series?
Global research output has been growing exponentially since the beginning of scientific publishing in the late 17th century, and the current volume of scientific publications doubles roughly every 15 years. Plotting the raw number of papers produced by a particular country in a particular field therefore tends to emphasise this overall exponential growth rather than relative country performance. To account for this, the performance of a country (or institution) can also be visualised on the website by its global share of high-impact research. This view makes it easier to compare country performance in earlier years, when global research output was smaller. To calculate the cumulative global share of publications, we divide the cumulative sum of high-impact publications for each individual country or institution by the cumulative sum of the global number of high-impact publications. Thus, the cumulative global share represents, at each point in time, the proportion of high-impact publications that the country or institution has accumulated, relative to the world’s high-impact publications, since 2003.
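The calculation itself is straightforward; the sketch below shows it for illustrative yearly counts of high-impact papers (fractional credits), which are not real data.

```python
def cumulative_global_share(country_counts, world_counts):
    """Both arguments are yearly high-impact paper counts aligned by year
    (starting in 2003). Returns the country's cumulative share of world
    output at each year."""
    shares = []
    country_total = 0.0
    world_total = 0.0
    for country_year, world_year in zip(country_counts, world_counts):
        country_total += country_year
        world_total += world_year
        shares.append(country_total / world_total)
    return shares

# Illustrative only: the country's share rises even as world output grows.
print(cumulative_global_share([10, 20, 40], [100, 150, 200]))
# -> [0.1, 0.12, 0.1555...]
```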
What is the ASPI Talent Tracker?
The ASPI Talent Tracker is a dataset created by ASPI that tracks the career trajectories of researchers working in each of the 64 critical technologies. It does this by using the ORCID database, which assigns a unique and persistent digital identifier (an ORCID iD) to each registered researcher. The ORCID iD links a researcher to their professional activities (published papers, positions held, and degrees and qualifications) and avoids the difficulties associated with actual names (non-uniqueness, name changes, spelling variation, translatability and so on). These ORCID iDs are often included by authors in their submissions to research journals, and that information is captured in the WoS database.

It wasn’t possible to build career histories for all authors. Not all authors have an ORCID iD (although registration is free) or remember to provide their ORCID iD when publishing, and not all ORCID records contain enough information to create a career history. By combining the ORCID iDs listed in the WoS database with the professional activities listed in the ORCID database, together with our long-term data-cleaning efforts, we were able to create a dataset that visualises the flow of research talent (researchers who authored the top 25% or top 10% most cited papers). This means we can track the countries where researchers gained their undergraduate degree and their postgraduate qualification, and where they are most recently employed. To ensure that only researchers who are still active in the field are visualised, only authors who published within the top 25% most cited papers between 2019 and 2023 (i.e. the last five years) are included in this dataset. When using the website, having selected a technology of interest (and, optionally, countries to focus on), this can be viewed by clicking on the ‘flow of human talent’ tab.
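Conceptually, the linkage is a join on the ORCID iD between WoS author records and ORCID career records. The sketch below is a minimal illustration of that join with made-up field names and values; it does not reflect the actual WoS or ORCID data schemas.

```python
# Hypothetical, simplified records for illustration only.
wos_authors = [
    {"orcid": "0000-0000-0000-0001", "paper_id": "W1", "top_cited": True},
    {"orcid": None, "paper_id": "W2", "top_cited": True},  # no ORCID iD: skipped
]

orcid_records = {
    "0000-0000-0000-0001": {
        "undergrad_country": "Country A",
        "postgrad_country": "Country B",
        "employment_country": "Country C",
    },
}

def build_career_profiles(wos_authors, orcid_records):
    """Join highly cited WoS authors to their ORCID career histories.
    Authors without an ORCID iD, or without a usable ORCID record, are
    skipped, mirroring the limitation described above."""
    profiles = []
    for author in wos_authors:
        record = orcid_records.get(author.get("orcid"))
        if author.get("top_cited") and record:
            profiles.append({"orcid": author["orcid"], **record})
    return profiles

print(build_career_profiles(wos_authors, orcid_records))
```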
‘Technology monopoly risk’ metric: highlighting concentrations of technological expertise
The technology monopoly risk traffic light seeks to highlight concentrations of technological expertise in a single country. It incorporates two factors: how far ahead the leading country is relative to the next closest competitor, and how many of the world’s top 10 research institutions are located in the leading country. Naturally, these are related, as leading institutions are required to produce high-impact research. This metric, based on research output, is intended as a leading indicator for potential future dominance in technology capability (such as military and intelligence capability and manufacturing market share).
The default position is low. To move up a level, BOTH criteria must be met.
- High risk = 8+/10 top institutions in no. 1 country and at least 3x research lead
- Medium risk = 5+/10 top institutions in no. 1 country and at least 2x research lead
- Low risk = medium criteria not met.
Example: If a country has a 3.5 times research lead but ‘only’ four of the top 10 institutions, it will rate low, because it does not meet both criteria at the medium level (it exceeds the 2x research lead but falls short of five top-10 institutions).
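The classification rules above can be expressed directly in code. The following sketch is a plain restatement of the traffic-light criteria, with an illustrative function name and inputs rather than the report’s implementation.

```python
def monopoly_risk(top10_institutions_in_lead_country, research_lead_ratio):
    """Classify technology monopoly risk for the leading country.
    research_lead_ratio is the leader's high-impact output divided by that of
    the next closest competitor. Both conditions at a level must be met to
    reach that level; the default is 'low'."""
    if top10_institutions_in_lead_country >= 8 and research_lead_ratio >= 3:
        return "high"
    if top10_institutions_in_lead_country >= 5 and research_lead_ratio >= 2:
        return "medium"
    return "low"

# The example from the text: a 3.5x lead but only four top-10 institutions
# still rates low, because the medium-level institution criterion is not met.
print(monopoly_risk(4, 3.5))  # -> low
print(monopoly_risk(9, 3.2))  # -> high
```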
Top 5 country rankings: The two metrics along with the traffic light rating system are included in Appendix 1 in both reports.