PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 70%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars represent the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 70% sequence identity

Year    Number of New Protein Sequences    Total Number of Protein Sequences
1976    13                                 13
1977    8                                  21
1978    3                                  24
1979    2                                  26
1980    4                                  30
1981    8                                  38
1982    17                                 55
1983    9                                  64
1984    10                                 74
1985    12                                 86
1986    9                                  95
1987    11                                 106
1988    23                                 129
1989    40                                 169
1990    40                                 209
1991    50                                 259
1992    60                                 319
1993    181                                500
1994    347                                847
1995    275                                1,122
1996    329                                1,451
1997    458                                1,909
1998    590                                2,499
1999    707                                3,206
2000    815                                4,021
2001    871                                4,892
2002    929                                5,821
2003    1,317                              7,138
2004    1,819                              8,957
2005    1,998                              10,955
2006    2,229                              13,184
2007    2,453                              15,637
2008    2,284                              17,921
2009    2,302                              20,223
2010    2,299                              22,522
2011    2,050                              24,572
2012    2,205                              26,777
2013    2,337                              29,114
2014    2,822                              31,936
2015    2,293                              34,229
2016    2,570                              36,799
2017    2,682                              39,481
2018    2,604                              42,085
2019    2,784                              44,869
2020    3,451                              48,320
2021    2,739                              51,059
2022    3,504                              54,563
2023    3,457                              58,020
2024    3,497                              61,517
2025    3,196                              64,713
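
The relationship between the two data columns can be checked directly: each cumulative total should equal the running sum of the annual counts up to that year. The sketch below is a minimal illustration in Python, not part of the PDB statistics service. The annual counts are copied from the table above, and the names `FIRST_YEAR`, `new_per_year`, and `cumulative` are illustrative only.

```python
from itertools import accumulate

# Annual counts of new protein sequences at 70% identity, 1976 through 2025,
# copied from the table above (variable names are illustrative).
FIRST_YEAR = 1976
new_per_year = [
    13, 8, 3, 2, 4, 8, 17, 9, 10, 12,                            # 1976-1985
    9, 11, 23, 40, 40, 50, 60, 181, 347, 275,                    # 1986-1995
    329, 458, 590, 707, 815, 871, 929, 1317, 1819, 1998,         # 1996-2005
    2229, 2453, 2284, 2302, 2299, 2050, 2205, 2337, 2822, 2293,  # 2006-2015
    2570, 2682, 2604, 2784, 3451, 2739, 3504, 3457, 3497, 3196,  # 2016-2025
]

# The cumulative column is the running sum of the annual column.
years = range(FIRST_YEAR, FIRST_YEAR + len(new_per_year))
cumulative = dict(zip(years, accumulate(new_per_year)))

print(cumulative[1993])  # 500
print(cumulative[2005])  # 10955
print(cumulative[2025])  # 64713
```

The running sums match the totals listed in the table for every year, so the two columns are internally consistent.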