PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 90%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. The chart can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (counted as polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in each year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the counts are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 90% identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  10                               23
1978  3                                26
1979  6                                32
1980  4                                36
1981  8                                44
1982  18                               62
1983  11                               73
1984  11                               84
1985  12                               96
1986  9                                105
1987  11                               116
1988  25                               141
1989  45                               186
1990  51                               237
1991  55                               292
1992  67                               359
1993  222                              581
1994  437                              1,018
1995  327                              1,345
1996  391                              1,736
1997  541                              2,277
1998  718                              2,995
1999  843                              3,838
2000  948                              4,786
2001  1,004                            5,790
2002  1,060                            6,850
2003  1,490                            8,340
2004  2,047                            10,387
2005  2,266                            12,653
2006  2,528                            15,181
2007  2,864                            18,045
2008  2,643                            20,688
2009  2,700                            23,388
2010  2,733                            26,121
2011  2,503                            28,624
2012  2,704                            31,328
2013  2,922                            34,250
2014  3,578                            37,828
2015  2,941                            40,769
2016  3,427                            44,196
2017  3,612                            47,808
2018  3,500                            51,308
2019  3,682                            54,990
2020  4,607                            59,597
2021  3,954                            63,551
2022  4,968                            68,519
2023  4,674                            73,193
2024  4,774                            77,967
2025  4,818                            82,785
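
The total for each year is the running sum of the yearly counts up to and including that year. The Python sketch below illustrates this relationship using the first five years of the table. The names `yearly_new` and `cumulative_totals` are hypothetical and chosen for illustration only; this is not how the statistics page itself computes its numbers.

```python
from itertools import accumulate

# First five rows of the table: (year, new protein sequences that year).
yearly_new = [(1976, 13), (1977, 10), (1978, 3), (1979, 6), (1980, 4)]


def cumulative_totals(rows):
    """Return (year, new, running_total) tuples, ordered by year."""
    rows = sorted(rows)  # make sure years are in ascending order
    totals = accumulate(new for _, new in rows)  # running sum of yearly counts
    return [(year, new, total) for (year, new), total in zip(rows, totals)]


for year, new, total in cumulative_totals(yearly_new):
    # Comma thousands separators match the table's formatting.
    print(f"{year}  {new:,}  {total:,}")
```

Extending `yearly_new` with the remaining rows gives a quick way to check that every total in the table matches the sum of the yearly counts that precede it.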