PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 50%

This chart shows the annual and cumulative numbers of protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. To balance performance, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 50% identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  9                                22
1978  3                                25
1979  2                                27
1980  4                                31
1981  8                                39
1982  17                               56
1983  9                                65
1984  10                               75
1985  11                               86
1986  8                                94
1987  9                                103
1988  20                               123
1989  33                               156
1990  34                               190
1991  45                               235
1992  53                               288
1993  160                              448
1994  316                              764
1995  250                              1,014
1996  292                              1,306
1997  413                              1,719
1998  519                              2,238
1999  636                              2,874
2000  735                              3,609
2001  777                              4,386
2002  813                              5,199
2003  1,183                            6,382
2004  1,612                            7,994
2005  1,755                            9,749
2006  1,967                            11,716
2007  2,141                            13,857
2008  1,989                            15,846
2009  1,962                            17,808
2010  1,943                            19,751
2011  1,710                            21,461
2012  1,838                            23,299
2013  1,939                            25,238
2014  2,254                            27,492
2015  1,856                            29,348
2016  2,130                            31,478
2017  2,149                            33,627
2018  2,128                            35,755
2019  2,256                            38,011
2020  2,755                            40,766
2021  2,206                            42,972
2022  2,837                            45,809
2023  2,726                            48,535
2024  2,762                            51,297
2025  2,460                            53,757
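
Each cumulative total in the table is the running sum of the yearly counts up to that year. The following is a minimal Python sketch of that relationship, using only the 1976-1980 values copied from the table. The names `new_per_year` and `cumulative_totals` are hypothetical and chosen for illustration; they are not part of any PDB tool or API.

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976-1980, copied from the table.
new_per_year = {1976: 13, 1977: 9, 1978: 3, 1979: 2, 1980: 4}


def cumulative_totals(yearly):
    """Return {year: running total} for yearly counts, in chronological order."""
    years = sorted(yearly)
    running = accumulate(yearly[year] for year in years)
    return dict(zip(years, running))


totals = cumulative_totals(new_per_year)
for year, total in totals.items():
    # Print year, new sequences that year, and the cumulative total.
    print(year, new_per_year[year], total)
```

The same running sum can be applied to the full yearly column to check it against the totals column; it reproduces the table's own figures and is unaffected by the precision threshold mentioned in the note, which concerns the comparison with the group search result.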