PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 50%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. The chart can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. To balance performance, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.


Sequence cluster level: 50% identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  9                                22
1978  3                                25
1979  2                                27
1980  4                                31
1981  8                                39
1982  17                               56
1983  9                                65
1984  10                               75
1985  10                               85
1986  8                                93
1987  9                                102
1988  20                               122
1989  33                               155
1990  34                               189
1991  45                               234
1992  53                               287
1993  160                              447
1994  317                              764
1995  250                              1,014
1996  292                              1,306
1997  413                              1,719
1998  519                              2,238
1999  636                              2,874
2000  735                              3,609
2001  778                              4,387
2002  813                              5,200
2003  1,184                            6,384
2004  1,611                            7,995
2005  1,756                            9,751
2006  1,965                            11,716
2007  2,143                            13,859
2008  1,990                            15,849
2009  1,962                            17,811
2010  1,946                            19,757
2011  1,708                            21,465
2012  1,839                            23,304
2013  1,938                            25,242
2014  2,256                            27,498
2015  1,856                            29,354
2016  2,133                            31,487
2017  2,152                            33,639
2018  2,131                            35,770
2019  2,261                            38,031
2020  2,754                            40,785
2021  2,206                            42,991
2022  2,843                            45,834
2023  2,727                            48,561
2024  2,738                            51,299
2025  2,639                            53,938
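
Each total in the table is the running sum of the yearly counts up to and including that year. A minimal Python sketch of that relationship, using only the first five years of the table, is given here. The function name `cumulative_totals` and the variable `yearly_new` are illustrative and not part of any RCSB tool or API.

```python
from itertools import accumulate

# (year, number of new protein sequences) for the first five years of the table
yearly_new = [
    (1976, 13),
    (1977, 9),
    (1978, 3),
    (1979, 2),
    (1980, 4),
]


def cumulative_totals(rows):
    """Return (year, new, running_total) tuples for rows ordered by year."""
    years = [year for year, _ in rows]
    new_counts = [new for _, new in rows]
    # accumulate() yields the running sum of the yearly counts
    return list(zip(years, new_counts, accumulate(new_counts)))


for year, new, total in cumulative_totals(yearly_new):
    print(year, new, total)
```

For 1980 this gives a running total of 31, which matches the table. The same check can be applied to the remaining rows to confirm that the yearly and cumulative columns are consistent.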