Performance Development Over the Years
One of the main characteristics recorded in the Top500 dataset is the performance of each ranked supercomputer. Two attributes capture it: ‘r_peak’, the best theoretical performance of an entry, and ‘r_max’, its best measured performance. We want to see how these levels have changed over the last few decades. After obtaining the benchmark data from the CSGenome API and cleaning up the DataFrame, we visualize the trends of ‘r_peak’ and ‘r_max’ and find that both have increased exponentially over time. In the scatter plots, the color of each point encodes its rank: purple marks the higher-ranked supercomputers (1, 2, 3…) and yellow marks the lower-ranked ones (…499, 500). To investigate further how these attributes have changed, we also generate line graphs of the ‘r_peak’ and ‘r_max’ values of the top-ranked and bottom-ranked supercomputers, and add to them a trendline for the sum of the performance values of all supercomputers in each year. From these graphs (one for ‘r_peak’ and one for ‘r_max’) we can confirm that all three groupings (sum, 1st, 500th) improve over the given time range.
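The three groupings behind the line graphs (sum, 1st, 500th) can be sketched as a per-year aggregation. This is a minimal illustration in plain Python, not the code used for the analysis. The field names ‘year’ and ‘rank’ and the helper `summarize` are assumptions, and the sample values are invented toy numbers rather than real Top500 entries.

```python
from collections import defaultdict

# Toy records standing in for the cleaned DataFrame rows.
# ASSUMPTION: each row has 'year', 'rank', and 'r_max' fields; the numbers
# below are invented for illustration and are not real Top500 data.
records = [
    {"year": 2000, "rank": 1, "r_max": 5.0},
    {"year": 2000, "rank": 2, "r_max": 2.0},
    {"year": 2000, "rank": 3, "r_max": 1.0},
    {"year": 2010, "rank": 1, "r_max": 2500.0},
    {"year": 2010, "rank": 2, "r_max": 1000.0},
    {"year": 2010, "rank": 3, "r_max": 30.0},
]


def summarize(rows, attr="r_max"):
    """Return {year: {'sum', 'top', 'bottom'}} for the given attribute.

    'top' is the value of the best-ranked entry (lowest rank number) and
    'bottom' is the value of the worst-ranked entry in that year.
    """
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row)

    summary = {}
    for year in sorted(by_year):
        ranked = sorted(by_year[year], key=lambda r: r["rank"])
        summary[year] = {
            "sum": sum(r[attr] for r in ranked),
            "top": ranked[0][attr],
            "bottom": ranked[-1][attr],
        }
    return summary


summary = summarize(records)
for year, stats in summary.items():
    print(year, stats)
```

Plotting each of the three series against the year on a logarithmic y-axis is one way to check the exponential claim, since exponential growth appears as an approximately straight line on that scale. Passing `attr="r_peak"` would produce the same summary for the theoretical values, provided the rows carry that field.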