jpg

I’ve always been a big fan of Formula 1 especially for its technological advancements shown by each team as time progresses. For starters, here is a brief introduction of Formula 1.

Championships - There are two different titles in F1, the World Driver’s Championship (WDC), which goes to the driver with the most individual points at the end of the season, and the World Constructor’s Championship (WCC), which goes to the team with the most points between their two drivers. Every team fields 2 cars in every race. There are generally ~18 races or so, but the amount changes frequently as races are added or dropped.

Scoring - Scoring is based purely off of where you finish the race. It used to be that only the top 6 placing drivers scored, then it was changed to 8, and more recently it was changed to 10. 1st place is worth 25 points, then 18, 15, 12, 10, 8, 6, 4, 2, and 1 respectively.

This project will be using SQL queries and data visualiztion with Python for data analysis. we will be using SQL for extracting and trandforming the data before doing visualization with python.

1. Questions

This analysis will try to answer the following questions:

Drivers

  1. Which driver had the most F1 World Driver’s Champion title?
  2. Which driver had obtained the most pole position?
  3. Which driver had the most points throughout their career?
  4. How many F1 drivers that actually won at least one race?
  5. What are the nationality of most F1 drivers?

Constructors

  1. Who are the all time winning teams since 1950?
  2. Which teams are the earliest to participate in F1?
  3. What are the F1 constructors nationality?
  4. Who are the winning teams since the Hybrid era?
  5. How are the performance of the 2021 teams before?

F1 circuits and races

  1. Where are most of the F1 tracks located?
  2. What are the the fastest lap time for each Grand Prix?
  3. How many F1 races are there for each season?

2. Measurement Priorities

For Drivers:

For Constructors:

For F1 circuits and races:

3. Data Collection


Notebook Contents:

Connect to database

# import libraries
import sqlalchemy
import pyodbc
import pandas as pd
import numpy as np

pyodbc.drivers()
['SQL Server',
 'Microsoft Access Driver (*.mdb, *.accdb)',
 'Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)',
 'Microsoft Access Text Driver (*.txt, *.csv)',
 'SQL Server Native Client 11.0',
 'SQL Server Native Client RDA 11.0',
 'ODBC Driver 17 for SQL Server']
%load_ext sql
The sql extension is already loaded. To reload it, use:
  %reload_ext sql
%sql mssql+pyodbc://@DESKTOP-8RI327A\SQLEXPRESS/F1?driver=SQL+Server

Extract, transform, and visualization

#overall styling
import matplotlib.pyplot as plt
import seaborn as sns
import altair as alt

sns.set_style("darkgrid")
palette = ['#004146','#018076','#03BFB5','#949398']
seq_palette_m = 'viridis'
seq_palette_h =  sns.dark_palette("#e76f51")
b_figsize = (14,8)

Drivers

1. Which driver had the most F1 World Driver’s Champion title?

%%sql world_title <<

WITH cte as (
      SELECT CONCAT(forename, ' ', surname) AS driver_name,
             year, SUM(points) as race_points,
             RANK() OVER (PARTITION BY year ORDER BY SUM(points) DESC) as seqnum
      FROM results r JOIN 
           constructors c
           ON r.constructorId = c.constructorId JOIN
           drivers d
           ON r.driverId = d.driverId JOIN
           races rc
           ON r.raceId = rc.raceId
      GROUP BY CONCAT(forename, ' ', surname), year
     )
SELECT *
FROM cte
WHERE seqnum = 1 AND year<=2020;
 * mssql+pyodbc://@DESKTOP-8RI327A\SQLEXPRESS/F1?driver=SQL+Server
Done.
Returning data to local variable world_title
df_world_title = world_title.DataFrame()
df_world_title.tail()
driver_name year race_points seqnum
66 Nico Rosberg 2016 385.0 1
67 Lewis Hamilton 2017 363.0 1
68 Lewis Hamilton 2018 408.0 1
69 Lewis Hamilton 2019 413.0 1
70 Lewis Hamilton 2020 347.0 1
fig = plt.figure(figsize=b_figsize)
sns.countplot(data=df_world_title,x='driver_name',
              order=df_world_title['driver_name'].value_counts().index,
            palette=seq_palette_m)
plt.title("World Driver's Championship title count")
plt.xlabel("Driver name")
plt.ylabel("Count of titles")
plt.xticks(rotation=90)
plt.show()

png

Description: Since the 2021 season is stil on going, we will only consider the data up untill 2020. Here we can see that Lewis Hamilton and Michael Schumacher won the most World Driver Championship title. So far, only 32 driver’s had won the title.

2. Which driver had obtained the most pole position?

%%sql pole_position <<

WITH cte AS(

SELECT grid,
CONCAT(forename,' ',surname ) AS driver_name,
COUNT(driverRef) OVER(PARTITION BY driverRef) AS pole_count
FROM
    results AS r
    INNER JOIN races AS rc
    ON r.raceId = rc.raceId
    
    INNER JOIN constructors AS c
    ON r.constructorId = c.constructorId
    
    INNER JOIN drivers AS d
    ON r.driverId = d.driverId
        
WHERE grid = 1)

SELECT TOP 10 driver_name,MAX(pole_count) AS pole

FROM cte
GROUP BY driver_name
ORDER BY pole DESC
 * mssql+pyodbc://@DESKTOP-8RI327A\SQLEXPRESS/F1?driver=SQL+Server
Done.
Returning data to local variable pole_position
df_pole_position = pole_position.DataFrame()
df_pole_position.head()
driver_name pole
0 Lewis Hamilton 100
1 Michael Schumacher 68
2 Ayrton Senna 65
3 Sebastian Vettel 57
4 Jim Clark 34
fig = plt.figure(figsize=b_figsize)
sns.barplot(data=df_pole_position,
            x='driver_name',
            y='pole',
            palette=seq_palette_m)
plt.title("Count of pole position obtained by drivers")
plt.xlabel("Driver name")
plt.ylabel("Count of pole positions")
plt.xticks(rotation=45)
plt.show()

png

Description: From the chart we can see that Lewis Hamilton had obtained 100 pole position. Considering that Lewis Hamilton entered F1 later than most of the drivers in this chart, he still manage to secure a lot of pole position in a short window of time.

3. Which driver had the most points throughout their career?

%%sql f1_drivers_points <<

SELECT TOP 10 driver_name, SUM(driver_points) AS total_points
FROM
    (SELECT

        (CONVERT(DECIMAL, points)) AS driver_points,
        concat(forename, ' ', surname) AS driver_name

    FROM 
        results AS r

        INNER JOIN constructors AS c
        ON r.constructorId = c.constructorId

        INNER JOIN drivers AS d
        ON r.driverId = d.driverId

        INNER JOIN races AS rc
        ON r.raceId = rc.raceId

        WHERE points >= '1') AS subquery
    
GROUP BY driver_name
ORDER BY total_points DESC;
 * mssql+pyodbc://@DESKTOP-8RI327A\SQLEXPRESS/F1?driver=SQL+Server
Done.
Returning data to local variable f1_drivers_points
df_f1_drivers_points = f1_drivers_points.DataFrame()
df_f1_drivers_points.head()
driver_name total_points
0 Lewis Hamilton 3928
1 Sebastian Vettel 3048
2 Fernando Alonso 1919
3 Kimi Räikkönen 1864
4 Valtteri Bottas 1604
fig = plt.figure(figsize=b_figsize)
sns.barplot(data=df_f1_drivers_points, x='driver_name', y='total_points',palette='viridis')
plt.title("Total points of F1 Drivers")
plt.xlabel("Driver name")
plt.ylabel("Total points")
plt.xticks(rotation=45)
plt.show()

png

Description: In terms of total points, we can see that Lewis Hamilton had the most points in F1 history. Up to this point we can see that Lewis Hamilton had a great F1 career.

One interesting point that we can see here is that the drivers in this chart are mostly the drivers that are competing in the later seasons of F1. Could it possibly that the drivers are able to obtain more points during the later season in comparison to the earlier F1 season? We’ll discuss that further in the “F1 circuits and races” section.

4. How many F1 drivers that actually won at least one race?

%%sql f1_drivers_win_count <<

Select driver_name, COUNT(driver_name) AS frequency
FROM
    (SELECT
        concat(forename, ' ', surname) AS driver_name

    FROM 
        results AS r

        INNER JOIN constructors AS c
        ON r.constructorId = c.constructorId

        INNER JOIN drivers AS d
        ON r.driverId = d.driverId

        INNER JOIN races AS rc
        ON r.raceId = rc.raceId

        WHERE positionText = '1') AS subquery

GROUP BY driver_name
ORDER BY frequency DESC;
 * mssql+pyodbc://@DESKTOP-8RI327A\SQLEXPRESS/F1?driver=SQL+Server
Done.
Returning data to local variable f1_drivers_win_count
df_f1_drivers_win_count  = f1_drivers_win_count.DataFrame()
df_f1_drivers_win_count.tail()
driver_name frequency
105 Vittorio Brambilla 1
106 Sam Hanks 1
107 Richie Ginther 1
108 Robert Kubica 1
109 Rodger Ward 1
fig = plt.figure(figsize=(10,20))
sns.barplot(data=df_f1_drivers_win_count,
            x='frequency',
            y='driver_name',
            palette=seq_palette_m)

plt.title('Driver wins since 1950')
plt.ylabel('Driver names')
plt.xlabel('Count of race wins')
plt.show()

png

Description: From the data, we can see that there are 110 drivers that at least obtain 1 race win (1st position) in each of those races. From the frequency of wins, Lewis Hamilton is having nearly 100 wins (96 to be exact) which is the highest total wins since 1950.

5. What are the nationality of most F1 drivers?

%%sql f1_drivers_nationality <<


SELECT
    nationality, COUNT(nationality) AS frequency

FROM 
    drivers
    
GROUP BY nationality
ORDER BY frequency DESC;
 * mssql+pyodbc://@DESKTOP-8RI327A\SQLEXPRESS/F1?driver=SQL+Server
Done.
Returning data to local variable f1_drivers_nationality
df_f1_drivers_nationality =f1_drivers_nationality.DataFrame()
df_f1_drivers_nationality.head()
nationality frequency
0 British 165
1 American 157
2 Italian 99
3 French 73
4 German 50
fig = plt.figure(figsize=b_figsize)
sns.barplot(data=df_f1_drivers_nationality, x='nationality', y='frequency',palette='viridis')
plt.title("Count of driver's nationality")
plt.xlabel("Nationality")
plt.ylabel("Count of drivers")
plt.xticks(rotation=90)
plt.show()

png

Description: Most of the Formula 1 drivers are British, which is quite as expected since most of the teams are based in UK. American drivers also come quite close in terms of total drivers but in terms of total teams. This shows that there is a possibility that the American drivers driver in teams that aren’t American teams.

Constructors

1. Who are the all time winning teams since 1950?

%%sql win_all_time <<

SELECT name,SUM(wins) AS total_wins
FROM 
    constructor_standings AS cs
    
    INNER JOIN constructors AS c
    ON cs.constructorId = c.constructorId
    
    INNER JOIN races AS r
    ON cs.raceId = r.raceId
    
    WHERE wins>=1
    GROUP BY name
    ORDER BY total_wins DESC;
 * mssql+pyodbc://@DESKTOP-8RI327A\SQLEXPRESS/F1?driver=SQL+Server
Done.
Returning data to local variable win_all_time
df_win_all_time = win_all_time.DataFrame()
df_win_all_time.head()
name total_wins
0 Ferrari 1856
1 McLaren 1595
2 Mercedes 1108
3 Williams 934
4 Red Bull 580
fig = plt.figure(figsize=(10,10))
sns.barplot(data=df_win_all_time,
            x='total_wins',
            y='name',
            palette=seq_palette_m)
plt.title('Number of wins by constructors since 1950')

plt.xlabel("Count of race wins")
plt.ylabel("Constructor names")
plt.show()

png

2. Which teams are the earliest to participate in F1?

%%sql team_orgin_date <<

SELECT name, MAX(DATEDIFF(year, date,CURRENT_TIMESTAMP)) AS team_age_years, MIN(date) AS first_race

FROM 
    constructor_standings AS cs
    
    INNER JOIN constructors AS c
    ON cs.constructorId = c.constructorId
    
    INNER JOIN races AS r
    ON cs.raceId = r.raceId
    
    GROUP BY name
    ORDER BY team_age_years DESC;
 * mssql+pyodbc://@DESKTOP-8RI327A\SQLEXPRESS/F1?driver=SQL+Server
Done.
Returning data to local variable team_orgin_date
df_team_orgin_date = team_orgin_date.DataFrame()
df_team_orgin_date.head(5)
name team_age_years first_race
0 Maserati 63 1958-01-19 00:00:00.0000000
1 Ferrari 63 1958-01-19 00:00:00.0000000
2 Team Lotus 63 1958-05-18 00:00:00.0000000
3 BRM 63 1958-05-18 00:00:00.0000000
4 Connaught 63 1958-05-18 00:00:00.0000000

Description: From the table, we can see that among the eraliest team to participate since the inaugural season is Maserati, Ferrari, and Cooper. The list also includes Team Lotus and BRM. Just out of curiosity, let’s see the teams that are recently created.

df_team_orgin_date.tail(5)
name team_age_years first_race
154 Manor Marussia 6 2015-03-29 00:00:00.0000000
155 Haas F1 Team 5 2016-03-20 00:00:00.0000000
156 Racing Point 2 2019-03-17 00:00:00.0000000
157 AlphaTauri 1 2020-07-05 00:00:00.0000000
158 Alpine F1 Team 0 2021-03-28 00:00:00.0000000

Description: Alpine team seems to be the most recently created constructors team followed by AlphaTauri and Racing Point. 3 years before that, Haas F1 Team starts their first F1 debut with Manor Marussia on the year before it. Apparently, Manor Marussia is short-lived and no longer participating in F1. Next, let’s visualize all of the team age in one plot.

# import pandas for data wrangling

def circular_plot(dataframe,value,item,title):
    # Build a dataset

    # Reorder the dataframe
    df = dataframe
    
    # sort (I've sorted the data before so there's no need to sort again)
    # df = dataframe.sort_values(by=[value])

    # initialize the figure
    plt.figure(figsize=(20,10))
    ax = plt.subplot(111, polar=True)
    ax.text(x=17,y=0,s=title,backgroundcolor="#2FD0CB", color='white',ha='center',size=20)
    plt.axis('off')

    # Constants = parameters controling the plot layout:
    upperLimit = 100
    lowerLimit = 30
    labelPadding = 4

    # Compute max and min in the dataset
    max = df[value].max()

    # Let's compute heights: they are a conversion of each item value in those new coordinates
    # In our example, 0 in the dataset will be converted to the lowerLimit (10)
    # The maximum will be converted to the upperLimit (100)
    slope = (max - lowerLimit) / max
    heights = slope * df[value] + lowerLimit

    # Compute the width of each bar. In total we have 2*Pi = 360°
    width = 2*np.pi / len(df.index)

    # Compute the angle each bar is centered on:
    indexes = list(range(1, len(df.index)+1))
    angles = [element * width for element in indexes]
    angles

    # Draw bars
    bars = ax.bar(
        x=angles, 
        height=heights, 
        width=width, 
        bottom=lowerLimit,
        linewidth=2, 
        edgecolor="white",
        color="#61a4b2",
    )
    

    # Add labels
    for bar, angle, height, label in zip(bars,angles, heights, df[item]):

        # Labels are rotated. Rotation must be specified in degrees :(
        rotation = np.rad2deg(angle)

        # Flip some labels upside down
        alignment = ""
        if angle >= np.pi/2 and angle < 3*np.pi/2:
            alignment = "right"
            rotation = rotation + 180
        else: 
            alignment = "left"

        # Finally add the labels
        ax.text(
            x=angle, 
            y=lowerLimit + bar.get_height() + labelPadding, 
            s=label, 
            ha=alignment, 
            va='center', 
            rotation=rotation, 
            rotation_mode="anchor") 
        

title = 'F1 team age'
circular_plot(df_team_orgin_date,'team_age_years','name',title)

png

3. What are the F1 constructors nationality?

%%sql teams_nation <<

SELECT nationality, COUNT(nationality) AS count_teams

FROM 
    constructors AS c
    
    GROUP BY nationality
    ORDER BY count_teams DESC;
 * mssql+pyodbc://@DESKTOP-8RI327A\SQLEXPRESS/F1?driver=SQL+Server
Done.
Returning data to local variable teams_nation
df_teams_nation = teams_nation.DataFrame()
df_teams_nation.head()
nationality count_teams
0 British 86
1 American 39
2 Italian 29
3 French 13
4 German 10
fig = plt.figure(figsize=b_figsize)
sns.barplot(data=df_teams_nation, x='nationality',
            y='count_teams',
            palette=seq_palette_m)
plt.title('Nationality of F1 Teams')
plt.xlabel("Nationality")
plt.ylabel("Count of constructors")
plt.xticks(rotation=50)
plt.show()

png

Description: From the chart, we can see most of the team’s nationality is British which twice more than the count of American teams. From this aricle by BBC, it is mostly due to its history and heritage. The UK had lots of unused airfields for racing and testing. Similar to Silicon Valley, having a “motorsport valley” attracts a lot of engineers who live close to each other which leads to a high supply of quality workforce.

4. Who are the winning teams since the Hybrid era?

%%sql wins_hybrid <<

SELECT name,SUM(wins) AS total_wins
FROM 
    constructor_standings AS cs
    
    INNER JOIN constructors AS c
    ON cs.constructorId = c.constructorId
    
    INNER JOIN races AS r
    ON cs.raceId = r.raceId
    
    WHERE r.year >= '2014' AND wins>=1
    GROUP BY name
    ORDER BY total_wins DESC;
 * mssql+pyodbc://@DESKTOP-8RI327A\SQLEXPRESS/F1?driver=SQL+Server
Done.
Returning data to local variable wins_hybrid
df_wins_hybrid = wins_hybrid.DataFrame()
df_wins_hybrid
name total_wins
0 Mercedes 1054
1 Ferrari 205
2 Red Bull 166
3 AlphaTauri 10
4 Racing Point 2
fig = plt.figure(figsize=b_figsize)
sns.barplot(data=df_wins_hybrid,
            x='name',
            y='total_wins',
            palette=seq_palette_m)
plt.title('Number of wins by team in the Hybrid era')
plt.xlabel("Constructor name")
plt.ylabel("Count of wins")
plt.show()

png

Description: From the analysis, we can see that the Mercedes team had totaly dominated the F1 championship during the hybrid era which is from 2014 and onwards. Interestingly, as we can see from the plot, most of the races are won by these 5 teams since 2014.

5. How are the performance of the 2021 teams before?

%%sql performance_2021_c <<

SELECT name AS constructor, YEAR(CONVERT(datetime, date)) AS year, SUM(CAST(points AS float)) total_points

FROM

    constructor_results AS cr

    INNER JOIN constructors AS c
    ON cr.constructorId = c.constructorId

    INNER JOIN races AS r
    ON cr.raceId = r.raceId
    
    WHERE name IN

        (SELECT DISTINCT c.name

        FROM 
            constructor_results AS cr

            INNER JOIN constructors AS c
            ON cr.constructorId = c.constructorId

            INNER JOIN races AS r
            ON cr.raceId = r.raceId

           WHERE date >= Convert(datetime, '2020-12-31' ))
        
GROUP BY YEAR(CONVERT(datetime, date)),name
ORDER BY year;
 * mssql+pyodbc://@DESKTOP-8RI327A\SQLEXPRESS/F1?driver=SQL+Server
Done.
Returning data to local variable performance_2021_c
df_performance_2021_c =performance_2021_c.DataFrame()
df_performance_2021_c.head()
constructor year total_points
0 Ferrari 1956 21.0
1 Ferrari 1958 61.0
2 Aston Martin 1959 0.0
3 Ferrari 1959 38.0
4 Aston Martin 1960 0.0
fig = plt.figure(figsize=(15,8))

sns.scatterplot(
    data=df_performance_2021_c,
    x="year",
    y='total_points',
    hue="constructor",
    size='total_points',
    sizes=(10,500),
    alpha=0.9,
    palette="hls"
)
plt.title("Performance of the 2021 teams throughout time")
plt.xlabel("Year")
plt.ylabel("Total race points")
plt.show()

png

Description: From the plot, we can see that Ferarri are among the oldest team to compete in the 2021 followed by Aston Martin and Alfa Romeo. It seems like the last two aforementioned constructors haven’t score any significant points past 1970. This may be due to the teams hasn’t participated for a while and making their return at the 2020 season.

We can also see that Red Bull had been winning a lot of points upon the year 2010 as well as the Mercedes team earliest obtained points. Even though they starts participating races in the later seasons, it seems like the team dominates the season beyond 2014 which is the start of the hybrid era.

What’s even interesting is that Ferrari had a sudden drop of points nearing the 2020 season. From this article, it seems like the Ferrari team’s car for that season is suffering from performance and design issues.

F1 circuits and races

1. Where are most of the F1 tracks located?

%%sql circuit_map <<

SELECT 
    name,
    location,
    country,
    lat,
    lng,
    CONVERT(INT,alt) as alt
FROM
    circuits
 * mssql+pyodbc://@DESKTOP-8RI327A\SQLEXPRESS/F1?driver=SQL+Server
Done.
Returning data to local variable circuit_map
df_circuit_map = circuit_map.DataFrame()
df_circuit_map.dtypes
name         object
location     object
country      object
lat         float64
lng         float64
alt           int64
dtype: object
import altair as alt
from vega_datasets import data

states = alt.topo_feature(data.world_110m.url, 'countries')
circuits = df_circuit_map

background = alt.Chart(states, title='F1 Circuits location').mark_geoshape(
    fill='lightgray',
    stroke='white'
).project('equalEarth').properties(
    width=500,
    height=300
)

points = alt.Chart(circuits).mark_circle().encode(
    longitude='lng:Q',
    latitude='lat:Q',
    size=alt.value(100),
    tooltip=('name','country','location'),
    color=alt.Color('alt', scale=alt.Scale(scheme=seq_palette_m))
)

background + points

Description: As we can see from the map, we can see that most of the F1 race tracks are mostly in the Europe and North America. In Asia, there are more track on the East in camparison to the South East. Most of the tracks are at sea level. The track that is very further from sea level which is the Autódromo Hermanos Rodríguez (Mexico city). Track altitudes are taken into consideration by teams since at lower pressures of high altitudes, air molecules spread out more so the air becomes less dense. This will greatly affect the aerodynamics of the car.

2. What are the the fastest lap time for each Grand Prix?

%%sql fastest_lap <<

WITH cte as (
      SELECT
    race_name,
    CONCAT(forename, ' ', surname) AS driver_name,
    MIN(l.milliseconds) as time_milli,
    RANK() OVER (PARTITION BY race_name ORDER BY MIN(l.milliseconds)) as seqnum
    FROM 
    
        lap_times AS l
    
        INNER JOIN races AS r
        ON l.raceId = r.raceId

        INNER JOIN drivers AS d
        ON l.driverId = d.driverId
    
    GROUP BY race_name, CONCAT(forename, ' ', surname)

     )
SELECT

race_name,
driver_name,
COUNT(driver_name) OVER(PARTITION BY driver_name) AS driver_freq,
time_milli

FROM cte
WHERE seqnum = 1
ORDER BY race_name;
 * mssql+pyodbc://@DESKTOP-8RI327A\SQLEXPRESS/F1?driver=SQL+Server
Done.
Returning data to local variable fastest_lap
df_fastest_lap = fastest_lap.DataFrame()
df_fastest_lap['time_delta'] = pd.to_timedelta(df_fastest_lap['time_milli'], unit='ms')
df_fastest_lap.style.background_gradient(cmap='PuBu')
race_name driver_name driver_freq time_milli time_delta
0 70th Anniversary Grand Prix Lewis Hamilton 8 88451 0 days 00:01:28.451000
1 Abu Dhabi Grand Prix Lewis Hamilton 8 99283 0 days 00:01:39.283000
2 Argentine Grand Prix Gerhard Berger 1 87981 0 days 00:01:27.981000
3 Australian Grand Prix Michael Schumacher 6 84125 0 days 00:01:24.125000
4 Austrian Grand Prix Max Verstappen 2 66200 0 days 00:01:06.200000
5 Azerbaijan Grand Prix Charles Leclerc 1 103009 0 days 00:01:43.009000
6 Bahrain Grand Prix Michael Schumacher 6 90252 0 days 00:01:30.252000
7 Belgian Grand Prix Kimi Räikkönen 2 105108 0 days 00:01:45.108000
8 Brazilian Grand Prix Valtteri Bottas 2 70540 0 days 00:01:10.540000
9 British Grand Prix Michael Schumacher 6 78739 0 days 00:01:18.739000
10 Canadian Grand Prix Lewis Hamilton 8 67058 0 days 00:01:07.058000
11 Chinese Grand Prix Michael Schumacher 6 92238 0 days 00:01:32.238000
12 Eifel Grand Prix Max Verstappen 2 88139 0 days 00:01:28.139000
13 Emilia Romagna Grand Prix Lewis Hamilton 8 75484 0 days 00:01:15.484000
14 European Grand Prix Juan Pablo Montoya 2 78354 0 days 00:01:18.354000
15 French Grand Prix David Coulthard 1 75045 0 days 00:01:15.045000
16 German Grand Prix Kimi Räikkönen 2 73780 0 days 00:01:13.780000
17 Hungarian Grand Prix Michael Schumacher 6 76207 0 days 00:01:16.207000
18 Indian Grand Prix Sebastian Vettel 3 87249 0 days 00:01:27.249000
19 Italian Grand Prix Rubens Barrichello 2 81046 0 days 00:01:21.046000
20 Japanese Grand Prix Felipe Massa 1 78426 0 days 00:01:18.426000
21 Korean Grand Prix Sebastian Vettel 3 99605 0 days 00:01:39.605000
22 Luxembourg Grand Prix Heinz-Harald Frentzen 1 78805 0 days 00:01:18.805000
23 Malaysian Grand Prix Sebastian Vettel 3 94080 0 days 00:01:34.080000
24 Mexican Grand Prix Valtteri Bottas 2 78741 0 days 00:01:18.741000
25 Monaco Grand Prix Lewis Hamilton 8 72909 0 days 00:01:12.909000
26 Portuguese Grand Prix Lewis Hamilton 8 78750 0 days 00:01:18.750000
27 Russian Grand Prix Lewis Hamilton 8 95761 0 days 00:01:35.761000
28 Sakhir Grand Prix George Russell 1 55404 0 days 00:00:55.404000
29 San Marino Grand Prix Michael Schumacher 6 80411 0 days 00:01:20.411000
30 Singapore Grand Prix Kevin Magnussen 1 101905 0 days 00:01:41.905000
31 Spanish Grand Prix Giancarlo Fisichella 1 75641 0 days 00:01:15.641000
32 Styrian Grand Prix Carlos Sainz 1 65619 0 days 00:01:05.619000
33 Turkish Grand Prix Juan Pablo Montoya 2 84770 0 days 00:01:24.770000
34 Tuscan Grand Prix Lewis Hamilton 8 78833 0 days 00:01:18.833000
35 United States Grand Prix Rubens Barrichello 2 70399 0 days 00:01:10.399000

Description: There are a total of 35 F1 tracks throughout history. We can also see that most of the fastest lap time is done by Lewis Hamilton where the fastest lap ever done by him is is the Canadian Grand Prix with a time of 1 minute and 07.058000 seconds.

One more thing that intrigue me is that despite being a driver to one of the slower car for the season, we can see that George Russell did the fastest lap in Sakhir GP. From Wikipedia, we can see that at that time George Russell is temporarily driving for Mercedes in Lewis Hamilton’s car as a replacement driver due to Lewis having tested positive for Covid-19.

3. How many F1 races are there for each season?

%%sql org_race <<

SELECT year,COUNT(race_name) AS count
FROM
    races AS rc
    
    INNER JOIN circuits AS c
    ON rc.circuitId = c.circuitId

GROUP BY year
ORDER BY year
 * mssql+pyodbc://@DESKTOP-8RI327A\SQLEXPRESS/F1?driver=SQL+Server
Done.
Returning data to local variable org_race
df_org_race = org_race.DataFrame()
df_org_race['year'] = df_org_race['year'].astype(str)
df_org_race.head()
year count
0 1950 7
1 1951 8
2 1952 8
3 1953 9
4 1954 9
fig = plt.figure(figsize = b_figsize)
plt.title('Number F1 races organized for each season')
sns.lineplot(data=df_org_race, x='year', y='count',color=palette[1])
plt.xlabel('Season')
plt.ylabel('Count of races')
plt.xticks(rotation=90)
plt.show()

png

Description: Here we can see that the 2021 season had the most races which is 23. Initally in 1950, there were less than 10 races held. One key feature is that there is a consistent amount of races between 1984 and 1994 which is 16 races. It seems that the 16-race calendar was favoured by the FIA during that period.

From the previous analysis of the driver points, we can see that the drivers that obtained the most overall points are the drivers that are from the later season of F1 (from 2010 and onwards). This somehow makes sense since during the later seasons, there are more races per season in comparison to the earlier seasons. This means that the drivers are able to obtain more points since there are more races.