Python — Football World Cup Analytics
Web Scraping With Python BeautifulSoup & Pandas

The 2022 FIFA World Cup in Qatar is in its final days. We all know that Brazil has won the most World Cups (5), followed by Germany and Italy (4 each).
But do you know which nation has played the most World Cup matches or scored the most World Cup goals? Or which nations are the most and least successful in terms of wins and losses?
Well, I don’t, so let’s find out together. Welcome to Python Data Science December #14. We will use Python BeautifulSoup & Pandas to scrape & store the Wikipedia World Cup pages.
This story will be continued as part of my Python Data Science December series. All resources, datasets, required Python libraries & installations are listed at the end of the story, in the chapter Summary & Resources.
⚽️ Scraping One World Cup
To awaken childhood memories, I decided to start with the 2002 FIFA World Cup in Japan & South Korea. I watched every single game of this tournament when I was a teenager.
Let’s take a first look at the page. It seems Wikipedia has grouped the matches into two sections: the Group stage and the Knockout stage.
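To check that grouping programmatically, here is a minimal sketch that downloads the page and lists its section headings with BeautifulSoup. The URL, the User-Agent string, and the helper names `fetch_html` and `list_headings` are my own assumptions for illustration. It also assumes that the Group stage and Knockout stage appear as second-level (`h2`) headings, which may change whenever Wikipedia updates its markup.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Assumed URL of the English Wikipedia page for the 2002 tournament.
URL = "https://en.wikipedia.org/wiki/2002_FIFA_World_Cup"


def fetch_html(url: str) -> str:
    """Download a page and return its HTML as text."""
    # Placeholder User-Agent; replace it with something that identifies you.
    request = Request(url, headers={"User-Agent": "world-cup-tutorial/0.1"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


def list_headings(html: str, level: str = "h2") -> list[str]:
    """Return the text of all headings of the given level, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    return [heading.get_text(strip=True) for heading in soup.find_all(level)]


if __name__ == "__main__":
    # Print every top-level section title of the page.
    for title in list_headings(fetch_html(URL)):
        print(title)
```

If the two stages show up in the printed list, we know which sections to target when we start extracting the matches.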