Data: ETL with Pokémon

adam lee
3 min read · May 29, 2024

In this article, we’ll walk through the process of extracting data from the Pokémon API and converting it to a CSV file using Python.

Introduction.

The Pokémon API (PokeAPI) is a comprehensive resource for obtaining data about Pokémon. Extracting this data can be useful for analysis, building applications, or just satisfying curiosity. We’ll use Python to interact with the API, extract data, and convert it into a CSV format for easy manipulation and analysis.

Project Overview.

For this project, we’ll pull data from the Pokémon API. The data is returned in JSON format; we’ll parse it and load it into a CSV file. We’ll store the CSV file in a folder called “output”, and the filename will be “output” followed by the date and time. For example, a file would be named output_2024-05-24_13-05-50.csv
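As a small illustration of that naming scheme, here is a sketch that builds the path from a fixed date and time. The function name `output_path` is hypothetical and used only for this example.

```python
from datetime import datetime
import os


def output_path(moment, output_dir="output"):
    """Return the CSV path for a given datetime (hypothetical helper)."""
    stamp = moment.strftime("%Y-%m-%d_%H-%M-%S")
    return os.path.join(output_dir, f"output_{stamp}.csv")


# A fixed timestamp keeps the example reproducible.
example = output_path(datetime(2024, 5, 24, 13, 5, 50))
print(os.path.basename(example))
```

Because the timestamp goes down to the second, two runs started at least a second apart get different filenames.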

Prerequisites.

Before we start, make sure you have Python installed on your machine. You’ll also need the following libraries:

  • requests for making API calls
  • csv for writing data to CSV files

The csv module ships with Python’s standard library, so only requests needs to be installed, using pip:

pip install requests

Creating the script.

First, we’ll create a directory to hold our CSV file:

import os

# Create the output directory if it doesn't exist
output_dir = 'output'
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

Next, we’ll use the PokéAPI to fetch data. The API provides detailed information about each Pokémon. We’ll start by fetching a list of Pokémon, limiting it to 10 entries by passing the query parameter ?limit=10 in the request URL.

import requests

# URL of the Pokémon API
url = "https://pokeapi.co/api/v2/pokemon?limit=10"

# Make a GET request to fetch the raw JSON data
response = requests.get(url)
if response.status_code != 200:
    raise Exception("Error fetching data from the API")

# Parse the JSON response
data = response.json()

# Extract the results list which contains the Pokémon data
pokemon_list = data['results']
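The list endpoint returns results one page at a time. As a rough sketch of how more than one page could be collected, the helper below follows the `next` link that each list response carries until it has enough entries. The names `build_list_url` and `collect_pokemon_list` and the `fetch_json` parameter are illustrative and not part of the original script. `fetch_json` is assumed to be any callable that takes a URL and returns the parsed JSON, for example `lambda u: requests.get(u, timeout=10).json()`. The demonstration at the bottom uses hand-written fake pages so it runs without network access.

```python
from urllib.parse import urlencode

BASE_URL = "https://pokeapi.co/api/v2/pokemon"


def build_list_url(limit=10, offset=0):
    """Build the list URL with limit/offset query parameters."""
    return f"{BASE_URL}?{urlencode({'limit': limit, 'offset': offset})}"


def collect_pokemon_list(fetch_json, total, page_size=10):
    """Follow the 'next' links until `total` entries are collected.

    fetch_json is a hypothetical callable: URL in, parsed JSON dict out.
    """
    collected = []
    url = build_list_url(limit=page_size)
    while url and len(collected) < total:
        page = fetch_json(url)
        collected.extend(page["results"])
        url = page.get("next")  # None on the last page
    return collected[:total]


# Offline demonstration: two fake pages stand in for the real API.
fake_pages = {
    build_list_url(limit=2): {
        "results": [{"name": "bulbasaur"}, {"name": "ivysaur"}],
        "next": build_list_url(limit=2, offset=2),
    },
    build_list_url(limit=2, offset=2): {
        "results": [{"name": "venusaur"}, {"name": "charmander"}],
        "next": None,
    },
}
pages = collect_pokemon_list(fake_pages.__getitem__, total=3, page_size=2)
names = [p["name"] for p in pages]
print(names)
```

Passing the fetch function in as an argument keeps the paging logic separate from the HTTP call, which makes it easy to try out offline.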

The list response only gives us each Pokémon’s name and a URL, so for each Pokémon we need to fetch detailed data from that URL. This includes the Pokémon’s name, ID, height, weight, base experience, and types. We’ll collect these fields in a list of dictionaries, which we’ll later write out in CSV (Comma-Separated Values) format.

# Create a list to store the Pokémon data
pokemon_data = []

# Fetch detailed data for each Pokémon
for pokemon in pokemon_list:
    pokemon_details_response = requests.get(pokemon['url'])
    if pokemon_details_response.status_code != 200:
        continue  # Skip this Pokémon if there's an issue with the request

    pokemon_details = pokemon_details_response.json()
    pokemon_data.append({
        "name": pokemon['name'],
        "id": pokemon_details['id'],
        "height": pokemon_details['height'],
        "weight": pokemon_details['weight'],
        "base_experience": pokemon_details['base_experience'],
        "types": ", ".join([t['type']['name'] for t in pokemon_details['types']])
    })
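The field mapping in the loop above is the transform step of the pipeline. As a minimal sketch, it could be pulled into a small function so it can be tried on a sample record without calling the API. The name `to_row` and the sample record are hypothetical. Using `.get`, so that a missing field becomes an empty value instead of raising `KeyError`, is a design choice of this sketch and not something the original script does.

```python
def to_row(details):
    """Flatten one Pokémon detail record into a CSV-ready dict."""
    return {
        "name": details.get("name"),
        "id": details.get("id"),
        "height": details.get("height"),
        "weight": details.get("weight"),
        "base_experience": details.get("base_experience"),
        # Join the nested type names into a single comma-separated string.
        "types": ", ".join(t["type"]["name"] for t in details.get("types", [])),
    }


# Hand-written sample shaped like the fields the loop reads;
# the values are illustrative, not fetched from the API.
sample = {
    "name": "examplemon",
    "id": 1,
    "height": 7,
    "weight": 69,
    "types": [
        {"slot": 1, "type": {"name": "grass"}},
        {"slot": 2, "type": {"name": "poison"}},
    ],
}
row = to_row(sample)
print(row)
```

The sample leaves out `base_experience` on purpose: the resulting value is `None`, which `csv.DictWriter` writes as an empty cell.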

Now that we have our Pokémon data, we can export it to a CSV file. We’ll include the current date and time in the filename to keep our files unique and organized:

from datetime import datetime
import csv

# Get the current date and time
current_time = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

# Specify the CSV file to write the data to
csv_file = os.path.join(output_dir, f'output_{current_time}.csv')

# Specify the header for the CSV file
fieldnames = ["name", "id", "height", "weight", "base_experience", "types"]

# Write data to CSV
with open(csv_file, mode='w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)

    # Write the header
    writer.writeheader()

    # Write the Pokémon data
    for pokemon in pokemon_data:
        writer.writerow(pokemon)

print(f"Data has been successfully exported to {csv_file}")
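One way to sanity-check the load step is to read the file back with `csv.DictReader`. The sketch below writes a single hand-made row to a temporary directory and reads it back. The row values and the filename `output_check.csv` are made up for illustration, and the explicit `encoding="utf-8"` is an assumption of this sketch, not part of the script above.

```python
import csv
import os
import tempfile

fieldnames = ["name", "id", "height", "weight", "base_experience", "types"]
rows = [
    {"name": "examplemon", "id": 1, "height": 7, "weight": 69,
     "base_experience": 64, "types": "grass, poison"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output_check.csv")

    # Write the header and rows, as in the export step.
    with open(path, mode="w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Read the file back into a list of dicts keyed by the header.
    with open(path, newline="", encoding="utf-8") as f:
        read_back = list(csv.DictReader(f))

print(read_back)
```

Two things are worth noting when reading the data back: every value comes back as a string, so numeric columns need converting before analysis, and the types column survives intact even though it contains a comma, because the csv module quotes such fields when writing.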

This script can be easily modified to fetch more Pokémon or include additional details. The repository for your reference is listed below:

Conclusion.

This is just a small sample use case of an ETL project using the Pokémon API. In a nutshell, we make a request to an API to extract information, which is usually returned in JSON format. From there, we transform the fields we need and load them into a CSV file for further analysis.

adam lee

I'm a martial artist and software engineer. I enjoy writing about Martial Arts, Personal Development, Technology, and Travel.