### **Function Overview: Fetching and Saving Book Reviews**
**Purpose:**
- Fetch book reviews from a remote API using GraphQL.
- Save these reviews in a CSV file.
---
#### **1. Define `get_reviews(page, size)`**
- **What it Does:** Retrieves a batch of reviews.
- **Parameters:**
  - `page`: Opaque pagination cursor token marking where to resume fetching (not a numeric page index).
- `size`: Number of reviews per request.
- **API Request:**
- URL: `https://kxbwmqov6jgg3daaamb744ycu4.appsync-api.us-east-1.amazonaws.com/graphql`
- **Headers:** Include standard HTTP headers and `x-api-key` for authentication.
- **GraphQL Payload:** Requests reviews with details like `date`, `content`, and `rating`.
- **Response Handling:**
- Extracts reviews from the JSON response.
- Retrieves the `nextPageToken` for pagination.
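The response-handling step above can be sketched against a hypothetical decoded response (the field names match the GraphQL fragments used in the full script below; the sample values and the `parse_reviews` helper name are made up for illustration):

```python
# Sample payload shaped like the API's getReviews response (values invented).
sample = {
    "data": {
        "getReviews": {
            "edges": [
                {"node": {"createdAt": 1700000000000, "text": "Great read", "rating": 5}},
            ],
            "pageInfo": {"nextPageToken": "abc123"},
        }
    }
}

def parse_reviews(payload):
    """Extract (reviews, next_page_token) from a decoded getReviews response."""
    conn = payload["data"]["getReviews"]
    reviews = [
        {
            "date": edge["node"]["createdAt"],
            "content": edge["node"]["text"],
            "rating": edge["node"]["rating"],
        }
        for edge in conn["edges"]
    ]
    return reviews, conn["pageInfo"]["nextPageToken"]

reviews, next_page = parse_reviews(sample)
```

When `nextPageToken` comes back as JSON `null`, it decodes to Python `None`, which is what the pagination loop later uses as its stop signal.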
---
#### **2. Fetch and Save Reviews to a CSV File**
- **CSV Setup:** Create a file named `reviews.csv` with headers: `date`, `content`, and `rating`.
- **Pagination Loop:**
- Fetch reviews using `get_reviews()`.
- Write the reviews to the CSV file in batches.
- Continue fetching until:
- A specified number of reviews (e.g., 40,000) is reached, OR
- No more pages are available (`nextPageToken` is `None`).
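The loop described above can be sketched with a stub in place of the real API call, writing to an in-memory buffer instead of a file (the `fake_get_reviews` helper and its two-page data are invented for illustration):

```python
import csv
import io

def fake_get_reviews(page, size):
    # Stand-in for the real API call: two pages of data, then no token.
    pages = {
        None: ([{"date": "2024-01-01", "content": "ok", "rating": 4}] * size, "p2"),
        "p2": ([{"date": "2024-01-02", "content": "fine", "rating": 3}] * size, None),
    }
    return pages[page]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["date", "content", "rating"])
writer.writeheader()

fetched, next_page, max_reviews = 0, None, 40000
while fetched < max_reviews:
    reviews, next_page = fake_get_reviews(next_page, 2)
    writer.writerows(reviews)       # write the batch as it arrives
    fetched += len(reviews)
    if not next_page:               # no more pages available
        break
```

Writing each batch as it arrives keeps memory use flat no matter how many reviews are fetched in total.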
---
### **How It All Works**
- The `get_reviews()` function handles the API interaction.
- A loop ensures multiple pages of reviews are fetched and written to the CSV file.
- The process stops once all reviews are retrieved or the limit is reached.
```python
import os
import requests
import csv
from dotenv import load_dotenv
load_dotenv()
API_URL = os.getenv("GOODREADS_API_URL")
API_KEY = os.getenv("GOODREADS_API_KEY")
if not API_URL or not API_KEY:
    raise ValueError("Missing required environment variables: GOODREADS_API_URL and/or GOODREADS_API_KEY")

# Fetch one page of reviews; returns (reviews, next_page_token)
def get_reviews(page, size):
    headers = {
        "accept": "*/*",
        "accept-language": "en-US,en;q=0.9",
        "cache-control": "no-cache",
        "content-type": "application/json",
        "dnt": "1",
        "origin": "https://www.goodreads.com",
        "pragma": "no-cache",
        "referer": "https://www.goodreads.com/",
        "sec-ch-ua": '"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"',
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": '"macOS"',
        "sec-fetch-dest": "empty",
        "sec-fetch-mode": "cors",
        "sec-fetch-site": "cross-site",
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
        "x-api-key": API_KEY,
    }
    data = {
        "operationName": "getReviews",
        "variables": {
            "filters": {
                "resourceType": "WORK",
                "resourceId": "kca://work/amzn1.gr.work.v1.RFggOR2DSfOMBgpwAAwCqQ"
            },
            "pagination": {
                "after": page,
                "limit": size
            }
        },
        "query": """query getReviews($filters: BookReviewsFilterInput!, $pagination: PaginationInput) {
            getReviews(filters: $filters, pagination: $pagination) {
                ...BookReviewsFragment
                __typename
            }
        }
        fragment BookReviewsFragment on BookReviewsConnection {
            totalCount
            edges {
                node {
                    ...ReviewCardFragment
                    __typename
                }
                __typename
            }
            pageInfo {
                prevPageToken
                nextPageToken
                __typename
            }
            __typename
        }
        fragment ReviewCardFragment on Review {
            __typename
            id
            createdAt
            text
            rating
        }
        """
    }
    response = requests.post(API_URL, headers=headers, json=data, timeout=30)
    if response.status_code != 200:
        raise Exception(f"API call failed with status code {response.status_code}: {response.text}")

    result = response.json()  # separate name so the request payload `data` is not shadowed
    reviews = []
    for review_raw in result['data']['getReviews']['edges']:
        reviews.append({
            'date': review_raw['node']['createdAt'],
            'content': review_raw['node']['text'],
            'rating': review_raw['node']['rating'],
        })
    next_page = result['data']['getReviews']['pageInfo']['nextPageToken']
    return reviews, next_page

# Parameters
size = 100                                # reviews per request
next_page = "MTcyMDIsMTcyMzczMDE1OTAwOA"  # starting pagination token
max_reviews = 40000
output_file = "reviews.csv"

# Write to CSV
with open(output_file, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=["date", "content", "rating"])
    writer.writeheader()  # Write column headers
    fetched_reviews = 0
    while fetched_reviews < max_reviews:
        reviews, next_page = get_reviews(next_page, size)
        writer.writerows(reviews)  # Write batch of reviews to CSV
        fetched_reviews += len(reviews)
        print(f"Fetched {fetched_reviews} reviews so far...")
        if not next_page:  # No more pages to fetch
            break

print(f"Finished! {fetched_reviews} reviews saved to {output_file}.")
```
**Running the script:**
![[Screenshot 2024-11-21 at 11.36.40 PM.png]]