2023 in Books
Setting up the data and some useful functions
In [1]:
%%capture
!pip install wordcloud
import json
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import re
import seaborn as sns
from IPython import display
from wordcloud import WordCloud
pd.set_option("display.max_rows", 5)
In [2]:
df = pd.read_csv('2023-books.csv')
df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])
df['first_published'] = pd.to_datetime(df['first_published'])
df
Out[2]:
| | name | slug | pages | start_date | end_date | first_published | tags |
|---|---|---|---|---|---|---|---|
| 0 | Zen Mind, Beginner's Mind: Informal talks on Z... | zen-mind-beginners-mind | 138 | 2023-01-01 | 2023-01-15 | 1970-06-01 | philosophy,buddhism,non-fiction |
| 1 | Tress of the Emerald Sea | tress-of-the-emerald-sea | 369 | 2023-01-16 | 2023-01-22 | 2023-01-10 | fantasy,fiction,cosmere |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 17 | Doble Cara | doble-cara | 416 | 2023-11-13 | 2023-12-08 | 2023-05-09 | politics,non-fiction |
| 18 | Another Now | another-now | 240 | 2023-12-09 | 2023-12-15 | 2020-09-10 | economics,politics,fiction |
19 rows × 7 columns
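To reproduce the loading step without the original file, here is a minimal sketch that parses two of the rows shown above from an inline string. It assumes `2023-books.csv` is a plain comma-separated file with a header row matching the table's columns and that the `tags` field is quoted (it contains commas). `parse_dates` is an equivalent way to convert the three date columns while reading.

```python
import io

import pandas as pd

# Two rows copied from the table above; tags are quoted because they contain commas.
csv_text = """name,slug,pages,start_date,end_date,first_published,tags
Tress of the Emerald Sea,tress-of-the-emerald-sea,369,2023-01-16,2023-01-22,2023-01-10,"fantasy,fiction,cosmere"
Another Now,another-now,240,2023-12-09,2023-12-15,2020-09-10,"economics,politics,fiction"
"""

# parse_dates converts the listed columns to datetime64 during parsing,
# replacing the three separate pd.to_datetime calls.
sample = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["start_date", "end_date", "first_published"],
)
print(sample.dtypes)
```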
In [3]:
def display_book(book, comment=''):
    # `book` is a one-row DataFrame (e.g. the result of .head(1)), so take the first slug
    return display.Markdown(f"""
![](https://egrajeda.com/images/{book['slug'].iloc[0]}.jpg)\n
{comment}
""")
Let's start with something easy: how many books I read this year...
🧮 How many books did I read this year?
In [4]:
display.Markdown("I read a total of {:,} books".format(len(df)))
Out[4]:
I read a total of 19 books
🧓 What was the oldest book I read this year?
In [5]:
oldest_book = df.sort_values(by=['first_published']).head(1)
display_book(oldest_book, "☝️ First published in {:%B %d, %Y}".format(oldest_book['first_published'].iloc[0]))
Out[5]:
☝️ First published in June 01, 1970
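Sorting the whole frame to keep one row works, but `idxmin()` returns the index label of the smallest value directly. A minimal sketch with three slugs and publication dates taken from the table above; `df.loc[[label]]` (double brackets) keeps the result as a one-row DataFrame, the shape `display_book` expects.

```python
import pandas as pd

books = pd.DataFrame({
    "slug": ["zen-mind-beginners-mind", "tress-of-the-emerald-sea", "another-now"],
    "first_published": pd.to_datetime(["1970-06-01", "2023-01-10", "2020-09-10"]),
})

# idxmin() gives the index label of the earliest date without sorting every row.
oldest_label = books["first_published"].idxmin()
# Double brackets return a one-row DataFrame instead of a Series.
oldest = books.loc[[oldest_label]]
print("First published in {:%B %d, %Y}".format(oldest["first_published"].iloc[0]))
```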
🤔 How many pages did I read this year?
In [6]:
total_pages = df['pages'].sum()
display.Markdown("A grand total of ✨ {:,} pages ✨".format(total_pages))
Out[6]:
A grand total of ✨ 7,205 pages ✨
Which were the shortest 🫣 and the longest 💪 books I read this year?
In [7]:
shortest_book = df.sort_values(by=['pages']).head(1)
longest_book = df.sort_values(by=['pages']).tail(1)
display_book(shortest_book, f"🫣 My shortest book was {shortest_book['pages'].iloc[0]} pages")
Out[7]:
🫣 My shortest book was 138 pages
In [8]:
display_book(longest_book, f"💪 My longest book was {longest_book['pages'].iloc[0]} pages")
Out[8]:
💪 My longest book was 880 pages
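The shortest and longest books can also be picked with `nsmallest` and `nlargest`, which return one-row DataFrames just like `.head(1)` and `.tail(1)` on a sorted frame, without sorting twice. A minimal sketch using the four page counts visible in the table above (so the "longest" here is only the longest of those four); with the default `keep='first'`, a tie goes to the row that appears first.

```python
import pandas as pd

books = pd.DataFrame({
    "slug": ["zen-mind-beginners-mind", "tress-of-the-emerald-sea", "doble-cara", "another-now"],
    "pages": [138, 369, 416, 240],
})

# Each call returns a one-row DataFrame, the same shape as .head(1) / .tail(1).
shortest = books.nsmallest(1, "pages")
longest = books.nlargest(1, "pages")
print(shortest["slug"].iloc[0], shortest["pages"].iloc[0])
print(longest["slug"].iloc[0], longest["pages"].iloc[0])
```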
📖 What was my average book length this year?
In [9]:
pages_mean = df['pages'].mean()
display.Markdown("My average book length was ✨ {:,.0f} pages ✨".format(pages_mean))
Out[9]:
My average book length was ✨ 379 pages ✨
Now let's try more interesting metrics 😃...
🏃 Which book did I read the fastest, and which one did I read the slowest?
In [10]:
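# Days between start and end (the end date is not counted as an extra day);
# clip to 1 so a book started and finished on the same date doesn't divide by zero
reading_days = (df['end_date'] - df['start_date']).dt.days.clip(lower=1)
df['pages_per_day'] = df['pages'] / reading_days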
slowest_book = df.sort_values(by=['pages_per_day']).head(1)
fastest_book = df.sort_values(by=['pages_per_day']).tail(1)
display_book(fastest_book, "🏃 Read {:,.0f} pages per day".format(fastest_book['pages_per_day'].iloc[0]))
Out[10]:
🏃 Read 79 pages per day
In [11]:
display_book(slowest_book, "🐢 Read {:,.0f} pages per day".format(slowest_book['pages_per_day'].iloc[0]))
Out[11]:
🐢 Read 9 pages per day
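Reading speed is pages divided by the number of days between `start_date` and `end_date`. Subtracting two timestamps gives the gap, so the end date is not counted as an extra day: *Tress of the Emerald Sea* (369 pages, 2023-01-16 to 2023-01-22) spans 6 days, or 61.5 pages per day. A minimal sketch of that calculation on the *Tress* and *Another Now* rows from the table above; clipping the gap to at least one day is an assumption added here so a same-day read does not divide by zero.

```python
import pandas as pd

books = pd.DataFrame({
    "slug": ["tress-of-the-emerald-sea", "another-now"],
    "pages": [369, 240],
    "start_date": pd.to_datetime(["2023-01-16", "2023-12-09"]),
    "end_date": pd.to_datetime(["2023-01-22", "2023-12-15"]),
})

# Timestamp subtraction yields Timedelta values; .dt.days is the whole-day gap.
days = (books["end_date"] - books["start_date"]).dt.days
# Assumption: count a same-day read as one day so the division never hits zero.
days = days.clip(lower=1)
books["pages_per_day"] = books["pages"] / days
print(books[["slug", "pages_per_day"]])
```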
☁️ What types of books did I read this year?
In [12]:
tags_count = df["tags"].str.split(',').explode("tags").value_counts()
tags_word_cloud = WordCloud(background_color="white").generate_from_frequencies(tags_count)
plt.imshow(tags_word_cloud, interpolation='bilinear')
plt.axis("off");