NYC Dept of Health’s Covid Dataset - Omicron Update
Introduction and Timeline
It has been over a year since I wrote my original post exploring the NYC Covid dataset. I have periodically rerun my notebook to see the updated data, and yes, there have been some changes since last year. I wanted to explore the dataset again because Omicron is all over the news and because I personally tested positive despite being vaccinated, wearing a mask in public, and being as cautious as possible. Maybe I didn't exercise enough caution, but here I am under quarantine, looking at the data again. This is a continuation of my previous post on Covid; you can check out that page here for background information on the data itself.
Data Source
The data sources are the same as before. I will explore the following datasets:
- data-by-day looks at daily cases, hospitalizations and deaths across the five boroughs
- tests looks at the testing that has been done across the city as well as positive test results (cases)
- data-by-modzcta looks at all the data by zip code
import pandas as pd
import requests
import json
# Git Repo URL
url="https://github.com/nychealth/coronavirus-data/raw/master"
# import data
daily_data = f'{url}/trends/data-by-day.csv'
testing_data = f'{url}/trends/tests.csv'
zipcode_data = f'{url}/totals/data-by-modzcta.csv'
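The variables above only hold URLs, so here is a minimal sketch of how one of these files could be read into pandas. The helper name load_trend_csv and the column name date_of_interest are my assumptions for illustration (check the header of the actual file), and the sample rows are made up so the snippet runs without a network connection.

```python
import io

import pandas as pd


def load_trend_csv(source, date_column="date_of_interest"):
    """Read a trend CSV and parse its date column.

    `source` can be a URL, a file path, or a file-like object.
    The default column name is an assumption, not confirmed by this post.
    """
    frame = pd.read_csv(source)
    frame[date_column] = pd.to_datetime(frame[date_column])
    # Sort by date so that the first and last rows give the date range
    return frame.sort_values(date_column).reset_index(drop=True)


# Made-up sample standing in for data-by-day.csv (values are not real data)
sample_csv = io.StringIO(
    "date_of_interest,CASE_COUNT,HOSP_COUNT,DEATH_COUNT\n"
    "03/01/2020,1,1,0\n"
    "02/29/2020,1,1,0\n"
)
sample = load_trend_csv(sample_csv)
print(sample["date_of_interest"].iloc[0], "to", sample["date_of_interest"].iloc[-1])
```

Because pd.read_csv accepts URLs, calling load_trend_csv(daily_data) should work the same way against the repo, assuming the column name matches.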
Cases, Hospitalizations, Deaths
Again, we will look at total cases, hospitalizations and deaths. This time around I did not do a breakdown by borough. The data was put into lists in order to plot it, and from the lists we can see that the date range is 02/29/2020 to 12/14/2021. Again, I checked the date range and the peak number for each figure I was interested in (cases, hospitalizations, deaths).
# Date range
print("The date range is",dates[0],"to",dates[-1] )
# Peak number of cases
print("For cases...")
peakdates(cases,dates)
# Peak # of hospitalizations
print("For hospitalizations...")
peakdates(hospitalizations,dates)
# Peak # of deaths
print("For deaths...")
peakdates(deaths,dates)
The date range is 2020-02-29 00:00:00 to 2021-12-14 00:00:00
For cases...
The peak occurred on 2021-01-04 and had a count of 6593
For hospitalizations...
The peak occurred on 2020-03-30 and had a count of 1848
For deaths...
The peak occurred on 2020-04-07 and had a count of 599
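The peakdates helper is not shown in this post, so below is a hypothetical version that would produce output in the same shape as above. It assumes values is a plain Python list of counts and dates is a list of pandas Timestamps in the same order; the example numbers are made up.

```python
import pandas as pd


def peakdates(values, dates):
    """Print and return the date and size of the largest value.

    Hypothetical reconstruction: the original helper may differ.
    """
    peak_value = max(values)
    peak_index = values.index(peak_value)  # first occurrence if there are ties
    peak_day = dates[peak_index].date()    # drop the 00:00:00 time part
    print("The peak occurred on", peak_day, "and had a count of", peak_value)
    return peak_day, peak_value


# Made-up example data, not taken from the NYC dataset
example_dates = list(pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"]))
example_counts = [3, 9, 4]
peak_day, peak_value = peakdates(example_counts, example_dates)
```

Returning the pair as well as printing it makes the helper easier to reuse when annotating plots.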
Covid Cases
This is where we see our first major difference from a year ago. When I ran this in Nov 2020, the peak number of cases had occurred on April 6th 2020 (6353), right around the time the first Covid wave hit NYC and we were a month into lockdown.
Looking at the updated data, the new peak occurred on January 4th 2021 (6593) during the second wave. After that the number of cases dropped, but in July of 2021 you begin to see a rise in cases again, during the period when the Delta variant started to spread in NYC. This was just a small uptick, and case numbers were dropping into November of 2021. The drop was short-lived, as the numbers rose again, including the recent surge this past week. As of December 14th cases reached 6072, nearing the previous peak. It will be interesting to see where the numbers go next week.
We also see that when looking at the last 30 days, case numbers are trending upwards.
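One way to put a number on "trending upwards" is to fit a straight line to the last 30 daily values and look at its slope. This is only a sketch of that idea, not necessarily how the plots in this post were made; the function name last_n_days_slope and the sample series are made up for illustration.

```python
import numpy as np
import pandas as pd


def last_n_days_slope(series, n_days=30):
    """Fit a straight line to the last n_days values; return change per day."""
    recent = series.tail(n_days).to_numpy(dtype=float)
    day_index = np.arange(len(recent))
    # A degree-1 polyfit returns [slope, intercept]
    slope, _intercept = np.polyfit(day_index, recent, 1)
    return slope


# 60 made-up daily values rising by exactly 1 per day
illustrative = pd.Series(range(100, 160))
slope = last_n_days_slope(illustrative)
print("Average change per day over the last 30 days:", round(slope, 2))
```

A positive slope means the counts are rising over the window and a negative slope means they are falling. The same helper can be pointed at hospitalizations or deaths.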
Hospitalizations
When looking at hospitalizations, the data shows us some good news. While there have been some small increases in the number of Covid-related hospitalizations, they have never risen to the level we had on March 30th 2020 (1848), at the beginning of the pandemic. I believe the main reason hospitalizations never returned to March 2020 levels is that we know more about the virus now than we did back then. At the start of the pandemic, if you were suspected of having Covid, the only place to really go and get tested or treated was the hospital. Going to the hospital also meant you were immediately placed under a 14-day quarantine, as not much was known about the virus. In addition, I unfortunately believe those who were most vulnerable were hit hard at the start of the pandemic. NYC public schools closed on March 16th 2020, bars and restaurants closed on March 17th 2020, and the statewide Pause program (all non-essential workers must stay home) began on March 22nd 2020. Masks only became mandatory on April 15th 2020. The individuals who were hospitalized during the peak most likely came in contact with the virus before preventative measures such as wearing masks were in place.
The peak occurred on 2020-03-30 and had a count of 1835
For the past 30 days it looks like there was an increase in hospitalizations, with a significant drop-off over the last few days. Thankfully, although numbers were increasing, they were nowhere near the March 2020 numbers.
Covid Deaths
Like hospitalizations, I'm happy to report that the numbers never returned to the previous peak.
The peak occurred on 2020-04-07 and had a count of 598
There has been a slight increase when looking at the 30 day trend.
7 Day Averages
Here is a look at the 7-day averages for cases, hospitalizations, and deaths together on one graph. While we do see that cases rose during the second wave as well as during the onset of both Delta and Omicron, hospitalizations and deaths have thankfully not had the same resurgences.
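For anyone who wants to compute a 7-day average themselves, here is a minimal sketch using a pandas rolling window. The dataset may already ship with precomputed average columns, so treat this as an illustration of the calculation only; the function name and the sample counts are made up.

```python
import pandas as pd


def seven_day_average(series):
    """Trailing 7-day mean; the first 6 entries are NaN (not enough history)."""
    return series.rolling(window=7).mean()


# Made-up daily counts, not real data
counts = pd.Series([7, 14, 21, 28, 35, 42, 49, 56])
avg = seven_day_average(counts)
print(avg.tolist())
```

Averaging over a full week smooths out the weekday and weekend reporting swings, which is why the averaged curves are easier to compare on one graph than the raw daily counts.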
Testing
Testing has continued to increase. Previously the peak number of tests conducted occurred on Nov 16th 2020 (71,626). Testing rose through 2020 and began to decline in January 2021 as the vaccines became more available and the focus shifted from testing to vaccinations. That pattern of declining testing changed in July of 2021, right around the time the Delta variant made its appearance, and even more so now with the emergence of Omicron. The current peak occurred on Dec 6th 2021 (102,709).
The peak occurred on 2020-11-16 and had a count of 71626
The peak occurred on 2021-12-06 and had a count of 102709
Positive Test Results
I also looked at positive test results. Testing has increased significantly since the start of the pandemic, so we should look not just at the number of tests administered but at how many yielded positive results. Initially, the highest number of positive cases occurred at the onset of the pandemic, specifically on Apr 6th 2020 (6,780 positive cases). We now see that the new peak occurred during the second wave, specifically on Jan 4th 2021 (8,079 positive cases).
The peak occurred on 2020-04-06 and had a count of 6780
The peak occurred on 2021-01-04 and had a count of 8079
Now we must also look at the percentage of positive cases, as it is not fair to look at just the number of positive cases when testing has increased so much. Initially we see that the peak percentage of positive cases occurred at the start of the pandemic, specifically on March 28th 2020, when 71.17% of tests yielded positive results. This high percentage is due to a bias caused by early testing, which primarily occurred in the emergency rooms of NYC hospitals that were only taking in individuals suspected of having Covid. I'm happy to report that the percentage of positive cases has NOT reached those early 2020 levels. We see that there were a few slight upticks during the second wave and the emergence of Delta, as well as what looks like an uptick now for Omicron. However, these upticks have yet to reach those 2020 levels.
The peak occurred on 2020-03-28 and had a count of 0.7117
The peak occurred on 2020-03-28 and had a count of 0.7117
Also we do see that when looking at the last 30 days, positive results are trending upwards.
The peak occurred on 2021-12-12 and had a count of 0.0735
The max positive result was 7.35 %
The max positive 7day avg result was 5.029999999999999 %
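The percent positive figure is just positive tests divided by total tests for each day. Below is a small sketch of that calculation with made-up numbers; the function name is my own for illustration. Formatting the value in the print statement also avoids floating point noise like the 5.029999999999999 above.

```python
import pandas as pd


def percent_positive(positive_tests, total_tests):
    """Share of tests that came back positive, as a fraction between 0 and 1."""
    return positive_tests / total_tests


# Made-up daily numbers, not real data
positives = pd.Series([5, 10, 15])
totals = pd.Series([100, 100, 200])

share = percent_positive(positives, totals)
peak_pct = round(share.max() * 100, 2)

# The .2% format multiplies by 100 and keeps two decimal places
print(f"The max positive result was {share.max():.2%}")
```

In this example the third day has the most positive tests (15) but not the highest share, because twice as many tests were run that day. That is the reason to look at the percentage and not only the raw count.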
Conclusion & Code
I will probably run this code again at a later date. If numbers increase in NYC, I will run an analysis sooner rather than later. You can check out the full Python code using the following methods:
- GitHub Page: Francisco’s Repository
- Google Colab: