Challenges on a Real-world Dataset

When I received the dataset from my community partner, I did not expect the files to be so large (around 0.5 GB in total). They were also not structured in a way that a plotting library such as D3 could use directly. Another challenge was that users’ locations were recorded as city and state, so I had to convert them into latitude and longitude using a geocoding service. Additionally, the data was stored in JSON format with each user in a separate file, so to make use of the users’ metadata I first had to merge all of these files into a single file. Fortunately, I found jq, a tool for processing JSON files on the command line, and used it to do the merge. I then used jq filters together with regular expressions to remove null values and to create categories from the users’ metadata.
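For readers who prefer a script to the command line, here is a minimal Python sketch of the same merge-and-clean idea. It is not the jq pipeline I actually ran. It assumes each user file holds a single JSON object, and the function name `merge_user_files` and the directory layout are made up for illustration.

```python
import json
from pathlib import Path


def merge_user_files(input_dir, output_path):
    """Combine every *.json file in input_dir into one JSON array.

    Assumption: each file contains exactly one JSON object (one user).
    """
    users = []
    # sorted() gives a stable order and lists the files before anything is written.
    for path in sorted(Path(input_dir).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            record = json.load(f)
        # Drop top-level fields whose value is null (None in Python).
        users.append({key: value for key, value in record.items() if value is not None})
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(users, f)
    return users
```

Calling `merge_user_files("users", "merged.json")` would write one array containing every user, which is a shape that is much easier to load into D3 than hundreds of separate files. Only top-level nulls are removed here; nested nulls would need a recursive version.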
