Counting Go files
As a gopher, my first reaction was to check how many Go files are in that dataset. My SQL is not amazing, but I’m able to do that!
SELECT COUNT(*) FROM [bigquery-public-data:github_repos.files] WHERE RIGHT(path, 3) = '.go'
Running that query, I see that there are more than 12 million files with a .go extension in the dataset. That’s a lot! But wait... I just ran that query on TWO BILLION ROWS and it finished in 6 seconds? Wow! 😮
Ok, so that’s awesome! But I also processed 105GB, and the cost of a query is proportional to the amount of data it reads. Even though the first TB per month is free, it’s probably a good idea to create a new dataset and a new table containing just the files with a .go extension, so that later queries read far less data and cost less.
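Here is a rough sketch of how that smaller table could be created. It uses BigQuery standard SQL rather than the legacy SQL dialect of the query above, so the table reference is written with backticks and a dot instead of square brackets and a colon. The names my_dataset and go_files are hypothetical placeholders, and the sketch assumes a dataset called my_dataset already exists in your own project. Keep in mind that this query still has to read the full source table once.

```sql
-- Sketch only: my_dataset and go_files are made-up names, pick your own.
-- Assumes the dataset my_dataset already exists in your default project.
CREATE TABLE my_dataset.go_files AS
SELECT *
FROM `bigquery-public-data.github_repos.files`
-- ENDS_WITH plays the same role as RIGHT(path, 3) = '.go' in the legacy query.
WHERE ENDS_WITH(path, '.go');
```

After that, counting Go files becomes SELECT COUNT(*) FROM my_dataset.go_files, which only touches the rows that were copied over.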