Do you have kids of various ages and would like a movie recommendation? Well, I have some specifics for today, but I also wanted to talk a bit about how I got the answers I wanted. This was very much written as I went along and I think it shows, but hopefully the tone will show that this isn't a particularly difficult thing to go after.
Wikipedia has some nifty pages, for example 1980 in film, that include the highest-grossing films of the year (in the U.S., at least).
I would love to cross-reference that with the Motion Picture Rating (MPA) - the "PG-13" or whatever, to act as a quick reminder as to whether there's something in the films I'm not remembering that I should watch out for or as a quick filter. I'm pretty sure I've seen multiple pages discussing this, and I also see there's a sort-of-structured box in many articles, so I went looking around to see if there was any way to access this.
And indeed there is, and you don't even need any specialized tooling!
For example, Mission: Impossible 2 has a Tools links to https://www.wikidata.org/wiki/Q505790, the wikidata item. There are a number of statements about the item on the page, which you can browse to get a sense of what relationships you might find, and you can look for the MPA film rating item, which is property P1657, which in this case has value PG-13 aka Q18665339.
OK, so in addition to this particular property, what am I likely going to care about? Again, just browsing the MI:2 page, these stand out.
Now, looking around, there's a query builder so it looks like I might not even need to learn SPARQL.
I can add three conditions: one for instance of, and two for publicate date with end-of-year and beginning-of-year to narrow that down. I don't yet know how to get the top grossing films, but I can filter by values easily with an additional condition.
The gets me a flat list like this.
OK, so it looks like I'm going to have to learn at least some SPARQL to get the box office values and the ratings. Time to read A gentle introduction to the Wikidata Query Service.
Well then, let's do this bit by bit in https://query.wikidata.org/. First let's get a bunch of data.
SELECT ?film ?filmLabel ?MPA_film_rating ?MPA_film_ratingLabel ?box_office ?publication_date
WHERE
{
?film wdt:P31 wd:Q11424;
wdt:P577 ?publication_date;
p:P2142 ?box_office;
wdt:P1657 ?MPA_film_rating .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
LIMIT 100
Dang, I'm getting duplicated because there are multiple box office location values. Looks like I can make "predicate qualifiers" for those, let's start with something simple like valid in place (https://www.wikidata.org/wiki/Property:P3005) being USA (https://www.wikidata.org/wiki/Q30)
; now get the USA box office only
SELECT ?film ?filmLabel ?MPA_film_rating ?MPA_film_ratingLabel ?box_office ?publication_date
WHERE
{
?film wdt:P31 wd:Q11424;
wdt:P577 ?publication_date;
p:P2142 ?box_office_statement;
wdt:P1657 ?MPA_film_rating .
?box_office_statement ps:P2142 ?box_office;
pq:P3005 wd:Q30 .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
LIMIT 100
OK, similar deal for publication date, there is a 'place of publication' P291 I can use.
SELECT ?film ?filmLabel ?MPA_film_rating ?MPA_film_ratingLabel ?box_office ?publication_date
WHERE
{
?film wdt:P31 wd:Q11424;
p:P577 ?publication_statement;
p:P2142 ?box_office_statement;
wdt:P1657 ?MPA_film_rating .
?box_office_statement ps:P2142 ?box_office;
pq:P3005 wd:Q30 .
?publication_statement ps:P577 ?publication_date;
pq:P291 wd:Q30 .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
LIMIT 100
Nifty, now let's start filtering again.
SELECT ?film ?filmLabel ?MPA_film_rating ?MPA_film_ratingLabel ?box_office ?publication_date
WHERE
{
?film wdt:P31 wd:Q11424;
p:P577 ?publication_statement;
p:P2142 ?box_office_statement;
wdt:P1657 ?MPA_film_rating .
?box_office_statement ps:P2142 ?box_office;
pq:P3005 wd:Q30 .
?publication_statement ps:P577 ?publication_date;
pq:P291 wd:Q30 .
FILTER("2015-01-01"^^xsd:dateTime <= ?publication_date && ?publication_date < "2016-01-01"^^xsd:dateTime).
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
LIMIT 100
For this particular year, the results look decent and I can simply sort in the UI. I'm still getting one duplicate film because, well, it has multiple publication dates, but I can live with that.
Two more quick tweaks: let's drop the MPA item and just keep the label, and remove R-rated movies.
SELECT ?film ?filmLabel ?MPA_film_ratingLabel ?box_office ?publication_date
WHERE
{
?film wdt:P31 wd:Q11424;
p:P577 ?publication_statement;
p:P2142 ?box_office_statement;
wdt:P1657 ?MPA_film_rating .
?box_office_statement ps:P2142 ?box_office;
pq:P3005 wd:Q30 .
?publication_statement ps:P577 ?publication_date;
pq:P291 wd:Q30 .
FILTER("2014-01-01"^^xsd:dateTime <= ?publication_date && ?publication_date < "2015-01-01"^^xsd:dateTime).
FILTER(?MPA_film_rating != wd:Q18665344).
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
LIMIT 100
Happy movie watching!
Tags: media