The goal of the “Long Ride Alerts” exercise is to flag any taxi ride that started two hours ago and is still ongoing.
The input data of this exercise is a DataStream of taxi ride events. You will want to use a TaxiRideSource, as described in the page about the Taxi Data Stream.
You can filter the events to only include rides within New York City (as is done in the Taxi Ride Cleansing exercise), but it’s not essential.
The result of the exercise should be a DataStream<TaxiRide> that contains only the START events of taxi rides that have no matching END event within the first two hours of the ride.
The resulting stream should be printed to standard out.
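To make the matching rule concrete, here is a minimal plain-Java sketch. It is a batch-style illustration only, not the Flink solution and not the reference implementation. The names LongRideSketch, RideEvent, and longRides are hypothetical stand-ins. The sketch also assumes that an END arriving exactly at the two-hour mark still counts as "within the first two hours", and that a ride is reported only once the observed time has reached its two-hour deadline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LongRideSketch {
    static final long TWO_HOURS_MS = 2L * 60 * 60 * 1000;

    /** Hypothetical stand-in for TaxiRide: only the fields this sketch needs. */
    static final class RideEvent {
        final long rideId;
        final boolean isStart;
        final long timestampMs;

        RideEvent(long rideId, boolean isStart, long timestampMs) {
            this.rideId = rideId;
            this.isStart = isStart;
            this.timestampMs = timestampMs;
        }
    }

    /**
     * Returns the rideIds whose START has no END within two hours.
     * nowMs plays the role of the current event time: a ride is only
     * reported once nowMs has reached start + two hours.
     */
    static List<Long> longRides(List<RideEvent> events, long nowMs) {
        // Remember START times in arrival order, END times by rideId.
        Map<Long, Long> startTimes = new LinkedHashMap<>();
        Map<Long, Long> endTimes = new HashMap<>();
        for (RideEvent e : events) {
            if (e.isStart) {
                startTimes.put(e.rideId, e.timestampMs);
            } else {
                endTimes.put(e.rideId, e.timestampMs);
            }
        }

        List<Long> alerts = new ArrayList<>();
        for (Map.Entry<Long, Long> start : startTimes.entrySet()) {
            long deadline = start.getValue() + TWO_HOURS_MS;
            Long end = endTimes.get(start.getKey());
            boolean endedInTime = end != null && end <= deadline;
            if (!endedInTime && nowMs >= deadline) {
                alerts.add(start.getKey());
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        List<RideEvent> events = Arrays.asList(
            new RideEvent(1, true, 0),             // ends after 30 minutes
            new RideEvent(2, true, 10 * minute),   // never ends
            new RideEvent(1, false, 30 * minute),
            new RideEvent(3, true, 20 * minute),   // ends too late (at 3h)
            new RideEvent(4, true, 120 * minute),  // deadline not reached yet
            new RideEvent(3, false, 180 * minute));
        System.out.println(longRides(events, 180 * minute)); // prints [2, 3]
    }
}
```

In a streaming job the same rule has to be applied incrementally per rideId, without seeing all events up front, and END events may arrive before their START. The sketch sidesteps both concerns by collecting everything first.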
Here are the rideIds and start times of the first few rides that go on for more than two hours, but you might want to print other info as well:
> 2758,2013-01-01 00:10:13
> 7575,2013-01-01 00:20:23
> 22131,2013-01-01 00:47:03
> 25473,2013-01-01 00:53:10
> 29907,2013-01-01 01:01:15
> 30796,2013-01-01 01:03:00
...
Reference solutions are available at GitHub: