If you haven’t already done so, you’ll need to first setup your Flink development environment. See How to do the exercises for an overall introduction to these exercises.

The task of the “Taxi Ride Cleansing” exercise is to cleanse a stream of TaxiRide events by removing events that do not start or end in New York City.

The GeoUtils utility class provides a static method isInNYC(float lon, float lat) to check if a location is within the NYC area.

Input Data

This series of exercises is based a stream of taxi ride events. The Taxi Data Stream instructions show how to setup the TaxiRideSource which generates a stream of TaxiRide events.

Expected Output

The result of the exercise should be a DataStream<TaxiRide> that only contains events of taxi rides which start and end in the New York City area as defined by GeoUtils.isInNYC().

The resulting stream should be printed to standard out.

Implementation Hints

The exercise program starts with a TaxiRideSource and requires a single transformation to filter all events that do not start and end within the New York City area.
The DataStream<TaxiRide> is generated using the TaxiRideSource as described in the Taxi Data Stream instructions.
Flink’s DataStream API features a DataStream.filter(FilterFunction) transformation to filter events from a data stream. The GeoUtils.isInNYC() function can be called within a FilterFunction to check if a location is in the New York City area.

Documentation

Reference Solution

Reference solutions are available at GitHub: