RESEARCH QUESTION AND HYPOTHESIS
This study aims to identify twin trips in a city, i.e., pairs of travelers who make almost the same trip. Such travelers reveal the potential demand for shared mobility systems, especially trip-sharing services such as ridesourcing, shared taxis, and ridesharing. As a working hypothesis, this study considers only the spatiotemporal features of the trips to assess their similarity and their potential for matching (Rayle et al. 2015). Other attributes such as cost, comfort, additional behavioral variables, or the characteristics of the transportation service are not yet accounted for (Zhan, Qian, and Ukkusuri 2016; Vazifeh et al. 2018).
METHODS AND DATA
The Chinese transport network company DiDi Chuxing has released two months’ worth of data consisting of more than 6 million trips performed by its drivers (Xu et al. 2019). For each trip $i$, this dataset provides the departure time $t^{PU}_i$ and location $p^{PU}_i = (x^{PU}_i, y^{PU}_i)$ of the passenger pickup, and the arrival time $t^{DO}_i$ and location $p^{DO}_i = (x^{DO}_i, y^{DO}_i)$ of the dropoff. For this study, we use only a subset of the dataset, focusing on the peak hours of a regular day: approximately 10,000 trips from 8h to 11h on November 18, 2016. Moreover, we assume that these observations correspond to the desired departure/arrival times and origins/destinations of the travelers.
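The subset selection described above can be sketched as a simple filter. This is only an illustration: the record layout (a dictionary with a hypothetical `t_pu` field holding a Unix timestamp) is an assumption and not the documented format of the dataset, and the window is treated as half-open (8h included, 11h excluded), which the text does not specify.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are Unix seconds, read in China Standard Time (UTC+8).
CHENGDU_TZ = timezone(timedelta(hours=8))

def in_study_window(pickup_ts, start_hour=8, end_hour=11):
    """True if the pickup falls on November 18, 2016 within [start_hour, end_hour)."""
    t = datetime.fromtimestamp(pickup_ts, tz=CHENGDU_TZ)
    same_day = (t.year, t.month, t.day) == (2016, 11, 18)
    return same_day and start_hour <= t.hour < end_hour

def select_trips(trips):
    """Keep only the trips whose pickup time lies in the study window."""
    return [trip for trip in trips if in_study_window(trip["t_pu"])]
```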
To identify the trips that can be made with the same vehicle, we use the following method. First, we define a function $S(i,j)$ to express the similarity between two trips $i$ and $j$. This function must encompass the different spatiotemporal attributes of the trips. It should capture the fact that two travelers can share a trip if their origins and destinations, as well as their departure and arrival times, are close enough. To the authors’ best knowledge, this kind of similarity index is almost nonexistent in the literature (Ketabi, Alipour, and Helmy 2018). Consequently, we propose the following function: $\bar{S}(i,j) = \sum_{l \in \{PU, DO\}} \alpha_l \, e^{f_l(i,j)}$, where $f_l(i,j)$ is a feasibility function and $\alpha_l$ is a coefficient.
Function $f$ describes the service’s potential to operate the shared trips, i.e., the ability to pick up (or drop off) the two travelers before both of their desired departure times: $f_l(i,j) = |t^l_i - t^l_j| - \gamma \, d(p^l_i, p^l_j)$, where $d$ is the geodesic distance and $\gamma$ is the average pace (travel time per unit distance) needed to connect travelers who wish to share a trip. This parameter summarizes in a single value how the service operates and how it gathers two demand requests into the same vehicle: defining a meeting point, successive pickups, etc. For example, if the first traveler must walk to the second traveler’s pickup point, then $\gamma$ is the inverse of the walking speed. If this distance is traveled by car, meaning that the service is door-to-door, then $\gamma$ is the inverse of the vehicle speed. Consequently, $f$ is positive if the match is realized before the two desired departure times $t^l_i$ and $t^l_j$, whereas $f$ is negative if travelers must experience delays to make the match possible. Moreover, $\alpha_l$ is equal to 1/2 if $f_l(i,j) > 0$ and to 3/2 otherwise, because being delayed is more disadvantageous.
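The definitions of $f_l$, $\alpha_l$, and $\bar{S}$ can be illustrated with a minimal sketch. Several choices here are assumptions made for the example and are not taken from the study: locations are planar coordinates in km with Euclidean distance standing in for the geodesic distance, times are in minutes, $\gamma$ is in min/km, the time gap is taken in absolute value so that the measure is symmetric in $i$ and $j$, and the trip layout (a dictionary keyed by `"PU"` and `"DO"`) is hypothetical.

```python
import math

def feasibility(t_i, t_j, p_i, p_j, gamma):
    """f_l(i,j): time gap (min) minus the time needed to connect the two points.

    Assumption: Euclidean distance on planar km coordinates replaces the
    geodesic distance, and the time gap is taken in absolute value.
    """
    return abs(t_i - t_j) - gamma * math.dist(p_i, p_j)

def alpha(f):
    """Weight 1/2 when the match is feasible without delay (f > 0), 3/2 otherwise."""
    return 0.5 if f > 0 else 1.5

def s_bar(trip_i, trip_j, gamma):
    """Sum of alpha_l * exp(f_l(i,j)) over pickup (PU) and dropoff (DO)."""
    total = 0.0
    for l in ("PU", "DO"):
        f = feasibility(trip_i[l]["t"], trip_j[l]["t"],
                        trip_i[l]["p"], trip_j[l]["p"], gamma)
        total += alpha(f) * math.exp(f)
    return total

# Two illustrative trips (made-up values); gamma = 12 min/km is a 5 km/h walk.
trip_a = {"PU": {"t": 0.0, "p": (0.0, 0.0)}, "DO": {"t": 20.0, "p": (5.0, 0.0)}}
trip_b = {"PU": {"t": 5.0, "p": (0.5, 0.0)}, "DO": {"t": 24.0, "p": (5.0, 0.3)}}
print(s_bar(trip_a, trip_b, gamma=12.0))
```

In this toy case the pickup term has a negative $f$ (5 min apart, but 6 min of walking are needed), so it receives the heavier weight 3/2, while the dropoff term is feasible and receives 1/2.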
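In addition to this measure of similarity $\bar{S}(i,j)$, excessive distances/durations for rendezvous are penalized. Penalties $\theta^l_x$ and $\theta^l_t$ are added when, respectively, the distance between the pickup (or dropoff) locations and the difference between the departure (or arrival) times of trips $i$ and $j$ exceed specific thresholds $\delta^l_x$ and $\delta^l_t$:
$$\theta^l_x = e^{d(p^l_i, p^l_j) - \delta^l_x} \quad \forall l \text{ such that } d(p^l_i, p^l_j) > \delta^l_x$$
$$\theta^l_t = e^{|t^l_i - t^l_j| \cdot \frac{\delta^l_x}{\delta^l_t} - \delta^l_t} \quad \forall l \text{ such that } |t^l_i - t^l_j| > \delta^l_t$$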
Otherwise, these penalties are null. In this manner, $S(i,j) = \bar{S}(i,j) + \theta^l_x + \theta^l_t$ defines a sharp function that enhances the differences between trips and facilitates the identification of twin travelers in the dataset.
Next, trips are gathered using a clustering method. It is important to note that a cluster is not a region of the city but a set of trips that are similar based on their pickup and dropoff attributes. These trips correspond to travelers (i.e., demand) who may share a vehicle given their origins/destinations and departure/arrival times. For this study, a DBSCAN approach with $S$ as the distance function is used. DBSCAN makes it possible to set the minimum number of points required per cluster (Ester et al. 1996). Here, this minimal number is fixed at two, and we only select clusters with two elements because the study aims to determine pairs of similar trips. DBSCAN also requires a threshold $\epsilon$ on the similarity function, which is the radius of the neighborhood around a point, i.e., the maximal dissimilarity allowed for two trips to be paired. The parameters used to obtain the different figures in this article are summarized in Table 1.
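The clustering step can be sketched with a small pure-Python DBSCAN that accepts an arbitrary pairwise dissimilarity. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, the toy dissimilarity (absolute difference of one-dimensional positions) merely stands in for $S$, and each point is counted in its own neighborhood. In practice a library implementation, such as scikit-learn's `DBSCAN` with `metric="precomputed"`, could play the same role.

```python
def dbscan(n, dist, eps, min_pts):
    """Minimal DBSCAN over items 0..n-1; returns one label per item (-1 = noise)."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * n

    def neighbors(i):
        # The eps-neighborhood of a point includes the point itself.
        return [j for j in range(n) if j == i or dist(i, j) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels

def twin_pairs(labels):
    """Keep only the clusters that contain exactly two trips."""
    clusters = {}
    for idx, lab in enumerate(labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(idx)
    return [tuple(members) for members in clusters.values() if len(members) == 2]

# Toy data: a made-up 1-D "position" per trip; |difference| stands in for S.
positions = [0.0, 0.4, 5.0, 5.3, 9.0, 12.0, 12.2, 12.4]

def toy_dist(i, j):
    return abs(positions[i] - positions[j])

labels = dbscan(len(positions), toy_dist, eps=0.5, min_pts=2)
pairs = twin_pairs(labels)
share_paired = 2 * len(pairs) / len(positions)  # share of trips that found a twin
print(labels, pairs, share_paired)
```

In the toy data, trips 0 and 1 and trips 2 and 3 form pairs, trip 4 is noise, and trips 5 to 7 form a cluster of three that is discarded because only two-element clusters are kept.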
FINDINGS
Figure 1 shows the trips of 7 different pairs of twin travelers projected on the road map of Chengdu, China. Visual inspection suggests that these results are promising: pickup and dropoff locations are close (less than 1 km, geodesic distance), while the differences in departure and arrival times remain low (less than 10 min). Moreover, $\rho = 18.3\%$ of the trips can be paired for the studied period. This matters because the fleet size of DiDi and, by extrapolation, the number of cars circulating in the network could be significantly reduced if vehicles were shared. The reduction could be even larger if more than two travelers shared the same vehicle. The methodology can be extended to such cases by changing the minimal number of points in the clustering process. Even if the DiDi data are not fully representative of the complete traffic flow, these results suggest that shared mobility may be a promising strategy to improve the transportation system’s performance.
Visual observations are confirmed by Figure 2.a, which depicts the distribution of the average length $\bar{l}_k$ of the trips for each pair $k$, whereas Figure 2.b shows the distribution of the average travel times $\bar{\tau}_k$. In addition, Figures 2.c and 2.d present the distributions of the absolute difference in departure times $\overline{|t^{PU}_i - t^{PU}_j|}_k$ and the absolute difference in arrival times $\overline{|t^{DO}_i - t^{DO}_j|}_k$. All these values are consistent with what one would intuitively expect from similar trips:

The average length $\bar{l}_k$ of the twin trips is equal to 6.2 km (road distance). Notice that the dataset focuses on a subpart of Chengdu’s network (a circle with a 5.5 km radius). The associated average travel time is around 17.3 min, leading to an average speed of 21.6 km/h. Consequently, trips are long enough to allow for the delay caused by sharing the vehicle with another traveler.

The difference between the two departure times is on average equal to 4.9 min and is lower than 6.6 min for 80% of the trips.

The average estimated delay is equal to 7.2 min and more than 80% of the trips experience a delay of less than 10 min.

Overall, this means that a traveler may find a twin with whom to share a vehicle at the cost of an increase of only 30% in travel time. This extra time could be drastically reduced by optimizing the dispatch of the transportation supply (Mourad, Puchinger, and Chu 2019).
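As a quick consistency check on the reported averages, and reading the 6.2 figure as a road distance in km, the average speed follows directly from the average length and the average travel time:

$$\bar{v} = \frac{6.2 \text{ km}}{17.3 \text{ min}} \times 60 \ \frac{\text{min}}{\text{h}} \approx 21.5 \text{ km/h}$$

This agrees with the reported 21.6 km/h up to the rounding of the two inputs.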
ACKNOWLEDGMENTS
The authors thank Dr. “MFD” Guilhem Mariotte for his valuable comments. Data source: DiDi Chuxing GAIA Open Dataset Initiative, available at: https://gaia.didichuxing.com