Identifying Twin Travelers Using Ridesourcing Trip Data

Nicolas Chiabaut; Cyril Veve

doi:10.32866/9223

RESEARCH QUESTION AND HYPOTHESIS

This study is devoted to identifying twin trips in a city, i.e., pairs of travelers who make almost the same trips. Such travelers demonstrate the potential demand for shared mobility systems, especially trip-sharing services such as ridesourcing, shared taxis, ridesharing, etc. A major assumption of this study is that only the spatiotemporal features of the trips are considered when assessing their similarities and their potential for matching (Rayle et al. 2015). Other attributes such as cost, comfort, additional behavioral variables, or the characteristics of the transportation service are not yet accounted for (Zhan, Qian, and Ukkusuri 2016; Vazifeh et al. 2018).

METHODS AND DATA

The Chinese transport network company DiDi Chuxing has released two months’ worth of data consisting of more than 6 million trips performed by their drivers (Xu et al. 2019). For each trip \(i\), this dataset gives access to the following information: departure time \(t^{PU}_{i}\) and location \(p^{PU}_{i}=(x^{PU}_{i},y^{PU}_{i})\) of the passenger(s) pick-up; arrival time \(t^{DO}_{i}\) and location \(p^{DO}_{i}=(x^{DO}_{i},y^{DO}_{i})\) of the drop-off. For this study, we only used a subset of the dataset by focusing on the peak hours of a regular day: approximately 10,000 trips from \(8h\) to \(11h\) on November 18, 2016. Moreover, we assume that these observations correspond to the desired departure/arrival times and origins/destinations of the travelers.

To identify the trips that can be made with the same vehicle, we use the following method. First, we define a function \(S(i,j)\) to express the similarity between two trips \(i\) and \(j\). This function must encompass the different spatiotemporal attributes of the trips. It should reflect the fact that two travelers can share a trip if their origins and destinations, as well as their departure and arrival times, are close enough. To the authors’ best knowledge, this kind of similarity index is almost nonexistent in the literature (Ketabi, Alipour, and Helmy 2018). Consequently, we propose the following function: \[\overline{S(i,j)}=\sum_{l \in \{PU,DO\}}\alpha_{l} e^{|f^{l}(i,j)|}\] where \(f^{l}(i,j)\) is a feasibility function and \(\alpha_{l}\) is a coefficient.

Function \(f\) describes the service’s potential to operate the shared trips, i.e., the ability to pick up (or drop off) the two travelers before both of their desired departure (or arrival) times: \[f^{l}(i,j) = |t^{l}_{i}-t^{l}_{j}|-\gamma d(p^{l}_{i},p^{l}_{j})\] where \(d\) is the geodesic distance and \(\gamma\) is the average pace (travel time per unit distance) needed to connect travelers who wish to share a trip. This parameter is a general, synthetic way to describe how the service operates and how it gathers two demand requests into the same vehicle: defining a meeting point, successive pick-ups, etc. For example, if the first traveler must walk to the second traveler’s pick-up point, then \(\gamma\) is the inverse of the walking speed. If this distance is traveled by car, meaning that the service offers door-to-door service, then \(\gamma\) is the inverse of the vehicle speed. Consequently, \(f\) is positive if the match is realized before the two desired times \(t^{l}_{i}\) and \(t^{l}_{j}\), whereas \(f\) is negative if travelers must experience delays to make the match possible. Moreover, \(\alpha_l\) is equal to \(1/2\) if \(f^{l}(i,j)>0\) and to \(3/2\) otherwise because it is more disadvantageous to be delayed.
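The feasibility function and the unpenalized similarity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Trip` class, its field names, and the helper `end` are hypothetical, coordinates are assumed to be already projected to a plane in kilometers, times are in hours, and the Euclidean distance stands in for the geodesic distance \(d\).

```python
import math
from dataclasses import dataclass

GAMMA = 0.1  # h/km, average pace to connect travelers (value from Table 1)

@dataclass(frozen=True)
class Trip:
    # Hypothetical container: times in hours, planar coordinates in km.
    t_pu: float
    x_pu: float
    y_pu: float
    t_do: float
    x_do: float
    y_do: float

def end(trip, l):
    """Return (time, (x, y)) of the pick-up ("PU") or drop-off ("DO")."""
    if l == "PU":
        return trip.t_pu, (trip.x_pu, trip.y_pu)
    return trip.t_do, (trip.x_do, trip.y_do)

def feasibility(a, b, l, gamma=GAMMA):
    """f^l(i,j) = |t_i - t_j| - gamma * d(p_i, p_j)."""
    ta, pa = end(a, l)
    tb, pb = end(b, l)
    # Assumption: Euclidean distance replaces the geodesic distance.
    return abs(ta - tb) - gamma * math.dist(pa, pb)

def base_similarity(a, b, gamma=GAMMA):
    """Sum over PU and DO of alpha_l * exp(|f^l|), without penalties."""
    total = 0.0
    for l in ("PU", "DO"):
        f = feasibility(a, b, l, gamma)
        alpha = 0.5 if f > 0 else 1.5  # delays (f <= 0) weigh more
        total += alpha * math.exp(abs(f))
    return total
```

With these definitions, two identical trips give \(f=0\) at both ends, so the unpenalized value is \(2 \times 3/2 = 3\). Lower values of the function correspond to more similar trips.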

In addition to this measure of similarity \(\overline{S(i,j)}\), excessive distances/durations for rendezvous are penalized. Thus, penalties \(\theta^{l}_{x}\) and \(\theta^{l}_{t}\) are added when the distance between the pick-up (or drop-off) locations and the difference between the departure (or arrival) times of trips \(i\) and \(j\) exceed the thresholds \(\delta^{l}_{x}\) and \(\delta^{l}_{t}\), respectively: \[\begin{aligned} \theta^{l}_{x} = e^{d(p^{l}_i,p^{l}_j)-\delta^{l}_{x}} & \quad \, \forall l \ / \ d(p^{l}_i,p^{l}_j) > \delta^{l}_{x} \\ \theta^{l}_{t} = e^{|t^{l}_{i}-t^{l}_{j}| \cdot \frac{\delta^{l}_{x}}{\delta^{l}_{t}}-\delta^{l}_{t}} & \quad \, \forall l \ / \ |t^{l}_{i}-t^{l}_{j}| >\delta^{l}_{t}\end{aligned}\]

Otherwise, these penalties are null. In this manner, \(S(i,j)=\overline{S(i,j)} + \theta^{l}_{x} + \theta^{l}_{t}\) defines a sharp function that enhances the differences between trips and facilitates identification of twin travelers in the dataset.
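A minimal sketch of the penalty terms is given below. The function names and the `gaps` dictionary are hypothetical, and the thresholds are those of Table 1, with distances in kilometers and times in hours. The text does not state how the penalties of the two ends are combined, so summing them over both pick-up and drop-off is an assumption of this sketch.

```python
import math

# Thresholds from Table 1: distances in km, times in hours.
THRESHOLDS = {
    "PU": {"delta_x": 0.25, "delta_t": 0.1},
    "DO": {"delta_x": 0.25, "delta_t": 0.25},
}

def penalty_x(dist_km, delta_x):
    """theta_x = exp(d - delta_x) if d exceeds delta_x, else 0."""
    if dist_km <= delta_x:
        return 0.0
    return math.exp(dist_km - delta_x)

def penalty_t(gap_h, delta_x, delta_t):
    """theta_t = exp(|dt| * delta_x / delta_t - delta_t) if |dt| exceeds delta_t, else 0."""
    if gap_h <= delta_t:
        return 0.0
    return math.exp(gap_h * delta_x / delta_t - delta_t)

def penalized_similarity(base, gaps, thresholds=THRESHOLDS):
    """Add the penalties to the unpenalized value `base`.

    `gaps` maps "PU"/"DO" to (absolute time gap in h, distance in km).
    Assumption: penalties of both ends are summed.
    """
    total = base
    for l, (gap_h, dist_km) in gaps.items():
        th = thresholds[l]
        total += penalty_x(dist_km, th["delta_x"])
        total += penalty_t(gap_h, th["delta_x"], th["delta_t"])
    return total
```

For instance, two pick-ups that are 1.25 km apart exceed the 0.25 km threshold by 1 km and add \(e^{1} \approx 2.72\) to the similarity value, which is enough to push an otherwise perfect match (value 3) beyond the threshold \(\epsilon = 4\) of Table 1.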

Next, trips are gathered using a clustering method. It is important to note that a cluster is not a region of the city but a set of trips that are similar based on their pick-up and drop-off attributes. These trips are related to travelers, i.e., demand, who may share a vehicle according to their origin/destination and departure/arrival time. For this study, a DBSCAN approach with \(S\) as the distance function is used. This makes it possible to fix the minimum number of points required per cluster (Ester et al. 1996). Here, this minimal number is fixed at two, and we only select clusters with two elements because the study aims to determine pairs of similar trips. DBSCAN also requires a threshold \(\epsilon\) on the similarity function, which is the radius of the neighborhood around a point, i.e., the maximal dissimilarity allowed for two trips to be paired. The parameters used to obtain the different figures in this article are summarized in Table 1.
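The pair-selection step can be illustrated without a clustering library. When the minimum number of points is two and the point itself is counted in its neighborhood (a common implementation convention, assumed here), every trip with at least one neighbor within \(\epsilon\) is a core point, so the clusters are the connected components of the \(\epsilon\)-neighborhood graph. The sketch below is not the authors' implementation: the function names are hypothetical, the input is assumed to be a precomputed symmetric matrix of \(S\) values, and the share of paired trips is assumed to be twice the number of pairs divided by the number of trips.

```python
def twin_pairs(dist, eps):
    """Return clusters of exactly two trips as sorted index tuples.

    `dist` is a symmetric matrix (list of lists) of dissimilarities.
    """
    n = len(dist)
    neighbors = [
        [j for j in range(n) if j != i and dist[i][j] <= eps]
        for i in range(n)
    ]
    seen = [False] * n
    pairs = []
    for start in range(n):
        if seen[start] or not neighbors[start]:
            continue  # already clustered, or noise (no neighbor within eps)
        # Depth-first traversal of the component containing `start`.
        component = []
        stack = [start]
        seen[start] = True
        while stack:
            i = stack.pop()
            component.append(i)
            for j in neighbors[i]:
                if not seen[j]:
                    seen[j] = True
                    stack.append(j)
        if len(component) == 2:  # keep only pairs of twin trips
            pairs.append(tuple(sorted(component)))
    return pairs

def paired_share(dist, eps):
    """Fraction of trips that belong to a two-trip cluster."""
    return 2 * len(twin_pairs(dist, eps)) / len(dist)
```

Clusters with three or more trips are discarded here, mirroring the choice of keeping only clusters with two elements. The full pairwise matrix grows quadratically with the number of trips, so for much larger samples a library implementation with a neighbor index would likely be preferable.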

Table 1: Parameters Used to Obtain the Different Figures

Group	Parameter	Value	Meaning
DBSCAN	MinPts	2	Number of trips per cluster
DBSCAN	\(\epsilon\)	4	Radius of a neighborhood
Similarity	\(\gamma\)	0.1 h/km	Average pace to connect travelers
Similarity	\(\delta^{PU}_{t}\)	0.1 h	Threshold on departure times
Similarity	\(\delta^{DO}_{t}\)	0.25 h	Threshold on arrival times
Similarity	\(\delta^{PU}_{x}\)	0.25 km	Threshold on PU locations
Similarity	\(\delta^{DO}_{x}\)	0.25 km	Threshold on DO locations

FINDINGS

Figure 1 shows the trips of six different pairs of twin travelers projected on the roadmap of Chengdu, China. Visual inspection reveals that these results are very promising. Pick-up and drop-off locations are close (less than \(1\) km, geodesic distance) while the differences in departure and arrival times remain low (less than \(10\) min). Moreover, \(\rho = 18.3\%\) of the trips can be paired for the studied period. This is very interesting because the fleet size of DiDi and, by extrapolation, the number of cars flowing in the network can be significantly reduced if vehicles are shared. This reduction can be even higher if more than two travelers share the same vehicle. The methodology can be extended to such cases by changing the minimal number of points in the clustering process. Even if the DiDi data is not fully representative of the complete traffic flow, these results highlight the fact that shared mobility may be a promising strategy to improve the transportation system’s performance.

Figure 1: Similar Trips for Six Different Pairs; Pick-up Locations Are Circled in Green Whereas Drop-off Locations Are Circled in Red

Visual observations are confirmed by Figure 2.a, which depicts the distribution of the average length \(\bar{l}_k\) of the trips for each pair \(k\), whereas Figure 2.b shows the distributions of the average travel times \(\bar{\tau}_k\). In addition, Figures 2.c and 2.d present the distributions of the absolute difference in departure times \(\overline{|t^{PU}_{i}-t^{PU}_{j}|}_k\) and the absolute difference in the two arrival times \(\overline{|t^{DO}_{i}-t^{DO}_{j}|}_k\). It appears that all these values are entirely consistent with the natural idea of what the characteristics of similar trips should be:

The average length \(\bar{l}_k\) of the twin trips is equal to \(6.2\) km (road distance). Notice that the dataset focuses on a subpart of Chengdu’s network (a circle with a \(5.5\) km radius). The associated average travel time is around \(17.3\) min, leading to an average speed of \(21.6\) km/h. Consequently, trips are long enough to allow for the delay caused by sharing the vehicle with another traveler.
The difference in the two departure times is on average equal to \(4.9\) min and lower than \(6.6\) min for \(80\%\) of the trips.
The average estimated delay is equal to \(7.2\) min and more than \(80\%\) of the trips experience a delay of less than \(10\) min.
Finally, it means that a traveler may find their twin to share a vehicle with an increase of only \(30\%\) in travel time. This extra time could be drastically reduced by optimizing dispatch of the transportation supply (Mourad, Puchinger, and Chu 2019).
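
As a quick check of the reported average speed, the average length can be divided by the average travel time (the symbol \(\bar{v}\) is introduced here for illustration only): \[\bar{v} = \frac{6.2\ \text{km}}{(17.3/60)\ \text{h}} \approx 21.5\ \text{km/h}\] This agrees with the reported \(21.6\) km/h up to the rounding of the two inputs.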

Figure 2: (a) Distribution of the Average Length \(\bar{l}_k\) of Trips Within the Cluster \(k\); (b) Distribution of Average Travel Times \(\bar{\tau}_k\); (c) Absolute Difference in Departure Times \(\overline{|t^{PU}_{i}-t^{PU}_{j}|}_k\); and (d) Absolute Difference in Arrival Times \(\overline{|t^{DO}_{i}-t^{DO}_{j}|}_k\) Among the Pairs; Dotted Lines Show the Mean of the Distributions

ACKNOWLEDGMENTS

The authors thank Dr. “MFD” Guilhem Mariotte for his valuable comments. Data source: DiDi Chuxing GAIA Open Dataset Initiative, available at: https://gaia.didichuxing.com
