Occupancy v2#212

Closed
gcamp wants to merge 20 commits into google:master from TransitApp:occupancy_v2

Conversation

@gcamp
Contributor

@gcamp gcamp commented Apr 9, 2020

Update : new thinking available here #212 (comment)

OccupancyStatus was added 6 years ago as an experimental field to the GTFS-rt spec. Multiple producers and consumers have adopted it, but the current implementation has some problems.

The occupancy_status enum has 7 values, but we've seen that the full range is not used in practice. We've seen consumers mostly use 3 values to represent the entire range of occupancy. For example:

  • New South Wales buses only use MANY_SEATS_AVAILABLE, FEW_SEATS_AVAILABLE and STANDING_ROOM_ONLY.
  • Some New South Wales trains use all the available values except EMPTY and NOT_ACCEPTING_PASSENGERS. However, the agency suggests never displaying more than 3 textual values in the consumer-facing message. More granularity can be provided with colors.
  • MTA NYC has limited availability of the feature in its feed but does use FEW_SEATS_AVAILABLE, STANDING_ROOM_ONLY and FULL.

For this reason I would suggest giving only 3 values in the enum: SEATS_AVAILABLE, STANDING_AVAILABLE and FULL. The naming changed slightly to reflect the same 3 values available in SIRI.
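
As a rough illustration only, a consumer could collapse the seven existing values into the three proposed ones along these lines. The groupings below are assumptions made for the sketch (in particular where EMPTY and NOT_ACCEPTING_PASSENGERS land), and `COLLAPSED_STATUS` and `collapse_status` are hypothetical names, not part of the spec or of any GTFS-rt library.

```python
# Hypothetical mapping from the 7 existing OccupancyStatus values to the
# 3 proposed ones. The groupings are assumptions made for illustration.
COLLAPSED_STATUS = {
    "EMPTY": "SEATS_AVAILABLE",
    "MANY_SEATS_AVAILABLE": "SEATS_AVAILABLE",
    "FEW_SEATS_AVAILABLE": "SEATS_AVAILABLE",
    "STANDING_ROOM_ONLY": "STANDING_AVAILABLE",
    "CRUSHED_STANDING_ROOM_ONLY": "STANDING_AVAILABLE",
    "FULL": "FULL",
    "NOT_ACCEPTING_PASSENGERS": "FULL",
}


def collapse_status(status: str) -> str:
    """Return the proposed 3-value status for an existing 7-value status."""
    try:
        return COLLAPSED_STATUS[status]
    except KeyError:
        # Unknown names are rejected rather than silently guessed.
        raise ValueError(f"unknown OccupancyStatus: {status!r}") from None
```

Because producers do not all use the same 3 of the 7 values, a table like this has to cover all seven names even if a given feed only ever emits a few of them.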

However, we do lose a lot of granularity in how finely occupancy can be reported. To alleviate this problem I would suggest adding a second field, occupancy_percentage, to give a more accurate value of occupancy.

It would be a percentage value representing the degree of passenger occupancy of the vehicle. The values are represented as an integer without decimals. 0 means 0% and 100 means 100%. The value 100 should represent the total maximum occupancy the vehicle was designed for, including both seated and standing capacity. It is possible that the value goes over 100 if there are currently more passengers than what the vehicle was designed for.
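
A minimal sketch of how a producer might derive such a value from a passenger count. The function name and parameters are hypothetical, and truncating to an integer (rather than rounding) is an assumption, since the proposal only says the value has no decimals.

```python
def occupancy_percentage(passengers: int, seated_capacity: int, standing_capacity: int) -> int:
    """Occupancy as an integer percentage of total designed capacity.

    100 corresponds to seated plus standing capacity; the result can
    exceed 100 when the vehicle carries more than it was designed for.
    """
    total_capacity = seated_capacity + standing_capacity
    if total_capacity <= 0:
        raise ValueError("total designed capacity must be positive")
    if passengers < 0:
        raise ValueError("passenger count cannot be negative")
    # Integer division truncates toward zero for non-negative inputs
    # (assumption: the proposal does not state a rounding rule).
    return (100 * passengers) // total_capacity
```

For a vehicle designed for 40 seated and 20 standing passengers, 30 riders would give 50 and 66 riders would give 110.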

It was decided to have both values because they fulfill two different purposes. occupancy_status is provided to give an understandable value for the user (STANDING_ROOM_ONLY is way more understandable for a user than 50% full). occupancy_percentage is provided to give more granular data.

The occupancy_percentage is now part of a separate PR : #213

@googlebot
Collaborator

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@skinkie
Contributor

skinkie commented Apr 9, 2020

To me these are two separate issues.
Reducing the enum: -1
Adding the percentage: +1

I would even suggest to add:
free seats available (int)
mobility impaired seats available (int)

@juanborre
Contributor

@googlebot I consent

@googlebot
Collaborator

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: yes and removed cla: no labels Apr 9, 2020
@gcamp
Contributor Author

gcamp commented Apr 9, 2020

@skinkie you don't think that having producers independently use 3 values of the enum, but not always the same 3 values, is a problem?

@nighthawk

As a consumer and end user of this data, I think this is good.

As an end user (in Sydney), the existing seven values very often were of the nature "precise, but not accurate", i.e., they have all these different cases but what they say rarely matches what a bus looks like inside and what the actual availability of seats or standing room is, or whether the bus was in fact still accepting passengers or not. Limiting the enum cases moves this more towards "less precise, but more accurate".

As a data consumer, the enum cases were more useful as a "fill state", where what matters is a percentage rather than whether the enum string value makes sense, but this PR addresses this by having a percentage.

So +1.

@nighthawk

(And, agreed, having a separate indicator for "mobility impaired seats available" would be fantastic.)

@skinkie
Contributor

skinkie commented Apr 9, 2020

@gcamp this is a problem, but an implementation / validation detail.

@rcbouchard

Hi, MobilityData invited us to discuss our experience with occupancy at STM.
We have chosen the levels that we think are the clearest for the customer:

  1. Level 1: Distancing possible (or empty) - this is specific to the pandemic situation and could disappear when the pandemic is behind us (to be reviewed at that time). We currently have very low ridership in the buses.
  2. Level 2: the customer can sit down
  3. Level 3: Customers can expect to be standing
  4. Level 4: the bus / train is almost full

At the moment, 3 levels would be too restrictive, because we could not communicate the physical distancing situation. I also find that the levels (NOT_CROWDED, SOME_CROWDING, CROWDED) are ambiguous and can be interpreted differently depending on the client. This is why we decided to be a little more specific. Obviously, all of this could be adjusted if we realize that customers do not fully understand these levels.

René-Claude Bouchard
Société de transport de Montréal (STM)

@flocsy
Contributor

flocsy commented Dec 14, 2020

Regarding the pandemic: IMHO it can be mapped to the existing values. In places where distancing is enforced (they don't let more people board), "distancing not possible" == "full", and where it's not enforced, "distancing not possible" == "crowded".

@francispeixoto

@flocsy In response to the pandemic, we've implemented in our ITS a ratio system that makes it possible to differentiate the maximal occupancy per vehicle type from the desired occupancy.

Say a given bus model has 45 seats, but to respect the two meter rule you want to limit occupancy to 20 seats: our users can set a ratio (in %) of maximal occupancy that, coupled with our people counter data, will drive the GTFS-RT occupancy status.

In this case, the 45 seat bus would report "standing room only" if 20 people were on the bus at that time.
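
The ratio approach described above could be sketched roughly as follows. This is not the vendor's implementation: the function name, the two-way split between statuses, and the 45% ratio (picked so that 45 seats yields 20 desired seats) are all assumptions for illustration.

```python
def status_with_distancing(passengers: int, seats: int, ratio_percent: int) -> str:
    """Pick an occupancy status from a people count and a desired-occupancy ratio.

    ratio_percent is the share of the physical seats the agency wants
    occupied, e.g. 45 to bring a 45-seat bus down to 20 usable seats.
    """
    if not 0 < ratio_percent <= 100:
        raise ValueError("ratio_percent must be in 1..100")
    # Seats the agency wants filled under the distancing rule, rounded down.
    desired_seats = seats * ratio_percent // 100
    if passengers >= desired_seats:
        return "STANDING_ROOM_ONLY"
    # Simplification: a real system would likely distinguish more levels here.
    return "MANY_SEATS_AVAILABLE"
```

With these assumptions, `status_with_distancing(20, 45, 45)` reports `STANDING_ROOM_ONLY` even though 25 physical seats are still empty.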

@sccmcca
Contributor

sccmcca commented Dec 14, 2020

It seems that opinions vary greatly on changing OccupancyStatus.

The original problem points to difficulties for consumers to consistently interpret qualitative uses of OccupancyStatus on a quantitative scale, with the goal of distilling occupancy descriptions to generalized and meaningful information for riders. Some agencies and consumers have adopted the practice of using 3 enumerations for consistency (cc: @gcamp), backed by MBTA's research (cc: @paulswartz) that these are the most meaningful to riders. This standardization makes it easy to consistently interpret OccupancyStatus across parties.

At the same time, it has become clear in this thread and issue #223 that the ability to qualitatively describe occupancy levels with OccupancyStatus is distinct from mapping occupancy on a quantitative scale. This is characterized by explicit -1's from stakeholders in this conversation on reducing enums (@skinkie, @mike-swiftly, @flocsy), suggestions to add qualitative descriptors (@skinkie, @darylweinberg, @TCAT-UW), and the prevailing support for adding occupancy_percentage (#213) as experimental to address the need.

Meanwhile, among those who advocate a reduction of the enums, there is not a consistent agreement on how many there should be. @gcamp, @paulswartz, @barbeau, and @stevenmwhite advocate for 3 enums, while Auckland Transport (cc: @jxeeno) and STM (cc: @rcbouchard) use 4.

To summarize, it seems that the addition of occupancy_percentage poses a productive solution to an otherwise unproductive conversation.

Does occupancy_percentage address the need we are contending with without disrupting historical and varying qualitative use of OccupancyStatus? If so, what are everyone's thoughts on making occupancy_percentage official?

Figuring this out is a dependency for #240, a proposal for providing expected or usual occupancies in static. It should not be adopted with occupancy descriptors that are experimental or contested in GTFS Realtime. We would like consistency between the two so that they are representatively interchangeable, and that descriptors adopted in static do not cement anything that is experimental in GTFS Realtime.

Welcoming different opinions!

@RTL-Gilles-Ferland

As an agency, we prefer the occupancy status enum as it directs the data consumers on what we want to publish to the clients. It also leaves a little room for interpretation (i.e. error), which is favorable in the static estimated occupancies, and it allows us to control the value published according to the vehicle size. We prefer a 4-level enum (empty, some seats available, standing room only, full).

We feel that occupancy percentage leaves the data consumers / clients with the responsibility to understand what 64%, for example, actually means, since transit vehicles vary in size, number of seats and standing room.

@flocsy
Contributor

flocsy commented Dec 15, 2020

I think that the standard has to address 2 issues that came up:

  1. producers will need to be able to use it "conveniently". What I mean by that is that there might be technical difficulties giving accurate percentage, so a producer should be able to decide if they can/want to give percentage or enums.

  2. consumers will need to be able (with the help of the standard) to understand/interpret the data. They might decide to display it differently from the original data. They might map the percentage to enums, they might translate the enums differently during a pandemic, etc. They might even do the opposite: "translate" enums/percentage to a scale and display it that way (like green/yellow/orange/red)

What's important is to give enough flexibility to the producer on one hand and enough confidence to interpret it in the correct way for the consumer on the other hand. So maybe the enums should not have a meaning hard coded in their name. Maybe there should be a configuration that the producer publishes with their enums. For example one producer would decide to use 3 enums, another to use 5. So we could call them: occupancy0...occupancyN, where N is given by the producer in the config. The standard would state that occupancy(M) < occupancy(M+1), and that it is recommended that they're "spaced" equally to make it easier to map them on a scale.
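
To make the idea concrete, here is a sketch of how a consumer could place producer-defined levels occupancy0...occupancyN on a scale, assuming the levels are equally spaced as recommended above. The function names and the four-color palette are hypothetical, and no such producer configuration exists in the spec today.

```python
COLORS = ["green", "yellow", "orange", "red"]


def level_to_fraction(level: int, max_level: int) -> float:
    """Position of occupancy<level> on a 0.0..1.0 scale, given N = max_level."""
    if max_level < 1:
        raise ValueError("max_level must be at least 1")
    if not 0 <= level <= max_level:
        raise ValueError("level must be between 0 and max_level")
    # Equal spacing keeps occupancy(M) < occupancy(M+1) on the scale.
    return level / max_level


def level_to_color(level: int, max_level: int) -> str:
    """Map a level onto one of the display colors."""
    fraction = level_to_fraction(level, max_level)
    # Clamp so that the top level (fraction 1.0) lands on the last color.
    index = min(int(fraction * len(COLORS)), len(COLORS) - 1)
    return COLORS[index]
```

A producer with 3 levels (N = 2) and one with 5 levels (N = 4) would both end up on the same green-to-red scale without the consumer needing per-producer enum names.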

@sccmcca
Contributor

sccmcca commented Dec 17, 2020

@flocsy

there might be technical difficulties giving accurate percentage

I agree. A good practice would be to use percentage buckets, such as at increments of 20 points (0-20, 20-40, 40-60, 60-80, 80-100). Providing fine-grained percentages would surely be subject to false promises, and would be unnecessary for rider understanding.
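
A small sketch of that bucketing, under two assumptions the comment does not spell out: a shared boundary such as 20 belongs to the upper bucket, and anything at or above 100 is reported in the top bucket. `percentage_bucket` is a hypothetical helper name.

```python
BUCKET_WIDTH = 20  # increments of 20 points, as suggested above
TOP_BUCKET_INDEX = 100 // BUCKET_WIDTH - 1  # index of the 80-100 bucket


def percentage_bucket(percentage: int) -> tuple[int, int]:
    """Return the (lower, upper) bounds of the bucket a percentage falls in."""
    if percentage < 0:
        raise ValueError("percentage cannot be negative")
    # Values of 100 or more are clamped into the last bucket (assumption).
    index = min(percentage // BUCKET_WIDTH, TOP_BUCKET_INDEX)
    lower = index * BUCKET_WIDTH
    return lower, lower + BUCKET_WIDTH
```

Under these assumptions a raw reading of 64 would be published as the 60-80 bucket.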

consumers will need to be able (with the help of the standard) to understand/interpret the data. They might decide to display it differently from the original data. They might map the percentage to enums, they might translate the enums differently during a pandemic, etc. They might even do the opposite: "translate" enums/percentage to a scale and display it that way (like green/yellow/orange/red)

Perhaps this could be governed by a best practice instead of making hard changes to the existing OccupancyStatus enum list that are in use by others.

So we could call them: occupancy0...occupancyN, where N is given by the producer in the config. The standard would state that occupancy(M) < occupancy(M+1), and that it is recommended that they're "spaced" equally to make it easier to map them on a scale.

A challenge here is that depending on the seating and standing space available in a given vehicle, the existing OccupancyStatus enums are not always spaced equally. This was echoed by many in #223. However, I think it is a pertinent point that there needs to be some reference to interpret qualitative or discrete enumerations, especially when used differently across producers and consumers. It seems that occupancy_percentage can be used to accomplish this (i.e., we already have a solution). The producer is able to map what qualitative markers of occupancy mean on a continuous-scale, for as fine-grained or coarse as the number of OccupancyStatus enums are provided. Whether producers actually provide both in this way may not be governable, but again, could be a best practice for the disambiguation of occupancies between producers and consumers, and more importantly, to riders.

@gcamp @barbeau I'm wondering if we could reframe the PR to not deprecate experimental enums that are in use by others, but simply to make official 3 of them?

@barbeau
Contributor

barbeau commented Dec 17, 2020

@scmcca which 3 enums would you make official?

@sccmcca
Contributor

sccmcca commented Dec 17, 2020

@barbeau Based on the PR for 3 enums, I propose to make official:

  • MANY_SEATS_AVAILABLE
  • STANDING_ROOM_ONLY
  • FULL

We could also advocate (before or after) that MANY_SEATS_AVAILABLE be changed to SEATS_AVAILABLE, and that STANDING_ROOM_ONLY be changed to STANDING_AVAILABLE. The meanings would be essentially the same within themselves and within the context of the other listed experimental enums, while avoiding false promises brought on by "only" language.

Over time, there is the opportunity to reduce this list even further if stakeholders come to find that FEW_SEATS_AVAILABLE can be consolidated to SEATS_AVAILABLE, and CRUSHED_STANDING_ROOM_ONLY can be consolidated to STANDING_AVAILABLE.

@RTL-Gilles-Ferland

@scmcca We agree with your proposal.

As for changing names to SEATS_AVAILABLE, we think it works. But for STANDING_AVAILABLE, we think it is preferable to keep the "only", since standing is by default always available. What we want to communicate with that value is that there are no more seats available.

@devadvance

One of the aspects of percentages that I found thought-provoking: a percentage does not assume information about the type of accommodation available, which is dependent on both the vehicle and the rider's needs/preferences.

Is there an opportunity in this enum proposal to be more inclusive of individuals for whom SEATS_AVAILABLE or STANDING_AVAILABLE are not sufficient or purely scalar information for making decisions about their journey? E.g., while SEATS_AVAILABLE implies more space than STANDING_AVAILABLE, only the latter may be useful for someone with a mobility aid.

@stevenmwhite
Contributor

stevenmwhite commented Dec 17, 2020

One of the aspects of percentages that I found thought-provoking: a percentage does not assume information about the type of accommodation available, which is dependent on both the vehicle and the rider's needs/preferences.

This is also the issue I've had with the current enums.

Unless your technology is actively reporting on individual seats, you cannot guarantee that "only" standing room is available. Many people on transit will stand even if a seat is available, and thus there are often more people on board than the seats accommodate, and yet there are still seats available.

While a percentage does not give information about the types of accommodation available, at least it is universally understood as a scale where 0 is always < 50 < 100. The same cannot necessarily be said for the enums that specifically call out seating vs standing.

There had previously been much discussion about "crowdedness" instead of specific accommodations and that benefits from being an understood scale where Not crowded is always < some crowding < very crowded (or similar terms).

Of course, if the technology used is actively tracking the occupancy of seats, that's a different story, but this is very few and far between if used at all.

@sccmcca
Contributor

sccmcca commented Dec 17, 2020

@stevenmwhite I'm wondering if instead of changing the meaning of enums that are in use and are being advocated for by other stakeholders, that we instead clarify in the spec some guidelines around occupancy_percentage to leverage the generalized descriptions of Not crowded, Some crowding, Crowded. Something like:

  • 0-50 - Not crowded
  • 50-75 - Some crowding
  • 75-100+ - Crowded
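
Such a guideline could be applied by a consumer roughly as follows. Treating the shared boundaries 50 and 75 as belonging to the higher category is an assumption, since the ranges above overlap at those points, and `crowding_description` is a hypothetical name.

```python
def crowding_description(occupancy_percentage: int) -> str:
    """Translate occupancy_percentage into a generalized crowding label."""
    if occupancy_percentage < 0:
        raise ValueError("occupancy_percentage cannot be negative")
    if occupancy_percentage < 50:
        return "Not crowded"
    if occupancy_percentage < 75:
        return "Some crowding"
    # Covers 75 and up, including values above 100.
    return "Crowded"
```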

@barbeau
Contributor

barbeau commented Dec 17, 2020

I'd be concerned that officially adopting a subset of the current OccupancyStatus enum doesn't really solve the problems that currently exist with it. Along with the issues raised above, another problem with making a subset of the current OccupancyStatus official and interpreting it as a range is that we won't know which producers are using the "official" interpretation vs. the experimental one when looking at the data.

I think there are 3 primary occupancy use cases that people are discussing:

  1. Discrete occupancy state (EMPTY, MANY_SEATS_AVAILABLE, FEW_SEATS_AVAILABLE, STANDING_ROOM_ONLY, CRUSHED_STANDING_ROOM_ONLY, FULL, NOT_ACCEPTING_PASSENGERS) - These aren't necessarily continuous across a range or generalizable across all vehicle types, but instead they define labeled states the vehicle can be in. Best represented in a UI as the text version of the enumeration and should NOT be used as a scale. OccupancyStatus is currently defined this way in the spec.
  2. Coarse bucketed range (could be NOT_CROWDED, SOME_CROWDING, CROWDED) - These are several generic, continuous, ordered values that represent a scale. Generalizes well across all types of producers and can be represented in a UI as a graphic element (e.g., filled person icons) or the text version of the enumeration. OccupancyStatus is currently being used this way by some producers and consumers, but doesn't work well because of the above-mentioned issues with not fitting to a fixed scale.
  3. Fine numeric range (95%) - A fine grained version of (2) with actual numeric values. Best represented in a UI as a number. occupancy_percentage in the spec (although as @scmcca said, it sounds like many producers will bucket these values, making it more like (2) ).

One option is that we could keep OccupancyStatus to represent the vehicle state for (1) if producers really want the ability to represent when there is STANDING_ROOM_ONLY for example, but with the caveat that consumers shouldn't interpret these values as a scale - that would fall to a new field OccupancyScale (or better name) for (2). So a producer could publish CROWDED via OccupancyScale as well as STANDING_ROOM_ONLY via OccupancyStatus if they really wanted to do that and had technology to ensure all the seats were actually full.

The obvious downside with this approach is that we would have 3 different ways to represent occupancy :(. Although as mentioned above, if producers plan to bucket values in occupancy_percentage anyway, maybe we could get rid of that field and use OccupancyScale instead.

@stevenmwhite
Contributor

@stevenmwhite I'm wondering if instead of changing the meaning of enums that are in use and are being advocated for by other stakeholders, that we instead clarify in the spec some guidelines around occupancy_percentage to leverage the generalized descriptions of Not crowded, Some crowding, Crowded.

I think this would be a good idea. A guideline to help define the percentage and crowding descriptions would be useful for producers and consumers alike.

On the other hand, should we also add guidelines indicating that the other enums are not necessarily a linear scale but represent actual communication about the state of the accommodations available and should be used specifically to represent those accommodations?

I certainly have no issue keeping something already in and that is helpful to other stakeholders and am not necessarily advocating to get rid of any mention of seats vs standing (these are certainly valuable if they are true!)... it'd probably be good to define their use specific to those accommodations, however, so that it's clear this is not a linear scale and rather represents the state of particular accommodations aboard the vehicle. Right now it seems many producers and consumers are interpreting them as a linear scale and that's what's unclear.

@barbeau
Contributor

barbeau commented Dec 17, 2020

should we also add guidelines indicating that the other enums are not necessarily a linear scale but represent actual communication about the state of the accommodations available and should be used specifically to represent those accommodations?

💯 @stevenmwhite Our posts crossed :). Yes, I think no matter what the outcome is, this needs to be clarified if OccupancyStatus continues to be used.

@sccmcca
Contributor

sccmcca commented Dec 17, 2020

@barbeau

I'd be concerned that officially adopting a subset of the current OccupancyStatus enum doesn't really solve the problems that currently exist with it.

It certainly does not solve the problem wholesale but is a step in the right direction and maybe a viable alternative to adopting all enum values that we still legitimately are not sure about and should therefore remain experimental, or by unviably removing enum values (which received 3 pre-vote -1's). Meanwhile, it seems that we are confident with at least 3 values, so phasing the solution may be okay in this context.

Notwithstanding that OccupancyStatus has been experimental for 6+ years when the specification amendment process allows fields to be experimental for only 2 years (albeit grandfathered in), one of the goals right now is to adopt an occupancy description to unblock GTFS-Occupancies (#240).

we won't know which producers are using the "official" interpretation vs. the experimental one when looking at the data.

This is true but should not matter in the interim. Since OccupancyStatus has been experimental for 6+ years, there would be little short-term change in how producers and consumers are interpreting the values. Long-term, it should ideally converge the industry on the fewest enums possible, which was the main issue highlighted by @gcamp as the motive for this PR:

The enum of occupancy_status has 7 values, however we've seen that the range of values is not used in practice.

To accomplish this I think we need a positive, rather than negative, framing. Right now the PR is proposing to take experimental enum values away that are in use. Instead, we ought to adopt the ones we are confident about.

@stevenmwhite

A guideline to help define the percentage and crowding descriptions would be useful for producers and consumers alike.

On the other hand, should we also add guidelines indicating that the other enums are not necessarily a linear scale but represent actual communication about the state of the accommodations available and should be used specifically to represent those accommodations?

Very well said! I'm all for these clarifications. As discussed at length, OccupancyStatus and occupancy_percentage are distinct. I think we already have the fields that we need, but we need to emphasize how they should be used.

@RTL-Gilles-Ferland

Right now it seems many producers and consumers are interpreting them as a linear scale and that's what's unclear.

We are indeed using the OccupancyStatus enum as a linear scale. If a new enum is added to represent crowding on a linear scale we will probably move to that enum and abandon the OccupancyStatus one.

Why not change the meaning of the different enum values if producers, like us, are already using it as a linear scale? It will make everything clearer and not break the current implementations of the enum. And also work with GTFS-Occupancies (#240).

@sccmcca
Contributor

sccmcca commented Jan 7, 2021

@RTL-Gilles-Ferland

Why not change the meaning of the different enum values if producers, like us, are already using it as a linear scale?

The challenge is that the existing OccupancyStatus enum values describe nominal states of vehicle occupancy levels, such as "many seats available" or "standing room only". Accepting that seating and standing capacities are different for different vehicle types, ascribing the existing OccupancyStatus enum values on a fixed linear scale would not work universally.

@barbeau @stevenmwhite and all. On the topic of disambiguation between intended uses of OccupancyStatus and occupancy_percentage, please see the newly opened #259 and feel free to give feedback on the topic there. Keep in mind that it is a basic proposal that should be taken as a starting point for sorting these out.

@sccmcca
Contributor

sccmcca commented Feb 1, 2021

Hi everyone. Please find a continuation of the broader realtime occupancy discussion in #223 . Welcoming engagement over there, thanks!

@sccmcca
Contributor

sccmcca commented Feb 3, 2021

I made a PR on this branch to reform this proposal to add a CrowdLevel enum in place of deprecating values in OccupancyStatus. Inviting reviews and feedback here!

@sccmcca
Contributor

sccmcca commented Feb 11, 2021

Please see a continuation of this proposal at #261. Thanks!

@stale

stale bot commented Aug 21, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale label (issues and pull requests that have remained inactive for 30 calendar days or more) Aug 21, 2021
@stale

stale bot commented Aug 28, 2021

This pull request has been closed due to inactivity. Pull requests can always be reopened after they have been closed. See the Specification Amendment Process.

@stale stale bot closed this Aug 28, 2021
@gcamp gcamp deleted the occupancy_v2 branch January 23, 2024 20:55
Labels

GTFS Realtime, Status: Stale
