Editorial and formatting changes#8

Closed
giocorti wants to merge 4 commits into master from Line-Edits

@giocorti giocorti commented Nov 2, 2018

This PR makes the following editorial and formatting changes to the GTFS specification.

  • Standardized the choice of certain words
    • passenger/customer --> rider
    • Except for “customer service”
    • Blank --> empty
    • Feed vs dataset
    • Field vs field value
    • Transit Organization --> transit agency
    • Note: organization is still used for the group that publishes the feed
    • Itinerary --> Journey (this matches Fares proposal)
  • Made editorial changes to enhance clarity and legibility including removing unnecessary and repetitive language. Major rewrites of:
    • parent_station
    • stop_timezone
    • arrival_time and departure_time
    • Monday,...,Sunday
    • timepoint
    • exact_times
  • Formatting changes
    • Added code ticks to denote fields and other code values
    • Standardized Enum option formatting
    • Added hyperlinks to all .txt files and websites
    • Put field types in alphabetical order.
    • Different typesetting for examples
  • Defined or redefined the following terms:
    • Dataset
    • Field Value
    • Record
    • ID
    • Added referencing ID

As you can see, I made a lot of changes and I'm sure we'll need to discuss some of them. Let me know what you think.

@antrim antrim left a comment

Review in progress.

@@ -2,13 +2,13 @@

**Revised August 22, 2018. See [Revision History](../../CHANGES.md) for more details.**

Future process reminder for us: We need to remember to update the revision history when pull requests are ultimately approved.

* **Field conditionally required** - This field or file is **required** under certain conditions, which are outlined in the field or file *Details*. Outside of these conditions, this field or file is optional.
* **Dataset unique** - The field contains a value that maps to a single distinct entity within the column. For example, if a route is assigned the ID **1A**, then no other route may use that route ID. However, you may assign the ID **1A** to a location because locations are a different type of entity than routes.
* **Feed** - Successive datasets comprise a feed.
* **Dataset** - A complete set of files defined by this specification reference. Datasets should be published at a public, permanent URL, including the zip file name. (e.g., www.agency.org/gtfs/gtfs.zip). For more information see GTFS best practices.

Put the "Dataset" definition above "Feed" since the "Feed" definition refers to it.

I agree with the definition, but isn't this a change to the existing spec? We should strive to avoid addition to the spec in this PR IMHO.

Member

Agreed, I'm not sure the "Feed" definition will be universally agreed-upon. We should probably vote on that separately.

Member

I'd expand the example to the full URL - https://www.agency.org/gtfs/gtfs.zip

I could see how this would be debated, but I also am having a hard time seeing alternative definitions that are usable. The definitions we have proposed are based on existing usage, albeit quite inconsistent usage.

schema.org's uses of the terms "data feed" and "dataset" seem to match what we are doing:

Basically, I think that we should be bold in this case. The terms "dataset" and "feed" have been consistently applied in this proposed spec modification. We could revert, but it hurts to leave this inconsistency in the Spec.

An idea: Prior to the pull request shall we bring the question to a somewhat broader group to see if they agree with our change?

* **Field** - A specific type of information for records in a .txt file. Represented, in a table, as a column.
* **Record** - A basic data structure comprised of a number of different fields describing a single entity (e.g. transit agency, stop, route, etc.). Represented, in a table, as a row.
* **Field Value** - An individual entry in a field. Represented, in a table, as a single cell.
* **Required** - The field must be included in the dataset, and a value must be provided for each field. Some required fields permit an empty string as a value (denoted in this specification as "empty"). To enter an empty string, just omit any text between the commas for that field. Note that `0` is interpreted as "a string of value 0", and is not an empty string. See the field definition for details.

I think this is unnecessary:
"Note that 0 is interpreted as "a string of value 0", and is not an empty string. See the field definition for details."

"for each field"? You mean for each "field value", no?

> Some required fields permit an empty string as a value

I don't think so. I would call them optional. But maybe I'm missing something.

Author

@giocorti giocorti Nov 5, 2018

> "for each field"? You mean for each "field value", no?

It should be "field value", good catch.

> Some required fields permit an empty string as a value
>
> I don't think so. I would call them optional. But maybe I'm missing something

`transfers` in `fare_attributes.txt` and `transfer_type` in `transfers.txt` are both listed as required and permit empty field values as an enum option.
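
To make the empty-versus-`0` distinction concrete, here is a minimal Python sketch of how a feed consumer might read the `transfers` field from `fare_attributes.txt`. It is an illustration only, not part of the spec: the sample rows, fare IDs, and the helper name `read_transfers` are all hypothetical.

```python
import csv
import io

# Hypothetical sample data; the fare IDs and prices are invented for illustration.
SAMPLE = """fare_id,price,currency_type,payment_method,transfers
a,1.50,USD,0,0
b,2.00,USD,0,
"""


def read_transfers(text):
    """Map each fare_id to its raw `transfers` value, or None when the field value is empty."""
    result = {}
    for record in csv.DictReader(io.StringIO(text)):
        raw = record["transfers"]
        # Nothing between the commas parses as "", which is distinct from the string "0".
        result[record["fare_id"]] = None if raw == "" else raw
    return result


transfers = read_transfers(SAMPLE)
print(transfers)
```

In this sketch fare `a` carries the explicit value `0`, while fare `b` has an empty field value even though the field itself is present in the file, which is the case the "Required" wording has to account for.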

Oh no. Why.

We should change that. It makes no sense IMHO. @barbeau @antrim Thoughts? We should discuss that on Thursday.

Member

IMHO "and a value must be provided for each field" should be "and a value must be provided in that field for each record".

Good catch. We could test fixing that but through a different pull request / discussion.

@antrim antrim left a comment

@giocorti : I have finished my review. Looks good! I have requested minor changes, and opened a few questions for discussion.

@barbeau @LeoFrachet Do you have any comments? Or should we discuss a few of the questions I raised in an internal meeting?

@LeoFrachet

@giocorti Amazing job, thanks! I only reviewed until end of stops.txt, but great job.

There are many small things to tighten up. But we're going somewhere.

@LeoFrachet

I must confess that in all the proposals I'm writing, I put only two columns (Field name and description), and I just always start the description with the type and the cardinality:

It's more compact than with 4 columns. But I dunno. Some of you (IIRC) were in favor of the 4 columns.
@antrim @barbeau

[Screenshot, 2018-11-06 14:09:30]

[Screenshot, 2018-11-06 14:09:25]

@LeoFrachet

@giocorti Could you also handle this request please Giovanni: google#90

@barbeau
Member

barbeau commented Nov 6, 2018

@giocorti There is also a good discussion of frequencies.txt and frequency-based trips in google#47, including by @antrim and myself. Could you please take a look at that and see if it makes sense to pull any of that proposed content into this PR?

@barbeau
Member

barbeau commented Nov 6, 2018

> Some of you (IIRC) were in favor of the 4 columns.

IIRC GitHub's Markdown renderer shows columns with equal width no matter the content, so 2 columns ends up wasting a lot of space when viewing on GitHub. Also, generally, I think 4 columns is easier to digest for newcomers to the spec, so I'd suggest keeping that.

Idea - in the HTML-rendered site, we could offer a "collapsed" view for pros that shows only two columns in the format that Leo is using. I think that would be relatively straightforward (?) if we encode the info in 4 columns in Markdown.
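
As a rough illustration of why keeping 4 columns in the Markdown source makes a collapsed view straightforward, here is a minimal Python sketch that folds a 4-column table into 2 columns by prefixing the description with the type and cardinality. The column headings, the sample row, the exact collapsed format, and the function names `collapse_row` and `collapse_table` are assumptions for illustration, and the sketch assumes no cell contains a literal `|`.

```python
def collapse_row(row):
    """Turn '| name | type | required | description |' into a two-column row."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    if len(cells) != 4:
        raise ValueError(f"expected 4 cells, got {len(cells)}: {row!r}")
    name, field_type, required, description = cells
    # Lead the description with the type and cardinality, as in the two-column style.
    return f"| {name} | **{field_type}, {required}.** {description} |"


def collapse_table(markdown):
    """Collapse a 4-column Markdown table (header, separator, rows) into 2 columns."""
    lines = markdown.strip().splitlines()
    out = ["| Field Name | Description |", "| --- | --- |"]
    for line in lines[2:]:  # skip the original header row and separator row
        out.append(collapse_row(line))
    return "\n".join(out)


# Hypothetical sample table; the description text is invented for illustration.
FOUR_COL = """| Field Name | Type | Required | Description |
| --- | --- | --- | --- |
| `stop_id` | ID | Required | Identifies a stop. |
"""

collapsed = collapse_table(FOUR_COL)
print(collapsed)
```

Going the other way (splitting a free-text description back into type and cardinality) would be much less reliable, which is one argument for treating the 4-column form as the source of truth.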

@giocorti
Author

giocorti commented Nov 9, 2018

I've reviewed all the suggestions and made the changes summarized below. Thanks for all the input! LMK what you think.

  • Minor editorial changes
  • Empty fields are now considered to have no value
  • Rewrote “Optional” definition
  • Rewrote “field” definition
  • Removed "feed" (This may be changed after more discussion)
  • URLs not hidden
  • Defined “Service Day”
  • Reformatted enum and examples using <br> and <hr>
  • Added code ticks to “Field Name” column
  • Rewrote transfer_duration
  • Moved google fares example to an internal .md file
  • Rewrote frequencies.txt and exact_times descriptions

@LeoFrachet

I'm rereading it completely before pushing it to the public repo.

@LeoFrachet

LeoFrachet commented Nov 16, 2018

For shape, the text is cut because of the quote. Would there be a way to solve it?

[Screenshot, 2018-11-16 16:10:30]

@LeoFrachet

LGTM, with the three small comments above.

@giocorti
Author

> For shape, the text is cut because of the quote. Would there be a way to solve it?

My browser allows side-to-side scrolling on the shapes.txt table. Certainly not ideal but, as far as I can tell, it's the best we can do if we want to keep the last example.

@LeoFrachet

LGTM !
