diff --git a/rfcs/20191017-tfx-standardized-inputs.md b/rfcs/20191017-tfx-standardized-inputs.md
new file mode 100644
index 000000000..596209540
--- /dev/null
+++ b/rfcs/20191017-tfx-standardized-inputs.md
@@ -0,0 +1,753 @@
+
+
+# Standardized TFX Inputs
+
+Status | Proposed
+:------------ | :------------------------------------------------------------
+**RFC #** | [162](https://github.com/tensorflow/community/pull/162)
+**Author(s)** | Zhuo Peng (zhuo@google.com), Kester Tong (kestert@google.com)
+**Sponsor** | Konstantinos Katsiapis (katsiapis@google.com)
+**Updated** | 2019-10-03
+
+# Objective
+
+* To define a common in-memory data representation that:
+ * is powerful enough to encode the following logical training data format:
+ flat
+ ([`tf.Example`](https://github.com/tensorflow/tensorflow/blob/abfba15cd9734cec7ecd3d0661b146fc251c842d/tensorflow/core/example/example.proto#L88)),
+ sequence
+ ([`tf.SequenceExample`](https://github.com/tensorflow/tensorflow/blob/abfba15cd9734cec7ecd3d0661b146fc251c842d/tensorflow/core/example/example.proto#L298))
+ or structured data (e.g.
+ [Protocol Buffers](https://developers.google.com/protocol-buffers) or
+ [Apache Avro](https://avro.apache.org/)).
+ * all TFX components can understand and can support their own unique use
+ cases with.
+* To define an I/O abstraction layer that produces the above in-memory
+ representation from supported physical storage formats, while hiding TFX’s
+ choice of such storage formats from TFX users.
+* To define a bridge from the above in-memory representation to TF feedables
+ (i.e. Tensors and certain
+ [CompositeTensors](https://github.com/tensorflow/tensorflow/blob/abfba15cd9734cec7ecd3d0661b146fc251c842d/tensorflow/python/framework/composite_tensor.py#L1)).
+
+# Motivation
+
+## Interoperability of TFX libraries
+TFX offers a portfolio of libraries, including
+[TFT](https://github.com/tensorflow/transform),
+[TFDV](https://github.com/tensorflow/data-validation) and
+[TFMA](https://github.com/tensorflow/model-analysis). These libraries can be
+used in a standalone manner (i.e. a user can run TFDV without/outside TFX)
+while on the other hand, one TFX component (a program orchestrated by TFX) may
+use multiple TFX libraries under the hood. For example, there could be a
+TFX component that transforms the training data by invoking TFT and
+collects pre-transform and post-transform statistics of the data by invoking
+TFDV.
+
+Currently, each TFX library accepts different in-memory data representation:
+
+| | TFDV | TFT | TFMA | BulkInference |
+|:---| :--- | :--- | :--- | :------------ |
+| In-memory data representation | Arrow RecordBatches | Dict[str, np.ndarray] | str (raw data records), Dict[str, np.ndarray] | str (raw data records) |
+| Understand the data and conduct analysis | input data is encoded losslessly as RecordBatches. | the in-mem representation may be lossy. | Relies on the model’s input layer, and the format is Dict[str, np.ndarray]. | N/A |
+| Feed TF | N/A | the in-mem representation is TF feedable. | Feed “raw data” to the model. | Feed “raw data” to the model |
+
+When a TFX component needs to invoke multiple TFX libraries, it may need to
+decode the data into one of the libraries’ in-memory representations,
+and translate that into the other library’s. For example, in the “Transform”
+component mentioned above:
+
+
+
+Note that TFDV is invoked twice, with different data. Thus the translation needs
+to happen twice.
+
+This has created several issues:
+
+* The translation is computationally expensive. In a real world set-up, such
+ translation could take as many CPU cycles as the core TFT and TFDV logic
+ takes in total.
+* More such translation logic may need to be implemented to support expanding
+ TFX use cases -- imagine that TFMA needs to invoke TFDV to compute
+ statistics over slices of data identified by model evaluation.
+* The complexity of adding new logical data representations scales with the
+ number of components. For example, to add SequenceExample support in TFX,
+ one may need to come up with an in-memory representation for each of the
+ components, to keep the consistency within the component.
+* TFX library users whose data format is not supported natively by TFX would
+ have to implement the decoding logic for each of the libraries
+ they want to use. For example, had TFX not supported the CSV format, a user
+ would have to implement
+ [one CSV decoder for TFT](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/coders/csv_coder.py),
+ and [another CSV decoder for TFDV](https://github.com/tensorflow/data-validation/blob/master/tensorflow_data_validation/coders/csv_decoder.py)
+
+A common in-memory data representation would address the issues.
+
+## The need for supporting new physical storage formats in TFX
+
+Currently TFX (mostly) assumes tf.Example on TFRecord. However because TFX is a
+managed environment, it is desired that its choice of physical format of data is
+an implementation detail and is opaque to the components and the users. TFX
+would like to explore switching to a columnar physical format like Apache
+Parquet and it can be imagined that there will be a migration at some point.
+Such a migration must happen in either of the following two ways:
+
+* Change every single TFX library and component that needs to read data to
+ add support for reading the new physical format (into each library’s own
+ in-memory representation)
+* Rely on an indirection through tf.Example and give up some performance
+ because of the translation.
+
+Beyond easier migrations (which could arguably be one-time efforts), a good I/O
+abstraction would allow TFX to choose the optimal storage format based on user’s
+workload, in a user-transparent manner.
+
+# User Benefit
+
+## TFX End Users
+
+While this change is transparent to end users, it will facilitate the design and
+implementation of many user-facing features, for example:
+
+* Columnar storage format in TFX.
+* Structured training examples.
+
+## Individual TFX component users
+
+We use TFXIO to refer to the proposed I/O abstraction layer. All TFX components
+will start using TFXIO to ingest the data and have a unified way of representing
+the data. Individual TFX component users would be able to implement TFXIO for
+their own data formats / storage formats that are not supported by TFX. By
+design, any such implementation will be readily accessible by all TFX
+components.
+
+## TFX developers
+
+Developers working on TFX infrastructure will not have to understand the
+internals of each component any more in order to make changes to I/O and parsing
+(for example, adding support for a new storage format for the training
+examples).
+
+Developers working on TFX components would benefit from sharing common
+operations against the unified in-memory representation, or even higher-level
+computations. For instance, suppose that we implement a sketch-based algorithm
+to compute approximate heavy hitters over this in-memory representation. We can
+now share this implementation inside both TFDV and TFT for their top-K feature
+value computation.
+
+# Requirements
+
+The in-memory representation should:
+
+* Be columnar.
+
+ A columnar in-memory representation works better than a row based on under
+ typical workload of TFX libraries:
+ + Compute statistics over a column.
+ + Feed a batch of rows to TensorFlow.
+
+* Be able to losslessly encode the logical data format. Specifically, it
+ should be able to distinguish a null (unpopulated) value from an empty list.
+
+ TFDV produces distinct statistics for nulls and empty lists.
+
+* Provide for efficient integration with TF computation. Ideally, using the
+ in-memory representation with TensorFlow should not require a data copy.
+
+ Feeding TF is a significant TFX workload, both in terms of CPU cycles and
+ number of examples processed.
+
+* Have efficient Python APIs to slice, filter and concatenate. Or more
+ generally, have efficient Python APIs for data analysis type of workload.
+
+ For example, TFMA may group the examples by “geo_location” and “language”
+ and evaluate the model for each of the slices. This operation would require
+ efficient slicing a batch of examples, and concatenation of slices that
+ belong to the same slice key (because the slices could be very small, and
+ inefficient for feeding TF). TFDV has similar use cases where statistics of
+ a certain slice of data needs to be collected.
+
+ This type of workload is also significant in TFX, both in terms of CPU
+ cycles and number of examples processed.
+
+ Note that TFX libraries don't always need to run TF graphs. For example,
+ TFDV, despite of its name, only analyzes the training data and (almost) does
+ not call any TF API. Another example, TFMA, will support "blackbox"
+ evaluation where the model being evaluated does not have to be a TF model.
+ Therefore a TF-neutral in-memory representation that works well with plain
+ Python code is desirable.
+
+* Be interoperable with the rest of the world.
+
+ The OSS world should be able to use TFX components with little effort on
+ data conversion. This aligns with TFX’s long term vision.
+
+
+# Design Proposal
+
+This design proposes **a common in-memory data representation**, **a way to
+translate that into TF feedables** (np.ndarray or EagerTensors) and **a set of
+APIs** each component can use to get both.
+
+
+
+## Common in-memory data representation
+
+[Apache Arrow](https://arrow.apache.org/) will be used as the common in-memory
+data representation. Beam-based TFX components will accept
+PCollection[pyarrow.[RecordBatch](https://arrow.apache.org/docs/python/data.html#record-batches)].
+
+Each logical data format will have its own encoding convention,
+[discussed](#logical-data-encoding-in-arrow) in the detailed design.
+
+We chose Apache Arrow because:
+
+* It’s Expressive enough.
+ * Lossless encoding of (conformant) tf.Example, tf.SequenceExample
+ * Can encode structured data (proto)
+* It’s a columnar format. It works well with common TFX workloads:
+ * Column (feature)-wise analysis
+ * Feed a batch of columns (features) to TensorFlow.
+* It’s OSS friendly.
+ * Community support for more storage format I/O (e.g. Apache Parquet)
+ * Friendly to other OSS data formats, both in-memory and on disk (e.g.
+ Pandas)
+ * Friendly to numpy / TF: many Arrow array types share the same memory
+ layout with numpy ndarrays and certain type of TF (composite) Tensors.
+* TF neutral.
+ * Leaves the possibility of supporting other ML libraries open.
+
+## Translation from Arrow to TF feedables
+
+The analogy to this is parsing tf.Examples into TF feedables -- extra
+information is needed in this translation because a
+[`Feature`](https://github.com/tensorflow/tensorflow/blob/abfba15cd9734cec7ecd3d0661b146fc251c842d/tensorflow/core/example/feature.proto#L76)
+can be converted to a Tensor, a SparseTensor or a
+[RaggedTensor](https://www.tensorflow.org/guide/ragged_tensor) depending on the
+[feature specs](https://github.com/tensorflow/tensorflow/blob/635e23a774936b5fe6fa3ef3cb6e54b55d93f324/tensorflow/python/ops/parsing_ops.py#L46-L49).
+Currently this extra information is implicitly contained in the pipeline schema
+(an instance of the
+[TFMD Schema](https://github.com/tensorflow/metadata/blob/master/tensorflow_metadata/proto/v0/schema.proto))
+proto.
+
+Similarly, an Arrow column can be translated to various TF feedables.
+[An extension to the pipeline schema](#tensorrepresentation) is proposed to for
+a user to express the intention for conversion.
+
+The conversion can be efficient (zero-copy) in certain cases. It is
+[discussed](#efficient-arrow-tensor-conversion) in the detailed design.
+
+## Standardized Inputs APIs
+
+We propose a set of APIs that TFX components will call, and need to be
+implemented for each of the supported combination of {physical, logical} format.
+
+```py
+class TFXIO(object):
+ """Abstract basic class of all Standardized TFX inputs API implementations."""
+ def __init__(
+ self,
+ schema: Optional[tfmd.Schema]=None
+ ):
+ pass
+
+ @abc.abstractmethod
+ def BeamSource(self,
+ projections: Optional[List[Text]]=None
+ ) -> beam.PTransform:
+ """Returns a beam PTransform that produces PCollection[pa.RecordBatch].
+
+ May NOT raise an error if the TFMD schema was not provided at construction time.
+
+ Args:
+ projections: if not None, only the specified subset of columns will be
+ read.
+ """
+
+ @abc.abstractmethod
+ def TensorAdapter(self) -> TensorAdapter:
+ """Returns a TensorAdapter that converts pa.RecordBatch to TF inputs.
+
+ May raise an error if the TFMD schema was not provided at construction time.
+ """
+
+ @abc.abstractmethod
+ def ArrowSchema(self) -> pyarrow.Schema:
+ """Returns the schema of the Arrow RecordBatch generated by BeamSource().
+
+ May raise an error if the TFMD schema was not provided at construction time.
+ """
+
+ @abc.abstractmethod
+ def TFDataset(self, ...) -> tf.data.Dataset:
+ """Returns a Dataset of TF inputs.
+
+ May raise an error if the TFMD schema was not provided at construction time.
+ """
+```
+
+Where `TensorAdapter` is:
+
+```py
+class TensorAdapter(object):
+
+ def __init__(
+ self,
+ arrow_schema: pyarrow.Schema,
+ tensor_representations: Dict[str, TensorRepresentation]):
+ """Initializer.
+
+ Args:
+ arrow_schema: the schema of the RecordBatches this adapter is to receive
+ in ToBatchTensors().
+ tensor_representations: keys are the names of the output tensors; values
+ describe how an output tensor should be derived from a RecordBatch.
+ """
+
+
+ def TypeSpecs(self) -> Dict[str, tf.TypeSpec]:
+ """Returns tf.TypeSpec for each tensor in `tensor_representation`.
+
+ TypeSpecs can be used to construct placeholders or tf.function signatures.
+ """
+
+ def ToBatchTensors(
+ self, record_batch: pyarrow.RecordBatch,
+ projections: Optional[List[TensorName]]=None
+ ) -> Dict[str, TFFeedable]: # TFFeedable: np.ndarrays or tf.EagerTensor
+ # (or compositions of them, i.e.
+ # CompositeTensors).
+ """Converts a RecordBatch to batched TFFeedables per `tensor_representation`
+
+ Each will conform to the corresponding TypeSpec / TensorRepresentation.
+
+ Args:
+ projections: a set of names of TFFeedables (mentioned in
+ `tensor_representation`). If not None, only TFFeedables of those names
+ will be converted.
+ """
+```
+
+Note that we will provide a default implementation of `TensorAdapter`, but TFXIO
+implementations can implement their own `TensorAdapter`. A custom
+`TensorAdapter` would allow a `TFXIO` implmentation to rely on a TF graph to
+do parsing -- the same graph can be used in both `BeamSource` and
+`TensorAdapter`.
+
+The default `TensorAdapter` can be constructed out of the Arrow schema (which
+is required for any TFXIO implementation) and `TensorRepresentations`. The
+latter is part of the TFMD schema. See [this section](#tensorrepresentation)
+for details.
+
+# Detailed Design
+
+## Logical data encoding in Arrow
+
+On a high level, a batch of logical entities (“examples”) is encoded into a
+[`pyarrow.RecordBatch`](https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html#pyarrow.RecordBatch).
+Features or fields (from structured records) are encoded as columns in the
+RecordBatch.
+
+Note that
+[`pyarrow.Table`](https://arrow.apache.org/docs/python/data.html#tables) offers
+an abstraction similar to RecordBatch with the key difference being that a
+column in a Table might contain multiple chunks of contiguous memory regions
+while a column in a RecordBatch contains only one chunk. RecordBatch is chosen
+because we want to enforce that TFXIO implementations produce batched data in
+the most efficient way (one chunk per batch). Users of TFXIO may construct a
+Table from one or more RecordBatches since easy conversion from one to the other
+is supported by Apache Arrow.
+
+This design aims to support the logical structure of tf.Example,
+tf.SequenceExample or structured data like Protocol Buffers. Thus only a subset
+of Arrow array types are needed. All TFX components will guarantee to understand
+those types, but no more. Below is a summary of supported encodings:
+
+| Logical representation | Arrow encoding |
+| :--------------------- | :------------- |
+| Feature with no value | `NullArray` |
+| Univalent feature (one value per example) | `FixedSizeListArray` (list_size = 1) |
+| Multivalent feature (multiple values per example) | `[FixedSize]ListArray` |
+| Sequence feature (list of lists of values per example) | `[FixedSize]ListArray<[FixedSize]ListArray>` |
+| Proto-like structured data | `ListArray}>>` |
+
+However the design is flexible to support more complicated logical structures,
+for example, k-nested sequences (tf.SequenceExample is 2-nested).
+
+Next we show that these encodings cover the logical data formats we aim to
+support:
+
+### tf.Example
+
+[Conformant](https://github.com/tensorflow/tensorflow/blob/abfba15cd9734cec7ecd3d0661b146fc251c842d/tensorflow/core/example/example.proto#L78)
+tf.Examples are assumed. I/O + parsing should throw an error upon non-conformant
+instances.
+
+A key requirement derived from the conformant-ness is for the encoding to be
+able to distinguish the following two cases:
+
+* a feature is present, but it’s value list is empty
+
+ ```
+ {
+ features {
+ "my_feature": {
+ bytes_list {
+ }
+ }
+ }
+ ```
+
+* a feature is not present
+
+ ```
+ {
+ features {
+ }
+ }
+ ```
+
+ or
+
+ ```
+ {
+ features {
+ "my_feature": {} # none of the oneof is set
+ }
+ }
+ ```
+
+Each feature can be encoded as:
+
+```
+[FixedSize]ListArray
+```
+
+Then, the feature value in case a) is encoded as an empty sub-list, while the
+feature value in case b) is encoded as null.
+
+If we know that all the lists in a `ListArray` are of equal length (from the
+schema of the data, see below sections), `FixedSizeListArray` can be used to
+obviate the `O(N)` space overhead for lengths of lists.
+
+### tf.SequenceExample
+
+[Conformant](https://github.com/tensorflow/tensorflow/blob/abfba15cd9734cec7ecd3d0661b146fc251c842d/tensorflow/core/example/example.proto#L184)
+tf.SequenceExamples are assumed. I/O + parsing should throw an error upon
+non-conformant instances.
+
+A context feature will be encoded similarly to a feature in tf.Example. A
+sequence feature will be encoded as:
+
+```
+[FixedSize]ListArray<[FixedSize]ListArray>
+```
+
+To avoid name conflicts with context features, all the sequence features can be
+grouped into one `StructArray`:
+
+```
+StructArray<{'sequence_feature1': ListArray>, ...}>
+```
+
+### Structured data (e.g. Protocol Buffers / Apache Avro)
+
+A batch of structured records can be encoded as follows:
+
+* Each direct leaf field of the structure can be encoded similarly to
+ tf.Example. (`ListArray` of primitive types).
+* Each sub-message can be encoded as:
+
+ ```
+ ListArray>>
+ ```
+
+## Arrow to TF Feedable conversion
+
+### TensorRepresentation
+
+One or more Arrow columns can potentially be converted to multiple types of TF
+feedables.
+
+For example, a `ListArray` can be converted to:
+
+* a Tensor, if given a default value to pad
+* a SparseTensor to represent a ragged array
+* a RaggedTensor
+
+The choice depends on user’s intents, which currently is
+[implicitly](https://github.com/tensorflow/transform/blob/11afcff467f779ba6163686395582e69603987d1/tensorflow_transform/tf_metadata/schema_utils.py#L172)
+expressed in the pipeline schema.
+
+We propose to create a new [TFMD](https://github.com/tensorflow/metadata)
+(TensorFlow MetaData) Proto, `TensorRepresentation` to carry those intents implicitly:
+
+```protobuf
+message TensorRepresentation {
+ oneof {
+ DenseTensor { … } // column_name, dtype, shape, default_value
+ VarLenSparseTensor { … } // column_name, dtype
+ SparseTensor { } // dtype, value_column_name, indice_column_names
+ VarLenRaggedTensor { … } // dtype
+ RaggedTensor { } // dtype, value_column_name, row_partition_column_names, ...
+ StructuredTensor { } // column_names
+ }
+}
+```
+
+This proto is used in two places:
+
+* It’s part of TFMD schema:
+
+ ```protobuf
+ message TensorRepresentationGroup {
+ map tensor_representation = 2;
+ };
+
+ message Schema {
+ repeated Feature feature = 1;
+ // …
+ map tensor_representation_group = 42;
+ }
+ ```
+
+ Note :
+
+ * `TensorRepresentationGroup` allows different instances of one TFX
+ component to use different sets of `TensorRepresentation`s.
+ * `tensor_representation_group` is **optional**. If the user does not
+ specify any, a default representation will be derived from
+ schema.feature to keep backwards compatibility.
+ * this field is **not** a sub-message of Schema::Feature, because a TF
+ feedable may comprise multiple columns
+
+ Being part of the schema makes it possible to serialize and materialize the
+ intents for other components to use, which allows TFT’s materialization
+ functionality to have its own TFXIO implementation that hides the
+ data/physical format from the user.
+
+ When generating the initial schema from the statistics of the data, TFDV can
+ propose a default set of `TensorRepresentationGroup`. The user may revise
+ the proposal and TFDV can validate `TensorRepresentationGroup`s in a
+ continuous manner.
+
+* The default implementation of TensorAdapter takes an optional `Dict[str,
+ TensorRepresentation]` at construction time. If a TFXIO implementation
+ choose to use the default TensorAdapter, it needs to provide them (may come
+ directly from the Schema).
+
+### Efficient Arrow->Tensor conversion
+
+The key to efficient conversions is to avoid copying of data. The prerequisites
+to do so are:
+
+* Same memory alignment
+* Same memory layout
+
+Currently 64-byte alignment is the standard in both Tensorflow's `TensorBuffer`
+and Apache Arrow's `Buffer`. Forthermore, it can be guaranteed by implementing
+our own version of `arrow::MemoryPool` that is backed by a
+`tensorflow::Allocator`.
+
+The memory layout will be the same if right types are chosen at both ends thus
+zero-copy conversion can be done, for example:
+
+* `FixedLengthListArray` (or `ListArray` of equal-length lists) -> dense
+ Tensors.
+* `ListArray>` ->
+ [RaggedTensors](https://github.com/tensorflow/tensorflow/blob/3c2dabf53dd085c21e38a28b467e52c566c0dfaf/tensorflow/python/ops/ragged/ragged_tensor.py#L1).
+* `ListArray>` ->
+ [StructuredTensors](https://github.com/tensorflow/community/blob/master/rfcs/20190910-struct-tensor.md)
+
+In other cases, copies can be avoided for the values, but some computation is
+needed:
+
+* `ListArray>` -> `tf.SparseTensor`
+ * Need to compute the sparse indices from `ListArray`'s list offsets.
+
+The remaining cases require a copy:
+
+* `ListArray>`(of non-equal-length lists) -> dense Tensors
+
+With TensorRepresentation available in the Schema, a TFXIO implementation may
+optimize its decoder to choose the most efficient Arrow type.
+
+#### Conversion of string features
+
+Arrow’s string arrays (`BinaryArray`) have a different memory layout than
+TensorFlow’s string Tensors, even with
+[`tensorflow::tstring`](https://github.com/tensorflow/community/blob/master/rfcs/20190411-string-unification.md).
+There is always some overhead in conversion, but with `tensorflow::tstring` a
+Tensor of `string_view`s is possible, thus the overhead will be a function of
+the number of strings being converted, instead of the lengths of the strings.
+
+#### TF APIs for conversions
+
+In TF 1.x we will use np.ndarray as a bridge as Arrow has zero-copy conversion
+to numpy’s ndarrays. (not for string arrays).
+
+Starting from TF 2.x, we will be able to create EagerTensors from Python
+memoryview(s) so that strings can be covered.
+
+## TFMD Schema
+
+[The TFMD Schema](https://github.com/tensorflow/metadata/blob/master/tensorflow_metadata/proto/v0/schema.proto)
+is a pipeline-level artifact and in the scope of this proposal, it may serve two
+purposes:
+
+* To provide optional inputs to the parsing logic for optimizations.
+* To carry user’s intents of converting data to TF feedables.
+
+The two purposes don’t have to be served in the following cases:
+
+* TFDV should not require a schema to work and it does not need TF feedables.
+* Some TFXIO implementation may not need the schema for either purposes.
+
+Therefore the TFMD schema is optional, and a TFXIO implementation:
+
+* should guarantee that the `BeamSource()`can return a valid
+ `PCollection[RecordBatch]` without a schema.
+ * Other interfaces may raise an error when a schema was not provided.
+* does not have to require a TFMD schema for all its interfaces to work.
+
+## (TensorFlow) Trainer integration
+
+For TFX to freely choose the storage format for training examples for a user, we
+**cannot** expose file-based or record-based interface to that user in the TF
+trainer, because:
+
+* the user might not know how to open those files.
+* there might not be an efficient representation of a “record” (this is true
+ for columnar storage formats like Apache Parquet) but only an efficient
+ representation of a batch of records.
+
+Thus we propose that to most users, the TF Trainer only exposes a handle to a
+`tf.data.Dataset` of parsed (composite) Tensors.
+
+Each `TFXIO` implementation will implement a `TFDataset()` interface to return
+such a `tf.data.Dataset`. This dataset contains logically a set of batched
+(composite) Tensors that are of the same type as the corresponding
+`TensorAdapter()` would return for a `RecordBatch`. See
+[this section](#recommended-way-of-implementing-a-tfxio) about how to minimize
+the code needs to be written for a new `TFXIO` implementation.
+
+The `TFDataset()` interface will accept common knobs that a user may need to
+tweak:
+
+* Batch size
+* Random shuffle
+
+## Code organization and OSS
+
+### `tfx_bsl` package
+
+TFXIO will be used by all TFX components as well as the TFX framework, making it
+be almost at the bottom of the dependency chain. Moreover, a lot of
+implementations details will be in C++, with python wrapping around, and we want
+to make sure our TFX components pip packages remain pure Python for easy
+maintenance. Therefore we propose a new python package
+[tfx_bsl](https://github.com/tensorflow/tfx-bsl) (TFX Shared Basic Libraries) to
+contain the implementations of `TFXIO` and other libraries shared across TFX
+components.
+
+
+
+## Recommended way of implementing a TFXIO
+
+To maximize code sharing, the following way of implementing a `TFXIO` is
+suggested:
+
+
+
+One would only need to implement the IO+Parsing-to-arrow in C++ once, and reuse
+it in the BeamSource() and a format-specific Dataset Op that produces a
+DT_VARIANT tensor that points to the parsed Arrow RecordBatch. Then we provide
+one C++ library that translates the Arrow RecordBatch to Tensors, which can also
+be reused in a TF op (as the downstream of the Dataset, or in a Python wrapper).
+
+# Alternatives Considered
+
+## StructuredTensor
+
+We’ve considered an alternative where [StructuredTensor](https://github.com/tensorflow/community/blob/master/rfcs/20190910-struct-tensor.md)
+is the unified in-memory representation, but it does not meet all of the
+requirements:
+
+| | Arrow RecordBatch | StructuredTensor |
+| :-------------------------------- | :----------------- | :---------------- |
+| Columnar | Yes | Yes |
+| Lossless encoding (nullity) | Yes | No (see remark 1) |
+| Efficient translatable to Tensors | Yes (see remark 2) | Yes |
+| Efficient slicing, filtering and concatenation | Yes | No (see remark 3) |
+| Interoperability with the rest of the world | Good through Apache Arrow | Needs adaptation (see remark 4) |
+
+Remarks:
+
+1. We could revise the design of StructuredTensor to include the nullibility
+ support.
+2. Only when the backing buffers are aligned correctly. Currently both TF
+ and Apache Arrow has 64-byte alignment. And this can be enforced by
+ implementing our own Arrow MemoryPool wrapping a TF allocator.
+ [This colab notebook](https://colab.research.google.com/drive/1bM8gso7c8x4UXx5htDM4N1KUSTuRvIFL)
+ shows that as long as the memory alignment is the same, feeding TF with an
+ Arrow Array has very little overhead.
+3. See the comparison in [this colab notebook](https://colab.research.google.com/drive/1CvDjZCH3GQE8iojCmRHPuSqLTw8KgNf3).
+ * It’s worth calling out that Arrow is meant to be a data analysis library
+ and better data analysis support (for example, support for a “group-by”
+ clause) will be added over time.
+4. We may gain similar interoperability by creating an Arrow to
+ StructuredTensor adapter.
+ * Beyond the technical aspects, we believe by having all the TFX libraries
+ directly adopting a popular OSS in-memory format will send a positive
+ message that TFX is meant to work well with the rest of the world.
+
+## tf.Data
+
+We’ve also considered tf.Data as the unified I/O abstraction. tf.Data has
+support for a good number of data formats through
+[tensorflow-io](https://github.com/tensorflow/io/tree/master/tensorflow_io).
+
+It's faily straightforward to implement a Beam PSource that wraps a file-backed
+tf.data DatasetSource, and have that PSource produce Arrow RecordBatches,
+but due to lack of support for [dynamic work rebalancing](https://cloud.google.com/blog/products/gcp/no-shard-left-behind-dynamic-work-rebalancing-in-google-cloud-dataflow) in tf.data, such an implementation would
+not match the performance of existing beam PSources.
+
+While we cannot rely solely on tf.data as the I/O abstraction, the proposed
+TFXIO interface does not disallow such a tf.data based implementation. So we can
+still gain support for many other formats through tensorflow-io while still
+using existing beam PSources for formats that have native support.
+
+# Questions and Discussion Topics
+
+## OSS build / release issues
+
+### We don’t have a strong representation in the Arrow community
+
+This has led to some issues. For example, the PyPI/Wheel packaging for pyarrow
+currently is unfunded and lacks volunteers, and the pyarrow wheel sometimes had
+issues with TensorFlow ([example](https://github.com/apache/arrow/issues/4472)).
+
+### ABI compatibility with libarrow
+
+The OSS library, tfx_bsl will depend on Arrow and TensorFlow’s DSOs (dynamic
+shared objects). Because both libraries currently expose C++ APIs, there are
+always risks of incompatible ABIs as TensorFlow and Arrow are likely to be built
+using different toolchains, we cannot completely eliminate the risks.
+
+With [Modular TensorFlow](https://github.com/tensorflow/community/pull/77),
+which replaced all the C++ APIs with C-APIs, we will be able to eliminate the
+risk by using the same toolchain that builds Arrow.
+
+Furthermore, the Apache Arrow community is discussing about an
+[ABI-stable C-struct](https://github.com/apache/arrow/pull/5442) that describes
+Arrow Arrays. This will allow to build Apache Arrow from source and link
+statically with our code, and only talk with pyarrow through that ABI-stable
+interface.
+
+Since in Google we build everything from HEAD, using the same toolchain, there
+are no risks.
+
+## Performance
+
+We would have to convert from Apache Arrow dataframes to TF feedables / Tensors.
+Sometimes this conversion cannot happen efficiently (requires copying out the
+data or other computation).
diff --git a/rfcs/20191017-tfx-standardized-inputs/double-translation.png b/rfcs/20191017-tfx-standardized-inputs/double-translation.png
new file mode 100644
index 000000000..a65d1d87f
Binary files /dev/null and b/rfcs/20191017-tfx-standardized-inputs/double-translation.png differ
diff --git a/rfcs/20191017-tfx-standardized-inputs/impl_tfxio.png b/rfcs/20191017-tfx-standardized-inputs/impl_tfxio.png
new file mode 100644
index 000000000..7309b56cc
Binary files /dev/null and b/rfcs/20191017-tfx-standardized-inputs/impl_tfxio.png differ
diff --git a/rfcs/20191017-tfx-standardized-inputs/oss_lib_org.png b/rfcs/20191017-tfx-standardized-inputs/oss_lib_org.png
new file mode 100644
index 000000000..1e706c8da
Binary files /dev/null and b/rfcs/20191017-tfx-standardized-inputs/oss_lib_org.png differ
diff --git a/rfcs/20191017-tfx-standardized-inputs/overview.png b/rfcs/20191017-tfx-standardized-inputs/overview.png
new file mode 100644
index 000000000..55e32a3f0
Binary files /dev/null and b/rfcs/20191017-tfx-standardized-inputs/overview.png differ