@suxiaogang223
Contributor

Proposed changes

Add regression-test cases for hive text table properties:

| Property | Description | Example Value | Supported in Doris |
|----------|-------------|---------------|--------------------|
| `field.delim` | Defines the delimiter between columns in each row. | `\1` | Yes |
| `collection.delim` | Defines the delimiter for items in an array (collection type). | `\2` | Yes |
| `mapkey.delim` | Defines the delimiter between keys and values in a map. | `\3` | Yes |
| `serialization.null.format` | Defines how `NULL` values are represented in the text file. | `\\N` | Yes |
| `escape.delim` | Specifies the escape character used for escaping special characters. | `\\` | Yes |
| `line.delim` | Defines the delimiter between rows or lines in the file. | `\n` | Yes |

Explanation:

  • `field.delim`: Separates the columns within a row. The example value `\1` is a non-printable control character.
  • `collection.delim`: Separates the elements of an array (collection type). The example value is `\2`.
  • `mapkey.delim`: Separates each key from its value in a map. The example value is `\3`.
  • `serialization.null.format`: The string that represents `NULL` in the data file. `\\N` is the common choice.
  • `escape.delim`: The escape character for special characters in the data, such as a delimiter that appears inside a value. The example value is `\\`.
  • `line.delim`: Separates rows (lines) in the file. The usual value is `\n` (newline).
    Note: Unlike the other delimiters, `line.delim` is not escaped. If the table content contains the line-delimiter character, queries may fail. The other delimiters (`field.delim`, `collection.delim`, `mapkey.delim`) are escaped, so the same characters inside a value do not cause problems.
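
To make the interaction between these properties concrete, here is a minimal Python sketch of how one text row could be decoded. It is an illustration only, not the Doris or Hive reader: the helper names (`split_escaped`, `unescape`, `scalar`, `parse_row`), the three-column schema (string, array, map), and the mapping of `\1`, `\2`, `\3` to bytes 0x01, 0x02, 0x03 are all assumptions made for the example. It also assumes map entries are separated by `collection.delim`, as in Hive's text layout.

```python
# Hypothetical delimiter settings mirroring the example values in the table.
FIELD, COLLECTION, MAPKEY = "\x01", "\x02", "\x03"
ESCAPE = "\\"          # escape.delim
NULL_FORMAT = "\\N"    # serialization.null.format (backslash followed by N)
LINE = "\n"            # line.delim


def split_escaped(text, delim, escape=ESCAPE):
    """Split on delim, ignoring delimiters that follow the escape character.

    Escape sequences are kept as-is so nested levels can be split later.
    """
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == escape and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep escape + escaped char together
            i += 2
        elif text[i] == delim:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def unescape(text, escape=ESCAPE):
    """Drop each escape character and keep the character it protects."""
    out, i = [], 0
    while i < len(text):
        if text[i] == escape and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def scalar(raw):
    """Map the NULL marker to None before unescaping anything else."""
    return None if raw == NULL_FORMAT else unescape(raw)


def parse_row(line):
    """Decode one row of an assumed (string, array<string>, map<string,string>) table."""
    name, tags, attrs = split_escaped(line, FIELD)
    pairs = (split_escaped(kv, MAPKEY) for kv in split_escaped(attrs, COLLECTION))
    return (
        scalar(name),
        [scalar(t) for t in split_escaped(tags, COLLECTION)],
        {scalar(k): scalar(v) for k, v in pairs},
    )


# The first column contains an escaped field delimiter; the map holds a NULL value.
row = "a\\\x01b" + FIELD + "x\x02y" + FIELD + "k1\x03v1\x02k2\x03\\N"
parsed = parse_row(row)

# line.delim is not escaped, so a newline inside a value splits the row in two.
file_text = "r1" + FIELD + "multi\nline" + LINE + "r2" + FIELD + "ok"
rows = file_text.split(LINE)
```

In this sketch `parsed` keeps the escaped 0x01 byte inside the first column and maps `\N` to `None`, while `rows` ends up with three entries for two logical rows, which is the failure mode the note on `line.delim` describes.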

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

morningman previously approved these changes Oct 21, 2024
Contributor

@morningman morningman left a comment

LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Oct 21, 2024
@github-actions
Contributor

PR approved by anyone and no changes requested.

@suxiaogang223
Contributor Author

run buildall

@github-actions github-actions bot removed the approved label Oct 21, 2024
@suxiaogang223
Contributor Author

run external

@suxiaogang223 suxiaogang223 force-pushed the add_case_for_hive_text branch from 92b7ecb to 74df02b Compare October 22, 2024 01:49
@suxiaogang223
Contributor Author

run buildall

@suxiaogang223
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@suxiaogang223 suxiaogang223 changed the title from "[enhance](hive) Add regression-test cases for hive text ddl and hive text insert" to "[enhance](hive) Add regression-test cases for hive text ddl and hive text insert and fix reading null string bug" Oct 22, 2024
Contributor

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved label Oct 22, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@morningman morningman merged commit 85f7a5e into apache:master Oct 22, 2024
morningman pushed a commit to morningman/doris that referenced this pull request Oct 22, 2024
…text insert and fix reading null string bug (apache#42200)

morningman added a commit that referenced this pull request Oct 22, 2024
…text insert and fix reading null string bug #42200 (#42272)

cherry pick from #42200

Co-authored-by: Socrates <suxiaogang223@icloud.com>
morningman added a commit that referenced this pull request Oct 22, 2024
…text insert and fix reading null string bug #42200 (#42273)

cherry pick from #42200

Co-authored-by: Socrates <suxiaogang223@icloud.com>
morningman pushed a commit that referenced this pull request Nov 1, 2024
## Proposed changes
Follow-up to #42200.

Null values in text tables are now correctly recognized.
morningman pushed a commit to morningman/doris that referenced this pull request Nov 1, 2024
## Proposed changes
Follow-up to apache#42200.

Null values in text tables are now correctly recognized.
github-actions bot pushed a commit that referenced this pull request Nov 1, 2024
## Proposed changes
Follow-up to #42200.

Null values in text tables are now correctly recognized.
@suxiaogang223 suxiaogang223 deleted the add_case_for_hive_text branch December 12, 2024 09:27
Labels

approved, dev/2.1.7-merged, dev/3.0.3-merged, reviewed
