Cherrypick python dependencies from GP to Cloudberry#1067
Closed
tuhaihe wants to merge 7 commits into apache:main from
Remove the following packages from gpdb7:
- psutil and pyyaml: Use the corresponding packages from distro's repo.
- pygresql: pygresql is replaced by psycopg2 which will be installed from distro's repo.

Co-authored-by: Chen Mulong <chenmulong@gmail.com>
Co-authored-by: Xing Guo <admin@higuoxing.com>

Due to the chaos of Python versions across distros, we plan to release
gpdb7 as follows:
- Use the distro's default Python version (3.6 in el8) for the management
utilities.
- Use a more recent Python version (3.9) for plpython, so that users
can benefit from the active Python community.
To do that:
- Use `python3` instead of `python` in the build/test scripts, since a
`python` executable is not guaranteed to exist after installing a
python3.x package.
- Remove mock 1.0.1. mock is already specified in
`gpMgmt/requirements-dev.txt`, so there is no need to keep an egg file in
the repo. In addition, mock 1.0.1 fails unit tests with python3.6:
```
======================================================================
ERROR: Test Suite Name|commands.test.unit.test_unit_gp|Test Case Name|test_is_gprecoverseg_running_succeeds|Test Details|
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/build/94afc7d4/gpdb_src/gpMgmt/bin/pythonSrc/ext/install/lib/python3.6/site-packages/mock-1.0.1-py3.6.egg/mock.py", line 1201, in patched
return func(*args, **keywargs)
File "/tmp/build/94afc7d4/gpdb_src/gpMgmt/bin/gppylib/commands/test/unit/test_unit_gp.py", line 191, in test_is_gprecoverseg_running_succeeds
result = is_gprecoverseg_running()
File "/tmp/build/94afc7d4/gpdb_src/gpMgmt/bin/gppylib/commands/gp.py", line 1635, in is_gprecoverseg_running
gprecoverseg_pidfile = os.path.join(get_coordinatordatadir(), 'gprecoverseg.lock', 'PID')
File "/usr/lib64/python3.6/posixpath.py", line 80, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not MagicMock
```
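The `TypeError` above comes from `os.path.join` receiving a mock object where it expects a `str`, `bytes`, or `os.PathLike` value. The sketch below illustrates that class of failure with the standard library's `unittest.mock` rather than the vendored mock 1.0.1 egg. The function `lock_pidfile` and the path `/data/coordinator` are hypothetical stand-ins, not code from gppylib, and a plain `Mock` is used because it does not implement the path protocol.

```python
import os
from unittest.mock import Mock


def lock_pidfile(get_datadir):
    # Hypothetical stand-in for code that builds a pidfile path
    # from a data-directory getter.
    return os.path.join(get_datadir(), "gprecoverseg.lock", "PID")


# A bare Mock returns another Mock when called, which is not path-like.
failure = None
try:
    lock_pidfile(Mock())
except TypeError as exc:
    failure = str(exc)

# Giving the mock an explicit string return value yields a usable path.
path = lock_pidfile(Mock(return_value="/data/coordinator"))

print(failure)
print(path)
```

This only demonstrates why a non-path-like mock breaks `os.path.join`; the change described above resolves the test failures by dropping the vendored mock 1.0.1 egg, not by editing the tests this way.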
tuhaihe added a commit to tuhaihe/cloudberry-devops-release that referenced this pull request Apr 27, 2025
apache/cloudberry#1067 cherry-picks the Python dependency changes from Greenplum to Cloudberry, so we need to update the configure script and install the Python dependencies from the distro's repos.
This PR relies on apache/cloudberry-devops-release#17. I will test it on a local machine to verify that it works.
We're going to replace pygresql with psycopg2 in Greenplum, since psycopg2 is actively maintained and offered by many system package managers, which eases the pain of packaging Python packages. Besides, psycopg2 provides the real status message returned by the server, so we don't need to fake it ourselves.

Co-Authored-By: Hao Zhang <hzhang2@vmware.com>
Co-Authored-By: Hao Zhang <zhrt1446384557@gmail.com>
Co-authored-by: Yongtao Huang <yongtaoh@vmware.com>
Co-authored-by: Xing Guo <higuoxing@gmail.com>

This is the last patch for replacing pygresql with psycopg2 in Greenplum. This patch mainly targets gpload.

Benefits of replacing pygresql with psycopg2:
- Psycopg2 is actively maintained, whereas with pygresql we have encountered bugs that upstream has not fixed yet, e.g., https://github.com/greenplum-db/gpdb/pull/13953.
- Psycopg2 is provided by Rocky Linux and Ubuntu. That is to say, we don't need to vendor it ourselves.
- We can possibly remove `PYTHONPATH` from `greenplum_path.sh`, which is good for users because they don't need to worry about their Python environment being overwritten by Greenplum.

Co-authored-by: Chen Mulong <chenmulong@gmail.com>
Co-authored-by: Xiaoxiao He <hxiaoxiao@vmware.com>

This is the last patch for replacing pygresql with psycopg2 in Greenplum. This patch mainly targets the gpMgmt tools.

Benefits of replacing pygresql with psycopg2:
- Psycopg2 is actively maintained, whereas with pygresql we have encountered bugs that upstream has not fixed yet, e.g., https://github.com/greenplum-db/gpdb/pull/13953.
- Psycopg2 is provided by Rocky Linux and Ubuntu. That is to say, we don't need to vendor it ourselves.
- Last but not least, we got a chance to clean up legacy code during the removal process, e.g., https://github.com/greenplum-db/gpdb/pull/15983.

After this patch, we need to do the following things:
- Add psycopg2 as a dependency of the rpm/deb package.
- Remove the pygresql source code tarball from the gpdb repo.
- Tidy up READMEs and requirements.txt files.

Co-authored-by: Chen Mulong <chenmulong@gmail.com>
Co-authored-by: Xiaoxiao He <hxiaoxiao@vmware.com>
Co-authored-by: zhrt123 <hzhang2@vmware.com>
Co-authored-by: Piyush Chandwadkar <pchandwadkar@vmware.com>
Co-authored-by: Praveen Kumar <36772398+kpraveen457@users.noreply.github.com>

psycopg2's `getquoted()` API returns a `latin-1` encoded byte string by default, which causes unexpected failures in some gpMgmt tools, including gpload, analyzedb, and minirepro. This patch fixes it by making psycopg2's `QuotedString` adapter use `utf-8` encoding.
Steps to reproduce with `analyzedb`:
Create a table whose name contains non-ASCII characters.
```sql
postgres=# create table spiegelungssätze(i int);
```
Run `analyzedb` against the postgres db.
```bash
$ analyzedb -d postgres
```
Backtrace:
```
➜ analyzedb -d postgres
20230910:21:51:11:689552 analyzedb:laptop:v-[INFO]:-Starting analyzedb with args: -d postgres
20230910:21:51:11:689552 analyzedb:laptop:v-[INFO]:-Getting and verifying input tables...
20230910:21:51:11:689552 analyzedb:laptop:v-[INFO]:-Checking for tables with stale stats...
20230910:21:51:11:689552 analyzedb:laptop:v-[ERROR]:-'utf-8' codec can't decode byte 0xe4 in position 21: invalid continuation byte
Traceback (most recent call last):
File "/home/v/.local/gpdb7/bin/analyzedb", line 376, in execute
heap_partitions = get_heap_tables_set(self.conn, input_tables_set) # set((schema1,table1), ...])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/v/.local/gpdb7/bin/analyzedb", line 1004, in get_heap_tables_set
oid_str = get_oid_str(input_tables_set)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/v/.local/gpdb7/bin/analyzedb", line 976, in get_oid_str
return ','.join(map((lambda x: regclass_schema_tbl(x[0], x[1])), table_list))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/v/.local/gpdb7/bin/analyzedb", line 976, in <lambda>
return ','.join(map((lambda x: regclass_schema_tbl(x[0], x[1])), table_list))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/v/.local/gpdb7/bin/analyzedb", line 984, in regclass_schema_tbl
return "to_regclass('%s')" % (escape_string(schema_tbl))
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/v/.local/gpdb7/lib/python/gppylib/utils.py", line 515, in escape_string
return adapted.getquoted().decode()[1:-1]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 21: invalid continuation byte
20230910:21:51:11:689552 analyzedb:laptop:v-[CRITICAL]:-analyzedb failed. (Reason=''utf-8' codec can't decode byte 0xe4 in position 21: invalid continuation byte') exiting...
```
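The decode failure in the backtrace above can be reproduced without a database. The following is a minimal sketch using only the Python standard library; it does not call psycopg2, and simply imitates the step where bytes encoded as `latin-1` are decoded with Python's default `utf-8` codec. The variable names are illustrative.

```python
# Table name from the reproduction steps above.
name = "spiegelungssätze"

# Imitate a quoting layer that encodes as latin-1 (assumed here for illustration).
latin1_bytes = name.encode("latin-1")

decode_error = None
try:
    # bytes.decode() defaults to utf-8, as in the failing escape_string() call.
    latin1_bytes.decode()
except UnicodeDecodeError as exc:
    decode_error = exc

# When both sides agree on utf-8, the name round-trips unchanged.
round_tripped = name.encode("utf-8").decode()

print(decode_error)
print(round_tripped == name)
```

Byte `0xe4` is `ä` in `latin-1`, but in `utf-8` it starts a multi-byte sequence, so the ASCII byte that follows is rejected as an invalid continuation byte.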
An error occurs when running `gpcheckcat -C pg_class`. The reported error is "[ERROR] executing: Cross consistency check for pg_class\n Execution error: name 'db' is not defined". It is caused by the use of an undefined variable `db`. This commit fixes the issue by removing that usage.

Authored-by: vrhappy <songlong88@126.com>
greenplum-db/gpdb-archive@6f9d85b
greenplum-db/gpdb-archive@bd54207
greenplum-db/gpdb-archive@52c7e0a95c
greenplum-db/gpdb-archive@6592485
greenplum-db/gpdb-archive@b5920e061b
greenplum-db/gpdb-archive@0d1e4d644e
greenplum-db/gpdb-archive@411fd01083
What does this PR do?
Type of Change
Breaking Changes
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel
Impact
Performance:
User-facing changes:
Dependencies:
Checklist
Additional Context
CI Skip Instructions