Skip to content

Seperate out importing code#208

Closed
bagel897 wants to merge 8 commits intolyz-code:masterfrom
bagel897:master
Closed

Seperate out importing code#208
bagel897 wants to merge 8 commits intolyz-code:masterfrom
bagel897:master

Conversation

@bagel897
Copy link

I'm working on creating a plugin for autoimport for pylsp and would find it useful if we could get a list of all possible imports without formatting the file so we can implement autoimport when you autocomplete, not on formatting. I split off the relevant parts off to the SourceCodeBase class and exposed two new functions - find_package and get_all_packages. I don't need find_package, but it was largely implemented and could lead to a better solution if get_all_packages doesn't work.

Checklist

  • Add search for modules to get_all_packages
  • Implement caching for get_all_packages.
  • Add test cases to all the changes you introduce
  • Update the documentation for the changes

@lyz-code
Copy link
Owner

Thanks @bageljrkhanofemus , thanks for starting the contribution.

As far as I can see right now it looks good. Can't wait to see it finished :)

Maybe you're interested in #39 too

@bagel897
Copy link
Author

bagel897 commented Apr 1, 2022

So a little update, I'm working on trying to find out what modules are installed, and wanted to see how rope handles this part. So it turns out Rope has their own implementation (with caching and a similar featureset) here.

  1. It has the same issue, it doesn't automatically find standard library modules
  2. It can't handle "Resources" outside of the ropeproject

However, it can determine the python path(s) of the current project, which will make that part easier.
I think the best approach is to de-duplicate the work, use the rope autoimport code as a backend (if/once I can make it detect standard modules) and replace the actually finding imports here with that.

@lyz-code
Copy link
Owner

lyz-code commented Apr 1, 2022

Hi @bageljrkhanofemus I've got some doubts, please bear with me as it's been some time since I dove into autoimports code:

  • I haven't used LSP before, why do you need to run autoimport at autocomplete? will that update the file when you try to autocomplete a variable?
  • I've skimmed through the code of the rope project, it looks good, but beside delegating in them the code I don't see the benefits of using their backend. Can you share your thoughts on the strong points of this approach?
  • As far as I remember, we currently know what modules are installed, what is the problem here?
  • I don't see any installation guide of rope, would it mean that we add them as a library dependency? or will the users need to install additional software to make autoimport work?

What I do see is a great idea is:

  • Build a cache of the installed packages and the project objects.
  • Build a tree of the project files so that we can implement relative imports

I'm very excited of your ideas and implementation, please keep me updated

@bagel897
Copy link
Author

bagel897 commented Apr 1, 2022

Autoimport while writing code

image
So say I'm writing code, and want to include typing's List class in my code. The current workflow is

  1. I type out the List manually
  2. I get a undefined name error
  3. I save, formatting the code, triggering a run of autoimport (automatically via pylsp-autoimport or manually via autoimport)

A better workflow would be to

  1. Determine what modules are available (possibly with caching)
  2. When the user is typing, they'll see the option for autocompleting List, with the description that it imports from typing
  3. When the user selects it, it finds out what line of code it needs to place the import statement and does so, without saving the file.

How do I do that

Autoimport does store a list of all available imports for the project, typing, and common statements. However, for a PYTHONPATH module search, it uses importlib to determine which python files we can use. Since we can't assume autoimport is running from the same path as the project its working on, this is unsatsifactory for 2 reasons.

  1. It may have issues if autoimport and the project have different modules available, which will almost certainly be the case
  2. It can't give us available imports ahead of time (as far as I can tell)

So what the code needs to do is

  1. find the pythonpath of the project (Rope does this by storing it in preferences)
  2. find the list of modules in each python path and watch for changes (WIP)
  3. Cache the list of available imports for each modules (both your extract package objects and rope can do this)
  4. Return the correct import for the given name for postfixing, or all imports for autocompletion
  5. cache the result (rope can mostly do this)

So rope has a couple of advantages for embedding in pylsp, namely:

  1. rope is already an optional dependency of pylsp
  2. rope is designed to be embedded and has been embedded in pylsp
  3. it has the ability to watch for changes in the project (which can probably be expanded to external modules)
  4. it has less dependencies than autoimport (it just depends on itself and python and its installable via pip)
  5. It has information on project files and their locations relative to each other (though I'm not sure if it uses that currently).

Progress so far

I got rope to cache all the packages in my site-packages (took like 3 minutes)
image
I got autoimport to do something similar, but it seems slower
image

@lyz-code
Copy link
Owner

lyz-code commented Apr 1, 2022

Thanks for the explanation it was cristal clear.

I see your point on the need of being able to run autoimport at autocompletion time, I'm thrilled to see the outcome of this PR, it looks like you're giving the package an awesome twist, thank you.

I'm curious too to know how does rope manage the cache and how does it watch the changes in the pythonpath.

We currently find the PYTHONPATH with pyprojroot, I don't know if it can fulfill your needs. I wouldn't have any problem ditching it in favour of rope.

Thanks again :)

@bagel897
Copy link
Author

bagel897 commented Apr 1, 2022

While that is useful (especially since rope needs that anyway), it doesn't give us the full information
Our imports basically come from 2 sources

  1. Items inside the project
    a) we can find the project location with pyprojectroot
    b) rope needs to know the project location
  2. Items found from installed modules and python standard library which actually consists of multiple paths (IE: /usr/lib/python3.10, /usr/lib/python3.10/site-pacakges, local folders, venvs, etc). Rope uses a configuration file and appends it to sys.path to get this

Rope manages a cache for the project itself, watches for changes and updates that accordingly (rope's codebase is larger so I haven't read through the whole thing yet) and it stores its cache in a .ropeproject folder.
However, it's cache doesn't handle modules being removed/added.
I've got a basic layer where autoimport uses rope under the hood. It has a better module name parser and is faster at parsing the entire stdlib. I'm working on how to store/update the data for external modules in a convenient manner.

@lyz-code
Copy link
Owner

lyz-code commented Apr 5, 2022

I've given it a little bit of thought and this is what I came out with:

To make the autocompletion quicker, we could split the reading of cache and the update. So when we use the autocomplete it will only fetch items from the cache, and when it saves it updates the cache. Another option would be to only read the cache when saving and autocompleting, and have a daemon in another process that is monitoring the file changes and update the cache then. An alternative to rope's system could be using inotify to monitor the file changes.

Right now the vim's integration with Ale runs autoimport in a temp file in tmp so it's difficult to know what project does that file belong to. After seeing other integrations (flakehell, pylint, mypy), there's a possibility to run ale with a cwd argument that specifies the project's root directory. So we can assume that we know the original file path.

@bagel897
Copy link
Author

bagel897 commented Apr 5, 2022

inotify would only work when you're editing the file. Say I do this

  1. Install tensorflow (large example module)
  2. Start editing
  3. Stop editing (and autoimport)
  4. uninstall tensorflow or update tensorflow
  5. Start editing again

I don't know a good way to watch for 4 besides hashing the directory or something.
Right now I'm working on the rope version. At a minimum, the AST parser is much quicker than the inspect based one and avoids importing modules.
As for embedding, I mean to embed rope directly to pylsp, but will still try to contribute any improvements here (as a CLI interface).

@lyz-code
Copy link
Owner

lyz-code commented Apr 6, 2022

One way of monitoring 4 could be by package manager hooks, I'm currently using PDM, and it looks like it's easy to write a plugin that updates the cache on package changes. I will add it as an optional feature so as not to force the users to use PDM

@lieryan
Copy link

lieryan commented Apr 7, 2022

Simplest way is to path.stat() and check for filesystem metadata (stat_result), specifically the st_mtime (modification time) and st_size (file size), and maybe the st_ctime (creation time) too. That's how Git quickly detects file changes.

It's not fool-proof since mtime can be modified manually, but it'll be much faster than any methods that reads off the file content like hashing the whole directory and it should work for the vast majority of the time.

Compared to package manager hooks, it would not require modifying the virtualenv directory nor for the editor's plugin virtualenv to be the same as the project virtualenv.

Compared to inotify, saving file metadata would not require a deamon that needs to remain alive while the files/folder was being modified, so when using inotify, you will still need a way to find changes at startup since the daemon shuts down. inotify is faster than metadata scanning though, so for when you already have a persistent daemon (e.g. pylsp), inotify can be a good optimization.

@bagel897
Copy link
Author

bagel897 commented Apr 7, 2022

Woah that's smart, I'll add it to my code for updating packages. That's going to be way faster than hashing anything.

@bagel897
Copy link
Author

Update:
I've been working on a rope autoimport backend, which analyzes all the available modules and will try to find names in them. In rope 1.1.1, it now works and can be used. I wrote code using that for pylsp for interactive. I wrote a basic wrapper for the rope version in here. However, it has several disadvantages:

  1. It will have significantly more imports to choose from, since rope-autoimport scans all available packages. Unlike pylsp, the cli tool cannot choose
  2. It will be slower, because it has to index all these modules
  3. They have different goals and therefore performance needs - CLI knows whats missing ahead of time, pylsp doesn't.

The first two can be solved by scanning the pyproject.toml for dependencies and using those (which I will eventually implement in rope). However, the goal of that project is interactive autoimport in pylsp and I will not finish this PR since it is out of scope.

Thank you for your patience and help.

@bagel897 bagel897 closed this May 25, 2022
@lyz-code
Copy link
Owner

I'm so sad to hear that @bageljrkhanofemus , I completely understand though. Can you please add a link to your project so that if anyone want's to continue this initiative can take it as a reference?

@bagel897
Copy link
Author

bagel897 commented May 26, 2022

  1. Most of the work is in rope's new sqlite autoimport. Its documented in docs/library.rst and docstrings in source. Ideally, this will be moved to RTD. If there are bugs/features, file an issue and tag me.
  2. This PR has a partially working implementation which someone could use as a base. I haven't touched it in a while but the API will not have changed.
  3. Add a plugin to provide autoimport functionality python-lsp/python-lsp-server#199. Is another front-end for rope autoimport with some features like more advanced import sorting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants