API
PyWhat has its own API. It returns a JSON-like object (a Python dictionary) such as:
{
"File Signatures": None,
"Regexes": {
"text": [
{
"Matched": "127.0.0.1",
"Regex Pattern": {
"Name": "Internet Protocol (IP) Address Version 4",
"Regex": "^((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?::[0-9]{1,5})?)$",
"plural_name": False,
"Description": None,
"Rarity": 0.7,
"URL": "https://www.shodan.io/host/",
"Tags": [
"Identifiers",
"Networking",
"IPv4"
],
"Boundaryless Regex": "((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?::[0-9]{1,5})?)"
}
}
]
}
}
To use this API, run this code:
from pywhat import Identifier
id = Identifier()
id.identify(text)
All parameters to identify() are keyword-only except the text itself:
id.identify(text,
only_text=True, # If this is True, PyWhat will not read data from the file
dist=None, # Distribution to use (see below for more info regarding Distributions)
key=None, # Key used for sorting, defaults to Keys.NONE (see below for more info regarding sorting)
reverse=False, # If this is True, the output is sorted in descending order
boundaryless=None, # Filter that defines what regexes should be boundaryless (see below for more info regarding boundaryless mode)
search_filenames=False # If this is True, PyWhat will search the name of a file for identifiable info
)
PyWhat has its own filtration system, built around the Filter class.
To control which regexes are used or shown, we can use distributions. A distribution is a filter together with a list of regexes.
A nice use case is WannaCry. Using distributions, you can get only the domains from the malware (no crypto addresses) and use them to auto-buy those domains if possible, potentially stopping the malware if it has a built-in kill switch!
We start by importing the necessary libraries:
from pywhat import pywhat_tags, Distribution, Filter
Now we can make a filter and a distribution:
filter1 = Filter({"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]})
dist = Distribution(filter1)
A filter supports only the following keys:
- MinRarity. Rarity measures how unlikely a match is to be a false positive. A rarity of 1 means the match cannot be a false positive; a rarity of 0.1 means it is very likely to be one. MinRarity is the lowest rarity you want to see. Raise it to avoid false positives!
- MaxRarity. The highest rarity you want to see.
- Tags. Every regex is tagged. To use only AWS-specific regexes, use AWS as the tag. To see all tags, run what --tags 😄 or:
from pywhat import *
print(pywhat_tags)
- ExcludeTags. The tags you do not want to see.
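The four keys above can be read as a single predicate over a regex entry. The sketch below is an illustrative model only, not PyWhat's implementation: `passes_filter` is a hypothetical helper, and it assumes that Tags means "at least one listed tag is present" and that ExcludeTags rejects any entry carrying a listed tag. The entry reuses the Rarity and Tags values from the sample output at the top of this page.

```python
# Illustrative model of the filter keys; NOT PyWhat's actual implementation.
def passes_filter(entry, spec):
    """Return True if a regex entry satisfies a filter specification."""
    rarity = entry["Rarity"]
    tags = set(entry["Tags"])
    # Rarity must fall inside [MinRarity, MaxRarity]; defaults of 0 and 1 are assumed.
    if rarity < spec.get("MinRarity", 0) or rarity > spec.get("MaxRarity", 1):
        return False
    # Assumption: "Tags" keeps an entry if it carries at least one listed tag.
    if "Tags" in spec and not tags & set(spec["Tags"]):
        return False
    # Assumption: "ExcludeTags" rejects an entry carrying any listed tag.
    if tags & set(spec.get("ExcludeTags", [])):
        return False
    return True


# Values taken from the IPv4 entry in the sample output.
ipv4 = {"Rarity": 0.7, "Tags": ["Identifiers", "Networking", "IPv4"]}

# Same specification as filter1 above.
spec_a = {"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}
# Hypothetical specification without ExcludeTags.
spec_b = {"MinRarity": 0.5, "Tags": ["Networking"]}

print(passes_filter(ipv4, spec_a))  # rejected: the entry carries the excluded tag
print(passes_filter(ipv4, spec_b))  # accepted
```

Under these assumptions, the IPv4 entry is rejected by the first specification because it carries the Identifiers tag, even though its rarity and Networking tag would otherwise qualify.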
Let's make another filter:
from pywhat import pywhat_tags, Distribution, Filter
filter1 = Filter({"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]})
filter2 = Filter({"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]})
Distributions and Filters support logical operators! Want every regex that's in both filter1 and filter2?
from pywhat import Distribution, Filter, Identifier
filter1 = Filter({"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]})
filter2 = Filter({"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]})
dist = Distribution(filter1 & filter2)
r = Identifier(dist=dist)
r.identify(text)
Or:
from pywhat import Distribution, Filter, Identifier
filter1 = Filter({"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]})
filter2 = Filter({"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]})
dist = Distribution(filter1)
dist &= Distribution(filter2)
r = Identifier(dist=dist)
r.identify(text)
We also support logical OR! Get all the items that are in either distribution:
from pywhat import Distribution, Filter, Identifier
filter1 = Filter({"MinRarity": 0.3, "Tags": ["Networking", "AWS"], "ExcludeTags": ["Identifiers"]})
filter2 = Filter({"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]})
filter3 = Filter({"ExcludeTags": ["AWS"]})
dist = Distribution(filter1) | Distribution(filter2)
dist |= Distribution(filter3)
r = Identifier(dist=dist)
r.identify(text)
There are two ways to use distributions with identifiers.
You can assign one per object:
r = Identifier(dist=dist)
r.identify(text)
Or you can pass it in the identify() call:
no_networking_tags = Distribution(filter2)
r.identify(text, dist=no_networking_tags)
To get more information, use:
from pywhat import *
help(Filter)
help(Distribution)
PyWhat supports sorting. You can get sorted output this way:
from pywhat import *
r = Identifier()
r.identify(text, key=Keys.RARITY) # returns matches sorted by rarity in ascending order
r2 = Identifier(key=Keys.MATCHED, reverse=True)
r2.identify(text) # returns matches sorted alphabetically in descending order
The available keys are:
Keys.NAME # Sort by the name of regex pattern
Keys.RARITY # Sort by rarity
Keys.MATCHED # Sort by a matched string
Keys.NONE # No sorting is done (the default)
PyWhat can check whether the input is a valid file or folder name, or a path to a file. If it finds a folder match, PyWhat searches it recursively and returns matches for each file, keyed by filename. When PyWhat searches only text, the key is text. This behaviour is disabled in the API. To search within files and folders, pass only_text=False:
out = r.identify("/Desktop/file.txt", only_text=False)
File searching is enabled in the CLI. To disable it, pass the -o or --only-text option.
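Because matches are grouped under a key that is either a filename or text, a caller usually needs to walk that mapping. The following is a minimal sketch that works on a hand-written dictionary shaped like the sample output at the top of this page, so it runs without PyWhat installed. `matches_by_source` is a hypothetical helper, and treating a missing or empty "Regexes" value as "no matches" is a defensive assumption rather than documented behaviour.

```python
def matches_by_source(result):
    """Map each key under "Regexes" (a filename, or "text") to (matched, name) pairs."""
    # Assumption: "Regexes" may be absent or None when nothing matched.
    regexes = result.get("Regexes") or {}
    return {
        source: [(m["Matched"], m["Regex Pattern"]["Name"]) for m in matches]
        for source, matches in regexes.items()
    }


# Hand-written sample, trimmed from the output shown at the top of this page.
sample = {
    "File Signatures": None,
    "Regexes": {
        "text": [
            {
                "Matched": "127.0.0.1",
                "Regex Pattern": {
                    "Name": "Internet Protocol (IP) Address Version 4",
                    "Rarity": 0.7,
                },
            }
        ]
    },
}

for source, found in matches_by_source(sample).items():
    for matched, name in found:
        print(f"{source}: {matched} ({name})")
```

With only_text=False, the same loop would be expected to see filenames in place of text as the outer keys.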
The API does not match inputs like "abcthm{kgh}jk" because boundaryless mode is disabled by default. Boundaryless mode allows regexes to match within strings (in "abcthm{kgh}jk", PyWhat can find the "thm{kgh}" match). To enable it, create a filter denoting which regexes should be in boundaryless mode (see above for more info regarding the filtration system):
from pywhat import *
# All regexes that have 'Identifiers' or 'Cyber Security' tags and a rarity of 0.6 or higher will be in boundaryless mode.
boundaryless = Filter({"Tags": ["Identifiers", "Cyber Security"], "MinRarity": 0.6})
id = Identifier()
id.identify("abcthm{kgh}jk", boundaryless=boundaryless)
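To see why the anchors matter, the sketch below compares the "Regex" and "Boundaryless Regex" patterns from the sample output at the top of this page using Python's standard re module. It only illustrates how the two patterns differ; it makes no claim about how PyWhat applies them internally. The input string `embedded` is a made-up example.

```python
import re

# Pattern copied from the "Boundaryless Regex" field of the sample output.
IPV4 = (
    r"((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?::[0-9]{1,5})?)"
)

# The "Regex" field is the same pattern wrapped in ^ and $ anchors.
anchored_re = re.compile("^" + IPV4 + "$")
boundaryless_re = re.compile(IPV4)

embedded = "host=127.0.0.1;"  # made-up input with text around the address

# The anchored pattern needs the whole string to be an address, so it finds nothing.
print(anchored_re.search(embedded))
# The boundaryless pattern can match inside the surrounding text.
print(boundaryless_re.search(embedded).group(1))
# On a bare address, the anchored pattern matches as usual.
print(anchored_re.search("127.0.0.1").group(1))
```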