Boost parsing performance #67
Changes from all commits: 623e008, 4923ebc, eadef42, 1d16d62, ddee023
```diff
@@ -2,9 +2,6 @@
 Function utilities to convert data acquired on an OpenBCI
 Cyton board using the SD card logging strategy.
 
-TODO: We should look into optimizing this conversion. We currently
-convert one line at a time, while a vectorized approach would be much more efficient,
-as the conversion of a line does not depend on the other lines.
 TODO: Consider cropping file (from bed to wake up time) here, before the for loop. Have to consider
 not all lines hold sample values (i.e. first line with comment and second line with a single timestamp).
```
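The removed TODO is exactly what this PR implements: since each line's conversion is independent of the others, the file can be parsed in bulk instead of one line at a time. A minimal sketch of the idea, using hypothetical toy data rather than the real Cyton SD layout (`to_signed` is a stand-in for the module's conversion helper):

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy log: one comment line, then rows of hex-encoded samples.
raw = b"%Comment\n0,C0A4B1,FF20E6\n1,7FFFAB,80000C\n"

def to_signed(hexstr):
    # Two's-complement interpretation of an even-length hex string.
    return int.from_bytes(bytes.fromhex(hexstr), byteorder='big', signed=True)

# Old style: decode and convert one line at a time.
loop_result = np.array([
    [to_signed(col) for col in line.decode('utf-8').split(',')[1:3]]
    for line in raw.splitlines()[1:]
])

# New style: bulk-parse with pandas, then convert elementwise.
# dtype=str keeps purely-digit hex values from being coerced to numbers.
frame = pd.read_csv(io.BytesIO(raw), skiprows=1, header=None,
                    usecols=(1, 2), dtype=str)
vectorized_result = np.vectorize(to_signed)(frame.to_numpy())

assert np.array_equal(loop_result, vectorized_result)
```

Both paths yield the same signed 24-bit sample values; the bulk path avoids the per-line Python overhead the TODO complained about.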
```diff
@@ -17,6 +14,7 @@
 from mne import create_info
 from mne.io import RawArray
 import numpy as np
+import pandas as pd
 
 from classification.exceptions import ClassificationError
 from classification.config.constants import (
```
```diff
@@ -41,16 +39,19 @@ def get_raw_array(file):
     Returns:
     - mne.RawArray of the two EEG channels of interest
     """
-    lines = file.readlines()
-    eeg_raw = np.zeros((len(lines) - SKIP_ROWS, len(EEG_CHANNELS)))
-
-    for index, line in enumerate(lines[SKIP_ROWS:]):
-        line_splitted = line.decode('utf-8').split(',')
-
-        if len(line_splitted) < CYTON_TOTAL_NB_CHANNELS:
-            raise ClassificationError()
-
-        eeg_raw[index] = _get_decimals_from_hexadecimal_strings(line_splitted)
+    retained_columns = tuple(range(1, len(EEG_CHANNELS) + 1))
+
+    try:
+        eeg_raw = pd.read_csv(file,
+                              skiprows=SKIP_ROWS,
+                              usecols=retained_columns
+                              ).to_numpy()
+    except Exception:
+        raise ClassificationError()
+
+    hexstr_to_int = np.vectorize(_hexstr_to_int)
+
+    eeg_raw = hexstr_to_int(eeg_raw)
 
     raw_object = RawArray(
         SCALE_V_PER_COUNT * np.transpose(eeg_raw),
```

Review thread on the removed per-line validity check:

Contributor: This check made it possible to see whether there was a problem in the uploaded file. For example, if the Cyton shuts down and momentarily restarts, there will be two comments marking the start of the recording. See the docs.

Contributor: Maybe just put a try/catch around read_csv for the comment lines, where the retained_columns won't be present. We could return a 400 error with the explanation in the body.

Contributor (author): Do we otherwise drop the lines that break? pandas handles that well.

Contributor: idk, if samples are missing the file probably isn't worth much.

Contributor: The lines, you mean? Yes, it's a bit awkward. For a stop of about 30 seconds, we can just drop those lines. For a stop longer than 5 minutes, simply dropping the lines wouldn't really work, since the interruption is non-negligible. We also don't support a non-contiguous night sequence, neither in classification nor in the visualizations. That's why I was thinking of rejecting the file in that case.

Contributor (author): Ok, I caught the pandas error and raise ClassificationError.

abelfodil marked this conversation as resolved.
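The review thread above settles on wrapping the pandas call: any parsing failure (such as a second recording-start comment breaking the column layout) is surfaced as the module's domain error. A small sketch of that pattern, with a stand-in `ClassificationError` and hypothetical toy data:

```python
import io

import pandas as pd

class ClassificationError(Exception):
    """Stand-in for classification.exceptions.ClassificationError."""

def parse_samples(file, skiprows, usecols):
    # Surface any pandas parsing failure as a single domain-level error,
    # as decided in the review thread.
    try:
        return pd.read_csv(file, skiprows=skiprows, header=None,
                           usecols=usecols, dtype=str).to_numpy()
    except Exception:
        raise ClassificationError()

# A well-formed toy file parses cleanly.
good = io.BytesIO(b"%Start\n0,C0A4B1,FF20E6\n1,7FFFAB,80000C\n")
samples = parse_samples(good, skiprows=1, usecols=(1, 2))
assert samples.shape == (2, 2)

# A restarted recording leaves a second comment line where samples belong;
# the requested columns are missing there, pandas raises, and we re-raise
# it as a ClassificationError so the caller can reject the file.
bad = io.BytesIO(b"%Start\n%Start again\n0,C0A4B1,FF20E6\n")
try:
    parse_samples(bad, skiprows=1, usecols=(1, 2))
except ClassificationError:
    print("malformed file rejected")
```

The caller can map `ClassificationError` to an HTTP 400 response, as suggested in the thread.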
```diff
@@ -71,38 +72,11 @@ def get_raw_array(file):
     return raw_object
 
 
-def _get_decimals_from_hexadecimal_strings(lines):
-    """Converts the array of hexadecimal strings to an array of decimal values of the EEG channels
-    Input:
-    - lines: splitted array of two complement hexadecimal
-    Returns:
-    - array of decimal values for each EEG channel of interest
-    """
-    return np.array([
-        _convert_hexadecimal_to_signed_decimal(hex_value)
-        for hex_value in lines[FILE_COLUMN_OFFSET:FILE_COLUMN_OFFSET + len(EEG_CHANNELS)]
-    ])
-
-
-def _convert_hexadecimal_to_signed_decimal(hex_value):
-    """Converts the hexadecimal value encoded on OpenBCI Cyton SD card to signed decimal
-    Input:
-    - hex_value: signed hexadecimal value
-    Returns:
-    - decimal value
-    """
-    return _get_twos_complement(hex_value) if len(hex_value) % 2 == 0 else 0
-
-
-def _get_twos_complement(hexstr):
+def _hexstr_to_int(hexstr):
     """Converts a two complement hexadecimal value in a string to a signed float
     Input:
     - hex_value: signed hexadecimal value
     Returns:
     - decimal value
     """
-    bits = len(hexstr) * 4
-    value = int(hexstr, 16)
-    if value & (1 << (bits - 1)):
-        value -= 1 << bits
-    return value
+    return int.from_bytes(bytes.fromhex(hexstr), byteorder='big', signed=True)
```
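The rewritten helper replaces manual two's-complement arithmetic with a single `int.from_bytes` call. A quick sketch checking that both forms agree on 24-bit Cyton samples:

```python
def twos_complement_manual(hexstr):
    # Old approach: treat the hex string as a two's-complement
    # integer of len(hexstr) * 4 bits.
    bits = len(hexstr) * 4
    value = int(hexstr, 16)
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value

def hexstr_to_int(hexstr):
    # New approach: let the bytes machinery handle the sign.
    return int.from_bytes(bytes.fromhex(hexstr), byteorder='big', signed=True)

# The two agree on every even-length input (24-bit Cyton samples).
for sample in ("000000", "7FFFFF", "800000", "FFFFFF", "C0A4B1"):
    assert twos_complement_manual(sample) == hexstr_to_int(sample)

print(hexstr_to_int("FFFFFF"))  # -1: all-ones bit pattern in 24-bit two's complement
```

One behavioral difference worth noting: the deleted `_convert_hexadecimal_to_signed_decimal` returned 0 for odd-length strings, whereas `bytes.fromhex` raises `ValueError` on them; with the new code, such a value is caught by the `try`/`except` around the parsing step instead of being silently zeroed.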