52 changes: 13 additions & 39 deletions backend/classification/file_loading.py
@@ -2,9 +2,6 @@
Function utilities to convert data acquired on an OpenBCI
Cyton board using the SD card logging strategy.

-TODO: We should look into optimizing this conversion. We currently
-convert one line at a time, while a vectorized approach would be much more efficient,
-as the conversion of a line does not depend on the other lines.
TODO: Consider cropping the file (from bedtime to wake-up time) here, before the for loop. We have to consider
that not all lines hold sample values (i.e., the first line is a comment and the second line holds a single timestamp).

@@ -17,6 +14,7 @@
from mne import create_info
abelfodil marked this conversation as resolved.
from mne.io import RawArray
import numpy as np
+import pandas as pd

from classification.exceptions import ClassificationError
from classification.config.constants import (
@@ -41,16 +39,19 @@ def get_raw_array(file):
Returns:
- mne.RawArray of the two EEG channels of interest
"""
-    lines = file.readlines()
-    eeg_raw = np.zeros((len(lines) - SKIP_ROWS, len(EEG_CHANNELS)))

-    for index, line in enumerate(lines[SKIP_ROWS:]):
-        line_splitted = line.decode('utf-8').split(',')
+    retained_columns = tuple(range(1, len(EEG_CHANNELS) + 1))

-        if len(line_splitted) < CYTON_TOTAL_NB_CHANNELS:
Contributor: This let us see whether there was a problem in the uploaded file. For example, if the Cyton shuts off and turns back on momentarily, there will be two comments indicating the start of the recording. See the %STOP AT and %START AT documentation.

Contributor: Maybe just put a try/catch around read_csv for the comment lines, where the retained_columns won't be present. We could then return a 400 error with the explanation in the body.

@abelfodil (Contributor, Author), Nov 8, 2020: Otherwise, should we drop the lines that break? pandas handles that well.

Contributor: idk, if samples are missing the file probably isn't worth much.

Contributor: The lines, you mean? Yes, that's a bit of a problem. If there is only a stop of ~30 seconds, we can drop those lines. With a stop of more than 5 minutes, just dropping the lines wouldn't really work, since the interruption is no longer negligible. We also don't support a non-contiguous night sequence, neither in the classification nor in the visualizations. That's why I was thinking of rejecting the file in that case. In any case, it's an edge case; for now we can drop the bad lines.
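The restart case discussed in this thread can be detected cheaply before parsing. A minimal sketch, assuming the `%START AT` / `%STOP AT` marker format from the OpenBCI SD docs mentioned above (the function name and exact marker spelling here are illustrative, not from the repo):

```python
import io

def recording_restarted(file):
    # A Cyton SD log that stopped and restarted mid-night contains a second
    # comment block; count the start markers and flag anything beyond one.
    starts = sum(1 for line in file if line.lstrip().startswith("%START AT"))
    return starts > 1

contiguous = io.StringIO("%START AT 0\n0,7FFFFF,800000\n1,000001,FFFFFF\n")
restarted = io.StringIO(
    "%START AT 0\n0,7FFFFF,800000\n%STOP AT 1\n%START AT 2\n2,000001,FFFFFF\n")
print(recording_restarted(contiguous), recording_restarted(restarted))  # False True
```

A check like this would let the endpoint reject a non-contiguous recording with the 400 response discussed above, instead of silently dropping samples.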

@abelfodil (Contributor, Author): Ok, I caught the pandas error and raise ClassificationError.

-            raise ClassificationError()
+    try:
+        eeg_raw = pd.read_csv(file,
+                              skiprows=SKIP_ROWS,
+                              usecols=retained_columns
+                              ).to_numpy()
+    except Exception:
+        raise ClassificationError()

-        eeg_raw[index] = _get_decimals_from_hexadecimal_strings(line_splitted)
+    hexstr_to_int = np.vectorize(_hexstr_to_int)
+    eeg_raw = hexstr_to_int(eeg_raw)

(abelfodil marked this conversation as resolved.)
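The new loading path can be sketched end-to-end on a toy SD log. Only the `pd.read_csv` + `np.vectorize` pattern is taken from the change itself; `SKIP_ROWS`, the column layout, and the stand-in error type are assumptions for illustration:

```python
import io

import numpy as np
import pandas as pd

# Toy stand-in for a Cyton SD log: one comment line, one timestamp line,
# then "sample_index,ch1_hex,ch2_hex" rows (layout assumed for illustration).
sd_file = io.StringIO(
    "%Total time since start\n"
    "1234567\n"
    "0,7FFFFF,800000\n"
    "1,000001,FFFFFF\n"
)

SKIP_ROWS = 2            # comment + timestamp lines (assumed value)
retained_columns = (1, 2)

try:
    # dtype=str keeps all-digit hex values such as "000001" from being
    # parsed as integers before the two's-complement conversion.
    eeg_hex = pd.read_csv(sd_file, skiprows=SKIP_ROWS, header=None,
                          usecols=retained_columns, dtype=str).to_numpy()
except Exception:
    raise ValueError("malformed SD file")  # stands in for ClassificationError

hexstr_to_int = np.vectorize(
    lambda s: int.from_bytes(bytes.fromhex(s), byteorder="big", signed=True))
eeg_raw = hexstr_to_int(eeg_hex)
print(eeg_raw.tolist())  # [[8388607, -8388608], [1, -1]]
```

As the thread notes, any malformed line (e.g. a stray comment from a restart) makes `read_csv` raise, which the `except` clause turns into the application-level error.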

raw_object = RawArray(
SCALE_V_PER_COUNT * np.transpose(eeg_raw),
@@ -71,38 +72,11 @@ def get_raw_array(file):
return raw_object


-def _get_decimals_from_hexadecimal_strings(lines):
-    """Converts the array of hexadecimal strings to an array of decimal values of the EEG channels
-    Input:
-        - lines: splitted array of two complement hexadecimal
-    Returns:
-        - array of decimal values for each EEG channel of interest
-    """
-    return np.array([
-        _convert_hexadecimal_to_signed_decimal(hex_value)
-        for hex_value in lines[FILE_COLUMN_OFFSET:FILE_COLUMN_OFFSET + len(EEG_CHANNELS)]
-    ])
-
-
-def _convert_hexadecimal_to_signed_decimal(hex_value):
-    """Converts the hexadecimal value encoded on OpenBCI Cyton SD card to signed decimal
-    Input:
-        - hex_value: signed hexadecimal value
-    Returns:
-        - decimal value
-    """
-    return _get_twos_complement(hex_value) if len(hex_value) % 2 == 0 else 0
-
-
-def _get_twos_complement(hexstr):
+def _hexstr_to_int(hexstr):
     """Converts a two complement hexadecimal value in a string to a signed float
     Input:
         - hex_value: signed hexadecimal value
     Returns:
         - decimal value
     """
-    bits = len(hexstr) * 4
-    value = int(hexstr, 16)
-    if value & (1 << (bits - 1)):
-        value -= 1 << bits
-    return value
+    return int.from_bytes(bytes.fromhex(hexstr), byteorder='big', signed=True)
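The one-liner that replaces `_get_twos_complement` is behavior-equivalent for even-length hex strings (`bytes.fromhex` rejects odd-length input, which is what the dropped `len(hex_value) % 2 == 0` guard was screening for). A quick standalone check, with illustrative function names:

```python
def hexstr_to_int_bitwise(hexstr):
    # The removed implementation: undo two's complement by hand.
    bits = len(hexstr) * 4
    value = int(hexstr, 16)
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value

def hexstr_to_int_bytes(hexstr):
    # The new implementation: let the standard library do it.
    return int.from_bytes(bytes.fromhex(hexstr), byteorder='big', signed=True)

for sample in ["7FFFFF", "800000", "FFFFFF", "000001", "00"]:
    assert hexstr_to_int_bitwise(sample) == hexstr_to_int_bytes(sample)
print(hexstr_to_int_bytes("FFFFFF"))  # -1
```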
1 change: 1 addition & 0 deletions backend/requirements.txt
@@ -7,6 +7,7 @@ onnxruntime==1.5.2
numpy==1.19.2
scipy==1.5.2
scikit-learn==0.23.2
+pandas==1.1.4
requests==2.24.0
hmmlearn==0.2.4
certifi==2020.6.20