project
│
├── .env
├── .gitignore
├── data_generation_config.json
├── data_generator.py
├── database_connection.py
├── generate_excel.py
├── main.py
└── README.md
git clone <repository-url># Database
DATABASE_HOST=localhost
DATABASE_USER=root
DATABASE_PASSWORD=root
DATABASE_NAME=test
# Files
CONFIG_JSON_PATH=data_generation_config.json
OUTPUT_EXCEL_FILE=data.xlsx
# App
TO_EXCEL=True
TO_DATABASE=False
IF_TABLE_EXISTS=append- These are the database connection details. If you do not plan to save data to a database, you can skip these configurations.
CONFIG_JSON_PATH: The path to the JSON file containing the table configuration for data generation. An example structure of this file is provided below.
TO_EXCEL: Specifies if the generated data should be saved to an Excel file (TrueorFalse).TO_DATABASE: Specifies if the generated data should be inserted into the database (TrueorFalse).IF_TABLE_EXISTS: Defines the behavior when the table already exists (append,replace, orfail).
{
"table_name": {
"columns": [
{
"name": "column name",
"type": "column type (Note: This refers to data types provided by the Faker library, not SQL types like string, date, etc.)"
}
],
"num-rows": "number of rows to generate for this table"
}
}{
"company": {
"columns": [
{
"name": "name",
"type": "word"
},
{
"name": "slogan",
"type": "word"
}
],
"num-rows": 2
},
"dim_employee": {
"columns": [
{
"name": "company_id",
"type": "foreign-key",
"reference-table": "company"
},
{
"name": "name",
"type": "name",
"upper": true
},
{
"name": "status",
"type": "enum",
"enum": ["ACTIVE", "INACTIVE"]
}
],
"num-rows": 5
},
"product": {
"columns": [
{
"name": "company_id",
"type": "foreign-key",
"reference-table": "company"
},
{
"name": "name",
"type": "word",
"upper": true
},
{
"name": "category",
"type": "enum",
"enum": ["TECHNOLOGY", "HEALTH", "EDUCATION", "ENTERTAINMENT", "SPORTS"]
},
{
"name": "value",
"type": "double",
"min": 10,
"max": 1000,
"round": 2
}
],
"num-rows": 100
}
}- name: Generates a random name.
- word: Generates a random word.
- words: Generates multiple random words.
- email: Generates a random email address.
- date: Generates a random date from this year (date_this_year).
Note: If you are unfamiliar with the Faker library, it is recommended to explore its documentation to understand the available data types.
- foreign-key: Generates a foreign key, which will be a random number within the number of rows in the referenced table. Used when the table is linked to another.
- integer: Generates a random integer within a range defined by the min and max parameters.
- double: Generates a random floating-point number within the min and max range, with the possibility to round to a specific number of decimal places using the round parameter.
- enum: Generates a random value chosen from the values provided in the enum list.
These options allow you to modify the generated values:
- unique (boolean): Ensures that the generated value is unique within the table. This is available for string, number, email, and other types.
- upper (boolean): Converts the value to uppercase.
- lower (boolean): Converts the value to lowercase.
- title (boolean): Converts the value to title case (first letter capitalized).
- min (integer): Defines the minimum value for generating integers or floating-point numbers.
- max (integer): Defines the maximum value for generating integers or floating-point numbers.
- round (integer): Sets the number of decimal places to round generated double values.
python -m venv .venv.venv/Scripts/activatepython3 -m venv .venvsource .venv/bin/activatepip install -r requirements.txtpython ./main.py- Customize the
data_generation_config.jsonfile as per your data generation needs. - Explore the Faker library documentation to better understand how it works.
- If you added new data types in
data_generation_config.json, you will need to modify the code where the conditions for each data type are handled. This part of the code is in thedata_generator.pyfile, in the__generate_fake_valueand__apply_column_transformationsfunction.
For further clarification, refer to the comments and examples provided in this document.