Synthetic data from PARSynthesizer does not follow original data distribution #2230
Labels: data:sequential, question, under discussion
Environment details
If you are already running SDV, please indicate the following details about the environment in
which you are running it:
Problem description
When I generate synthetic numeric values with PARSynthesizer, the results fall very close to the mean of the original distribution and show little variance between values.
The data is a simple table consisting of patient_id (sequence_id), mesure_id, measure_date_time (sequence key), and the value of the measurement.
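For reference, a minimal sketch of how a table like this could be wired into PARSynthesizer, assuming the SDV 1.x single-table API (SingleTableMetadata, set_sequence_key, set_sequence_index). In SDV's terminology the sequence key is the column that identifies a sequence (here patient_id) and the sequence index is the column that orders rows within it (here measure_date_time). The function name, the column name `value`, and the epochs default are illustrative assumptions, not the setup actually used in this issue.

```python
def build_par_synthesizer(data, epochs=128):
    """Fit a PARSynthesizer on a long-format measurements table.

    `data` is assumed to be a pandas DataFrame with the columns
    patient_id, mesure_id, measure_date_time and value (hypothetical name).
    """
    # Imported inside the function so the sketch loads even without SDV installed.
    from sdv.metadata import SingleTableMetadata
    from sdv.sequential import PARSynthesizer

    metadata = SingleTableMetadata()
    metadata.detect_from_dataframe(data)

    # patient_id identifies which sequence each row belongs to.
    metadata.update_column(column_name="patient_id", sdtype="id")
    metadata.set_sequence_key(column_name="patient_id")

    # measure_date_time orders the rows inside each sequence.
    metadata.update_column(column_name="measure_date_time", sdtype="datetime")
    metadata.set_sequence_index(column_name="measure_date_time")

    synthesizer = PARSynthesizer(metadata, epochs=epochs, verbose=True)
    synthesizer.fit(data)
    return synthesizer
```

With a fitted synthesizer, `synthesizer.sample(num_sequences=10)` would return a DataFrame of ten synthetic sequences.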
The histograms of both distributions look like this:
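The same comparison can also be put in numbers. The sketch below uses only the Python standard library and reports the mean and standard deviation of each sample plus the ratio of the two standard deviations; a ratio well below 1 with matching means corresponds to the "values collapse toward the mean" pattern described above. The sample values are toy numbers chosen for illustration, not the data from this issue.

```python
import statistics


def spread_summary(values):
    """Return the mean and sample standard deviation of a list of numbers."""
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
    }


def stdev_ratio(real, synthetic):
    """Ratio of synthetic to real standard deviation (1.0 means equal spread)."""
    return statistics.stdev(synthetic) / statistics.stdev(real)


# Toy values for illustration only: same mean, much narrower synthetic spread.
real_values = [2.0, 4.0, 6.0, 8.0, 10.0]
synthetic_values = [5.0, 5.5, 6.0, 6.5, 7.0]

print(spread_summary(real_values))
print(spread_summary(synthetic_values))
print(stdev_ratio(real_values, synthetic_values))
```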
What I already tried
I have tried different epoch values, a larger input dataset, and different RDT transformers.
Running the same data with the GaussianCopulaSynthesizer yields much better results, but I would lose the time series aspect of the original data.
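One way to quantify the "time series aspect" when comparing the two synthesizers is the lag-1 autocorrelation of the value within each sequence, averaged over sequences. This is only a rough proxy and a sketch using the standard library: rows are assumed to arrive as (sequence_id, timestamp, value) tuples, and the function names are hypothetical. A model that samples rows independently would be expected to score near zero on data that is strongly autocorrelated in the original.

```python
from collections import defaultdict


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation of a numeric series, or None if undefined."""
    n = len(series)
    if n < 3:
        return None
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return None  # constant series: autocorrelation is undefined
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    return num / denom


def mean_lag1_by_sequence(rows):
    """Average lag-1 autocorrelation over sequences.

    `rows` is an iterable of (sequence_id, timestamp, value) tuples.
    """
    grouped = defaultdict(list)
    for seq_id, timestamp, value in rows:
        grouped[seq_id].append((timestamp, value))

    scores = []
    for points in grouped.values():
        points.sort()  # order each sequence by timestamp
        score = lag1_autocorrelation([value for _, value in points])
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else None
```

Computing this on the original table and on each synthetic table gives one number per dataset that can be compared directly.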
Is this the expected behaviour of the PARSynthesizer or am I doing something wrong?