This was discussed here:
#3721
For testing and benchmarking dictionary encoding, it's useful to control the number of repeated values, and it would also be good to optionally include null values. The ability to provide a custom alphabet would be handy for generating strings with Unicode characters.
Also note that a simple PRNG should be used, since the group has observed performance problems with Mersenne Twister.
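A minimal C++ sketch of the requested generator follows. It is illustrative only: `RandomStringOptions`, `GenerateRandomStrings`, and `SplitMix64` are hypothetical names, not existing Arrow APIs. SplitMix64 is swapped in as one example of a "simple PRNG". That choice is an assumption, not necessarily what the group settled on. Repeats are controlled by drawing every output slot from a fixed pool of `num_unique` strings, nulls are emitted with probability `null_probability`, and each alphabet entry is a whole string, so multi-byte UTF-8 symbols work.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// SplitMix64: a small, fast, non-cryptographic PRNG. Used here as an
// assumed example of a "simple PRNG" in place of Mersenne Twister.
class SplitMix64 {
 public:
  explicit SplitMix64(uint64_t seed) : state_(seed) {}

  uint64_t Next() {
    uint64_t z = (state_ += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
  }

  // Integer in [0, n); slight modulo bias is acceptable for benchmarks.
  uint64_t Below(uint64_t n) { return Next() % n; }

  // Double in [0, 1) built from the top 53 bits.
  double NextDouble() {
    return static_cast<double>(Next() >> 11) / 9007199254740992.0;  // 2^53
  }

 private:
  uint64_t state_;
};

// Hypothetical options; field names are illustrative only.
struct RandomStringOptions {
  int64_t length = 0;             // number of output slots
  int64_t num_unique = 1;         // pool size: upper bound on distinct values
  int32_t min_len = 1;            // string length, in alphabet symbols
  int32_t max_len = 8;
  double null_probability = 0.0;  // chance that each slot is null
  std::vector<std::string> alphabet = {"a", "b", "c", "d"};  // one symbol per entry
  uint64_t seed = 42;
};

std::vector<std::optional<std::string>> GenerateRandomStrings(
    const RandomStringOptions& opts) {
  assert(!opts.alphabet.empty());
  assert(opts.num_unique > 0);
  assert(opts.min_len >= 0 && opts.min_len <= opts.max_len);
  SplitMix64 rng(opts.seed);

  // Build the pool of candidate values. Random collisions can make the
  // true number of distinct strings smaller than num_unique.
  std::vector<std::string> pool;
  pool.reserve(static_cast<size_t>(opts.num_unique));
  for (int64_t i = 0; i < opts.num_unique; ++i) {
    uint64_t span = static_cast<uint64_t>(opts.max_len - opts.min_len) + 1;
    int64_t len = opts.min_len + static_cast<int64_t>(rng.Below(span));
    std::string s;
    for (int64_t j = 0; j < len; ++j) {
      s += opts.alphabet[rng.Below(opts.alphabet.size())];
    }
    pool.push_back(std::move(s));
  }

  // Fill each slot with either a null or a value drawn from the pool,
  // so smaller pools produce more repeated values.
  std::vector<std::optional<std::string>> out;
  out.reserve(static_cast<size_t>(opts.length));
  for (int64_t i = 0; i < opts.length; ++i) {
    if (rng.NextDouble() < opts.null_probability) {
      out.push_back(std::nullopt);
    } else {
      out.push_back(pool[rng.Below(pool.size())]);
    }
  }
  return out;
}
```

Because the PRNG is seeded explicitly, the same options always yield the same data, which keeps benchmark runs comparable. A Unicode alphabet can be supplied as UTF-8 strings such as `{"\xCE\xB1", "\xCE\xB2"}` (α, β), and string lengths are then counted in symbols rather than bytes.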
Reporter: Hatem Helal / @hatemhelal
Note: This issue was originally created as ARROW-4661. Please see the migration documentation for further details.