2024-08-17: Dataiter 0.99

Adapt to changes in NumPy 2.0
Bump NumPy dependency to >= 2.0

This is a minimal change to be NumPy 2.0 compatible. In the 0.99+ releases, we plan to adopt the new NumPy string dtype and fix any regressions that come up, leading to a 1.0 release when everything looks to be working reliably (#26). Anyone looking for extreme stability should consider avoiding the 0.99+ releases and waiting for 1.0.

2024-06-24: Dataiter 0.51

Mark NumPy dependency as < 2.0

2024-04-06: Dataiter 0.50

ListOfDicts.drop_na: New method
ListOfDicts.keys: New method
ListOfDicts.print_memory_use: New method
Fix tabular display of Unicode characters with width != 1
Add dependency on wcwidth: https://pypi.org/project/wcwidth
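
The display width fix above matters because some characters occupy zero or two terminal columns, so padding by `len()` misaligns table columns. A minimal sketch of the idea follows, using the standard library's `unicodedata` as a rough stand-in for wcwidth. The helper names `display_width` and `pad_cell` are hypothetical and are not Dataiter's implementation.

```python
import unicodedata

def display_width(text):
    """Approximate the number of terminal columns needed to show text."""
    width = 0
    for char in text:
        if unicodedata.combining(char):
            # Combining marks are drawn on top of the previous character.
            continue
        # East Asian wide (W) and fullwidth (F) characters take two columns.
        width += 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1
    return width

def pad_cell(text, width):
    """Right-pad text with spaces up to the given display width."""
    return text + " " * max(0, width - display_width(text))
```

This approximation ignores several cases that a dedicated library such as wcwidth handles, so it is only meant to illustrate why character count and display width differ.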

2023-11-08: Dataiter 0.49

dt: Handle all NaT input
Migrate from setup.py to hatch and pyproject.toml

2023-10-08: Dataiter 0.48

Vector.as_datetime: Add precision argument
Vector.concat: New method
Vector.sort: Fix sorting object vectors

2023-09-09: Dataiter 0.47

DataFrame: Fix column and method name clash errors in certain operations
dt.replace: Allow vector arguments the same length as x

2023-09-05: Dataiter 0.46

DataFrame.count: New method, shorthand for data.group_by(...).aggregate(n=di.count())
Vector.rank: Handle empty and all-NA vectors

2023-06-14: Dataiter 0.45

USE_NUMBA_CACHE: New option, read from environment variable DATAITER_USE_NUMBA_CACHE if it exists, defaults to True
Fix a possible issue with Numba caching
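
An option that is read from an environment variable with a default of True could be resolved roughly as sketched below. The helper `read_bool_env` and the set of accepted spellings are assumptions made for illustration. Only the variable name DATAITER_USE_NUMBA_CACHE and the default come from the entry above.

```python
import os

def read_bool_env(name, default=True):
    """Return a boolean option from the environment, or default if unset."""
    value = os.environ.get(name)
    if value is None:
        return default
    # Assumed spellings for "true"; anything else counts as False.
    return value.strip().lower() in ("1", "true", "yes", "on")

# Hypothetical module-level option resolved once at import time.
USE_NUMBA_CACHE = read_bool_env("DATAITER_USE_NUMBA_CACHE", default=True)
```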

2023-06-13: Dataiter 0.44

Use numba.extending.overload instead of the deprecated numba.generated_jit

2023-06-08: Dataiter 0.43

DataFrame: Don't try to do joins on NA values in by columns
DataFrame.drop_na: New method

2023-05-30: Dataiter 0.42

DataFrame: Truncate multiline strings when printing
DataFrame.from_arrow: New method
DataFrame.read_parquet: New method
DataFrame.to_arrow: New method
DataFrame.write_parquet: New method
read_parquet: New function
Vector.__init__: Fix type guessing when mixing Python and NumPy floats or integers and missing values
Allow using a thousand separator when printing numbers, off by default, can be set with dataiter.PRINT_THOUSAND_SEPARATOR

2023-03-11: Dataiter 0.41

Fix printing really small numbers

2023-02-21: Dataiter 0.40.1

DataFrame.modify: Fix grouped modify on unsorted data frame

2023-02-20: Dataiter 0.40

Vector.map: Add dtype argument

2023-02-06: Dataiter 0.39.1

ListOfDicts.to_data_frame: Add strings_as_object argument

2023-01-21: Dataiter 0.39

read_csv, read_geojson, DataFrame.from_pandas, DataFrame.read_csv, GeoJSON.read: Add strings_as_object argument

2022-12-15: Dataiter 0.38

DataFrame.slice_off: New method
GeoJSON.to_data_frame: New method
Fix error with new column placeholder attributes in conjunction with pop, popitem and clear

2022-11-17: Dataiter 0.37

DataFrame: Add placeholder attributes for columns so that tab completion of columns as attributes at a shell works
dt.from_string: New function
dt.to_string: New function
nrow: Remove deprecated aggregation function
Don't use Numba for aggregation involving strings due to bad performance

2022-10-16: Dataiter 0.36

dt: New module for dealing with dates and datetimes

2022-10-03: Dataiter 0.35

DataFrame.from_pandas: Speed up by avoiding unnecessary conversions
DataFrame.full_join: Fix join and output when by is a tuple
GeoJSON: Fix printing object

2022-09-17: Dataiter 0.34

Vector: Handle timedeltas correctly for NA checks and printing
Vector.is_timedelta: New method

2022-09-03: Dataiter 0.33

DataFrame.sort: Convert object to string for sorting
Vector.sort: Convert object to string for sorting
Fix conditional Numba use when importing the numba package works, but caching doesn't
Add di-open CLI command (currently not part of the default install, but can be installed from source using make install-cli)

2022-04-02: Dataiter 0.32

DataFrame.modify: Add support for grouped modification (#19)
DataFrame.split: New method
ListOfDicts.split: New method

2022-02-26: Dataiter 0.31

DataFrame.compare: New experimental method
Vector.as_string: Add length argument
Change the documentation to default to the latest release ("stable") instead of the development version ("latest")

2022-02-19: Dataiter 0.30

Use keyword-only arguments where appropriate – the general principle is that mandatory arguments are allowed as positional, but optional modifiers are keyword only
Rename all instances of "missing" to "na", such as Vector.is_missing to Vector.is_na, the only exception being ListOfDicts.fill_missing, which becomes ListOfDicts.fill_missing_keys
Truncate data frame object and string columns at PRINT_TRUNCATE_WIDTH (default 32) for printing
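
The keyword-only principle described above (mandatory arguments may be positional, optional modifiers must be named) corresponds to placing a bare `*` in a Python signature. The function `mean_of` below is a hypothetical example, not part of Dataiter.

```python
def mean_of(values, *, digits=2, drop_na=False):
    """Return the mean of values, rounded to the given number of digits."""
    if drop_na:
        # Treat None as the missing value in this sketch.
        values = [x for x in values if x is not None]
    return round(sum(values) / len(values), digits)

# The mandatory argument can be passed positionally.
mean_of([1, 2, 4])
# Optional modifiers have to be named.
mean_of([1, None, 3], drop_na=True)
# mean_of([1, 2, 4], 3) would raise TypeError, since digits is keyword only.
```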

2022-02-09: Dataiter 0.29.2

Fix aggregation functions to work with all main data types: boolean, integer, float, date, datetime and string
Fix aggregation functions to handle all missing values (NaN, NaT, blank string) correctly, the same as implemented in Vector
Rename aggregation functions' dropna arguments to drop_missing
first, last, nth: Add drop_missing argument
Vector.drop_missing: New method

2022-01-30: Dataiter 0.29.1

mode: Fix to return first in case of ties (requires Python >= 3.8)
std, var: Add ddof argument (defaults to 0 on account of Numba limitations)
Don't try to dropna for non-float vectors in aggregation functions
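
To illustrate the two behaviors named above, the sketch below uses the standard library's `statistics.mode`, which since Python 3.8 returns the first mode encountered when there are ties, and a hand-written `variance` with a `ddof` argument. The `variance` helper is hypothetical. This is not how Dataiter implements its aggregation functions.

```python
import statistics

def variance(values, ddof=0):
    """Return the variance with a divisor of len(values) - ddof."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - ddof)

# With ties, the first encountered of the tied values is returned (Python >= 3.8).
first_mode = statistics.mode([2, 1, 2, 1])

# ddof=0 gives the population variance, ddof=1 the sample variance.
population = variance([1, 2, 3, 4])
sample = variance([1, 2, 3, 4], ddof=1)
```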

2022-01-29: Dataiter 0.29

Add shorthand helper functions for use with DataFrame.aggregate, optionally using Numba JIT-compiled code for speed
DataFrame.map: New method
ncol: Removed
nrow: Deprecated in favor of dataiter.count
read_csv: New alias for DataFrame.read_csv
read_geojson: New alias for GeoJSON.read
read_json: New alias for ListOfDicts.read_json
read_npz: New alias for DataFrame.read_npz

2022-01-09: Dataiter 0.28

DataFrame: Make object columns work in various operations
DataFrame.from_json: Add arguments columns and dtypes
DataFrame.from_pandas: Add argument dtypes
DataFrame.full_join: Speed up
DataFrame.read_csv: Add argument dtypes
DataFrame.read_json: Add arguments columns and dtypes
GeoJSON.read: Add arguments columns and dtypes
ListOfDicts.fill_missing: New method
ListOfDicts.from_json: Add arguments keys and types
ListOfDicts.full_join: Speed up
ListOfDicts.read_csv: Add argument types, rename columns to keys
ListOfDicts.read_json: Add arguments keys and types

2022-01-01: Dataiter 0.27

DataFrame: Fix error message when column not found
DataFrame.aggregate: Speed up
DataFrame.full_join: Fix to join all possible columns
DataFrame.read_csv: Try to avoid mixed types
ListOfDicts.full_join: Fix to join all possible keys
ListOfDicts.write_csv: Use minimal quoting
Vector.get_memory_use: New method
Vector.rank: Rewrite, add method argument
*.read_*: Rename fname argument to path
*.write_*: Rename fname argument to path
Add comparison table dplyr vs. Dataiter vs. Pandas to documentation: https://dataiter.readthedocs.io/en/latest/comparison.html

2021-12-02: Dataiter 0.26

DataFrame.read_npz: New method to read NumPy npz format
DataFrame.write_npz: New method to write NumPy npz format
*.read_*: Decompress .bz2|.gz|.xz automatically
*.write_*: Compress .bz2|.gz|.xz automatically
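
Automatic handling of .bz2, .gz and .xz files can be sketched as choosing an opener by file extension. The helper `open_maybe_compressed` below is hypothetical and text-mode only. It relies on `bz2.open`, `gzip.open` and `lzma.open` from the standard library and is not taken from Dataiter's source.

```python
import bz2
import gzip
import lzma

# Map of file suffix to the standard library function that opens it.
OPENERS = {".bz2": bz2.open, ".gz": gzip.open, ".xz": lzma.open}

def open_maybe_compressed(path, mode="rt", encoding="utf-8"):
    """Open path in text mode, compressing or decompressing by suffix."""
    for suffix, opener in OPENERS.items():
        if str(path).endswith(suffix):
            return opener(path, mode, encoding=encoding)
    # No known suffix, so fall back to a plain file.
    return open(path, mode, encoding=encoding)
```

Since the same function serves both reading ("rt") and writing ("wt"), readers and writers built on top of it would get compression support without any changes of their own.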

2021-11-13: Dataiter 0.25

DataFrame.print_missing_counts: Fix when nothing missing
Vector.replace_missing: New method

2021-10-27: Dataiter 0.24

DataFrame.print_memory_use: New method
ListOfDicts.write_csv: Use less memory

2021-07-08: Dataiter 0.23

Vector.is_*: Change to be methods instead of properties
Drop deprecated use of np.int
Drop deprecated comparisons against NaN

2021-05-13: Dataiter 0.22

ListOfDicts.map: New method

2021-03-08: Dataiter 0.21

DataFrame.read_csv: Add columns argument
ListOfDicts.read_csv: Add columns argument

2021-03-06: Dataiter 0.20

DataFrame.*_join: Handle differing by names via tuple argument
ListOfDicts.*_join: Handle differing by names via tuple argument

2021-03-04: Dataiter 0.19

Use terminal window width as maximum print width
Vector.__init__: Handle NaN values in non-float vectors

2021-03-03: Dataiter 0.18

Vector.__init__: Accept generators/iterators
Vector.map: New method

2021-02-27: Dataiter 0.17

DataFrame.print_missing_counts: New method
GeoJSON.read: Handle properties differing between features
ListOfDicts.print_missing_counts: New method
Vector.as_object: New method

2020-10-03: Dataiter 0.16.1

GeoJSON.read: Use warnings, not errors for ignored excess feature keys

2020-09-26: Dataiter 0.16

GeoJSON: New class

2020-09-12: Dataiter 0.15

ListOfDicts.sort: Handle descending sort for all types

2020-08-22: Dataiter 0.14

ListOfDicts: Make obsoletion a warning instead of an error

2020-08-15: Dataiter 0.13

DataFrame: Fix error printing blank strings (#8)

2020-07-25: Dataiter 0.12

DataFrame.filter: Add colname_value_pairs argument
DataFrame.filter_out: Add colname_value_pairs argument
ListOfDicts.__init__: Remove arguments not intended for external use
ListOfDicts.rename: Preserve order of keys
Add documentation: https://dataiter.readthedocs.io/

2020-06-02: Dataiter 0.11

Vector.__init__: Speed up by fixing type deduction

2020-05-28: Dataiter 0.10.1

ListOfDicts.select: Fix return value (#7)

2020-05-21: Dataiter 0.10

DataFrame.aggregate: Fix UnicodeEncodeError with string columns
DataFrame.unique: Fix UnicodeEncodeError with string columns
ListOfDicts.select: Return keys in requested order
Vector.__repr__: Add custom conversion to string for display
Vector.__str__: Add custom conversion to string for display
Vector.to_string: Add custom conversion to string for display
Vector.to_strings: Add custom conversion to string for display

2020-05-11: Dataiter 0.9

Array: Rename to Vector
Vector.head: New method
Vector.range: New method
Vector.sample: New method
Vector.sort: New method
Vector.tail: New method
Vector.unique: New method

2020-05-10: Dataiter 0.8

DataFrame: New class
ListOfDicts.__add__: New method to support the + operator
ListOfDicts.__init__: Rename, reorder arguments
ListOfDicts.__mul__: New method to support the * operator
ListOfDicts.__repr__: New method, format as JSON
ListOfDicts.__rmul__: New method to support the * operator
ListOfDicts.__setitem__: New method, coerce to AttributeDict
ListOfDicts.__str__: New method, format as JSON
ListOfDicts.aggregate: Speed up
ListOfDicts.anti_join: New method
ListOfDicts.append: New method
ListOfDicts.clear: New method
ListOfDicts.extend: New method
ListOfDicts.full_join: New method
ListOfDicts.head: New method
ListOfDicts.inner_join: New method
ListOfDicts.insert: New method
ListOfDicts.join: Removed in favor of specific join types
ListOfDicts.left_join: New method
ListOfDicts.pluck: Add argument "default" to handle missing keys
ListOfDicts.print_: New method
ListOfDicts.read_csv: Add explicit arguments
ListOfDicts.read_json: Relay arguments to json.loads
ListOfDicts.read_pickle: New method
ListOfDicts.reverse: New method
ListOfDicts.sample: New method
ListOfDicts.semi_join: New method
ListOfDicts.sort: Change arguments to support sort direction better
ListOfDicts.tail: New method
ListOfDicts.to_data_frame: New method
ListOfDicts.to_pandas: New method
ListOfDicts.unique: Return unique by all keys if none given
ListOfDicts.write_csv: Add explicit arguments
ListOfDicts.write_pickle: New method

2019-12-03: Dataiter 0.7

Make sort handle None values, sorted last
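
Python cannot compare None with numbers or strings, so a sort that tolerates None needs a key that sets those values apart. One common way, shown here as a sketch with the hypothetical helper `sort_none_last`, is a tuple key whose first element flags None. This is not necessarily how Dataiter does it.

```python
def sort_none_last(items, key):
    """Sort a list of dicts by key, placing None or missing values last."""
    def sort_key(item):
        value = item.get(key)
        # False sorts before True, so items with a real value come first.
        return (value is None, value)
    return sorted(items, key=sort_key)

rows = [{"x": 3}, {"x": None}, {"x": 1}, {}]
sorted_rows = sort_none_last(rows, "x")
```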

2019-11-29: Dataiter 0.6

Fix ObsoleteError after multiple modifying actions

2019-11-10: Dataiter 0.5

Add read_csv
Add read_json
Add write_csv
Add write_json

2019-11-01: Dataiter 0.4

Fix ObsoleteError with deepcopy
Define __deepcopy__ so that copy.deepcopy works too
Add copy (and __copy__ for copy.copy)

2019-11-01: Dataiter 0.3

Mark ListOfDicts object obsolete thus preventing (accidental) use if a chained successor has modified the shared dicts
Add modify_if

2019-10-31: Dataiter 0.2

Speed up, mostly by avoiding copying (methods that modify dicts now do it in place rather than making a copy)

2019-09-29: Dataiter 0.1

Initial release
