-
Notifications
You must be signed in to change notification settings - Fork 392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Circular date representations to default date transmogrifier #100
Conversation
Codecov Report
@@ Coverage Diff @@
## master #100 +/- ##
==========================================
- Coverage 91.34% 86.08% -5.27%
==========================================
Files 131 297 +166
Lines 5234 9658 +4424
Branches 184 437 +253
==========================================
+ Hits 4781 8314 +3533
- Misses 453 1344 +891
Continue to review full report at Codecov.
|
* @return result feature of type Vector | ||
*/ | ||
def vectorize | ||
( | ||
dateListPivot: DateListPivot, | ||
referenceDate: JDateTime = TransmogrifierDefaults.ReferenceDate, | ||
trackNulls: Boolean = TransmogrifierDefaults.TrackNulls, | ||
circularDateRepresentations: Seq[TimePeriod] = TransmogrifierDefaults.CircularDateRepresentations, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a ridiculously long param name ;) can we perhaps shorten to circularDateRep
? (same elsewhere)
val timePeriods = circularDateRepresentations.map(tp => f.toUnitCircle(tp, others)) | ||
val time = f.toDateList().vectorize(dateListPivot = dateListPivot, referenceDate = referenceDate, | ||
trackNulls = trackNulls, others = others.map(_.toDateList())) | ||
(timePeriods :+ time).combine() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (timePeriods.isEmpty) time else (timePeriods :+ time).combine()
(same elsewhere)
/** | ||
* Following: http://webspace.ship.edu/pgmarr/geo441/lectures/lec%2016%20-%20directional%20statistics.pdf | ||
* Transforms a Date or DateTime field into a cartesian coordinate representation | ||
* of an extracted time period on the unit circle | ||
* | ||
* @param timePeriod The time period to extract from the timestamp | ||
* parameter timePeriod The time period to extract from the timestamp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean spark parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes - it bugged me that it was highlighted red
* Monday is the first day of the week | ||
* & the first week of the year is the week wit the first Monday after Jan 1. | ||
*/ | ||
class DateMapToUnitCircleVectorizer[T <: DateMap] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please put this class into a separate file.
meta.withColumns(cols.toArray) | ||
} | ||
|
||
override def fitFn(dataset: Dataset[Seq[Map[String, Long]]]): SequenceModel[T, OPVector] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dataset[Seq[T#Value]]
val history = Map(in1.name -> FeatureHistory(originFeatures = in1.originFeatures, stages = in1.stages)) | ||
setMetadata(OpVectorMetadata(getOutputFeatureName, columns, history).toMetadata) | ||
radians.map(r => Array(math.cos(r), math.sin(r))). | ||
getOrElse(Array(0.0, 0.0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please reformat: redundant endline.
val fitted = estimator.fit(inputData) | ||
val meta = OpVectorMetadata(fitted.getOutputFeatureName, fitted.getMetadata()) | ||
meta.columns.length shouldBe 4 | ||
meta.columns.map(_.grouping.get) shouldEqual Seq("a", "a", "b", "b") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meta.columns.flatMap(_.grouping)
and meta.columns.flatMap(_.descriptorValue)
@@ -99,8 +99,8 @@ class DateToUnitCircleTransformerTest extends OpTransformerSpec[OPVector, DateTo | |||
val transformed = vectorizer.transform(ds) | |||
val meta = OpVectorMetadata(transformed.schema(vectorizer.getOutput().name)) | |||
meta.columns.length should equal (2) | |||
meta.columns(0).indicatorValue.get should equal("x_HourOfDay") | |||
meta.columns(1).indicatorValue.get should equal("y_HourOfDay") | |||
meta.columns(0).descriptorValue.get should equal("x_HourOfDay") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meta.columns(0).descriptorValue should be Some("x_HourOfDay")
@@ -46,6 +46,7 @@ import org.junit.runner.RunWith | |||
import org.scalatest.junit.JUnitRunner | |||
import org.scalatest.{FlatSpec, Matchers} | |||
import org.slf4j.LoggerFactory | |||
// import org.apache.log4j.Level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant import
@@ -196,6 +196,13 @@ package object types extends FeatureTypeSparkConverters { | |||
implicit class VectorConversions(val v: Vector) extends AnyVal { | |||
def toOPVector: OPVector = new OPVector(v) | |||
} | |||
// Arrays | |||
implicit class ArrayDoubleConversions(val v: Array[Double]) extends AnyVal { | |||
def toReal: Seq[Real] = v.map(_.toReal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep the Array type in return value, ie. def toReal: Array[Real]
and def toRealNN: Array[RealNN]
@leahmcguire can you please update the description of the PR shortcuts & vectorizers were added (we will use it in our release notes later). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!!
Thanks for the contribution! Unfortunately we can't verify the commit author(s): leahmcguire <l***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request. |
Related issues
Adding circular date representations to default date transmogrifier
Describe the proposed solution
Added a DateMap circular transformer
Make Date circular transformer take a sequence instead of a single feature
Updated and added shortcuts for all Date and DateMap classes toUnitCircle and added default transformations to vectorize for all of these classes
Fixed metadata on Geolocation and GeolocationMap so that keep the name of the column in descriptorValue
Additional context
Current default is time ago from now but circular representations are better for capturing many business use cases