Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Circular date representations to default date transmogrifier #100

Merged
merged 14 commits into from
Aug 30, 2018

Conversation

leahmcguire
Copy link
Collaborator

@leahmcguire leahmcguire commented Aug 28, 2018

Related issues
Adding circular date representations to default date transmogrifier

Describe the proposed solution
Added a DateMap circular transformer
Make Date circular transformer take a sequence instead of a single feature
Updated and added shortcuts for all Date and DateMap classes toUnitCircle and added default transformations to vectorize for all of these classes
Fixed metadata on Geolocation and GeolocationMap so that keep the name of the column in descriptorValue

Additional context
Current default is time ago from now but circular representations are better for capturing many business use cases

@tovbinm tovbinm changed the title Lm/circle dates Circle dates Aug 28, 2018
@tovbinm tovbinm changed the title Circle dates Circular date representations to default date transmogrifier Aug 28, 2018
@codecov
Copy link

codecov bot commented Aug 28, 2018

Codecov Report

Merging #100 into master will decrease coverage by 5.26%.
The diff coverage is 85.56%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #100      +/-   ##
==========================================
- Coverage   91.34%   86.08%   -5.27%     
==========================================
  Files         131      297     +166     
  Lines        5234     9658    +4424     
  Branches      184      437     +253     
==========================================
+ Hits         4781     8314    +3533     
- Misses        453     1344     +891
Impacted Files Coverage Δ
...om/salesforce/op/filters/FeatureDistribution.scala 97.87% <ø> (ø) ⬆️
...ala/com/salesforce/op/features/types/package.scala 51.44% <0%> (ø)
...lesforce/op/test/TestOpVectorMetadataBuilder.scala 0% <0%> (ø)
...rce/op/stages/OpPipelineStageReadWriteShared.scala 100% <100%> (ø)
.../scala/com/salesforce/op/dsl/RichDateFeature.scala 100% <100%> (ø) ⬆️
...sforce/op/stages/impl/feature/Transmogrifier.scala 96.74% <100%> (+0.05%) ⬆️
...s/impl/feature/DateMapToUnitCircleVectorizer.scala 100% <100%> (ø)
...stages/impl/feature/GeolocationMapVectorizer.scala 100% <100%> (ø) ⬆️
...com/salesforce/op/features/types/Geolocation.scala 90.9% <100%> (ø)
...m/salesforce/op/utils/spark/OpVectorMetadata.scala 82.97% <100%> (ø)
... and 169 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8ecb902...c860c02. Read the comment docs.

* @return result feature of type Vector
*/
def vectorize
(
dateListPivot: DateListPivot,
referenceDate: JDateTime = TransmogrifierDefaults.ReferenceDate,
trackNulls: Boolean = TransmogrifierDefaults.TrackNulls,
circularDateRepresentations: Seq[TimePeriod] = TransmogrifierDefaults.CircularDateRepresentations,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a ridiculously long param name ;) can we perhaps shorten to circularDateRep? (same elsewhere)

val timePeriods = circularDateRepresentations.map(tp => f.toUnitCircle(tp, others))
val time = f.toDateList().vectorize(dateListPivot = dateListPivot, referenceDate = referenceDate,
trackNulls = trackNulls, others = others.map(_.toDateList()))
(timePeriods :+ time).combine()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (timePeriods.isEmpty) time else (timePeriods :+ time).combine() (same elsewhere)

/**
* Following: http://webspace.ship.edu/pgmarr/geo441/lectures/lec%2016%20-%20directional%20statistics.pdf
* Transforms a Date or DateTime field into a cartesian coordinate representation
* of an extracted time period on the unit circle
*
* @param timePeriod The time period to extract from the timestamp
* parameter timePeriod The time period to extract from the timestamp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean spark parameter?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - it bugged me that it was highlighted red

* Monday is the first day of the week
* & the first week of the year is the week wit the first Monday after Jan 1.
*/
class DateMapToUnitCircleVectorizer[T <: DateMap]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please put this class into a separate file.

meta.withColumns(cols.toArray)
}

override def fitFn(dataset: Dataset[Seq[Map[String, Long]]]): SequenceModel[T, OPVector] = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dataset[Seq[T#Value]]

val history = Map(in1.name -> FeatureHistory(originFeatures = in1.originFeatures, stages = in1.stages))
setMetadata(OpVectorMetadata(getOutputFeatureName, columns, history).toMetadata)
radians.map(r => Array(math.cos(r), math.sin(r))).
getOrElse(Array(0.0, 0.0))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please reformat: redundant endline.

val fitted = estimator.fit(inputData)
val meta = OpVectorMetadata(fitted.getOutputFeatureName, fitted.getMetadata())
meta.columns.length shouldBe 4
meta.columns.map(_.grouping.get) shouldEqual Seq("a", "a", "b", "b")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta.columns.flatMap(_.grouping) and meta.columns.flatMap(_.descriptorValue)

@@ -99,8 +99,8 @@ class DateToUnitCircleTransformerTest extends OpTransformerSpec[OPVector, DateTo
val transformed = vectorizer.transform(ds)
val meta = OpVectorMetadata(transformed.schema(vectorizer.getOutput().name))
meta.columns.length should equal (2)
meta.columns(0).indicatorValue.get should equal("x_HourOfDay")
meta.columns(1).indicatorValue.get should equal("y_HourOfDay")
meta.columns(0).descriptorValue.get should equal("x_HourOfDay")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta.columns(0).descriptorValue should be Some("x_HourOfDay")

@@ -46,6 +46,7 @@ import org.junit.runner.RunWith
import org.scalatest.junit.JUnitRunner
import org.scalatest.{FlatSpec, Matchers}
import org.slf4j.LoggerFactory
// import org.apache.log4j.Level
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant import

@@ -196,6 +196,13 @@ package object types extends FeatureTypeSparkConverters {
implicit class VectorConversions(val v: Vector) extends AnyVal {
def toOPVector: OPVector = new OPVector(v)
}
// Arrays
implicit class ArrayDoubleConversions(val v: Array[Double]) extends AnyVal {
def toReal: Seq[Real] = v.map(_.toReal)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep the Array type in return value, ie. def toReal: Array[Real] and def toRealNN: Array[RealNN]

@tovbinm
Copy link
Collaborator

tovbinm commented Aug 30, 2018

@leahmcguire can you please update the description of the PR shortcuts & vectorizers were added (we will use it in our release notes later).

Copy link
Collaborator

@tovbinm tovbinm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!!

@tovbinm tovbinm merged commit 52341f9 into master Aug 30, 2018
@tovbinm tovbinm deleted the lm/circleDates branch August 30, 2018 23:42
@salesforce-cla
Copy link

salesforce-cla bot commented Mar 7, 2021

Thanks for the contribution! Unfortunately we can't verify the commit author(s): leahmcguire <l***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants