[text analytics] opinion mining support #12542

Merged
merged 22 commits into from
Jul 31, 2020
22 commits
7755968
have ta client hold onto its api version in a private property
iscai-msft Jul 14, 2020
43e72dc
implement design for opinion mining
iscai-msft Jul 14, 2020
a3b4510
add tests
iscai-msft Jul 14, 2020
32ddecc
improve docstrings
iscai-msft Jul 14, 2020
ea9fca3
made ApiVersion enum all uppercase
iscai-msft Jul 14, 2020
590dd11
update changelog
iscai-msft Jul 14, 2020
399571a
check if positive and negative confidence scores are not None since s…
iscai-msft Jul 15, 2020
1ef46f9
improve code flow in sentiment endpoint
iscai-msft Jul 15, 2020
858e70d
pylint
iscai-msft Jul 15, 2020
e83b145
whitespace to update PR
iscai-msft Jul 15, 2020
ffe563d
unify structure of sentiment objects
iscai-msft Jul 16, 2020
09cf3fd
pylint
iscai-msft Jul 16, 2020
7ee99e0
fix null checks in aspect tests
iscai-msft Jul 20, 2020
e9fee50
add trailing whitespace to init __all__
iscai-msft Jul 21, 2020
77c7fa8
separate out json pointer parsing and add unittest
iscai-msft Jul 21, 2020
87b4a80
add that show_aspects is only available in v3.1-preview.1 in sentimen…
iscai-msft Jul 21, 2020
83426aa
switch to kwarg mine_opinions and MinedOpinion body
iscai-msft Jul 29, 2020
7dc065c
Merge branch 'master' of https://github.com/Azure/azure-sdk-for-pytho…
iscai-msft Jul 30, 2020
273fb94
change kwarg from mine_opinions to show_opinion_mining
iscai-msft Jul 30, 2020
4180ad0
change test names from aspect_based_sentiment_analysis to opinion_mining
iscai-msft Jul 30, 2020
5b804d2
add test for opinion mining with no mined opinions
iscai-msft Jul 30, 2020
f03b477
have no mined opinions return as [] for v3.1-preview.1, and None for …
iscai-msft Jul 30, 2020
2 changes: 2 additions & 0 deletions sdk/textanalytics/azure-ai-textanalytics/CHANGELOG.md
@@ -6,6 +6,8 @@
- We are now targeting the service's v3.1-preview.1 API as the default. If you would like to still use version v3.0 of the service,
pass in `v3.0` to the kwarg `api_version` when creating your TextAnalyticsClient
- We have added an API `recognize_pii_entities` which returns entities containing personal information for a batch of documents. Only available for API version v3.1-preview.1 and up.
- We have added support for opinion mining. To use this feature, make sure you are using the service's
v3.1-preview.1 API, then pass `show_opinion_mining` as True when calling the `analyze_sentiment` endpoint.
Member

😄

Contributor Author

are you gloating

Member

no I'm smiling - like the emoji is


## 5.0.0 (2020-07-27)

@@ -26,8 +26,11 @@
    TextDocumentBatchStatistics,
    SentenceSentiment,
    SentimentConfidenceScores,
    MinedOpinion,
    AspectSentiment,
Member

Would it be better to call these SentimentAspect and SentimentOpinion, since this represents the aspect of the sentiment and not the aspect's sentiment?
I feel AspectSentiment suggests the latter.

Member

We have DocumentSentiment and SentenceSentiment, so if we follow the pattern, it would be AspectSentiment and OpinionSentiment.

Contributor Author

Yeah, after talking to @annelo-msft, she brought up a really good point: these all have the same basic structure (confidence_scores, sentiment, etc.), and it's more important for a user to have a logical pattern than for the individual names to be the best English.

    OpinionSentiment,
    RecognizePiiEntitiesResult,
    PiiEntity,
)

__all__ = [
@@ -51,6 +54,9 @@
    'TextDocumentBatchStatistics',
    'SentenceSentiment',
    'SentimentConfidenceScores',
    'MinedOpinion',
    'AspectSentiment',
    'OpinionSentiment',
    'RecognizePiiEntitiesResult',
    'PiiEntity',
]

@@ -3,10 +3,14 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# ------------------------------------
import re
from ._generated.v3_0.models._models import (
    LanguageInput,
    MultiLanguageInput
)


def _get_indices(relation):
    return [int(s) for s in re.findall(r"\d+", relation)]

class DictMixin(object):

@@ -702,19 +706,34 @@ class SentenceSentiment(DictMixin):
        and 1 for the sentence for all labels.
    :vartype confidence_scores:
        ~azure.ai.textanalytics.SentimentConfidenceScores
    :ivar mined_opinions: The list of opinions mined from this sentence.
Member

Would it be good to add an aka.ms link to the service documentation about Opinion Mining here?

        For example in "The food is good, but the service is bad", we would
        mine these two opinions "food is good", "service is bad". Only returned
        if `show_opinion_mining` is set to True in the call to `analyze_sentiment`.
    :vartype mined_opinions:
        list[~azure.ai.textanalytics.MinedOpinion]
    """

    def __init__(self, **kwargs):
        self.text = kwargs.get("text", None)
        self.sentiment = kwargs.get("sentiment", None)
        self.confidence_scores = kwargs.get("confidence_scores", None)
        self.mined_opinions = kwargs.get("mined_opinions", None)

    @classmethod
    def _from_generated(cls, sentence, results):
        if hasattr(sentence, "aspects"):
            mined_opinions = (
                [MinedOpinion._from_generated(aspect, results) for aspect in sentence.aspects] # pylint: disable=protected-access
                if sentence.aspects else []
            )
        else:
            mined_opinions = None
        return cls(
            text=sentence.text,
            sentiment=sentence.sentiment,
            confidence_scores=SentimentConfidenceScores._from_generated(sentence.confidence_scores), # pylint: disable=protected-access
            mined_opinions=mined_opinions
        )

    def __repr__(self):
@@ -724,6 +743,150 @@ def __repr__(self):
            repr(self.confidence_scores)
        )[:1024]
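
The `hasattr` branch in `SentenceSentiment._from_generated` gives `mined_opinions` three possible shapes. This is a rough sketch using `SimpleNamespace` stand-ins for the generated sentence models. The helper name and the stand-in shapes are hypothetical, and the conversion of each aspect into a `MinedOpinion` is left out:

```python
from types import SimpleNamespace


def mined_opinions_for(sentence):
    # Mirrors the branch in SentenceSentiment._from_generated, minus the
    # conversion of each aspect into a MinedOpinion.
    if hasattr(sentence, "aspects"):
        return list(sentence.aspects) if sentence.aspects else []
    return None


# Stand-ins for generated sentence models (hypothetical shapes).
v3_0_sentence = SimpleNamespace(text="The food is good")  # no `aspects` attribute
v3_1_no_aspects = SimpleNamespace(text="Hello", aspects=None)
v3_1_with_aspects = SimpleNamespace(text="The food is good", aspects=["food"])

print(mined_opinions_for(v3_0_sentence))      # None
print(mined_opinions_for(v3_1_no_aspects))    # []
print(mined_opinions_for(v3_1_with_aspects))  # ['food']
```

A caller can therefore read `None` as "this API version has no opinion mining" and `[]` as "the model has an `aspects` field but nothing was mined".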

class MinedOpinion(DictMixin):
    """A mined opinion object represents an opinion we've extracted from a sentence.
    It consists of both an aspect that these opinions are about, and the actual
    opinions themselves.

    :ivar aspect: The aspect of a product/service that this opinion is about
    :vartype aspect: ~azure.ai.textanalytics.AspectSentiment
    :ivar opinions: The actual opinions of the aspect
    :vartype opinions: list[~azure.ai.textanalytics.OpinionSentiment]
    """

    def __init__(self, **kwargs):
        self.aspect = kwargs.get("aspect", None)
        self.opinions = kwargs.get("opinions", None)

    @staticmethod
    def _get_opinions(relations, results):
        if not relations:
            return []
        opinion_relations = [r.ref for r in relations if r.relation_type == "opinion"]
        opinions = []
        for opinion_relation in opinion_relations:
            nums = _get_indices(opinion_relation)
            document_index = nums[0]
            sentence_index = nums[1]
            opinion_index = nums[2]
            opinions.append(
                results[document_index].sentences[sentence_index].opinions[opinion_index]
            )
        return opinions

    @classmethod
    def _from_generated(cls, aspect, results):
        return cls(
            aspect=AspectSentiment._from_generated(aspect), # pylint: disable=protected-access
            opinions=[
                OpinionSentiment._from_generated(opinion) for opinion in cls._get_opinions(aspect.relations, results) # pylint: disable=protected-access
            ],
        )

    def __repr__(self):
        return "MinedOpinion(aspect={}, opinions={})".format(
            repr(self.aspect),
            repr(self.opinions)
        )[:1024]
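
The lookup in `MinedOpinion._get_opinions` can be tried without the service. This is a simplified sketch: the `SimpleNamespace` objects stand in for the generated models, the `ref` strings are hypothetical, and the function names are illustrative rather than the SDK's own:

```python
import re
from types import SimpleNamespace


def get_indices(ref):
    return [int(s) for s in re.findall(r"\d+", ref)]


def get_opinions(relations, results):
    # Same walk as MinedOpinion._get_opinions: keep only "opinion" relations,
    # then follow each ref into results[doc].sentences[sent].opinions[op].
    if not relations:
        return []
    opinions = []
    for relation in relations:
        if relation.relation_type != "opinion":
            continue
        doc, sent, op = get_indices(relation.ref)
        opinions.append(results[doc].sentences[sent].opinions[op])
    return opinions


# Hypothetical stand-ins for one document with one sentence and two opinions.
good = SimpleNamespace(text="good")
bad = SimpleNamespace(text="bad")
sentence = SimpleNamespace(opinions=[good, bad])
results = [SimpleNamespace(sentences=[sentence])]
relations = [
    SimpleNamespace(relation_type="opinion", ref="#/documents/0/sentences/0/opinions/1"),
    SimpleNamespace(relation_type="other", ref="#/documents/0/sentences/0/opinions/0"),
]

print([o.text for o in get_opinions(relations, results)])  # ['bad']
```

Unlike the original, which reads `nums[0]` through `nums[2]`, this sketch unpacks exactly three indices, so a `ref` with more or fewer numbers raises instead of being partly used.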


class AspectSentiment(DictMixin):
    """AspectSentiment contains the related opinions, predicted sentiment,
    confidence scores and other information about an aspect of a product.
    An aspect of a product/service is a key component of that product/service.
    For example in "The food at Hotel Foo is good", "food" is an aspect of
    "Hotel Foo".

    :ivar str text: The aspect text.
    :ivar str sentiment: The predicted Sentiment for the aspect. Possible values
        include 'positive', 'mixed', and 'negative'.
    :ivar confidence_scores: The sentiment confidence score between 0
        and 1 for the aspect for 'positive' and 'negative' labels. Its score
        for 'neutral' will always be 0.
    :vartype confidence_scores:
        ~azure.ai.textanalytics.SentimentConfidenceScores
    :ivar int offset: The aspect offset from the start of the sentence.
    :ivar int length: The length of the aspect.
    """

    def __init__(self, **kwargs):
        self.text = kwargs.get("text", None)
        self.sentiment = kwargs.get("sentiment", None)
        self.confidence_scores = kwargs.get("confidence_scores", None)
        self.offset = kwargs.get("offset", None)
        self.length = kwargs.get("length", None)

    @classmethod
    def _from_generated(cls, aspect):
        return cls(
            text=aspect.text,
            sentiment=aspect.sentiment,
            confidence_scores=SentimentConfidenceScores._from_generated(aspect.confidence_scores), # pylint: disable=protected-access
            offset=aspect.offset,
            length=aspect.length
        )

    def __repr__(self):
        return "AspectSentiment(text={}, sentiment={}, confidence_scores={}, offset={}, length={})".format(
            self.text,
            self.sentiment,
            repr(self.confidence_scores),
            self.offset,
            self.length
        )[:1024]


class OpinionSentiment(DictMixin):
    """OpinionSentiment contains the predicted sentiment,
    confidence scores and other information about an opinion of an aspect.
    For example, in the sentence "The food is good", the opinion of the
    aspect 'food' is 'good'.

    :ivar str text: The opinion text.
    :ivar str sentiment: The predicted Sentiment for the opinion. Possible values
        include 'positive', 'mixed', and 'negative'.
    :ivar confidence_scores: The sentiment confidence score between 0
        and 1 for the opinion for 'positive' and 'negative' labels. Its score
        for 'neutral' will always be 0.
    :vartype confidence_scores:
        ~azure.ai.textanalytics.SentimentConfidenceScores
    :ivar int offset: The opinion offset from the start of the sentence.
    :ivar int length: The length of the opinion.
    :ivar bool is_negated: Whether the opinion is negated. For example, in
        "The food is not good", the opinion "good" is negated.
    """

    def __init__(self, **kwargs):
        self.text = kwargs.get("text", None)
        self.sentiment = kwargs.get("sentiment", None)
        self.confidence_scores = kwargs.get("confidence_scores", None)
        self.offset = kwargs.get("offset", None)
        self.length = kwargs.get("length", None)
        self.is_negated = kwargs.get("is_negated", None)

    @classmethod
    def _from_generated(cls, opinion):
        return cls(
            text=opinion.text,
            sentiment=opinion.sentiment,
            confidence_scores=SentimentConfidenceScores._from_generated(opinion.confidence_scores), # pylint: disable=protected-access
            offset=opinion.offset,
            length=opinion.length,
            is_negated=opinion.is_negated
        )

    def __repr__(self):
        return (
            "OpinionSentiment(text={}, sentiment={}, confidence_scores={}, offset={}, length={}, is_negated={})".format(
                self.text,
                self.sentiment,
                repr(self.confidence_scores),
                self.offset,
                self.length,
                self.is_negated
            )[:1024]
        )


class SentimentConfidenceScores(DictMixin):
    """The confidence scores (Softmax scores) between 0 and 1.
@@ -738,15 +901,15 @@ class SentimentConfidenceScores(DictMixin):
    """

    def __init__(self, **kwargs):
        self.positive = kwargs.get('positive', 0.0)
        self.neutral = kwargs.get('neutral', 0.0)
        self.negative = kwargs.get('negative', 0.0)

    @classmethod
    def _from_generated(cls, score):
        return cls(
            positive=score.positive,
            neutral=score.neutral if hasattr(score, "neutral") else 0.0,
            negative=score.negative
        )
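
The guard on `neutral` in `_from_generated` is there because, per the docstrings above, aspect and opinion scores only carry 'positive' and 'negative' labels. This is a small sketch of the intended fallback, written with `getattr` instead of `hasattr`. It assumes the generated aspect/opinion score model has no `neutral` attribute, and the helper name is hypothetical:

```python
from types import SimpleNamespace


def neutral_or_zero(score):
    # getattr with a default covers score objects that have no `neutral` field.
    return getattr(score, "neutral", 0.0)


document_score = SimpleNamespace(positive=0.1, neutral=0.8, negative=0.1)
aspect_score = SimpleNamespace(positive=0.9, negative=0.1)  # assumed shape: no `neutral`

print(neutral_or_zero(document_score))  # 0.8
print(neutral_or_zero(aspect_score))    # 0.0
```

The attribute name in such a guard has to be spelled exactly as on the model. A misspelled name makes the check always false, which silently zeroes the neutral score for documents and sentences as well.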

@@ -0,0 +1,42 @@
# ------------------------------------
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# ------------------------------------
from enum import Enum
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from typing import Union


class ApiVersion(str, Enum):
    """Text Analytics API versions supported by this package"""

    #: this is the default version
    V3_1_PREVIEW_1 = "v3.1-preview.1"
    V3_0 = "v3.0"


def load_generated_api(api_version, aio=False):
    try:
        # api_version could be a string; map it to an instance of ApiVersion
        # (this is a no-op if it's already an instance of ApiVersion)
        api_version = ApiVersion(api_version)
    except ValueError:
        # api_version is unknown to ApiVersion
        raise NotImplementedError(
            "This package doesn't support API version '{}'. ".format(api_version)
            + "Supported versions: {}".format(", ".join(v.value for v in ApiVersion))
        )

    if api_version == ApiVersion.V3_1_PREVIEW_1:
        if aio:
            from ._generated.v3_1_preview_1.aio import TextAnalyticsClient
        else:
            from ._generated.v3_1_preview_1 import TextAnalyticsClient # type: ignore
    elif api_version == ApiVersion.V3_0:
        if aio:
            from ._generated.v3_0.aio import TextAnalyticsClient # type: ignore
        else:
            from ._generated.v3_0 import TextAnalyticsClient # type: ignore
    return TextAnalyticsClient
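
The string-to-enum coercion at the top of `load_generated_api` can be tried on its own. In this sketch `resolve` is a hypothetical helper that repeats only the coercion and the error message, without importing any generated client:

```python
from enum import Enum


class ApiVersion(str, Enum):
    V3_1_PREVIEW_1 = "v3.1-preview.1"
    V3_0 = "v3.0"


def resolve(api_version):
    # Hypothetical helper: same coercion and error as load_generated_api.
    try:
        return ApiVersion(api_version)
    except ValueError:
        raise NotImplementedError(
            "This package doesn't support API version '{}'. ".format(api_version)
            + "Supported versions: {}".format(", ".join(v.value for v in ApiVersion))
        )


print(resolve("v3.0") is ApiVersion.V3_0)           # True
print(resolve(ApiVersion.V3_0) is ApiVersion.V3_0)  # True
print(ApiVersion.V3_0 == "v3.0")                    # True

try:
    resolve("v2.1")
except NotImplementedError as exc:
    print(exc)
```

Mixing in `str` is what lets an `ApiVersion` member compare equal to its plain string value, so callers can pass either form.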
@@ -108,14 +108,14 @@ def wrapper(response, obj, response_headers): # pylint: disable=unused-argument
            if hasattr(item, "error"):
                results[idx] = DocumentError(id=item.id, error=TextAnalyticsError._from_generated(item.error)) # pylint: disable=protected-access
            else:
                results[idx] = func(item, results)
        return results

    return wrapper


@prepare_result
def language_result(language, results): # pylint: disable=unused-argument
    return DetectLanguageResult(
        id=language.id,
        primary_language=DetectedLanguage._from_generated(language.detected_language), # pylint: disable=protected-access
@@ -125,7 +125,7 @@ def language_result(language):


@prepare_result
def entities_result(entity, results): # pylint: disable=unused-argument
    return RecognizeEntitiesResult(
        id=entity.id,
        entities=[CategorizedEntity._from_generated(e) for e in entity.entities], # pylint: disable=protected-access
@@ -135,7 +135,7 @@ def entities_result(entity):


@prepare_result
def linked_entities_result(entity, results): # pylint: disable=unused-argument
    return RecognizeLinkedEntitiesResult(
        id=entity.id,
        entities=[LinkedEntity._from_generated(e) for e in entity.entities], # pylint: disable=protected-access
@@ -145,7 +145,7 @@ def linked_entities_result(entity):


@prepare_result
def key_phrases_result(phrases, results): # pylint: disable=unused-argument
    return ExtractKeyPhrasesResult(
        id=phrases.id,
        key_phrases=phrases.key_phrases,
@@ -155,18 +155,18 @@ def key_phrases_result(phrases):


@prepare_result
def sentiment_result(sentiment, results):
    return AnalyzeSentimentResult(
        id=sentiment.id,
        sentiment=sentiment.sentiment,
        warnings=[TextAnalyticsWarning._from_generated(w) for w in sentiment.warnings], # pylint: disable=protected-access
        statistics=TextDocumentStatistics._from_generated(sentiment.statistics), # pylint: disable=protected-access
        confidence_scores=SentimentConfidenceScores._from_generated(sentiment.confidence_scores), # pylint: disable=protected-access
        sentences=[SentenceSentiment._from_generated(s, results) for s in sentiment.sentences], # pylint: disable=protected-access
    )

@prepare_result
def pii_entities_result(entity, results): # pylint: disable=unused-argument
    return RecognizePiiEntitiesResult(
        id=entity.id,
        entities=[PiiEntity._from_generated(e) for e in entity.entities], # pylint: disable=protected-access