[text analytics] opinion mining support #12542

Merged 22 commits on Jul 31, 2020
Changes from 6 commits
Commits
7755968
have ta client hold onto its api version in a private property
iscai-msft Jul 14, 2020
43e72dc
implement design for opinion mining
iscai-msft Jul 14, 2020
a3b4510
add tests
iscai-msft Jul 14, 2020
32ddecc
improve docstrings
iscai-msft Jul 14, 2020
ea9fca3
made ApiVersion enum all uppercase
iscai-msft Jul 14, 2020
590dd11
update changelog
iscai-msft Jul 14, 2020
399571a
check if positive and negative confidence scores are not None since s…
iscai-msft Jul 15, 2020
1ef46f9
improve code flow in sentiment endpoinnt
iscai-msft Jul 15, 2020
858e70d
pylint
iscai-msft Jul 15, 2020
e83b145
whitespace to update PR
iscai-msft Jul 15, 2020
ffe563d
unify structure of sentiment objects
iscai-msft Jul 16, 2020
09cf3fd
pylint
iscai-msft Jul 16, 2020
7ee99e0
fix null checks in aspect tests
iscai-msft Jul 20, 2020
e9fee50
add trailing whitespace to init __all__
iscai-msft Jul 21, 2020
77c7fa8
separate out json pointer parsing and add unittest
iscai-msft Jul 21, 2020
87b4a80
add that show_aspects is only available in v3.1-preview.1 in sentimen…
iscai-msft Jul 21, 2020
83426aa
switch to kwarg mine_opinions and MinedOpinion body
iscai-msft Jul 29, 2020
7dc065c
Merge branch 'master' of https://github.com/Azure/azure-sdk-for-pytho…
iscai-msft Jul 30, 2020
273fb94
change kwarg from mine_opinions to show_opinion_mining
iscai-msft Jul 30, 2020
4180ad0
change test names from aspect_based_sentiment_analysis to opinion_mining
iscai-msft Jul 30, 2020
5b804d2
add test for opinion mining with no mined opinions
iscai-msft Jul 30, 2020
f03b477
have no mined opinions return as [] for v3.1-preview.1, and None for …
iscai-msft Jul 30, 2020
5 changes: 5 additions & 0 deletions sdk/textanalytics/azure-ai-textanalytics/CHANGELOG.md
@@ -2,6 +2,11 @@

## 1.0.1 (Unreleased)

**New features**

- Added support for the service's v3.1-preview.1 API. The default API version is still v3.0; to opt in, pass "v3.1-preview.1" as the value for `api_version` when creating your client.
- Added support for aspect-based sentiment analysis. This feature requires the service's v3.1-preview.1 API. To use it, pass `show_aspects=True` when calling the `analyze_sentiment` endpoint.

## 1.0.0 (2020-06-09)

@@ -25,7 +25,9 @@
LinkedEntityMatch,
TextDocumentBatchStatistics,
SentenceSentiment,
SentimentConfidenceScores
SentimentConfidenceScores,
SentenceAspect,
AspectOpinion
)

__all__ = [
@@ -48,7 +50,9 @@
'LinkedEntityMatch',
'TextDocumentBatchStatistics',
'SentenceSentiment',
'SentimentConfidenceScores'
'SentimentConfidenceScores',
'SentenceAspect',
'AspectOpinion'
]

__version__ = VERSION
@@ -8,7 +8,7 @@
from azure.core.credentials import AzureKeyCredential
from ._policies import TextAnalyticsResponseHookPolicy
from ._user_agent import USER_AGENT
from ._multiapi import load_generated_api
from ._multiapi import load_generated_api, ApiVersion

def _authentication_policy(credential):
authentication_policy = None
@@ -26,8 +26,8 @@ def _authentication_policy(credential):

class TextAnalyticsClientBase(object):
def __init__(self, endpoint, credential, **kwargs):
api_version = kwargs.pop("api_version", None)
_TextAnalyticsClient = load_generated_api(api_version)
self._api_version = kwargs.pop("api_version", ApiVersion.V3_0)
Member

ApiVersion.V3_0 or ApiVersion.V3_1_Preview_1?

Contributor Author

I'm going to make a separate PR to default to v3.1-preview.1. For this PR, we had issues with detect_language (thanks for looking into that!), and changing the default would require re-recording all of our tests, so I wanted to address it in a separate PR.
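
For context on the default discussed above, here is a minimal sketch of how a `str`-based enum lets a client accept either a plain string or an enum member for `api_version`. The `ApiVersion` values mirror the diff; `resolve_api_version` is a hypothetical helper written for illustration and is not part of the package.

```python
from enum import Enum


class ApiVersion(str, Enum):
    """Service API versions, mirroring the enum in this diff."""

    V3_1_PREVIEW_1 = "v3.1-preview.1"
    V3_0 = "v3.0"


def resolve_api_version(**kwargs):
    """Hypothetical helper: pop ``api_version`` and normalise it to ApiVersion."""
    raw = kwargs.pop("api_version", ApiVersion.V3_0)
    try:
        # ApiVersion(...) accepts a plain string or an existing enum member
        return ApiVersion(raw)
    except ValueError:
        raise ValueError(
            "Unsupported API version '{}'. Supported versions: {}".format(
                raw, ", ".join(v.value for v in ApiVersion)
            )
        )
```

Because `ApiVersion` subclasses `str`, a comparison such as `self._api_version == "v3.0"` holds whether the caller passed the string or the enum member.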

_TextAnalyticsClient = load_generated_api(self._api_version)
self._client = _TextAnalyticsClient(
endpoint=endpoint,
credential=credential,
@@ -3,9 +3,11 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# ------------------------------------

from ._generated.v3_0.models._models import LanguageInput
from ._generated.v3_0.models._models import MultiLanguageInput
import re
from ._generated.v3_0.models._models import (
LanguageInput,
MultiLanguageInput
)


class DictMixin(object):
@@ -635,19 +637,30 @@ class SentenceSentiment(DictMixin):
and 1 for the sentence for all labels.
:vartype confidence_scores:
~azure.ai.textanalytics.SentimentConfidenceScores
:ivar aspects: The list of aspects of the sentence. An aspect is a
key phrase of a sentence, for example the attributes of a product
or a service. Only returned if `show_aspects` is set to True in
call to `analyze_sentiment`
:vartype aspects:
list[~azure.ai.textanalytics.SentenceAspect]
"""

def __init__(self, **kwargs):
self.text = kwargs.get("text", None)
self.sentiment = kwargs.get("sentiment", None)
self.confidence_scores = kwargs.get("confidence_scores", None)
self.aspects = kwargs.get("aspects", None)

@classmethod
def _from_generated(cls, sentence):
def _from_generated(cls, sentence, results):
return cls(
text=sentence.text,
sentiment=sentence.sentiment,
confidence_scores=SentimentConfidenceScores._from_generated(sentence.confidence_scores), # pylint: disable=protected-access
aspects=(
[SentenceAspect._from_generated(aspect, results) for aspect in sentence.aspects] # pylint: disable=protected-access
if hasattr(sentence, "aspects") else None
)
)

def __repr__(self):
@@ -658,6 +671,125 @@ def __repr__(self):
)[:1024]


class SentenceAspect(DictMixin):
"""SentenceAspect contains the related opinions, predicted sentiment,
confidence scores and other information about an aspect of a sentence.
An aspect of a sentence is a key component of a sentence, for example
in the sentence "The food is good", "food" is an aspect.

:ivar str text: The aspect text.
:ivar str sentiment: The predicted Sentiment for the aspect. Possible values
include 'positive', 'mixed', and 'negative'.
:ivar confidence_scores: The sentiment confidence score between 0
and 1 for the aspect for 'positive' and 'negative' labels. Its score
for 'neutral' will always be 0.
:vartype confidence_scores:
~azure.ai.textanalytics.SentimentConfidenceScores
:ivar opinions: All of the opinions in the sentence related to this aspect.
:vartype opinions: list[~azure.ai.textanalytics.AspectOpinion]
:ivar int offset: The aspect offset from the start of the sentence.
:ivar int length: The length of the aspect.
"""

def __init__(self, **kwargs):
self.text = kwargs.get("text", None)
self.sentiment = kwargs.get("sentiment", None)
self.confidence_scores = kwargs.get("confidence_scores", None)
self.opinions = kwargs.get("opinions", None)
self.offset = kwargs.get("offset", None)
self.length = kwargs.get("length", None)

@staticmethod
def _get_opinions(relations, results):
if not relations:
return []
opinion_relations = [r.ref for r in relations if r.relation_type == "opinion"]
opinions = []
for opinion_relation in opinion_relations:
nums = [int(s) for s in re.findall(r"\d+", opinion_relation)]
Member

This is technically incorrect parsing of the json pointer (it doesn't take escaping into account). This may or may not be an actual issue (I don't know if any of the keys can have a / or ~ in them). But either way, I would suggest breaking the parsing of json pointers out into a separate function that can be tested (and reused).

Contributor Author

In this case, the service always returns something along the lines of "#/documents/0/sentences/0/opinions/0" as the json pointer. I've moved the json pointer parsing out into its own function, though, and have added a unittest for it.
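
To illustrate the suggestion in this thread, here is a sketch of a standalone pointer parser that honours RFC 6901 escaping (`~1` for `/`, `~0` for `~`). Both function names are hypothetical and are not necessarily what the PR ended up with; the pointer shape is the one quoted in the reply. Percent-decoding of the URI fragment form is left out.

```python
def parse_json_pointer(pointer):
    """Split an RFC 6901 JSON pointer (optionally '#'-prefixed) into tokens."""
    if pointer.startswith("#"):
        pointer = pointer[1:]
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError("Not a JSON pointer: {!r}".format(pointer))
    # RFC 6901: unescape '~1' first, then '~0', so '~01' becomes '~1'
    return [
        token.replace("~1", "/").replace("~0", "~")
        for token in pointer[1:].split("/")
    ]


def opinion_indices(pointer):
    """Hypothetical helper: return (document, sentence, opinion) indices."""
    tokens = parse_json_pointer(pointer)
    expected_keys = ("documents", "sentences", "opinions")
    if len(tokens) != 6 or tuple(tokens[0::2]) != expected_keys:
        raise ValueError("Unexpected opinion pointer: {!r}".format(pointer))
    # Odd positions hold the numeric indices
    return tuple(int(token) for token in tokens[1::2])
```

Unlike a bare `re.findall(r"\d+", ...)`, this rejects pointers whose keys are not the expected ones instead of silently picking up stray digits.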

document_index = nums[0]
sentence_index = nums[1]
opinion_index = nums[2]
opinions.append(
results[document_index].sentences[sentence_index].opinions[opinion_index]
)
return opinions


@classmethod
def _from_generated(cls, aspect, results):
return cls(
text=aspect.text,
sentiment=aspect.sentiment,
confidence_scores=SentimentConfidenceScores._from_generated(aspect.confidence_scores), # pylint: disable=protected-access
opinions=[
AspectOpinion._from_generated(opinion) for opinion in cls._get_opinions(aspect.relations, results) # pylint: disable=protected-access
],
offset=aspect.offset,
length=aspect.length
)

def __repr__(self):
return "SentenceAspect(text={}, sentiment={}, confidence_scores={}, opinions={}, offset={}, length={})".format(
self.text,
self.sentiment,
repr(self.confidence_scores),
repr(self.opinions),
self.offset,
self.length
)[:1024]


class AspectOpinion(DictMixin):
"""AspectOpinion contains the predicted sentiment,
confidence scores and other information about an opinion of an aspect.
For example, in the sentence "The food is good", the opinion of the
aspect 'food' is 'good'.

:ivar str text: The opinion text.
:ivar str sentiment: The predicted Sentiment for the opinion. Possible values
include 'positive', 'mixed', and 'negative'.
:ivar confidence_scores: The sentiment confidence score between 0
and 1 for the opinion for 'positive' and 'negative' labels. Its score
for 'neutral' will always be 0.
:vartype confidence_scores:
~azure.ai.textanalytics.SentimentConfidenceScores
:ivar int offset: The opinion offset from the start of the sentence.
:ivar int length: The length of the opinion.
:ivar bool is_negated: Whether the opinion is negated. For example, in
"The food is not good", the opinion "good" is negated.
"""

def __init__(self, **kwargs):
self.text = kwargs.get("text", None)
self.sentiment = kwargs.get("sentiment", None)
self.confidence_scores = kwargs.get("confidence_scores", None)
self.offset = kwargs.get("offset", None)
self.length = kwargs.get("length", None)
self.is_negated = kwargs.get("is_negated", None)

@classmethod
def _from_generated(cls, opinion):
return cls(
text=opinion.text,
sentiment=opinion.sentiment,
confidence_scores=SentimentConfidenceScores._from_generated(opinion.confidence_scores), # pylint: disable=protected-access
offset=opinion.offset,
length=opinion.length,
is_negated=opinion.is_negated
)

def __repr__(self):
return "AspectOpinion(text={}, sentiment={}, confidence_scores={}, offset={}, length={}, is_negated={})".format(
self.text,
self.sentiment,
repr(self.confidence_scores),
self.offset,
self.length,
self.is_negated
)[:1024]


class SentimentConfidenceScores(DictMixin):
"""The confidence scores (Softmax scores) between 0 and 1.
Higher values indicate higher confidence.
@@ -671,15 +803,15 @@ class SentimentConfidenceScores(DictMixin):
"""

def __init__(self, **kwargs):
self.positive = kwargs.get('positive', None)
self.neutral = kwargs.get('neutral', None)
self.negative = kwargs.get('negative', None)
self.positive = kwargs.get('positive', 0.0)
self.neutral = kwargs.get('neutral', 0.0)
self.negative = kwargs.get('negative', 0.0)

@classmethod
def _from_generated(cls, score):
return cls(
positive=score.positive,
neutral=score.neutral,
neutral=score.neutral if hasattr(score, "neutral") else 0.0,
negative=score.negative
)
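
The attribute guard in `_from_generated` can also be written with `getattr` and a default, which names the attribute only once. Below is a minimal sketch using stand-in objects. `ConfidenceScores`, `document_score`, and `aspect_score` are hypothetical, and it assumes (as the guard suggests) that aspect-level and opinion-level score objects may lack a `neutral` attribute.

```python
from types import SimpleNamespace


class ConfidenceScores(object):
    """Simplified stand-in for SentimentConfidenceScores."""

    def __init__(self, **kwargs):
        self.positive = kwargs.get("positive", 0.0)
        self.neutral = kwargs.get("neutral", 0.0)
        self.negative = kwargs.get("negative", 0.0)

    @classmethod
    def from_generated(cls, score):
        return cls(
            positive=score.positive,
            # Falls back to 0.0 when the generated object has no 'neutral'
            neutral=getattr(score, "neutral", 0.0),
            negative=score.negative,
        )


# Stand-ins for generated models: one with 'neutral', one without
document_score = SimpleNamespace(positive=0.7, neutral=0.2, negative=0.1)
aspect_score = SimpleNamespace(positive=0.9, negative=0.1)
```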

@@ -13,15 +13,11 @@ class ApiVersion(str, Enum):
"""Text Analytics API versions supported by this package"""

#: this is the default version
V3_1_preview_1 = "v3.1-preview.1"
V3_1_PREVIEW_1 = "v3.1-preview.1"
V3_0 = "v3.0"


DEFAULT_VERSION = ApiVersion.V3_0


def load_generated_api(api_version, aio=False):
api_version = api_version or DEFAULT_VERSION
try:
# api_version could be a string; map it to an instance of ApiVersion
# (this is a no-op if it's already an instance of ApiVersion)
@@ -33,7 +29,7 @@ def load_generated_api(api_version, aio=False):
+ "Supported versions: {}".format(", ".join(v.value for v in ApiVersion))
)

if api_version == ApiVersion.V3_1_preview_1:
if api_version == ApiVersion.V3_1_PREVIEW_1:
if aio:
from ._generated.v3_1_preview_1.aio import TextAnalyticsClient
else:
@@ -106,14 +106,14 @@ def wrapper(response, obj, response_headers): # pylint: disable=unused-argument
if hasattr(item, "error"):
results[idx] = DocumentError(id=item.id, error=TextAnalyticsError._from_generated(item.error)) # pylint: disable=protected-access
else:
results[idx] = func(item)
results[idx] = func(item, results)
return results

return wrapper


@prepare_result
def language_result(language):
def language_result(language, results): # pylint: disable=unused-argument
return DetectLanguageResult(
id=language.id,
primary_language=DetectedLanguage._from_generated(language.detected_language), # pylint: disable=protected-access
@@ -123,7 +123,7 @@ def language_result(language):


@prepare_result
def entities_result(entity):
def entities_result(entity, results): # pylint: disable=unused-argument
return RecognizeEntitiesResult(
id=entity.id,
entities=[CategorizedEntity._from_generated(e) for e in entity.entities], # pylint: disable=protected-access
@@ -133,7 +133,7 @@ def entities_result(entity):


@prepare_result
def linked_entities_result(entity):
def linked_entities_result(entity, results): # pylint: disable=unused-argument
return RecognizeLinkedEntitiesResult(
id=entity.id,
entities=[LinkedEntity._from_generated(e) for e in entity.entities], # pylint: disable=protected-access
@@ -143,7 +143,7 @@ def linked_entities_result(entity):


@prepare_result
def key_phrases_result(phrases):
def key_phrases_result(phrases, results): # pylint: disable=unused-argument
return ExtractKeyPhrasesResult(
id=phrases.id,
key_phrases=phrases.key_phrases,
@@ -153,12 +153,12 @@ def key_phrases_result(phrases):


@prepare_result
def sentiment_result(sentiment):
def sentiment_result(sentiment, results):
return AnalyzeSentimentResult(
id=sentiment.id,
sentiment=sentiment.sentiment,
warnings=[TextAnalyticsWarning._from_generated(w) for w in sentiment.warnings], # pylint: disable=protected-access
statistics=TextDocumentStatistics._from_generated(sentiment.statistics), # pylint: disable=protected-access
confidence_scores=SentimentConfidenceScores._from_generated(sentiment.confidence_scores), # pylint: disable=protected-access
sentences=[SentenceSentiment._from_generated(s) for s in sentiment.sentences], # pylint: disable=protected-access
sentences=[SentenceSentiment._from_generated(s, results) for s in sentiment.sentences], # pylint: disable=protected-access
)
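
The signature change in this file gives every converter access to the whole result list, so that one item can resolve references into its siblings. The sketch below shows that pattern in a much simplified form. This `prepare_result` is a stand-in, not the SDK's decorator, and both converters are hypothetical. It also passes a snapshot of the raw items, whereas the diff passes the list that is being filled in (`results[idx] = func(item, results)`).

```python
import functools


def prepare_result(func):
    """Simplified stand-in: convert each item, giving func access to all items."""

    @functools.wraps(func)
    def wrapper(items):
        # Snapshot the raw items so cross-references always see unconverted data
        raw = list(items)
        return [func(item, raw) for item in raw]

    return wrapper


@prepare_result
def shout(item, results):  # hypothetical converter that ignores 'results'
    return item.upper()


@prepare_result
def with_sibling_count(item, results):  # hypothetical converter that uses it
    return (item, len(results) - 1)
```

Converters that don't need the extra argument still have to accept it, which is why the diff adds `# pylint: disable=unused-argument` to most of them.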
@@ -378,6 +378,11 @@ def analyze_sentiment( # type: ignore
:type documents:
list[str] or list[~azure.ai.textanalytics.TextDocumentInput] or
list[dict[str, str]]
:keyword bool show_aspects: Whether to conduct aspect-based sentiment analysis.
Aspect-based sentiment analysis provides more granular analysis of sentiment and
opinions around specific aspects or attributes of a product or service.
If set to true, the returned :class:`~azure.ai.textanalytics.SentenceSentiment` objects
will have property `aspects` containing the result of this analysis
:keyword str language: The 2 letter ISO 639-1 representation of language for the
entire batch. For example, use "en" for English; "es" for Spanish etc.
If not set, uses "en" for English as default. Per-document language will
@@ -408,11 +413,26 @@ def analyze_sentiment( # type: ignore
docs = _validate_batch_input(documents, "language", language)
model_version = kwargs.pop("model_version", None)
show_stats = kwargs.pop("show_stats", False)
show_aspects = kwargs.pop("show_aspects", None)
Member

I think this should default to False

Contributor Author

I went with None to get the service default. @johanste, is the policy to default to None in this case, or False to keep it static?

Member

If we need to distinguish between "the application explicitly passed in this value" and "the application didn't provide a value, so we'll pick an appropriate default for them" for positional arguments, then we use a sentinel value as the default value. This is often None when None is not a valid value.

Since we are dealing with kwargs here, it would be easier/clearer to check `if 'show_aspects' in kwargs:`. There is no need for sentinel values, since kwargs inherently let you determine whether the user passed the parameter or not.

Contributor Author (@iscai-msft, Jul 21, 2020)

@johanste I do still need to pop from kwargs in this case, since the name of the parameter has changed (we have it as show_aspects, the service has it as opinion_mining). I think I'll stick with the sentinel value of None; holler if this is wrong.
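
A sketch of the membership-test approach described in this thread, which still renames the parameter for the service. `analyze` is a hypothetical stand-in for `analyze_sentiment` that simply returns the keyword arguments it would forward; it is not the SDK's code.

```python
def analyze(api_version="v3.0", **kwargs):
    """Hypothetical stand-in: build the kwargs forwarded to the service call."""
    service_kwargs = {}
    if "show_aspects" in kwargs:
        if api_version == "v3.0":
            raise TypeError(
                "Parameter 'show_aspects' is only available for API version "
                "v3.1-preview.1 and up"
            )
        # Rename the client-side keyword to the service-side one
        service_kwargs["opinion_mining"] = kwargs.pop("show_aspects")
    # Forward anything else untouched
    service_kwargs.update(kwargs)
    return service_kwargs
```

With this shape, an explicit `show_aspects=None` still counts as "passed", which a `None` sentinel cannot distinguish from the keyword being absent. The pop is still there, but it only runs when the keyword was supplied.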


try:
if self._api_version == "v3.0":
if show_aspects is not None:
raise TypeError(
"Parameter 'show_aspects' is only added for API version v3.1-preview.1 and up"
)
return self._client.sentiment(
documents=docs,
model_version=model_version,
show_stats=show_stats,
cls=kwargs.pop("cls", sentiment_result),
**kwargs
)
return self._client.sentiment(
documents=docs,
model_version=model_version,
show_stats=show_stats,
opinion_mining=show_aspects,
cls=kwargs.pop("cls", sentiment_result),
**kwargs
)
@@ -8,7 +8,7 @@
from azure.core.pipeline.policies import AzureKeyCredentialPolicy
from ._policies_async import AsyncTextAnalyticsResponseHookPolicy
from .._user_agent import USER_AGENT
from .._multiapi import load_generated_api
from .._multiapi import load_generated_api, ApiVersion


def _authentication_policy(credential):
@@ -27,8 +27,8 @@ def _authentication_policy(credential):

class AsyncTextAnalyticsClientBase(object):
def __init__(self, endpoint, credential, **kwargs):
api_version = kwargs.pop("api_version", None)
_TextAnalyticsClient = load_generated_api(api_version, aio=True)
self._api_version = kwargs.pop("api_version", ApiVersion.V3_0)
_TextAnalyticsClient = load_generated_api(self._api_version, aio=True)
self._client = _TextAnalyticsClient(
endpoint=endpoint,
credential=credential,