A near-minimal pair

Apologies for the delay in updating. I said that I would post content on some of my research themes, but when I started writing these sections I found that they would largely overlap with the introduction chapter to my thesis. Having spent a good portion of the last ten years advising publishers on effective repurposing of content, I could not ignore my own mantra — that it is easier to remove content and scale down technically than to add and increment — and have been occupied with writing that chapter first. Selected highlights of it to come.

Additionally, and regrettably, things are still moving slowly on the informant front. I have secured a few more promises, and made some recordings. But potential informants often have lives of their own, curse them, and turn out to have done things or lived in places that rule them out under my sampling criteria.

That these informants fall outside the sampling criteria does not, however, mean the sessions have been entirely useless. One can still learn from them about the location and about language attitudes, even if the speech data itself cannot be added to the pool to be analysed. Today’s post is about a near-minimal pair I have encountered in two such recordings.

The term MINIMAL PAIR is used in linguistics to refer to two items — usually words — that contrast in only one feature or element. They are thus used as evidence that that element can be used contrastively in the language or system under investigation. For instance, in English the words beet vs bit form a minimal pair, indicating that the difference between the vowels ee and i is contrastive in English — that different meanings arise when each is used. In Portuguese, however, these two vowels do not contrast. Conversely, Portuguese contrasts nasalized vowels with their oral versions: thus vi and vim (“I saw” vs “I came”) form a minimal pair — note that the m is not pronounced as an English m but instead makes the preceding vowel strongly nasal, similar to what happens in the more familiar French non.[1]

Minimal pairs are much favoured by phonologists to get at the sound system of a language, but in principle you can have a minimal pair of anything — just as long as you can get two items which are identical in all but one respect. In fact, although the term minimal pair is a linguistic one, this is really not a particularly linguistics-specific idea: that in investigating a system, if we can find two items that differ in only one feature, and are treated differently by the system, then that feature matters to the system.

If the two items differ in a couple of features rather than just one, we term them a near-minimal pair. These are also useful, but we have to be more careful with them: we have to decide whether there is any relationship between the features that vary. Is it necessary that both features differ for there to be a contrast? Does one of the features depend upon the other, or can they vary independently? Near-minimal pairs are not, in themselves, best considered evidence of an aspect of a system; however, they are useful for directing one’s attention towards features which may be of interest, and towards questions we need to ask of the system.
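The logic of minimal versus near-minimal pairs is simple enough to write down. Here is a small sketch in Python: each item is described as a set of named features, and we count how many of them differ. The feature descriptions of beet, bit and bid are deliberately simplified, and the cut-off of two differing features for "near-minimal" is my own illustrative assumption for "a couple".

```python
def differing_features(a: dict, b: dict) -> set:
    """Return the names of the features on which items a and b take different values."""
    if a.keys() != b.keys():
        raise ValueError("both items must be described by the same features")
    return {feature for feature in a if a[feature] != b[feature]}


def classify_pair(a: dict, b: dict, near_limit: int = 2) -> str:
    """Label a pair by how many features separate it.

    near_limit is an illustrative assumption: the largest number of differing
    features we are still willing to call "near-minimal" ("a couple", here two).
    """
    n = len(differing_features(a, b))
    if n == 0:
        return "identical"
    if n == 1:
        return "minimal pair"
    if n <= near_limit:
        return "near-minimal pair"
    return "not a (near-)minimal pair"


# Simplified, hypothetical feature descriptions of three English words.
beet = {"onset": "b", "vowel": "i", "coda": "t"}
bit = {"onset": "b", "vowel": "ɪ", "coda": "t"}
bid = {"onset": "b", "vowel": "ɪ", "coda": "d"}

print(classify_pair(beet, bit))  # -> minimal pair (only the vowel differs)
print(classify_pair(beet, bid))  # -> near-minimal pair (vowel and final consonant differ)
```

The beet/bid case is exactly where the caution above applies: before reading anything into that pair, we would need to ask whether the vowel difference and the consonant difference vary independently of each other.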

So what is my near-minimal pair? Well, it’s rather more metaphorical, and the system in question is social rather than linguistic: two guys I recorded as informants, who have remarkably similar backgrounds. Both guys work at the same place in the same type of job, have the same level of education, share a house, are from the same town in Minas Gerais and are roughly the same age. One of them, “Mighty Mouse”, came here to find work about six years ago; the other, “the Don”, who was a friend of Mighty Mouse from their hometown, came later, about three years ago. (That both guys are also really quite short, a good few inches shorter than me, only adds to their suitability to be classed as a minimal pair.)

So, Mighty Mouse and the Don have remarkably similar backgrounds. However, they differ in two notable respects: their attitudes towards Taguatinga, and towards the brasiliense dialect. Mighty Mouse has been here longer, but dislikes Taguatinga, considering people here to be cold. He is here to work and earn money, but intends to return to Patos de Minas when he can. The Don, however, very much likes Taguatinga and intends to stay here, with no desire to return to Patos. When I asked them about the brasiliense dialect, there was also a difference. Mighty Mouse was clear that there was a brasiliense dialect, and that he would be able to identify someone as being from the Federal District by their voice. The Don denied that a brasiliense dialect had emerged. Both agreed that speech here is very mixed, with aspects of north-eastern, mineiro and other dialects. But whereas to Mighty Mouse there is now an emergent accent, which contains features of each but is identifiable in its own right, to the Don there simply remains the mix of original migrant accents.

Now, this raises a question: can we postulate a reason why this might be the case? We can, of course, prove nothing with just a couple of observations, but two possible hypotheses arise from the above:

  • There could be a temporal reason for the difference. Mighty Mouse has been here for six years compared to the Don’s three: it could simply be that it takes time to recognize the brasiliense accent.
  • There could also be an attitudinal reason. Mighty Mouse does not identify with the brasiliense folk; the Don does. Could the Don’s desire to be part of the community motivate his denial of a dialectal difference between him and that community, and could Mighty Mouse’s distancing himself from them allow him to assert the contrary?

We should note that these hypotheses are not mutually exclusive: both could be contributory factors, or neither could be. We also do not know whether a reported perception of the brasiliense dialect corresponds to an actual ability to distinguish speakers from the region, or whether it corresponds to anything in the interviewees’ own speech behaviour. Obviously, from just two informants, we can draw no conclusions.

But questions such as these are interesting ones to sociolinguists, and we can try to frame them in a way that is quantifiably testable, rather than simply slightly fluffily observational, as I have been so far. Fluffily observational is a reasonable starting point, but we want to ask some more scientific questions, and for that we need data and research questions! We need to find ways to measure the phenomena, and then get measurements from a larger sample — a representative one — and frame some questions in ways that we can statistically analyse. The kind of thing we might want to ask is:

  • A perception question: does acknowledgment of the existence of a brasiliense dialect by a speaker correspond to an actual ability to correctly identify whether other speakers are from Taguatinga/Brasília?
  • An attitudinal question: does acknowledgment of the existence of a brasiliense dialect by a speaker correspond to the speaker’s attitudes towards Taguatinga/Brasília in general?
  • A production question: does acknowledgment of the existence of a brasiliense dialect by a speaker correspond to presence or absence of particular features in their own speech?
  • Does time spent in Taguatinga by a speaker correspond to the same things?
  • If both acknowledgement of a brasiliense dialect and time spent in Taguatinga have effects on production and/or perception, is this because they themselves co-vary? Or do they differ in the amount of influence they have?

If we can find ways to measure the issues under consideration across the population, there are mathematical techniques that we can use to try to answer questions like these. Working out what to measure is not always an easy task, however. Time spent in Taguatinga is easy enough to measure, and whether someone claims to recognize the brasiliense dialect is also simple enough: a basic yes/no question, “Do you think there is a distinctive brasiliense dialect?” (which we express mathematically as 1 or 0).
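As a minimal sketch of that coding step in Python, here are the two informants from this post, using the approximate figures above (about six and about three years in Taguatinga). The class and field names are my own, purely illustrative, not part of any real survey instrument.

```python
from dataclasses import dataclass


@dataclass
class Informant:
    pseudonym: str
    years_in_taguatinga: float   # time spent here, measured directly
    thinks_dialect_exists: bool  # "Do you think there is a distinctive brasiliense dialect?"

    def as_numbers(self) -> dict:
        """Express the answers numerically: years as they are, yes/no as 1/0."""
        return {
            "years": self.years_in_taguatinga,
            "dialect_yes": 1 if self.thinks_dialect_exists else 0,
        }


# The two informants from this post (years are approximate).
mighty_mouse = Informant("Mighty Mouse", 6, True)
the_don = Informant("the Don", 3, False)

for informant in (mighty_mouse, the_don):
    print(informant.pseudonym, informant.as_numbers())
```

Once every informant is reduced to numbers like these, the questions above become questions about how such columns relate to one another across the sample.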

But what of attitudes towards Taguatinga? If we want to apply mathematical techniques to analysis, we need to get a number for this. We could simply ask “do you like Taguatinga?”, as for the dialect question. But there are a couple of problems with this. Liking or disliking is more of a gradient phenomenon: to reduce it to a binary opposition is somewhat simplistic. It would be nicer if we could find some kind of a scale. Additionally, people’s attitudes towards a place can be rather more complex than simply like/dislike. My colleague Jennifer Nycz, who has worked on the speech of Canadian migrants to New York, commented to me that “there’s the national attitude of Canada vs. US, where typically people love Canada and didn’t really like US; but then on a local level they loved NYC while not really feeling at home in whatever small area they came from in Canada.” One solution to the former problem could be to ask people to grade their liking on an arbitrary scale, say from 1 to 10. However, this kind of approach can be problematic, as different people will interpret the scale in different ways, and it also does not tackle the problem of mixed attitudes. Alternatively, I could code this myself, thus dealing with the problem of different interpretations of the scale but instead introducing subjectivity, in that we are now dealing with my judgments of the respondents’ attitudes rather than a direct questioning of them.

The approach which I will be taking in my thesis attempts to deal with this by the use of proxy variables and principal component analysis. Wait … wait … don’t go running yet. The maths is horrible, but the concepts are actually quite simple. The idea is that, instead of asking one question from which we attempt to deduce a gradient phenomenon, we ask a large number of simple yes/no questions, which can cover an extremely wide range of topics as long as they relate to the local community and attitudes towards it or participation in it: “Do you plan to stay here?”, “Have you/will you bring up your children here?”, “Did you vote in the last local election?”

These questions we call “proxy variables” because they stand in for what we actually want to measure. We then apply a technique called principal component analysis, or PCA. As I say, the maths behind this is pretty complex, but essentially what PCA does is take all of the variables and produce a number of weighted groupings of them (the “principal components”), each of which is then ranked according to the amount of the total variation that it accounts for.

That sounds nasty, but it’s not. Imagine that one of my questions was the rather foolish “Do you speak Portuguese?” Given that all respondents will answer “yes”, this question will account for none of the variation in the population — and so its entire weight will be assigned to the bottom-ranked grouping. Now imagine I ask “Do you think that there is a brasiliense accent?” and also “Do you like the brasiliense accent?”. Now whilst these both will vary between respondents, there will be a certain lack of variation between the questions themselves, for (almost) no-one who claims that there is no accent will also claim to like it! So, we might see their weightings correspond quite closely, and appear in the same ranked groups.
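For the curious, here is a rough sketch of that idea in Python with NumPy: a plain covariance-based PCA run on six invented informants answering four yes/no questions, including the “foolish” Portuguese one. The question names and answers are hypothetical, not data from my fieldwork. In the code, a “grouping” is a principal component and its “weightings” are the loadings.

```python
import numpy as np

questions = ["speaks_portuguese", "thinks_accent_exists", "likes_accent", "plans_to_stay"]

# Hypothetical yes (1) / no (0) answers: one row per invented informant.
answers = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
], dtype=float)


def pca(X):
    """Return (share_of_variance, loadings), components sorted largest first.

    loadings[:, k] holds the weight of each question in component k.
    """
    cov = np.cov(X, rowvar=False)                    # np.cov subtracts column means itself
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh returns ascending order
    eigenvalues = np.clip(eigenvalues, 0.0, None)    # drop tiny negative rounding errors
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    return eigenvalues / eigenvalues.sum(), eigenvectors


share, loadings = pca(answers)
for k, s in enumerate(share):
    weights = ", ".join(f"{q}={w:+.2f}" for q, w in zip(questions, loadings[:, k]))
    print(f"component {k + 1}: {s:.0%} of variation ({weights})")
```

Because everyone answers yes to speaks_portuguese, that column has no variance at all: it gets zero weight in every component that accounts for any variation, and all of its weight lands in the bottom-ranked one. The two accent questions, which tend to go together, get weights of the same sign in the top-ranked component. Real analyses often standardise each question first; a question everyone answers identically would then have to be dropped, since its standard deviation is zero.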

Because weightings are applied, the resultant groups have gradient scores rather than just one or zero, and because more than one grouping is returned, the question of whether there are complex and conflicting attitudes will be reflected in the amount of variation accounted for by each group. If attitudes are fairly simple, there will be a large amount of co-variation between the variables (e.g. all the people who like it here also intend to bring up their children here, like the sound of the speech here, and, let’s say, spent last Christmas here), and so the process would return one group that accounted for maybe 75% of all the variation, with all the others accounting for a minute amount. We would then use this group, and the informants’ scores on it, as our “attitude towards Taguatinga” variable. If, however, people’s attitudes are more complex and there is less co-variation, the process might return two groups that each account for, say, 40% of all the variation. In this case, we would have two resultant variables, and we would have to inspect the individual weightings to decide what each of them accounted for.
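A sketch of how the scores might be obtained, again in Python with NumPy and again on invented answers (five hypothetical informants, four proxy questions modelled on the examples just given). It computes the components via a singular value decomposition of the centred answers, which gives the same components as the eigen-decomposition approach, and reports how much variation the first one accounts for. The function name and the orient_by parameter are my own. Since the sign of a component is arbitrary, the sketch flips it if needed so that the first question (“likes it here”) has a positive weight and higher scores mean a more positive attitude.

```python
import numpy as np


def first_component_scores(X, orient_by=0):
    """Score each informant (row of X) on the first principal component.

    Returns (scores, share), where share is the proportion of total variation the
    component accounts for. The component's sign is arbitrary, so it is flipped if
    needed to give question `orient_by` a positive weight.
    """
    centred = X - X.mean(axis=0)
    # SVD of the centred data: rows of vt are the components, largest first.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    weights = vt[0]
    if weights[orient_by] < 0:
        weights = -weights
    variance = singular_values ** 2
    share = variance[0] / variance.sum()
    return centred @ weights, share


# Hypothetical answers (1 = yes): likes it here, bringing up children here,
# likes the local speech, spent last Christmas here.
answers = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
], dtype=float)

attitude, share = first_component_scores(answers)
print(f"first component accounts for {share:.0%} of the variation")
print("attitude-to-Taguatinga scores:", np.round(attitude, 2))
```

With answers that co-vary this strongly, one component dominates and its scores can serve as the single attitude variable; if a second component accounted for a comparable share, we would need to look at its weights too.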

Once we have scores for all the variables we think might affect the answers to our research questions, there are a number of mathematical techniques we can use (regression, analysis of variance and similarly scary-sounding processes) to try to tease out the comparative effect of each of these variables. But an explanation of how that works, I think, should be left for a future post.
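Just as a taster, here is a very rough sketch of the simplest of those techniques, ordinary least-squares regression, in Python with NumPy. Everything in it is simulated: I invent an outcome (a hypothetical proportion of speakers correctly identified as local) from an attitude score and years in Taguatinga, with effect sizes I chose myself, and then check that the regression recovers them. The numbers say nothing about real informants.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 200

# Simulated, not real, informants: an attitude score and years spent in Taguatinga.
attitude = rng.normal(0.0, 1.0, n)
years = rng.uniform(0.0, 10.0, n)

# Hypothetical outcome built from chosen effects plus noise, so we can see them recovered.
true_effects = {"intercept": 0.5, "attitude": 0.1, "years": 0.03}
accuracy = (true_effects["intercept"]
            + true_effects["attitude"] * attitude
            + true_effects["years"] * years
            + rng.normal(0.0, 0.05, n))

# Ordinary least squares: a column of ones for the intercept plus one per predictor.
design = np.column_stack([np.ones(n), attitude, years])
coefficients, *_ = np.linalg.lstsq(design, accuracy, rcond=None)

for name, estimate in zip(true_effects, coefficients):
    print(f"{name}: estimated {estimate:.3f}, true {true_effects[name]}")

# Do the two predictors themselves co-vary? (cf. the last research question above)
print("correlation of attitude and years:", round(float(np.corrcoef(attitude, years)[0, 1]), 3))
```

A real analysis would also need standard errors, diagnostics and probably a model better suited to proportions; this only shows the core idea of estimating each variable’s effect while holding the other constant.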

[1] Should you care for a bit of linguistic detail, English contrasts /i/ (usually long) with /ɪ/. In Portuguese they occur in complementary distribution, with [ɪ] occurring as an unstressed allophone of /i/. The Portuguese system does, however, contrast oral /i/ with nasal /ĩ/.


Blasted 1 time

  • Damien Hall says:

    This is a great exposition of the kinds of things we need to think about in approaching the relationship between attitudes and accents. At the risk of blowing my own trumpet, I’d be interested to know what you thought of the part of my thesis where I looked at the same sorts of thing. I wouldn’t mention it, except that I asked at least two of the exact same questions (just substitute ‘Norman(dy)’ or ‘local’ for ‘from Taguatinga / Brasilia’)! The context isn’t the same, of course – there isn’t notable migration into anywhere in Normandy as far as I know, and I wasn’t looking at koinéisation – but I did have some interesting results: in particular, a real lack of congruence between the set of people who had the local accent and the set of people who liked the local accent (having it or not didn’t seem to affect whether you liked it or not).
