To Tracey, Chris, Phil, Judith, and Catherine: they are making fools of you

This posting has the smoking gun proving the potential corruption of 80,000 e-asTTle results, and repeats the information regarding the potential corruption of the STAR results of even more children. I estimate the inflation in e-asTTle as up to 20% for less able children and, for STAR, up to 15% for less able children. The effect in both cases slides as deciles rise. The absurdity of this can be gauged when the results from NAPLAN from Australia are compared: after five years and $(NZ)750 million the improvement after all that drilling and test practising is 0.37.

Also in this posting is a letter from one of our top principals. It was written with the ministry e-asTTle people actually in the school and his assessment programme in writing collapsing around him. This is real-time assessment horror for a wonderful principal. The letter was written to me with no thought of publication: a raw and honest expression.

My sources within the ministry are telling me the ministry and NZCER can hardly believe their luck in getting away with it. They haven’t necessarily, but it will take Tracey, Chris, Phil, Judith, or Catherine to pluck up the courage and do something. Mark my words: if teachers were in a similar situation, and Labour in power, National would be taking schools or Labour apart. Nothing from NZEI. I have written recently how NZEI needs a curriculum desk and someone from the executive to be available immediately to speak out, even if the matter is subsequently handed over to the president. The new president has acted in an assured way, but perhaps she lacks the confidence to act on a curriculum matter. Phil Harding has come across very well in the media over Christchurch and Novopay, but his instincts are conservative and the latest NZPF executive election has resulted in a more conservative committee. I don’t think we can expect too much there on national standards, even though it also concerns undermining PaCT (which I know NZPF is working on), appraisal, league tables, and landing a moderate-size hit on Parata. Chris Hipkins is a possibility. He put out a media release in response to the Listener article but, while one admires his enthusiasm, I wonder about his depth of visceral response to push on and right injustice. Catherine Delahunty can express her ideas in writing very delicately, but can she drive an issue? Perhaps. Tracey Martin from New Zealand First is my big hope. She is a real performer who has a very good understanding of school education, but does she have the nerve of her boss? I’d say yes, but tempered by more judgement.

[So far, NZEI has put out a media release under the care of Louise Green; it is a fairly quiet one but welcome all the same. NZPF has made a passing reference to it in relation to PaCT, but seems to have decided it wants to fight PaCT some time in the future. Parliament reconvenes next week so we will see what that brings.]

80,000 students sat e-asTTle and virtually all those e-asTTle marks are corrupted.

STAR is more difficult to assess in numbers. Comfortably over half of New Zealand schools – 1200 – use STAR. Surely we are talking about well over 100,000 children. One could say nearly all the students who sat STAR either had their marks corrupted or their marks set aside as invalid by the schools administering the tests. The biggest element of doubt is knowing how NZCER marked the tests of those schools that paid it to do so. Did NZCER try to counter the end-of-year inflation in the tests using its own absurd remedy of using the next year’s marking schedule or not? Schools that paid for the service need to have been told that.

Involved in the issue are not only the schools directly affected by the tests, but also schools who didn’t use the tests and will suffer in comparison.

Before I set out in what I consider a sober manner what has happened with these two tests and how they became corrupted, I want to make clear how serious and significant the matter is.

  1. National standards does not produce quality data, it produces rubbish data, and in the course of producing that rubbish data is turning previously useful sources of data into rubbish data, as well.
  2. There are no standards.
  3. Without a ‘credible’ national standards (and I admit that is oxymoronic), the value-added appraisals to be imposed will fall apart very early on.
  4. If we don’t challenge the data corruption now, we’ll be at a disadvantage later on.
  5. This is the strategic moment to act: we have high ground and the initiative.
  6. The revision of e-asTTle has been a trial run for PaCT: the outcome a disaster.
  7. Rubrics are education madness.
  8. Prediction: PaCT is going to be the Novopay of national assessment.

I can assure our posting headliners (Tracey, and so on) they won’t be making fools of themselves: both test revisions are a mess – just go at it steadily, don’t try for a king hit.

Now for the (attempted) sober manner (not that easy at 3 a.m.)

First, both developers have admitted results’ inflation. NZCER did so for STAR by suggesting the next year’s marking schedule as a corrective, and by saying that stanines and the classic bell curve are old hat and that scaling and trajectories are the way to go – then lamely adding, along with stanines. The ministry did so for e-asTTle in writing (see below), but then tucked away the admission.

That means that both developers were sneaky and covert about it.

Second, both NZCER and the ministry failed to honestly and openly inform schools about the serious problems in their tests.

Third, both NZCER and the ministry have claimed they have solved the problem: NZCER hasn’t and won’t until they get the bell curve back in shape; the e-asTTle developers’ ‘solution’ would actually increase the inflation and make e-asTTle even more unworkable.

Fourth, and this is very important: not only have the problems not been solved, but none of the claimed fixes really came into play in the period of this round of collecting data for national standards. So even if the problems have been fixed, and they haven’t been, not by a long chalk, none of the fixes did anything to change anything for this round of national standards. This is very important in establishing that nothing credible was done to correct the data inflation in this round.

I don’t want to go over ground carried in earlier postings, but to concentrate on this fourth point.

First to e-asTTle.

After a huge amount of correspondence from schools about the results’ inflation, the ministry broke and acknowledged it.

The example is taken from an e-mail sent by Cooper Schumann from the ministry to Southland schools, with a copy to a Jill (Forgie, I think), also from the ministry, who signed it off. It was posted on 9 July, 2012. Its purpose was to have schools attend an Impact Day on Monday 12 November in Dunedin. Jill is a senior curriculum adviser at the ministry of education, Christchurch.

The e-mail was three pages long and there at the end of it was the acknowledgement.

‘Thank you to those of you who have managed e-asTTle writing. Are you finding that your learners appear to be doing surprisingly well compared to former e-asTTle results? This is because the revised version has been calibrated to more closely relate how the writer would perform in a well supported classroom situation.’

‘If you would like more information and have not yet found this website you might find this link helpful.’  (then to e-asTTle writing tool revised, then FAQ)

If it wasn’t that I feel sorrier for teachers caught up in the nightmare of national standards, soon to be made worse by PaCT (which the revised e-asTTle is based on) and value-added appraisals, and for all the children of New Zealand having a diminished education – I could almost feel sorry for these people.

But what a way to announce that a standardised test affecting tens of thousands of our children is an embarrassing and dangerous failure. Some schools are reporting to me that the results’ inflation for lower and middle performing children is in the order of 15–20%. Yes Jill, some children are, indeed, doing surprisingly well.

But this is the farcical thing. Here we have a large number of schools undertaking intensive and expensive professional development on e-asTTle. Then out of the blue some months later out pop the people in charge of e-asTTle saying, oh by the way, it just happened to slip our minds, but you remember e-asTTle, that revised version we introduced to you, it has been calibrated to something we forgot to tell you about. We slipped up in not telling you that e-asTTle had fundamentally changed.

Oh come on!

And look at the date of the Impact Day, November 12, so no time for a quick fix.

But it gets worse: there was no quick fix then, and none now. This calibration business, so prominently featured in the Listener article, is mainly non-existent, and a good thing too, because it would only make things worse.

Now to use the link provided which only serves to compound the farce.

Here we have all these teachers in schools having gone through the courses, then being delivered the bombshell of the revised e-asTTle having a very different methodology than they were told about – suddenly reading: ‘If you would like more information …’ Of course, you strange people, they would like more information, they are confounded, do you think that vague reference to calibration is sufficient?

And this is when it just becomes too much. When I went to the link, it was just gobbledegook. It is not about calibration and well supported classrooms. Read for yourself.

 We’ve checked our e-asTTle curriculum levels against the National Standards, and the e-asTTle results are showing much higher levels of achievement. How can this be?

The curriculum levels reported for e-asTTle Writing are based on a standard-setting exercise undertaken to link performance on an e-asTTle assessment with the descriptions of writing competence provided in the Literacy Learning Progressions. The exercise defined an appropriate score range on an e-asTTle assessment for each level of writing competence described by the progressions. A curriculum level of 4A for example means that given 40 minutes to write to a particular prompt under test conditions the student has been able to produce a text of sufficient quality to indicate they have the writing skills and competencies described as appropriate for students working at an advanced stage for Level 4 of the curriculum.

The important point here is that the e-asTTle curriculum level attempts to identify what the student’s performance in the context of an e-asTTle Writing assessment indicates in terms of achievement against curriculum expectations. This means that a student who has been assessed by e-asTTle Writing to be working at a particular curriculum level will not necessarily have produced a piece of writing that looks exactly like a National Standards exemplar. National Standards exemplars illustrate performance where students have been given the opportunity to engage with a writing task in a classroom situation ‘largely by themselves’. An e-asTTle assessment is completed in 40 minutes under test conditions without any teacher or peer feedback, or access to writing aids such as dictionaries.

Figure 1 below shows the distribution of e-asTTle Writing scale scores from all the scripts marked and moderated at each level during the trial of e-asTTle writing. The range of scores is within curriculum level expectations although increasingly fewer students in Years 7 to 10 performed at or above the curriculum levels expected for their year levels.

The tool developers went through a careful standard setting exercise during the trial and the data they have from the trial, illustrated in Figure 1, supports the standards set.

How about this from a senior and highly respected principal, very knowledgeable about testing?



‘I agree with everything in this article (one of the series on the testing scandal). I told the MOE at the launch of NS that there would be no consistency, that pseudo-measurement techniques like script scrutiny and the then assessment maps (that they have now abandoned), and that tests would begin to be dumbed down. Just as the NAPLAN maths tests in Australia only focus on knowledge (unlike our numeracy and strand approaches here), the tests become a political tool to say how well the government is doing. The same is happening in Ontario – where Fullan and his crew are crowing about progress. The dumbing down of tests is a natural progression when a government brings in “standards”.’

‘I have long opposed STAR as a tool to measure kids reading – even the old one was poor. It is not a literacy test as it is too disjointed in deconstructing each aspect of reading into discrete bits. It is hopeless for very able readers as they hit Stanine 9 and stay there without any useful assessment data. Read the latest STAR test – it is a shocker. I have always said if you want to inflate your OTJ use STAR.’

‘And call stanine 4-6 “at” for PAT when you sit two tests at the same level in a year.’

‘Unfortunately, not all of our colleagues have the integrity or knowledge to be rigorous in this area.’

‘This morning our writing data using e-asTTle was marked I had feedback from teachers saying it was a crock. I am really angry because we were encouraged to shift to this calibration stuff from our own rigorous one. The explanations I have seen from the MOE include “it’s the instructional level for the next year” and “it is what a child would receive in a well-supported classroom”. I have no idea what those explanations mean.’

‘NZCER are as down as we are and it’s hard to think they are part of this conspiracy, but if this is happening with them, we are all stuffed.’

‘Just as Minister Tolley wouldn’t listen to NZPF or John Faire with our professional misgivings over national standards, we again face a blanket of silence when so many “curriculum delivery pd contracts” depend on the use of these tests. Their legitimacy will only be reinforced by PaCT of which I am deeply suspicious of – if lies are happening with these tests, the integrity of PaCT has to be interrogated.’

‘I appreciate your doggedness and this article helped me from turning loopy for another day.’

Kind regards


If this isn’t one of the contenders for letter of the year, I’ll go ‘he’.

I have two demurs. The first: it has not been a direct conspiracy but an environment of ideological corruption providing the perfect storm for the test developers to be reckless (contributing to that environment are the often-expressed government disregard for the classic bell curve and the encouragement to furtively align the tests to PaCT).

I describe what has happened as a fiasco that has become a scandal as a result of the lack of honesty and transparency. The cover-up is also an outcome of the ideologically corrupted environment.

My other demur is that PaCT needs to be blown sky high, not interrogated.

What has happened with STAR is well canvassed. The two ‘remedies’ were not really applied during the year, mainly because most teachers did not know about them. But they are not remedies, anyway, just cover-up ruses. Using next year’s marking schedule for the end-of-year marking is unsustainable, and asking schools to downgrade stanines is an absurdity. NZCER has the distorted bell graph for the results; it has been sighted, and it’s an obscenity.

I have written this through the night, so I’ll leave it there (are they trying to kill me?). It is now up to Tracey, Chris, Phil, Judith, and Catherine, and perhaps you: a branch here and there passing a motion asking that the government halt the processing of national standards data pending an inquiry into two key marker standardised tests: e-asTTle and STAR.
