pyspark.AccumulatorParam

class pyspark.AccumulatorParam[source]

Helper object that defines how to accumulate values of a given type.

Examples

>>>
>>> from pyspark.accumulators import AccumulatorParam
>>> class VectorAccumulatorParam(AccumulatorParam):
...     def zero(self, value):
...         return [0.0] * len(value)
...     def addInPlace(self, val1, val2):
...         for i in range(len(val1)):
...              val1[i] += val2[i]
...         return val1
>>> va = sc.accumulator([1.0, 2.0, 3.0], VectorAccumulatorParam())
>>> va.value
[1.0, 2.0, 3.0]
>>> def g(x):
...     global va
...     va += [x] * 3
>>> rdd = sc.parallelize([1,2,3])
>>> rdd.foreach(g)
>>> va.value
[7.0, 8.0, 9.0]

Methods

addInPlace(value1, value2)

Add two values of the accumulator’s data type, returning a new value; for efficiency, can also update value1 in place and return it.

zero(value)

Provide a “zero value” for the type, compatible in dimensions with the provided value (e.g., a zero vector)