Image dataset statistics

Label distribution for image classification dataset

The following generates a label distribution (as absolute counts per label) for the 17flowers image classification dataset (zip), saving the result in a JSON file:

wai-annotations convert \
  from-subdir-ic \
    -i "./17flowers/subdir/**/*.jpg" \
  label-dist-ic \
    -f json \
    -o ./17flowers-labeldist-ic.json

The JSON file will look something like this:

{
  "Windflower": 63,
  "Fritillary": 65,
  "Tigerlily": 50,
  "Tulip": 41,
  "Daffodil": 71,
  "Crocus": 50,
  "Sunflower": 71,
  "Buttercup": 54,
  "Daisy": 57,
  "Bluebell": 28,
  "Snowdrop": 50,
  "Dandelion": 43,
  "ColtsFoot": 55,
  "Iris": 77,
  "Pansy": 56,
  "LilyValley": 17
}
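
The example output above lists one integer value per label. As a minimal sketch, assuming these values are per-label image counts, the following Python snippet turns such a mapping into percentages. The function name `counts_to_percentages` and the three-entry sample are illustrative only and are not part of wai-annotations; in practice the mapping could be loaded from the generated file with `json.load`.

```python
import json


def counts_to_percentages(counts):
    """Convert a {label: count} mapping into {label: percentage}."""
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for an empty or all-zero distribution
        return {label: 0.0 for label in counts}
    return {label: 100.0 * n / total for label, n in counts.items()}


# A few entries copied from the example output above (not the full file)
sample = json.loads('{"Daffodil": 71, "Iris": 77, "LilyValley": 17}')

for label, pct in counts_to_percentages(sample).items():
    print(f"{label}: {pct:.2f}%")
```

Because the sample contains only three labels, the printed percentages are relative to those three and will differ from percentages computed over the full dataset.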

Label distribution for object detection dataset

The following generates a label distribution (as percentages, via the -p flag) for the 17flowers object detection dataset (zip), saving the result in a JSON file:

wai-annotations convert \
  from-voc-od \
    -i "./17flowers/voc/*.xml" \
  label-dist-od \
    -f json \
    -p \
    -o ./17flowers-labeldist-od.json

The JSON file will look similar to this:

{
  "Daffodil": 8.372641509433961,
  "Pansy": 6.60377358490566,
  "Buttercup": 6.367924528301887,
  "Tigerlily": 5.89622641509434,
  "Daisy": 6.721698113207547,
  "Bluebell": 3.30188679245283,
  "Tulip": 4.834905660377359,
  "Iris": 9.080188679245282,
  "Fritillary": 7.665094339622641,
  "Dandelion": 5.070754716981132,
  "ColtsFoot": 6.485849056603773,
  "Crocus": 5.89622641509434,
  "Sunflower": 8.372641509433961,
  "Snowdrop": 5.89622641509434,
  "Windflower": 7.429245283018868,
  "LilyValley": 2.0047169811320753
}
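
A distribution like the one above is often easier to read when sorted. The following is a small sketch, not part of wai-annotations, that ranks labels from most to least frequent; the helper name `rank_labels` is hypothetical, and the sample holds only three entries copied from the example output rather than the whole file.

```python
import json


def rank_labels(dist):
    """Return (label, value) pairs sorted from most to least frequent."""
    return sorted(dist.items(), key=lambda item: item[1], reverse=True)


# Three entries copied from the example output above (not the full file)
sample = json.loads(
    '{"Iris": 9.080188679245282,'
    ' "Bluebell": 3.30188679245283,'
    ' "LilyValley": 2.0047169811320753}'
)

ranked = rank_labels(sample)
print("most common:", ranked[0][0])
print("least common:", ranked[-1][0])
```

The same helper works on a count-based distribution, since it only compares the values and does not assume they sum to 100.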