
Apps 3x3 convolution optimized speed using (NEON SIMD) or (NEON SIMD and OpenMP) on S7/Note7

jaeho

Lurker
Sep 27, 2016
We want to implement a 3x3 convolution on a 4032x3024 image on the S7/Note7, which ship with either the Exynos 8890 (Samsung S.LSI) or the Qualcomm MSM8996 Snapdragon 820 chipset.
To implement this, we used the Android NDK, NEON SIMD, and OpenMP.
For one image (4032x3024), could you tell me the optimized speed for two cases (1. NEON SIMD implementation, 2. NEON SIMD + OpenMP implementation) on the S7 or Note7?
I only want to know the optimized speed of the 3x3 convolution itself.
The test scenarios are as follows.

1. Optimized 3x3 convolution time (ms) using SIMD on an S7 or Note7 with the Exynos 8890 chipset (code name Jungfrau)
2. Optimized 3x3 convolution time (ms) using SIMD + OpenMP on an S7 or Note7 with the Exynos 8890 chipset (code name Jungfrau)
3. Optimized 3x3 convolution time (ms) using SIMD on an S7 or Note7 with the Qualcomm MSM8996 Snapdragon 820 chipset
4. Optimized 3x3 convolution time (ms) using SIMD + OpenMP on an S7 or Note7 with the Qualcomm MSM8996 Snapdragon 820 chipset
Note: the Exynos 8890 CPU is octa-core (4x 2.3 GHz Mongoose and 4x 1.6 GHz Cortex-A53).
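
For reference, here is a minimal sketch of how the per-image time in ms could be measured for each scenario. It assumes a POSIX clock_gettime (available in the Android NDK). The names now_ms, time_conv_ms, conv_fn, and dummy_kernel are hypothetical helpers, not part of our code; dummy_kernel is only a stand-in for the real routine.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical type: a kernel with the same signature as convolution_3by3. */
typedef void (*conv_fn)(unsigned short *input, unsigned short *output);

/* Stand-in kernel so the sketch is self-contained; replace with the real routine. */
static void dummy_kernel(unsigned short *in, unsigned short *out) {
    out[0] = in[0];
}

/* Current monotonic time in milliseconds. */
static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

/* Run fn once as a warm-up, then `runs` timed runs; return the fastest run in ms. */
static double time_conv_ms(conv_fn fn, unsigned short *in, unsigned short *out, int runs) {
    assert(runs > 0);
    fn(in, out); /* warm-up: fills caches, lets the clock governor ramp up */
    double best = -1.0;
    for (int r = 0; r < runs; r++) {
        double t0 = now_ms();
        fn(in, out);
        double dt = now_ms() - t0;
        if (best < 0.0 || dt < best) best = dt;
    }
    return best;
}
```

Usage would be something like time_conv_ms(convolution_3by3, Input, Output, 20). Taking the fastest of several runs is one choice among several; on a big.LITTLE chip the result also depends on which cluster the threads land on, so an average or median may be worth reporting as well.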
-. 3x3 convolution C code
* Input buffer: 10-bit samples, size 4034 (W) x 3026 (H), i.e. the image plus a one-pixel border on each side
* Output buffer: 10-bit samples, size 4032 (W) x 3024 (H)

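// 3x3 weighted kernel [1 2 1; 2 4 2; 1 2 1] / 16 over a 10-bit image.
void convolution_3by3(unsigned short *Input, unsigned short *Output) {
    int input_width = 4032 + 2; // 4034
    // `buffer` was not defined in the posted code. Assumed here: the offset of the
    // first interior pixel, skipping the one-pixel top and left border.
    int buffer = input_width + 1;
    unsigned short *p_I1s_c = Input + buffer;          // current row
    unsigned short *p_I1s_p1 = p_I1s_c - input_width;  // previous row
    unsigned short *p_I1s_n1 = p_I1s_c + input_width;  // next row

    for (int i = 0; i < 3024; i++) {
        for (int j = 0; j < 4032; j++) {
            const int jm1 = j - 1;
            const int jp1 = j + 1;
            // corners x1, edges x2, center x4, then divide by 16
            Output[j] = (p_I1s_p1[jm1] + p_I1s_p1[jp1] + p_I1s_n1[jm1] + p_I1s_n1[jp1] +
                         ((p_I1s_c[jm1] + p_I1s_c[jp1] + p_I1s_p1[j] + p_I1s_n1[j]) << 1) +
                         (p_I1s_c[j] << 2)) >> 4;
        }
        // advance all pointers by one row
        Output = Output + 4032;
        p_I1s_p1 = p_I1s_p1 + input_width;
        p_I1s_c = p_I1s_c + input_width;
        p_I1s_n1 = p_I1s_n1 + input_width;
    }
}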
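
To make the two cases concrete, here is a rough sketch of what a NEON SIMD + OpenMP version of the routine above might look like. It is an untuned illustration, not a measured or optimized implementation. Assumptions: the function name conv3x3_rows and its width/height parameters are hypothetical, width is a multiple of 8, the input carries a one-pixel border, and samples are at most 10 bits so the weighted sum (at most 16 x 1023 = 16368) fits in 16 bits. The NEON path is only compiled when __ARM_NEON is defined; otherwise a scalar loop is used. Without -fopenmp the pragma is ignored and the rows run on one thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>

/* Vertical pass for 8 columns: above + 2 * current + below. */
static inline uint16x8_t col_sum(const uint16_t *p, const uint16_t *c, const uint16_t *n) {
    return vaddq_u16(vaddq_u16(vld1q_u16(p), vld1q_u16(n)),
                     vshlq_n_u16(vld1q_u16(c), 1));
}
#endif

/* in: (width + 2) x (height + 2) padded image, out: width x height. */
static void conv3x3_rows(const uint16_t *in, uint16_t *out, int width, int height) {
    const int stride = width + 2;
    assert(width % 8 == 0);
    /* Rows are independent, so they can be split across threads. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < height; i++) {
        const uint16_t *p = in + (size_t)i * stride + 1; /* row above   */
        const uint16_t *c = p + stride;                  /* current row */
        const uint16_t *n = c + stride;                  /* row below   */
        uint16_t *o = out + (size_t)i * width;
        int j = 0;
#if defined(__ARM_NEON)
        for (; j + 8 <= width; j += 8) {
            /* Horizontal pass over the column sums: left + 2 * middle + right. */
            uint16x8_t l = col_sum(p + j - 1, c + j - 1, n + j - 1);
            uint16x8_t m = col_sum(p + j, c + j, n + j);
            uint16x8_t r = col_sum(p + j + 1, c + j + 1, n + j + 1);
            uint16x8_t s = vaddq_u16(vaddq_u16(l, r), vshlq_n_u16(m, 1));
            vst1q_u16(o + j, vshrq_n_u16(s, 4));
        }
#endif
        /* Scalar path: the whole row on non-NEON builds, nothing left on NEON builds. */
        for (; j < width; j++) {
            int s = p[j - 1] + p[j + 1] + n[j - 1] + n[j + 1]
                  + ((c[j - 1] + c[j + 1] + p[j] + n[j]) << 1)
                  + (c[j] << 2);
            o[j] = (uint16_t)(s >> 4);
        }
    }
}
```

This uses the fact that the kernel separates into [1 2 1] applied vertically and then horizontally, with a single shift at the end so the result matches the scalar code bit for bit. It reloads overlapping vectors instead of reusing them (for example with vextq_u16), so it should be read as a starting point for the SIMD and SIMD + OpenMP measurements rather than as the fastest possible form.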
 
