GPIO stops working after several hours

Hi!
After the MangOH board has been running for several hours (with devMode stopped), GPIO functionality breaks down:

Mar 25 13:34:29 swi-mdm9x15 user.warn kernel: [67907.115542] qup_i2c qup_i2c.0: i2c_scl: 1, i2c_sda: 1
Mar 25 13:34:29 swi-mdm9x15 user.warn kernel: [67907.119631] qup_i2c qup_i2c.0: Bus still busy, status 132100
Mar 25 13:34:29 swi-mdm9x15 user.err kernel: [67907.125217] qup_i2c qup_i2c.0: Transaction timed out, SL-AD = 0x3a
Mar 25 13:34:29 swi-mdm9x15 user.err kernel: [67907.131748] qup_i2c qup_i2c.0: I2C Status: 132100
Mar 25 13:34:29 swi-mdm9x15 user.err kernel: [67907.136265] qup_i2c qup_i2c.0: QUP Status: 0
Mar 25 13:34:29 swi-mdm9x15 user.err kernel: [67907.140324] qup_i2c qup_i2c.0: OP Flags: 10
Mar 25 13:34:29 swi-mdm9x15 user.err kernel: [67907.144506] mci_protocol_frame_send: write frame fail to I2C: -110 of 18
Mar 25 13:34:29 swi-mdm9x15 user.err kernel: [67907.151312] Failed to send request c5 (-105)
Mar 25 13:34:29 swi-mdm9x15 user.err kernel: [67907.155432] swimcu_gpio_set_direction_out: gpio1 error ret=-5
Mar 25 13:34:29 swi-mdm9x15 user.emerg Legato: EMR | gpioService[783]/sysfsGpio T=main | gpioSysfsUtils.c WriteSysGpioSignalAttr() 175 | Failed to write out to GPIO config /sys/class/gpio/gpio35/direction. Error Operation not permitted
Mar 25 13:34:29 swi-mdm9x15 user.err Legato: =ERR= | gpioService[783]/sysfsGpio T=main | gpioSysfsUtils.c gpioSysfs_Activate() 731 | Failed to set Direction on GPIO gpio35
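The negative numbers in these kernel messages are Linux errno values: -110 is ETIMEDOUT (consistent with "Transaction timed out"), -105 is ENOBUFS and -5 is EIO ("Input/output error"). Below is a minimal C sketch for decoding such codes. It assumes Linux errno numbering, and `kernel_ret_to_str` and `print_logged_codes` are hypothetical helpers, not part of Legato or the swimcu driver. What each driver means by a given code is not documented in this thread.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Kernel drivers log failures as negative errno values (e.g. "-110").
 * Map such a value to the libc description. Assumes Linux errno
 * numbering, which is what the target (and a Linux host) uses. */
const char *kernel_ret_to_str(int ret)
{
    if (ret >= 0)
        return "success";
    return strerror(-ret);
}

/* The return codes that appear in the log above. */
static const int logged_codes[] = { -110, -105, -5 };

void print_logged_codes(void)
{
    for (size_t i = 0; i < sizeof logged_codes / sizeof logged_codes[0]; i++)
        printf("%d -> %s\n", logged_codes[i], kernel_ret_to_str(logged_codes[i]));
}
```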

ls -l /sys/class/gpio/gpio35/

-rw-r--r-- 1 root root 4096 Mar 24 19:29 active_low
lrwxrwxrwx 1 root root 0 Mar 25 14:29 device -> …/…/…/swimcu-gpio
-rw-r--r-- 1 root root 4096 Mar 25 15:17 direction
drwxr-xr-x 2 root root 0 Mar 25 14:29 power
-rw-r--r-- 1 root root 4096 Mar 25 14:29 pull
lrwxrwxrwx 1 root root 0 Mar 25 14:29 subsystem -> …/…/…/…/…/…/class/gpio
-rw-r--r-- 1 root root 4096 Mar 24 18:43 uevent
-rw-r--r-- 1 root root 4096 Mar 25 15:14 value

echo out > /sys/class/gpio/gpio35/direction

sh: write error: Input/output error
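An app that needs to detect this condition, rather than relying on a shell, can perform the same write from C and inspect errno: EIO points at the driver or bus, while EPERM or ENOENT point elsewhere. This is a minimal sketch. `sysfs_write_attr` is a hypothetical helper, and reading EIO as "the MCU I2C link has failed" is an assumption based on the logs in this thread.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write a value to a sysfs attribute such as
 * /sys/class/gpio/gpio35/direction, like `echo out > ...` does.
 * Returns 0 on success or -errno, so callers can tell EIO
 * (driver/bus failure) apart from e.g. EPERM or ENOENT. */
int sysfs_write_attr(const char *path, const char *value)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    size_t len = strlen(value);
    ssize_t n = write(fd, value, len);
    int err = (n < 0) ? errno : 0;
    if (n >= 0 && (size_t)n != len)
        err = EIO; /* treat a short write as a failure */

    if (close(fd) < 0 && err == 0)
        err = errno;
    return -err;
}
```

For example, `sysfs_write_attr("/sys/class/gpio/gpio35/direction", "out")` would be expected to return `-EIO` in the failure state shown above.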

This is a serious problem. Is there a way I can help debug it?
Currently, only rebooting the whole system fixes it, and only for a while.

Device: WP7502
Firmware: SWI9X15Y_07.12.09.00 r34123 CARMD-EV-FRMWR1 2017/04/26 23:34:19
legato: 18.01.0_43e88a5960e0e930a2c4d33f58d0a1b7_modified

Have you been able to repeat this problem or has it only happened once?

I’m able to repeat the problem. Today it happened again, after 21 hours of uptime.

I am having this problem after updating to R15. Removing power for a longer period and then re-applying it lets the board continue working. I have not been able to determine the root cause yet.

@mrjoso: Can you try to test again using the latest firmware provided with release 15? The firmware should show SWI9X15Y_07.12.14.00.

Is the modem firmware SWI9X15Y_07.12.14.00 compatible with the Linux distribution from SWI9X15Y_07.12.09.00?
Or do I need to update everything, including rebuilding Legato?

I had hard-coded the I2C speed to 400 kHz. This is not recommended, and it broke things badly going from R14 to R15. I have reverted this change and rewrote the parts of the code that relied on high-speed I2C.

Hey all,

We’re seeing this issue on release 14. I’m optimistic that release 15 will solve it; however, release 15 is blocking us in another way.

When installing our system update, avcDaemon fails when trying to load libssl.so.1.0.0. I checked the library folder, and release 15 only seems to include libssl.so.1.0.2. I’m looking for advice on how to solve this. I know @dfrey mentioned a migration from OpenSSL to tinydtls. I’ve tested this with Legato 17.11 and Legato 18.02.

Hey all,

I seem to have solved the libssl.so.1.0.0 issue above by ensuring my toolchain was up to date (thanks @dfrey). Unfortunately, release 15 of the firmware does not seem to solve the MCU/GPIO-related issues.

So far we’ve seen this bug occur seemingly at random. The system sometimes runs for hours without issue, and then the bug suddenly reappears. Sometimes it even happens right from boot.

This issue is a really huge blocker for us (and most MangOH users I suspect). My best guess is that the MCU is causing this problem, but it’s difficult to diagnose something that’s closed source.

@nick, @mrjoso, could you provide any more details on the scenario (mainly timing)? We’ve been stress-testing GPIO and ADC access with Release 15 and have not encountered this issue. Are you both seeing the issue manifest the same way, with the I2C “Bus still busy” messages, etc.? When the issue occurs, I would expect the I2C bus to be locked up, so other MCU accesses like /sys/module/swimcu_pm/firmware/version or ADC reads would also fail. If you have other I2C devices, I would suspect those are also blocked.
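The check suggested here can be scripted: read the MCU firmware version file and treat a failed read as a sign that the I2C bus is stuck. This is a minimal sketch. The path is the one quoted above, but the claim that a failed read always means a bus lock-up is an assumption taken from the reply above. `read_sysfs_line` and `mcu_link_ok` are hypothetical helpers.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Path quoted in the thread; per the reply above, reads of it should
 * fail while the MCU I2C bus is locked up (assumption, not verified). */
#define MCU_VERSION_PATH "/sys/module/swimcu_pm/firmware/version"

/* Read the first line of `path` into buf (newline stripped).
 * Returns 0 on success or a negative errno on failure. */
int read_sysfs_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -errno;

    int ret = 0;
    errno = 0;
    if (fgets(buf, (int)len, f) == NULL) {
        /* A failed driver read sets errno (e.g. EIO); EOF on an empty
         * file does not, so that case is also reported as -EIO. */
        ret = (ferror(f) && errno != 0) ? -errno : -EIO;
    } else {
        buf[strcspn(buf, "\n")] = '\0';
    }
    fclose(f);
    return ret;
}

/* Returns 1 if the MCU answered, 0 if the read failed. */
int mcu_link_ok(const char *path)
{
    char version[64];
    int ret = read_sysfs_line(path, version, sizeof version);
    if (ret < 0) {
        fprintf(stderr, "MCU read failed (%s): I2C bus may be locked up\n",
                strerror(-ret));
        return 0;
    }
    printf("MCU firmware version: %s\n", version);
    return 1;
}
```

On the target this would be called as `mcu_link_ok(MCU_VERSION_PATH)`, for instance from a periodic watchdog that logs when the link drops.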

@mrjoso, you mentioned that devMode is stopped - does this only happen if that is stopped?

This is correct, other i2c devices are blocked. I can’t provide any further timing details. It’s completely random. Sometimes it will work for hours, sometimes it fails right at boot.

Thanks Nick. Sorry, regarding timing, I meant the timing of the GPIO accesses while the system is running as expected. I just had a few more questions from another developer, as we’re going to try to reproduce this: (1) Is this on Red or Green? (2) External power supply or USB power? (3) Is the modem turned on and active (receiving/transmitting data) during the test?

We have not been accessing MCU GPIO pins very much since we require edge detection. I believe we use a few of them as outputs. The outputs work as expected when the i2c bus is not locked up. When I tried detecting edges with a polling loop, the i2c bus locked up pretty fast.
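The gpio35 listing above has no `edge` attribute, which suggests that poll()-based edge notification is not available through sysfs for these MCU GPIOs. The sketch below does software edge detection by polling slowly instead of in a tight loop. It assumes that each read of `value` costs one I2C transaction to the MCU, and the interval is a tunable guess, not a rate known to be safe. `classify_edge`, `read_gpio_value` and `poll_for_edges` are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */
#include <errno.h>
#include <stdio.h>
#include <time.h>

enum edge { EDGE_NONE, EDGE_RISING, EDGE_FALLING };

/* Compare two consecutive samples of the pin. */
enum edge classify_edge(int prev, int cur)
{
    if (prev == 0 && cur == 1)
        return EDGE_RISING;
    if (prev == 1 && cur == 0)
        return EDGE_FALLING;
    return EDGE_NONE;
}

/* Read "0" or "1" from a sysfs value file; returns 0/1 or a negative errno. */
int read_gpio_value(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -errno;
    int c = fgetc(f);
    fclose(f);
    if (c == '0' || c == '1')
        return c - '0';
    return -EIO;
}

/* Sample `path` every `interval_ms` for `max_polls` reads and report edges.
 * Returns the number of edges seen, or a negative errno on a read error. */
int poll_for_edges(const char *path, long interval_ms, int max_polls)
{
    struct timespec delay = { interval_ms / 1000, (interval_ms % 1000) * 1000000L };
    int prev = read_gpio_value(path);
    if (prev < 0)
        return prev;

    int edges = 0;
    for (int i = 0; i < max_polls; i++) {
        nanosleep(&delay, NULL);
        int cur = read_gpio_value(path);
        if (cur < 0)
            return cur; /* stop on error instead of retrying against the bus */
        enum edge e = classify_edge(prev, cur);
        if (e != EDGE_NONE) {
            printf("%s edge\n", e == EDGE_RISING ? "rising" : "falling");
            edges++;
        }
        prev = cur;
    }
    return edges;
}
```

For example, `poll_for_edges("/sys/class/gpio/gpio35/value", 200, 50)` samples five times a second for ten seconds. Whether a slower rate avoids the lock-up is unverified. The thread only reports that a fast polling loop triggered it, and any pulse shorter than the interval will be missed.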

  1. This is on a custom board based on a MangOH red. The biggest difference is we have not included the pca954x i2c hub (I forked https://github.com/mangOH/mangOH/blob/master/linux_kernel_modules/mangoh/mangoh_red.c to fix this).
  2. We’re running it with a battery and USB power.
  3. I believe we’ve seen the modem function during these failures. I’ll confirm this later today.

I’m sorry, currently I can only provide details with Release 14.
I have detected the problem when the devMode was stopped and MangOH Red Board was powered from the battery.

Hey all,

I’ve been testing on a custom firmware build based on the ingredients from release 15 for the WP85. The hardware setup we’re using does not include the wm8944 audio codec, so I removed the code that probes for this peripheral (in the Yocto image). I have yet to see this issue re-surface, so I’m cautiously optimistic. I’ll post again as soon as I have any new information.

Edit: Spoke too soon, seems to be back on one of my units.

Hello everyone,

I’ve been doing some thinking about the origin of this problem… and I have a few questions (mostly regarding the MCU):

  1. Does the MCU communicate with any other peripherals on the board (aside from the WP85 over i2c)?
  2. Is the wm8944 audio codec used for anything aside from audio?
  3. Could the pca954x i2c hub play a role in mitigating this problem (and we see it more often because we don’t have an i2c hub)?
  4. Is it possible this is related to the host PC communicating over micro USB? I saw this issue yesterday on one unit, and then suddenly every unit I connected exhibited this problem.
  5. (Follow up to 4) I have not used the devMode app, but to my knowledge it controls power settings that are useful for development. Is there a setting in here I should check?
  6. Why is the MCU firmware closed source and under what circumstances could it become open source? In all honesty I would probably have a difficult time working with the MCU firmware code, but debugging something open is infinitely easier. Debugging Legato is a dream because I can usually trace the problem line by line.

I cannot stress enough how much this is crippling our development at the moment, so any help is greatly appreciated.

Cheers!

Answers to each question:
a. The MCU is connected to the modem through I2C and a couple of GPIOs. GPIOs from the MCU go to the MangOH board.
b. No.
c. Not likely.
d. Not likely.
e. It is used to prevent the device from going to sleep.
f. It is closed source because the liability of opening it up is very high.

As far as your particular problem is concerned, are you using an MCU GPIO in high sampling mode? Why don’t you use a different GPIO?

Hey @asyal,

Thanks for the details. We’ve already been working on using the MCU GPIO pins for low-activity inputs/outputs, but that’s not really my main concern here. When this error occurs, all of our I2C peripherals become unusable (including truly mission-critical ones like the accelerometer).

@nick I am wondering what is triggering this on your hardware. Just to clarify, do you see this on the MangOH Red as well?

I have seen this on MangOH Red boards before, but it seemingly happened less often. I’ll see if I can witness it again on MangOH Red.